RE: parsing xml

Jesse Pelton Tue, 06 Feb 2007 10:15:15 -0800

I forgot to respond to your specific question: if you want to ignore
certain text, check whether the node you're processing is a text node,
and if so, skip processing it if you determine you don't care about its
contents.  DOMNode::getNodeType() allows you to determine the type of
any given node (DOMNode::TEXT_NODE indicates text).  According to the
API documentation, DOMText::getWholeText() will give you the text of the
current text node and any logically adjacent text nodes.  (Under some
circumstances, logically contiguous text may be split up over multiple
text nodes.)
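
As a rough sketch of the "determine you don't care about its contents"
step: the XML specification defines white space as space, tab, carriage
return, and line feed, so a text node made up only of those characters is
usually just indentation.  The helper below is plain standard C++ with a
made-up name (isXmlWhitespaceOnly); it is not part of Xerces.  With Xerces
you would first confirm getNodeType() == DOMNode::TEXT_NODE and then
either transcode the node's XMLCh text or apply the same test to the
XMLCh values directly.

```cpp
#include <string>

// Hypothetical helper, not a Xerces API: returns true when the text
// consists only of the four white space characters named by the XML
// specification (space, tab, carriage return, line feed).
// An empty string also counts as white space only.
bool isXmlWhitespaceOnly(const std::string& text) {
    return text.find_first_not_of(" \t\r\n") == std::string::npos;
}
```

Note that this treats all white-space-only text as ignorable, which is
only appropriate if your documents never carry meaningful white space.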


Alternatively, you can write a DTD or schema for your documents and let
Xerces sort out which nodes are white space in element content.  You'll
still need to check whether the node you're processing is a text node;
if it is, DOMText::getIsWhitespaceInElementContent() will tell you
whether it's white space in element content.
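
Either way, the traversal has the same shape: visit each child, and skip
it if it is a text node flagged as white space in element content.  The
sketch below models that with stand-in types (Node, NodeKind, and the
function names are all hypothetical, not Xerces classes), using the
children of "input" from the indented sample in this thread.  The bool
field stands in for whatever DOMText::getIsWhitespaceInElementContent()
would report after validation; I believe Xerces-C 3.x names that method
getIsElementContentWhitespace(), so check the API documentation for your
version.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for DOM nodes; these are not Xerces types.
enum class NodeKind { Element, Text };

struct Node {
    NodeKind kind;
    std::string value;               // element name or text content
    bool whitespaceInElementContent; // what a validating parser would report
};

// Count the children worth processing: every element, plus any text node
// that is not white space in element content.
std::size_t countSignificantChildren(const std::vector<Node>& children) {
    std::size_t count = 0;
    for (const Node& child : children) {
        if (child.kind == NodeKind::Text && child.whitespaceInElementContent) {
            continue; // indentation between elements: skip it
        }
        ++count;
    }
    return count;
}

// Children of "input" in the indented sample: white space, the "child"
// element, and more white space.
std::vector<Node> sampleInputChildren() {
    return {
        {NodeKind::Text, "\n        ", true},
        {NodeKind::Element, "child", false},
        {NodeKind::Text, "\n", true},
    };
}
```

With this model, the sample has three children in total but only one
that survives the filter, which matches the counts reported in the
thread.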

-----Original Message-----
From: Jesse Pelton [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, February 06, 2007 1:02 PM
To: [email protected]
Subject: RE: parsing xml

I'm a specification reader myself, so the only materials I can suggest
are a) the XML specification (in particular
http://www.w3.org/TR/2006/REC-xml-20060816/#sec-white-space), b) the DOM
specifications (see http://www.w3.org/DOM/DOMTR; I'd probably start with
DOM Level 2 Core), c) the Xerces API documentation
(http://xml.apache.org/xerces-c/api.html), and d) the Xerces sample
applications.

I'm sure there are good introductions to XML technologies, possibly
including Xerces, but I'm not familiar with them.  O'Reilly's books
(http://www.oreilly.com) seem to be generally well-regarded.

Be warned that XML's simple appearance is deceiving.  There's a lot to
know, and ignorance today can cost you dearly tomorrow.

-----Original Message-----
From: varun.81 [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, February 06, 2007 12:28 PM
To: [email protected]
Subject: RE: parsing xml


Or, if you can, please suggest some material on Xerces.

varun.81 wrote:
> 
> So can you suggest how I can discard white space even if it is present
> in the document?
> 
> Jesse Pelton wrote:
>> 
>> Ah.  Now we're talking.  If you put non-discardable whitespace into a
>> document, it will be included in the DOM hierarchy as text nodes.  In
>> your document, the "input" element has three child nodes: text composed
>> entirely of whitespace, a "child" element with a single attribute node,
>> and another whitespace text node.  Xerces is doing the proper thing.
>> 
>> I'd suggest reading an XML primer and/or reviewing the DOM
>> specification. 
>> 
>> -----Original Message-----
>> From: varun.81 [mailto:[EMAIL PROTECTED] 
>> Sent: Tuesday, February 06, 2007 12:00 PM
>> To: [email protected]
>> Subject: RE: parsing xml
>> 
>> 
>> When there are no spaces in my XML, it gives me the proper result: the
>> root element is "input" and the number of child elements is 1.  But if
>> I give it the XML file with proper tabs, it reports 3 child elements,
>> and thus it fails.  So is there a method to read the XML file with tabs?
>> 
>> Jesse Pelton wrote:
>>> 
>>> Attachments are allowed.  This message should have two attachments: one
>>> is an input document constructed from your text (except the quotes), the
>>> other is the output from DOMPrint when the first is parsed.
>>> 
>>> Does DOMPrint fail if you save either of these to disk and parse it?  If
>>> so, with what message?
>>> 
>>> -----Original Message-----
>>> From: varun.81 [mailto:[EMAIL PROTECTED] 
>>> Sent: Tuesday, February 06, 2007 11:45 AM
>>> To: [email protected]
>>> Subject: RE: parsing xml
>>> 
>>> 
>>> "<input>
>>>         <child name="Varun">hello</child>
>>> </input>"
>>> 
>>> Motti Shneor-2 wrote:
>>>> 
>>>> Gladly, except that all my attachments were rejected by the list. Is
>>>> it allowed to attach files at all?
>>>> 
>>>>  
>>>> 
>>>> Motti Shneor
>>>> Software Engineer
>>>> 
>>>> Orbograph Ltd.
>>>> P.O.Box 215, Yavne 81102, Israel
>>>> Tel: 972-8-9322257 ext. 230
>>>> Fax: 972-8-9328857
>>>> [EMAIL PROTECTED]
>>>> http://www.orbograph.com
>>>> 
>>>>  
>>>> 
>>>> ________________________________
>>>> 
>>>> From: Jesse Pelton [mailto:[EMAIL PROTECTED] 
>>>> Sent: Tuesday, February 06, 2007 6:31 PM
>>>> To: [email protected]
>>>> Subject: RE: parsing xml
>>>> 
>>>>  
>>>> 
>>>> Could you attach a sample document that fails to parse? If you embed it
>>>> in a message, it's subject to rearrangement or misinterpretation. For
>>>> instance, my mail client displays a text box instead of your sample
>>>> document.
>>>> 
>>>>  
>>>> 
>>>> Also, please note whether sample apps like DOMParse parse the document
>>>> successfully.
>>>> 
>>>>  
>>>> 
>>>> ________________________________
>>>> 
>>>> From: varun.81 [mailto:[EMAIL PROTECTED] 
>>>> Sent: Tuesday, February 06, 2007 11:20 AM
>>>> To: [email protected]
>>>> Subject: parsing xml
>>>> 
>>>> Hi, I am new to this Xerces tool.  I have to parse a very simple XML
>>>> document, say  hello.  I am able to do it through an XML file, but it
>>>> throws an error when I give it XML with proper indentation; it works
>>>> if I give it XML without spaces between the tags.  I would be obliged
>>>> if someone could help me find out why it fails with indentation.  Thanks.
>>>> PS: I am writing code in C++.
>>>> 
>>>> ________________________________
>>>> 
>>>> View this message in context: parsing xml
>>>> <http://www.nabble.com/parsing-xml-tf3181498.html#a8828911> 
>>>> Sent from the Xerces - C - Dev
>>>> <http://www.nabble.com/Xerces---C---Dev-f282.html>  mailing list
>>> archive
>>>> at Nabble.com.
>>>> 
>>>> 
>>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>> For additional commands, e-mail: [EMAIL PROTECTED]
>>> 
>>> 
>>> <?xml version="1.0" encoding="UTF-8" standalone="no" ?><input>
>>>         <child name="Varun">hello</child>
>>> </input>
>>> <input>
>>>         <child name="Varun">hello</child>
>>> </input>
>>> 
>> 
> 
> 
