I forgot to respond to your specific question: if you want to ignore certain text, check whether the node you're processing is a text node, and if so, skip processing it if you determine you don't care about its contents. DOMNode::getNodeType() allows you to determine the type of any given node (DOMNode::TEXT_NODE indicates text). According to the API documentation, DOMText::getWholeText() will give you the text of the current text node and any logically adjacent text nodes. (Under some circumstances, logically contiguous text may be split up over multiple text nodes.)
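For what it's worth, here is a minimal sketch of that check in C++. It is not Xerces code: to keep it compilable on its own, `Node`, `NodeType`, `isXmlWhitespace` and `countSignificantChildren` are hypothetical stand-ins I made up for illustration. With Xerces you would compare DOMNode::getNodeType() against DOMNode::TEXT_NODE rather than reading a `type` member, and you would have to transcode the node's text before testing it.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the DOM types (assumption, not the Xerces API).
// In Xerces the equivalent test is node->getNodeType() == DOMNode::TEXT_NODE.
enum class NodeType { Element, Text };

struct Node {
    NodeType type;
    std::string value;           // element name, or the text of a text node
    std::vector<Node> children;  // child nodes in document order
};

// True if s contains nothing but XML white space (space, tab, CR, LF).
// An empty string also counts as white space here.
bool isXmlWhitespace(const std::string& s) {
    return s.find_first_not_of(" \t\r\n") == std::string::npos;
}

// Count the children of parent, skipping text nodes that hold only white space.
std::size_t countSignificantChildren(const Node& parent) {
    std::size_t count = 0;
    for (const Node& child : parent.children) {
        if (child.type == NodeType::Text && isXmlWhitespace(child.value)) {
            continue;  // indentation between tags: not interesting, skip it
        }
        ++count;
    }
    return count;
}
```

Applied to a tree shaped like the indented sample document (white space text, the "child" element, white space text), `children.size()` is 3 but `countSignificantChildren` returns 1, which is the number the original code expected.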
Alternatively, you can write a DTD or schema for your documents and let Xerces sort out which nodes are white space in element content. You'll still need to check whether the node you're processing is a text node; if it is, DOMText::getIsWhitespaceInElementContent() will tell you whether it's white space in element content.

-----Original Message-----
From: Jesse Pelton [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 06, 2007 1:02 PM
To: c-dev@xerces.apache.org
Subject: RE: parsing xml

I'm a specification reader myself, so the only materials I can suggest are
a) the XML specification (in particular http://www.w3.org/TR/2006/REC-xml-20060816/#sec-white-space),
b) the DOM specifications (see http://www.w3.org/DOM/DOMTR; I'd probably start with DOM Level 2 Core),
c) the Xerces API documentation (http://xml.apache.org/xerces-c/api.html), and
d) the Xerces sample applications.

I'm sure there are good introductions to XML technologies, possibly including Xerces, but I'm not familiar with them. O'Reilly's books (http://www.oreilly.com) seem to be generally well-regarded.

Be warned that XML's simple appearance is deceiving. There's a lot to know, and ignorance today can cost you dearly tomorrow.

-----Original Message-----
From: varun.81 [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 06, 2007 12:28 PM
To: c-dev@xerces.apache.org
Subject: RE: parsing xml

or if you can suggest some material on xerces

varun.81 wrote:
>
> So can you suggest how I can discard white space even if it is present
> in the document?
>
> Jesse Pelton wrote:
>>
>> Ah. Now we're talking. If you put non-discardable whitespace into a
>> document, it will be included in the DOM hierarchy as text nodes. In
>> your document, the "input" element has three child nodes: text composed
>> entirely of whitespace, a "child" element with a single attribute node,
>> and another whitespace text node. Xerces is doing the proper thing.
>>
>> I'd suggest reading an XML primer and/or reviewing the DOM
>> specification.
>>
>> -----Original Message-----
>> From: varun.81 [mailto:[EMAIL PROTECTED]
>> Sent: Tuesday, February 06, 2007 12:00 PM
>> To: c-dev@xerces.apache.org
>> Subject: RE: parsing xml
>>
>> when there are no spaces in my xml it gives me the proper result:
>> root element as input and number of child elements as 1. But when I
>> give the xml file with proper tabs it gives me child elements as 3
>> and thus it fails. So is there a method to read the xml file with tabs?
>>
>> Jesse Pelton wrote:
>>>
>>> Attachments are allowed. This message should have two attachments: one
>>> is an input document constructed from your text (except the quotes), the
>>> other is the output from DOMPrint when the first is parsed.
>>>
>>> Does DOMPrint fail if you save either of these to disk and parse it? If
>>> so, with what message?
>>>
>>> -----Original Message-----
>>> From: varun.81 [mailto:[EMAIL PROTECTED]
>>> Sent: Tuesday, February 06, 2007 11:45 AM
>>> To: c-dev@xerces.apache.org
>>> Subject: RE: parsing xml
>>>
>>> "<input>
>>> <child name="Varun">hello</child>
>>> </input>"
>>>
>>> Motti Shneor-2 wrote:
>>>>
>>>> Gladly. Only all my attachments were rejected by the list. Is it at all
>>>> allowed to attach files?
>>>>
>>>> Motti Shneor
>>>> Software Engineer
>>>>
>>>> Orbograph Ltd.
>>>> P.O.Box 215, Yavne 81102, Israel
>>>> Tel: 972-8-9322257 ext. 230
>>>> Fax: 972-8-9328857
>>>> [EMAIL PROTECTED]
>>>> http://www.orbograph.com
>>>>
>>>> ________________________________
>>>>
>>>> From: Jesse Pelton [mailto:[EMAIL PROTECTED]
>>>> Sent: Tuesday, February 06, 2007 6:31 PM
>>>> To: c-dev@xerces.apache.org
>>>> Subject: RE: parsing xml
>>>>
>>>> Could you attach a sample document that fails to parse?
>>>> If you embed it in a message, it's subject to rearrangement or
>>>> misinterpretation. For instance, my mail client displays a text box
>>>> instead of your sample document.
>>>>
>>>> Also, please note whether sample apps like DOMParse parse the document
>>>> successfully.
>>>>
>>>> ________________________________
>>>>
>>>> From: varun.81 [mailto:[EMAIL PROTECTED]
>>>> Sent: Tuesday, February 06, 2007 11:20 AM
>>>> To: c-dev@xerces.apache.org
>>>> Subject: parsing xml
>>>>
>>>> hi, I am new to using this xerces tool. I have to parse an xml, a very
>>>> simple one, say hello. I am able to do it through an xml file, but it
>>>> throws me an error when I try to give xml with proper indentation, but it
>>>> works if I give xml without spaces between the tags. I will be obliged
>>>> if someone can help me find out why it fails with indentation. thanks
>>>> ps: I am writing code in C++
>>>>
>>>> ________________________________
>>>>
>>>> View this message in context: parsing xml
>>>> <http://www.nabble.com/parsing-xml-tf3181498.html#a8828911>
>>>> Sent from the Xerces - C - Dev
>>>> <http://www.nabble.com/Xerces---C---Dev-f282.html> mailing list archive
>>>> at Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>> For additional commands, e-mail: [EMAIL PROTECTED]
>>>
>>> <?xml version="1.0" encoding="UTF-8" standalone="no" ?><input>
>>> <child name="Varun">hello</child>
>>> </input>
>>> <input>
>>> <child name="Varun">hello</child>
>>> </input>