Hi Alberto,

thank you very much for your help. I integrated the patch into 3.0.1 and it works: the exception no longer occurs. One problem remains, though. Memory usage is still the same as before. I would expect that when a node is rejected from the tree, memory usage should decrease as well. Is that assumption correct?

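To make the question concrete, here is a small standalone C++ sketch of one possible explanation. It is not Xerces code: `Node`, `NodePool` and `simulate_parse` are hypothetical names, and the idea that the document keeps its own node pool is an assumption that has not been checked against the Xerces sources. In a pool like this, releasing a rejected node only makes its slot reusable, so the footprint grows more slowly but never shrinks during the parse.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a DOM node; not a Xerces type.
struct Node {
    char payload[64];
};

// Hypothetical pool: every slot stays owned by the pool until the pool
// itself is destroyed (assumed to mirror a per-document allocator).
class NodePool {
public:
    // Hand out a recycled slot if one is available, otherwise grow the pool.
    Node* allocate() {
        if (!free_list_.empty()) {
            Node* n = free_list_.back();
            free_list_.pop_back();
            return n;
        }
        slots_.push_back(std::unique_ptr<Node>(new Node));
        return slots_.back().get();
    }

    // Mark a slot as reusable; nothing is returned to the system here.
    void release(Node* n) {
        assert(n != nullptr);
        free_list_.push_back(n);
    }

    // Number of slots ever obtained from the system allocator.
    std::size_t footprint() const { return slots_.size(); }

private:
    std::vector<std::unique_ptr<Node>> slots_;
    std::vector<Node*> free_list_;
};

// Simulate parsing `total` elements where every second one is rejected.
// Returns the pool footprint at the end of the parse.
std::size_t simulate_parse(std::size_t total, bool release_rejected) {
    NodePool pool;
    for (std::size_t i = 0; i < total; ++i) {
        Node* n = pool.allocate();
        const bool rejected = (i % 2 == 1);
        if (rejected && release_rejected) {
            pool.release(n);  // slot becomes available for the next node
        }
    }
    return pool.footprint();
}
```

Under these assumptions, `simulate_parse(1000, false)` ends with 1000 slots and `simulate_parse(1000, true)` with 501, so releasing rejected nodes would lower the peak without handing memory back while the pool is alive.
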
Mirko

-------- Original Message --------
> Date: Fri, 04 Sep 2009 16:12:16 +0200
> From: Alberto Massari <[email protected]>
> To: [email protected]
> Subject: Re: method startElement() from class DOMLSParserFilter
>
> In effect I am seeing so many problems with that code that the only
> suggestion I have is to get the latest 3.0 from the trunk and work with
> what I have just committed (or get the patch from
> http://svn.apache.org/viewvc?rev=811420&view=rev and apply to the 3.0.1
> code). This version should support your original code.
>
> Alberto
>
> Mirko Braun wrote:
> > Hi Alberto,
> >
> > yes, I'm still using the method startElement(). Is it better
> > to use the method acceptNode() to reject the DATA node from
> > the DOM, or is there any other possibility?
> >
> > Mirko
> >
> > -------- Original Message --------
> >> Date: Fri, 04 Sep 2009 15:41:54 +0200
> >> From: Alberto Massari <[email protected]>
> >> To: [email protected]
> >> Subject: Re: method startElement() from class DOMLSParserFilter
> >>
> >> Hi Mirko,
> >> are you still using startElement()? That API would mess with the current
> >> parent, so it would break the parsing at a certain point.
> >>
> >> Alberto
> >>
> >> Mirko Braun wrote:
> >>> Hi Alberto,
> >>>
> >>> yes, I'm sure that DATA is not a root node. I debugged a little bit.
> >>> The exception occurs after the sixth time this DATA node was found.
> >>>
> >>> Mirko
> >>>
> >>> -------- Original Message --------
> >>>> Date: Fri, 04 Sep 2009 14:21:15 +0200
> >>>> From: Alberto Massari <[email protected]>
> >>>> To: [email protected]
> >>>> Subject: Re: method startElement() from class DOMLSParserFilter
> >>>>
> >>>> Hi Mirko,
> >>>> are you sure that your root node isn't one of those DATA elements? In
> >>>> this case the document node would see more than one root element.
> >>>>
> >>>> Alberto
> >>>>
> >>>> Mirko Braun wrote:
> >>>>> Hi Alberto,
> >>>>>
> >>>>> thank you for your answer. I integrated the changes you
> >>>>> suggested, but the result is still the same:
> >>>>>
> >>>>> DOM Error during parsing:
> >>>>> 'C:\Daten\2009-08-07_NewXercesc\3_0_1\xerces-c-3.0.1\Build\Win32\VC6\Debug\MyXML.xml'
> >>>>> DOMException code is: 3
> >>>>> Message is: attempt is made to insert a node where it is not permitted
> >>>>>
> >>>>> Best regards,
> >>>>> Mirko
> >>>>>
> >>>>> -------- Original Message --------
> >>>>>> Date: Fri, 04 Sep 2009 12:37:10 +0200
> >>>>>> From: Alberto Massari <[email protected]>
> >>>>>> To: [email protected]
> >>>>>> Subject: Re: method startElement() from class DOMLSParserFilter
> >>>>>>
> >>>>>> Hi Mirko,
> >>>>>> I think the current implementation of the DOMLSParserFilter doesn't work
> >>>>>> nicely with your code, as the rejected nodes are not recycled and the
> >>>>>> memory will grow to the same level as before.
> >>>>>> Anyhow, you should instead override acceptNode like this:
> >>>>>>
> >>>>>> DOMParserFilter::FilterAction DOMParserFilter::acceptNode(DOMNode* node)
> >>>>>> {
> >>>>>>     // for element whose name is "DATA", skip it
> >>>>>>     if (node->getNodeType()==DOMNode::ELEMENT_NODE &&
> >>>>>>         XMLString::compareString(node->getNodeName(), element_data)==0)
> >>>>>>         return DOMParserFilter::FILTER_REJECT;
> >>>>>>     else
> >>>>>>         return DOMParserFilter::FILTER_ACCEPT;
> >>>>>> }
> >>>>>>
> >>>>>> Then, change DOMLSParserImpl::endElement to add a call to
> >>>>>> origNode->release() after the call to removeChild().
> >>>>>>
> >>>>>> Alberto
> >>>>>>
> >>>>>> Mirko Braun wrote:
> >>>>>>> Hello everybody,
> >>>>>>>
> >>>>>>> I would like to parse a quite large XML file (about 180 MB).
> >>>>>>> I used the DOM interface because I need the tree for further
> >>>>>>> processing of the data the XML file contains. Of course, a lot
> >>>>>>> of memory is used while parsing the file, and I got an
> >>>>>>> "Out of memory" exception.
> >>>>>>>
> >>>>>>> I noticed that a class DOMLSParserFilter comes along with Xercesc C++
> >>>>>>> 3.0.1 (Win32), which makes it possible to filter the nodes during parsing.
> >>>>>>> That is perfect for me because one XML element in my large file
> >>>>>>> contains most of the data. This XML element is called DATA and
> >>>>>>> appears several times in my XML file.
> >>>>>>> So I had the idea to reject this XML element from the DOM tree
> >>>>>>> during parsing, to reduce the used memory, by using the method
> >>>>>>> startElement() of the DOMLSParserFilter class. After that I would
> >>>>>>> use a SAX parser and just get all DATA elements with their values.
> >>>>>>> But it does not work.
> >>>>>>> I integrated my code into the DOMPrint example which comes along
> >>>>>>> with Xercesc C++ 3.0.1. The following error message occurred:
> >>>>>>>
> >>>>>>> DOM Error during parsing:
> >>>>>>> 'C:\Daten\2009-08-07_NewXercesc\3_0_1\xerces-c-3.0.1\Build\Win32\VC6\Debug\MyXML.xml'
> >>>>>>> DOMException code is: 3
> >>>>>>> Message is: attempt is made to insert a node where it is not permitted
> >>>>>>>
> >>>>>>> Did I misunderstand the functionality of the DOMLSParserFilter class
> >>>>>>> and its method startElement()?
> >>>>>>> Is it possible to realize my idea with the help of this class? Did
> >>>>>>> I do something wrong in my code (please have a look below)?
> >>>>>>>
> >>>>>>> I would be very grateful for any help.
> >>>>>>>
> >>>>>>> Thanks in advance,
> >>>>>>> Mirko
> >>>>>>>
> >>>>>>>
> >>>>>>> DOMPrintFilter.hpp:
> >>>>>>> --------------------
> >>>>>>>
> >>>>>>> class DOMParserFilter : public DOMLSParserFilter {
> >>>>>>> public:
> >>>>>>>
> >>>>>>>     DOMParserFilter(DOMNodeFilter::ShowType whatToShow = DOMNodeFilter::SHOW_ALL);
> >>>>>>>     ~DOMParserFilter(){};
> >>>>>>>
> >>>>>>>     virtual FilterAction startElement(DOMElement* node);
> >>>>>>>     virtual FilterAction acceptNode(DOMNode* node){return DOMParserFilter::FILTER_ACCEPT;};
> >>>>>>>     virtual DOMNodeFilter::ShowType getWhatToShow() const {return fWhatToShow;};
> >>>>>>>
> >>>>>>> private:
> >>>>>>>     DOMNodeFilter::ShowType fWhatToShow;
> >>>>>>> };
> >>>>>>>
> >>>>>>>
> >>>>>>> DOMPrintFilter.cpp:
> >>>>>>> --------------------
> >>>>>>>
> >>>>>>> DOMParserFilter::DOMParserFilter(DOMNodeFilter::ShowType whatToShow)
> >>>>>>> :fWhatToShow(whatToShow)
> >>>>>>> {}
> >>>>>>>
> >>>>>>> DOMParserFilter::FilterAction DOMParserFilter::startElement(DOMElement* node)
> >>>>>>> {
> >>>>>>>     // for element whose name is "DATA", skip it
> >>>>>>>     if (XMLString::compareString(node->getNodeName(), element_data)==0)
> >>>>>>>         return DOMParserFilter::FILTER_REJECT;
> >>>>>>>     else
> >>>>>>>         return DOMParserFilter::FILTER_ACCEPT;
> >>>>>>> }
> >>>>>>>
> >>>>>>>
> >>>>>>> DOMPrint.cpp:
> >>>>>>> ---------------
> >>>>>>>
> >>>>>>> static const XMLCh gLS[] = { xercesc::chLatin_L, xercesc::chLatin_S, xercesc::chNull };
> >>>>>>> xercesc::DOMImplementation *implParser =
> >>>>>>>     xercesc::DOMImplementationRegistry::getDOMImplementation(gLS);
> >>>>>>> xercesc::DOMLSParser* parser =
> >>>>>>>     ((xercesc::DOMImplementationLS*)implParser)->createLSParser(xercesc::DOMImplementationLS::MODE_SYNCHRONOUS, 0);
> >>>>>>> DOMTreeErrorReporter *errReporter = new DOMTreeErrorReporter();
> >>>>>>> parser->getDomConfig()->setParameter(xercesc::XMLUni::fgDOMErrorHandler, errReporter);
> >>>>>>>
> >>>>>>> DOMParserFilter * pDOMParserFilter = new DOMParserFilter();
> >>>>>>> parser->setFilter(pDOMParserFilter);
> >>>>>>>
> >>>>>>> //
> >>>>>>> // Parse the XML file, catching any XML exceptions that might propagate
> >>>>>>> // out of it.
> >>>>>>> //
> >>>>>>> bool errorsOccured = false;
> >>>>>>> DOMDocument *doc = NULL;
> >>>>>>>
> >>>>>>> try
> >>>>>>> {
> >>>>>>>     doc = parser->parseURI(gXmlFile);
> >>>>>>> }
> >>>>>>> catch (const OutOfMemoryException&)
> >>>>>>> {
> >>>>>>>     XERCES_STD_QUALIFIER cerr << "OutOfMemoryException" << XERCES_STD_QUALIFIER endl;
> >>>>>>>     errorsOccured = true;
> >>>>>>> }
> >>>>>>> catch (const XMLException& e)
> >>>>>>> {
> >>>>>>>     XERCES_STD_QUALIFIER cerr << "An error occurred during parsing\n Message: "
> >>>>>>>         << StrX(e.getMessage()) << XERCES_STD_QUALIFIER endl;
> >>>>>>>     errorsOccured = true;
> >>>>>>> }
> >>>>>>> catch (const DOMException& e)
> >>>>>>> {
> >>>>>>>     const unsigned int maxChars = 2047;
> >>>>>>>     XMLCh errText[maxChars + 1];
> >>>>>>>
> >>>>>>>     XERCES_STD_QUALIFIER cerr << "\nDOM Error during parsing: '" << gXmlFile << "'\n"
> >>>>>>>         << "DOMException code is: " << e.code << XERCES_STD_QUALIFIER endl;
> >>>>>>>
> >>>>>>>     if (DOMImplementation::loadDOMExceptionMsg(e.code, errText, maxChars))
> >>>>>>>         XERCES_STD_QUALIFIER cerr << "Message is: " << StrX(errText) << XERCES_STD_QUALIFIER endl;
> >>>>>>>
> >>>>>>>     errorsOccured = true;
> >>>>>>> }
> >>>>>>> catch (...)
> >>>>>>> {
> >>>>>>>     XERCES_STD_QUALIFIER cerr << "An error occurred during parsing\n " << XERCES_STD_QUALIFIER endl;
> >>>>>>>     errorsOccured = true;
> >>>>>>> }
