Hi Alberto, yes, I'm still using the startElement() method. Is it better to use acceptNode() to reject the DATA node from the DOM, or is there another possibility?
Mirko

-------- Original Message --------
> Date: Fri, 04 Sep 2009 15:41:54 +0200
> From: Alberto Massari <[email protected]>
> To: [email protected]
> Subject: Re: method startElement() from class DOMLSParserFilter
> Hi Mirko,
> are you still using startElement()? That API would mess with the current
> parent, so it would break the parsing at a certain point.
>
> Alberto
>
> Mirko Braun wrote:
> > Hi Alberto,
> >
> > yes, I'm sure that DATA is not a root node. I debugged a little bit.
> > The exception occurs after the sixth time this DATA node was found.
> >
> > Mirko
> >
> > -------- Original Message --------
> >
> >> Date: Fri, 04 Sep 2009 14:21:15 +0200
> >> From: Alberto Massari <[email protected]>
> >> To: [email protected]
> >> Subject: Re: method startElement() from class DOMLSParserFilter
> >>
> >> Hi Mirko,
> >> are you sure that your root node isn't one of those DATA elements? In
> >> this case the document node would see more than one root element.
> >>
> >> Alberto
> >>
> >> Mirko Braun wrote:
> >>
> >>> Hi Alberto,
> >>>
> >>> thank you for your answer. I integrated the changes you
> >>> suggested, but the result is still the same:
> >>>
> >>> DOM Error during parsing: 'C:\Daten\2009-08-07_NewXercesc\3_0_1\xerces-c-3.0.1\Build\Win32\VC6\Debug\MyXML.xml'
> >>> DOMException code is: 3
> >>> Message is: attempt is made to insert a node where it is not permitted
> >>>
> >>> Best regards,
> >>> Mirko
> >>>
> >>> -------- Original Message --------
> >>>
> >>>> Date: Fri, 04 Sep 2009 12:37:10 +0200
> >>>> From: Alberto Massari <[email protected]>
> >>>> To: [email protected]
> >>>> Subject: Re: method startElement() from class DOMLSParserFilter
> >>>>
> >>>> Hi Mirko,
> >>>> I think the current implementation of the DOMLSParserFilter doesn't
> >>>> work nicely with your code, as the rejected nodes are not recycled
> >>>> and the memory will grow to the same level as before.
> >>>> Anyhow, you should instead override acceptNode like this:
> >>>>
> >>>> DOMParserFilter::FilterAction DOMParserFilter::acceptNode(DOMNode* node)
> >>>> {
> >>>>     // for element whose name is "DATA", skip it
> >>>>     if (node->getNodeType()==DOMNode::ELEMENT_NODE &&
> >>>>         XMLString::compareString(node->getNodeName(), element_data)==0)
> >>>>         return DOMParserFilter::FILTER_REJECT;
> >>>>     else
> >>>>         return DOMParserFilter::FILTER_ACCEPT;
> >>>> }
> >>>>
> >>>> Then, change DOMLSParserImpl::endElement to add a call to
> >>>> origNode->release() after the call to removeChild().
> >>>>
> >>>> Alberto
> >>>>
> >>>> Mirko Braun wrote:
> >>>>
> >>>>> Hello everybody,
> >>>>>
> >>>>> I would like to parse a quite large XML file (about 180 MB).
> >>>>> I used the DOM interface because I need the tree for further
> >>>>> processing of the data the XML file contains. Of course a lot
> >>>>> of memory is used while parsing the file, and I got an
> >>>>> "Out of memory" exception.
> >>>>>
> >>>>> I noticed that a class DOMLSParserFilter comes along with
> >>>>> Xerces-C++ 3.0.1 (Win32), which makes it possible to filter the
> >>>>> nodes during parsing.
> >>>>> That is perfect for me because one XML element in my large file
> >>>>> contains most of the data. This XML element is called DATA and
> >>>>> appears several times in my XML file.
> >>>>> So I had the idea to reject this XML element from the DOM tree
> >>>>> during parsing, to reduce the memory used, by using the method
> >>>>> startElement() of the DOMLSParserFilter class. After that I would
> >>>>> use a SAX parser and just get all DATA elements with their values.
> >>>>> But it does not work.
> >>>>> I integrated my code into the DOMPrint example which comes along
> >>>>> with Xerces-C++ 3.0.1. The following error message occurred:
> >>>>>
> >>>>> DOM Error during parsing: 'C:\Daten\2009-08-07_NewXercesc\3_0_1\xerces-c-3.0.1\Build\Win32\VC6\Debug\MyXML.xml'
> >>>>> DOMException code is: 3
> >>>>> Message is: attempt is made to insert a node where it is not permitted
> >>>>>
> >>>>> Did I misunderstand the functionality of the DOMLSParserFilter class
> >>>>> and its method startElement()?
> >>>>> Is it possible to realize my idea with the help of this class? Did
> >>>>> I do something wrong in my code (please have a look below)?
> >>>>>
> >>>>> I would be very grateful for any help.
> >>>>>
> >>>>> Thanks in advance,
> >>>>> Mirko
> >>>>>
> >>>>>
> >>>>> DOMPrintFilter.hpp:
> >>>>> --------------------
> >>>>>
> >>>>> class DOMParserFilter : public DOMLSParserFilter {
> >>>>> public:
> >>>>>
> >>>>>     DOMParserFilter(DOMNodeFilter::ShowType whatToShow = DOMNodeFilter::SHOW_ALL);
> >>>>>     ~DOMParserFilter(){};
> >>>>>
> >>>>>     virtual FilterAction startElement(DOMElement* node);
> >>>>>     virtual FilterAction acceptNode(DOMNode* node){return DOMParserFilter::FILTER_ACCEPT;};
> >>>>>     virtual DOMNodeFilter::ShowType getWhatToShow() const {return fWhatToShow;};
> >>>>>
> >>>>> private:
> >>>>>     DOMNodeFilter::ShowType fWhatToShow;
> >>>>> };
> >>>>>
> >>>>>
> >>>>> DOMPrintFilter.cpp:
> >>>>> --------------------
> >>>>>
> >>>>> DOMParserFilter::DOMParserFilter(DOMNodeFilter::ShowType whatToShow)
> >>>>>     :fWhatToShow(whatToShow)
> >>>>> {}
> >>>>>
> >>>>> DOMParserFilter::FilterAction DOMParserFilter::startElement(DOMElement* node)
> >>>>> {
> >>>>>     // for element whose name is "DATA", skip it
> >>>>>     if (XMLString::compareString(node->getNodeName(), element_data)==0)
> >>>>>         return DOMParserFilter::FILTER_REJECT;
> >>>>>     else
> >>>>>         return DOMParserFilter::FILTER_ACCEPT;
> >>>>> }
> >>>>>
> >>>>>
> >>>>> DOMPrint.cpp:
> >>>>> ---------------
> >>>>>
> >>>>> static const XMLCh gLS[] = { xercesc::chLatin_L, xercesc::chLatin_S, xercesc::chNull };
> >>>>> xercesc::DOMImplementation *implParser = xercesc::DOMImplementationRegistry::getDOMImplementation(gLS);
> >>>>> xercesc::DOMLSParser* parser = ((xercesc::DOMImplementationLS*)implParser)->createLSParser(xercesc::DOMImplementationLS::MODE_SYNCHRONOUS, 0);
> >>>>> DOMTreeErrorReporter *errReporter = new DOMTreeErrorReporter();
> >>>>> parser->getDomConfig()->setParameter(xercesc::XMLUni::fgDOMErrorHandler, errReporter);
> >>>>>
> >>>>> DOMParserFilter * pDOMParserFilter = new DOMParserFilter();
> >>>>> parser->setFilter(pDOMParserFilter);
> >>>>>
> >>>>> //
> >>>>> // Parse the XML file, catching any XML exceptions that might propagate
> >>>>> // out of it.
> >>>>> //
> >>>>> bool errorsOccured = false;
> >>>>> DOMDocument *doc = NULL;
> >>>>>
> >>>>> try
> >>>>> {
> >>>>>     doc = parser->parseURI(gXmlFile);
> >>>>> }
> >>>>> catch (const OutOfMemoryException&)
> >>>>> {
> >>>>>     XERCES_STD_QUALIFIER cerr << "OutOfMemoryException" << XERCES_STD_QUALIFIER endl;
> >>>>>     errorsOccured = true;
> >>>>> }
> >>>>> catch (const XMLException& e)
> >>>>> {
> >>>>>     XERCES_STD_QUALIFIER cerr << "An error occurred during parsing\n Message: "
> >>>>>         << StrX(e.getMessage()) << XERCES_STD_QUALIFIER endl;
> >>>>>     errorsOccured = true;
> >>>>> }
> >>>>>
> >>>>> catch (const DOMException& e)
> >>>>> {
> >>>>>     const unsigned int maxChars = 2047;
> >>>>>     XMLCh errText[maxChars + 1];
> >>>>>
> >>>>>     XERCES_STD_QUALIFIER cerr << "\nDOM Error during parsing: '" << gXmlFile << "'\n"
> >>>>>         << "DOMException code is: " << e.code << XERCES_STD_QUALIFIER endl;
> >>>>>
> >>>>>     if (DOMImplementation::loadDOMExceptionMsg(e.code, errText, maxChars))
> >>>>>         XERCES_STD_QUALIFIER cerr << "Message is: " << StrX(errText) << XERCES_STD_QUALIFIER endl;
> >>>>>
> >>>>>     errorsOccured = true;
> >>>>> }
> >>>>>
> >>>>> catch (...)
> >>>>> {
> >>>>>     XERCES_STD_QUALIFIER cerr << "An error occurred during parsing\n " << XERCES_STD_QUALIFIER endl;
> >>>>>     errorsOccured = true;
> >>>>> }
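P.S. The two-pass idea from the original message (build the tree without the DATA subtrees, then collect the DATA values in a separate streaming pass) can be modeled without Xerces-C++ at all. What follows is only a toy sketch under stated assumptions: Event, Node, sampleEvents(), buildTreeWithout() and collectText() are hypothetical names invented for illustration and are not part of the Xerces-C++ API, and the event list stands in for what a real parser would report.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for parser callbacks. NOT a Xerces-C++ type.
struct Event {
    enum Kind { Start, Text, End } kind;
    std::string value;  // element name for Start/End, character data for Text
};

// Toy tree node. NOT a Xerces-C++ DOMNode.
struct Node {
    std::string name;
    std::string text;
    std::vector<std::unique_ptr<Node>> children;
};

// Hypothetical input, equivalent to:
// <ROOT><HEADER>h</HEADER><DATA>111</DATA><DATA>222</DATA></ROOT>
std::vector<Event> sampleEvents() {
    return {
        {Event::Start, "ROOT"},
        {Event::Start, "HEADER"}, {Event::Text, "h"},   {Event::End, "HEADER"},
        {Event::Start, "DATA"},   {Event::Text, "111"}, {Event::End, "DATA"},
        {Event::Start, "DATA"},   {Event::Text, "222"}, {Event::End, "DATA"},
        {Event::End, "ROOT"},
    };
}

// Pass 1: build a tree, never materializing subtrees rooted at `rejected`.
std::unique_ptr<Node> buildTreeWithout(const std::vector<Event>& events,
                                       const std::string& rejected) {
    std::unique_ptr<Node> doc(new Node());  // synthetic document node
    doc->name = "#document";
    std::vector<Node*> open;                // stack of currently open elements
    open.push_back(doc.get());
    int skipDepth = 0;                      // > 0 while inside a rejected subtree

    for (const Event& ev : events) {
        if (ev.kind == Event::Start) {
            if (skipDepth > 0 || ev.value == rejected) {
                ++skipDepth;                // enter, or go deeper into, a skipped subtree
            } else {
                Node* parent = open.back();
                parent->children.push_back(std::unique_ptr<Node>(new Node()));
                parent->children.back()->name = ev.value;
                open.push_back(parent->children.back().get());
            }
        } else if (ev.kind == Event::Text) {
            if (skipDepth == 0) open.back()->text += ev.value;
        } else {  // Event::End
            if (skipDepth > 0) {
                --skipDepth;                // leave one level of the skipped subtree
            } else {
                assert(open.size() > 1);    // the document node itself is never closed
                open.pop_back();
            }
        }
    }
    return doc;
}

// Pass 2: streaming scan that keeps only the text found inside `wanted` elements.
std::vector<std::string> collectText(const std::vector<Event>& events,
                                     const std::string& wanted) {
    std::vector<std::string> values;
    int depth = 0;                          // nesting depth inside a wanted element
    for (const Event& ev : events) {
        if (ev.kind == Event::Start) {
            if (depth > 0) {
                ++depth;
            } else if (ev.value == wanted) {
                depth = 1;
                values.emplace_back();      // start a new value for this element
            }
        } else if (ev.kind == Event::Text) {
            if (depth > 0) values.back() += ev.value;
        } else if (depth > 0) {             // Event::End inside a wanted element
            --depth;
        }
    }
    return values;
}
```

Calling buildTreeWithout(sampleEvents(), "DATA") yields a tree that holds only ROOT and HEADER, while collectText(sampleEvents(), "DATA") returns the two DATA values. The depth counters are the point of the sketch: the rejected subtree is never attached to a parent, so nothing has to be removed from the tree afterwards. Whether the real DOMLSParserFilter behaves this way in 3.0.1 is exactly what this thread is trying to find out.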
