Thank you for the reply. I ran some additional tests on 2.7 and 2.8, allocating 10 KB of memory at a time, and came up with very close numbers both with and without parsing. I believe memory fragmentation is the answer I was looking for. The other libraries used in my project require allocating big chunks of memory, so I have to pay more attention to this.
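For anyone who wants to repeat this kind of check, here is a minimal sketch of a fixed-size allocation probe. It is not the exact code used for the numbers above; the function name countAllocatableBlocks and the maxBlocks cap are hypothetical, added only for illustration. It counts how many blocks of a given size the heap hands out before an allocation fails, then frees them all.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical helper (not part of Xerces or DOMCount): allocate blocks of
// blockSize bytes until an allocation fails or maxBlocks is reached, then
// release every block. Returns the number of blocks that were allocated.
std::size_t countAllocatableBlocks(std::size_t blockSize, std::size_t maxBlocks)
{
    std::vector<char*> blocks;
    // Reserve the bookkeeping storage up front so push_back cannot throw
    // inside the loop. reserve() itself may throw if maxBlocks is huge.
    blocks.reserve(maxBlocks);

    while (blocks.size() < maxBlocks)
    {
        // The nothrow form returns a null pointer instead of throwing
        // std::bad_alloc when the request cannot be satisfied.
        char* p = new (std::nothrow) char[blockSize];
        if (p == 0)
            break;
        blocks.push_back(p);
    }

    const std::size_t count = blocks.size();

    // Free everything so the probe leaves the heap as it found it.
    for (std::size_t i = 0; i < count; ++i)
        delete [] blocks[i];

    return count;
}
```

Calling it once right after XMLPlatformUtils::Terminate() and once in a build with the parsing code commented out, for example with countAllocatableBlocks(10 * 1024, someLargeCap), would give two counts to compare. Small blocks like these fit into the gaps of a fragmented heap, which is presumably why the counts come out close, while a single 48 MB request needs one contiguous region.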
PingShan Li

-----Original Message-----
From: Alberto Massari [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 17, 2007 3:42 PM
To: [email protected]
Subject: RE: question about xerces memory usage

Hi,
so when you say "iterations" you always refer to the std::string-building code? (with or without the previous parsing)
The only thing that this test proves is that Xerces 2.8 generates less fragmentation in the heap, so it leaves a bigger contiguous chunk of memory (15 * 48Mb instead of 10 * 48Mb). That could be allowed by a change in the memory pooling code, that allocates 128Kb chunks instead of 64Kb.
Is this what you wanted to test?

Alberto

At 14.19 17/10/2007 -0500, Li, PingShan (Kansas City) wrote:
>Thank you for your email.
>
>I am posting the main function I modified in DOMCount.cpp to clarify my question:
>
>1. I only parse the file using Xerces once.
>2. All the initialization and release code is the same in DOMCount.cpp, I did not change it.
>3. The code I added is only trying to read from the file and append to a string to get exception.
>
>My question is that there should be no difference in the iteration number if I run the following code
>and then comment out other code and just run the added code. But my real point is that even I use
>Xerces and release it as directed in the sample code provided by xerces project, it may have impact on
>the other part of my project on memory usage.
>
>Thanks
>
>PingShan Li
>
>int main(int argC, char* argV[])
>{
>    // Check command line and extract arguments.
>    if (argC < 2)
>    {
>        usage();
>        return 1;
>    }
>
>    const char* xmlFile = 0;
>    AbstractDOMParser::ValSchemes valScheme = AbstractDOMParser::Val_Auto;
>    bool doNamespaces = false;
>    bool doSchema = false;
>    bool schemaFullChecking = false;
>    bool doList = false;
>    bool errorOccurred = false;
>    bool recognizeNEL = false;
>    bool printOutEncounteredEles = false;
>    char localeStr[64];
>    memset(localeStr, 0, sizeof localeStr);
>
>    int argInd;
>    for (argInd = 1; argInd < argC; argInd++)
>    {
>        // Break out on first parm not starting with a dash
>        if (argV[argInd][0] != '-')
>            break;
>
>        // Watch for special case help request
>        if (!strcmp(argV[argInd], "-?"))
>        {
>            usage();
>            return 2;
>        }
>        else if (!strncmp(argV[argInd], "-v=", 3)
>             || !strncmp(argV[argInd], "-V=", 3))
>        {
>            const char* const parm = &argV[argInd][3];
>
>            if (!strcmp(parm, "never"))
>                valScheme = AbstractDOMParser::Val_Never;
>            else if (!strcmp(parm, "auto"))
>                valScheme = AbstractDOMParser::Val_Auto;
>            else if (!strcmp(parm, "always"))
>                valScheme = AbstractDOMParser::Val_Always;
>            else
>            {
>                XERCES_STD_QUALIFIER cerr << "Unknown -v= value: "
>                    << parm << XERCES_STD_QUALIFIER endl;
>                return 2;
>            }
>        }
>        else if (!strcmp(argV[argInd], "-n")
>             || !strcmp(argV[argInd], "-N"))
>        {
>            doNamespaces = true;
>        }
>        else if (!strcmp(argV[argInd], "-s")
>             || !strcmp(argV[argInd], "-S"))
>        {
>            doSchema = true;
>        }
>        else if (!strcmp(argV[argInd], "-f")
>             || !strcmp(argV[argInd], "-F"))
>        {
>            schemaFullChecking = true;
>        }
>        else if (!strcmp(argV[argInd], "-l")
>             || !strcmp(argV[argInd], "-L"))
>        {
>            doList = true;
>        }
>        else if (!strcmp(argV[argInd], "-special:nel"))
>        {
>            // turning this on will lead to non-standard compliance behaviour
>            // it will recognize the unicode character 0x85 as new line character
>            // instead of regular character as specified in XML 1.0
>            // do not turn this on unless really necessary
>
>            recognizeNEL = true;
>        }
>        else if (!strcmp(argV[argInd], "-p")
>             || !strcmp(argV[argInd], "-P"))
>        {
>            printOutEncounteredEles = true;
>        }
>        else if (!strncmp(argV[argInd], "-locale=", 8))
>        {
>            // Get out the end of line
>            strcpy(localeStr, &(argV[argInd][8]));
>        }
>        else
>        {
>            XERCES_STD_QUALIFIER cerr << "Unknown option '" << argV[argInd]
>                << "', ignoring it\n" << XERCES_STD_QUALIFIER endl;
>        }
>    }
>
>    //
>    // There should be only one and only one parameter left, and that
>    // should be the file name.
>    //
>    if (argInd != argC - 1)
>    {
>        usage();
>        return 1;
>    }
>
>    // Initialize the XML4C system
>    try
>    {
>        if (strlen(localeStr))
>        {
>            XMLPlatformUtils::Initialize(localeStr);
>        }
>        else
>        {
>            XMLPlatformUtils::Initialize();
>        }
>
>        if (recognizeNEL)
>        {
>            XMLPlatformUtils::recognizeNEL(recognizeNEL);
>        }
>    }
>
>    catch (const XMLException& toCatch)
>    {
>        XERCES_STD_QUALIFIER cerr << "Error during initialization! :\n"
>            << StrX(toCatch.getMessage()) << XERCES_STD_QUALIFIER endl;
>        return 1;
>    }
>
>    // Instantiate the DOM parser.
>    static const XMLCh gLS[] = { chLatin_L, chLatin_S, chNull };
>    DOMImplementation *impl =
>        DOMImplementationRegistry::getDOMImplementation(gLS);
>    DOMBuilder *parser =
>        ((DOMImplementationLS*)impl)->createDOMBuilder(DOMImplementationLS::MODE_SYNCHRONOUS, 0);
>
>    parser->setFeature(XMLUni::fgDOMNamespaces, doNamespaces);
>    parser->setFeature(XMLUni::fgXercesSchema, doSchema);
>    parser->setFeature(XMLUni::fgXercesSchemaFullChecking, schemaFullChecking);
>
>    if (valScheme == AbstractDOMParser::Val_Auto)
>    {
>        parser->setFeature(XMLUni::fgDOMValidateIfSchema, true);
>    }
>    else if (valScheme == AbstractDOMParser::Val_Never)
>    {
>        parser->setFeature(XMLUni::fgDOMValidation, false);
>    }
>    else if (valScheme == AbstractDOMParser::Val_Always)
>    {
>        parser->setFeature(XMLUni::fgDOMValidation, true);
>    }
>
>    // enable datatype normalization - default is off
>    parser->setFeature(XMLUni::fgDOMDatatypeNormalization, true);
>
>    // And create our error handler and install it
>    DOMCountErrorHandler errorHandler;
>    parser->setErrorHandler(&errorHandler);
>
>    //
>    // Get the starting time and kick off the parse of the indicated
>    // file. Catch any exceptions that might propogate out of it.
>    //
>    unsigned long duration;
>
>    bool more = true;
>    XERCES_STD_QUALIFIER ifstream fin;
>
>    // the input is a list file
>    if (doList)
>        fin.open(argV[argInd]);
>
>    if (fin.fail()) {
>        XERCES_STD_QUALIFIER cerr <<"Cannot open the list file: "
>            << argV[argInd] << XERCES_STD_QUALIFIER endl;
>        return 2;
>    }
>
>    while (more)
>    {
>        char fURI[1000];
>        //initialize the array to zeros
>        memset(fURI,0,sizeof(fURI));
>
>        if (doList) {
>            if (! fin.eof() ) {
>                fin.getline (fURI, sizeof(fURI));
>                if (!*fURI)
>                    continue;
>                else {
>                    xmlFile = fURI;
>                    XERCES_STD_QUALIFIER cerr << "==Parsing== " <<
>                        xmlFile << XERCES_STD_QUALIFIER endl;
>                }
>            }
>            else
>                break;
>        }
>        else {
>            xmlFile = argV[argInd];
>            more = false;
>        }
>
>        //reset error count first
>        errorHandler.resetErrors();
>
>        XERCES_CPP_NAMESPACE_QUALIFIER DOMDocument *doc = 0;
>
>        try
>        {
>            // reset document pool
>            parser->resetDocumentPool();
>
>            const unsigned long startMillis =
>                XMLPlatformUtils::getCurrentMillis();
>            doc = parser->parseURI(xmlFile);
>            const unsigned long endMillis =
>                XMLPlatformUtils::getCurrentMillis();
>            duration = endMillis - startMillis;
>        }
>
>        catch (const XMLException& toCatch)
>        {
>            XERCES_STD_QUALIFIER cerr << "\nError during parsing: '" << xmlFile << "'\n"
>                << "Exception message is: \n"
>                << StrX(toCatch.getMessage()) << "\n" << XERCES_STD_QUALIFIER endl;
>            errorOccurred = true;
>            continue;
>        }
>        catch (const DOMException& toCatch)
>        {
>            const unsigned int maxChars = 2047;
>            XMLCh errText[maxChars + 1];
>
>            XERCES_STD_QUALIFIER cerr << "\nDOM Error during parsing: '" << xmlFile << "'\n"
>                << "DOMException code is: " << toCatch.code << XERCES_STD_QUALIFIER endl;
>
>            if (DOMImplementation::loadDOMExceptionMsg(toCatch.code, errText, maxChars))
>                XERCES_STD_QUALIFIER cerr << "Message is: " <<
>                    StrX(errText) << XERCES_STD_QUALIFIER endl;
>
>            errorOccurred = true;
>            continue;
>        }
>        catch (...)
>        {
>            XERCES_STD_QUALIFIER cerr << "\nUnexpected exception during parsing: '" << xmlFile << "'\n";
>            errorOccurred = true;
>            continue;
>        }
>
>        //
>        // Extract the DOM tree, get the list of all the elements and report the
>        // length as the count of elements.
>        //
>        if (errorHandler.getSawErrors())
>        {
>            XERCES_STD_QUALIFIER cout << "\nErrors occurred, no output available\n" << XERCES_STD_QUALIFIER endl;
>            errorOccurred = true;
>        }
>        else
>        {
>            unsigned int elementCount = 0;
>            if (doc) {
>                elementCount =
>                    countChildElements((DOMNode*)doc->getDocumentElement(), printOutEncounteredEles);
>                // test getElementsByTagName and getLength
>                XMLCh xa[] = {chAsterisk, chNull};
>                if (elementCount != doc->getElementsByTagName(xa)->getLength()) {
>                    XERCES_STD_QUALIFIER cout << "\nErrors occurred, element count is wrong\n" << XERCES_STD_QUALIFIER endl;
>                    errorOccurred = true;
>                }
>            }
>
>            // Print out the stats that we collected and time taken.
>            XERCES_STD_QUALIFIER cout << xmlFile << ": " << duration << " ms ("
>                << elementCount << " elems)." << XERCES_STD_QUALIFIER endl;
>        }
>    }
>
>    //
>    // Delete the parser itself. Must be done prior to calling Terminate, below.
>    //
>    parser->release();
>
>    // And call the termination method
>    XMLPlatformUtils::Terminate();
>
>    /////////////////////////////////////////////////////////////////////////////
>    // Code added for testing
>    std::string test;
>    int stringSize( 0 );
>    for ( int i = 0; i < 100; ++i )
>    {
>        _sleep( 10 );
>        std::cout << i << " " << stringSize << std::endl;
>
>        FILE *hFile = hFile = fopen( "C:\\Test\\test.xml", "rb" );
>        if ( hFile )
>        {
>            // Get the file size so we can allocate our buffer.
>            fseek( hFile, 0, SEEK_END );
>            const int nLength = ftell( hFile );
>            fseek( hFile, 0, SEEK_SET );
>            char* pszBuffer = new char[ nLength + 1 ];
>            fread( pszBuffer, sizeof( char ), nLength, hFile );
>            pszBuffer[ nLength ] = '\0';
>            test += std::string( pszBuffer );
>            stringSize += nLength;
>            delete [] pszBuffer;
>            fclose( hFile );
>        }
>    }
>    // End of code added for testing
>    ////////////////////////////////////////////////////////////////////////////////
>
>    if (doList)
>        fin.close();
>
>    if (errorOccurred)
>        return 4;
>    else
>        return 0;
>}
>
>-----Original Message-----
>From: Alberto Massari [mailto:[EMAIL PROTECTED]
>Sent: Wednesday, October 17, 2007 1:54 PM
>To: [email protected]
>Subject: Re: question about xerces memory usage
>
>Hi,
>I don't get the point of this experiment: if v 2.7 can only parse 10
>times the 48Mb file, while v 2.8 can do it for 15 times, it means 2.8
>uses less memory for the same DOM tree (as I guess you are not
>releasing the DOM tree between the parse() operations, so keeping
>them all in memory). As for your added code, you are concatenating
>the input file in a std::string, so it makes sense that 16 times *
>48Mb = 768Mb crashes your application (btw, the fact that you have
>2Gb of memory doesn't imply that the program can find a contiguous
>chunk of memory of 800Mb).
>
>Alberto
>
>At 13.38 17/10/2007 -0500, Li, PingShan (Kansas City) wrote:
> >We use Xerces in our C++ project to load XML file as DOM tree.
> >
> >We have one question related to the memory usage of Xerces C++ version. I made small modification to
> >the sample DOMCount project provided by Xerces to demonstrate the question.
> >
> >Operating system is Windows xp professional. Visual studio 2003 VC7.1 is used for the testing.
> >
> >The program is tested on a box with 2G RAM.
> >
> >Test.xml used in here is a 48M xml file.
> >
> >For xerces 2.7:
> >
> >If I add the following code to DOMCount.cpp, I can run 10 iterations before I got "out of memory"
> >exception. But if I commented out other code and only run the added code, I can run up to 16
> >iterations before I got "out of memory" exception. I would expect after
> >"XMLPlatformUtils::Terminate()" is called, there should be no difference on the number of iterations
> >for the added code to get the "out of memory" exception.
> >
> >We used process explorer
> >(http://www.microsoft.com/technet/sysinternals/utilities/processexplorer.mspx)
> >to help us find out the memory usage of the program. The only thing came to our attention is the
> >virtual memory used by Xerces. Physical memory is released after XMLPlatformUtils::Terminate, but
> >virtual memory stays at the same level.
> >
> >Then I think I can try the same code with Xerces 2.8. To my surprise, I can run up to 15 iterations
> >before I got the out of memory exception. If I only run the added code, it will throw out of memory
> >exception on the 16th iteration.
> >
> >Is there anything that the 2.7 user need to pay attention to? Could anybody please tell me why there
> >is a difference on the number of iterations before I got the "out of memory" exception in 2.7?
> >
> >Thank you
> >
> >PingShan Li
> >
> >    //
> >    // Delete the parser itself. Must be done prior to calling Terminate, below.
> >    //
> >
> >    parser->release();
> >
> >    // And call the termination method
> >    XMLPlatformUtils::Terminate();
> >
> >    /////////////////////////////////////////////////////////////////////////////
> >    // Code added for testing
> >    std::string test;
> >    int stringSize( 0 );
> >    for ( int i = 0; i < 100; ++i )
> >    {
> >        _sleep( 10 );
> >        std::cout << i << " " << stringSize << std::endl;
> >
> >        FILE *hFile = hFile = fopen( "C:\\Test\\test.xml", "rb" );
> >        if ( hFile )
> >        {
> >            // Get the file size so we can allocate our buffer.
> >            fseek( hFile, 0, SEEK_END );
> >            const int nLength = ftell( hFile );
> >            fseek( hFile, 0, SEEK_SET );
> >            char* pszBuffer = new char[ nLength + 1 ];
> >            fread( pszBuffer, sizeof( char ), nLength, hFile );
> >            pszBuffer[ nLength ] = '\0';
> >            test += std::string( pszBuffer );
> >            stringSize += nLength;
> >            delete [] pszBuffer;
> >            fclose( hFile );
> >        }
> >    }
> >    ////////////////////////////////////////////////////////////////////////////////
