RE: question about xerces memory usage

Li, PingShan (Kansas City) Thu, 18 Oct 2007 08:49:03 -0700

I still have one question. After I call the following code in the test
program, the memory used by Xerces should be marked as available for future
use. Even if Xerces leaves the heap fragmented, I would expect that to have
no effect on the code that runs after the "Terminate()" call. I know this
comes down to memory management at the OS level, but I would like to ask if
anybody has an answer for this.


     parser->release();
     // And call the termination method
     XMLPlatformUtils::Terminate();
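
One way to look at this question is to probe, right after Terminate(), for
the largest single block the process can still allocate. The sketch below is
only an illustration under assumptions: largestAllocatableBlock is a
hypothetical helper, not part of Xerces, and it assumes that if a request of
a given size fails then larger requests fail too, which holds only
approximately. On systems that overcommit memory, malloc may succeed for
blocks that cannot really be used, so the result is a rough estimate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper (not a Xerces API): estimates the largest size, up to
// `upper` bytes, that a single malloc() call can satisfy right now.
// The answer is accurate to within `granularity` bytes.
std::size_t largestAllocatableBlock(std::size_t upper,
                                    std::size_t granularity = 64 * 1024)
{
    assert(granularity > 0);

    // Fast path: the whole requested size is available in one piece.
    if (void* p = std::malloc(upper)) {
        std::free(p);
        return upper;
    }

    std::size_t good = 0;      // largest size known to succeed so far
    std::size_t bad  = upper;  // smallest size known to fail so far

    // Binary search between the last success and the last failure.
    while (bad - good > granularity) {
        const std::size_t mid = good + (bad - good) / 2;
        if (void* p = std::malloc(mid)) {
            std::free(p);      // release immediately; we only probe
            good = mid;
        } else {
            bad = mid;
        }
    }
    return good;
}
```

Calling this helper once before XMLPlatformUtils::Initialize() and once after
XMLPlatformUtils::Terminate() would give a rough comparison of the contiguous
space available in each case.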



-----Original Message-----
From: Li, PingShan (Kansas City) [mailto:[EMAIL PROTECTED] 
Sent: Thursday, October 18, 2007 10:32 AM
To: [email protected]
Subject: RE: question about xerces memory usage

Thank you for the reply.

I ran some additional tests on 2.7 and 2.8, allocating 10k of memory at a
time, and got nearly the same numbers with and without parsing. I believe
memory fragmentation is the answer I was looking for. The other libraries
used in my project require allocating big chunks of memory, so I have to
pay more attention to this.
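
A test of this kind could look roughly like the sketch below. It is not the
code that was actually run: allocateInChunks is a hypothetical helper, and the
cap parameter is an assumption added so the loop stops at a chosen limit
instead of exhausting the process. Small chunks can be placed in scattered
holes, which is why such a test is much less sensitive to fragmentation than
one large std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical helper: allocates `chunkBytes` at a time until an allocation
// fails or the next chunk would exceed `capBytes`. Returns the total number
// of bytes obtained. All chunks are freed before returning.
std::size_t allocateInChunks(std::size_t chunkBytes, std::size_t capBytes)
{
    assert(chunkBytes > 0);

    std::vector<char*> chunks;
    // Reserve the bookkeeping space up front so push_back cannot fail later.
    chunks.reserve(capBytes / chunkBytes);

    std::size_t total = 0;
    while (total + chunkBytes <= capBytes) {
        char* p = new (std::nothrow) char[chunkBytes];
        if (!p)
            break;             // out of memory: stop and report what we got
        chunks.push_back(p);
        total += chunkBytes;
    }

    for (char* p : chunks)
        delete[] p;
    return total;
}
```

Running it with a 10k chunk size after parsing and again without parsing, and
comparing the two totals, mirrors the comparison described above.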

PingShan Li


-----Original Message-----
From: Alberto Massari [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 17, 2007 3:42 PM
To: [email protected]
Subject: RE: question about xerces memory usage

Hi,
so when you say "iterations" you always refer to the
std::string-building code? (with or without the previous parsing)
The only thing that this test proves is that Xerces 2.8 generates
less fragmentation in the heap, so it leaves a bigger contiguous
chunk of memory (15 * 48MB instead of 10 * 48MB). That could be
explained by a change in the memory pooling code, which allocates
128KB chunks instead of 64KB.
Is this what you wanted to test?
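
The difference between "free memory" and "contiguous free memory" can be
shown with a toy model. The sketch below is purely illustrative and makes
several assumptions: the address space is a row of equal units, the names
HeapStats, measure and fragmentedSpace are hypothetical, and the
free-every-other-block pattern is an artificial worst case, not what the
Xerces pools actually do.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct HeapStats {
    std::size_t totalFree;    // number of free units overall
    std::size_t largestHole;  // longest run of adjacent free units
};

// Scans a toy address space (true = unit in use) and reports how much is
// free in total and how large the biggest contiguous free run is.
HeapStats measure(const std::vector<bool>& inUse)
{
    HeapStats s{0, 0};
    std::size_t run = 0;
    for (bool used : inUse) {
        if (used) {
            run = 0;
        } else {
            ++run;
            ++s.totalFree;
            if (run > s.largestHole)
                s.largestHole = run;
        }
    }
    return s;
}

// Artificial scenario: the space is filled with blocks of `blockUnits`
// units, then every other block is freed, so the survivors stay scattered.
std::vector<bool> fragmentedSpace(std::size_t totalUnits, std::size_t blockUnits)
{
    assert(blockUnits > 0);
    std::vector<bool> inUse(totalUnits, true);
    for (std::size_t start = 0; start < totalUnits; start += 2 * blockUnits)
        for (std::size_t i = start; i < start + blockUnits && i < totalUnits; ++i)
            inUse[i] = false;
    return inUse;
}
```

With one unit standing for 1MB, measure(fragmentedSpace(2048, 64)) reports
1024 units free but a largest hole of only 64 units, so a single 768MB
request would fail even though half of the space is free.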

Alberto

At 14.19 17/10/2007 -0500, Li, PingShan (Kansas City) wrote:
>Thank you for your email.
>
>I am posting the main function I modified in DOMCount.cpp to clarify
>my question:
>
>1. I only parse the file using Xerces once.
>2. All the initialization and release code is the same as in
>DOMCount.cpp; I did not change it.
>3. The code I added only reads from the file and appends to a string
>until an exception is thrown.
>
>My question is this: there should be no difference in the number of
>iterations between running the following code as is and running just
>the added code with everything else commented out. My real point is
>that even if I use Xerces and release it as directed in the sample code
>provided by the Xerces project, it may still affect memory usage in
>other parts of my project.
>
>Thanks
>
>PingShan Li
>
>int main(int argC, char* argV[])
>{
>     // Check command line and extract arguments.
>     if (argC < 2)
>     {
>         usage();
>         return 1;
>     }
>
>     const char*                xmlFile = 0;
>     AbstractDOMParser::ValSchemes valScheme = AbstractDOMParser::Val_Auto;
>     bool                       doNamespaces       = false;
>     bool                       doSchema           = false;
>     bool                       schemaFullChecking = false;
>     bool                       doList = false;
>     bool                       errorOccurred = false;
>     bool                       recognizeNEL = false;
>     bool                       printOutEncounteredEles = false;
>     char                       localeStr[64];
>     memset(localeStr, 0, sizeof localeStr);
>
>     int argInd;
>     for (argInd = 1; argInd < argC; argInd++)
>     {
>         // Break out on first parm not starting with a dash
>         if (argV[argInd][0] != '-')
>             break;
>
>         // Watch for special case help request
>         if (!strcmp(argV[argInd], "-?"))
>         {
>             usage();
>             return 2;
>         }
>          else if (!strncmp(argV[argInd], "-v=", 3)
>               ||  !strncmp(argV[argInd], "-V=", 3))
>         {
>             const char* const parm = &argV[argInd][3];
>
>             if (!strcmp(parm, "never"))
>                 valScheme = AbstractDOMParser::Val_Never;
>             else if (!strcmp(parm, "auto"))
>                 valScheme = AbstractDOMParser::Val_Auto;
>             else if (!strcmp(parm, "always"))
>                 valScheme = AbstractDOMParser::Val_Always;
>             else
>             {
>                 XERCES_STD_QUALIFIER cerr << "Unknown -v= value: "
>                      << parm << XERCES_STD_QUALIFIER endl;
>                 return 2;
>             }
>         }
>          else if (!strcmp(argV[argInd], "-n")
>               ||  !strcmp(argV[argInd], "-N"))
>         {
>             doNamespaces = true;
>         }
>          else if (!strcmp(argV[argInd], "-s")
>               ||  !strcmp(argV[argInd], "-S"))
>         {
>             doSchema = true;
>         }
>          else if (!strcmp(argV[argInd], "-f")
>               ||  !strcmp(argV[argInd], "-F"))
>         {
>             schemaFullChecking = true;
>         }
>          else if (!strcmp(argV[argInd], "-l")
>               ||  !strcmp(argV[argInd], "-L"))
>         {
>             doList = true;
>         }
>          else if (!strcmp(argV[argInd], "-special:nel"))
>         {
>             // turning this on will lead to non-standard compliance behaviour
>             // it will recognize the unicode character 0x85 as new line character
>             // instead of regular character as specified in XML 1.0
>             // do not turn this on unless really necessary
>
>              recognizeNEL = true;
>         }
>          else if (!strcmp(argV[argInd], "-p")
>               ||  !strcmp(argV[argInd], "-P"))
>         {
>             printOutEncounteredEles = true;
>         }
>          else if (!strncmp(argV[argInd], "-locale=", 8))
>         {
>              // Get out the end of line
>              strcpy(localeStr, &(argV[argInd][8]));
>         }
>          else
>         {
>             XERCES_STD_QUALIFIER cerr << "Unknown option '" << argV[argInd]
>                  << "', ignoring it\n" << XERCES_STD_QUALIFIER endl;
>         }
>     }
>
>     //
>     //  There should be only one and only one parameter left, and that
>     //  should be the file name.
>     //
>     if (argInd != argC - 1)
>     {
>         usage();
>         return 1;
>     }
>
>     // Initialize the XML4C system
>     try
>     {
>         if (strlen(localeStr))
>         {
>             XMLPlatformUtils::Initialize(localeStr);
>         }
>         else
>         {
>             XMLPlatformUtils::Initialize();
>         }
>
>         if (recognizeNEL)
>         {
>             XMLPlatformUtils::recognizeNEL(recognizeNEL);
>         }
>     }
>
>     catch (const XMLException& toCatch)
>     {
>          XERCES_STD_QUALIFIER cerr << "Error during initialization! :\n"
>               << StrX(toCatch.getMessage()) << XERCES_STD_QUALIFIER endl;
>          return 1;
>     }
>
>     // Instantiate the DOM parser.
>     static const XMLCh gLS[] = { chLatin_L, chLatin_S, chNull };
>     DOMImplementation *impl =
>         DOMImplementationRegistry::getDOMImplementation(gLS);
>     DOMBuilder        *parser =
>         ((DOMImplementationLS*)impl)->createDOMBuilder(DOMImplementationLS::MODE_SYNCHRONOUS, 0);
>
>     parser->setFeature(XMLUni::fgDOMNamespaces, doNamespaces);
>     parser->setFeature(XMLUni::fgXercesSchema, doSchema);
>     parser->setFeature(XMLUni::fgXercesSchemaFullChecking, schemaFullChecking);
>
>     if (valScheme == AbstractDOMParser::Val_Auto)
>     {
>         parser->setFeature(XMLUni::fgDOMValidateIfSchema, true);
>     }
>     else if (valScheme == AbstractDOMParser::Val_Never)
>     {
>         parser->setFeature(XMLUni::fgDOMValidation, false);
>     }
>     else if (valScheme == AbstractDOMParser::Val_Always)
>     {
>         parser->setFeature(XMLUni::fgDOMValidation, true);
>     }
>
>     // enable datatype normalization - default is off
>     parser->setFeature(XMLUni::fgDOMDatatypeNormalization, true);
>
>     // And create our error handler and install it
>     DOMCountErrorHandler errorHandler;
>     parser->setErrorHandler(&errorHandler);
>
>     //
>     //  Get the starting time and kick off the parse of the indicated
>     //  file. Catch any exceptions that might propagate out of it.
>     //
>     unsigned long duration;
>
>     bool more = true;
>     XERCES_STD_QUALIFIER ifstream fin;
>
>     // the input is a list file
>     if (doList)
>         fin.open(argV[argInd]);
>
>     if (fin.fail()) {
>         XERCES_STD_QUALIFIER cerr <<"Cannot open the list file: "
>              << argV[argInd] << XERCES_STD_QUALIFIER endl;
>         return 2;
>     }
>
>     while (more)
>     {
>         char fURI[1000];
>         //initialize the array to zeros
>         memset(fURI,0,sizeof(fURI));
>
>         if (doList) {
>             if (! fin.eof() ) {
>                 fin.getline (fURI, sizeof(fURI));
>                 if (!*fURI)
>                     continue;
>                 else {
>                     xmlFile = fURI;
>                     XERCES_STD_QUALIFIER cerr << "==Parsing== "
>                          << xmlFile << XERCES_STD_QUALIFIER endl;
>                 }
>             }
>             else
>                 break;
>         }
>         else {
>             xmlFile = argV[argInd];
>             more = false;
>         }
>
>         //reset error count first
>         errorHandler.resetErrors();
>
>         XERCES_CPP_NAMESPACE_QUALIFIER DOMDocument *doc = 0;
>
>         try
>         {
>             // reset document pool
>             parser->resetDocumentPool();
>
>             const unsigned long startMillis = XMLPlatformUtils::getCurrentMillis();
>             doc = parser->parseURI(xmlFile);
>             const unsigned long endMillis = XMLPlatformUtils::getCurrentMillis();
>             duration = endMillis - startMillis;
>         }
>
>         catch (const XMLException& toCatch)
>         {
>             XERCES_STD_QUALIFIER cerr << "\nError during parsing: '" << xmlFile << "'\n"
>                  << "Exception message is:  \n"
>                  << StrX(toCatch.getMessage()) << "\n" << XERCES_STD_QUALIFIER endl;
>             errorOccurred = true;
>             continue;
>         }
>         catch (const DOMException& toCatch)
>         {
>             const unsigned int maxChars = 2047;
>             XMLCh errText[maxChars + 1];
>
>             XERCES_STD_QUALIFIER cerr << "\nDOM Error during parsing: '" << xmlFile << "'\n"
>                  << "DOMException code is:  " << toCatch.code << XERCES_STD_QUALIFIER endl;
>
>             if (DOMImplementation::loadDOMExceptionMsg(toCatch.code, errText, maxChars))
>                  XERCES_STD_QUALIFIER cerr << "Message is: " << StrX(errText) << XERCES_STD_QUALIFIER endl;
>
>             errorOccurred = true;
>             continue;
>         }
>         catch (...)
>         {
>             XERCES_STD_QUALIFIER cerr << "\nUnexpected exception during parsing: '" << xmlFile << "'\n";
>             errorOccurred = true;
>             continue;
>         }
>
>         //
>         //  Extract the DOM tree, get the list of all the elements and report the
>         //  length as the count of elements.
>         //
>         if (errorHandler.getSawErrors())
>         {
>             XERCES_STD_QUALIFIER cout << "\nErrors occurred, no output available\n"
>                  << XERCES_STD_QUALIFIER endl;
>             errorOccurred = true;
>         }
>          else
>         {
>             unsigned int elementCount = 0;
>             if (doc) {
>                 elementCount = countChildElements((DOMNode*)doc->getDocumentElement(),
>                                                   printOutEncounteredEles);
>                 // test getElementsByTagName and getLength
>                 XMLCh xa[] = {chAsterisk, chNull};
>                 if (elementCount != doc->getElementsByTagName(xa)->getLength()) {
>                     XERCES_STD_QUALIFIER cout << "\nErrors occurred, element count is wrong\n"
>                          << XERCES_STD_QUALIFIER endl;
>                     errorOccurred = true;
>                 }
>             }
>
>             // Print out the stats that we collected and time taken.
>             XERCES_STD_QUALIFIER cout << xmlFile << ": " << duration << " ms ("
>                  << elementCount << " elems)." << XERCES_STD_QUALIFIER endl;
>         }
>     }
>
>     //
>     //  Delete the parser itself.  Must be done prior to calling Terminate, below.
>     //
>     parser->release();
>
>     // And call the termination method
>     XMLPlatformUtils::Terminate();
>
> 
>     /////////////////////////////////////////////////////////////////////////////
>      // Code added for testing
>      std::string test;
>      int stringSize( 0 );
>      for ( int i = 0; i < 100; ++i )
>      {
>        _sleep( 10 );
>        std::cout << i << " " << stringSize << std::endl;
>
>        FILE *hFile = fopen( "C:\\Test\\test.xml", "rb" );
>        if ( hFile )
>        {
>          // Get the file size so we can allocate our buffer.
>          fseek( hFile, 0, SEEK_END );
>          const int nLength = ftell( hFile );
>          fseek( hFile, 0, SEEK_SET );
>          char* pszBuffer = new char[ nLength + 1 ];
>          fread( pszBuffer, sizeof( char ), nLength, hFile );
>          pszBuffer[ nLength ] = '\0';
>          // Append the whole file; the string keeps growing every iteration.
>          test += std::string( pszBuffer );
>          stringSize += nLength;
>          delete [] pszBuffer;
>          fclose( hFile );
>        }
>      }
>      // End of code added for testing
>     /////////////////////////////////////////////////////////////////////////////
>
>     if (doList)
>         fin.close();
>
>     if (errorOccurred)
>         return 4;
>     else
>         return 0;
>}
>
>
>
>
>-----Original Message-----
>From: Alberto Massari [mailto:[EMAIL PROTECTED]
>Sent: Wednesday, October 17, 2007 1:54 PM
>To: [email protected]
>Subject: Re: question about xerces memory usage
>
>Hi,
>I don't get the point of this experiment: if v 2.7 can only parse 10
>times the 48Mb file, while v 2.8 can do it for 15 times, it means 2.8
>uses less memory for the same DOM tree (as I guess you are not
>releasing the DOM tree between the parse() operations, so keeping
>them all in memory). As for your added code, you are concatenating
>the input file in a std::string, so it makes sense that 16 times *
>48Mb = 768Mb crashes your application (btw, the fact that you have
>2Gb of memory doesn't imply that the program can find a contiguous
>chunk of memory of 800Mb).
>
>Alberto
>
>At 13.38 17/10/2007 -0500, Li, PingShan (Kansas City) wrote:
> >We use Xerces in our C++ project to load XML files as DOM trees.
> >
> >We have one question about the memory usage of the Xerces C++
> >version. I made a small modification to the sample DOMCount project
> >provided by Xerces to demonstrate the question.
> >
> >The operating system is Windows XP Professional. Visual Studio 2003
> >(VC7.1) is used for the testing.
> >
> >The program is tested on a box with 2GB of RAM.
> >
> >Test.xml used here is a 48MB XML file.
> >
> >For Xerces 2.7:
> >
> >If I add the following code to DOMCount.cpp, I can run 10 iterations
> >before I get an "out of memory" exception. But if I comment out the
> >other code and only run the added code, I can run up to 16 iterations
> >before I get the "out of memory" exception. I would expect that after
> >"XMLPlatformUtils::Terminate()" is called, there should be no
> >difference in the number of iterations the added code runs before
> >it gets the "out of memory" exception.
> >
> >We used Process Explorer
> >(http://www.microsoft.com/technet/sysinternals/utilities/processexplorer.mspx)
> >to help us find out the memory usage of the program. The only thing
> >that came to our attention is the virtual memory used by Xerces.
> >Physical memory is released after XMLPlatformUtils::Terminate, but
> >virtual memory stays at the same level.
> >
> >Then I tried the same code with Xerces 2.8. To my surprise, I can
> >run up to 15 iterations before I get the out of memory exception. If
> >I only run the added code, it throws the out of memory exception on
> >the 16th iteration.
> >
> >Is there anything that 2.7 users need to pay attention to? Could
> >anybody please tell me why there is a difference in the number of
> >iterations before I get the "out of memory" exception in 2.7?
> >
> >Thank you
> >
> >PingShan Li
> >
> >
> >     //
> >     //  Delete the parser itself.  Must be done prior to calling Terminate, below.
> >     //
> >
> >     parser->release();
> >
> >     // And call the termination method
> >     XMLPlatformUtils::Terminate();
> >
> >
> >
> >     /////////////////////////////////////////////////////////////////////////////
> >     // Code added for testing
> >     std::string test;
> >     int stringSize( 0 );
> >     for ( int i = 0; i < 100; ++i )
> >     {
> >       _sleep( 10 );
> >       std::cout << i << " " << stringSize << std::endl;
> >
> >       FILE *hFile = fopen( "C:\\Test\\test.xml", "rb" );
> >       if ( hFile )
> >       {
> >         // Get the file size so we can allocate our buffer.
> >         fseek( hFile, 0, SEEK_END );
> >         const int nLength = ftell( hFile );
> >         fseek( hFile, 0, SEEK_SET );
> >         char* pszBuffer = new char[ nLength + 1 ];
> >         fread( pszBuffer, sizeof( char ), nLength, hFile );
> >         pszBuffer[ nLength ] = '\0';
> >         test += std::string( pszBuffer );
> >         stringSize += nLength;
> >         delete [] pszBuffer;
> >         fclose( hFile );
> >       }
> >     }
> >
> >     /////////////////////////////////////////////////////////////////////////////
> >
> >
> >
> >
> >
> >
> >No virus found in this outgoing message.
> >Checked by AVG Free Edition.
> >Version: 7.5.488 / Virus Database: 269.14.13/1075 - Release Date:
> >10/17/2007 9:38 AM
> >
>
>
