Magnus Lycka wrote:
> We're using DOM to create XML files that describe fairly
> complex calculations. The XML is structured as a big tree,
> where elements in the beginning have values that depend on
> other values further down in the tree. Imagine something
> like below, but much bigger and much more complex:
>
> <node sum="15">
>   <node sum="10">
>     <leaf>7</leaf>
>     <node sum="3">
>       <leaf>2</leaf>
>       <leaf>1</leaf>
>     </node>
>   </node>
>   <node sum="5">
>     <leaf>5</leaf>
>   </node>
> </node>
>
> We have to stick with this XML structure for now.
>
> In some cases, building up a DOM tree in memory takes up
> several GB of RAM, which is a real showstopper. The actual
> file is maybe an order of magnitude smaller than the DOM tree.
> The app is using libxml2. It's actually written in C++. Some
> library that used much less memory overhead could be
> sufficient.
>
> We've thought of writing a file that looks like this...
>
> <node sum="#1">
>   <node sum="#1.1">
>     <leaf>7</leaf>
>     <node sum="#1.1.1">
>       <leaf>2</leaf>
>       <leaf>1</leaf>
>     </node>
>   </node>
>   <node sum="#1.2">
>     <leaf>5</leaf>
>   </node>
> </node>
>
> ...and store {"#1": "15", "#1.1": "10", ... } in a map
> and then read in a piece at a time and perform some
> simple change and replace to get the correct values in.
> Then we need something that allows parts of the XML file
> to be written to file and purged from RAM to avoid the
> memory problem.
>
> Suggestions for solutions are appreciated.

An idea: pad your placeholder values with spaces, e.g.

<node sum="#1        ">

and store { "#1": <its offset in the result file> }.

Once you have collected all your final data, go through your stored
# keys, seek to each one's position in the file, and overwrite the
placeholder with the real value (writing exactly as many characters
as were reserved, padding with spaces where the value is shorter).

It is a kind of direct access to nodes in an XML document file.

A+

Laurent.
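
The reserve-then-patch idea above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the poster's actual code: the names `WIDTH`, `write_tree`, `patch` and `demo` are made up for the example, the tree is passed in as nested lists purely to keep the demo short, and it is assumed that no final value needs more than `WIDTH` characters. A real application would stream its nodes from wherever they are computed and would open the output file in a binary read/write mode such as "w+b" instead of using `io.BytesIO`.

```python
import io

WIDTH = 12  # assumption: no final value needs more than 12 characters


def write_tree(out, children, pending):
    """Stream one <node> element to `out` and return its sum.

    `children` is a list whose items are ints (leaves) or nested
    lists (child nodes). `pending` collects (offset, value) pairs
    that are patched into the file afterwards.
    """
    out.write(b'<node sum="')
    offset = out.tell()          # position where the value will go
    out.write(b" " * WIDTH)      # reserve a fixed-width blank field
    out.write(b'">')
    total = 0
    for child in children:
        if isinstance(child, list):
            total += write_tree(out, child, pending)
        else:
            out.write(b"<leaf>%d</leaf>" % child)
            total += child
    out.write(b"</node>")
    pending.append((offset, total))  # only a small pair stays in RAM
    return total


def patch(out, pending):
    """Seek to each reserved field and overwrite it in place."""
    for offset, value in pending:
        text = str(value).encode("ascii")
        if len(text) > WIDTH:
            raise ValueError("value %r does not fit in %d characters"
                             % (value, WIDTH))
        out.seek(offset)
        out.write(text.ljust(WIDTH))  # write exactly WIDTH bytes
    out.seek(0, io.SEEK_END)


def demo():
    out = io.BytesIO()           # stand-in for a file opened with "w+b"
    pending = []
    write_tree(out, [[7, [2, 1]], [5]], pending)
    patch(out, pending)
    return out.getvalue().decode("ascii")
```

Because every patch writes exactly `WIDTH` bytes, the file length never changes and nothing after a patched field has to move. The price is that the attribute values keep their trailing spaces, so whatever reads the file would need to strip them.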