> -----Original Message-----
> From: John Austin [mailto:[EMAIL PROTECTED]
>
> On Wed, 2003-12-17 at 15:56, J.Pietschmann wrote:
> > I've got a lot of ideas myself, perhaps too many. What the
> > project needs is *working* *code*.
>
> Amen!
>
> [but a short one, not drawn out like the final chorus of Messiah!]
Well, if you have ideas it helps to share them from time to time, whether they come with working code or not. That has proven to yield very interesting clues about where to look for a particular mechanism and how it could be improved, possibly at a very low level. So, to add some:

> -----Original Message-----
> From: J.Pietschmann [mailto:[EMAIL PROTECTED]
>
> I wondered why I got a OutOfMemory already during *parsing*...
> Which is done by {which parser?} ...

Just asking because on Xerces-J's features page ( http://xml.apache.org/xerces2-j/features.html ) I noticed a little note on 'http://xml.org/sax/features/string-interning' (with all the discussion of String.intern() a while ago, this _may_ provide a clue to some; or it may well be a well-known fact, perhaps already explored). Anyway, it defaults to 'true' for any parser derived from the Xerces default parser (you can't even unset it; see the first sketch below my sig). Perhaps (a long shot) the earlier attempts to use interning stumbled because the interning was effectively being done twice in some way?

> ... In a real world file I benchmarked (rendered to 58 pages),
> the FO tree for the second page sequence run up to >70MB due to a table with
> lots of small cells, which generated more than 80k FOs alone.

80k? Out of roughly how many fo:* in the file? I guess that's the flip side of the verbosity mandated by the spec... an fo:block can consist of a single node, while an fo:table still takes at least five (six in the exotic case where you actually need to place some content in the cell, for testing purposes ;) ).

The problem seems to be one of small nested objects that are no longer 'needed', but are still referenced by their parent, which is still 'needed'.

BTW: what exactly are the criteria by which we can decide that a given FO object, no matter how deeply nested, can safely be 'discarded' from the tree? I mean not solely from the spec's point of view: an object could of course refer to another one defined at the start of the page-sequence, but does that necessarily mean having to keep references to all of the latter object's descendants?

[Another option (also a very long shot, maybe) would be to try and, ahem, _steal_ a little of the PDF approach... implement a form of (binary) compression for the FO tree's in-memory storage. Since zipping objects already has the known benefit of saving bandwidth, why not use it to reduce the memory footprint as well? Compress subtrees in their static form, and decompress the objects (and their descendants) as and when they are needed by layout / the area tree (a rough sketch is below my sig, after the first one). In cases like the one above that would already cut the memory footprint by, what, 30%? Taking into account that you still need uncompressed instances of the objects used by the other running processes (apart from tree building). Would it be worth the processing cost?]

Just an idea...

Cheers,

Andreas
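
P.S. For anyone who wants to check the string-interning bit quickly: a minimal, throwaway sketch of mine (the class name InterningCheck is just made up, nothing FOP-specific) that asks whatever SAX parser JAXP hands back about the feature and then tries to switch it off. With Xerces, going by its features page, the second step should be refused; other parsers may behave differently.

import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

public class InterningCheck {
    public static void main(String[] args) throws Exception {
        // Obtain an XMLReader the way any JAXP-based app would; with
        // Xerces on the classpath this is a Xerces SAXParser underneath.
        XMLReader reader = SAXParserFactory.newInstance()
                .newSAXParser().getXMLReader();

        String feature = "http://xml.org/sax/features/string-interning";
        System.out.println(feature + " = " + reader.getFeature(feature));

        try {
            // Xerces documents this feature as always on, so turning it
            // off is expected to be rejected (other parsers may differ).
            reader.setFeature(feature, false);
        } catch (SAXNotSupportedException e) {
            System.out.println("cannot be unset: " + e.getMessage());
        }
    }
}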
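
P.P.S. And a very rough sketch of the compression idea, using nothing but java.util.zip and plain serialization. 'The FO node classes are Serializable' is an assumption here (they may well not be today), and the class below is mine, not anything that exists in FOP:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/**
 * Sketch: keeps a finished FO subtree only in compressed, serialized
 * form, and inflates it again when layout / the area tree asks for it.
 */
public class CompressedSubtree {

    private byte[] zipped;

    public CompressedSubtree(Serializable subtreeRoot) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out =
                new ObjectOutputStream(new GZIPOutputStream(bytes));
        out.writeObject(subtreeRoot);
        out.close();                   // flushes the GZIP trailer as well
        zipped = bytes.toByteArray();  // keep only the compressed bytes
    }

    /** Inflate on demand; the caller drops the result again when done. */
    public Object inflate() throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(new ByteArrayInputStream(zipped)));
        try {
            return in.readObject();
        } finally {
            in.close();
        }
    }
}

Whether the CPU spent on serializing and deflating would really beat simply keeping the objects around is exactly the open question above.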