> -----Original Message-----
> From: John Austin [mailto:[EMAIL PROTECTED]
>
> On Wed, 2003-12-17 at 15:56, J.Pietschmann wrote:
> > I've got a lot of ideas myself, perhaps too many. What the
> > project needs is *working* *code*.
>
> Amen!
>
> [but a short one, not drawn out like the final chorus of Messiah!]

Well, it just helps, if you have ideas, to share them from time to time, whether
it's working code or not. That has proven to provide very interesting clues
when it comes to getting pointers on where to look for a particular
mechanism, and on how it could be improved, possibly at a very low level.

So, to add some:

> -----Original Message-----
> From: J.Pietschmann [mailto:[EMAIL PROTECTED]
>
> I wondered why I got a OutOfMemory already during *parsing*...
>

Which parser does that, by the way? Just asking because on Xerces-J's feature
page ( http://xml.apache.org/xerces2-j/features.html ) I saw a little note on
'http://xml.org/sax/features/string-interning' (with all the ranting about
String.intern() a while ago, this _may_ provide a clue to some; or it may well
be a well-known fact, perhaps already explored).
Anyway, it defaults to 'true' for any parser derived from the Xerces default
parser (you can't even unset it).
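
(For what it's worth, a minimal sketch of how one could query that feature on
whatever SAX2 driver happens to be on the classpath; the class name below is
mine, only the feature URI comes from the Xerces page.)

    import org.xml.sax.XMLReader;
    import org.xml.sax.helpers.XMLReaderFactory;

    public class InterningCheck {
        public static void main(String[] args) throws Exception {
            // Creates whatever SAX2 driver is registered (Xerces-J in FOP's case)
            XMLReader reader = XMLReaderFactory.createXMLReader();
            // Query the SAX2 string-interning feature; Xerces-derived parsers
            // report 'true' here and, as noted above, won't let you unset it.
            boolean interning = reader.getFeature(
                    "http://xml.org/sax/features/string-interning");
            System.out.println("string-interning: " + interning);
        }
    }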

Perhaps (--a long shot) the earlier attempts to try and use this were blocked
by the interning being doubled in some way?

> ... In a real world file I benchmarked (rendered to 58 pages),
> the FO tree for the second page sequence run up to >70MB due to a table
> with lots of small cells, which generated more than 80k FOs alone.

80k? For approximately how many fo:* in the file? Guess that's the counterweight
for the verbosity mandated by the spec... an fo:block can consist of only one
node, while an fo:table still takes at least five (six in the exotic case where
you actually need to place some content in the cell, for testing purposes ;) )

The problem seems to be one of nested little objects that are no longer
'needed', but are still referenced by their parent, which *is* still 'needed'.
--btw: what exactly are the criteria by which it can be decided that a given FO
object, no matter how deeply nested, can safely be 'discarded' from the tree?
I mean not solely from the spec's point of view: it would of course be possible
for an object to refer to another one defined at the start of the
page-sequence, but does that necessarily mean having to keep a reference to
all of the latter object's descendants?
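
To make that concrete, a rough sketch of what such 'discarding' could look
like -- hypothetical names, mine and not FOP's actual API: the node itself
stays reachable (say, for property lookups or references from later in the
page-sequence), but it lets go of its already-processed children so they
become collectable.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch, not FOP's actual classes: a node stays referenced
    // while its already-processed descendants become eligible for GC.
    class FONodeSketch {
        private final FONodeSketch parent;
        private List<FONodeSketch> children = new ArrayList<FONodeSketch>();

        FONodeSketch(FONodeSketch parent) {
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this);
            }
        }

        // Called once layout has consumed this subtree: the node itself may
        // still be referred to from elsewhere in the page-sequence, but
        // nothing needs to reach its descendants through it any more.
        void releaseChildren() {
            children = null;
        }

        FONodeSketch getParent() {
            return parent;
        }
    }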

[Another option (--also a very long shot, maybe) would be to try and, ahem,
_steal_ a little of the PDF approach... implement a form of (binary)
compression on the FO tree storage in memory? Since zipping objects already
has the known benefit of saving bandwidth, why not try and use it to reduce
the footprint? Compress into a static form, and decompress the objects (and
their descendants) as and when they are needed by the Layout/Area tree. In
cases like the one mentioned above this would already decrease the memory
footprint by, what? 30%? Taking into account that you still need uncompressed
instances of the objects used by the other running processes (apart from tree
building). Would it weigh up against the processing cost?]
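
A very rough sketch of the mechanism, assuming purely for the sake of argument
that an FO subtree were Serializable (my assumption, not a statement about the
current classes):

    import java.io.*;
    import java.util.zip.GZIPInputStream;
    import java.util.zip.GZIPOutputStream;

    // Illustration only: 'park' a no-longer-active subtree as gzipped bytes,
    // 'unpark' it again when layout actually needs it.
    public class CompressedSubtree {
        private byte[] compressed;

        public void park(Object subtree) throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out =
                new ObjectOutputStream(new GZIPOutputStream(bytes));
            out.writeObject(subtree);
            out.close();
            compressed = bytes.toByteArray();
        }

        public Object unpark() throws IOException, ClassNotFoundException {
            ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(new ByteArrayInputStream(compressed)));
            try {
                return in.readObject();
            } finally {
                in.close();
            }
        }
    }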

Just an idea...


Cheers,

Andreas
