Properties Implementation and Canonical Mappings

John Austin Sat, 29 Nov 2003 12:25:25 -0800

In the interest of contributing (instead of just
trashing) to the proposed implementation, I wrote 
a simple Perl script to get some counts out of a 
real-world XSL-FO file.


Input: The XSL-FO file produced from a DocBook file
I have left from a dormant project. The perl program 
counts the number of properties in the source file.

PDF size:           130 Pages  // some users have a lot more
FO file size:       1.2M bytes
Properties:         22,815
Unique prop names:  89      // bounded by the spec
Unique prop values: 2,227   // bounded by the real world

Note that storing the property name and value refs supplied
to the Property constructor will use 45,620 strings. If the
Property implementation employs canonical mapping to ensure
that only one copy of each unique string is stored, then just
over 2,300 strings are required. 

The property strings are given to the Property object
constructor by some path beginning with a SAX parser.
It is reasonable to assume that the SAX parser loses
refs to most of these strings and that the Property
implementation retains the only references to these 
String objects.

How big are String Objects ? 
At least 16 bytes plus storage for characters. 

What does this save us ? 
Probably only about 1,600,000 bytes for this file. 
CPU cost of creating strings is probably similar to 
cost of checking string table for a copy.

What does it buy for us ?
Bounds a source of current Order(n) memory growth. 
It gets us in the habit of using another good technique.

I am all ready thinking along the lines of:
The property lists for these FO's are usually generated by
programs and will be the repeated many times. Perhaps we
could use larger, faster working Property Lists consolidated with
Canonical Mappings to save both time and space.

I am thinking again along the lines of handling properties more
like C++ virtual function table (vTable). This object is larger
than Peter's ordered Property array, but would be faster. 
That's a reason C++ has fast virtual function dispatching.
-- 
John Austin <[EMAIL PROTECTED]>

Properties Implementation and Canonical Mappings

Reply via email to