Hi

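I just instrumented PdfRefCountedBuffer to print out the visible (used)
size and the underlying buffer size whenever it deallocates a buffer.
After chucking that data into PostgreSQL (just 'cos that's what I'm used
to) and running some quick stats on it, it was pretty easy to see that,
at least when parsing common content streams, there's a very strong
tendency toward very small allocations. Skip to the bottom if you don't
care for the details.

The tables are cumulative: for each size, "% vis" is the percentage of
buffers whose visible size was <= that many bytes, and "%alloc" is the
percentage whose underlying allocation was <= that many bytes. For
content stream parsing: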

  size   | % vis  | %alloc
---------+--------+--------
       1 |   0.00 |   0.00
       2 |   0.00 |   0.00
       4 |  26.21 |  26.21
       8 |  59.22 |  59.22
      16 |  88.59 |  88.59
      32 |  98.39 |  98.39
      64 |  99.57 |  99.57
     128 |  99.68 |  99.68
     256 |  99.68 |  99.68
     512 |  99.68 |  99.68
    1024 |  99.68 |  99.68
    2048 |  99.68 |  99.68
    4096 |  99.84 |  99.84
    8192 |  99.84 |  99.84
   16384 |  99.84 |  99.84
   32768 |  99.84 |  99.84
   65536 |  99.84 |  99.84
  131072 |  99.84 |  99.84
  262144 |  99.84 |  99.84
  524288 |  99.89 |  99.89
 1048576 | 100.00 |  99.89
 2097152 | 100.00 | 100.00

From the above, when parsing content streams nearly 60% of allocations
are <= 8 bytes, and nearly 99% are <= 32 bytes.
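If anyone wants to reproduce these tables without PostgreSQL, the
aggregation is simple: for each power-of-two size, count the buffers at
or below it. A minimal C++ sketch, assuming the instrumentation records
one (visible, allocated) pair per deallocated buffer; `BufferSample` and
`CumulativeTable` are hypothetical names, not part of PoDoFo:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One sample per deallocated buffer (hypothetical record format).
struct BufferSample {
    std::size_t visible;    // bytes the client actually used
    std::size_t allocated;  // bytes of the underlying allocation
};

struct CumulativeRow {
    std::size_t bucket;     // power-of-two upper bound, in bytes
    double pctVisible;      // % of buffers with visible <= bucket
    double pctAllocated;    // % of buffers with allocated <= bucket
};

// For each bucket 1, 2, 4, ... up to maxBucket, compute the percentage of
// samples whose visible / allocated size is at or below that bucket.
std::vector<CumulativeRow> CumulativeTable(const std::vector<BufferSample>& samples,
                                           std::size_t maxBucket)
{
    std::vector<CumulativeRow> rows;
    const double n = static_cast<double>(samples.size());
    for (std::size_t bucket = 1; bucket <= maxBucket; bucket *= 2) {
        std::size_t vis = 0, alloc = 0;
        for (const BufferSample& s : samples) {
            // An allocation can never be smaller than what's visible in it.
            assert(s.allocated >= s.visible);
            if (s.visible <= bucket) ++vis;
            if (s.allocated <= bucket) ++alloc;
        }
        rows.push_back({bucket,
                        n > 0 ? 100.0 * vis / n : 0.0,
                        n > 0 ? 100.0 * alloc / n : 0.0});
    }
    return rows;
}
```

Calling `CumulativeTable(samples, 2097152)` gives the 22 rows used in
each table here, from 1 byte up to 2 MB.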

Stats for CreationTest look pretty different:

  size   | % vis | %alloc
---------+-------+--------
       1 |  0.30 |   0.30
       2 |  0.30 |   0.30
       4 |  0.30 |   0.30
       8 |  2.42 |   2.42
      16 | 43.20 |  43.20
      32 | 65.26 |  65.26
      64 | 67.98 |  67.98
     128 | 70.69 |  70.69
     256 | 73.72 |  73.72
     512 | 77.34 |  77.34
    1024 | 80.36 |  80.36
    2048 | 81.87 |  81.87
    4096 | 85.20 |  85.20
    8192 | 86.71 |  86.40
   16384 | 89.12 |  89.12
   32768 | 96.37 |  96.37
   65536 | 97.58 |  97.58
  131072 | 98.19 |  97.58
  262144 | 99.70 |  98.49
  524288 | 99.70 |  99.70
 1048576 | 99.70 |  99.70
 2097152 | 99.70 |  99.70

That's because it does lots of bigger allocations. Even so, 65% or so
are <= 32 bytes.

For ParserTest (reading the PDF 1.4 Reference and discarding the
result), the results are weighted more toward big allocations:

  size   | % vis  | %alloc
---------+--------+--------
       1 |   0.00 |   0.00
       2 |   0.00 |   0.00
       4 |   1.29 |   1.29
       8 |   1.48 |   1.48
      16 |   5.49 |   5.49
      32 |  25.87 |  25.87
      64 |  35.75 |  35.75
     128 |  36.71 |  36.71
     256 |  38.47 |  38.47
     512 |  40.43 |  40.43
    1024 |  45.54 |  45.54
    2048 |  55.13 |  55.13
    4096 |  91.60 |  91.60
    8192 |  99.38 |  99.38
   16384 |  99.62 |  99.62
   32768 |  99.76 |  99.76
   65536 |  99.86 |  99.86
  131072 | 100.00 | 100.00
  262144 | 100.00 | 100.00
  524288 | 100.00 | 100.00
 1048576 | 100.00 | 100.00
 2097152 | 100.00 | 100.00

... but with the PDF 1.6 Reference the allocations tend to be tiny (why?):

  size   | % vis  | %alloc
---------+--------+--------
       1 |   0.00 |   0.00
       2 |   0.00 |   0.00
       4 |   0.04 |   0.04
       8 |   1.32 |   1.32
      16 |  92.83 |  92.83
      32 |  93.40 |  93.40
      64 |  95.43 |  95.43
     128 |  96.81 |  96.81
     256 |  97.14 |  97.14
     512 |  97.67 |  97.67
    1024 |  98.37 |  98.37
    2048 |  98.82 |  98.82
    4096 |  99.42 |  99.42
    8192 |  99.94 |  99.94
   16384 |  99.97 |  99.97
   32768 |  99.99 |  99.99
   65536 | 100.00 | 100.00
  131072 | 100.00 | 100.00
  262144 | 100.00 | 100.00
  524288 | 100.00 | 100.00
 1048576 | 100.00 | 100.00
 2097152 | 100.00 | 100.00

The stats don't change much when write-out of the parsed PDFs is enabled.

A parser run on one of the POST's pages:

  size   | % vis  | %alloc
---------+--------+--------
       1 |   0.00 |   0.00
       2 |   0.00 |   0.00
       4 |   0.00 |   0.00
       8 |   3.09 |   3.09
      16 |   9.28 |   9.28
      32 |  30.93 |  30.93
      64 |  40.21 |  40.21
     128 |  46.39 |  46.39
     256 |  50.52 |  50.52
     512 |  63.92 |  63.92
    1024 |  67.01 |  67.01
    2048 |  73.20 |  73.20
    4096 |  90.72 |  90.72
    8192 |  93.81 |  93.81
   16384 |  96.91 |  96.91
   32768 |  97.94 |  97.94
   65536 |  97.94 |  97.94
  131072 |  97.94 |  97.94
  262144 |  98.97 |  98.97
  524288 |  98.97 |  98.97
 1048576 | 100.00 | 100.00
 2097152 | 100.00 | 100.00

Even excluding the PDF 1.6 results, the aggregate still ends up with
just over 60% of allocations at or under 32 bytes. When parsing content
streams, 32 bytes will almost always be enough memory, and when parsing
object data it'll still make a dent. For object data, if the data
doesn't fit in 32 bytes it's often going to be big enough that 32 bytes
here or there doesn't make much difference.

Given that, I think it might well be worth giving
PdfRefCountedBuffer::TRefCountedBuffer its own internal in-object buffer
for the first 32 or 64 bytes of object data. If it overran that, it'd
allocate a replacement on the heap and switch to it. Since clients can't
safely assume the buffer start pointer remains the same across
Resize(...) calls anyway, that's quite safe. We waste 32 bytes for big
buffers, but those are not *that* common and for them the extra 32 bytes
quickly become irrelevant. (If we *really* cared we could even save some bytes
by storing the char* buffer pointer as a union with the char[32]
internal buffer, and selecting between them using the stored buffer
size. Eeew!).
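A minimal sketch of the in-object buffer idea, assuming a
malloc()/realloc()-backed heap buffer and a 32-byte internal buffer.
`SmallBuffer` and its members are hypothetical, not PoDoFo's actual
TRefCountedBuffer layout, and the reference count is left out to keep it
short:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for PdfRefCountedBuffer::TRefCountedBuffer with a
// small in-object buffer. Reference counting is omitted for brevity.
class SmallBuffer {
public:
    enum { INTERNAL_SIZE = 32 };

    SmallBuffer() : m_pHeap(nullptr), m_lSize(0), m_lCapacity(INTERNAL_SIZE) {}
    ~SmallBuffer() { std::free(m_pHeap); }  // free(nullptr) is a no-op
    SmallBuffer(const SmallBuffer&) = delete;
    SmallBuffer& operator=(const SmallBuffer&) = delete;

    // Data lives in m_internal until the first overrun, then on the heap.
    char* GetBuffer() { return m_pHeap ? m_pHeap : m_internal; }
    std::size_t GetSize() const { return m_lSize; }
    bool IsInternal() const { return m_pHeap == nullptr; }

    // Change the visible size, preserving existing contents. Callers must
    // re-fetch GetBuffer() afterwards, since the data may have moved.
    // Returns false (leaving the buffer unchanged) if allocation fails.
    bool Resize(std::size_t lSize) {
        if (lSize > m_lCapacity) {
            // Grow geometrically so repeated small appends stay cheap.
            std::size_t lNewCap = m_lCapacity * 2;
            if (lNewCap < lSize)
                lNewCap = lSize;
            char* pNew;
            if (m_pHeap) {
                pNew = static_cast<char*>(std::realloc(m_pHeap, lNewCap));
                if (!pNew)
                    return false;
            } else {
                // First overrun: copy out of the in-object buffer.
                pNew = static_cast<char*>(std::malloc(lNewCap));
                if (!pNew)
                    return false;
                std::memcpy(pNew, m_internal, m_lSize);
            }
            m_pHeap = pNew;
            m_lCapacity = lNewCap;
        }
        m_lSize = lSize;
        assert(m_lSize <= m_lCapacity);
        return true;
    }

private:
    char*       m_pHeap;      // nullptr while the internal buffer is in use
    std::size_t m_lSize;      // visible (used) size
    std::size_t m_lCapacity;  // INTERNAL_SIZE, or the heap block's size
    char        m_internal[INTERNAL_SIZE];
};
```

`GetBuffer()` has to be called again after every `Resize()`, which is the
same rule clients already live with. The union variant would overlay
`m_pHeap` on `m_internal` and use `m_lCapacity > INTERNAL_SIZE` to decide
which one is live, saving `sizeof(char*)` per buffer at some cost in
clarity.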

Keeping a small in-object buffer avoids a whole pile of small heap
allocations, which are expensive both in malloc()/free() overhead and in
memory fragmentation.

Unfortunately, PdfRefCountedBuffer still has to allocate its private
TRefCountedBuffer on the heap so it can be shared and have a lifetime
independent of the PdfRefCountedBuffer instance. Since these are all
uniform in size and lack any need for a dtor, I almost wonder if it's
worth using a simple custom allocator to store them in a single block
or, for that matter, using a vector<TRefCountedBuffer> and a free bitmap
behind the scenes.
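A minimal sketch of the single-block allocator idea, with two swaps made
plainly: it uses an intrusive free list instead of a free bitmap, and it
allocates fixed-size chunks instead of one growable
vector<TRefCountedBuffer>, because a vector that reallocates would move
live objects and invalidate every pointer to them. `FixedPool` is a
hypothetical name, not PoDoFo code, and it assumes C++11 and
single-threaded use:

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical fixed-size pool for small, uniformly sized objects such as
// TRefCountedBuffer. Slots are carved out of big chunks and recycled
// through an intrusive free list. Not thread-safe.
template <typename T, std::size_t SlotsPerChunk = 256>
class FixedPool {
public:
    FixedPool() : m_pFree(nullptr) {}
    ~FixedPool() {
        // Frees the chunks only; objects still live are not destroyed.
        for (std::size_t i = 0; i < m_chunks.size(); ++i)
            delete[] m_chunks[i];
    }
    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    // Default-construct a T in a recycled (or fresh) slot.
    T* Create() {
        if (!m_pFree)
            Grow();
        Slot* s = m_pFree;
        m_pFree = s->next;
        return new (s->storage) T();
    }

    // Destroy a T obtained from Create() and push its slot on the list.
    void Destroy(T* p) {
        assert(p != nullptr);
        p->~T();
        Slot* s = reinterpret_cast<Slot*>(p);
        s->next = m_pFree;
        m_pFree = s;
    }

    std::size_t ChunkCount() const { return m_chunks.size(); }

private:
    union Slot {
        Slot* next;                                   // while free
        alignas(T) unsigned char storage[sizeof(T)];  // while in use
    };

    void Grow() {
        m_chunks.push_back(nullptr);  // make room first so a throw can't leak
        Slot* chunk = new Slot[SlotsPerChunk];
        m_chunks.back() = chunk;
        for (std::size_t i = 0; i < SlotsPerChunk; ++i) {
            chunk[i].next = m_pFree;
            m_pFree = &chunk[i];
        }
    }

    Slot* m_pFree;
    std::vector<Slot*> m_chunks;
};
```

Because TRefCountedBuffer needs no dtor, dropping the chunks wholesale in
the pool's destructor is enough; a type with a real destructor would need
every object passed to `Destroy()` first.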

Anyone see any reason not to have a play with this and see if it helps?
Too complicated? Some obvious fault?

I've been idly wondering about PoDoFo's effect on memory fragmentation
and the large number of allocations/deallocations from small objects for
a while. It's OK in short-lived batch processes, but I suspect it'll
become a problem for anything long-running. I'm also hoping we'll get a
performance boost out of this.

--
Craig Ringer

_______________________________________________
Podofo-users mailing list
https://lists.sourceforge.net/lists/listinfo/podofo-users
