Yes - custom allocators OR static allocators will DEFINITELY help...

Leonard

On Nov 12, 2007, at 5:21 AM, Craig Ringer wrote:

> Hi
>
> I just instrumented PdfRefCountedBuffer to print out the visible (used)
> size and underlying buffer size whenever it deallocates a buffer.
> Chucking that data into PostgreSQL (just 'cos that's what I'm used to)
> and running some quick stats on it, it was pretty easy to see that, at
> least when parsing common content streams, there's a very strong
> tendency toward very small allocations (skip to the bottom if you don't
> care for the details). The percentages are cumulative: "% vis" counts
> buffers whose visible size is <= "size" bytes, "%alloc" those whose
> underlying buffer is <= "size" bytes:
>
>   size   | % vis  | %alloc
> ---------+--------+--------
>        1 |   0.00 |   0.00
>        2 |   0.00 |   0.00
>        4 |  26.21 |  26.21
>        8 |  59.22 |  59.22
>       16 |  88.59 |  88.59
>       32 |  98.39 |  98.39
>       64 |  99.57 |  99.57
>      128 |  99.68 |  99.68
>      256 |  99.68 |  99.68
>      512 |  99.68 |  99.68
>     1024 |  99.68 |  99.68
>     2048 |  99.68 |  99.68
>     4096 |  99.84 |  99.84
>     8192 |  99.84 |  99.84
>    16384 |  99.84 |  99.84
>    32768 |  99.84 |  99.84
>    65536 |  99.84 |  99.84
>   131072 |  99.84 |  99.84
>   262144 |  99.84 |  99.84
>   524288 |  99.89 |  99.89
>  1048576 | 100.00 |  99.89
>  2097152 | 100.00 | 100.00
>
> From the above, for parsing content streams nearly 60% of allocations
> are less than or equal to 8 bytes, and over 98% are <= 32 bytes.
>
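For anyone who wants to reproduce this kind of table without a database, here is a minimal sketch of the cumulative bucketing. It assumes the sizes have already been collected into a vector; the function name CumulativePercentBySize and its parameters are made up for illustration and are not part of PoDoFo.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of PoDoFo. For each power-of-two bucket
// 1, 2, 4, ... up to maxBucket, returns the percentage of recorded sizes
// that are less than or equal to that bucket (a cumulative distribution).
std::vector<double> CumulativePercentBySize(const std::vector<std::size_t>& sizes,
                                            std::size_t maxBucket)
{
    assert(!sizes.empty());
    assert(maxBucket >= 1);

    std::vector<double> result;
    for (std::size_t bucket = 1; ; bucket *= 2) {
        std::size_t count = 0;
        for (std::size_t s : sizes) {
            if (s <= bucket)
                ++count;
        }
        result.push_back(100.0 * count / sizes.size());

        // Stop before the next doubling would exceed maxBucket
        // (this also avoids overflowing bucket).
        if (bucket > maxBucket / 2)
            break;
    }
    return result;
}
```

Calling it once with the visible sizes and once with the allocated sizes would give the two columns; result[i] corresponds to the bucket of 2^i bytes.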
> Stats for creationtest look pretty different:
>
>   size   | % vis | %alloc
> ---------+-------+--------
>        1 |  0.30 |   0.30
>        2 |  0.30 |   0.30
>        4 |  0.30 |   0.30
>        8 |  2.42 |   2.42
>       16 | 43.20 |  43.20
>       32 | 65.26 |  65.26
>       64 | 67.98 |  67.98
>      128 | 70.69 |  70.69
>      256 | 73.72 |  73.72
>      512 | 77.34 |  77.34
>     1024 | 80.36 |  80.36
>     2048 | 81.87 |  81.87
>     4096 | 85.20 |  85.20
>     8192 | 86.71 |  86.40
>    16384 | 89.12 |  89.12
>    32768 | 96.37 |  96.37
>    65536 | 97.58 |  97.58
>   131072 | 98.19 |  97.58
>   262144 | 99.70 |  98.49
>   524288 | 99.70 |  99.70
>  1048576 | 99.70 |  99.70
>  2097152 | 99.70 |  99.70
>
> in that it does lots of bigger allocations. Even so, 65% or so are
> <= 32 bytes.
>
> For ParserTest (reading PDF 1.4 reference, discarding result) the
> results are more weighted toward big allocations:
>
>   size   | % vis  | %alloc
> ---------+--------+--------
>        1 |   0.00 |   0.00
>        2 |   0.00 |   0.00
>        4 |   1.29 |   1.29
>        8 |   1.48 |   1.48
>       16 |   5.49 |   5.49
>       32 |  25.87 |  25.87
>       64 |  35.75 |  35.75
>      128 |  36.71 |  36.71
>      256 |  38.47 |  38.47
>      512 |  40.43 |  40.43
>     1024 |  45.54 |  45.54
>     2048 |  55.13 |  55.13
>     4096 |  91.60 |  91.60
>     8192 |  99.38 |  99.38
>    16384 |  99.62 |  99.62
>    32768 |  99.76 |  99.76
>    65536 |  99.86 |  99.86
>   131072 | 100.00 | 100.00
>   262144 | 100.00 | 100.00
>   524288 | 100.00 | 100.00
>  1048576 | 100.00 | 100.00
>  2097152 | 100.00 | 100.00
>
> ... but with the PDF 1.6 reference they tend to be tiny (why?):
>
>   size   | % vis  | %alloc
> ---------+--------+--------
>        1 |   0.00 |   0.00
>        2 |   0.00 |   0.00
>        4 |   0.04 |   0.04
>        8 |   1.32 |   1.32
>       16 |  92.83 |  92.83
>       32 |  93.40 |  93.40
>       64 |  95.43 |  95.43
>      128 |  96.81 |  96.81
>      256 |  97.14 |  97.14
>      512 |  97.67 |  97.67
>     1024 |  98.37 |  98.37
>     2048 |  98.82 |  98.82
>     4096 |  99.42 |  99.42
>     8192 |  99.94 |  99.94
>    16384 |  99.97 |  99.97
>    32768 |  99.99 |  99.99
>    65536 | 100.00 | 100.00
>   131072 | 100.00 | 100.00
>   262144 | 100.00 | 100.00
>   524288 | 100.00 | 100.00
>  1048576 | 100.00 | 100.00
>  2097152 | 100.00 | 100.00
>
> The stats don't change much when write-out of the parsed PDFs is
> enabled.
>
> A parser run on one of the POST's pages:
>
>   size   | % vis  | %alloc
> ---------+--------+--------
>        1 |   0.00 |   0.00
>        2 |   0.00 |   0.00
>        4 |   0.00 |   0.00
>        8 |   3.09 |   3.09
>       16 |   9.28 |   9.28
>       32 |  30.93 |  30.93
>       64 |  40.21 |  40.21
>      128 |  46.39 |  46.39
>      256 |  50.52 |  50.52
>      512 |  63.92 |  63.92
>     1024 |  67.01 |  67.01
>     2048 |  73.20 |  73.20
>     4096 |  90.72 |  90.72
>     8192 |  93.81 |  93.81
>    16384 |  96.91 |  96.91
>    32768 |  97.94 |  97.94
>    65536 |  97.94 |  97.94
>   131072 |  97.94 |  97.94
>   262144 |  98.97 |  98.97
>   524288 |  98.97 |  98.97
>  1048576 | 100.00 | 100.00
>  2097152 | 100.00 | 100.00
>
> Even excluding the PDF 1.6 results, the aggregate still ends up with
> just over 60% of allocations at or under 32 bytes. When parsing content
> streams that'll almost always be enough memory, and when parsing object
> data it'll still make a dent. For object data parsing, if the data
> doesn't fit in 32 bytes, it's often going to be big enough that 32 bytes
> here or there doesn't make much difference.
>
> Given that, I think it might well be worth giving
> PdfRefCountedBuffer::TRefCountedBuffer its own internal in-object buffer
> for the first 32 or 64 bytes of object data. If it overran that, it'd
> allocate a replacement on the heap and switch to it. Since clients can't
> safely assume the buffer start pointer remains the same across
> Resize(...) calls anyway, that's quite safe. We waste 32 bytes for big
> buffers, but those are not *that* common and the 32 bytes quickly
> become irrelevant. (If we *really* cared we could even save some bytes
> by storing the char* buffer pointer in a union with the char[32]
> internal buffer, and selecting between them using the stored buffer
> size. Eeew!)
>
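A rough sketch of the in-object buffer idea, to make the discussion concrete. This is not the real TRefCountedBuffer: the class SmallBuffer, its members, and the 32-byte constant are all assumptions for illustration, and reference counting is left out entirely. It only shows the switch from inline storage to the heap on Resize().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical illustration only; not PoDoFo's TRefCountedBuffer.
// Data lives in m_inline until it outgrows it, then moves to the heap.
class SmallBuffer {
public:
    static constexpr std::size_t kInlineSize = 32;

    SmallBuffer() : m_heap(nullptr), m_capacity(kInlineSize), m_size(0) {}
    ~SmallBuffer() { std::free(m_heap); }

    SmallBuffer(const SmallBuffer&) = delete;
    SmallBuffer& operator=(const SmallBuffer&) = delete;

    char* Data() { return m_heap ? m_heap : m_inline; }
    std::size_t Size() const { return m_size; }
    bool OnHeap() const { return m_heap != nullptr; }

    // Change the visible size. The data pointer may change when the
    // buffer grows, so callers must re-fetch Data() after every Resize().
    void Resize(std::size_t newSize)
    {
        if (newSize > m_capacity) {
            char* p = static_cast<char*>(std::malloc(newSize));
            if (!p)
                throw std::bad_alloc();
            std::memcpy(p, Data(), m_size);  // keep the existing contents
            std::free(m_heap);               // no-op while still inline
            m_heap = p;
            m_capacity = newSize;
        }
        assert(newSize <= m_capacity);
        m_size = newSize;
    }

private:
    char        m_inline[kInlineSize];  // in-object storage for small data
    char*       m_heap;                 // nullptr while the inline buffer is in use
    std::size_t m_capacity;
    std::size_t m_size;
};
```

With this layout a buffer that never exceeds 32 bytes costs no malloc()/free() of its own; the price is the 32 unused bytes carried by every buffer that does end up on the heap. The union variant mentioned above would reclaim those bytes at the cost of readability.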
> Keeping a small in-object buffer avoids a whole pile of small heap
> allocations - which are expensive in terms of malloc()/free() cost and
> in memory fragmentation terms.
>
> Unfortunately, PdfRefCountedBuffer still has to allocate its private
> TRefCountedBuffer on the heap so it can be shared and have a lifetime
> independent of the PdfRefCountedBuffer instance. Since these are all
> uniform in size and lack any need for a dtor I almost wonder if it's
> worth using a simple custom allocator to store them in a single block
> or, for that matter, use a vector<TRefCountedBuffer> and a free bitmap
> behind the scenes.
>
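The "single block plus free bitmap" idea could look something like the sketch below. Everything here is hypothetical: Slot is only a stand-in for a small, uniform record with no destructor, and SlotPool is not an existing PoDoFo class. One caveat with a plain growing vector is that reallocation would invalidate outstanding pointers, so this sketch fixes the capacity up front.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a small, uniform record that needs no destructor.
struct Slot {
    char*       buffer;
    std::size_t size;
    long        refCount;
};

// Fixed-capacity pool: one contiguous block of Slot objects plus a bitmap
// marking which ones are in use. The capacity never changes after
// construction, so pointers handed out stay valid.
class SlotPool {
public:
    explicit SlotPool(std::size_t capacity)
        : m_slots(capacity), m_used(capacity, false), m_next(0) {}

    // Returns a free slot, or nullptr when the pool is exhausted.
    Slot* Allocate()
    {
        for (std::size_t n = 0; n < m_slots.size(); ++n) {
            // Start searching just past the most recently allocated slot.
            std::size_t i = (m_next + n) % m_slots.size();
            if (!m_used[i]) {
                m_used[i] = true;
                m_next = i + 1;
                return &m_slots[i];
            }
        }
        return nullptr;
    }

    // Marks a slot obtained from Allocate() as free again.
    void Free(Slot* slot)
    {
        std::size_t i = static_cast<std::size_t>(slot - m_slots.data());
        assert(i < m_slots.size() && m_used[i]);
        m_used[i] = false;
    }

private:
    std::vector<Slot> m_slots;  // the single block of records
    std::vector<bool> m_used;   // free bitmap: true means in use
    std::size_t       m_next;   // where the next search starts
};
```

A real version would need a fallback when the pool fills up (for example chaining further fixed-size blocks) and some thought about thread safety; whether that beats the system allocator is something only measurement can answer.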
> Anyone see any reason not to have a play with this and see if it helps?
> Too complicated? Some obvious fault?
>
> I've been idly wondering about PoDoFo's effect on memory fragmentation
> and the large number of allocations/deallocations from small objects for
> a while. It's OK in short-lived batch processes, but I suspect it'll
> become a problem for anything long-running. I'm also hoping we'll get a
> boost out of this.
>
> --
> Craig Ringer
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc.
> Still grepping through log files to find problems?  Stop.
> Now Search log events and configuration files using AJAX and a browser.
> Download your FREE copy of Splunk now >> http://get.splunk.com/
> _______________________________________________
> Podofo-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/podofo-users
>
