[ 
https://issues.apache.org/jira/browse/PDFBOX-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Doswald updated PDFBOX-3433:
------------------------------------
    Attachment: pdfbox-performance-PDFBOX-3433.zip
                PDFBOX-3433_Optimize_image_conversion_in_LosslessFactory_rev1.patch

The proposed patch (rev1) contains changes for LosslessFactory and COSStream. 

* LosslessFactory: Read pixels line by line instead of pixel by pixel
* LosslessFactory: Pre-size buffer for grayscale images
* LosslessFactory: For RGB images, create the byte buffer directly, without a 
ByteArrayOutputStream. This avoids an unnecessary copy of the resulting data 
array
* LosslessFactory: Pre-size the buffer for the FLATE_DECODE output
* COSStream: Override the write(byte[],int,int) method for the 
FilterOutputStreams created by the class. Otherwise the default implementation 
loops over the byte array and calls write(int) for each byte
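
To illustrate the COSStream point, here is a minimal, self-contained sketch. It is not the code from the patch; the names BulkWriteDemo, CountingSink, bulkFilter and countSingleByteWrites are made up for this example. It only relies on the documented behaviour of java.io.FilterOutputStream, whose inherited write(byte[],int,int) calls write(int) once per byte unless a subclass overrides it.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of PDFBox or of the attached patch.
class BulkWriteDemo {
    /** Sink that counts how many bytes arrive through write(int). */
    static class CountingSink extends ByteArrayOutputStream {
        int singleByteWrites = 0;

        @Override
        public synchronized void write(int b) {
            singleByteWrites++;
            super.write(b);
        }
    }

    /** Wraps target in a FilterOutputStream that forwards bulk writes in one call. */
    static OutputStream bulkFilter(OutputStream target) {
        return new FilterOutputStream(target) {
            @Override
            public void write(byte[] b, int off, int len) throws IOException {
                // "out" is the protected field holding the wrapped stream.
                out.write(b, off, len);
            }
        };
    }

    /** Writes length bytes in one bulk call and reports how many single-byte writes reached the sink. */
    static int countSingleByteWrites(boolean overrideBulkWrite, int length) {
        CountingSink sink = new CountingSink();
        OutputStream stream = overrideBulkWrite ? bulkFilter(sink) : new FilterOutputStream(sink);
        try {
            stream.write(new byte[length], 0, length);
            stream.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.singleByteWrites;
    }

    public static void main(String[] args) {
        System.out.println("default:    " + countSingleByteWrites(false, 4096));
        System.out.println("overridden: " + countSingleByteWrites(true, 4096));
    }
}
```

With the plain FilterOutputStream every byte of the array goes through write(int); with the override the sink receives the array in a single bulk call, so the counter stays at zero.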

The attached JMH benchmark contains two methods to benchmark the speed of RGB 
and B/W images. The performance numbers on my systems are as follows:

Desktop RGB:
OLD: PdfBoxBenchmark.convertImage    avgt   129.281 ± 1.926  ms/op
NEW: PdfBoxBenchmark.convertImage    avgt   106.143 ± 1.425  ms/op

Desktop B/W:
OLD: PdfBoxBenchmark.convertImageBW  avgt   37.467 ± 0.516  ms/op
NEW: PdfBoxBenchmark.convertImageBW  avgt   29.554 ± 1.176  ms/op

Embedded RGB:
OLD: PdfBoxBenchmark.convertImage    avgt   1600.929 ± 12.577  ms/op
NEW: PdfBoxBenchmark.convertImage    avgt   1126.266 ± 42.487  ms/op

Embedded B/W:
OLD: PdfBoxBenchmark.convertImageBW  avgt  1011.356 ± 29.348  ms/op
NEW: PdfBoxBenchmark.convertImageBW  avgt  975.063 ± 35.642  ms/op

Because the patch pre-sizes the buffers and prevents unnecessary copying, the 
allocation rate was also reduced (measurements from desktop):

OLD:
PdfBoxBenchmark.convertImage:·gc.alloc.rate         avgt       352.563 ±      6.565  MB/sec
PdfBoxBenchmark.convertImage:·gc.alloc.rate.norm    avgt  48880952.800 ± 243056.403    B/op
PdfBoxBenchmark.convertImageBW:·gc.alloc.rate       avgt       518.062 ±      9.545  MB/sec
PdfBoxBenchmark.convertImageBW:·gc.alloc.rate.norm  avgt  20213248.643 ±    215.935    B/op

NEW:
PdfBoxBenchmark.convertImage:·gc.alloc.rate         avgt       153.795 ±      2.445  MB/sec
PdfBoxBenchmark.convertImage:·gc.alloc.rate.norm    avgt  17121575.040 ± 108565.130    B/op
PdfBoxBenchmark.convertImageBW:·gc.alloc.rate       avgt        40.888 ±      0.594  MB/sec
PdfBoxBenchmark.convertImageBW:·gc.alloc.rate.norm  avgt   1268892.004 ±  76947.484    B/op
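
As a rough illustration of where such allocation savings can come from, the sketch below contrasts a row-wise conversion into a pre-sized byte array with a per-pixel conversion through a ByteArrayOutputStream. It is not the patch itself; RowWiseRgbDemo, toRgbBytes and toRgbBytesPerPixel are hypothetical names, and the only API assumed is java.awt.image.BufferedImage.getRGB in its single-pixel and region variants.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;

// Hypothetical demo class, not the LosslessFactory code from the patch.
class RowWiseRgbDemo {
    /** Packs the image into R,G,B bytes, reading one whole row per getRGB call. */
    static byte[] toRgbBytes(BufferedImage image) {
        int width = image.getWidth();
        int height = image.getHeight();
        byte[] rgb = new byte[width * height * 3]; // final size is known up front
        int[] row = new int[width];                // reused for every row
        int pos = 0;
        for (int y = 0; y < height; y++) {
            image.getRGB(0, y, width, 1, row, 0, width);
            for (int argb : row) {
                rgb[pos++] = (byte) (argb >> 16); // red
                rgb[pos++] = (byte) (argb >> 8);  // green
                rgb[pos++] = (byte) argb;         // blue
            }
        }
        return rgb;
    }

    /** Baseline for comparison: one getRGB call per pixel and a growing stream. */
    static byte[] toRgbBytesPerPixel(BufferedImage image) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                int argb = image.getRGB(x, y);
                bos.write(argb >> 16);
                bos.write(argb >> 8);
                bos.write(argb);
            }
        }
        return bos.toByteArray(); // copies the internal buffer once more
    }
}
```

Both methods produce the same bytes. The first allocates the result once and reuses a single row buffer, while the second lets the stream grow its internal array repeatedly and copies it again in toByteArray().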

I'm curious about your opinions.

> Optimize image conversion in LosslessFactory
> --------------------------------------------
>
>                 Key: PDFBOX-3433
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3433
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: PDModel
>    Affects Versions: 2.0.2
>         Environment: Ubuntu 14.04.4 LTS
>            Reporter: Michael Doswald
>            Priority: Trivial
>              Labels: optimization, performance
>         Attachments: 
> PDFBOX-3433_Optimize_image_conversion_in_LosslessFactory_rev1.patch, 
> pdfbox-performance-PDFBOX-3433.zip
>
>
> Conversion of BufferedImage objects into PDImageXObject objects could be 
> optimized by
> * Pre-sizing the buffers
> * Reading whole lines of pixels instead of pixel-by-pixel
> * Preventing unnecessary copying of byte arrays


