[ https://issues.apache.org/jira/browse/PDFBOX-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Doswald updated PDFBOX-3433:
------------------------------------
Attachment: pdfbox-performance-PDFBOX-3433.zip
PDFBOX-3433_Optimize_image_conversion_in_LosslessFactory_rev1.patch
The proposed patch (rev1) contains changes for LosslessFactory and COSStream.
* LosslessFactory: Read pixels line by line instead of pixel by pixel
* LosslessFactory: Pre-size buffer for grayscale images
* LosslessFactory: For RGB images, create the byte buffer directly, without a
ByteArrayOutputStream. This prevents unnecessary copying of the resulting data
array
* LosslessFactory: Pre-size the buffer for the FLATE_DECODE output
* COSStream: Override the write(byte[],int,int) method for the
FilterOutputStreams created by the class. Otherwise the default implementation
loops over the byte array and calls write(int) for each byte
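To illustrate the COSStream point, here is a minimal, self-contained sketch. It is not the patch code: BulkWriteSketch, CountingSink, BulkFilterStream and countWrites are hypothetical names used only for this demonstration. It counts how many single-byte and bulk calls reach the underlying stream with and without the override.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class BulkWriteSketch {

    // Sink that counts which write variant is reached (illustration only).
    static class CountingSink extends ByteArrayOutputStream {
        int singleByteWrites;
        int bulkWrites;

        @Override
        public synchronized void write(int b) {
            singleByteWrites++;
            super.write(b);
        }

        @Override
        public synchronized void write(byte[] b, int off, int len) {
            bulkWrites++;
            super.write(b, off, len);
        }
    }

    // Hypothetical filter stream with the bulk method overridden.
    static class BulkFilterStream extends FilterOutputStream {
        BulkFilterStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len); // forward the whole range in one call
        }
    }

    // Returns {singleByteWrites, bulkWrites} as seen by the sink.
    public static int[] countWrites(boolean overridden, int length) {
        CountingSink sink = new CountingSink();
        OutputStream stream = overridden
                ? new BulkFilterStream(sink)
                : new FilterOutputStream(sink);
        try {
            stream.write(new byte[length], 0, length);
            stream.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new int[] { sink.singleByteWrites, sink.bulkWrites };
    }

    public static void main(String[] args) {
        int[] plain = countWrites(false, 4096);
        int[] bulk = countWrites(true, 4096);
        System.out.println("default:    " + plain[0] + " single-byte, " + plain[1] + " bulk");
        System.out.println("overridden: " + bulk[0] + " single-byte, " + bulk[1] + " bulk");
    }
}
```

With the plain FilterOutputStream every byte goes through write(int); with the override the sink sees a single bulk call.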
The attached JMH benchmark contains two methods to benchmark the speed of RGB
and B/W images. The performance numbers on my systems are as follows:
Desktop RGB:
OLD: PdfBoxBenchmark.convertImage avgt 129.281 ± 1.926 ms/op
NEW: PdfBoxBenchmark.convertImage avgt 106.143 ± 1.425 ms/op
Desktop B/W:
OLD: PdfBoxBenchmark.convertImageBW avgt 37.467 ± 0.516 ms/op
NEW: PdfBoxBenchmark.convertImageBW avgt 29.554 ± 1.176 ms/op
Embedded RGB:
OLD: PdfBoxBenchmark.convertImage avgt 1600.929 ± 12.577 ms/op
NEW: PdfBoxBenchmark.convertImage avgt 1126.266 ± 42.487 ms/op
Embedded B/W:
OLD: PdfBoxBenchmark.convertImageBW avgt 1011.356 ± 29.348 ms/op
NEW: PdfBoxBenchmark.convertImageBW avgt 975.063 ± 35.642 ms/op
Because the patch pre-sizes the buffers and prevents unnecessary copying, the
allocation rate was also reduced (measurements from desktop):
OLD:
PdfBoxBenchmark.convertImage:·gc.alloc.rate          avgt       352.563 ±      6.565 MB/sec
PdfBoxBenchmark.convertImage:·gc.alloc.rate.norm     avgt  48880952.800 ± 243056.403 B/op
PdfBoxBenchmark.convertImageBW:·gc.alloc.rate        avgt       518.062 ±      9.545 MB/sec
PdfBoxBenchmark.convertImageBW:·gc.alloc.rate.norm   avgt  20213248.643 ±    215.935 B/op
NEW:
PdfBoxBenchmark.convertImage:·gc.alloc.rate          avgt       153.795 ±      2.445 MB/sec
PdfBoxBenchmark.convertImage:·gc.alloc.rate.norm     avgt  17121575.040 ± 108565.130 B/op
PdfBoxBenchmark.convertImageBW:·gc.alloc.rate        avgt        40.888 ±      0.594 MB/sec
PdfBoxBenchmark.convertImageBW:·gc.alloc.rate.norm   avgt   1268892.004 ±  76947.484 B/op
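To give an idea of where the lower allocation comes from, here is a minimal sketch of the general approach for RGB images. It is not the code from the patch: RowWiseRgbSketch and toRgbBytes are hypothetical names, and the 8-bit RGB byte layout is an assumption made for this example. It reads one row per getRGB call and writes into a byte array that is sized once up front.

```java
import java.awt.image.BufferedImage;

public class RowWiseRgbSketch {

    // Packs an image into 8-bit RGB triples, reading one row per getRGB call.
    public static byte[] toRgbBytes(BufferedImage image) {
        int width = image.getWidth();
        int height = image.getHeight();
        byte[] rgb = new byte[width * height * 3]; // final size is known up front
        int[] row = new int[width];                // reused for every row
        int pos = 0;
        for (int y = 0; y < height; y++) {
            // Bulk read of a full row instead of width separate getRGB(x, y) calls.
            image.getRGB(0, y, width, 1, row, 0, width);
            for (int pixel : row) {
                rgb[pos++] = (byte) (pixel >> 16); // red
                rgb[pos++] = (byte) (pixel >> 8);  // green
                rgb[pos++] = (byte) pixel;         // blue
            }
        }
        return rgb;
    }

    public static void main(String[] args) {
        BufferedImage image = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        image.setRGB(0, 0, 0x112233);
        image.setRGB(1, 0, 0xAABBCC);
        byte[] rgb = toRgbBytes(image);
        System.out.println(rgb.length + " bytes");
    }
}
```

The only allocations per image are the result array and one reusable row buffer, so nothing has to grow or be copied afterwards.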
I'm curious about your opinions.
> Optimize image conversion in LosslessFactory
> --------------------------------------------
>
> Key: PDFBOX-3433
> URL: https://issues.apache.org/jira/browse/PDFBOX-3433
> Project: PDFBox
> Issue Type: Improvement
> Components: PDModel
> Affects Versions: 2.0.2
> Environment: Ubuntu 14.04.4 LTS
> Reporter: Michael Doswald
> Priority: Trivial
> Labels: optimization, performance
> Attachments:
> PDFBOX-3433_Optimize_image_conversion_in_LosslessFactory_rev1.patch,
> pdfbox-performance-PDFBOX-3433.zip
>
>
> Conversion of BufferedImage objects into PDImageXObject objects could be
> optimized by
> * Pre-sizing the buffers
> * Reading whole lines of pixels instead of pixel-by-pixel
> * Preventing unnecessary copying of byte arrays