[
https://issues.apache.org/jira/browse/PDFBOX-6030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated PDFBOX-6030:
------------------------------------
Attachment: PDFBOX-6030 (2).diff
> JPEGFactory: createImage and setOptimizeHuffmanTables
> -----------------------------------------------------
>
> Key: PDFBOX-6030
> URL: https://issues.apache.org/jira/browse/PDFBOX-6030
> Project: PDFBox
> Issue Type: Wish
> Affects Versions: 2.0.34, 3.0.5 PDFBox
> Reporter: Zer Jun Eng
> Priority: Minor
> Labels: JPEG, JPG, jpeg
> Fix For: 2.0.35, 3.0.6 PDFBox, 4.0.0
>
> Attachments: PDFBOX-6030 (2).diff, PDFBOX-6030.diff,
> zoo-711050_1920.jpg
>
>
> Dear PDFBox developers,
> I'm writing to request an enhancement to the JPEGFactory class, specifically
> concerning the createFromImage(PDDocument document, BufferedImage image,
> float quality, int dpi) method.
> Currently, this method offers no direct way to enable the
> setOptimizeHuffmanTables option of JPEGImageWriteParam. Enabling it can
> reduce the size of the encoded JPEG.
> To work around this, my team currently has to copy the JPEGFactory source
> code into our project and modify the private encodeImageToJPEGStream method.
> This approach isn't ideal as it makes maintenance more difficult and prevents
> us from easily updating to new PDFBox versions.
> Would you consider exposing this setOptimizeHuffmanTables option, perhaps as
> an additional parameter to the createFromImage method or through a separate
> setter on JPEGFactory? This would allow users to leverage this optimization
> without resorting to workarounds.
> Thank you for considering this request.
> —
> Replying to the email thread:
> https://lists.apache.org/thread/pgo0m1r8vgxd12zl3499fv38s163mpm3
> I wrote a minimal benchmark that compares output file size and execution time
> with and without setOptimizeHuffmanTables:
> {code:java}
> import java.awt.image.BufferedImage;
> import java.io.ByteArrayOutputStream;
> import java.io.File;
> import java.io.IOException;
> import java.time.Duration;
> import java.time.Instant;
> import java.util.Iterator;
> import javax.imageio.IIOImage;
> import javax.imageio.ImageIO;
> import javax.imageio.ImageTypeSpecifier;
> import javax.imageio.ImageWriteParam;
> import javax.imageio.ImageWriter;
> import javax.imageio.metadata.IIOMetadata;
> import javax.imageio.plugins.jpeg.JPEGImageWriteParam;
> import javax.imageio.stream.ImageOutputStream;
> import org.w3c.dom.Element;
> class Huffman {
>     private static ImageWriter getJPEGImageWriter() throws IOException {
>         Iterator<ImageWriter> writers = ImageIO.getImageWritersBySuffix("jpeg");
>         while (writers.hasNext()) {
>             ImageWriter writer = writers.next();
>             if (writer == null) {
>                 continue;
>             }
>             // PDFBOX-3566: avoid CLibJPEGImageWriter, which is not a JPEGImageWriteParam
>             if (writer.getDefaultWriteParam() instanceof JPEGImageWriteParam) {
>                 return writer;
>             }
>             writer.dispose();
>         }
>         throw new IOException("No ImageWriter found for JPEG format");
>     }
>
>     public static byte[] encodeImageToJPEGStream(BufferedImage image, float quality, int dpi,
>                                                  boolean optimizeHuffman) throws IOException {
>         ImageWriter imageWriter = getJPEGImageWriter(); // find JAI writer
>         ByteArrayOutputStream baos = new ByteArrayOutputStream();
>         try (ImageOutputStream ios = ImageIO.createImageOutputStream(baos)) {
>             imageWriter.setOutput(ios);
>             // add compression
>             JPEGImageWriteParam jpegParam = (JPEGImageWriteParam) imageWriter.getDefaultWriteParam();
>             jpegParam.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
>             jpegParam.setCompressionQuality(quality);
>             jpegParam.setOptimizeHuffmanTables(optimizeHuffman);
>             // add metadata
>             ImageTypeSpecifier imageTypeSpecifier = new ImageTypeSpecifier(image);
>             IIOMetadata data = imageWriter.getDefaultImageMetadata(imageTypeSpecifier, jpegParam);
>             Element tree = (Element) data.getAsTree("javax_imageio_jpeg_image_1.0");
>             Element jfif = (Element) tree.getElementsByTagName("app0JFIF").item(0);
>             String dpiString = Integer.toString(dpi);
>             jfif.setAttribute("Xdensity", dpiString);
>             jfif.setAttribute("Ydensity", dpiString);
>             jfif.setAttribute("resUnits", "1"); // 1 = dots/inch
>             // write
>             imageWriter.write(data, new IIOImage(image, null, null), jpegParam);
>             return baos.toByteArray();
>         } finally {
>             imageWriter.dispose();
>         }
>     }
>
>     public static long benchmark(BufferedImage img, boolean optimizeHuffman) throws IOException {
>         final float quality = 0.75f;
>         final int dpi = 72;
>         Instant i1 = Instant.now();
>         int length = encodeImageToJPEGStream(img, quality, dpi, optimizeHuffman).length;
>         Instant i2 = Instant.now();
>         long executionTime = Duration.between(i1, i2).toMillis();
>         System.out.printf("optimize Huffman = %b: %d bytes, execution time %d ms%n",
>                 optimizeHuffman, length, executionTime);
>         return executionTime;
>     }
>
>     public static void main(String[] args) throws IOException {
>         final int runs = 100;
>         long totalOptimizedExecutionTime = 0L;
>         long totalUnoptimizedExecutionTime = 0L;
>         BufferedImage img = ImageIO.read(new File("zoo-711050_1920.jpg"));
>         for (int i = 0; i < runs; i++) {
>             totalOptimizedExecutionTime += benchmark(img, true);
>             totalUnoptimizedExecutionTime += benchmark(img, false);
>         }
>
>         float avgOptimizedExecutionTime = (float) totalOptimizedExecutionTime / runs;
>         float avgUnoptimizedExecutionTime = (float) totalUnoptimizedExecutionTime / runs;
>         System.out.printf("Average optimized execution time: %f ms%n", avgOptimizedExecutionTime);
>         System.out.printf("Average unoptimized execution time: %f ms%n", avgUnoptimizedExecutionTime);
>     }
> }
> {code}
> {code:sh}
> ...
> optimize Huffman = true: 580768 bytes, execution time 192 ms
> optimize Huffman = false: 589050 bytes, execution time 167 ms
> Average optimized execution time: 192.729996 ms
> Average unoptimized execution time: 167.929993 ms
> {code}
> I used an image picked at random from https://pixabay.com/ (attached as
> zoo-711050_1920.jpg). In this run, enabling setOptimizeHuffmanTables reduced
> the output from 589050 to 580768 bytes (about 1.4% smaller) and raised the
> average encoding time from about 167.9 ms to 192.7 ms (about 15% slower).
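To put the reported numbers in relative terms, here is a small sketch that turns the byte counts and average times from the quoted benchmark output into percentages. The class name HuffmanTradeoff and its two helper methods are hypothetical names chosen for this illustration; they are not part of PDFBox or the JDK.

```java
public class HuffmanTradeoff {
    /** Percentage by which the optimized size is smaller than the baseline size. */
    public static double percentSmaller(long baselineBytes, long optimizedBytes) {
        return 100.0 * (baselineBytes - optimizedBytes) / baselineBytes;
    }

    /** Percentage by which the optimized run is slower than the baseline run. */
    public static double percentSlower(double baselineMs, double optimizedMs) {
        return 100.0 * (optimizedMs - baselineMs) / baselineMs;
    }

    public static void main(String[] args) {
        // Values copied from the benchmark output quoted in the issue.
        long unoptimizedBytes = 589050L;
        long optimizedBytes = 580768L;
        double unoptimizedMs = 167.929993;
        double optimizedMs = 192.729996;

        System.out.printf("size: %.2f%% smaller%n",
                percentSmaller(unoptimizedBytes, optimizedBytes));
        System.out.printf("time: %.2f%% slower%n",
                percentSlower(unoptimizedMs, optimizedMs));
    }
}
```

These ratios describe a single image on a single machine, so they are probably best read as an indication of the trade-off rather than a general figure.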