Zer Jun Eng created PDFBOX-6030:
-----------------------------------
Summary: JPEGFactory: createImage and setOptimizeHuffmanTables
Key: PDFBOX-6030
URL: https://issues.apache.org/jira/browse/PDFBOX-6030
Project: PDFBox
Issue Type: Wish
Reporter: Zer Jun Eng
Attachments: zoo-711050_1920.jpg
Dear PDFBox developers,
I'm writing to request an enhancement to the JPEGFactory class, specifically
concerning the createFromImage(PDDocument document, BufferedImage image, float
quality, int dpi) method.
Currently, when using this method, there isn't a direct way to enable the
setOptimizeHuffmanTables option of JPEGImageWriteParam. This optimization can
be quite beneficial for reducing file size.
To work around this, my team currently has to copy the JPEGFactory source code
into our project and modify the private encodeImageToJPEGStream method. This
approach isn't ideal as it makes maintenance more difficult and prevents us
from easily updating to new PDFBox versions.
Would you consider exposing this setOptimizeHuffmanTables option, perhaps as an
additional parameter to the createFromImage method or through a separate setter
on JPEGFactory? This would allow users to leverage this optimization without
resorting to workarounds.
Thank you for considering this request.
—
Replying to the email thread:
https://lists.apache.org/thread/pgo0m1r8vgxd12zl3499fv38s163mpm3
I wrote a minimal benchmark that compares the output file size and execution
time with and without setOptimizeHuffmanTables:
{code:java}
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.time.Duration;
import java.time.Instant;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageTypeSpecifier;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.plugins.jpeg.JPEGImageWriteParam;
import javax.imageio.stream.ImageOutputStream;
import org.w3c.dom.Element;
class Huffman {
  private static ImageWriter getJPEGImageWriter() throws IOException {
    Iterator<ImageWriter> writers = ImageIO.getImageWritersBySuffix("jpeg");
    while (writers.hasNext()) {
      ImageWriter writer = writers.next();
      if (writer == null) {
        continue;
      }
      // PDFBOX-3566: avoid CLibJPEGImageWriter, whose default write param is
      // not a JPEGImageWriteParam
      if (writer.getDefaultWriteParam() instanceof JPEGImageWriteParam) {
        return writer;
      }
      writer.dispose();
    }
    throw new IOException("No ImageWriter found for JPEG format");
  }

  public static byte[] encodeImageToJPEGStream(BufferedImage image,
      float quality, int dpi, boolean optimizeHuffman) throws IOException {
    ImageWriter imageWriter = getJPEGImageWriter(); // find JAI writer
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (ImageOutputStream ios = ImageIO.createImageOutputStream(baos)) {
      imageWriter.setOutput(ios);

      // add compression
      JPEGImageWriteParam jpegParam =
          (JPEGImageWriteParam) imageWriter.getDefaultWriteParam();
      jpegParam.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
      jpegParam.setCompressionQuality(quality);
      jpegParam.setOptimizeHuffmanTables(optimizeHuffman);

      // add metadata
      ImageTypeSpecifier imageTypeSpecifier = new ImageTypeSpecifier(image);
      IIOMetadata data =
          imageWriter.getDefaultImageMetadata(imageTypeSpecifier, jpegParam);
      Element tree = (Element) data.getAsTree("javax_imageio_jpeg_image_1.0");
      Element jfif = (Element) tree.getElementsByTagName("app0JFIF").item(0);
      String dpiString = Integer.toString(dpi);
      jfif.setAttribute("Xdensity", dpiString);
      jfif.setAttribute("Ydensity", dpiString);
      jfif.setAttribute("resUnits", "1"); // 1 = dots/inch

      // write
      imageWriter.write(data, new IIOImage(image, null, null), jpegParam);
      return baos.toByteArray();
    } finally {
      imageWriter.dispose();
    }
  }

  // Encodes the image once and returns the elapsed wall-clock time in ms.
  public static long benchmark(BufferedImage img, boolean optimizeHuffman)
      throws IOException {
    final float quality = 0.75f;
    final int dpi = 72;
    Instant i1 = Instant.now();
    int length =
        encodeImageToJPEGStream(img, quality, dpi, optimizeHuffman).length;
    Instant i2 = Instant.now();
    long executionTime = Duration.between(i1, i2).toMillis();
    System.out.printf(
        "optimize Huffman = %b: %d bytes, execution time %d ms%n",
        optimizeHuffman, length, executionTime);
    return executionTime;
  }

  public static void main(String[] args) throws IOException {
    final int runs = 100;
    long totalOptimizedExecutionTime = 0L;
    long totalUnoptimizedExecutionTime = 0L;
    BufferedImage img = ImageIO.read(new File("zoo-711050_1920.jpg"));
    for (int i = 0; i < runs; i++) {
      totalOptimizedExecutionTime += benchmark(img, true);
      totalUnoptimizedExecutionTime += benchmark(img, false);
    }
    float avgOptimizedExecutionTime =
        (float) totalOptimizedExecutionTime / runs;
    float avgUnoptimizedExecutionTime =
        (float) totalUnoptimizedExecutionTime / runs;
    System.out.printf("Average optimized execution time: %f ms%n",
        avgOptimizedExecutionTime);
    System.out.printf("Average unoptimized execution time: %f ms%n",
        avgUnoptimizedExecutionTime);
  }
}
{code}
{code:sh}
...
optimize Huffman = true: 580768 bytes, execution time 192 ms
optimize Huffman = false: 589050 bytes, execution time 167 ms
Average optimized execution time: 192.729996 ms
Average unoptimized execution time: 167.929993 ms
{code}
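I used an image I randomly picked from https://pixabay.com/ (attached to this
issue as zoo-711050_1920.jpg). For this image, enabling
setOptimizeHuffmanTables produces a file that is about 1.4% smaller (580768 vs.
589050 bytes) but takes roughly 15% longer to encode (about 193 ms vs. 168 ms
on average over 100 runs).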
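
As a sanity check on this trade-off, the sketch below (separate from the
benchmark above) encodes a synthetic test pattern twice and compares the
decoded pixels. It rests on the assumption that optimizing the Huffman tables
only changes the entropy coding, so both files should decode to the same image.
The class name HuffmanLosslessCheck and the helpers testPattern, encode and
samePixels are hypothetical names made up for this illustration; only the
javax.imageio calls are real API.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.plugins.jpeg.JPEGImageWriteParam;
import javax.imageio.stream.ImageOutputStream;

class HuffmanLosslessCheck {
  // Hypothetical helper: deterministic RGB pattern, so no input file is needed.
  static BufferedImage testPattern(int width, int height) {
    BufferedImage img =
        new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
    for (int y = 0; y < height; y++) {
      for (int x = 0; x < width; x++) {
        int r = (x * 255) / Math.max(1, width - 1);
        int g = (y * 255) / Math.max(1, height - 1);
        int b = ((x ^ y) * 7) & 0xFF;
        img.setRGB(x, y, (r << 16) | (g << 8) | b);
      }
    }
    return img;
  }

  // Encodes the image as JPEG, with or without optimized Huffman tables.
  static byte[] encode(BufferedImage image, float quality,
      boolean optimizeHuffman) throws IOException {
    ImageWriter writer = null;
    Iterator<ImageWriter> candidates = ImageIO.getImageWritersBySuffix("jpeg");
    while (candidates.hasNext() && writer == null) {
      ImageWriter candidate = candidates.next();
      if (candidate.getDefaultWriteParam() instanceof JPEGImageWriteParam) {
        writer = candidate;
      } else {
        candidate.dispose();
      }
    }
    if (writer == null) {
      throw new IOException("No ImageWriter found for JPEG format");
    }
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (ImageOutputStream ios = ImageIO.createImageOutputStream(baos)) {
      writer.setOutput(ios);
      JPEGImageWriteParam param =
          (JPEGImageWriteParam) writer.getDefaultWriteParam();
      param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
      param.setCompressionQuality(quality);
      param.setOptimizeHuffmanTables(optimizeHuffman);
      writer.write(null, new IIOImage(image, null, null), param);
    } finally {
      writer.dispose();
    }
    // Read the bytes only after the ImageOutputStream has been closed.
    return baos.toByteArray();
  }

  // Decodes both JPEG byte arrays and compares them pixel by pixel.
  static boolean samePixels(byte[] jpegA, byte[] jpegB) throws IOException {
    BufferedImage a = ImageIO.read(new ByteArrayInputStream(jpegA));
    BufferedImage b = ImageIO.read(new ByteArrayInputStream(jpegB));
    if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
      return false;
    }
    for (int y = 0; y < a.getHeight(); y++) {
      for (int x = 0; x < a.getWidth(); x++) {
        if (a.getRGB(x, y) != b.getRGB(x, y)) {
          return false;
        }
      }
    }
    return true;
  }

  public static void main(String[] args) throws IOException {
    BufferedImage img = testPattern(256, 192);
    byte[] plain = encode(img, 0.75f, false);
    byte[] optimized = encode(img, 0.75f, true);
    System.out.printf(
        "default tables: %d bytes, optimized tables: %d bytes, same pixels: %b%n",
        plain.length, optimized.length, samePixels(plain, optimized));
  }
}
```

If the assumption holds, the option trades encoding time for file size without
affecting image quality, which is why a simple boolean switch would be enough
for our use case.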