Jason created PDFBOX-4514:
-----------------------------
Summary: inefficient use of synchronized in PDICCBased.java
Key: PDFBOX-4514
URL: https://issues.apache.org/jira/browse/PDFBOX-4514
Project: PDFBox
Issue Type: Bug
Reporter: Jason
PDICCBased.java synchronizes on a static variable, e.g. synchronized (LOG).
It doesn't look to me like it really needs to do it this way. Because the
static monitor is shared by every instance, this is very inefficient when
multiple threads process different PDFs at the same time. Changing it to
synchronized (this) should improve performance.
[https://github.com/apache/pdfbox/blob/3b16f3b4f42c61dd5fe990c586f60465f83a8ef8/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/color/PDICCBased.java#L191]
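To illustrate the contention pattern described above, here is a minimal sketch that is independent of PDFBox. The class LockScopeDemo, the SHARED field (a stand-in for a static field such as LOG) and the 50 ms sleep are all hypothetical; this is not the code in PDICCBased.java. It records how many threads are inside the synchronized block at once when each thread works on its own instance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class LockScopeDemo {
    // Hypothetical stand-in for a static field such as LOG:
    // a single monitor shared by every instance of the class.
    private static final Object SHARED = new Object();

    // Bookkeeping: threads currently inside the block, and the peak seen.
    private static final AtomicInteger inside = new AtomicInteger();
    private static final AtomicInteger peak = new AtomicInteger();

    private final boolean useSharedLock;

    LockScopeDemo(boolean useSharedLock) {
        this.useSharedLock = useSharedLock;
    }

    void work() {
        // Either the class-wide monitor or this instance's own monitor.
        Object lock = useSharedLock ? SHARED : this;
        synchronized (lock) {
            peak.accumulateAndGet(inside.incrementAndGet(), Math::max);
            try {
                Thread.sleep(50); // simulated work on per-instance state
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                inside.decrementAndGet();
            }
        }
    }

    // Runs one thread per instance and returns the peak number of threads
    // that were inside the synchronized block at the same time.
    public static int peakConcurrency(boolean useSharedLock, int threads) {
        inside.set(0);
        peak.set(0);
        List<Thread> pool = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            LockScopeDemo instance = new LockScopeDemo(useSharedLock);
            Thread t = new Thread(instance::work);
            pool.add(t);
            t.start();
        }
        for (Thread t : pool) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("shared lock, peak threads in block: "
                + peakConcurrency(true, 4));
        System.out.println("per-instance lock, peak threads in block: "
                + peakConcurrency(false, 4));
    }
}
```

With the shared monitor the peak is 1 by construction, so the four instances run one after another. With per-instance monitors the blocks can overlap. Whether synchronized (this) is actually safe in PDICCBased depends on what the block protects, which this sketch does not address.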
Sample code that simulates multiple threads processing different PDFs at the
same time:
{code:java}
// Imports required by the methods below (PDFBox 2.x API).
// Place the methods inside a class of your choice.
import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

public static void main(String[] args) throws IOException {
    for (int i = 0; i < 10; i++) { // just run multiple times
        doWork();
    }
}

private static void doWork() throws IOException {
    long startTime = System.currentTimeMillis();
    // replace this with your test file
    String pdfFilename = "<absolute path to your pdf file>";
    System.setProperty("sun.java2d.cmm",
            "sun.java2d.cmm.kcms.KcmsServiceProvider");
    PDDocument document = PDDocument.load(new File(pdfFilename));
    // split into single-page documents so each thread gets its own PDDocument
    List<PDDocument> pdfPages = new Splitter().split(document);
    Map<Integer, PDDocument> pdfPagesWithIndex = new HashMap<>();
    for (int i = 0; i < pdfPages.size(); i++) {
        pdfPagesWithIndex.put(i, pdfPages.get(i));
    }
    // multiple threads running in parallel
    pdfPagesWithIndex.entrySet().parallelStream().forEach(entry -> {
        try {
            processPDF(entry.getKey(), entry.getValue());
        } catch (Exception e) {
            System.out.println(e);
        }
    });
    System.out.println("Conversion time (ms): "
            + (System.currentTimeMillis() - startTime));
    try {
        document.close();
    } catch (IOException ignored) {
    }
}

private static void processPDF(int index, PDDocument pdfPage)
        throws IOException {
    PDFRenderer renderer = new PDFRenderer(pdfPage);
    try {
        // render the single page of this document at 180 DPI
        renderer.renderImageWithDPI(0, 180, ImageType.RGB);
    } catch (IOException e) {
        System.out.println(e);
    }
    try {
        pdfPage.close();
    } catch (IOException ignored) {
    }
}
{code}
I observed that changing synchronized (LOG) to synchronized (this) gives the
above code roughly a 20-30% reduction in latency. In a thread dump, I can see
many threads blocked on synchronized (LOG).
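For anyone who wants to check for blocked threads from inside the JVM rather than with an external thread dump, here is a rough sketch using the standard java.lang.management API. The class name BlockedThreadReport and the demo in main are hypothetical and only illustrate the idea; jstack or a similar tool shows the same information.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

class BlockedThreadReport {
    // Returns one line per thread that is currently BLOCKED on a monitor,
    // naming the monitor it waits for and the thread that holds it.
    public static List<String> blockedThreads() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
            if (info != null
                    && info.getThreadState() == Thread.State.BLOCKED) {
                report.add(info.getThreadName()
                        + " waits for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return report;
    }

    public static void main(String[] args) throws InterruptedException {
        Object lock = new Object();
        // This thread blocks because main holds the monitor when it starts.
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                // nothing to do, only demonstrates blocking
            }
        }, "waiter");
        synchronized (lock) {
            waiter.start();
            while (waiter.getState() != Thread.State.BLOCKED) {
                Thread.sleep(1);
            }
            blockedThreads().forEach(System.out::println);
        }
        waiter.join();
    }
}
```

Calling blockedThreads() while the parallel rendering is in progress would list any threads waiting on the same monitor, if contention is present.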