Jason created PDFBOX-4514:
-----------------------------

             Summary: inefficient use of synchronized in PDICCBased.java
                 Key: PDFBOX-4514
                 URL: https://issues.apache.org/jira/browse/PDFBOX-4514
             Project: PDFBox
          Issue Type: Bug
            Reporter: Jason


PDICCBased.java uses synchronized with static variable, e.g. synchronized (LOG) 
. It doesn't look to me it really needs to do it this way. This is very 
inefficient when multiple threads process different PDF at the same time. 
Change it to synchronized (this) will improve the performance.

[https://github.com/apache/pdfbox/blob/3b16f3b4f42c61dd5fe990c586f60465f83a8ef8/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/color/PDICCBased.java#L191]

Sample code simulates multiple threads process different PDF at the same time:

 
{code:java}
public static void main(String[] args) throws IOException {
  for (int i = 0; i < 10; i++) { // just run multiple time
    doWork();
  }
}

private static void doWork() throws IOException {
  long startTime = System.currentTimeMillis();
  String pdfFilename = "<absolute path to your pdf file>"; // replace this with 
your test file
  System.setProperty("sun.java2d.cmm", 
"sun.java2d.cmm.kcms.KcmsServiceProvider");

  PDDocument document = PDDocument.load(new File(pdfFilename));
  List<PDDocument> pdfPages = new Splitter().split(document);

  Map<Integer, PDDocument> pdfPagesWithIndex = new HashMap<>();

  for (int i = 0; i < pdfPages.size(); i++) {
    pdfPagesWithIndex.put(i, pdfPages.get(i));
  }

  // multiple threads running in parallel
  pdfPagesWithIndex.entrySet().parallelStream().forEach(entry -> {
    try {
      processPDF(entry.getKey(), entry.getValue());
    } catch (Exception e) {
      System.out.println(e);
    }
  });

  System.out.println("Convertion time: " + (System.currentTimeMillis() - 
startTime));

  try {
    document.close();
  } catch (IOException ignored) {
  }
}

private static void processPDF(int index, PDDocument pdfPage) throws 
IOException {
  PDFRenderer renderer = new PDFRenderer(pdfPage);
  try {
    renderer.renderImageWithDPI(0, 180, ImageType.RGB);
  } catch (IOException e) {
    System.out.println(e);
  }

  try {
    pdfPage.close();
  } catch (IOException ignored) {
  }
}
{code}
I observed by changing synchronized (LOG) to synchronized (this), the above 
code can have maybe 20-30% reduction in latency. If I do a thread dump, I can 
see many threads are blocked on synchronized (LOG). 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to