DO NOT REPLY [Bug 51664] [PATCH] Tagged PDF performance improvement + tests

2012-03-28 Thread bugzilla
https://issues.apache.org/bugzilla/show_bug.cgi?id=51664

Peter Hancock peter.hanc...@gmail.com changed:

   What|Removed |Added

 Status|RESOLVED|CLOSED

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.


DO NOT REPLY [Bug 51664] [PATCH] Tagged PDF performance improvement + tests

2011-10-05 Thread bugzilla
https://issues.apache.org/bugzilla/show_bug.cgi?id=51664

--- Comment #4 from Chris Bowditch bowditch_ch...@hotmail.com 2011-10-05 
09:55:37 UTC ---
Thanks for the patch Mehdi. Looking at [1] I realised that you haven't yet
filed an ICLA. The ASF encourages all contributors to do so and it is a
pre-requisite to becoming a committer. Since you have filed  a number of
patches its about time you filed in the ICLA as described in [2]. Thanks

[1] http://people.apache.org/committer-index.html
[2] http://www.apache.org/licenses/

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.


DO NOT REPLY [Bug 51664] [PATCH] Tagged PDF performance improvement + tests

2011-10-05 Thread bugzilla
https://issues.apache.org/bugzilla/show_bug.cgi?id=51664

--- Comment #5 from Mehdi Houshmand med1...@gmail.com 2011-10-05 10:38:18 UTC 
---
(In reply to comment #4)
 Thanks for the patch Mehdi. Looking at [1] I realised that you haven't yet
 filed an ICLA. The ASF encourages all contributors to do so and it is a
 pre-requisite to becoming a committer. Since you have filed  a number of
 patches its about time you filed in the ICLA as described in [2]. Thanks
 
 [1] http://people.apache.org/committer-index.html
 [2] http://www.apache.org/licenses/

I have faxed the appropriate documents.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.


DO NOT REPLY [Bug 51664] [PATCH] Tagged PDF performance improvement + tests

2011-08-16 Thread bugzilla
https://issues.apache.org/bugzilla/show_bug.cgi?id=51664

--- Comment #1 from Mehdi Houshmand med1...@gmail.com 2011-08-16 13:45:56 UTC 
---
Created attachment 27391
  -- https://issues.apache.org/bugzilla/attachment.cgi?id=27391
patch

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.


DO NOT REPLY [Bug 51664] [PATCH] Tagged PDF performance improvement + tests

2011-08-16 Thread bugzilla
https://issues.apache.org/bugzilla/show_bug.cgi?id=51664

Mehdi Houshmand med1...@gmail.com changed:

   What|Removed |Added

  Attachment #27391|0   |1
is obsolete||

--- Comment #2 from Mehdi Houshmand med1...@gmail.com 2011-08-16 14:11:13 UTC 
---
Created attachment 27392
  -- https://issues.apache.org/bugzilla/attachment.cgi?id=27392
More tests

This patch has been separated since it is purely unit tests.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.


DO NOT REPLY [Bug 51664] [PATCH] Tagged PDF performance improvement + tests

2011-08-16 Thread bugzilla
https://issues.apache.org/bugzilla/show_bug.cgi?id=51664

--- Comment #3 from Jeremias Maerki jerem...@apache.org 2011-08-16 14:39:23 
UTC ---
Just some background on the problem:

It was found that enabling accessibility (tagged PDF) decreases PDF production
performance considerably.

I've profiled FOP with an FO file (about 10 pages). I ran both FO-PDF and
FO-IF-PDF scenarios to isolate the bulk of the lost time. It turns out that
the FO-IF stage doesn't lose a lot of performance due to the additional work.
So I concentrated on IF-PDF.

The VisualVM profiler highlighted PDFDocument.getWriterFor() and
BufferedOutputStream.flush() as hot spots in the accessibility case. Most of
that is caused by PDFDictionary, PDFArray and PDFName. And the strong weight on
these two is actually expected since Tagged PDF structures are all dictionaries
and arrays. Lots of them.

Look at the PDF sizes:
- Normal PDF: 105 KB (65 PDF Objects)
- Tagged PDF: 868 KB (6462 PDF Objects)

That's A LOT of additional content. All dictionaries and arrays that cannot be
compressed (in PDF 1.4). That also means a big increase in I/O output. So it's
in nature of tagged PDF that it must be considerably slower.

What I've tried now is to address the hot spot I found above. I got rid of the
Writers for encoding text output. Instead I switched to a StringBuilder that is
flushed to the OutputStream when necessary. That decreases the average
processing time after warm-up (IF-PDF case) from 775ms to 460ms (normal PDF
from 355ms to 325ms). That is a speed-up of:

(460 - 325) / (775 - 355) = 135 / 420 = 0.32 = -68%
So it cuts the tagged PDF penalty to a third.

That was the IF-PDF case. Here are the measurements for the FO-PDF case (the
same test document:

normal PDF: 772ms -- 712ms
tagged PDF: 1472ms -- 1042ms

normal PDF: 712 / 772 = 0.92 (-8%)
tagged PDF: 1042 / 1472 = 0.71 (-29%)
tagged PDF penalty: (1042 - 712) / (1472 - 772) = 330 / 700 = 0.47 (-53%)

There's a catch: This optimization requires a backwards-incompatible change in
the PDF library. The PDFWritable interface changes from
void outputInline(OutputStream out, Writer writer) throws IOException;
to
void outputInline(OutputStream out, StringBuilder textBuffer) throws
IOException;

The same applies to PDFObject.formatObject(). Both are very central parts of
the PDF library. It could invalidate pending patches or private additions from
third-parties. But it doesn't seem to be easy enough to write adapter code to
work around this.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.