[
https://issues.apache.org/jira/browse/PDFBOX-3432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Doswald updated PDFBOX-3432:
------------------------------------
Attachment: pdfbox-performance-PDFBOX-3432.zip
PDFBOX-3432_Optimize_CID_to_GlyphId_mapping_rev1.patch
This is my proposed implementation of the IntIntMap class. The patch also
replaces the Map<Integer,Integer> instance variable from CmapSubtable.
The attached JMH benchmark simply parses the DejaVuSans.ttf font with the
TTFParser. With the simple changes to the CmapSubtable done so far, I've got
the following performance numbers:
Desktop
OLD: PdfBoxBenchmark.leadTTFFont avgt 6.326 ± 0.119 ms/op
NEW: PdfBoxBenchmark.leadTTFFont avgt 5.849 ± 0.156 ms/op
Embedded (i.MX6DL)
OLD: PdfBoxBenchmark.leadTTFFont avgt 65.112 ± 1.368 ms/op
NEW: PdfBoxBenchmark.leadTTFFont avgt 54.661 ± 2.402 ms/op
Since the code no longer uses autoboxing/unboxing, the allocation rate also
dropped (measurements from my desktop):
OLD:
PdfBoxBenchmark.leadTTFFont:·gc.alloc.rate avgt 771.634 ± 18.420 MB/sec
PdfBoxBenchmark.leadTTFFont:·gc.alloc.rate.norm avgt 5109556.121 ± 1020.975 B/op
NEW:
PdfBoxBenchmark.leadTTFFont:·gc.alloc.rate avgt 506.081 ± 17.222 MB/sec
PdfBoxBenchmark.leadTTFFont:·gc.alloc.rate.norm avgt 3117169.547 ± 7449.283 B/op
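For readers who do not have the patch at hand, the general idea behind
replacing Map<Integer,Integer> can be sketched as below. This is not the code
from the attached patch: the open-addressing layout, the method names and the
restriction to non-negative keys (so that -1 can mark an empty slot) are
assumptions made only for illustration.

```java
import java.util.Arrays;

/** Hypothetical sketch of an int-to-int hash map (open addressing, linear probing). */
final class IntIntMap {
    private static final int EMPTY = -1; // assumption: keys are never negative

    private int[] keys = new int[16];    // capacity is always a power of two
    private int[] values = new int[16];
    private int size;

    IntIntMap() {
        Arrays.fill(keys, EMPTY);
    }

    /** Stores the mapping without creating any Integer objects. */
    void put(int key, int value) {
        if (key < 0) {
            throw new IllegalArgumentException("negative key: " + key);
        }
        if ((size + 1) * 2 > keys.length) {
            grow(); // keep the table at most half full
        }
        int i = slot(key);
        if (keys[i] == EMPTY) {
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    /** Returns the mapped value, or defaultValue if the key is absent. */
    int get(int key, int defaultValue) {
        if (key < 0) {
            return defaultValue;
        }
        int i = slot(key);
        return keys[i] == key ? values[i] : defaultValue;
    }

    int size() {
        return size;
    }

    /** Finds the slot holding the key, or the empty slot where it belongs. */
    private int slot(int key) {
        int mask = keys.length - 1;
        int h = key * 0x9E3779B9; // cheap multiplicative scrambling
        int i = (h ^ (h >>> 16)) & mask;
        while (keys[i] != EMPTY && keys[i] != key) {
            i = (i + 1) & mask;
        }
        return i;
    }

    /** Doubles the capacity and re-inserts all existing entries. */
    private void grow() {
        int[] oldKeys = keys;
        int[] oldValues = values;
        keys = new int[oldKeys.length * 2];
        values = new int[oldKeys.length * 2];
        Arrays.fill(keys, EMPTY);
        size = 0;
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] != EMPTY) {
                put(oldKeys[j], oldValues[j]);
            }
        }
    }
}
```

A lookup such as map.get(characterCode, 0) then works on two int arrays only,
so neither the key nor the value is boxed and no per-entry node objects are
allocated.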
This patch does not exhaust the potential for optimizations of this kind. From
just skimming the code, some further areas that I could investigate:
* CmapSubtable.getCharacterCode also returns a boxed Integer. This seems to be
used in PDCIDFontType2Embedder only. Could it return a primitive int instead?
* PDCIDFontType2Embedder.buildSubset also uses Map<Integer,Integer>
* There are a lot of map objects that map an Integer to an object. Implementing
a special mapping class for int to Object mappings (analogous to IntIntMap) may
help here too
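To make the last point more concrete, an int to Object variant could look
roughly like the following. The class name IntObjectMap and everything inside
it are hypothetical; nothing like this is part of the attached patch. As an
assumption, keys must be non-negative so that -1 can mark an empty slot.

```java
import java.util.Arrays;

/** Hypothetical int-to-Object map: primitive keys, unboxed lookups. */
final class IntObjectMap<V> {
    private static final int EMPTY = -1; // assumption: keys are never negative

    private int[] keys = new int[16];    // capacity is always a power of two
    private Object[] values = new Object[16];
    private int size;

    IntObjectMap() {
        Arrays.fill(keys, EMPTY);
    }

    void put(int key, V value) {
        if (key < 0) {
            throw new IllegalArgumentException("negative key: " + key);
        }
        if ((size + 1) * 2 > keys.length) {
            grow(); // keep the table at most half full
        }
        int i = slot(keys, key);
        if (keys[i] == EMPTY) {
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    /** Returns the mapped object, or null if the key is absent. */
    @SuppressWarnings("unchecked")
    V get(int key) {
        if (key < 0) {
            return null;
        }
        int i = slot(keys, key);
        return keys[i] == key ? (V) values[i] : null;
    }

    int size() {
        return size;
    }

    /** Linear probing: the slot holding the key, or the first empty slot. */
    private static int slot(int[] table, int key) {
        int mask = table.length - 1;
        int h = key * 0x9E3779B9;
        int i = (h ^ (h >>> 16)) & mask;
        while (table[i] != EMPTY && table[i] != key) {
            i = (i + 1) & mask;
        }
        return i;
    }

    /** Doubles the capacity and moves all existing entries over. */
    private void grow() {
        int[] oldKeys = keys;
        Object[] oldValues = values;
        keys = new int[oldKeys.length * 2];
        values = new Object[oldKeys.length * 2];
        Arrays.fill(keys, EMPTY);
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] != EMPTY) {
                int i = slot(keys, oldKeys[j]);
                keys[i] = oldKeys[j];
                values[i] = oldValues[j];
            }
        }
    }
}
```

Only the key side is unboxed here; the values stay ordinary object
references, so the saving per entry is smaller than with IntIntMap.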
I'd be happy to hear your opinion on this patch and whether I should
investigate further.
Also: is there a set of different fonts available to properly test all the
processSubtypeX methods in CmapSubtable? I currently work with DejaVu, and the
test code in fontbox works with LiberationSans, so I'm not sure whether this
covers all the cases.
> Optimize CID to GlyphId mapping (TTF)
> -------------------------------------
>
> Key: PDFBOX-3432
> URL: https://issues.apache.org/jira/browse/PDFBOX-3432
> Project: PDFBox
> Issue Type: Improvement
> Components: FontBox
> Affects Versions: 2.0.2
> Environment: Ubuntu 14.04.4 LTS
> Reporter: Michael Doswald
> Priority: Trivial
> Labels: optimization, performance
> Attachments: PDFBOX-3432_Optimize_CID_to_GlyphId_mapping_rev1.patch,
> pdfbox-performance-PDFBOX-3432.zip
>
>
> TTF fonts map code-points (Code IDs) to glyphs. These are mappings from int
> to int. Because the JDK lacks map classes for primitive types, the code (e.g.
> in CmapSubtable) currently uses Map<Integer,Integer> for those mappings. This
> is inefficient in different ways:
> * Autoboxing/unboxing introduces a performance penalty
> * Boxing to Integer objects has a memory overhead
> * The JDK Map implementation has a big memory overhead for such simple objects
> For efficiency (execution time and memory consumption) I would propose to
> introduce a simple IntIntMap implementation which works with primitive
> integers.