[
https://issues.apache.org/jira/browse/PDFBOX-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18078282#comment-18078282
]
Maruan Sahyoun commented on PDFBOX-6199:
----------------------------------------
TLDR - accept a 38% performance regression for correctness, readability, and
maintainability in {{{}GenericRefinmentRegionDecodingProdure{}}}.
*Background*
The {{GenericRefinementRegionDecodingProcedure}} class implements the JBIG2
Generic Refinement Region decoding procedure (ITU-T T.88 §6.3.5.6). Two bugs
were identified in the template 1 decoding paths that caused corrupted output
for JBIG2 files using template 1 with typical prediction (TPGRON=1).
*Bugs Fixed*
* Wrong SLTP context constant for template 1
{{SLTP_CONTEXT_TEMPLATE1}} was {{0x080}} instead of the correct {{0x008}}
(§6.3.5.6, Figure 15). This caused the typical prediction line flag to be
decoded using the wrong arithmetic coding context.
* Broken TPGR path for template 1
{{decodeTypicalPredictedLineTemplate1}} had a {{grReferenceValue}} drift bug —
the variable was not being re-anchored to the freshly loaded byte window
variables at each byte boundary. This caused incorrect typical prediction
decisions for all pixels beyond the first byte of each row.
*Fix*
Both template 1 decoding paths were replaced with a readable pixel-by-pixel
implementation directly verifiable against Figure 13 of the spec:
* {{decodeLineExplicitT1}} — replaces the explicit decode path (LTP=0)
* {{decodeLineTPGRT1}} — replaces the TPGR path (LTP=1)
* {{buildContextT1}} — forms the 10-bit arithmetic coding context by directly
reading the 10 pixel positions defined in Figure 13
The pixel accessors {{{}getRegionBit{}}}, {{{}getReferenceBit{}}}, and
{{getPixelSafe}} provide a clean separation between coordinate transformation,
bounds handling, and bit extraction.
*Performance Impact*
Benchmarked using JMH against a captured real-world refinement region
(1728×2339, template 1, TPGRON=false) from the Serenity JBIG2 test suite:
|Version|Score|vs original|
|Original (buggy)|3.32 ms/op|baseline|
|Fixed pixel-by-pixel|4.57 ms/op|+38%|
The 38% regression is the cost of correctness and maintainability. The original
code's performance came from an optimized sliding byte window that amortized
bitmap access across 8 pixels per byte fetch. The pixel-by-pixel replacement
pays full access cost per pixel.
*Performance Recovery Investigation*
A {{SlidingBitmapWindow}} class was designed, implemented, and benchmarked as a
potential performance recovery mechanism. Key findings:
* With bounds-checked {{{}getBit(){}}}: 73% slower than {{getPixelSafe}} for
pure pixel access
* With bounds-check-free {{getBitFast()}} for interior pixels: 15% faster than
{{getPixelSafe}} for interior pixels
* However, when integrated into the decoder, no measurable improvement was
observed regardless of integration approach — the arithmetic decoder
{{arithDecoder.decode(cx)}} dominates total cost and pixel access improvements
are not reflected in end-to-end performance
The only way to fully recover the 38% regression would be to fix the original
sliding window code while preserving its byte-amortization optimization.
*Decision Point*
the fixed implementation:
* pro Fixes two spec-violating bugs in template 1 decoding
* pro Improves readability and maintainability significantly
* pro: Directly implements the JBIG2 spec (Figure 13)
* con: 38% performance regression for the test case
{*}Recommendation{*}: Accept the regression and commit the pixel-by-pixel
version. The correctness and maintainability gains outweigh the performance
cost for most use cases. If performance becomes critical in the future, we can
revisit the sliding window optimization with the knowledge gained from this
investigation.
I have not yet committed the code but wanted to provide a summary. If we can
accept the performance degradation I'd do some cleanup and commit with the
pixel-by-pixel access. This would give us a much better readable and
maintainable version. If not I'll use the findings and try to isolate the issue
in the current inlined sliding window implementation. I'm sure I'm able to find
it but the exercise showed the importance of readable code.
WDYT?
> Template 1 handling with TPGRON not supported
> ---------------------------------------------
>
> Key: PDFBOX-6199
> URL: https://issues.apache.org/jira/browse/PDFBOX-6199
> Project: PDFBox
> Issue Type: Sub-task
> Components: JBIG2
> Affects Versions: 3.0.4 JBIG2
> Reporter: Maruan Sahyoun
> Priority: Minor
> Attachments: image-2026-04-28-12-27-28-955.png,
> image-2026-04-28-12-28-19-575.png, image-2026-04-28-12-29-21-038.png,
> result-1.png, screenshot-1.png, screenshot-2.png, screenshot-3.png,
> screenshot-4.png
>
>
> {{bitmap-refine-template1-tpgron.jbig2}} tests the folling features of JBIG2:
> - Refinement region
> - Template 1
> - TPGRON
> - Reference bitmap reuse
> We currently produce garbled output when decoding
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]