[ 
https://issues.apache.org/jira/browse/PDFBOX-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18078282#comment-18078282
 ] 

Maruan Sahyoun commented on PDFBOX-6199:
----------------------------------------

TLDR - accept a 38% performance regression for correctness, readability, and 
maintainability in {{{}GenericRefinmentRegionDecodingProdure{}}}.

*Background*
The {{GenericRefinementRegionDecodingProcedure}} class implements the JBIG2 
Generic Refinement Region decoding procedure (ITU-T T.88 §6.3.5.6). Two bugs 
were identified in the template 1 decoding paths that caused corrupted output 
for JBIG2 files using template 1 with typical prediction (TPGRON=1).

*Bugs Fixed*
 * Wrong SLTP context constant for template 1
{{SLTP_CONTEXT_TEMPLATE1}} was {{0x080}} instead of the correct {{0x008}} 
(§6.3.5.6, Figure 15). This caused the typical prediction line flag to be 
decoded using the wrong arithmetic coding context.
 * Broken TPGR path for template 1
{{decodeTypicalPredictedLineTemplate1}} had a {{grReferenceValue}} drift bug — 
the variable was not being re-anchored to the freshly loaded byte window 
variables at each byte boundary. This caused incorrect typical prediction 
decisions for all pixels beyond the first byte of each row.

*Fix*
Both template 1 decoding paths were replaced with a readable pixel-by-pixel 
implementation directly verifiable against Figure 13 of the spec:
 * {{decodeLineExplicitT1}} — replaces the explicit decode path (LTP=0)
 * {{decodeLineTPGRT1}} — replaces the TPGR path (LTP=1)
 * {{buildContextT1}} — forms the 10-bit arithmetic coding context by directly 
reading the 10 pixel positions defined in Figure 13

The pixel accessors {{{}getRegionBit{}}}, {{{}getReferenceBit{}}}, and 
{{getPixelSafe}} provide a clean separation between coordinate transformation, 
bounds handling, and bit extraction.

*Performance Impact*
Benchmarked using JMH against a captured real-world refinement region 
(1728×2339, template 1, TPGRON=false) from the Serenity JBIG2 test suite:
|Version|Score|vs original|
|Original (buggy)|3.32 ms/op|baseline|
|Fixed pixel-by-pixel|4.57 ms/op|+38%|

The 38% regression is the cost of correctness and maintainability. The original 
code's performance came from an optimized sliding byte window that amortized 
bitmap access across 8 pixels per byte fetch. The pixel-by-pixel replacement 
pays full access cost per pixel.

*Performance Recovery Investigation*
A {{SlidingBitmapWindow}} class was designed, implemented, and benchmarked as a 
potential performance recovery mechanism. Key findings:
 * With bounds-checked {{{}getBit(){}}}: 73% slower than {{getPixelSafe}} for 
pure pixel access
 * With bounds-check-free {{getBitFast()}} for interior pixels: 15% faster than 
{{getPixelSafe}} for interior pixels
 * However, when integrated into the decoder, no measurable improvement was 
observed regardless of integration approach — the arithmetic decoder 
{{arithDecoder.decode(cx)}} dominates total cost and pixel access improvements 
are not reflected in end-to-end performance

The only way to fully recover the 38% regression would be to fix the original 
sliding window code while preserving its byte-amortization optimization.

*Decision Point*

the fixed implementation:
 * pro Fixes two spec-violating bugs in template 1 decoding
 * pro Improves readability and maintainability significantly
 * pro: Directly implements the JBIG2 spec (Figure 13)
 * con: 38% performance regression for the test case

{*}Recommendation{*}: Accept the regression and commit the pixel-by-pixel 
version. The correctness and maintainability gains outweigh the performance 
cost for most use cases. If performance becomes critical in the future, we can 
revisit the sliding window optimization with the knowledge gained from this 
investigation.

I have not yet committed the code but wanted to provide a summary. If we can 
accept the performance degradation I'd do some cleanup and commit with the 
pixel-by-pixel access. This would give us a much better readable and 
maintainable version. If not I'll use the findings and try to isolate the issue 
in the current inlined sliding window implementation. I'm sure I'm able to find 
it but the exercise showed the importance of readable code.

WDYT? 

> Template 1 handling with TPGRON not supported
> ---------------------------------------------
>
>                 Key: PDFBOX-6199
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-6199
>             Project: PDFBox
>          Issue Type: Sub-task
>          Components: JBIG2
>    Affects Versions: 3.0.4 JBIG2
>            Reporter: Maruan Sahyoun
>            Priority: Minor
>         Attachments: image-2026-04-28-12-27-28-955.png, 
> image-2026-04-28-12-28-19-575.png, image-2026-04-28-12-29-21-038.png, 
> result-1.png, screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>
> {{bitmap-refine-template1-tpgron.jbig2}} tests the folling features of JBIG2:
> - Refinement region
> - Template 1
> - TPGRON
> - Reference bitmap reuse
> We currently produce garbled output when decoding



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to