[ 
https://issues.apache.org/jira/browse/PDFBOX-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18075960#comment-18075960
 ] 

Maruan Sahyoun commented on PDFBOX-6162:
----------------------------------------

Summary

Ensure that the lifecycle of {{CX}} and {{ArithmeticDecoder}} aligns with the 
JBIG2 specification by either:
 * Creating new instances, or
 * Copying {{CX}} statistics from a previous segment using {{CX.copy()}}.
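
A minimal sketch of the two lifecycle options above, for illustration only. {{CxSketch}}, its fields, and {{forSegment}} are hypothetical names and do not reflect the actual PDFBox {{CX}} class; the only assumption is that the statistics are held in per-context arrays that can be copied.

```java
import java.util.Arrays;

// Hypothetical stand-in for the decoder's CX statistics; NOT the PDFBox class.
final class CxSketch {
    private final byte[] cx;   // probability-state index per context (assumed layout)
    private final byte[] mps;  // more-probable-symbol bit per context (assumed layout)

    CxSketch(int size) {
        this.cx = new byte[size];
        this.mps = new byte[size];
    }

    private CxSketch(byte[] cx, byte[] mps) {
        this.cx = cx;
        this.mps = mps;
    }

    // Deep copy: the new instance owns its own arrays, so later updates
    // never leak back into the segment the statistics came from.
    CxSketch copy() {
        return new CxSketch(Arrays.copyOf(cx, cx.length), Arrays.copyOf(mps, mps.length));
    }

    int cx(int i) { return cx[i]; }
    void setCx(int i, int value) { cx[i] = (byte) value; }
    int mps(int i) { return mps[i]; }
    void toggleMps(int i) { mps[i] ^= 1; }

    // Option 1: no retained statistics, so start from a fresh instance.
    // Option 2: statistics retained from a previous segment, so copy them.
    static CxSketch forSegment(CxSketch retained, int size) {
        return retained != null ? retained.copy() : new CxSketch(size);
    }
}
```

With this shape, a segment that reuses a previous context starts from the same statistics but cannot modify the previous segment's state.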

Changes Made
 * Added a new method to perform a deep copy of {{CX}} statistics into a new, 
independent {{CX}} instance.
 * Ensured that reused coding contexts are isolated and do not interfere with 
previous segments.
 * Ensured that both {{CX}} and {{ArithmeticDecoder}} are initialized before 
the dictionary is processed.
 * Added validation logic to prevent {{null}} states during critical operations.
 * As per the JBIG2 spec, the {{ArithmeticDecoder}} is now reset after each 
decode operation to maintain correctness.
 * Added Javadoc and inline comments to clarify the purpose and usage of 
critical methods and logic.
 * Added a fail-safe guard in {{decodeNewSymbols}}. It throws an 
{{IllegalStateException}} if either {{CX}} or {{ArithmeticDecoder}} is 
uninitialized, ensuring fail-fast behavior for debugging.
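
The fail-fast guard described in the last item could look roughly like the sketch below. The class {{SymbolDecoderSketch}}, its fields, the {{init}} method, and the exception message are assumptions for illustration and are not taken from the PDFBox source; the decoding itself is omitted.

```java
// Hypothetical sketch of the guard; names mirror the comment, not the PDFBox code.
final class SymbolDecoderSketch {
    private Object cx;                 // stands in for the CX statistics
    private Object arithmeticDecoder;  // stands in for the ArithmeticDecoder

    // Assumed initialization step that must run before the dictionary is processed.
    void init(Object cx, Object arithmeticDecoder) {
        this.cx = cx;
        this.arithmeticDecoder = arithmeticDecoder;
    }

    int decodeNewSymbols() {
        // Fail fast: report a missing initialization here rather than
        // letting a NullPointerException surface deep inside the decode loop.
        if (cx == null || arithmeticDecoder == null) {
            throw new IllegalStateException(
                "CX and ArithmeticDecoder must be initialized before decoding new symbols");
        }
        return 0; // actual symbol decoding omitted in this sketch
    }
}
```

Calling {{decodeNewSymbols}} before {{init}} fails immediately with a descriptive message, which is the behavior the guard is meant to preserve through later refactoring.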

Notes
 * The guard in {{decodeNewSymbols}} is intentionally kept as a fail-safe to 
catch any missing initialization during refactoring or future changes.

Pending Tasks
 * Add detailed Javadoc for public and critical private methods to improve 
maintainability.
 * Explore further restructuring to enhance readability and modularity, 
especially for complex decoding logic.

Disclosure
 * During debugging, I explored several ideas, code paths, and stack traces 
using Claude AI and Mistral AI. The Javadoc was written by Mistral AI.

> Reuse of symbol context not properly supported
> ----------------------------------------------
>
>                 Key: PDFBOX-6162
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-6162
>             Project: PDFBox
>          Issue Type: Sub-task
>          Components: JBIG2
>    Affects Versions: 3.0.4 JBIG2
>            Reporter: Tilman Hausherr
>            Priority: Major
>         Attachments: bitmap-symbol-context-reuse.pdf
>
>
> .ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
>       at 
> org.apache.pdfbox.jbig2.segments.SymbolDictionary.getToExportFlags(SymbolDictionary.java:898)
>       at 
> org.apache.pdfbox.jbig2.segments.SymbolDictionary.getDictionary(SymbolDictionary.java:467)
>       at 
> org.apache.pdfbox.jbig2.segments.SymbolDictionary.retrieveImportSymbols(SymbolDictionary.java:990)
>       at 
> org.apache.pdfbox.jbig2.segments.SymbolDictionary.setInSyms(SymbolDictionary.java:267)
>       at 
> org.apache.pdfbox.jbig2.segments.SymbolDictionary.parseHeader(SymbolDictionary.java:130)
>       at 
> org.apache.pdfbox.jbig2.segments.SymbolDictionary.init(SymbolDictionary.java:1025)
>       at 
> org.apache.pdfbox.jbig2.SegmentHeader.getSegmentData(SegmentHeader.java:380)
> Considering the name of the file, I assume this means we don't support the 
> reuse of symbols correctly. The file has at least 4 different symbol 
> segments. From what I see on
> https://github.com/SerenityOS/serenity/blob/master/Tests/LibGfx/test-inputs/jbig2/json/bitmap-symbol-context-reuse.json
> the text segment refers to the symbols of the 4 previous symbol segments, and 
> the symbol segments indicate some logic to retain the symbols of previous 
> segments, so one will have to investigate what happens.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
