[
https://issues.apache.org/jira/browse/PDFBOX-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18075960#comment-18075960
]
Maruan Sahyoun commented on PDFBOX-6162:
----------------------------------------
Summary
Ensure that the lifecycle of {{CX}} and {{ArithmeticDecoder}} aligns with the
JBIG2 specification by either:
* Creating new instances, or
* Copying {{CX}} statistics from a previous segment using {{CX.copy()}}.
Changes Made
* Added a new method, {{CX.copy()}}, that performs a deep copy of {{CX}}
statistics into a new, independent {{CX}} instance.
* Ensured that reused coding contexts are isolated, so that decoding a later
segment does not modify the statistics of previous segments.
* Ensured that both {{CX}} and {{ArithmeticDecoder}} are initialized before
the dictionary is processed.
* Added validation logic to prevent {{null}} states during critical operations.
* As per the JBIG2 spec, the {{ArithmeticDecoder}} is now reset after each
decode operation to maintain correctness.
* Added Javadoc and inline comments to clarify the purpose and usage of
critical methods and logic.
* Added a fail-safe guard in {{decodeNewSymbols}}. It throws an
{{IllegalStateException}} if either {{CX}} or {{ArithmeticDecoder}} is
uninitialized, ensuring fail-fast behavior for debugging.
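To illustrate the deep-copy idea from the list above, here is a minimal sketch. It is not the actual PDFBox {{CX}} class: {{ContextStats}}, its fields, and its methods are hypothetical stand-ins, assuming the statistics are held in two byte arrays (a probability-state index and a more-probable-symbol bit per context).

```java
import java.util.Arrays;

public class ContextCopySketch {

    /** Hypothetical stand-in for a coding-context table; NOT the PDFBox CX class. */
    static final class ContextStats {
        private final byte[] states; // assumed: probability-state index per context
        private final byte[] mps;    // assumed: more-probable-symbol bit per context

        ContextStats(int size) {
            this(new byte[size], new byte[size]);
        }

        private ContextStats(byte[] states, byte[] mps) {
            this.states = states;
            this.mps = mps;
        }

        /** Deep copy: the new instance owns its own arrays. */
        ContextStats copy() {
            return new ContextStats(Arrays.copyOf(states, states.length),
                                    Arrays.copyOf(mps, mps.length));
        }

        void setState(int index, int value) { states[index] = (byte) value; }
        int state(int index) { return states[index]; }
        void toggleMps(int index) { mps[index] ^= 1; }
        int mps(int index) { return mps[index]; }
    }

    public static void main(String[] args) {
        ContextStats previous = new ContextStats(4);
        previous.setState(2, 17);

        // A later segment reuses the statistics but works on its own copy.
        ContextStats reused = previous.copy();
        reused.setState(2, 5);
        reused.toggleMps(2);

        System.out.println(previous.state(2) + " " + previous.mps(2)); // prints "17 0"
        System.out.println(reused.state(2) + " " + reused.mps(2));     // prints "5 1"
    }
}
```

Because the copy owns its arrays, updates made while decoding the later segment leave the earlier segment's statistics untouched, which is the isolation the change aims for.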
Notes
* The guard in {{decodeNewSymbols}} is intentionally kept as a fail-safe to
catch any missing initialization during refactoring or future changes.
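The guard could look roughly like the sketch below. This is an assumption-laden illustration, not the committed code: {{SymbolDecoderSketch}}, its fields, and the {{init}} method are hypothetical, and plain {{Object}} references stand in for {{CX}} and {{ArithmeticDecoder}}.

```java
public class GuardSketch {

    /** Hypothetical holder; names are illustrative and not taken from PDFBox. */
    static final class SymbolDecoderSketch {
        private Object codingContext;     // stands in for CX
        private Object arithmeticDecoder; // stands in for ArithmeticDecoder

        void init(Object cx, Object decoder) {
            this.codingContext = cx;
            this.arithmeticDecoder = decoder;
        }

        /** Fails fast if initialization was skipped; real decoding is omitted. */
        int decodeNewSymbols() {
            if (codingContext == null || arithmeticDecoder == null) {
                throw new IllegalStateException(
                        "CX and ArithmeticDecoder must be initialized before decoding");
            }
            return 0; // placeholder for the number of decoded symbols
        }
    }

    public static void main(String[] args) {
        SymbolDecoderSketch decoder = new SymbolDecoderSketch();
        try {
            decoder.decodeNewSymbols(); // init() was never called
        } catch (IllegalStateException e) {
            System.out.println("guard triggered: " + e.getMessage());
        }

        decoder.init(new Object(), new Object());
        System.out.println(decoder.decodeNewSymbols()); // prints "0"
    }
}
```

An explicit {{IllegalStateException}} names the missing precondition directly, which is easier to diagnose than a {{NullPointerException}} raised somewhere deeper in the decoding logic.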
Pending Tasks
* Add detailed Javadoc for public and critical private methods to improve
maintainability.
* Explore further restructuring to enhance readability and modularity,
especially for complex decoding logic.
Disclosure
* During debugging I tested several ideas and explored code paths and stack
traces using Claude AI and Mistral AI. The Javadoc was written by Mistral AI.
> Reuse of symbol context not properly supported
> ----------------------------------------------
>
> Key: PDFBOX-6162
> URL: https://issues.apache.org/jira/browse/PDFBOX-6162
> Project: PDFBox
> Issue Type: Sub-task
> Components: JBIG2
> Affects Versions: 3.0.4 JBIG2
> Reporter: Tilman Hausherr
> Priority: Major
> Attachments: bitmap-symbol-context-reuse.pdf
>
>
> .ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
> at org.apache.pdfbox.jbig2.segments.SymbolDictionary.getToExportFlags(SymbolDictionary.java:898)
> at org.apache.pdfbox.jbig2.segments.SymbolDictionary.getDictionary(SymbolDictionary.java:467)
> at org.apache.pdfbox.jbig2.segments.SymbolDictionary.retrieveImportSymbols(SymbolDictionary.java:990)
> at org.apache.pdfbox.jbig2.segments.SymbolDictionary.setInSyms(SymbolDictionary.java:267)
> at org.apache.pdfbox.jbig2.segments.SymbolDictionary.parseHeader(SymbolDictionary.java:130)
> at org.apache.pdfbox.jbig2.segments.SymbolDictionary.init(SymbolDictionary.java:1025)
> at org.apache.pdfbox.jbig2.SegmentHeader.getSegmentData(SegmentHeader.java:380)
> Considering the name of the file, I assume this means we don't support the
> reuse of symbols correctly. The file has at least 4 different symbol
> segments. From what I see on
> https://github.com/SerenityOS/serenity/blob/master/Tests/LibGfx/test-inputs/jbig2/json/bitmap-symbol-context-reuse.json
> the text segment refers to the symbols of the 4 previous symbol segments, and
> the symbol segments indicate some logic to retain the symbols of previous
> segments, so one will have to investigate what happens.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)