[
https://issues.apache.org/jira/browse/PDFBOX-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18075648#comment-18075648
]
Maruan Sahyoun commented on PDFBOX-6162:
----------------------------------------
Refactoring: GenericRefinementRegion — Extract Decoding Procedure
Summary
The JBIG2 Generic Refinement Region decoding procedure (§6.3.5.6) has been
extracted from {{GenericRefinementRegion}} into a new dedicated class,
{{GenericRefinementRegionDecodingProcedure}}. This separates the pure
decoding algorithm from the segment-level concerns of parsing and bitmap
resolution, and lays the groundwork for {{SymbolDictionary}} and {{TextRegion}}
to call the procedure directly rather than routing through
{{GenericRefinementRegion}}.
Changes
New class: {{GenericRefinementRegionDecodingProcedure}}
- Implements the pure §6.3.5.6 algorithm with no dependency on segment headers
or input streams.
- The entry point is a single static {{decode()}} method. {{ArithmeticDecoder}}
and {{CX}} are passed explicitly, so callers control whether instances are
fresh or shared.
- Cannot be instantiated externally. A short-lived private instance is created
per {{decode()}} call so that private helper methods can share state through
fields rather than long parameter lists.
- The {{Template}} inner class hierarchy has moved here from
{{GenericRefinementRegion}}.
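The static entry point with a short-lived private instance could look roughly like the sketch below. It is a hypothetical illustration and not the PDFBox source: the class name {{RefinementProcedureSketch}}, the {{BooleanSupplier}} standing in for {{ArithmeticDecoder}} and {{CX}}, and the pixel rule are all assumptions made so the example runs on its own.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the "static entry point + private per-call instance"
// pattern. None of these names or signatures are taken from the PDFBox source.
final class RefinementProcedureSketch {

    // Per-call state, shared by the private helpers through fields.
    private final boolean[][] reference;
    private final BooleanSupplier bitSource; // stand-in for ArithmeticDecoder + CX
    private final boolean[][] result;

    // Private constructor: the class cannot be instantiated externally.
    private RefinementProcedureSketch(boolean[][] reference, BooleanSupplier bitSource) {
        this.reference = reference;
        this.bitSource = bitSource;
        this.result = new boolean[reference.length][];
    }

    // Single static entry point. The caller supplies the bit source and so
    // decides whether it is fresh or shared with other calls.
    static boolean[][] decode(boolean[][] reference, BooleanSupplier bitSource) {
        return new RefinementProcedureSketch(reference, bitSource).run();
    }

    private boolean[][] run() {
        for (int y = 0; y < reference.length; y++) {
            result[y] = new boolean[reference[y].length];
            decodeRow(y);
        }
        return result;
    }

    // Placeholder rule (NOT the §6.3.5.6 algorithm): flip the reference pixel
    // whenever the bit source yields true.
    private void decodeRow(int y) {
        for (int x = 0; x < reference[y].length; x++) {
            result[y][x] = reference[y][x] ^ bitSource.getAsBoolean();
        }
    }
}
```

Because the instance never escapes {{decode()}}, the helpers can read and write fields freely without making the procedure stateful from the caller's point of view.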
Updated: {{GenericRefinementRegion}}
- Now operates at the segment level only: parses the header, resolves the
reference bitmap from referred-to segments or the page buffer (§7.4.7.4),
and delegates to {{GenericRefinementRegionDecodingProcedure.decode()}}.
- {{setParameters()}} is marked {{@Deprecated}}. Callers should migrate to
invoking {{GenericRefinementRegionDecodingProcedure.decode()}} directly.
- The {{Template}} inner class is marked {{@Deprecated}} with a reference to
its new location.
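The deprecation path can be pictured with the minimal sketch below, which assumes hypothetical classes {{ProcedureSketch}} and {{RegionSketch}} with simplified parameters. It does not reproduce the real signatures. It only shows a deprecated setter kept for existing callers while the segment-level class delegates to the static procedure.

```java
// Hypothetical sketch of the deprecation path; names and parameters are
// illustrative only and do not mirror the PDFBox classes.
final class ProcedureSketch {
    private ProcedureSketch() {
    }

    // Stand-in for the extracted decoding procedure: copies the reference
    // row and inverts every pixel when 'invert' is set.
    static boolean[] decode(boolean[] reference, boolean invert) {
        boolean[] out = new boolean[reference.length];
        for (int i = 0; i < reference.length; i++) {
            out[i] = reference[i] ^ invert;
        }
        return out;
    }
}

final class RegionSketch {
    private boolean[] reference = new boolean[0];
    private boolean invert;

    /** @deprecated call {@link ProcedureSketch#decode(boolean[], boolean)} directly. */
    @Deprecated
    void setParameters(boolean[] reference, boolean invert) {
        this.reference = reference;
        this.invert = invert;
    }

    // Segment-level method: owns no decoding logic, only delegates.
    boolean[] getRegionBitmap() {
        return ProcedureSketch.decode(reference, invert);
    }
}
```

Both routes produce the same result, so a caller can switch from the deprecated setter to the direct static call without a behavior change.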
Pending
- Migrate {{SymbolDictionary}} and {{TextRegion}} to call
{{GenericRefinementRegionDecodingProcedure.decode()}} directly, after which
{{setParameters()}} and the deprecated {{Template}} class can be removed.
> Reuse of symbol context not properly supported
> ----------------------------------------------
>
> Key: PDFBOX-6162
> URL: https://issues.apache.org/jira/browse/PDFBOX-6162
> Project: PDFBox
> Issue Type: Sub-task
> Components: JBIG2
> Affects Versions: 3.0.4 JBIG2
> Reporter: Tilman Hausherr
> Priority: Major
> Attachments: bitmap-symbol-context-reuse.pdf
>
>
> .ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
> at
> org.apache.pdfbox.jbig2.segments.SymbolDictionary.getToExportFlags(SymbolDictionary.java:898)
> at
> org.apache.pdfbox.jbig2.segments.SymbolDictionary.getDictionary(SymbolDictionary.java:467)
> at
> org.apache.pdfbox.jbig2.segments.SymbolDictionary.retrieveImportSymbols(SymbolDictionary.java:990)
> at
> org.apache.pdfbox.jbig2.segments.SymbolDictionary.setInSyms(SymbolDictionary.java:267)
> at
> org.apache.pdfbox.jbig2.segments.SymbolDictionary.parseHeader(SymbolDictionary.java:130)
> at
> org.apache.pdfbox.jbig2.segments.SymbolDictionary.init(SymbolDictionary.java:1025)
> at
> org.apache.pdfbox.jbig2.SegmentHeader.getSegmentData(SegmentHeader.java:380)
> Considering the name of the file, I assume this means we don't support the
> reuse of symbols correctly. The file has at least 4 different symbol
> segments. From what I see on
> https://github.com/SerenityOS/serenity/blob/master/Tests/LibGfx/test-inputs/jbig2/json/bitmap-symbol-context-reuse.json
> the text segment refers to the symbols of the 4 previous symbol segments, and
> the symbol segments indicate some logic to retain the symbols of previous
> segments, so one will have to investigate what happens.