[ 
https://issues.apache.org/jira/browse/PDFBOX-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18075648#comment-18075648
 ] 

Maruan Sahyoun commented on PDFBOX-6162:
----------------------------------------

Refactoring: GenericRefinementRegion — Extract Decoding Procedure

Summary

The JBIG2 Generic Refinement Region decoding procedure (§6.3.5.6) has been 
extracted from {{GenericRefinementRegion}} into a new dedicated class,
{{GenericRefinementRegionDecodingProcedure}}. This separates the pure 
decoding algorithm from the segment-level concerns of parsing and bitmap 
resolution, and lays the groundwork for {{SymbolDictionary}} and 
{{TextRegion}} to call the procedure directly rather than routing through 
{{GenericRefinementRegion}}.

Changes

New class: {{GenericRefinementRegionDecodingProcedure}}
 - Implements the pure §6.3.5.6 algorithm with no dependency on segment headers 
or input streams.
 - The entry point is a single static {{decode()}} method. {{ArithmeticDecoder}} 
and {{CX}} are passed explicitly, so callers control whether instances are 
fresh or shared.
 - Cannot be instantiated externally. A short-lived private instance is created 
per {{decode()}} call so that private helper methods can share state through
fields rather than long parameter lists.
 - The {{Template}} inner class hierarchy has moved here from 
{{GenericRefinementRegion}}.
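
The pattern described in the bullets above (a single static entry point, a private constructor, and a short-lived instance whose helpers share state through fields) can be sketched roughly as follows. This is an illustration only, not the committed code: {{BitSource}}, {{DecodingProcedureSketch}} and the two-pixel context are hypothetical stand-ins, and the loop is a toy, not the §6.3.5.6 algorithm.

```java
// Hypothetical stand-in for an arithmetic decoder plus its context state.
// This is NOT the PDFBox ArithmeticDecoder / CX API.
interface BitSource {
    int nextBit(int context);
}

// Hypothetical sketch of the "static entry point, private instance" pattern.
final class DecodingProcedureSketch {
    // State shared by the private helpers for the duration of one decode() call.
    private final BitSource source;
    private final int[][] reference;
    private final int width;
    private final int height;
    private final int[][] result;

    // Private constructor: the class cannot be instantiated externally.
    private DecodingProcedureSketch(BitSource source, int[][] reference) {
        this.source = source;
        this.reference = reference;
        this.height = reference.length;
        this.width = height == 0 ? 0 : reference[0].length;
        this.result = new int[height][width];
    }

    // Single static entry point. The caller passes the bit source explicitly,
    // so the caller decides whether it is fresh or shared with other calls.
    static int[][] decode(BitSource source, int[][] reference) {
        return new DecodingProcedureSketch(source, reference).run();
    }

    private int[][] run() {
        for (int y = 0; y < height; y++) {
            decodeRow(y);
        }
        return result;
    }

    // Helpers read and write fields instead of taking long parameter lists.
    private void decodeRow(int y) {
        for (int x = 0; x < width; x++) {
            result[y][x] = source.nextBit(contextFor(x, y)) & 1;
        }
    }

    // Toy context: the reference pixel at (x, y) and the pixel already
    // decoded to its left. The real templates use many more pixels.
    private int contextFor(int x, int y) {
        int left = x > 0 ? result[y][x - 1] : 0;
        return (reference[y][x] << 1) | left;
    }
}
```

A caller would write something like {{DecodingProcedureSketch.decode(mySource, referenceBitmap)}}. Because the instance never escapes {{decode()}}, no state leaks between calls except through whatever the caller chooses to share via the source it passes in.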

Updated: {{GenericRefinementRegion}}
 - Now operates at the segment level only: it parses the header, resolves the 
reference bitmap from referred-to segments or the page buffer (§7.4.7.4),
and delegates to {{GenericRefinementRegionDecodingProcedure.decode()}}.
 - {{setParameters()}} is marked {{@Deprecated}}. Callers should migrate to 
invoking {{GenericRefinementRegionDecodingProcedure.decode()}} directly.
 - The {{Template}} inner class is marked {{@Deprecated}} with a reference to 
its new location.
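
The migration path for callers might look roughly like the sketch below: the deprecated setter stays in place for now, while new code calls the static procedure directly and gets the same result. All names and signatures here ({{ProcedureSketch}}, {{SegmentSketch}}, the {{int[]}} parameters) are hypothetical and greatly simplified; the real {{setParameters()}} and {{decode()}} signatures are not reproduced here.

```java
// Hypothetical, simplified stand-in for the extracted decoding procedure.
final class ProcedureSketch {
    private ProcedureSketch() {
    }

    // Toy "refinement": flips each reference pixel whose flip bit is set.
    static int[] decode(int[] reference, int[] flips) {
        int[] out = new int[reference.length];
        for (int i = 0; i < reference.length; i++) {
            out[i] = reference[i] ^ flips[i];
        }
        return out;
    }
}

// Hypothetical stand-in for the segment-level class after the refactoring.
final class SegmentSketch {
    private int[] reference;
    private int[] flips;

    /**
     * Old entry point, kept until all callers have migrated.
     *
     * @deprecated call {@link ProcedureSketch#decode(int[], int[])} directly.
     */
    @Deprecated
    void setParameters(int[] reference, int[] flips) {
        this.reference = reference;
        this.flips = flips;
    }

    // The segment-level class no longer decodes itself; it only delegates.
    int[] getRegionBitmap() {
        return ProcedureSketch.decode(reference, flips);
    }
}
```

In this sketch the old path ({{setParameters()}} followed by {{getRegionBitmap()}}) and the new path (a direct {{ProcedureSketch.decode()}} call) go through the same code, so the deprecated method can be deleted once no caller uses it.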

Pending
 - Migrate {{SymbolDictionary}} and {{TextRegion}} to call 
{{GenericRefinementRegionDecodingProcedure.decode()}} directly, after which
{{setParameters()}} and the deprecated {{Template}} class can be removed.

> Reuse of symbol context not properly supported
> ----------------------------------------------
>
>                 Key: PDFBOX-6162
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-6162
>             Project: PDFBox
>          Issue Type: Sub-task
>          Components: JBIG2
>    Affects Versions: 3.0.4 JBIG2
>            Reporter: Tilman Hausherr
>            Priority: Major
>         Attachments: bitmap-symbol-context-reuse.pdf
>
>
> .ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
>       at 
> org.apache.pdfbox.jbig2.segments.SymbolDictionary.getToExportFlags(SymbolDictionary.java:898)
>       at 
> org.apache.pdfbox.jbig2.segments.SymbolDictionary.getDictionary(SymbolDictionary.java:467)
>       at 
> org.apache.pdfbox.jbig2.segments.SymbolDictionary.retrieveImportSymbols(SymbolDictionary.java:990)
>       at 
> org.apache.pdfbox.jbig2.segments.SymbolDictionary.setInSyms(SymbolDictionary.java:267)
>       at 
> org.apache.pdfbox.jbig2.segments.SymbolDictionary.parseHeader(SymbolDictionary.java:130)
>       at 
> org.apache.pdfbox.jbig2.segments.SymbolDictionary.init(SymbolDictionary.java:1025)
>       at 
> org.apache.pdfbox.jbig2.SegmentHeader.getSegmentData(SegmentHeader.java:380)
> Considering the name of the file, I assume this means we don't support the 
> reuse of symbols correctly. The file has at least 4 different symbol 
> segments. From what I see on
> https://github.com/SerenityOS/serenity/blob/master/Tests/LibGfx/test-inputs/jbig2/json/bitmap-symbol-context-reuse.json
> the text segment refers to the symbols of the 4 previous symbol segments, and 
> the symbol segments indicate some logic to retain the symbols of previous 
> segments, so one will have to investigate what happens.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
