[ 
https://issues.apache.org/jira/browse/PDFBOX-5575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702263#comment-17702263
 ] 

Tilman Hausherr commented on PDFBOX-5575:
-----------------------------------------

You're right: the javadoc ("Find the longest matching pattern") is wrong, and 
so is the length comparison in the original code. Finding the "longest" match 
is done in the calling code. I'm going to think about this for a while, but it 
looks like your change is good.
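
To illustrate why the lookup itself does not need to find the "longest" pattern, here is a minimal sketch of a generic LZW encoding loop. It is not the PDFBox code: the class name, the method name and the use of String keys in a HashMap are assumptions made for brevity, and it leaves out bit packing, the clear-table code and the table reset. The caller grows the pattern one byte at a time and only asks whether that exact pattern is in the table, so the longest match falls out of the loop.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a plain LZW encoder loop, not the PDFBox implementation.
public class LzwLongestMatchSketch {

    public static List<Integer> encode(byte[] input) {
        // Assumption: patterns are stored as Strings (one char per byte) to keep the sketch short.
        Map<String, Integer> table = new HashMap<>();
        for (int i = 0; i < 256; i++) {
            table.put(String.valueOf((char) i), i);
        }
        int nextCode = 258; // 256 and 257 are reserved in PDF's LZW (clear-table, end-of-data)

        List<Integer> out = new ArrayList<>();
        String current = "";
        for (byte b : input) {
            String extended = current + (char) (b & 0xFF);
            if (table.containsKey(extended)) {
                // Exact match for the extended pattern: keep growing it.
                current = extended;
            } else {
                // The previous pattern was the longest one known: emit it,
                // register the extended pattern, and restart from the current byte.
                out.add(table.get(current));
                table.put(extended, nextCode++);
                current = String.valueOf((char) (b & 0xFF));
            }
        }
        if (!current.isEmpty()) {
            out.add(table.get(current));
        }
        return out;
    }
}
```

For the ASCII input "ABABABA" this sketch returns the codes [65, 66, 258, 260]: the lookup only ever answers "is this exact pattern known?", and the loop decides when the match cannot be extended further.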

> optimize LZWFilter
> ------------------
>
>                 Key: PDFBOX-5575
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5575
>             Project: PDFBox
>          Issue Type: Improvement
>    Affects Versions: 3.0.0 PDFBox
>            Reporter: Axel Howind
>            Priority: Minor
>         Attachments: optimize_LZWFilter.patch
>
>
> I ran the PDFBox tests with a profiler and saw that LZWFilter used quite a 
> bunch of time, so I thought I might look at the code. I just looked at it 
> totally out of context and tried to understand what is done there and what 
> could be changed without altering results.
>  * made the private methods static
>  * changed the variable/method parameter 'earlyChange' from integer to 
> boolean because I thought that would be more readable
>  * some minor tweaks
>  * it looks like codeTable is initialized quite often, and every time 256 
> one-byte arrays are created, so I pre-allocate those byte arrays so that 
> they can be shared by all code tables. [~tilman] I assumed the contents of 
> the codeTable entries will not be changed, and my analysis of the code seems 
> to confirm that (as do the passing unit tests). Please have a look at this 
> so I don't break anything.
>  * it took me some time to fully understand what findPatternCode() does and 
> why it checks the codeTable in reverse order. I more or less recreated that 
> method from scratch and I think it should now always be faster: for patterns 
> of length 1 no iteration is done, and for longer patterns iteration stops 
> once the correct entry is found. As this is the most notable change, please 
> take a closer look. Unit tests pass.
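
The last two points of the list above can be sketched roughly as follows. This is not the attached patch: the class name, the method signatures and the List-based table layout are assumptions for illustration only. The sketch shares 256 pre-allocated one-byte arrays across all tables (which is only safe while nobody mutates an entry), answers length-1 patterns without iterating, and stops scanning as soon as an exact match is found.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the code from optimize_LZWFilter.patch.
public class LzwTableSketch {

    // Allocated once and shared by every code table.
    // Assumption: entries are treated as read-only by all callers.
    private static final byte[][] SINGLE_BYTES = new byte[256][];
    static {
        for (int i = 0; i < 256; i++) {
            SINGLE_BYTES[i] = new byte[] { (byte) i };
        }
    }

    // Builds a fresh table: the 256 shared single-byte entries plus two
    // placeholders for the reserved codes 256 (clear-table) and 257 (end-of-data).
    public static List<byte[]> createCodeTable() {
        List<byte[]> table = new ArrayList<>(4096);
        table.addAll(Arrays.asList(SINGLE_BYTES));
        table.add(null); // 256
        table.add(null); // 257
        return table;
    }

    // Returns the code of the entry that equals the pattern exactly, or -1 if there is none.
    public static int findPatternCode(List<byte[]> table, byte[] pattern) {
        if (pattern.length == 1) {
            // Single bytes map directly to codes 0..255, so no iteration is needed.
            return pattern[0] & 0xFF;
        }
        // Longer patterns can only live at 258 and above; stop at the first exact match.
        for (int i = 258; i < table.size(); i++) {
            if (Arrays.equals(table.get(i), pattern)) {
                return i;
            }
        }
        return -1;
    }
}
```

Because each pattern is added to the table at most once, the first exact match is the only one, so stopping early does not change the result in this sketch.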



