[ 
https://issues.apache.org/jira/browse/PDFBOX-5575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702248#comment-17702248
 ] 

Axel Howind edited comment on PDFBOX-5575 at 3/19/23 10:22 AM:
---------------------------------------------------------------

No, I also thought that the old code would look for the longest match, at least 
I expected that. But it doesn't. Look at the old code:


{code:java}
            byte[] tryPattern = codeTable.get(i);
            if ((foundCode != -1 || tryPattern.length > foundLen) && 
Arrays.equals(tryPattern, pattern))
            {
                foundCode = i;
                foundLen = tryPattern.length;
            }

{code}

Because Arrays.equals() will only return true if both arguments have the same 
length, foundLen will only ever be set to pattern.length.
pattern here is the argument passed to the method. Its length doesn't change 
during iterations. So it is impossible to find a longer match and it's OK to 
return at once. If that's not intended here, the expected unit test results 
will have to be updated together with the code.


was (Author: axh):
No, I also thought that the old code would look for the longest match, at least 
I expected that. But it doesn't. Look at the old code:

            byte[] tryPattern = codeTable.get(i);
            if ((foundCode != -1 || tryPattern.length > foundLen) && 
Arrays.equals(tryPattern, pattern))
            {
                foundCode = i;
                foundLen = tryPattern.length;
            }

Because Arrays.equals() will only return true if both arguments have the same 
length, foundLen will only ever be set to pattern.length.
pattern here is the argument passed to the method. Its length doesn't change 
during iterations. So it is impossible to find a longer match and it's OK to 
return at once. If that's not intended here, the expected unit test results 
will have to be updated together with the code.

> optimize LZWFilter
> ------------------
>
>                 Key: PDFBOX-5575
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5575
>             Project: PDFBox
>          Issue Type: Improvement
>    Affects Versions: 3.0.0 PDFBox
>            Reporter: Axel Howind
>            Priority: Minor
>         Attachments: optimize_LZWFilter.patch
>
>
> I ran the PDFBox tests with a profiler and saw that LZWFilter used quite a 
> bunch of time, so I thought I might look at the code. I just looked at it 
> totally out of context and tried to understand what is done there and what 
> could be changed without altering results.
>  * made the private mehtods static
>  * changed the variable/method parameter 'earlyChange' from integer to 
> boolean because I thought tha would be more readable
>  * some minor tweaks
>  * it looks like codeTable is initialized quite often and everytime, 256 
> length 1 byte arrays are created, so I pre-allocate those byte arrays so that 
> they can be shared by all code tables. [~tilman] I assumed the contents of 
> the codeTable entries will not be changed, and my analysis of the code seems 
> to prove that (also the passing unit tests). Just please have a look at this 
> so I don't break anything.
>  * it took me some time to fully understand what findPatternCode() does and 
> why it checks the codeTable in reverse order. I more or less recreated that 
> method from scratch and I think it should now always be faster: for patterns 
> of length 1 no iteration is done, and for longer patterns iteration stops 
> once the correct entry is found. As this is the most notable change, please 
> take a closer look. Unit tests pass.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org

Reply via email to