[ 
https://issues.apache.org/jira/browse/COMPRESS-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18043666#comment-18043666
 ] 

Piotr Karwasz edited comment on COMPRESS-713 at 12/8/25 8:44 PM:
-----------------------------------------------------------------

Hi [~pbebr],

Thank you for the report. Yes, the suggestions are very helpful, both as a 
brain teaser and as a reminder that Commons Compress needs some architectural 
changes in version 2.x. Because the "Unshrink" compression method is old and 
saw little real-world use (even long after the patents expired), most users 
probably never encounter ZIP files that use it. However, corrupted or 
malicious ZIPs can still contain such data, so it would be safer to provide a 
setting that lets users explicitly choose which compression algorithms are 
enabled in ZIP and 7z.
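
To make the idea of such a setting concrete, here is a rough sketch of an allow-list check that runs before any decompressor is created. Everything in it is hypothetical: {{MethodAllowListDemo}}, the {{Method}} enum, {{DEFAULT_ALLOWED}} and {{checkAllowed}} are names invented for illustration and do not exist in Commons Compress, and the default set is only an example. Only the numeric method codes are taken from the ZIP format.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch; none of these names exist in Commons Compress. */
final class MethodAllowListDemo {

    /** A few ZIP compression methods with their method codes from the ZIP format. */
    enum Method {
        STORED(0), UNSHRINKING(1), IMPLODING(6), DEFLATED(8), BZIP2(12);

        final int code;

        Method(int code) {
            this.code = code;
        }
    }

    /** Illustrative default only: legacy methods must be enabled explicitly. */
    static final Set<Method> DEFAULT_ALLOWED =
            Collections.unmodifiableSet(EnumSet.of(Method.STORED, Method.DEFLATED));

    /** Rejects an entry before a decompressor is created for it. */
    static void checkAllowed(Method method, Set<Method> allowed) throws IOException {
        if (!allowed.contains(method)) {
            throw new IOException(
                    "Compression method " + method + " (" + method.code + ") is disabled");
        }
    }

    public static void main(String[] args) throws IOException {
        // Passes silently: DEFLATED is in the illustrative default set.
        checkAllowed(Method.DEFLATED, DEFAULT_ALLOWED);
        try {
            checkAllowed(Method.UNSHRINKING, DEFAULT_ALLOWED);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a check of this kind, an "Unshrink" entry in a corrupted or malicious archive would be refused up front unless the caller opted in.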

I have distilled the root cause to a cycle that appears in the LZW dictionary 
used by {{UnshrinkingInputStream}}. The dictionary stores entries in the form 
{{(parentSymbol, lastChar)}}, and a loop can occur for certain sequences such 
as: 65, 66, 257, (256, 2) (the partial clear), 257.

||Prev symbol||Input symbol||Output sequence||New symbol in dictionary||
| |65|"A"| |
|65|66|"B"|257 = (65, "B")|
|66|257|"AB"|258 = (66, "A")|
|257|256|""| |
|257|2|""|clears 257; it becomes free but prevCode remains 257|
|257|257|crashes|257 = (257, "A"), so 257 is its own parent|
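
The final state of the table can be reproduced in isolation. The sketch below is a simplified model, not the Commons Compress code: it skips the decoding steps, sets up the {{prefixes}} and {{characters}} arrays by hand, and walks the parent chain the same way as the loop quoted in the report, with a bounds check added. {{PrefixCycleDemo}} and its methods are hypothetical names, and {{IOException}} is only an assumed stand-in for whatever library-specific exception would be chosen.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical demo class; a simplified model, not the Commons Compress code. */
final class PrefixCycleDemo {

    static final int TABLE_SIZE = 8192; // same size as the outputStack in the report

    /** Every code starts without a parent (-1); literals 0..255 keep that value. */
    static int[] newPrefixes() {
        int[] prefixes = new int[TABLE_SIZE];
        Arrays.fill(prefixes, -1);
        return prefixes;
    }

    /** Literal codes 0..255 stand for their own byte value. */
    static byte[] newCharacters() {
        byte[] characters = new byte[TABLE_SIZE];
        for (int i = 0; i < 256; i++) {
            characters[i] = (byte) i;
        }
        return characters;
    }

    /**
     * Same walk as the loop quoted in the report, plus a bounds check.
     * Returns the index of the first byte written into outputStack.
     */
    static int expand(int code, int[] prefixes, byte[] characters, byte[] outputStack)
            throws IOException {
        int location = outputStack.length;
        for (int entry = code; entry >= 0; entry = prefixes[entry]) {
            if (location == 0) {
                // Assumption: IOException stands in for a library-specific exception.
                throw new IOException("Prefix chain of code " + code + " exceeds the output stack");
            }
            outputStack[--location] = characters[entry];
        }
        return location;
    }

    public static void main(String[] args) throws IOException {
        int[] prefixes = newPrefixes();
        byte[] characters = newCharacters();
        byte[] stack = new byte[TABLE_SIZE];

        // Healthy entry from the second table row: 257 = (65, "B").
        prefixes[257] = 65;
        characters[257] = 'B';
        int start = expand(257, prefixes, characters, stack);
        System.out.println(new String(stack, start, stack.length - start, StandardCharsets.US_ASCII));

        // Last table row: 257 is re-created with itself as parent.
        prefixes[257] = 257;
        characters[257] = 'A';
        try {
            expand(257, prefixes, characters, stack);
        } catch (IOException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Without the {{location == 0}} check, the second call would keep decrementing the index until it drops below zero, which is the out-of-bounds access from the report.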

Piotr


was (Author: pkarwasz):
Hi [~pbebr],

Thank you for the report. Yes, the suggestions are very helpful, both as a 
brain teaser and as a reminder that Commons Compress needs some architectural 
changes in version 2.x. Because of its age and its limited real-world use (even 
long after the patents expired), most users probably never encounter ZIP files 
using the "Unshrink" compression method. However, corrupted or malicious ZIPs 
can still contain such data, so it would be safer to provide a setting allowing 
users to explicitly choose which compression algorithms are enabled in ZIP and 
7z.

I have distilled the root cause to a cycle that appears in the LZW dictionary 
used by {{UnshrinkingInputStream}}. The dictionary stores entries in the form 
{{(parentSymbol, lastChar)}}, and a loop can occur for certain sequences such 
as: 65, 66, 257, (256, 2) (the partial clear), 257.

||Prev symbol||Input symbol||Output sequence||New symbol in dictionary||
| |65|"A"| |
|65|66|"B"|257 = (65, "B")|
|66|257|"AB"|258 = (66, "B")|
|257|256|""| |
|257|2|""|clears 257; it becomes free but prevCode remains 257|
|257|257|crashes|257 = (257, "A")|

Piotr

> Unchecked pre-decremental notation in for-loop as array index causes 
> ArrayOutOfBounds access
> --------------------------------------------------------------------------------------------
>
>                 Key: COMPRESS-713
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-713
>             Project: Commons Compress
>          Issue Type: Bug
>          Components: Compressors
>         Environment: Ubuntu 24.04
> $ java --version
> openjdk 21.0.8 2025-07-15
> OpenJDK Runtime Environment (build 21.0.8+9-Ubuntu-0ubuntu124.04.1)
> OpenJDK 64-Bit Server VM (build 21.0.8+9-Ubuntu-0ubuntu124.04.1, mixed mode, 
> sharing)
>            Reporter: Philip Betzler-Braun
>            Assignee: Gary D. Gregory
>            Priority: Major
>         Attachments: ArrayOutOfBoundsZipInArchiveInputStreamReproducer.java
>
>
> *Issue:* 
> LZWInputStream 
> (org.apache.commons.compress.compressors.lzw.LZWInputStream.expandCodeToOutputStack(LZWInputStream.java:150))
>   contains a byte array outputStack of size 8192 and an int 
> outputStackLocation that marks the position to write to in the stack. In 
> the function expandCodeToOutputStack (LZWInputStream.java:150), a C-style 
> pre-decrement inside a for-loop is applied to outputStackLocation without 
> any check of its value. If the loop runs for more than 8192 iterations, 
> this causes an ArrayIndexOutOfBoundsException on the outputStack byte 
> array.
>  
> Begin: LZWInputStream.java:149
> {code:java}
> for (int entry = code; entry >= 0; entry = prefixes[entry]) {
>     outputStack[--outputStackLocation] = characters[entry];
> }
> {code}
>  
> *Suggestion:*
>  * Catch the ArrayIndexOutOfBoundsException and throw a library-specific 
> exception.
>  
> *Reproduction:*
> (reproducer in attached file -> intended location is: 
> src/test/java/org/apache/commons/compress/archivers/zip/ArrayOutOfBoundsZipInArchiveInputStreamReproducer.java)
> [^ArrayOutOfBoundsZipInArchiveInputStreamReproducer.java]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)