[
https://issues.apache.org/jira/browse/COMPRESS-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18043666#comment-18043666
]
Piotr Karwasz edited comment on COMPRESS-713 at 12/8/25 8:44 PM:
-----------------------------------------------------------------
Hi [~pbebr],
Thank you for the report. Yes, the suggestions are very helpful, both as a
brain teaser and as a reminder that Commons Compress needs some architectural
changes in version 2.x. Because the "Unshrink" compression method is old and
saw little real-world use (even long after the patents expired), most users
probably never encounter ZIP files that use it. However, corrupted or malicious ZIPs
can still contain such data, so it would be safer to provide a setting allowing
users to explicitly choose which compression algorithms are enabled in ZIP and
7z.
I have distilled the root cause to a cycle that appears in the LZW dictionary
used by {{UnshrinkingInputStream}}. The dictionary stores entries in the
form {{(parentSymbol, lastChar)}}, and a loop can occur for certain
sequences such as: 65, 66, 257, (256, 2) (the partial clear), 257.
||Prev symbol||Input symbol||Output sequence||New symbol in dictionary||
| |65|"A"| |
|65|66|"B"|257 = (65, "B")|
|66|257|"AB"|258 = (66, "A")|
|257|256|""| |
|257|2|""|clears 257; it becomes free but prevCode remains 257|
|257|257|crashes|257 = (257, "A")|
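The last row of the table can be reproduced with a minimal sketch. It models only the parent links of the dictionary: the array name {{prefixes}} and the {{entry >= 0}} loop shape come from the snippet quoted in the issue, while {{LzwCycleSketch}}, {{hasCycle}}, {{newPrefixTable}} and {{UNUSED}} are hypothetical names for illustration and are not part of Commons Compress. A chain that visits more entries than the table holds must revisit one, so a step counter is enough to detect the loop.

```java
import java.util.Arrays;

// Illustrative sketch only; these names are hypothetical, not Commons Compress API.
final class LzwCycleSketch {

    /** Marker for "no parent", used for literals and free slots. */
    static final int UNUSED = -1;

    /** Creates a parent-link table in which every entry has no parent. */
    static int[] newPrefixTable(final int size) {
        final int[] prefixes = new int[size];
        Arrays.fill(prefixes, UNUSED);
        return prefixes;
    }

    /**
     * Walks the parent links starting at code, using the same loop shape as
     * the snippet quoted in the issue. Returns true if the walk visits more
     * entries than the table contains, which is only possible with a cycle.
     */
    static boolean hasCycle(final int[] prefixes, final int code) {
        int steps = 0;
        for (int entry = code; entry >= 0; entry = prefixes[entry]) {
            if (++steps > prefixes.length) {
                return true;
            }
        }
        return false;
    }

    public static void main(final String[] args) {
        final int[] prefixes = newPrefixTable(8192);
        prefixes[257] = 65; // 257 = (65, "B")
        prefixes[258] = 66; // 258 = (66, "A")
        System.out.println(hasCycle(prefixes, 257)); // prints "false"

        // The partial clear frees 257 while prevCode stays 257; the next input
        // code 257 then re-adds the entry with itself as parent: 257 = (257, "A").
        prefixes[257] = UNUSED;
        prefixes[257] = 257;
        System.out.println(hasCycle(prefixes, 257)); // prints "true"
    }
}
```

Without such a bound, the walk from 257 never reaches a negative parent, which is why the expansion loop keeps decrementing the output stack position until it leaves the array.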
Piotr
> Unchecked pre-decremental notation in for-loop as array index causes
> ArrayOutOfBounds access
> --------------------------------------------------------------------------------------------
>
> Key: COMPRESS-713
> URL: https://issues.apache.org/jira/browse/COMPRESS-713
> Project: Commons Compress
> Issue Type: Bug
> Components: Compressors
> Environment: Ubuntu 24.04
> $ java --version
> openjdk 21.0.8 2025-07-15
> OpenJDK Runtime Environment (build 21.0.8+9-Ubuntu-0ubuntu124.04.1)
> OpenJDK 64-Bit Server VM (build 21.0.8+9-Ubuntu-0ubuntu124.04.1, mixed mode,
> sharing)
> Reporter: Philip Betzler-Braun
> Assignee: Gary D. Gregory
> Priority: Major
> Attachments: ArrayOutOfBoundsZipInArchiveInputStreamReproducer.java
>
>
> *Issue:*
> LZWInputStream
> (org.apache.commons.compress.compressors.lzw.LZWInputStream.expandCodeToOutputStack(LZWInputStream.java:150))
> contains a byte array outputStack of size 8192 and an int
> outputStackLocation that marks the position to write to in the
> stack. The function expandCodeToOutputStack (LZWInputStream.java:150)
> uses a C-style pre-decrement of outputStackLocation inside a for-loop
> and never checks its value. If the loop runs for more than 8192
> iterations, it causes an ArrayOutOfBounds access to the outputStack byte
> array.
>
> Begin: LZWInputStream.java:149
> {code:java}
> for (int entry = code; entry >= 0; entry = prefixes[entry]) {
>     outputStack[--outputStackLocation] = characters[entry];
> }
> {code}
>
> *Suggestion:*
> * Catch the ArrayOutOfBounds exception and throw a library-specific
> exception.
>
> *Reproduction:*
> (reproducer in the attached file; intended location:
> src/test/java/org/apache/commons/compress/archivers/zip/ArrayOutOfBoundsZipInArchiveInputStreamReproducer.java)
> [^ArrayOutOfBoundsZipInArchiveInputStreamReproducer.java]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)