Re: [PR] feat: enforce 20-bit Huffman code-length limit consistently [commons-compress]

via GitHub Wed, 27 Aug 2025 03:22:52 -0700


ppkarwasz commented on code in PR #699:
URL: https://github.com/apache/commons-compress/pull/699#discussion_r2303512258



##########
src/main/java/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream.java:
##########
@@ -171,7 +175,6 @@ private static void hbCreateDecodeTables(final int[] limit, final int[] base, fi
         }
         for (int i = 0; i < alphaSize; i++) {
             final int len = length[i] + 1;
-            checkBounds(len, MAX_ALPHA_SIZE, "length");
             base[len]++;
         }
         for (int i = 1, b = base[0]; i < MAX_CODE_LEN; i++) {

Review Comment:
   Nice catch! :100:
   
   We actually need a bit more here: the cumulative-sum loop should iterate through the full length of `base`. The reasoning is:
   
   * At this stage we fill `base` with the cumulative counts `base[len]` = Σ<sub>l < len</sub> `count[l]`, where `count[l]` is the number of symbols whose code length is `l`. Later, when computing `limit[len]`, we need `count[len]` = `base[len + 1] - base[len]` for every length up to `MAX_CODE_LEN`, so reading `count[MAX_CODE_LEN]` requires index `MAX_CODE_LEN + 1` of `base` to be available. I commented the code in b3331775b813e0bf1e12e05f641c655928e31413 to make this clearer.
   * In e217afe0ac33638caf7d866daecd0eb81953429f I replaced bounds checks against theoretical maximums with checks against the actual array lengths.
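
The counting scheme described above can be sketched in Java. This is a minimal, hypothetical illustration, not the commons-compress code: the class `HuffmanTablesSketch` and its methods `checkIndex`, `cumulativeCounts`, `count` and `limits` are invented for this example, `MAX_CODE_LEN = 20` is assumed from the PR title, and sizing `base` at `MAX_CODE_LEN + 2` is an assumption chosen so that `base[MAX_CODE_LEN + 1]` exists. It shows the shifted histogram, the running sum over the full array, recovering `count[len]` as `base[len + 1] - base[len]`, and a bounds check against `base.length` rather than a theoretical maximum.

```java
/**
 * Hypothetical sketch (not the commons-compress API) of the cumulative-count step
 * used when building canonical Huffman decode tables.
 */
final class HuffmanTablesSketch {

    /** Assumed maximum code length, taken from the PR title (20-bit limit). */
    static final int MAX_CODE_LEN = 20;

    private HuffmanTablesSketch() {
    }

    /** Bounds check against the real array length instead of a theoretical maximum. */
    static int checkIndex(final int index, final int arrayLength, final String what) {
        if (index < 0 || index >= arrayLength) {
            throw new IllegalArgumentException(
                    what + " index " + index + " outside [0, " + arrayLength + ")");
        }
        return index;
    }

    /**
     * Returns base with base[len] = number of symbols whose code length is < len.
     * Sized MAX_CODE_LEN + 2 (an assumption) so that base[MAX_CODE_LEN + 1] exists.
     */
    static int[] cumulativeCounts(final int[] codeLengths) {
        final int[] base = new int[MAX_CODE_LEN + 2];
        // Histogram shifted by one: a symbol of length len is counted at index len + 1.
        for (final int len : codeLengths) {
            base[checkIndex(len + 1, base.length, "length")]++;
        }
        // Running sum over the full array, including the last index.
        for (int i = 1; i < base.length; i++) {
            base[i] += base[i - 1];
        }
        return base;
    }

    /** Number of symbols with code length len: count[len] = base[len + 1] - base[len]. */
    static int count(final int[] base, final int len) {
        return base[len + 1] - base[len];
    }

    /**
     * limit[len] = largest canonical code value of length len
     * (or first code of that length minus one if there are no such codes).
     */
    static int[] limits(final int[] base, final int minLen, final int maxLen) {
        final int[] limit = new int[MAX_CODE_LEN + 1];
        int code = 0;
        for (int len = minLen; len <= maxLen; len++) {
            code += count(base, len); // one past the last code of this length
            limit[len] = code - 1;    // largest code of this length
            code <<= 1;               // first code of length len + 1
        }
        return limit;
    }
}
```

For code lengths `{1, 2, 2}` this yields the canonical codes `0`, `10`, `11`, so `limit[1]` is 0 and `limit[2]` is 3; a length of 21 is rejected because index 22 falls outside `base`.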
   


