emkornfield commented on a change in pull request #12274:
URL: https://github.com/apache/arrow/pull/12274#discussion_r793127355



##########
File path: cpp/src/parquet/encoding.cc
##########
@@ -1486,7 +1486,7 @@ class DictDecoderImpl : public DecoderImpl, virtual 
public DictDecoder<Type> {
       return;
     }
     uint8_t bit_width = *data;
-    if (ARROW_PREDICT_FALSE(bit_width >= 64)) {
+    if (ARROW_PREDICT_FALSE(bit_width > 32)) {
       throw ParquetException("Invalid or corrupted bit_width");

Review comment:
      Could we update the message here? I guess since this is failing today, 
we probably could never read bit widths greater than 32, but the spec under RLE 
encoding says:
   
   > This length restriction was not part of the Parquet 2.5.0 and earlier 
specifications, but longer runs were not readable by the most common Parquet 
implementations so, in practice, were not safe for Parquet writers to emit.
   
   It's not clear whether this also applies to the dictionary bit width. But I 
guess a dictionary of 2-billion-plus entries is probably untenable anyway.
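
   As a rough sketch of what a more descriptive check could look like (the
function name and exact message wording here are hypothetical, not from the
patch): dictionary indices are 32-bit integers, so any bit width above 32
cannot come from a valid writer, and the error can say so explicitly.

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>

// Hypothetical helper: rejects dictionary bit widths that no valid
// Parquet writer can emit, with a message that states the limit.
void CheckDictionaryBitWidth(uint8_t bit_width) {
  if (bit_width > 32) {
    std::ostringstream ss;
    ss << "Invalid or corrupted bit_width " << static_cast<int>(bit_width)
       << ". Maximum allowed is 32.";
    throw std::runtime_error(ss.str());
  }
}
```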




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]