zhangjiashen commented on code in PR #1184:
URL: https://github.com/apache/parquet-mr/pull/1184#discussion_r1396702452
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java:
##########
@@ -1347,11 +1348,24 @@ public BloomFilter readBloomFilter(ColumnChunkMetaData meta) throws IOException
}
}
- // Read Bloom filter data header.
+ // Seek to Bloom filter offset.
f.seek(bloomFilterOffset);
+
+ // Read Bloom filter length.
+ int bloomFilterLength = meta.getBloomFilterLength();
+
+ // If it is set, read Bloom filter header and bitset together.
+ // Otherwise, read Bloom filter header first and then bitset.
+ InputStream in = null;
+ if (bloomFilterLength > 0) {
+ byte[] headerAndBitSet = new byte[bloomFilterLength];
+ f.readFully(headerAndBitSet);
+ in = new ByteArrayInputStream(headerAndBitSet);
+ }
+
BloomFilterHeader bloomFilterHeader;
try {
- bloomFilterHeader = Util.readBloomFilterHeader(f, bloomFilterDecryptor, bloomFilterHeaderAAD);
+ bloomFilterHeader = Util.readBloomFilterHeader(in != null ? in : f, bloomFilterDecryptor, bloomFilterHeaderAAD);
Review Comment:
Splitting this into two methods would make the code harder to read. I
changed the code a little to avoid several of the checks; please check
whether it makes sense.
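
For reference, the pattern in the hunk above can be sketched in isolation as
follows. This is only an illustration, not the code in the PR: the class
`BloomFilterReadSketch` and the helper `openBloomFilterStream` are
hypothetical names, a plain `DataInputStream` stands in for the file stream
`f`, and a non-positive length is assumed to mean "length not set".

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of "read header and bitset together when the length is known".
final class BloomFilterReadSketch {

  // Returns the stream that the header (and later the bitset) should be read from.
  // Assumption: knownLength <= 0 means the writer did not record a length.
  static InputStream openBloomFilterStream(DataInputStream file, int knownLength)
      throws IOException {
    if (knownLength > 0) {
      // One read covers header + bitset; later parsing happens in memory.
      byte[] headerAndBitSet = new byte[knownLength];
      file.readFully(headerAndBitSet);
      return new ByteArrayInputStream(headerAndBitSet);
    }
    // Fallback: keep reading header, then bitset, directly from the file stream.
    return file;
  }

  public static void main(String[] args) throws IOException {
    byte[] fileBytes = {1, 2, 3, 4, 5, 6};
    DataInputStream file = new DataInputStream(new ByteArrayInputStream(fileBytes));

    // Length known: the first 4 bytes are buffered, the file stream advances past them.
    InputStream buffered = openBloomFilterStream(file, 4);
    System.out.println("buffered bytes: " + buffered.available());
    System.out.println("left in file: " + file.available());

    // Length unknown: the caller gets the file stream itself back.
    InputStream direct = openBloomFilterStream(file, -1);
    System.out.println("same stream: " + (direct == file));
  }
}
```

With a single stream chosen up front, the callers that parse the header and
the bitset would not need their own `in != null ? in : f` check.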