amogh-jahagirdar commented on code in PR #9902:
URL: https://github.com/apache/iceberg/pull/9902#discussion_r1518451041
##########
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkReaderWithBloomFilter.java:
##########
@@ -367,11 +374,28 @@ public void testReadWithFilter() {
             .filter(
                 "id = 250 AND id_long = 1250 AND id_double = 10250.0 AND id_float = 100250.0"
+                    + " AND id_string = 'BINARY测试_250' AND id_boolean = true AND id_date = '2021-09-05'"
-                    + " AND id_int_decimal = 77.77 AND id_long_decimal = 88.88 AND id_fixed_decimal = 99.99");
+                    + " AND id_int_decimal = 77.77 AND id_long_decimal = 88.88 AND id_fixed_decimal = 99.99"
+                    + " AND id_nested.nested_id = 250");
     record = SparkValueConverter.convert(table.schema(), df.collectAsList().get(0));
     assertThat(df.collectAsList()).as("Table should contain 1 row").hasSize(1);
     assertThat(record.get(0)).as("Table should contain expected rows").isEqualTo(250);
   }
+
+  @TestTemplate
+  public void testBloomCreation() throws IOException {
+    org.apache.hadoop.fs.Path path = new org.apache.hadoop.fs.Path(temp.toString());
+    ParquetMetadata parquetMetadata = ParquetFileReader.readFooter(new Configuration(), path);
+    // Check each of the 11 columns (the original used get(0) in every iteration,
+    // leaving the loop index unused)
+    for (int i = 0; i < 11; i++) {
+      if (useBloomFilter) {
+        assertThat(parquetMetadata.getBlocks().get(0).getColumns().get(i).getBloomFilterOffset())
+            .isNotEqualTo(-1L);
+      } else {
+        assertThat(parquetMetadata.getBlocks().get(0).getColumns().get(i).getBloomFilterOffset())
+            .isEqualTo(-1L);
+      }
+    }
+  }
Review Comment:
I think this is great validation to add, but in the Spark tests we should use
the Spark APIs or Spark SQL to perform the write, and then run the validation
to make sure the bloom filters exist. That should help catch why we're not
seeing bloom filters written for nested types when writing via Spark (going
through `FileAppender` for the writes masks that issue).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]