hussein-awala commented on code in PR #9902:
URL: https://github.com/apache/iceberg/pull/9902#discussion_r1518648135
##########
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkReaderWithBloomFilter.java:
##########
@@ -367,11 +374,28 @@ public void testReadWithFilter() {
           .filter(
               "id = 250 AND id_long = 1250 AND id_double = 10250.0 AND id_float = 100250.0"
                   + " AND id_string = 'BINARY测试_250' AND id_boolean = true AND id_date = '2021-09-05'"
-                  + " AND id_int_decimal = 77.77 AND id_long_decimal = 88.88 AND id_fixed_decimal = 99.99");
+                  + " AND id_int_decimal = 77.77 AND id_long_decimal = 88.88 AND id_fixed_decimal = 99.99"
+                  + " AND id_nested.nested_id = 250");
     record = SparkValueConverter.convert(table.schema(), df.collectAsList().get(0));
     assertThat(df.collectAsList()).as("Table should contain 1 row").hasSize(1);
     assertThat(record.get(0)).as("Table should contain expected rows").isEqualTo(250);
   }
+
+  @TestTemplate
+  public void testBloomCreation() throws IOException {
+    org.apache.hadoop.fs.Path path = new org.apache.hadoop.fs.Path(temp.toString());
+    ParquetMetadata parquetMetadata = ParquetFileReader.readFooter(new Configuration(), path);
+    for (int i = 0; i < 11; i++) {
+      if (useBloomFilter) {
+        assertThat(parquetMetadata.getBlocks().get(0).getColumns().get(i).getBloomFilterOffset())
+            .isNotEqualTo(-1L);
+      } else {
+        assertThat(parquetMetadata.getBlocks().get(0).getColumns().get(i).getBloomFilterOffset())
+            .isEqualTo(-1L);
+      }
+    }
+  }
Review Comment:
> I think in the Spark tests we should use the Spark APIs

I totally agree; I will update it.
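
For reference, a minimal sketch of what the footer-based check could look like via the non-deprecated Parquet reader entry point (ParquetFileReader.readFooter is deprecated). This is not the Spark-API version discussed above; the helper name assertBloomFilterState is hypothetical, and the path argument is assumed to point at one of the table's Parquet data files:

  import static org.assertj.core.api.Assertions.assertThat;

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.parquet.hadoop.ParquetFileReader;
  import org.apache.parquet.hadoop.metadata.BlockMetaData;
  import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
  import org.apache.parquet.hadoop.util.HadoopInputFile;

  // Sketch only: inspect a Parquet footer and assert bloom filter presence per column.
  private void assertBloomFilterState(org.apache.hadoop.fs.Path path, boolean useBloomFilter)
      throws IOException {
    Configuration conf = new Configuration();
    // Open the file and read its footer instead of calling the deprecated readFooter.
    try (ParquetFileReader reader = ParquetFileReader.open(HadoopInputFile.fromPath(path, conf))) {
      for (BlockMetaData block : reader.getFooter().getBlocks()) {
        for (ColumnChunkMetaData column : block.getColumns()) {
          // getBloomFilterOffset() returns -1 when no bloom filter was written for the column.
          if (useBloomFilter) {
            assertThat(column.getBloomFilterOffset()).isNotEqualTo(-1L);
          } else {
            assertThat(column.getBloomFilterOffset()).isEqualTo(-1L);
          }
        }
      }
    }
  }

Iterating over all blocks and columns also avoids hard-coding the column count, which the loop bound of 11 in the diff above currently does.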
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]