huaxingao commented on code in PR #10149:
URL: https://github.com/apache/iceberg/pull/10149#discussion_r1597722756
##########
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/data/TestSparkParquetWriter.java:
##########
@@ -116,4 +128,27 @@ public void testCorrectness() throws IOException {
assertThat(rows).as("Should not have extra rows").isExhausted();
}
}
+
+  @Test
+  public void testFpp() throws IOException, NoSuchFieldException, IllegalAccessException {
+    File testFile = File.createTempFile("junit", null, temp.toFile());
+    try (FileAppender<InternalRow> writer =
+        Parquet.write(Files.localOutput(testFile))
+            .schema(SCHEMA)
+            .set(PARQUET_BLOOM_FILTER_COLUMN_ENABLED_PREFIX + "id", "true")
+            .set(PARQUET_BLOOM_FILTER_COLUMN_FPP_PREFIX + "id", "0.05")
+            .createWriterFunc(
+                msgType ->
+                    SparkParquetWriters.buildWriter(SparkSchemaUtil.convert(SCHEMA), msgType))
+            .build()) {
+      // Using reflection to access the private 'props' field in ParquetWriter
+      Field propsField = writer.getClass().getDeclaredField("props");
+      propsField.setAccessible(true);
+      ParquetProperties props = (ParquetProperties) propsField.get(writer);
+      MessageType parquetSchema = ParquetSchemaUtil.convert(SCHEMA, "test");
+      ColumnDescriptor descriptor = parquetSchema.getColumnDescription(new String[] {"id"});
+      double fpp = props.getBloomFilterFPP(descriptor).getAsDouble();
+      assertThat(fpp).isEqualTo(0.05);
Review Comment:
parquet-mr takes the `bloomFilterFPPs` in `ParquetProperties` and uses them to
build the bloom filters, so checking `bloomFilterFPPs` in `ParquetProperties`
is sufficient to verify that the bloom filter FPP is set correctly in Iceberg.
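
For context (not part of this change), a minimal standalone sketch of the same check without reflection: the per-column FPP configured on `ParquetProperties.Builder` is what parquet-mr later reads back via `getBloomFilterFPP` when building the bloom filter. The schema, the `id` column, and the 0.05 value simply mirror the test above; `withBloomFilterEnabled`/`withBloomFilterFPP` are the parquet-mr builder methods assumed to back these write options (available in the parquet-mr version this PR already relies on).

```java
import org.apache.parquet.column.ColumnDescriptor;
import org.apache.parquet.column.ParquetProperties;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;
import org.apache.parquet.schema.Types;

public class BloomFilterFppSketch {
  public static void main(String[] args) {
    // A one-column parquet schema equivalent to the "id" column in the test.
    MessageType schema =
        Types.buildMessage()
            .required(PrimitiveTypeName.INT64)
            .named("id")
            .named("test");

    // Set the per-column bloom filter settings directly on the builder for
    // illustration; the writer's FPP option ultimately has to land here for
    // parquet-mr to pick it up when sizing the bloom filter.
    ParquetProperties props =
        ParquetProperties.builder()
            .withBloomFilterEnabled("id", true)
            .withBloomFilterFPP("id", 0.05)
            .build();

    ColumnDescriptor descriptor = schema.getColumnDescription(new String[] {"id"});
    // Same lookup the test performs on the props obtained via reflection.
    double fpp = props.getBloomFilterFPP(descriptor).getAsDouble();
    System.out.println("fpp = " + fpp); // prints 0.05
  }
}
```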