moomindani commented on issue #15347:
URL: https://github.com/apache/iceberg/issues/15347#issuecomment-4059177430
I investigated this issue and was unable to reproduce it on `main`. I wrote
the following test that disables stats for two columns (`int_field` and
`string_field`) and verifies a third column (`long_field`) still has stats:
```java
@Test
public void testMultipleColumnsStatisticsDisabled() throws Exception {
  Schema schema =
      new Schema(
          optional(1, "int_field", IntegerType.get()),
          optional(2, "string_field", Types.StringType.get()),
          optional(3, "long_field", Types.LongType.get()));

  File file = createTempFile(temp);

  // Build five records that populate all three columns.
  List<GenericData.Record> records = Lists.newArrayListWithCapacity(5);
  org.apache.avro.Schema avroSchema = AvroSchemaUtil.convert(schema.asStruct());
  for (int i = 1; i <= 5; i++) {
    GenericData.Record record = new GenericData.Record(avroSchema);
    record.put("int_field", i);
    record.put("string_field", "test");
    record.put("long_field", (long) i);
    records.add(record);
  }

  // Disable statistics for int_field and string_field; long_field keeps the default.
  write(
      file,
      schema,
      ImmutableMap.<String, String>builder()
          .put(PARQUET_COLUMN_STATS_ENABLED_PREFIX + "int_field", "false")
          .put(PARQUET_COLUMN_STATS_ENABLED_PREFIX + "string_field", "false")
          .buildOrThrow(),
      ParquetAvroWriter::buildWriter,
      records.toArray(new GenericData.Record[] {}));

  // Read the footer back and check the statistics of every column chunk.
  InputFile inputFile = Files.localInput(file);
  try (ParquetFileReader reader = ParquetFileReader.open(ParquetIO.file(inputFile))) {
    for (BlockMetaData block : reader.getFooter().getBlocks()) {
      for (ColumnChunkMetaData column : block.getColumns()) {
        boolean emptyStats = column.getStatistics().isEmpty();
        if (column.getPath().toDotString().equals("int_field")) {
          assertThat(emptyStats).as("int_field should not have statistics").isTrue();
        } else if (column.getPath().toDotString().equals("string_field")) {
          assertThat(emptyStats).as("string_field should not have statistics").isTrue();
        } else if (column.getPath().toDotString().equals("long_field")) {
          assertThat(emptyStats).as("long_field should have statistics").isFalse();
        }
      }
    }
  }
}
```
This test passes on `main`: stats are disabled for both `int_field` and
`string_field`, while `long_field` keeps its stats as expected.
Could you share more details about your setup (query engine, catalog type, how
the table properties are set)? The issue might be related to how properties are
propagated in your specific environment rather than the parquet writer itself.
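
To help check propagation on your side, here is a minimal, self-contained sketch that builds the per-column keys the test above sets and reports which columns are not explicitly disabled in a given property map. The class `ColumnStatsProps`, its methods, and the literal prefix value are hypothetical and only for illustration; in particular, I am assuming `PARQUET_COLUMN_STATS_ENABLED_PREFIX` resolves to `write.parquet.stats-enabled.column.`, so please verify that against `TableProperties` in your Iceberg version.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Iceberg.
class ColumnStatsProps {
  // Assumption: mirrors PARQUET_COLUMN_STATS_ENABLED_PREFIX; verify for your version.
  static final String ASSUMED_PREFIX = "write.parquet.stats-enabled.column.";

  // Builds one "<prefix><column>" -> "false" entry per column, in input order.
  static Map<String, String> disableStatsFor(List<String> columns) {
    Map<String, String> props = new LinkedHashMap<>();
    for (String column : columns) {
      props.put(ASSUMED_PREFIX + column, "false");
    }
    return props;
  }

  // Returns the columns whose key is missing or not exactly "false" in the given properties.
  static List<String> notDisabled(Map<String, String> actual, List<String> columns) {
    List<String> result = new ArrayList<>();
    for (String column : columns) {
      if (!"false".equals(actual.get(ASSUMED_PREFIX + column))) {
        result.add(column);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> columns = List.of("int_field", "string_field");
    Map<String, String> expected = disableStatsFor(columns);
    expected.forEach((key, value) -> System.out.println(key + "=" + value));

    // Simulate an environment where only one of the two properties arrived.
    Map<String, String> seen = Map.of(ASSUMED_PREFIX + "int_field", "false");
    System.out.println("not disabled: " + notDisabled(seen, columns));
  }
}
```

Passing the properties your engine actually reports for the table to `notDisabled` would show whether any of the per-column keys is being dropped or rewritten before it reaches the writer.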