alliasgher commented on code in PR #878:
URL: https://github.com/apache/iceberg-go/pull/878#discussion_r3074672595


##########
table/internal/parquet_files.go:
##########
@@ -259,8 +259,27 @@ func (parquetFormat) GetWriteProperties(props iceberg.Properties) any {
                slog.Warn("unrecognized compression codec, falling back to uncompressed", "codec", compression)
        }
 
-       return append(writerProps, parquet.WithCompression(codec),
+       writerProps = append(writerProps, parquet.WithCompression(codec),
                parquet.WithCompressionLevel(compressionLevel))
+
+       // Bloom filter properties.
+       // write.parquet.bloom-filter-max-bytes caps the per-column bloom filter size.
+       bloomMaxBytes := props.GetInt(ParquetBloomFilterMaxBytesKey, ParquetBloomFilterMaxBytesDefault)
+       writerProps = append(writerProps, parquet.WithMaxBloomFilterBytes(int64(bloomMaxBytes)))
+
+       // write.parquet.bloom-filter-enabled.column.<col-name> enables bloom filters
+       // for individual columns. Scan all properties for the prefix.
+       prefix := ParquetBloomFilterColumnEnabledKeyPrefix + "."
+       for key, val := range props {
+               colName, ok := strings.CutPrefix(key, prefix)
+               if !ok || colName == "" {
+                       continue
+               }
+               enabled := strings.EqualFold(val, "true")

Review Comment:
   Added a comment in 9684d10 explaining this.
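
For readers following the hunk above, the prefix scan can be tried in isolation. This is only a minimal sketch, not the code in the PR: `bloomColumns` is a hypothetical helper, the properties are modelled as a plain `map[string]string`, and the prefix string is taken from the comment in the diff rather than from the `ParquetBloomFilterColumnEnabledKeyPrefix` constant. It needs Go 1.20 or later for `strings.CutPrefix`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// bloomColumns is a hypothetical helper (not part of iceberg-go). It returns,
// for every key of the form prefix+<col-name>, whether the value parses as
// "true" (case-insensitively). Keys without the prefix, or with an empty
// column name, are skipped.
func bloomColumns(props map[string]string, prefix string) map[string]bool {
	out := make(map[string]bool)
	for key, val := range props {
		colName, ok := strings.CutPrefix(key, prefix)
		if !ok || colName == "" {
			continue
		}
		out[colName] = strings.EqualFold(val, "true")
	}
	return out
}

func main() {
	// Assumed prefix, based on the property name quoted in the diff comment.
	const prefix = "write.parquet.bloom-filter-enabled.column."

	props := map[string]string{
		prefix + "id":   "TRUE",
		prefix + "name": "false",
		prefix:          "true", // empty column name, skipped
		"write.parquet.compression-codec": "zstd",
	}

	cols := bloomColumns(props, prefix)

	// Map iteration order is random in Go, so sort names before printing.
	names := make([]string, 0, len(cols))
	for name := range cols {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Println(name, cols[name])
	}
}
```

Because the scan iterates a map, the order in which columns are visited is not deterministic; that only matters if the resulting writer options are order-sensitive.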



##########
table/internal/parquet_files.go:
##########
@@ -259,8 +259,27 @@ func (parquetFormat) GetWriteProperties(props iceberg.Properties) any {
                slog.Warn("unrecognized compression codec, falling back to uncompressed", "codec", compression)
        }
 
-       return append(writerProps, parquet.WithCompression(codec),
+       writerProps = append(writerProps, parquet.WithCompression(codec),

Review Comment:
   Agreed, it is harmless as-is. Leaving it unconditional for now, since the parquet
writer ignores it when no bloom filters are written.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

