zhongyujiang commented on code in PR #6118:
URL: https://github.com/apache/iceberg/pull/6118#discussion_r1015035815
##########
parquet/src/main/java/org/apache/iceberg/parquet/ParquetUtil.java:
##########
@@ -75,23 +76,27 @@ public static Metrics fileMetrics(InputFile file,
MetricsConfig metricsConfig) {
public static Metrics fileMetrics(
InputFile file, MetricsConfig metricsConfig, NameMapping nameMapping) {
try (ParquetFileReader reader =
ParquetFileReader.open(ParquetIO.file(file))) {
- return footerMetrics(reader.getFooter(), Stream.empty(), metricsConfig,
nameMapping);
+ return footerMetrics(reader.getFooter(), Stream.empty(), metricsConfig,
nameMapping, null);
Review Comment:
This is used to migrate external files to Iceberg tables, I am thinking if
those files are generated by Iceberg, we may still not be able to collect
metrics according to the correct metric mode with current change.
To solve this , I think we can change TableMigrationUtil to pass Iceberg
schema also;
Or we can update MetricsConfig to make it tracks mapping between column ids
metric modes, and use column ids to get metric modes directly when collecting
parquet metrics, but that will need all writers to update from
`MetricsConfig.fromProperties(Map<String, String> props)` to `MetricsConfig
from(Map<String, String> props, Schema schema, SortOrder order)` (this is a
private method now).
I prefer the second way and I can update the releated code in this repo, but
I don't know if any external projects uses
`MetricsConfig.fromProperties(Map<String, String> props)` too. What do you
think about it? @rdblue
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]