szehon-ho opened a new pull request, #4560: URL: https://github.com/apache/iceberg/pull/4560
https://github.com/apache/iceberg/pull/4516 fixes the schema of the partitions table when partition specs have changed, so that it uses Partitioning.partitionType (a union of all previous partition specs). However, the data is still wrong in some cases:

* The PartitionMap is constructed with the table's current spec, so some partitions collide in the map and are not returned.
  * Ref: https://github.com/apache/iceberg/blob/apache-iceberg-0.13.1/core/src/main/java/org/apache/iceberg/PartitionsTable.java#L99
  * Example: p1=foo, p2=bar collides with p1=foo if the PartitionMap is instantiated with the current spec of {p1}, and one of the two entries is lost.
* The partition values are just listed in order without any transformation, meaning they may end up in the wrong field of the unified schema.
  * Ref: https://github.com/apache/iceberg/blob/a78aa2dbdb98634f26066c457cc1aef93166be9f/core/src/main/java/org/apache/iceberg/PartitionsTable.java#L93
  * Example: p1=foo, p2=bar followed by p2=foo, p1=bar still comes back as (foo, bar), (foo, bar), so the second row has its values in the wrong order.

This fixes the problem by:

1. Instantiating the PartitionMap with Partitioning.partitionType, so all partition fields are used in the hashcode generation.
2. Transforming the PartitionData instances to fit the final schema, Partitioning.partitionType().
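To illustrate the two failure modes and the intended fix, here is a minimal, dependency-free sketch. It does not use Iceberg classes: specs are modelled as plain lists of field names, and `UnifiedPartitionSketch`, `unifiedFields`, `project`, and `keyByCurrentSpec` are hypothetical names invented for this example, not the code in this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class UnifiedPartitionSketch {

  // Hypothetical stand-in for Partitioning.partitionType:
  // the ordered union of field names across all specs.
  public static List<String> unifiedFields(List<List<String>> specs) {
    Set<String> union = new LinkedHashSet<>();
    for (List<String> spec : specs) {
      union.addAll(spec);
    }
    return new ArrayList<>(union);
  }

  // Re-positions values written under `spec` into the unified layout.
  // Fields the spec does not have become null.
  public static List<String> project(
      List<String> spec, List<String> values, List<String> unified) {
    List<String> row = new ArrayList<>();
    for (String field : unified) {
      int pos = spec.indexOf(field);
      row.add(pos < 0 ? null : values.get(pos));
    }
    return row;
  }

  // Models the buggy keying: only as many leading values as the
  // current spec has fields take part in the key.
  public static List<String> keyByCurrentSpec(List<String> values, int currentSpecSize) {
    int n = Math.min(currentSpecSize, values.size());
    return new ArrayList<>(values.subList(0, n));
  }

  public static void main(String[] args) {
    // Case 1: old spec {p1, p2}, current spec {p1}.
    List<String> oldSpec = Arrays.asList("p1", "p2");
    List<String> currentSpec = Arrays.asList("p1");
    List<String> oldValues = Arrays.asList("foo", "bar");
    List<String> currentValues = Arrays.asList("foo");
    List<String> unified = unifiedFields(Arrays.asList(oldSpec, currentSpec));

    Set<List<String>> buggyKeys = new HashSet<>();
    buggyKeys.add(keyByCurrentSpec(oldValues, currentSpec.size()));
    buggyKeys.add(keyByCurrentSpec(currentValues, currentSpec.size()));

    Set<List<String>> fixedKeys = new HashSet<>();
    fixedKeys.add(project(oldSpec, oldValues, unified));
    fixedKeys.add(project(currentSpec, currentValues, unified));

    System.out.println("keys with current spec: " + buggyKeys.size());
    System.out.println("keys with unified type: " + fixedKeys.size());

    // Case 2: same raw values, but the specs order the fields differently.
    List<String> specA = Arrays.asList("p1", "p2");
    List<String> specB = Arrays.asList("p2", "p1");
    List<String> raw = Arrays.asList("foo", "bar");
    List<String> unifiedAB = unifiedFields(Arrays.asList(specA, specB));

    System.out.println("spec A row: " + project(specA, raw, unifiedAB));
    System.out.println("spec B row: " + project(specB, raw, unifiedAB));
  }
}
```

In this model, keying by the current spec yields a single key for the two partitions of case 1, while keying by the projected row keeps them distinct. In case 2, projection places each value under its own field name rather than relying on positional order.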
