[GitHub] [iceberg] szehon-ho opened a new pull request, #4560: Core: Fix Partitions table for evolved partition specs

GitBox Thu, 14 Apr 2022 00:49:50 -0700


szehon-ho opened a new pull request, #4560:
URL: https://github.com/apache/iceberg/pull/4560

https://github.com/apache/iceberg/pull/4516 fixes the schema of the
partitions table in the case of changed partition specs to use
Partitioning.partitionType (a union of all previous partition specs), but the
data is still wrong in some cases:

* The PartitionMap is constructed with the table's current spec, so
partitions written under other specs can collide and get lost.
  * Ref:
  https://github.com/apache/iceberg/blob/apache-iceberg-0.13.1/core/src/main/java/org/apache/iceberg/PartitionsTable.java#L99.
  So some partitions may collide in the map and are not returned.
  * Example: p1=foo, p2=bar collides with p1=foo if the PartitionMap is
  instantiated with the current spec of {p1}, and one of the two entries is lost.
* The partition values are just listed in order without any transformation,
meaning they may end up in the wrong field in the unified schema.
  * Ref:
  https://github.com/apache/iceberg/blob/a78aa2dbdb98634f26066c457cc1aef93166be9f/core/src/main/java/org/apache/iceberg/PartitionsTable.java#L93.
  * Example: p1=foo, p2=bar and then p2=foo, p1=bar both come back as (foo,
  bar), (foo, bar), so the second is in the wrong order.
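
The collision described in the first bullet can be reproduced with plain Java collections. This is only a rough sketch, not the code in PartitionsTable: the class and method names are hypothetical, and partitions are modeled as simple name-to-value maps rather than Iceberg structs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in; none of these names exist in Iceberg.
public class PartitionCollisionSketch {

  // Builds a map key from only the fields named in keyFields.
  // A field that the partition does not have contributes null.
  static List<String> keyFor(Map<String, String> partition, List<String> keyFields) {
    List<String> key = new ArrayList<>();
    for (String field : keyFields) {
      key.add(partition.get(field));
    }
    return key;
  }

  // Counts how many entries survive when partitions are keyed by keyFields.
  static int distinctEntries(List<Map<String, String>> partitions, List<String> keyFields) {
    Map<List<String>, Map<String, String>> byKey = new HashMap<>();
    for (Map<String, String> partition : partitions) {
      // A colliding key overwrites the earlier entry.
      byKey.put(keyFor(partition, keyFields), partition);
    }
    return byKey.size();
  }

  public static void main(String[] args) {
    List<Map<String, String>> partitions = List.of(
        Map.of("p1", "foo", "p2", "bar"),  // written under an older spec {p1, p2}
        Map.of("p1", "foo"));              // written under the current spec {p1}

    // Keyed by the current spec only: both partitions map to [foo].
    System.out.println(distinctEntries(partitions, List.of("p1")));        // 1
    // Keyed by the union of all spec fields: [foo, bar] and [foo, null].
    System.out.println(distinctEntries(partitions, List.of("p1", "p2")));  // 2
  }
}
```

Keying by the union of fields keeps the two partitions apart because the field that is absent under the current spec still takes part in equality and hashing.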

This fixes the problem by:
1. Instantiating the PartitionMap with Partitioning.partitionType, so all
partition fields are used in the hashcode generation.
2. Transforming each PartitionData to fit the final schema,
Partitioning.partitionType().
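
As an illustration of step 2, the transformation amounts to moving each positional value into the slot its field occupies in the unified type. The sketch below is an assumption-laden stand-in rather than the PR's implementation: the names are hypothetical, and fields are matched by name for brevity, whereas Iceberg identifies partition fields by field ID.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in; none of these names exist in Iceberg.
public class PartitionProjectionSketch {

  // Places each value at the position its field has in unifiedFields.
  // Fields of the unified type that the spec lacks are left as null.
  static List<String> project(
      List<String> specFields, List<String> values, List<String> unifiedFields) {
    List<String> projected =
        new ArrayList<>(Collections.nCopies(unifiedFields.size(), (String) null));
    for (int i = 0; i < specFields.size(); i++) {
      int position = unifiedFields.indexOf(specFields.get(i));
      if (position >= 0) {
        projected.set(position, values.get(i));
      }
    }
    return projected;
  }

  public static void main(String[] args) {
    List<String> unified = List.of("p1", "p2");

    // Spec ordered (p1, p2) with values (foo, bar): already in unified order.
    System.out.println(project(List.of("p1", "p2"), List.of("foo", "bar"), unified)); // [foo, bar]
    // Spec ordered (p2, p1) with values (foo, bar): values are swapped into place.
    System.out.println(project(List.of("p2", "p1"), List.of("foo", "bar"), unified)); // [bar, foo]
    // Spec with only p1: the missing p2 slot stays null.
    System.out.println(project(List.of("p1"), List.of("foo"), unified));              // [foo, null]
  }
}
```

With this projection, the (foo, bar), (foo, bar) example from the description comes back as (foo, bar) and (bar, foo) in the unified field order.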

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
