szehon-ho opened a new pull request, #4560:
URL: https://github.com/apache/iceberg/pull/4560

   https://github.com/apache/iceberg/pull/4516 fixes the schema of the 
partitions table in the case of changed partition specs to use 
Partitioning.partitionType (a union of all previous partition specs), but the 
data is still wrong in some cases:
   
   * The PartitionMap is constructed with the table's current spec, so 
partitions that differ only in fields outside that spec collide in the map. 
All but one of the colliding entries are lost and never returned. 
     * Ref:  
https://github.com/apache/iceberg/blob/apache-iceberg-0.13.1/core/src/main/java/org/apache/iceberg/PartitionsTable.java#L99
     * Example:  p1=foo, p2=bar collides with p1=foo if PartitionMap is 
instantiated with the current spec of {p1}, and one of the two entries is lost.
   * The partition values are just listed in order without any transformation, 
meaning they may end up in the wrong field of the unified schema.
     * Ref: 
https://github.com/apache/iceberg/blob/a78aa2dbdb98634f26066c457cc1aef93166be9f/core/src/main/java/org/apache/iceberg/PartitionsTable.java#L93
     * Example: p1=foo, p2=bar and then p2=foo, p1=bar still come back as 
(foo, bar), (foo, bar), so the second row is in the wrong order.
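
The collision in the first bullet can be reproduced with a small, language-neutral sketch. This is plain Python, not Iceberg code: the dictionaries, the `key` helper, and the field lists are hypothetical stand-ins for the map that PartitionsTable builds, assuming the map key is derived only from the fields of the type it was instantiated with.

```python
# Hypothetical stand-ins; none of these names come from Iceberg.
current_spec_fields = ["p1"]        # the table's current spec: {p1}
unified_fields = ["p1", "p2"]       # union of all specs the table has had

partitions = [
    {"p1": "foo", "p2": "bar"},     # written under the older spec {p1, p2}
    {"p1": "foo"},                  # written under the current spec {p1}
]

def key(partition, fields):
    """Build a map key from only the given fields (missing fields become None)."""
    return tuple(partition.get(f) for f in fields)

# Keyed by the current spec: both partitions map to ("foo",), so one is overwritten.
by_current = {key(p, current_spec_fields): p for p in partitions}

# Keyed by the unified type: ("foo", "bar") and ("foo", None) stay distinct.
by_unified = {key(p, unified_fields): p for p in partitions}
```

Under these assumptions `by_current` holds a single entry while `by_unified` holds two, which mirrors the lost partition described above.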
   
   This fixes the problem by:
   1. Instantiating the PartitionMap with Partitioning.partitionType, so the 
fields of all specs are used in the hashcode generation.
   2. Transforming each PartitionData to fit the final schema, 
Partitioning.partitionType().
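
The second step can be pictured as re-projecting each positional tuple onto the unified field order. The sketch below is plain Python rather than the PR's Java code; `to_unified` is a hypothetical helper, and matching fields by name is an assumption made for brevity (the real change may well match on partition field IDs instead).

```python
# Hypothetical helper; not part of Iceberg's API.
def to_unified(values, spec_fields, unified_fields):
    """Re-project a positional partition tuple onto the unified field order.

    Fields that the originating spec does not have are filled with None.
    """
    by_name = dict(zip(spec_fields, values))
    return tuple(by_name.get(f) for f in unified_fields)

unified = ("p1", "p2")

# Spec ordered (p1, p2) with values (foo, bar): already in unified order.
row_a = to_unified(("foo", "bar"), ("p1", "p2"), unified)

# Spec ordered (p2, p1) with values (foo, bar): must be swapped to (bar, foo).
row_b = to_unified(("foo", "bar"), ("p2", "p1"), unified)

# Spec with only p1: the missing p2 becomes None.
row_c = to_unified(("foo",), ("p1",), unified)
```

With this projection the two rows from the example above become (foo, bar) and (bar, foo) instead of two identical (foo, bar) tuples.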


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

