aokolnychyi commented on code in PR #4812:
URL: https://github.com/apache/iceberg/pull/4812#discussion_r921571789
##########
core/src/main/java/org/apache/iceberg/MetadataColumns.java:
##########
@@ -53,6 +53,8 @@ private MetadataColumns() {
public static final String DELETE_FILE_ROW_FIELD_NAME = "row";
public static final int DELETE_FILE_ROW_FIELD_ID = Integer.MAX_VALUE - 103;
public static final String DELETE_FILE_ROW_DOC = "Deleted row values";
+  public static final int POSITION_DELETE_TABLE_PARTITION_FIELD_ID =
+      Integer.MAX_VALUE - 104;
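For context, a rough sketch (not the PR's actual code) of how a reserved ID like this could be wired into a position deletes table schema; the `partitionType`/`rowType` parameters, column names, and nullability are illustrative assumptions:

```java
import org.apache.iceberg.MetadataColumns;
import org.apache.iceberg.Schema;
import org.apache.iceberg.types.Types;

public class PositionDeletesSchemaSketch {
  // Builds a hypothetical schema for the position deletes metadata table,
  // reusing the reserved delete-file field IDs plus the new reserved ID
  // for a static "partition" column.
  public static Schema positionDeletesSchema(Types.StructType partitionType,
                                             Types.StructType rowType) {
    return new Schema(
        MetadataColumns.DELETE_FILE_PATH,
        MetadataColumns.DELETE_FILE_POS,
        Types.NestedField.optional(
            MetadataColumns.DELETE_FILE_ROW_FIELD_ID, "row", rowType,
            MetadataColumns.DELETE_FILE_ROW_DOC),
        Types.NestedField.required(
            MetadataColumns.POSITION_DELETE_TABLE_PARTITION_FIELD_ID,
            "partition", partitionType,
            "Partition that the deleted row belongs to"));
  }
}
```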
Review Comment:
It seems we have a static column in the metadata table that we plan to
populate via the metadata column mechanism. That looks a bit suspicious
to me.
I feel we should pick one of these options:
- Have only the `path`, `pos`, and `row` columns in the table and rely on the
`_partition` and `_spec_id` metadata columns. That means we would have to
support filter pushdown on metadata columns (see the sketch after this list).
The Spark side is easy to handle, but we would also have to adapt ALL of our
binding code to allow binding predicates that reference metadata columns,
which would be a big change.
- Make `partition` and `spec_id` static columns and use `DataTask`.
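To make the first option concrete, here is a hedged sketch of a scan carrying predicates on metadata columns. The `_partition.data_bucket` field is hypothetical, and whether `newScan()` is the right entry point for the metadata table is an assumption; the point is only that such references must bind:

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.TableScan;
import org.apache.iceberg.expressions.Expressions;

public class MetadataColumnPushdownSketch {
  // Builds a scan with filters on metadata columns. Today, binding these
  // references fails because `_spec_id` and `_partition` are not part of
  // the table schema; option 1 would require our binding code to accept them.
  public static TableScan scanWithMetadataFilters(Table positionDeletesTable) {
    return positionDeletesTable.newScan()
        .filter(Expressions.equal("_spec_id", 0))
        // binding a nested reference into the `_partition` struct is the part
        // the current binding code does not allow for metadata columns
        .filter(Expressions.equal("_partition.data_bucket", 3));
  }
}
```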
I'd prefer to keep using `FileScanTask` for this effort so that we can
support vectorized reads, so the first option seems preferable.
Thoughts, @szehon-ho @RussellSpitzer @rdblue?