Re: [PR] Spark: support read of partition metadata column when table is over 1k [iceberg]

via GitHub Wed, 03 Jul 2024 11:42:17 -0700


dramaticlly commented on code in PR #10547:
URL: https://github.com/apache/iceberg/pull/10547#discussion_r1664606896



##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java:
##########
@@ -366,15 +371,56 @@ public void pruneColumns(StructType requestedSchema) {
 
   private Schema schemaWithMetadataColumns() {
     // metadata columns
-    List<Types.NestedField> fields =
+    List<Types.NestedField> metadataFields =
         metaColumns.stream()
             .distinct()
             .map(name -> MetadataColumns.metadataColumn(table, name))
             .collect(Collectors.toList());
-    Schema meta = new Schema(fields);
+    Schema metadataSchema = calculateMetadataSchema(metadataFields);
 
     // schema or rows returned by readers
-    return TypeUtil.join(schema, meta);
+    return TypeUtil.join(schema, metadataSchema);
+  }
+
+  private Schema calculateMetadataSchema(List<Types.NestedField> 
metaColumnFields) {
+    Optional<Types.NestedField> partitionField =
+        metaColumnFields.stream()
+            .filter(f -> MetadataColumns.PARTITION_COLUMN_ID == f.fieldId())
+            .findFirst();
+
+    // only calculate potential column id collision if partition metadata 
column was requested
+    if (!partitionField.isPresent()) {
+      return new Schema(metaColumnFields);
+    }
+
+    Set<Integer> idsToReassign =

Review Comment:
   yeah I agree, it seem too big of block to put inside map, where within we 
are just assuming partition column is found. I updated to use `.get()` as 
suggested



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Spark: support read of partition metadata column when table is over 1k [iceberg]

Reply via email to