szehon-ho commented on code in PR #15726:
URL: https://github.com/apache/iceberg/pull/15726#discussion_r2984269176


##########
spark/v4.1/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkMetadataColumns.java:
##########
@@ -343,6 +343,88 @@ public void testRowLineageColumnsResolvedInV3OrHigher() {
     }
   }
 
+  @TestTemplate
+  public void testPartitionMetadataColumnWithMapColumn() throws IOException {
+    assumeThat(fileFormat).isEqualTo(FileFormat.PARQUET);
+    assumeThat(formatVersion).isGreaterThanOrEqualTo(2);
+
+    Schema mapSchema =
+        new Schema(
+            Types.NestedField.required(1, "id", Types.LongType.get()),
+            Types.NestedField.required(2, "ts", Types.LongType.get()),
+            Types.NestedField.optional(
+                3,
+                "tags",
+                Types.MapType.ofOptional(4, 5, Types.StringType.get(), Types.StringType.get())));
+    PartitionSpec bucketSpec = PartitionSpec.builderFor(mapSchema).bucket("id", 1).build();
+
+    Map<String, String> properties = Maps.newHashMap();
+    properties.put(FORMAT_VERSION, String.valueOf(formatVersion));
+    properties.put(DEFAULT_FILE_FORMAT, FileFormat.PARQUET.name());
+    properties.put(PARQUET_VECTORIZATION_ENABLED, String.valueOf(vectorized));
+    // merge-on-read: DELETE writes position delete files instead of rewriting data files.
+    // This routes through SupportsDelta, which adds _partition to the scan projection.
+    properties.put("write.delete.mode", "merge-on-read");
+
+    String mapTableName = "test_map_partition_collision";
+    TestTables.create(
+        Files.createTempDirectory(temp, "junit").toFile(),
+        mapTableName,
+        mapSchema,
+        bucketSpec,
+        properties);
+
+    // Both rows in a single INSERT so they land in the same Parquet file.
+    // With both rows sharing a file, Spark uses merge-on-read, which adds
+    // _partition to the scan projection.
+    sql(
+        "INSERT INTO TABLE %s VALUES (1, 1000, map('env', 'prod')), (2, 9999999999999999, map('env', 'dev'))",
+        mapTableName);
+
+    sql("DELETE FROM %s WHERE ts < 9999999999999999", mapTableName);
+    assertThat(sql("SELECT id FROM %s", mapTableName)).hasSize(1);

Review Comment:
   sorry @antonlin1 to give you more trouble, but do you think we can select the _partition column as well in the test? The test file is 'SparkMetadataColumns' and the test name is testPartitionMetadataColumn, but it doesn't actually use any metadata columns.
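
   One way this could look, sketched against the test as written (a suggestion only, not tested code: it reuses the test class's existing `sql(...)` helper, and `id_bucket` is the default field name Iceberg derives for a `bucket("id", ...)` partition field):

   ```java
   // Hypothetical addition to the test above: read the _partition metadata
   // column explicitly via the test class's sql(...) helper.
   // bucket("id", 1) has a single bucket, so the surviving row's
   // _partition.id_bucket should be 0.
   List<Object[]> rows = sql("SELECT id, _partition.id_bucket FROM %s", mapTableName);
   assertThat(rows).hasSize(1);
   assertThat(rows.get(0)[1]).isEqualTo(0);
   ```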



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

