Ayush Saxena created HIVE-29190:
-----------------------------------

             Summary: Iceberg: [V3] Fix working of Delete/Update with DV's
                 Key: HIVE-29190
                 URL: https://issues.apache.org/jira/browse/HIVE-29190
             Project: Hive
          Issue Type: Bug
            Reporter: Ayush Saxena


Currently, if we run a DELETE or UPDATE on a V3 table and the DataFile being 
operated on already has a deletion vector (DV), the subsequent queries fail:
{noformat}
 org.apache.hadoop.hive.ql.exec.tez.TezRuntimeException: Vertex failed, 
vertexName=Map 1, vertexId=vertex_1757524880780_0001_5_00, diagnostics=[Vertex 
vertex_1757524880780_0001_5_00 [Map 1] killed/failed due 
to:ROOT_INPUT_INIT_FAILURE, Vertex Input: ice01 initializer failed, 
vertex=vertex_1757524880780_0001_5_00 [Map 1], 
org.apache.iceberg.exceptions.ValidationException: Can't index multiple DVs for 
hdfs://localhost:51198/build/ql/test/data/warehouse/ice01/data/00000-0-ayushsaxena_20250910102137_22c067b7-d899-4ecf-9832-c167f7d402a6-job_17575248807800_0001-1-00001.orc:
 
DV{location=hdfs://localhost:51198/build/ql/test/data/warehouse/ice01/data/00000-0-ayushsaxena_20250910102141_dc808c4f-746c-4a38-b2dc-ba6b8d719f44-job_17575248807800_0001-2-00001-pos-deletes.orc,
 offset=4, length=42, 
referencedDataFile=hdfs://localhost:51198/build/ql/test/data/warehouse/ice01/data/00000-0-ayushsaxena_20250910102137_22c067b7-d899-4ecf-9832-c167f7d402a6-job_17575248807800_0001-1-00001.orc}
 and 
DV{location=hdfs://localhost:51198/build/ql/test/data/warehouse/ice01/data/00000-0-ayushsaxena_20250910102142_300e55da-c854-415e-854b-6a0b9ac641da-job_17575248807800_0001-3-00001-pos-deletes.orc,
 offset=4, length=44, 
referencedDataFile=hdfs://localhost:51198/build/ql/test/data/warehouse/ice01/data/00000-0-ayushsaxena_20250910102137_22c067b7-d899-4ecf-9832-c167f7d402a6-job_17575248807800_0001-1-00001.orc}
        at org.apache.iceberg.DeleteFileIndex$Builder.add(DeleteFileIndex.java:509)
        at org.apache.iceberg.DeleteFileIndex$Builder.build(DeleteFileIndex.java:481)
        at org.apache.iceberg.ManifestGroup.plan(ManifestGroup.java:185)
        at org.apache.iceberg.ManifestGroup.planFiles(ManifestGroup.java:172)
        at org.apache.iceberg.DataTableScan.doPlanFiles(DataTableScan.java:90)
        at org.apache.iceberg.SnapshotScan.planFiles(SnapshotScan.java:139)
        at org.apache.iceberg.BaseTableScan.planTasks(BaseTableScan.java:44)
        at org.apache.iceberg.DataTableScan.planTasks(DataTableScan.java:26)
        at org.apache.iceberg.mr.mapreduce.IcebergInputFormat.generateInputSplits(IcebergInputFormat.java:230)
        at org.apache.iceberg.mr.mapreduce.IcebergInputFormat.planInputSplits(IcebergInputFormat.java:199)
        at org.apache.iceberg.mr.mapreduce.IcebergInputFormat.getSplits(IcebergInputFormat.java:172)
        at org.apache.iceberg.mr.mapred.MapredIcebergInputFormat.getSplits(MapredIcebergInputFormat.java:69)
        at org.apache.iceberg.mr.hive.HiveIcebergInputFormat.getSplits(HiveIcebergInputFormat.java:167)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:585)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:880)
        at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:363)
{noformat}
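The reason is that Iceberg V3 allows only one DV per DataFile. Here, the 
second delete wrote a new DV for a DataFile that already had one, so scan 
planning finds two DVs for the same file and rejects them.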

Related Iceberg code:

https://github.com/apache/iceberg/blob/720ef99720a1c59e4670db983c951243dffc4f3e/core/src/main/java/org/apache/iceberg/DeleteFileIndex.java#L507-L509
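To illustrate the constraint, here is a minimal, self-contained sketch of a "one DV per data file" index. It is not the Iceberg implementation: the class names DvIndexSketch, Dv and DvIndex are hypothetical, and IllegalStateException stands in for the ValidationException seen in the trace above. Only the error message format is taken from that trace.

```java
import java.util.HashMap;
import java.util.Map;

public class DvIndexSketch {

  /** Hypothetical stand-in for a deletion vector entry; not an Iceberg class. */
  static final class Dv {
    final String location;
    final String referencedDataFile;

    Dv(String location, String referencedDataFile) {
      this.location = location;
      this.referencedDataFile = referencedDataFile;
    }
  }

  /** Simplified index keyed by data file path: at most one DV per data file. */
  static final class DvIndex {
    private final Map<String, Dv> dvByPath = new HashMap<>();

    void add(Dv dv) {
      // putIfAbsent returns the previously registered DV, if there was one.
      Dv existing = dvByPath.putIfAbsent(dv.referencedDataFile, dv);
      if (existing != null) {
        // Assumption: the real code throws ValidationException here.
        throw new IllegalStateException(String.format(
            "Can't index multiple DVs for %s: %s and %s",
            dv.referencedDataFile, existing.location, dv.location));
      }
    }

    int size() {
      return dvByPath.size();
    }
  }

  public static void main(String[] args) {
    DvIndex index = new DvIndex();
    // The first delete writes a DV for the data file.
    index.add(new Dv("job-2-pos-deletes.orc", "data/00001.orc"));
    try {
      // A second delete writes another DV for the same data file.
      index.add(new Dv("job-3-pos-deletes.orc", "data/00001.orc"));
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Under this model, a second DELETE or UPDATE on the same DataFile would need to leave exactly one live DV for that file, for example by replacing the earlier DV with one that also covers the previously deleted positions. That is one possible direction, not necessarily the fix adopted for this issue.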



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
