amogh-jahagirdar commented on issue #10008:
URL: https://github.com/apache/iceberg/issues/10008#issuecomment-2014209251
I looked into this a bit and I think I know the problem. Here's a sample
test that can be added to `TestAddFilesProcedure` to repro
```
@TestTemplate
public void addFilesPartitionEvolved() {
  // Target Iceberg table starts partitioned by (p1), then evolves to (p1, p2).
  createIcebergTable(
      "p1 int, p2 int, data int not null", "PARTITIONED BY (p1)");
  sql("ALTER TABLE %s ADD PARTITION FIELD p2", tableName);

  // Source Parquet table uses Hive-style partitioning by (p1, p2).
  String createParquet =
      "CREATE TABLE %s (p1 int, p2 int, data int) USING %s "
          + "PARTITIONED BY (p1, p2) LOCATION '%s'";
  sql(createParquet, sourceTableName, "parquet",
      fileTableDir.getAbsolutePath());
  sql("INSERT INTO %s PARTITION (p1=1, p2=10) VALUES (100)",
      sourceTableName);

  // Import the source files, then read the Iceberg table back.
  List<Object[]> result =
      sql(
          "CALL %s.system.add_files('%s', '%s')",
          catalogName, tableName, sourceTableName);
  sql("SELECT * FROM %s", tableName);
}
```
When we import the partitions, we derive an Iceberg partition spec from the
Hive-style partitioning here
https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java#L430.
This derived partition spec gets a spec ID of 0, the same ID as the spec the
Iceberg table was originally created with.
This is the spec that gets used when writing the manifests here
https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java#L350
But in the target Iceberg table, the spec with (p1, p2) actually has ID 1.
I'll need to think more about what the right solution is, but on the surface
it seems like the right thing to do here is to
1.) Derive the partition spec from the source table partitioning.
2.) See if that same partition spec exists in the target table.
3.) If so, build a copy of the derived partition spec but with the spec ID
it has in the target table.
But that seems like too specific a fix for this. I'm also not sure how the
procedure behaves if the partition spec on the target is completely
different.
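
To make the three steps above concrete, here is a rough, self-contained sketch.
It does not use the Iceberg API: `Spec` and `SpecIdResolver` are hypothetical
stand-ins I made up for illustration, and a spec is modeled as just an ID plus
an ordered list of partition field names. In the real code the lookup would
presumably run over the target table's known specs, and "same spec" would need
a proper compatibility check rather than comparing names.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

final class SpecIdResolver {

  /** Hypothetical simplified model, NOT org.apache.iceberg.PartitionSpec. */
  static final class Spec {
    final int specId;
    final List<String> fields;

    Spec(int specId, List<String> fields) {
      this.specId = specId;
      this.fields = List.copyOf(fields);
    }

    @Override
    public String toString() {
      return "Spec(id=" + specId + ", fields=" + fields + ")";
    }
  }

  /**
   * Step 2 and 3: look for a target spec with the same partition fields as the
   * derived spec and, if found, return a copy of the derived spec that carries
   * the target's spec ID. Returns empty when the target has no matching spec.
   */
  static Optional<Spec> resolve(Spec derived, Map<Integer, Spec> targetSpecs) {
    return targetSpecs.values().stream()
        .filter(candidate -> candidate.fields.equals(derived.fields))
        .findFirst()
        .map(match -> new Spec(match.specId, derived.fields));
  }

  public static void main(String[] args) {
    // Target table from the repro: spec 0 is (p1), spec 1 is (p1, p2).
    Map<Integer, Spec> targetSpecs = new TreeMap<>();
    targetSpecs.put(0, new Spec(0, List.of("p1")));
    targetSpecs.put(1, new Spec(1, List.of("p1", "p2")));

    // Step 1: the spec derived from the source partitioning defaults to ID 0.
    Spec derived = new Spec(0, List.of("p1", "p2"));

    System.out.println(resolve(derived, targetSpecs));
    // A spec the target does not know about yields an empty result.
    System.out.println(resolve(new Spec(0, List.of("p2")), targetSpecs));
  }
}
```

The empty case is the open question from above: the sketch only reports that
no matching spec exists and leaves it to the caller whether to fail or do
something else.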