nreich opened a new issue, #4565:
URL: https://github.com/apache/iceberg/issues/4565
Iceberg version: 0.13.1
Spark version: 3.1.2
catalog type: dynamodb
file io: S3FileIO
When trying to run the Spark expire-snapshots action, I receive the following
exception:
```
java.lang.IndexOutOfBoundsException: index (1) must be less than size (1)
	at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:1343)
	at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:1325)
	at org.apache.iceberg.relocated.com.google.common.collect.SingletonImmutableList.get(SingletonImmutableList.java:43)
	at org.apache.iceberg.ManifestsTable.partitionSummariesToRows(ManifestsTable.java:115)
	at org.apache.iceberg.ManifestsTable.manifestFileToRow(ManifestsTable.java:98)
	at org.apache.iceberg.AllManifestsTable$ManifestListReadTask.lambda$rows$0(AllManifestsTable.java:176)
	...
```
The immediate cause seems to be that the PartitionSpec passed into
`ManifestsTable.partitionSummariesToRows` has only a single field, while the
`List<PartitionFieldSummary>` passed in has two entries (instances of
`GenericPartitionFieldSummary`) that appear to be identical.
I have found a number of manifest files that hit this exception. I believe the
issue may be related to having added a new partition field and then later
removing it (while retaining the underlying column that was partitioned on).
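The shape of the failure can be reproduced in isolation: pairing two partition
field summaries with a spec that now holds a single field walks past the end of
the one-element list, just as in the stack trace above. A minimal sketch in
plain Java (hypothetical names, not Iceberg's actual code):

```java
import java.util.List;

public class SpecMismatchDemo {
    // Returns true if pairing each summary with a spec field by index would
    // read past the end of the spec list, mirroring the reported exception.
    static boolean specTooShort(List<String> specFields, List<String> summaries) {
        try {
            for (int i = 0; i < summaries.size(); i++) {
                specFields.get(i); // index 1 on a 1-element list throws here
            }
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Current spec after removing id_trunc_5000000: one field remains.
        List<String> spec = List.of("primary_key_id");
        // An older manifest still summarizes both historical partition fields.
        List<String> summaries = List.of("primary_key_id", "id_trunc_5000000");
        System.out.println(specTooShort(spec, summaries)); // prints true
    }
}
```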
This is the current structure of the table:
```
table {
  1: id: optional long
  2: internal_id: optional long
  3: fields: optional string
  4: created_at: optional timestamptz
  5: updated_at: optional timestamptz
  6: primary_key_id: required long
}
```
spec:
```
[
  1000: primary_key_id: identity(6)
]
```
In the snapshot(s) that hit this issue, however, the manifest files that cause
the exception contain this partition struct:
```
{
  "id": 102,
  "name": "partition",
  "required": true,
  "type": {
    "type": "struct",
    "fields": [
      {
        "id": 1000,
        "name": "primary_key_id",
        "required": false,
        "type": "long"
      },
      {
        "id": 1001,
        "name": "id_trunc_5000000",
        "required": false,
        "type": "long"
      }
    ]
  }
}
```
This matches past actions: I had previously added the 1001 `id` partition
field, which I have since removed from the table by calling
`table.updateSpec().removeField("id_trunc_5000000").commit()` on the Table
object. I am still able to read from and write to the current snapshot, as
well as compact it. I have not yet tried to interact directly with the
snapshot(s) that cause this exception.
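One plausible reading of the failure is that the metadata-table code pairs the
summaries from an old manifest with the table's current spec rather than the
spec the manifest was written under. A defensive sketch in plain Java
(hypothetical names and behavior, purely illustrative, not Iceberg's code or
its actual fix) that bounds the pairing by the shorter list:

```java
import java.util.ArrayList;
import java.util.List;

public class SafeSummaries {
    // Hypothetical guard: only pair as many summaries as the spec has fields,
    // so a spec shortened by removeField() can't cause an out-of-bounds read.
    static List<String> summariesToRows(List<String> specFields, List<String> summaries) {
        int n = Math.min(specFields.size(), summaries.size());
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            rows.add(specFields.get(i) + "=" + summaries.get(i));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> spec = List.of("primary_key_id");
        List<String> summaries = List.of("primary_key_id", "id_trunc_5000000");
        // The second, removed field's summary is simply skipped.
        System.out.println(summariesToRows(spec, summaries));
    }
}
```

A guard like this drops information from the extra summary, of course; the
more principled approach would be to resolve each manifest against the spec id
it was written with.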
Is removing a partition field not allowed, or not allowed in the way I did it?
Having ended up in this state, is there a safe way for me to repair the state
of this table? Is there any additional information I can provide that would
help pinpoint this issue?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]