nreich opened a new issue, #4565:
URL: https://github.com/apache/iceberg/issues/4565

   Iceberg version: 0.13.1
   Spark version: 3.1.2
   catalog type: dynamodb
   file io: S3FileIO
   When trying to run the spark expire snapshot actions I receive the following 
exception:
   ```
   java.lang.IndexOutOfBoundsException: index (1) must be less than size (1)
        at 
org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:1343)
        at 
org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:1325)
        at 
org.apache.iceberg.relocated.com.google.common.collect.SingletonImmutableList.get(SingletonImmutableList.java:43)
        at 
org.apache.iceberg.ManifestsTable.partitionSummariesToRows(ManifestsTable.java:115)
        at 
org.apache.iceberg.ManifestsTable.manifestFileToRow(ManifestsTable.java:98)
        at 
org.apache.iceberg.AllManifestsTable$ManifestListReadTask.lambda$rows$0(AllManifestsTable.java:176)
        ...
   ```
   The acute cause seems to be that the PartitionSpec passed into 
`ManifestsTable.partitionSummariesToRows` has only a single field, while the 
List<PartitionFieldSummary> passed alongside it has two entries (both instances 
of GenericPartitionFieldSummary, which appear to be identical to each other).
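   The mechanism can be shown in isolation: `partitionSummariesToRows` looks up the spec's field list by each summary's position, so a manifest carrying more field summaries than the current spec has fields overruns the list. A minimal sketch with plain Java lists (standing in for the relocated Guava `SingletonImmutableList`; the field names are just the ones from this table):
   ```java
   import java.util.List;

   public class SpecSummaryMismatch {
       public static void main(String[] args) {
           // Current table spec: one partition field.
           List<String> specFields = List.of("primary_key_id");
           // Older manifest: summaries for two fields, one since removed.
           List<String> summaries = List.of("primary_key_id", "id_trunc_5000000");

           // Mirrors the positional lookup in partitionSummariesToRows:
           // the summary's index is used to fetch the spec field.
           for (int i = 0; i < summaries.size(); i++) {
               // Throws IndexOutOfBoundsException at i == 1, because
               // specFields has only one element -- the same
               // "index (1) must be less than size (1)" seen above.
               String field = specFields.get(i);
               System.out.println(field + " <- " + summaries.get(i));
           }
       }
   }
   ```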
   
   I have found a number of manifest files that hit this exception. I believe 
the issue may stem from having added a new partition field and later removing 
it from the spec (while retaining the underlying table column).
   
   This is the current structure of the table:
   ```
   table {
     1: id: optional long
     2: internal_id: optional long
     3: fields: optional string
     4: created_at: optional timestamptz
     5: updated_at: optional timestamptz
     6: primary_key_id: required long
   }
   ```
   
   spec:
   ```
   [
     1000: primary_key_id: identity(6)
   ]
   ```
   
   In the snapshot(s) that hit this issue, however, a manifest file that 
triggers the exception contains:
   ```
   {
        "id": 102,
        "name": "partition",
        "required": true,
        "type": {
                "type": "struct",
                "fields": [{
                        "id": 1000,
                        "name": "primary_key_id",
                        "required": false,
                        "type": "long"
                }, {
                        "id": 1001,
                        "name": "id_trunc_5000000",
                        "required": false,
                        "type": "long"
                }]
        }
   }
   ```
   This matches past actions, where I had added the 1001 id partition field, 
which I have since removed from the table (this was done by calling 
`table.updateSpec().removeField("id_trunc_5000000").commit()` on the Table 
object). I can still read from and write to the current snapshot, as well as 
compact it. I have not yet tried to interact directly with the snapshot(s) 
causing this exception.
   
   Is removing a partition field not allowed, or not allowed in the way I did 
it? Having ended up in this state, is there a safe way for me to repair the 
table? Is there any additional information I can provide to help pinpoint this 
issue?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
