dmgcodevil opened a new issue #2375:
URL: https://github.com/apache/iceberg/issues/2375
I have a partition spec that looks like this:
```
PartitionSpec.builderFor(Schemas.enrichedTicks)
.day(FRONTDOOR_TIMESTAMP)
.bucket(SECURITY_ID, 10)
.build()
```
Then I used `table.updateSpec()` to add a bucket field with 1000 buckets:
```
table.updateSpec()
.addField(bucket(FieldNames.SECURITY_ID, 1000)).commit()
```
I expected this to update the existing partition field, but instead it added
another partition field:
```json
"partition-spec" : [ {
"name" : "frontdoor_timestamp_day",
"transform" : "day",
"source-id" : 19,
"field-id" : 1000
}, {
"name" : "security_id_bucket",
"transform" : "bucket[10]",
"source-id" : 1,
"field-id" : 1001
},
{
"name" : "security_id_bucket_1000",
"transform" : "bucket[1000]",
"source-id" : 1,
"field-id" : 1002
}
]
```
which resulted in the following structure in S3:
```
security_id_bucket=0
---------------------/security_id_bucket_1000=0
---------------------/security_id_bucket_1000=1
---------------------/security_id_bucket_1000=...
---------------------/security_id_bucket_1000=1000
security_id_bucket=1
---------------------/security_id_bucket_1000=0
---------------------/security_id_bucket_1000=1
---------------------/security_id_bucket_1000=...
---------------------/security_id_bucket_1000=1000
....
```
Actually, I expected this to fail because I thought I was reusing the same
partition name. However, `bucket(FieldNames.SECURITY_ID, 1000)` appends
`_bucket_n` to the source name (`security_id` in our case), whereas
`PartitionSpec.builderFor(...).bucket(SECURITY_ID, 10)` does not append the
number of buckets.
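To make the naming mismatch concrete, here is a minimal sketch that only models the two default names observed above. `DefaultBucketNames` and its methods are hypothetical helpers written for illustration; they are not part of the Iceberg API, and the rules they encode are inferred from the metadata shown in this issue.

```scala
// Hypothetical helper, not an Iceberg class: models the default partition
// field names observed in the metadata above.
object DefaultBucketNames {
  // PartitionSpec.builderFor(...).bucket(source, n) produced "<source>_bucket".
  // The bucket count is accepted but deliberately ignored, as observed.
  def fromBuilder(source: String, numBuckets: Int): String =
    s"${source}_bucket"

  // table.updateSpec().addField(bucket(source, n)) produced "<source>_bucket_<n>".
  def fromUpdateSpec(source: String, numBuckets: Int): String =
    s"${source}_bucket_$numBuckets"
}
```

With these rules, `DefaultBucketNames.fromBuilder("security_id", 10)` and `DefaultBucketNames.fromUpdateSpec("security_id", 1000)` give different names, which would explain why the first `addField` call did not fail with a name collision.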
Then I tried the following:
```scala
table.updateSpec()
.removeField("security_id_bucket")
.addField("security_id_bucket", bucket(FieldNames.SECURITY_ID, 1000)).commit()
```
Got this error:
```
java.lang.IllegalArgumentException: Cannot use partition name more than once: security_id_bucket
```
Tried the following code:
```
table.updateSpec()
.removeField("security_id_bucket")
.addField(bucket(FieldNames.SECURITY_ID, 1000)).commit()
```
Got the following partition spec:
```
{
"spec-id" : 1,
"fields" : [ {
"name" : "frontdoor_timestamp_day",
"transform" : "day",
"source-id" : 19,
"field-id" : 1000
}, {
"name" : "security_id_bucket",
"transform" : "void",
"source-id" : 1,
"field-id" : 1001
}, {
"name" : "security_id_bucket_1000",
"transform" : "bucket[1000]",
"source-id" : 1,
"field-id" : 1002
} ]
}
```
When I tried to query the table from Presto, I got this:
`Query 20210324_052724_00036_giqjj failed: Unsupported partition transform: 1001: security_id_bucket: void(1)`
Also, the structure in BCS looked like this:
```
security_id_bucket=null/
---------------------/security_id_bucket_1000=0
---------------------/security_id_bucket_1000=1
```
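The layout above is what one would expect if every field in the spec contributes a `name=value` path segment and a `void` field always yields `null`. The sketch below models that under those assumptions; `Field` and `PartitionPaths` are hypothetical names for illustration, not Iceberg classes, and the leading `frontdoor_timestamp_day` segment is left out to match the listing.

```scala
// Hypothetical model of a partition field; not an Iceberg class.
final case class Field(name: String, transform: String)

object PartitionPaths {
  // Renders one "name=value" segment per field, in spec order.
  // Assumption: a field with the "void" transform (or with no value supplied)
  // is rendered as "name=null".
  def path(fields: Seq[Field], values: Map[String, Any]): String =
    fields.map { f =>
      val rendered =
        if (f.transform == "void") "null"
        else values.get(f.name).map(_.toString).getOrElse("null")
      s"${f.name}=$rendered"
    }.mkString("/")
}

object PartitionPathsExample {
  // Spec 1 from the metadata above, without the day field.
  val spec1: Seq[Field] = Seq(
    Field("security_id_bucket", "void"),
    Field("security_id_bucket_1000", "bucket[1000]")
  )

  // A row that falls into bucket 1 of the new field.
  val example: String = PartitionPaths.path(spec1, Map("security_id_bucket_1000" -> 1))
}
```

Here `PartitionPathsExample.example` evaluates to `security_id_bucket=null/security_id_bucket_1000=1`, matching the directory structure shown above.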
And at this point, I gave up and modified `xxx.metadata.json` manually:
```json
"partition-spec" : [ {
"name" : "frontdoor_timestamp_day",
"transform" : "day",
"source-id" : 19,
"field-id" : 1000
}, {
"name" : "security_id_bucket",
"transform" : "bucket[1000]",
"source-id" : 1,
"field-id" : 1001
} ],
"default-spec-id" : 1,
"partition-specs" : [ {
"spec-id" : 0,
"fields" : [ {
"name" : "frontdoor_timestamp_day",
"transform" : "day",
"source-id" : 19,
"field-id" : 1000
}, {
"name" : "security_id_bucket",
"transform" : "bucket[10]",
"source-id" : 1,
"field-id" : 1001
} ]
}, {
"spec-id" : 1,
"fields" : [ {
"name" : "frontdoor_timestamp_day",
"transform" : "day",
"source-id" : 19,
"field-id" : 1000
}, {
"name" : "security_id_bucket",
"transform" : "bucket[1000]",
"source-id" : 1,
"field-id" : 1001
} ]
} ]
```
How do I properly change the number of buckets using the Iceberg API?