lkokhreidze opened a new issue, #10211:
URL: https://github.com/apache/iceberg/issues/10211
### Query engine
Flink
### Question
Hello,
I searched the documentation but couldn't find a definitive answer.
We have a Flink streaming pipeline that writes to an Iceberg table. The
table has upsert enabled and uses the copy-on-write merge strategy. When we
inspect the latest snapshot, we still see many delete files being created.
The pipeline commits every minute. Is it normal for delete files to be
created even when copy-on-write is used?
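For anyone who wants to reproduce the check, here is a minimal sketch of how delete-file counters could be read from a snapshot summary. It assumes the summary has already been loaded into a plain dict of strings (for example from the table metadata JSON). The helper names `delete_file_counts` and `writes_delete_files` are hypothetical, and the numbers in `example_summary` are invented for illustration, not taken from our table.

```python
# Counter keys that can appear in an Iceberg snapshot summary.
DELETE_KEYS = (
    "added-delete-files",
    "added-equality-delete-files",
    "added-position-delete-files",
    "total-delete-files",
)


def delete_file_counts(summary):
    """Return the delete-file counters from a snapshot summary.

    Summary values are stored as strings, so they are converted to int.
    Keys missing from the summary are reported as 0.
    """
    return {key: int(summary.get(key, 0)) for key in DELETE_KEYS}


def writes_delete_files(summary):
    """True if the commit behind this summary added at least one delete file."""
    return delete_file_counts(summary)["added-delete-files"] > 0


# Hypothetical summary of a single commit (values are made up).
example_summary = {
    "operation": "overwrite",
    "added-data-files": "4",
    "added-delete-files": "3",
    "added-equality-delete-files": "2",
    "added-position-delete-files": "1",
    "total-delete-files": "120",
}

if __name__ == "__main__":
    for key, value in delete_file_counts(example_summary).items():
        print(f"{key}: {value}")
    print("commit wrote delete files:", writes_delete_files(example_summary))
```

Splitting the count into equality and position deletes shows which kind of delete file the writer is producing.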
Table properties:
```
write.upsert.enabled=true,
write.parquet.compression-codec=zstd,
format-version=2,
write.metadata.delete-after-commit.enabled=true,
history.expire.min-snapshots-to-keep=5,
history.expire.max-snapshot-age-ms=3600000,
write.metadata.previous-versions-max=50,
write.parquet.bloom-filter-enabled.column._source_metadata_key=true,
write.target-file-size-bytes=536870912,
write.object-storage.enabled=true,
write.metadata.metrics.column.payload=none,
commit.manifest.min-count-to-merge=50
```
Table schema:
```
'table {
0: _source_metadata_key: required string (id)
1: _source_metadata_split_id: required int
2: _source_metadata_offset: required long
3: _source_metadata_created_at: required timestamptz
4: payload: required string
5: time: optional timestamptz
6: time_month: optional date
}' with partitionSpec '[
1000: submit_time_month: identity(6)
]'
```
Equality field names: `[_source_metadata_key, time_month]`