davidrobbo opened a new issue, #12444:
URL: https://github.com/apache/iceberg/issues/12444
### Apache Iceberg version
1.6.1
### Query engine
Spark
### Please describe the bug 🐞
I'm trying to use Spark's `readStream` to incrementally read changes to an
Iceberg table on EMR Serverless, integrated with the AWS Glue Catalog, with
S3 used for checkpointing.
```
spark.readStream
.format("iceberg")
.load(f"${sourceDatabase}.${sourceTable}")
```
The initial query sets the offset to the 79th of the table's 130 snapshots,
and none of the subsequent runs progress.
I've confirmed the base table I read from only has append snapshots, not
deletes or updates.
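One way such a check could be run, as a rough sketch: Iceberg exposes a `snapshots` metadata table whose `operation` column records the type of each commit. This assumes an active `spark` session and the same `sourceDatabase` and `sourceTable` variables as the snippet above; it is not necessarily the exact check that was performed.

```scala
import org.apache.spark.sql.functions.col

// Load the table's `snapshots` metadata table (one row per snapshot).
val snapshots = spark.read
  .format("iceberg")
  .load(s"$sourceDatabase.$sourceTable.snapshots")

// Count snapshots per operation type (e.g. append, overwrite, delete, replace).
snapshots
  .groupBy(col("operation"))
  .count()
  .show()
```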
It appears as though the query is stuck. I have tried a variety of options
suggested for similar bugs, more in the hope that they might help than
because I expect them to be needed. This includes setting
`"stream-from-timestamp"` to a value prior to the first snapshot, and also
using:
```
.option("streaming-skip-overwrite-snapshots", "true")
.option("streaming-skip-delete-snapshots", "true")
```
None of these change the behaviour (nor do I expect them to have any effect,
given that the base table is append-only and has no snapshot expiration).
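For reference, a minimal sketch of how these options might be combined in a single streaming query. The checkpoint path, the console sink, and the timestamp value are hypothetical placeholders for illustration, not taken from the actual job.

```scala
// Hypothetical values, for illustration only.
val checkpointPath = "s3://example-bucket/checkpoints/iceberg-stream"
val streamFromMillis = "0" // epoch milliseconds, earlier than the first snapshot

val query = spark.readStream
  .format("iceberg")
  .option("stream-from-timestamp", streamFromMillis)
  .option("streaming-skip-overwrite-snapshots", "true")
  .option("streaming-skip-delete-snapshots", "true")
  .load(s"$sourceDatabase.$sourceTable")
  .writeStream
  .format("console") // assumed sink; the real job's sink is not stated
  .option("checkpointLocation", checkpointPath)
  .start()

query.awaitTermination()
```

Note that a streaming source's starting position is normally consulted only when the checkpoint holds no offsets yet, so `stream-from-timestamp` would likely need a fresh checkpoint location to take effect.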
### Willingness to contribute
- [ ] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from
the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]