amogh-jahagirdar commented on issue #9172:
URL: https://github.com/apache/iceberg/issues/9172#issuecomment-1848220358
Thanks for the details, one key thing stands out to me:
```
I also tested with latest version, iceberg-spark-runtime-3.4_2.12-1.4.2.jar
as well, I could see that the second number, part of the file name, is
continuously increasing
00001-3200-11773075-523f-4667-936b-88702fe9860c-00001.parquet, however after
around 200 execution of stream, the file name got reset
00001-3166-11773075-523f-4667-936b-88702fe9860c-00001.parquet and files were
started getting overwritten.
```
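To make the quoted observation concrete, here is a small sketch that splits the two quoted file names into fields. It assumes the layout `<partitionId>-<taskId>-<operationId UUID>-<fileCount>.<ext>`. That layout is inferred from the names themselves and is not taken from Iceberg's code. The class `DataFileName` and its `parse` helper are hypothetical. The two names share the same operation UUID and file counter, so within one streaming query only the task ID keeps them apart.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not an Iceberg API): splits a data file name into fields,
 * ASSUMING the layout <partitionId>-<taskId>-<operationId UUID>-<fileCount>.<ext>
 * inferred from the file names quoted in this issue.
 */
public final class DataFileName {
    public record Parts(int partitionId, long taskId, String operationId, int fileCount) {}

    public static Parts parse(String fileName) {
        // Drop the extension (".parquet", ".orc", ...).
        int dot = fileName.lastIndexOf('.');
        String stem = dot >= 0 ? fileName.substring(0, dot) : fileName;
        String[] tokens = stem.split("-");
        // A UUID contributes five hyphen-separated tokens, so expect 2 + 5 + 1 = 8 tokens.
        if (tokens.length != 8) {
            throw new IllegalArgumentException("unexpected file name: " + fileName);
        }
        int partitionId = Integer.parseInt(tokens[0]);
        long taskId = Long.parseLong(tokens[1]);
        String operationId = String.join("-", Arrays.copyOfRange(tokens, 2, 7));
        int fileCount = Integer.parseInt(tokens[7]);
        return new Parts(partitionId, taskId, operationId, fileCount);
    }

    public static void main(String[] args) {
        Parts before = parse("00001-3200-11773075-523f-4667-936b-88702fe9860c-00001.parquet");
        Parts after = parse("00001-3166-11773075-523f-4667-936b-88702fe9860c-00001.parquet");
        // Same partition, operation ID and file counter: only the task ID differs.
        System.out.println(before.operationId().equals(after.operationId())); // true
        System.out.println(before.taskId() + " -> " + after.taskId());         // 3200 -> 3166
    }
}
```

If task IDs restart in a later micro-batch while the operation UUID stays fixed, a name from an earlier batch can be generated again. The earlier file is then overwritten.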
This does align with the suspicion in the other issue that task IDs can be
reused across epochs (I'm reading "after around 200 executions of stream" as
200 micro-batch intervals), which I think makes sense. That reuse is probably
intentional in the DSv2 API, which is why the epoch ID is surfaced to the
writer. I'll put up a draft for adding the epoch ID to the output path.
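Here is a rough sketch of why adding the epoch ID helps. It is not the actual change. Spark's streaming write path passes the epoch to `StreamingDataWriterFactory.createWriter(int partitionId, long taskId, long epochId)`, so a writer has it available. The sketch compares a name built without the epoch against one that embeds it, for the same task ID reused in two micro-batches. The class `EpochFileNames`, both name formats, and the epoch values 1 and 2 are hypothetical and chosen only for illustration.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

/**
 * Hypothetical sketch, not Iceberg's implementation: compares file names built
 * with and without the streaming epoch ID when a task ID is reused across epochs.
 */
public final class EpochFileNames {
    // ASSUMED current-style layout: partition, task, operation UUID, file counter.
    public static String withoutEpoch(int partitionId, long taskId, String operationId, int fileCount) {
        return String.format("%05d-%d-%s-%05d.parquet", partitionId, taskId, operationId, fileCount);
    }

    // Hypothetical layout that also embeds the epoch (micro-batch) ID.
    public static String withEpoch(int partitionId, long taskId, long epochId, String operationId, int fileCount) {
        return String.format("%05d-%d-%d-%s-%05d.parquet", partitionId, taskId, epochId, operationId, fileCount);
    }

    /** Returns {distinct names without epoch, distinct names with epoch} for two writes reusing task 3166. */
    public static int[] distinctNames(String operationId) {
        Set<String> plain = new HashSet<>();
        Set<String> qualified = new HashSet<>();
        long[][] writes = { {1L, 3166L}, {2L, 3166L} }; // {epochId, taskId}: same task ID in two epochs
        for (long[] w : writes) {
            plain.add(withoutEpoch(1, w[1], operationId, 1));
            qualified.add(withEpoch(1, w[1], w[0], operationId, 1));
        }
        return new int[] {plain.size(), qualified.size()};
    }

    public static void main(String[] args) {
        String operationId = UUID.randomUUID().toString(); // fixed for the whole streaming query
        int[] counts = distinctNames(operationId);
        System.out.println("distinct without epoch: " + counts[0]); // 1: the second write collides
        System.out.println("distinct with epoch:    " + counts[1]); // 2
    }
}
```

Any scheme that makes (epoch, task) part of the name has the same effect. The exact position of the epoch in the path is a design choice for the actual PR.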