Abacn commented on PR #37750: URL: https://github.com/apache/beam/pull/37750#issuecomment-3995427363
Thanks, this change allows users to provide an offset that was obtained elsewhere. It now supports reading from a given offset, but it does not completely resolve #28248: restarting a pipeline that has set the offset would still start from that same offset.

I had a (not yet materialized) idea of a restartable offset. Basically, we can provide an "OffsetRetainer" interface that 1. reads the offset on pipeline startup and 2. writes the offset on checkpointing, and expose it as a configuration option for the IO. We can then provide a built-in FileSystem-based OffsetRetainer for users. While the pipeline is running, it continuously writes the committed offset to an external location of choice (GCS, Kafka, etc.). When the pipeline starts or restarts, it first tries to read from that same location.

If you prefer to go with the current change for now, let me take a closer look at this PR. Or we can work on a more concrete solution.
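A minimal sketch of the OffsetRetainer idea, under several assumptions. `OffsetRetainer`, `FileOffsetRetainer` and `startOffset` are hypothetical names for illustration, not an existing Beam API. The sketch uses a local file through `java.nio` as a stand-in for the FileSystem-based retainer; a real one would presumably go through Beam's filesystem layer to reach GCS and similar stores. Offsets are keyed by an integer partition id as a simplifying assumption.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical interface (not an existing Beam API) for retaining committed offsets outside the pipeline. */
interface OffsetRetainer {
  /** Called on pipeline startup: last retained offset per partition (empty if none yet). */
  Map<Integer, Long> readOffsets() throws IOException;

  /** Called on checkpointing: records the committed offset for each partition. */
  void writeOffsets(Map<Integer, Long> committed) throws IOException;

  /** On (re)start, prefer the retained offset and fall back to the user-provided one. */
  default long startOffset(int partition, long userProvidedOffset) throws IOException {
    return readOffsets().getOrDefault(partition, userProvidedOffset);
  }
}

/** Hypothetical file-based retainer; one "partition=offset" line per partition. */
final class FileOffsetRetainer implements OffsetRetainer {
  private final Path location;

  FileOffsetRetainer(Path location) {
    this.location = location;
  }

  @Override
  public Map<Integer, Long> readOffsets() throws IOException {
    Map<Integer, Long> offsets = new HashMap<>();
    if (!Files.exists(location)) {
      return offsets; // first start: nothing has been retained yet
    }
    for (String line : Files.readAllLines(location, StandardCharsets.UTF_8)) {
      if (line.isBlank()) {
        continue;
      }
      String[] parts = line.split("=", 2);
      offsets.put(Integer.parseInt(parts[0]), Long.parseLong(parts[1]));
    }
    return offsets;
  }

  @Override
  public void writeOffsets(Map<Integer, Long> committed) throws IOException {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<Integer, Long> e : committed.entrySet()) {
      sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
    }
    // Write a temp file first, then rename it over the old one so that, on
    // filesystems supporting atomic rename, readers never see a partial file.
    Path tmp = location.resolveSibling(location.getFileName() + ".tmp");
    Files.write(tmp, sb.toString().getBytes(StandardCharsets.UTF_8));
    Files.move(tmp, location, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
  }
}
```

In this sketch, the first start uses the user-provided offset, and every later restart resumes from whatever was last written at checkpoint time. Letting a retained offset take precedence over an explicitly configured one is itself a design choice that would need agreement.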
