stevenzwu edited a comment on issue #1383: URL: https://github.com/apache/iceberg/issues/1383#issuecomment-707225772
I am mainly talking about the FLIP-27 source, where the enumerator runs in the jobmanager and needs to track the unconsumed splits. I was mainly thinking about using `TableScan.useSnapshot(long snapshotId)`. We can use a blocking queue of snapshots (with configurable size) to back-pressure the enumerator.

@openinx note that this is not keyed state (e.g. after a keyBy on user_id). 8 GB of operator state can be problematic. I vaguely remember that RocksDB can't handle a list larger than 1 GB, and the bigger the list, the slower it gets.

@JingsongLi how would the enumerator/coordinator track which snapshots have been planned/enumerated? Maybe I didn't understand how to use `TableScan.appendsBetween`. I was thinking the idea was to run `planFiles` or `planTasks` between the last planned snapshot and the latest table snapshot. That is what I was referring to earlier as eager discovery/planning of all unconsumed splits. How are we scanning one snapshot at a time?
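To make the bounded-queue idea concrete, here is a minimal sketch in plain Java. It is only an illustration of the back-pressure mechanism, not Iceberg or Flink code: `BoundedSnapshotQueue`, `tryEnqueue`, `pollNext`, and the snapshot IDs are all hypothetical names and values, and the sketch assumes a single discovery thread. In a real enumerator, the ID returned by `pollNext` is what one would presumably pass to `TableScan.useSnapshot(long snapshotId)` before planning.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class SnapshotQueueSketch {

    /** Hypothetical bounded queue of discovered-but-unplanned snapshot IDs. */
    static final class BoundedSnapshotQueue {
        private final BlockingQueue<Long> pending;
        // Last snapshot ID accepted into the queue; -1 means none yet.
        // Assumption: only one discovery thread calls tryEnqueue.
        private long lastEnqueuedSnapshotId = -1L;

        BoundedSnapshotQueue(int capacity) {
            this.pending = new ArrayBlockingQueue<>(capacity);
        }

        /**
         * Non-blocking enqueue. Returns false when the queue is full, which
         * signals the discovery loop to stop and retry later (back pressure).
         */
        boolean tryEnqueue(long snapshotId) {
            boolean accepted = pending.offer(snapshotId);
            if (accepted) {
                lastEnqueuedSnapshotId = snapshotId;
            }
            return accepted;
        }

        /** Returns the oldest pending snapshot ID, or null if none is pending. */
        Long pollNext() {
            return pending.poll();
        }

        long lastEnqueuedSnapshotId() {
            return lastEnqueuedSnapshotId;
        }

        int size() {
            return pending.size();
        }
    }

    public static void main(String[] args) {
        BoundedSnapshotQueue queue = new BoundedSnapshotQueue(2);

        // Made-up snapshot IDs standing in for newly committed table snapshots.
        long[] discovered = {101L, 102L, 103L};
        for (long id : discovered) {
            boolean accepted = queue.tryEnqueue(id);
            System.out.println("snapshot " + id + " accepted=" + accepted);
        }

        // Planning one snapshot frees a slot, so discovery can resume
        // from the snapshot after lastEnqueuedSnapshotId().
        Long next = queue.pollNext();
        System.out.println("planning snapshot " + next);
        System.out.println("retry 103 accepted=" + queue.tryEnqueue(103L));
    }
}
```

With this shape, the state the enumerator would need to checkpoint is the pending IDs plus the last enqueued ID, rather than every unconsumed split across all snapshots. Whether that is enough for the real source is an open design question in this thread.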
