Zhanxiao-Ma commented on issue #8802: URL: https://github.com/apache/iceberg/issues/8802#issuecomment-1760700741
> Currently there is no way to order the scan tasks. The planning side is specifically designed so that planning itself can run on parallel threads (reading manifest files in parallel), so no particular task order is guaranteed.
>
> Sometimes we need to do a similar thing in the Flink source, and we ended up creating our own comparator that compares Iceberg splits (which are a wrapper around ScanTasks).
>
> You can do something similar in Java code, with one serious caveat: sorting requires keeping all of the tasks in memory, and for a big table you might not want to, or be able to, do that. What we do in Flink is limit the number of snapshots read at once.
>
> I hope this helps,
> Peter
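A minimal, self-contained Java sketch of the approach described above: sort tasks with a custom comparator, but plan and sort only a bounded window of snapshots at a time, so the tasks of the whole table never have to sit in memory together. Everything here is hypothetical: `TaskInfo` stands in for an Iceberg `ScanTask` or Flink split, the sort key (sequence number, then file path and offset) is an assumption, and the `planWindow` function stands in for whatever planning call returns the tasks for a range of snapshots (for example an incremental scan). None of these names are Iceberg APIs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

public class OrderedScanSketch {

  // Hypothetical stand-in for an Iceberg scan task or Flink split; not an Iceberg type.
  // Only the fields this sketch sorts on are modeled.
  public record TaskInfo(long snapshotId, long sequenceNumber, String filePath, long start) {}

  // Assumed sort key: oldest data first, then file path and split offset, so the
  // order is deterministic no matter which planning thread produced a task.
  public static final Comparator<TaskInfo> ORDER =
      Comparator.comparingLong(TaskInfo::sequenceNumber)
          .thenComparing(TaskInfo::filePath)
          .thenComparingLong(TaskInfo::start);

  /**
   * Plans and sorts one window of at most maxSnapshotsPerBatch snapshots at a time,
   * so only that window's tasks are held in memory for sorting.
   *
   * @param planWindow hypothetical planner returning the tasks for the given
   *     snapshots, in whatever order parallel planning produced them
   */
  public static List<List<TaskInfo>> orderedBatches(
      List<Long> snapshotIds,
      int maxSnapshotsPerBatch,
      Function<List<Long>, Iterable<TaskInfo>> planWindow) {
    if (maxSnapshotsPerBatch <= 0) {
      throw new IllegalArgumentException("maxSnapshotsPerBatch must be positive");
    }
    // Batches are collected only so the example can print them; a real reader
    // would hand each sorted batch downstream and then drop it.
    List<List<TaskInfo>> batches = new ArrayList<>();
    for (int from = 0; from < snapshotIds.size(); from += maxSnapshotsPerBatch) {
      int to = Math.min(from + maxSnapshotsPerBatch, snapshotIds.size());
      List<TaskInfo> batch = new ArrayList<>();
      for (TaskInfo task : planWindow.apply(snapshotIds.subList(from, to))) {
        batch.add(task);
      }
      batch.sort(ORDER);
      batches.add(batch);
    }
    return batches;
  }

  public static void main(String[] args) {
    // Tasks from three snapshots, deliberately out of order.
    List<TaskInfo> planned = List.of(
        new TaskInfo(1L, 1L, "b.parquet", 0L),
        new TaskInfo(1L, 1L, "a.parquet", 0L),
        new TaskInfo(2L, 2L, "c.parquet", 0L),
        new TaskInfo(3L, 3L, "a.parquet", 128L),
        new TaskInfo(3L, 3L, "a.parquet", 0L));
    // Fake planner standing in for a scan over the window's snapshots.
    Function<List<Long>, Iterable<TaskInfo>> planner =
        window -> planned.stream().filter(t -> window.contains(t.snapshotId())).toList();
    for (List<TaskInfo> batch : orderedBatches(List.of(1L, 2L, 3L), 2, planner)) {
      System.out.println(batch);
    }
  }
}
```

With a window of two snapshots, the example prints two batches: the tasks of snapshots 1 and 2 sorted by sequence number and path, then the two splits of snapshot 3 sorted by offset. The result is only globally ordered if the sort key grows with snapshot order, as the sequence number does here; with any other key the tasks are ordered only within each window, which is the price of bounding memory.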
