Re: [I] How to read data in the order in which files are commited? [iceberg]

via GitHub Thu, 12 Oct 2023 20:17:19 -0700


Zhanxiao-Ma commented on issue #8802:
URL: https://github.com/apache/iceberg/issues/8802#issuecomment-1760700741


   > Currently there is no way to order the scan tasks. The planning side 
specifically makes sure that planning can be done by parallel threads 
(reading manifest files in parallel).
   > 
   > Sometimes we need to do a similar thing in the Flink Source, and we ended 
up creating our own comparator for this, which compares Iceberg splits (which 
are wrappers around ScanTasks).
   > 
   > You can do something similar in Java code, with one serious caveat: for a 
big table you might not want, or be able, to keep all of the tasks in memory, 
which is needed for sorting. What we do in Flink is limit the number of 
snapshots read at once.
   > 
   > I hope this helps, Peter
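
A minimal sketch of the comparator-plus-batching idea from the quote above, in plain Java. All names here (`CommitOrderSketch`, `PlannedTask`, `COMMIT_ORDER`, `sortByCommitOrder`, `batchSnapshots`) are hypothetical and are not Iceberg or Flink APIs. `PlannedTask` stands in for a scan task paired with the sequence number of the commit that added its file; how that number is obtained from a real table is an assumption left out of the sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CommitOrderSketch {

    /** Hypothetical stand-in for a scan task: a file path plus the
     *  sequence number of the commit that added the file. */
    public static final class PlannedTask {
        private final String path;
        private final long sequenceNumber;

        public PlannedTask(String path, long sequenceNumber) {
            this.path = path;
            this.sequenceNumber = sequenceNumber;
        }

        public String path() { return path; }
        public long sequenceNumber() { return sequenceNumber; }
    }

    /** Orders tasks by commit sequence number; ties (files added by the
     *  same commit) are broken by path so the order is deterministic. */
    public static final Comparator<PlannedTask> COMMIT_ORDER =
        Comparator.comparingLong(PlannedTask::sequenceNumber)
                  .thenComparing(PlannedTask::path);

    /** Returns a sorted copy. Every task is held in memory at once, which
     *  is the caveat raised in the quote for big tables. */
    public static List<PlannedTask> sortByCommitOrder(List<PlannedTask> tasks) {
        List<PlannedTask> sorted = new ArrayList<>(tasks);
        sorted.sort(COMMIT_ORDER);
        return sorted;
    }

    /** Splits snapshot ids (assumed already in commit order) into batches of
     *  at most maxSnapshots, so only one batch's tasks are sorted at a time. */
    public static List<List<Long>> batchSnapshots(List<Long> snapshotIds, int maxSnapshots) {
        if (maxSnapshots <= 0) {
            throw new IllegalArgumentException("maxSnapshots must be positive");
        }
        List<List<Long>> batches = new ArrayList<>();
        for (int i = 0; i < snapshotIds.size(); i += maxSnapshots) {
            int end = Math.min(i + maxSnapshots, snapshotIds.size());
            batches.add(new ArrayList<>(snapshotIds.subList(i, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<PlannedTask> tasks = List.of(
            new PlannedTask("c.parquet", 3L),
            new PlannedTask("b.parquet", 1L),
            new PlannedTask("a.parquet", 2L));
        for (PlannedTask task : sortByCommitOrder(tasks)) {
            System.out.println(task.sequenceNumber() + " " + task.path());
        }
    }
}
```

In a real job, each batch of snapshots would be planned separately and only that batch's tasks sorted, which bounds memory use at the cost of ordering only within and across batches rather than in one global sort.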
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

