kevinjqliu commented on issue #542:
URL: https://github.com/apache/iceberg-python/issues/542#issuecomment-2016897354

   To summarize the above:
   `table.scan` has a bug in how its Futures are executed when a `limit` 
is set. It stems from the order in which the Futures complete and from the 
shared state `row_counts`.
   
   When the executor is used to run multiple Futures, each Future checks the 
shared state `row_counts` first before proceeding. 
   
   The bug occurs when one Future updates the shared state `row_counts` 
([L1021](https://github.com/apache/iceberg-python/blob/6989b92c2d449beb9fe4817c64f619ea5bfc81dc/pyiceberg/io/pyarrow.py#L1021))
 and, before that Future returns and completes 
([L1023](https://github.com/apache/iceberg-python/blob/6989b92c2d449beb9fe4817c64f619ea5bfc81dc/pyiceberg/io/pyarrow.py#L1023)),
 another Future checks the shared state `row_counts` 
([L953](https://github.com/apache/iceberg-python/blob/6989b92c2d449beb9fe4817c64f619ea5bfc81dc/pyiceberg/io/pyarrow.py#L953))
 and returns first 
([L954](https://github.com/apache/iceberg-python/blob/6989b92c2d449beb9fe4817c64f619ea5bfc81dc/pyiceberg/io/pyarrow.py#L954)).
   
   As a result, `row_counts` is correct but `completed_futures` is not, 
since the Future that returned is not the only one that modified `row_counts` 
([L1112](https://github.com/apache/iceberg-python/blob/6989b92c2d449beb9fe4817c64f619ea5bfc81dc/pyiceberg/io/pyarrow.py#L1112)).
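
   The interleaving described above can be reproduced in a small standalone 
sketch. This is not the PyIceberg code: `task_a`, `task_b`, `simulate_race`, 
and the two `threading.Event` objects are hypothetical names that exist only 
to force the ordering deterministically. Only `row_counts`, 
`completed_futures`, and `limit` mirror names used in this thread, and the 
collection loop is an assumed simplification of the real one.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


def simulate_race(limit=3):
    row_counts = []                 # shared state appended to by each task
    updated = threading.Event()     # hypothetical: set once task A has updated row_counts
    release_a = threading.Event()   # hypothetical: lets task A finally return

    def task_a():
        rows = [1, 2, 3]
        row_counts.append(len(rows))  # shared state is updated here ...
        updated.set()
        release_a.wait(timeout=5)     # ... but this Future has not completed yet
        return rows

    def task_b():
        updated.wait(timeout=5)
        if sum(row_counts) >= limit:  # limit already looks satisfied
            return None               # so this Future returns first, with no rows
        return [4, 5, 6]

    completed_futures = []
    collected = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(task_a), pool.submit(task_b)]
        for future in as_completed(futures):
            completed_futures.append(future)
            if sum(row_counts) >= limit:
                break                 # stop early: the limit appears to be reached
        for future in completed_futures:
            result = future.result()
            if result is not None:
                collected.extend(result)
        release_a.set()               # unblock task A so the pool can shut down

    return sum(row_counts), collected


total_rows, collected = simulate_race()
print(total_rows, collected)  # prints: 3 []
```

   Here `row_counts` reports 3 rows, yet `completed_futures` only holds the 
Future that returned `None`, so no rows are collected even though the limit 
appears to be met.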
   
    
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

