frankliee commented on issue #1488:
URL:
https://github.com/apache/iceberg-python/issues/1488#issuecomment-2581774240
I ran strace on the worker process and saw it blocked in `FUTEX_WAIT_BITSET_PRIVATE` calls.
I am not sure, but this may be a deadlock caused by process forking in pyarrow.
I then found that starting the workers with the "spawn" method avoids the hang.
```python
import multiprocessing as mp

from pyiceberg.io.pyarrow import PyArrowFileIO

worker_num = 2


def worker(tbl):
    # Recreate the FileIO inside the child process before scanning.
    tbl.io = PyArrowFileIO(tbl.properties)
    arr = tbl.scan().to_arrow()
    print(arr)


if __name__ == "__main__":
    # "spawn" starts a fresh interpreter instead of forking the parent.
    ctx = mp.get_context("spawn")
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog("mycatalog")
    tbl = catalog.load_table("db.table")
    workers = [ctx.Process(target=worker, args=(tbl,)) for _ in range(worker_num)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
```
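For anyone who wants to compare start methods without a catalog, here is a minimal sketch that uses only the standard library. The names `run_workers`, `hung_workers`, and `square` are hypothetical and not part of pyiceberg, and the `timeout` argument is an assumption I added so that a hung worker shows up as an exit code of `None` instead of blocking the parent forever.

```python
import multiprocessing as mp


def run_workers(target, args=(), num_workers=2, start_method="spawn", timeout=None):
    """Run target(*args) in num_workers processes and return their exit codes."""
    # get_context() selects the start method locally, without changing
    # the global default for the rest of the program.
    ctx = mp.get_context(start_method)
    procs = [ctx.Process(target=target, args=args) for _ in range(num_workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join(timeout)
    # exitcode is None for a process that is still running after the timeout.
    return [p.exitcode for p in procs]


def hung_workers(exit_codes):
    """Return the indices of workers that had not exited (exit code None)."""
    return [i for i, code in enumerate(exit_codes) if code is None]


def square(x):
    # Hypothetical stand-in for the real table scan.
    print(x * x)


if __name__ == "__main__":
    codes = run_workers(square, args=(3,), timeout=30)
    print("exit codes:", codes, "hung:", hung_workers(codes))
```

Swapping `start_method="spawn"` for `"fork"` (where the platform supports it) should make it easy to check whether a given hang depends on the start method.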
@kevinjqliu
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]