jorisvandenbossche commented on issue #14229: URL: https://github.com/apache/arrow/issues/14229#issuecomment-1282321577
I tried to reproduce this using a smaller example (but just large enough to not fit in a single ListArray), so I could test this on my laptop with limited memory:

```
import numpy as np
import pandas as pd
import pyarrow as pa

n_rows = 12000000
data = [np.zeros(200, dtype='int8')] * n_rows
df = pd.DataFrame({'a': data})
table = pa.table(df)

>>> table.schema
a: list<item: int8>
  child 0, item: int8
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 389

>>> table["a"].num_chunks  # <--- needed to use 2 chunks to fit all data in a ListType
2

import pyarrow.parquet as pq
pq.write_table(table, "test_large_list.parquet")
```

But the above hangs at the `write_table` call: after initially consuming a lot of memory and CPU, it at some point stops doing anything (no significant CPU usage anymore), yet the file is never written (only a 4 kB file appears), and trying to kill the process with Ctrl-C doesn't work either.
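As a side note, here is a quick sketch of why two chunks are needed above. This assumes the standard `list` type's 32-bit offsets into the child values array (a `large_list`, with 64-bit offsets, would not need chunking); the variable names are just for illustration:

```
# A plain list<int8> array stores int32 offsets into its child values array,
# so a single ListArray can address at most 2**31 - 1 child elements.
n_rows = 12_000_000
values_per_row = 200
total_values = n_rows * values_per_row
print(total_values)               # 2400000000
print(total_values > 2**31 - 1)   # True -> the column has to be split into 2 chunks
```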