jorisvandenbossche commented on issue #14229:
URL: https://github.com/apache/arrow/issues/14229#issuecomment-1282321577

   I tried to reproduce this with a smaller example (but still just large enough that the data does not fit in a single ListArray, since `list` offsets are int32 and cap a single array at 2^31 - 1 child elements), so I could test it on my laptop with limited memory:
   
   ```
   import numpy as np
   import pandas as pd
   import pyarrow as pa

   # 12 million rows x 200 int8 values = 2.4 billion child elements,
   # more than a single int32-offset ListArray can hold
   n_rows = 12000000
   data = [np.zeros(200, dtype='int8')] * n_rows
   df = pd.DataFrame({'a': data})
   table = pa.table(df)
   
   >>> table.schema
   a: list<item: int8>
     child 0, item: int8
   -- schema metadata --
   pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 
389
   
   >>> table["a"].num_chunks   # <--- needed to use 2 chunks to fit all data in 
a ListType
   2
   
   import pyarrow.parquet as pq
   pq.write_table(table, "test_large_list.parquet")
   ```
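   To make the chunk count explicit: a `list` array stores int32 offsets, so a single ListArray tops out at 2^31 - 1 child elements, and 12 million rows of 200 values each exceed that. A quick sanity check (just a sketch; the variable names are mine):

   ```
   import numpy as np

   n_rows = 12_000_000
   values_per_row = 200
   total_values = n_rows * values_per_row   # 2,400,000,000 child elements
   int32_max = np.iinfo(np.int32).max       # 2,147,483,647
   # True -> the column has to be split into (at least) two chunks
   print(total_values > int32_max)
   ```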
   
   But the above hangs at the `write_table` call: after initially consuming a lot of memory and CPU, it eventually stops doing anything (no significant CPU usage anymore), yet the file is never fully written (only a 4 kB file appears), and killing the process with Ctrl-C doesn't work at that point either.
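   As an untested aside (my suggestion, not something tried in the issue): casting the column to `large_list`, which uses int64 offsets and so has no 2^31 - 1 child-element cap, could be a useful comparison point; whether it avoids the hang is not verified here:

   ```
   import pyarrow as pa
   import pyarrow.parquet as pq

   # large_list<int8> stores int64 offsets, so it is not subject to the
   # child-element limit that forced the column into two chunks
   target_schema = pa.schema([pa.field("a", pa.large_list(pa.int8()))])
   table_large = table.cast(target_schema)

   pq.write_table(table_large, "test_large_list_casted.parquet")
   ```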

