jorisvandenbossche commented on issue #38212:
URL: https://github.com/apache/arrow/issues/38212#issuecomment-1759121919
Thanks for the report! Could you also share a reproducible example that shows the issue?
I tried something very simple, creating a generator of batches to write, and
with that I don't see any memory issue:
```python
import pyarrow as pa
import numpy as np


def generate_random_data():
    for _ in range(1000):
        yield pa.record_batch([np.random.randn(60000) for _ in range(5)],
                              ['a', 'b', 'c', 'd', 'e'])


schema = pa.schema([(name, 'float64') for name in ['a', 'b', 'c', 'd', 'e']])
record_batch_reader = pa.RecordBatchReader.from_batches(schema,
                                                        generate_random_data())

with pa.OSFile("/tmp/outfile", mode="wb") as f:
    record_batch_writer = pa.ipc.RecordBatchFileWriter(f, schema=schema)
    for batch in record_batch_reader:
        record_batch_writer.write_batch(batch)
    record_batch_writer.close()
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]