jorisvandenbossche commented on issue #38212:
URL: https://github.com/apache/arrow/issues/38212#issuecomment-1759121919
Thanks for the report! Could you also share a reproducible example that shows the issue?
I tried something very simple, creating a generator of batches to write, and
with that I don't see any memory issue:
```python
import pyarrow as pa
import numpy as np


def generate_random_data():
    for _ in range(1000):
        yield pa.record_batch([np.random.randn(60000) for _ in range(5)],
                              ['a', 'b', 'c', 'd', 'e'])


schema = pa.schema([(name, 'float64') for name in ['a', 'b', 'c', 'd', 'e']])
record_batch_reader = pa.RecordBatchReader.from_batches(schema,
                                                        generate_random_data())

with pa.OSFile("/tmp/outfile", mode="wb") as f:
    record_batch_writer = pa.ipc.RecordBatchFileWriter(f, schema=schema)
    for batch in record_batch_reader:
        record_batch_writer.write_batch(batch)
    record_batch_writer.close()
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]