abekfenn edited a comment on issue #11469:
URL: https://github.com/apache/arrow/issues/11469#issuecomment-947165750
Thank you for the example, @westonpace. I've given it a shot below, and it was exactly what I needed!
```
import pyarrow.ipc as ipc
import pyarrow.feather as feather
import pyarrow as pa
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0, 100, size=(100, 5)),
                  columns=list('ABCDF'))
print('original df')
print(df)

# Write the frame as a Feather V2 file split into 10-row record batches.
feather.write_feather(df, 'test_df.ftr', compression='zstd',
                      compression_level=None,
                      chunksize=10, version=2)

def read_feather_in_chunks(filepath):
    # Each record batch in the file corresponds to one chunk written above.
    with ipc.RecordBatchFileReader(filepath) as reader:
        for batch_index in range(reader.num_record_batches):
            batch = reader.get_batch(batch_index)
            print(f'Read in batch {batch_index} which had {batch.num_rows} rows')
            data_df = batch.to_pandas(use_threads=True,
                                      timestamp_as_object=True)
            yield data_df

for batch in read_feather_in_chunks('test_df.ftr'):
    print(batch)
```
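As a possible follow-up (this variant is my own sketch, not from the issue): memory-mapping the file and opening it with `pyarrow.ipc.open_file` should let each record batch be materialized only when it is accessed, rather than reading the whole file into memory up front. `read_feather_in_chunks_mmap` is a hypothetical helper name.
```
import pyarrow as pa
import pyarrow.ipc as ipc

def read_feather_in_chunks_mmap(filepath):
    # Hypothetical variant: memory-map the file so each record batch is
    # loaded lazily when get_batch() is called.
    with pa.memory_map(filepath, 'r') as source:
        reader = ipc.open_file(source)
        for batch_index in range(reader.num_record_batches):
            yield reader.get_batch(batch_index).to_pandas()

for chunk in read_feather_in_chunks_mmap('test_df.ftr'):
    print(chunk)
```
Note that with zstd-compressed batches the buffers are still decompressed on access, so the memory saving applies per batch rather than to the file as a whole.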
Thank you!