wjones1 edited a comment on pull request #6979:
URL: https://github.com/apache/arrow/pull/6979#issuecomment-650474759


   RE: @jorisvandenbossche 
   > Same question as in the other PR: does setting the batch size also 
influence existing methods like `read` or `read_row_group` ? Should we add that 
keyword there as well?
   
   My one hesitation about adding it is that it's not clear to me what the 
effect on execution would be. For `iter_batches()` the effect of `batch_size` 
is quite obvious, but I'm not sure about these other methods.
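   
   For reference, a minimal sketch of the batch-bounded iteration this PR adds 
(`example.parquet` is a placeholder file name):
   
   ```python
   import pyarrow.parquet as pq

   pf = pq.ParquetFile("example.parquet")

   # Each yielded RecordBatch has at most `batch_size` rows, so peak memory
   # stays bounded regardless of the total size of the file.
   for batch in pf.iter_batches(batch_size=10_000):
       print(batch.num_rows)
   ```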
   
   After a quick search of the Apache Arrow docs, the only explanation of the 
`batch_size` parameter I found was this:
   
   >  [The maximum row count for scanned record batches. If scanned record 
batches are overflowing memory then this method can be called to reduce their 
size.](https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Scanner.html?highlight=batch_size)
   
   If I can find a good explanation for it or if you have one, I'd be happy to 
add the `batch_size` parameter to the `read()` and `read_row_group()` methods 
and include that explanation in the docstring.
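   
   To illustrate the ambiguity: if `read()` with a `batch_size` just amounted 
to something like the hypothetical sketch below, the batches would be 
reassembled into a single `Table` anyway, so `batch_size` would only bound the 
intermediate chunks, not the final memory footprint:
   
   ```python
   import pyarrow as pa
   import pyarrow.parquet as pq

   # Hypothetical helper, not an existing API: roughly what a
   # read(batch_size=...) might reduce to internally.
   def read_in_batches(path, batch_size):
       pf = pq.ParquetFile(path)
       # The bounded batches are concatenated back into one Table, so the
       # final result is the same size as a plain read().
       return pa.Table.from_batches(pf.iter_batches(batch_size=batch_size))
   ```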

