I found that the RecordBatchReader reads fewer rows at a time than each 
row group contains, which means a single row group ends up being read 
across two ReadNext() calls. So what is the default batch size for 
RecordBatchReader? 
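
For context, this is roughly the loop I'm running (a minimal sketch of my 
code, with error handling via the ARROW_* macros; I print num_rows() on 
each batch, which is how I noticed the batches are smaller than a row group):

```cpp
#include <iostream>
#include <memory>

#include <arrow/dataset/api.h>
#include <arrow/record_batch.h>
#include <arrow/result.h>
#include <arrow/status.h>

namespace ds = arrow::dataset;

arrow::Status PrintBatchSizes(const std::shared_ptr<ds::Dataset>& dataset) {
  // Build a scanner over the whole dataset, then expose it as a
  // RecordBatchReader and drain it with ReadNext().
  ARROW_ASSIGN_OR_RAISE(auto builder, dataset->NewScan());
  ARROW_ASSIGN_OR_RAISE(auto scanner, builder->Finish());
  ARROW_ASSIGN_OR_RAISE(auto reader, scanner->ToRecordBatchReader());

  std::shared_ptr<arrow::RecordBatch> batch;
  while (true) {
    ARROW_RETURN_NOT_OK(reader->ReadNext(&batch));
    if (batch == nullptr) break;  // end of stream
    std::cout << batch->num_rows() << " rows in this batch\n";
  }
  return arrow::Status::OK();
}
```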


Also, any good advice if I have to follow row group boundaries? I have a 
lot of Parquet files stored on S3. If I convert the scanner to a 
RecordBatchReader, I can just loop over ReadNext(); but to read per row 
group, I find I have to call `dataset->GetFragments()`, iterate through the 
fragments, call SplitByRowGroups() to split each fragment further, then 
construct a scanner for each resulting fragment and call its ToTable() to 
read the data. 
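
In other words, something like the sketch below (assumptions on my part: 
the fragments can be cast to ParquetFileFragment, and I pass a 
`literal(true)` predicate to SplitByRowGroups() so no row groups are 
filtered out):

```cpp
#include <memory>

#include <arrow/compute/expression.h>
#include <arrow/dataset/api.h>
#include <arrow/dataset/file_parquet.h>
#include <arrow/result.h>
#include <arrow/status.h>
#include <arrow/table.h>

namespace ds = arrow::dataset;
namespace cp = arrow::compute;

arrow::Status ReadByRowGroup(const std::shared_ptr<ds::Dataset>& dataset) {
  ARROW_ASSIGN_OR_RAISE(auto fragment_it, dataset->GetFragments());
  for (auto maybe_fragment : fragment_it) {
    ARROW_ASSIGN_OR_RAISE(auto fragment, maybe_fragment);
    // Assumption: this is a Parquet dataset, so each fragment is a
    // ParquetFileFragment that can be split per row group.
    auto parquet_fragment =
        std::static_pointer_cast<ds::ParquetFileFragment>(fragment);
    ARROW_ASSIGN_OR_RAISE(
        auto row_group_fragments,
        parquet_fragment->SplitByRowGroups(cp::literal(true)));

    for (const auto& rg_fragment : row_group_fragments) {
      // Build a fresh scanner over the single row-group fragment and
      // materialize it as a Table.
      auto options = std::make_shared<ds::ScanOptions>();
      ds::ScannerBuilder builder(dataset->schema(), rg_fragment, options);
      ARROW_ASSIGN_OR_RAISE(auto scanner, builder.Finish());
      ARROW_ASSIGN_OR_RAISE(auto table, scanner->ToTable());
      // ... process `table`, which holds exactly one row group ...
    }
  }
  return arrow::Status::OK();
}
```

This works, but it feels heavyweight to build a new scanner per row group, 
so I'd like to know if there is a more direct way.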


Finally, is there a performance difference between ToTable() and ReadNext()?
