I found that the RecordBatchReader reads fewer rows at a time than each row group contains, which means a single row group ends up being read across two (or more) ReadNext() calls. So what is the default batch size for RecordBatchReader?
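For context, this is roughly how the batch size can be controlled on the scanner before converting it to a reader. This is only a sketch against the C++ Dataset API as I understand it; the helper name `ReadWithBatchSize` and the chosen batch size are my own, not from any existing code:

```cpp
#include <memory>

#include <arrow/api.h>
#include <arrow/dataset/api.h>

// Sketch (hypothetical helper): override the scanner's batch size so that
// ReadNext() on the resulting RecordBatchReader yields batches of the
// requested row count instead of the library default.
arrow::Result<std::shared_ptr<arrow::Table>> ReadWithBatchSize(
    const std::shared_ptr<arrow::dataset::Dataset>& dataset) {
  ARROW_ASSIGN_OR_RAISE(auto builder, dataset->NewScan());
  // If not overridden, the scanner uses ScanOptions::batch_size
  // (kDefaultBatchSize in arrow/dataset/scanner.h).
  ARROW_RETURN_NOT_OK(builder->BatchSize(1 << 16));  // 65536 rows, arbitrary
  ARROW_ASSIGN_OR_RAISE(auto scanner, builder->Finish());
  return scanner->ToTable();
}
```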
Also, is there any good advice if I have to follow row-group boundaries? I have a lot of parquet files stored on S3. If I convert the scanner to a RecordBatchReader, I just loop over ReadNext(). But if I want to read per row group, I find I have to call `auto fragments = dataset->GetFragments()`, iterate through the fragments, call SplitByRowGroups() to split each fragment further, then construct a scanner for each resulting fragment and call its ToTable() to read the data. Finally, is there a performance difference between ToTable() and ReadNext()?
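The per-row-group flow described above can be sketched as follows. This is an illustration of the steps as I understand the C++ Dataset API, not tested code; the helper name `ReadPerRowGroup` is hypothetical, and the `static_pointer_cast` assumes every fragment in the dataset is a parquet fragment:

```cpp
#include <memory>

#include <arrow/api.h>
#include <arrow/compute/expression.h>
#include <arrow/dataset/api.h>
#include <arrow/dataset/file_parquet.h>

// Sketch (hypothetical helper): read one Table per parquet row group by
// splitting each fragment and scanning the pieces individually.
arrow::Status ReadPerRowGroup(
    const std::shared_ptr<arrow::dataset::Dataset>& dataset) {
  ARROW_ASSIGN_OR_RAISE(auto fragment_it, dataset->GetFragments());
  for (auto maybe_fragment : fragment_it) {
    ARROW_ASSIGN_OR_RAISE(auto fragment, maybe_fragment);
    // Assumes a parquet-backed dataset, so the cast is safe.
    auto parquet_fragment =
        std::static_pointer_cast<arrow::dataset::ParquetFileFragment>(fragment);
    // Split into one sub-fragment per row group; literal(true) keeps them all.
    ARROW_ASSIGN_OR_RAISE(
        auto row_group_fragments,
        parquet_fragment->SplitByRowGroups(arrow::compute::literal(true)));
    for (const auto& rg_fragment : row_group_fragments) {
      arrow::dataset::ScannerBuilder builder(
          dataset->schema(), rg_fragment,
          std::make_shared<arrow::dataset::ScanOptions>());
      ARROW_ASSIGN_OR_RAISE(auto scanner, builder.Finish());
      ARROW_ASSIGN_OR_RAISE(auto table, scanner->ToTable());
      // ... process `table`, which holds exactly one row group's rows ...
    }
  }
  return arrow::Status::OK();
}
```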