Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20361#discussion_r164685543

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -377,6 +377,12 @@ object SQLConf {
         .booleanConf
         .createWithDefault(true)

    +  val PARQUET_VECTORIZED_READER_BATCH_SIZE = buildConf("spark.sql.parquet.batchSize")
    --- End diff --

I'd say it's very hard. If we need to satisfy a sizeInBytes limitation, we would have to load data record by record and stop loading once we hit the limit. But for performance reasons we want to load the data in batches, which requires knowing the batch size ahead of time.
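The trade-off above can be sketched as follows. This is a minimal illustration, not Spark's actual vectorized reader: `Record`, `loadUntilLimit`, and `loadBatch` are hypothetical names introduced here to contrast the two loading strategies.

```scala
// Hypothetical record type; sizeInBytes stands in for a per-record size estimate.
case class Record(payload: Array[Byte]) {
  def sizeInBytes: Long = payload.length.toLong
}

object BatchLoadingSketch {
  // Record-by-record loading: can honor a byte limit exactly, because we can
  // stop as soon as the limit is reached -- but it pays per-record overhead.
  def loadUntilLimit(records: Iterator[Record], limitBytes: Long): Seq[Record] = {
    val buf = scala.collection.mutable.ArrayBuffer.empty[Record]
    var loaded = 0L
    while (records.hasNext && loaded < limitBytes) {
      val r = records.next()
      buf += r
      loaded += r.sizeInBytes
    }
    buf.toSeq
  }

  // Batched loading: fast, but the batch size (a row count) must be fixed
  // before any data is read, so the bytes loaded are only known afterwards.
  def loadBatch(records: Iterator[Record], batchSize: Int): Seq[Record] =
    records.take(batchSize).toSeq

  def main(args: Array[String]): Unit = {
    val data = (1 to 100).map(_ => Record(Array.fill(10)(0: Byte)))

    // Stops once 55 bytes are reached: 6 records of 10 bytes each get loaded.
    val limited = loadUntilLimit(data.iterator, limitBytes = 55)
    assert(limited.size == 6)

    // Batch size is chosen up front (here 64 rows), independent of byte size.
    val batch = loadBatch(data.iterator, batchSize = 64)
    assert(batch.size == 64)
  }
}
```

This is why a row-count config like the proposed `spark.sql.parquet.batchSize` is the practical knob: the batched path cannot react to a byte limit mid-batch.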