Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20361#discussion_r164685543
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -377,6 +377,12 @@ object SQLConf {
           .booleanConf
           .createWithDefault(true)
     
    +  val PARQUET_VECTORIZED_READER_BATCH_SIZE = buildConf("spark.sql.parquet.batchSize")
    --- End diff --
    
    I'd say it's very hard. If we need to satisfy a sizeInBytes limitation, we
would have to load data record by record and stop loading once we hit the
limit. But for performance reasons we want to load the data in batches, which
requires knowing the batch size ahead of time.
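
To make that concrete, here is a minimal, self-contained sketch (not the reader
code in this PR; `BatchSizeSketch`, `loadUpToBytes`, and `loadBatch` are made-up
names) contrasting the two shapes: a byte budget forces record-at-a-time
loading, while a row-count batch can be allocated before any data is read.

```scala
// Hypothetical sketch, not Spark's vectorized Parquet reader: it only
// illustrates why a byte budget forces record-at-a-time loading while a
// row-count batch size can be allocated up front.
object BatchSizeSketch {
  type Record = Array[Byte]

  // Byte-budget strategy: a record's size is only known after it has been
  // read, so we must load one record at a time and re-check the budget.
  def loadUpToBytes(records: Iterator[Record], maxBytes: Long): Seq[Record] = {
    val buf = scala.collection.mutable.ArrayBuffer.empty[Record]
    var bytes = 0L
    while (bytes < maxBytes && records.hasNext) {
      val r = records.next()
      bytes += r.length
      buf += r
    }
    buf.toSeq
  }

  // Row-count strategy: the batch capacity is fixed before any data is read,
  // which is what lets a vectorized path pre-allocate its column buffers
  // instead of growing them per record.
  def loadBatch(records: Iterator[Record], batchSize: Int): Array[Record] = {
    val batch = new Array[Record](batchSize)
    var n = 0
    while (n < batchSize && records.hasNext) {
      batch(n) = records.next()
      n += 1
    }
    if (n == batchSize) batch else batch.take(n)
  }
}
```

The second shape is why the config is expressed as a row count rather than a
byte limit.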


