[GitHub] spark pull request #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFi...

bersprockets Sun, 28 Oct 2018 16:36:01 -0700

Github user bersprockets commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22865#discussion_r228771361
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -462,7 +462,7 @@ object SQLConf {
       val PARQUET_RECORD_FILTER_ENABLED = buildConf("spark.sql.parquet.recordLevelFilter.enabled")
         .doc("If true, enables Parquet's native record-level filtering using the pushed down " +
           "filters. This configuration only has an effect when 'spark.sql.parquet.filterPushdown' " +
    -      "is enabled.")
    +      "is enabled and spark.sql.parquet.enableVectorizedReader is disabled.")
    --- End diff --
    
    I see, because of this check:
    
    https://github.com/apache/spark/blob/d5573c578a1eea9ee04886d9df37c7178e67bb30/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L338
    
    So when the data contains a Map column, for example, the vectorized reader is not used, even though spark.sql.parquet.enableVectorizedReader=true.
    
    How about something like:
    
    "If true, enables Parquet's native record-level filtering using the pushed down filters. This configuration only has an effect when 'spark.sql.parquet.filterPushdown' is enabled *and the vectorized reader is not used. You can ensure the vectorized reader is not used by setting 'spark.sql.parquet.enableVectorizedReader' to false.*"
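
    To make the condition concrete, here is a minimal sketch of when the record-level filter would take effect. It is a toy model, not Spark's code: `RecordLevelFilterSketch`, `ParquetConf`, `ColumnKind`, and both functions are hypothetical names, and it assumes the linked check amounts to "the vectorized reader is used only when it is enabled and every column has an atomic type", which matches the Map-column case described above.

```scala
// Toy model only: these types are hypothetical and are not part of Spark.
object RecordLevelFilterSketch {
  sealed trait ColumnKind
  case object Atomic extends ColumnKind   // e.g. int, string, timestamp
  case object Complex extends ColumnKind  // e.g. map, array, struct

  // Mirrors the three settings named in the doc string.
  final case class ParquetConf(
      filterPushdown: Boolean,            // spark.sql.parquet.filterPushdown
      recordLevelFilterEnabled: Boolean,  // spark.sql.parquet.recordLevelFilter.enabled
      enableVectorizedReader: Boolean)    // spark.sql.parquet.enableVectorizedReader

  // Assumption: the vectorized reader is used only when it is enabled
  // and the schema contains no complex (non-atomic) columns.
  def vectorizedReaderUsed(conf: ParquetConf, schema: Seq[ColumnKind]): Boolean =
    conf.enableVectorizedReader && schema.forall(_ == Atomic)

  // The record-level filter matters only when pushdown is on, the filter
  // itself is on, and the vectorized reader is not the one reading the file.
  def recordLevelFilterEffective(conf: ParquetConf, schema: Seq[ColumnKind]): Boolean =
    conf.filterPushdown &&
      conf.recordLevelFilterEnabled &&
      !vectorizedReaderUsed(conf, schema)
}
```

    Under this model, a schema with a Map column makes the filter effective even with enableVectorizedReader=true, while setting enableVectorizedReader=false makes it effective for any schema, which is why the proposed wording talks about the reader being "used" rather than "enabled".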



