[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

DaimonPl Fri, 15 Jun 2018 04:29:52 -0700

Github user DaimonPl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21320#discussion_r195704995
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -1288,8 +1288,18 @@ object SQLConf {
             "issues. Turn on this config to insert a local sort before actually doing repartition " +
             "to generate consistent repartition results. The performance of repartition() may go " +
             "down since we insert extra local sort before it.")
    +        .booleanConf
    +        .createWithDefault(true)
    +
    +  val NESTED_SCHEMA_PRUNING_ENABLED =
    +    buildConf("spark.sql.nestedSchemaPruning.enabled")
    +      .internal()
    +      .doc("Prune nested fields from a logical relation's output which are unnecessary in " +
    +        "satisfying a query. This optimization allows columnar file format readers to avoid " +
    +        "reading unnecessary nested column data. Currently Parquet is the only data source that " +
    +        "implements this optimization.")
           .booleanConf
    -      .createWithDefault(true)
    +      .createWithDefault(false)
    --- End diff --
    
    How about enabling it by default? There should be enough time before the
    2.4.0 release to find any unexpected problems.
    
    Also, with the default set to true, nested column pruning would be
    exercised by all the other automated tests.
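
    For anyone who wants to try the optimization while the default stays
    false, here is a minimal sketch of opting in per session. It assumes a
    Spark build that includes this PR and uses the config key exactly as it
    appears in the diff above; the temp path, the column names, and the
    object name are hypothetical and only there for illustration.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.struct

object NestedPruningOptIn {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nested-pruning-opt-in")
      // Key copied from the diff in this thread; it defaults to false there.
      .config("spark.sql.nestedSchemaPruning.enabled", "true")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical data: one struct column "name" with two nested fields.
    val path = Files.createTempDirectory("contacts").resolve("data").toString
    Seq((1, "Jane", "Doe"), (2, "John", "Roe"))
      .toDF("id", "first", "last")
      .select($"id", struct($"first", $"last").as("name"))
      .write.parquet(path)

    // Select a single nested field and print the physical plan so the
    // scan's read schema can be inspected by hand.
    spark.read.parquet(path).select("name.first").explain()

    spark.stop()
  }
}
```

    If the pruning takes effect, the scan node in the printed plan should
    request only name.first rather than the whole name struct. Flipping the
    flag to "false" and comparing the two plans is a quick manual check.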



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
