[GitHub] spark issue #15835: [SPARK-17059][SQL] Allow FileFormat to specify partition...

pwoody Fri, 11 Nov 2016 06:29:08 -0800

Github user pwoody commented on the issue:

    https://github.com/apache/spark/pull/15835
  
    I've also pushed a change that makes this feature configurable, so it can be turned on or off. Here is a benchmark that writes out 200 files, using this code:
    
    ```
    withSQLConf(ParquetOutputFormat.ENABLE_JOB_SUMMARY -> "true") {
      withTempPath { path =>
        spark.range(0, numPartitions, 1, numPartitions)
          .write.parquet(path.getCanonicalPath)
        val benchmark = new Benchmark("Parquet partition pruning benchmark", numPartitions)

        benchmark.addCase("Parquet partition pruning enabled") { iter =>
          spark.read.parquet(path.getCanonicalPath).filter("id = 0").collect()
        }

        benchmark.addCase("Parquet partition pruning disabled") { iter =>
          withSQLConf(SQLConf.PARQUET_PARTITION_PRUNING_ENABLED.key -> "false") {
            spark.read.parquet(path.getCanonicalPath).filter("id = 0").collect()
          }
        }

        benchmark.run()
      }
    }
    ```
    
    
    ```
    Running benchmark: Parquet partition pruning benchmark
      Running case: Parquet partition pruning enabled
      Stopped after 12 iterations, 2049 ms
      Running case: Parquet partition pruning disabled
      Stopped after 5 iterations, 2177 ms
    
    Java HotSpot(TM) 64-Bit Server VM 1.8.0_20-b26 on Mac OS X 10.10.5
    Intel(R) Core(TM) i7-3635QM CPU @ 2.40GHz
    
    Parquet partition pruning benchmark:     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Parquet partition pruning enabled              145 /  171          0.0      723119.3       1.0X
    Parquet partition pruning disabled             414 /  436          0.0     2070279.4       0.3X
    
    
    Process finished with exit code 0
    
    ```
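
As a sanity check on the table, the figures can be cross-checked with a little arithmetic. The sketch below assumes `numPartitions` was 200 (inferred from the "200 files" remark, not stated in the code) and that "Per Row(ns)" is the best time divided by the row count, which is consistent with the numbers shown. The object and method names are hypothetical, chosen only for illustration.

```scala
// Sketch only: cross-checks the benchmark table above.
// Assumption: numPartitions = 200, so the benchmark's row count is 200.
object PruningBenchmarkMath {
  val numRows: Long = 200L

  // "Per Row(ns)" values copied from the table.
  val perRowEnabledNs: Double = 723119.3
  val perRowDisabledNs: Double = 2070279.4

  // Reconstruct the best time in milliseconds from a per-row figure.
  def bestMs(perRowNs: Double): Double = perRowNs * numRows / 1e6

  // How many times faster the enabled case is than the disabled case.
  def speedup: Double = perRowDisabledNs / perRowEnabledNs

  def main(args: Array[String]): Unit = {
    println(f"best time, pruning enabled:  ${bestMs(perRowEnabledNs)}%.1f ms")
    println(f"best time, pruning disabled: ${bestMs(perRowDisabledNs)}%.1f ms")
    println(f"speedup with pruning:        ${speedup}%.2fx")
  }
}
```

Under these assumptions the reconstructed best times round to the 145 ms and 414 ms reported in the table, and the read with pruning enabled comes out a little under three times faster than the read with it disabled.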


