[GitHub] spark pull request: [SPARK-11088] [SQL] Merges partition values us...

liancheng Tue, 13 Oct 2015 15:28:56 -0700

Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/9104#issuecomment-147872104
  
    A micro-benchmark on the TPC-DS (scale factor 15) `store_sales` table shows a ~12% reduction in average run time.
    
    Before:
    
    - Round 0: 8133 ms
    - Round 1: 7799 ms
    - Round 2: 8010 ms
    - Round 3: 8009 ms
    - Round 4: 8223 ms
    - Average: 8034.8 ms
    
    After:
    
    - Round 0: 7401 ms
    - Round 1: 6897 ms
    - Round 2: 6873 ms
    - Round 3: 6935 ms
    - Round 4: 7056 ms
    - Average: 7032.4 ms
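    
    As a sanity check on the ~12% figure, the averages and the relative gain can be recomputed from the rounds listed above. This is only a minimal sketch: `before`, `after`, and `mean` are names introduced here for illustration and are not part of the benchmark itself.
    
    ```scala
    // Timings copied from the rounds listed above, in milliseconds.
    val before = Seq(8133, 7799, 8010, 8009, 8223)
    val after  = Seq(7401, 6897, 6873, 6935, 7056)
    
    // Arithmetic mean of a sequence of timings (illustrative helper).
    def mean(xs: Seq[Int]): Double = xs.sum.toDouble / xs.size
    
    // Relative reduction in average wall-clock time.
    val gain = (mean(before) - mean(after)) / mean(before)
    println(s"Before: ${mean(before)} ms, after: ${mean(after)} ms, gain: ${gain * 100} %")
    ```
    
    The averages come out to 8034.8 ms and 7032.4 ms, a reduction of about 12.5%.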
    
    Benchmark code (where `ss_sold_date_sk` is an `INT` partitioning column and `ss_sold_time_sk` is an `INT` data column):
    
    ```scala
    import com.google.common.base.Stopwatch
    
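    // Runs `f` `warmupRuns` times untimed, then `runs` times timed,
    // printing each round and the average.
    def benchmark(runs: Int, warmupRuns: Int = 0)(f: => Unit): Unit = {
      val stopwatch = new Stopwatch()
    
      // Warm-up iterations: executed but not timed.
      (0 until warmupRuns).foreach { _ =>
        f
      }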
    
      def run(i: Int) = {
        stopwatch.reset()
        stopwatch.start()
        f
        stopwatch.stop()
        val elapsed = stopwatch.elapsedMillis()
        println(s"Round $i: $elapsed ms")
        elapsed
      }
    
      val total = (0 until runs).map(i => run(i)).sum.toDouble
      println(s"Average: ${total / runs} ms")
    }
    
    val path = "file:///Users/lian/tpcds/sf15/store_sales"
    
    benchmark(5, 5) {
      val df = sqlContext.read.parquet(path).selectExpr("ss_sold_time_sk", "ss_sold_date_sk")
      df.queryExecution.toRdd.foreach(row => ())
    }
    ```
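    
    The harness above relies on Guava's `new Stopwatch()` and `elapsedMillis()`, which newer Guava releases no longer expose. For anyone rerunning this against a different classpath, a dependency-free variant might look like the sketch below. It swaps in `System.nanoTime` for the Guava stopwatch; `benchmarkNano` is a hypothetical name and is not the code used for the numbers above.
    
    ```scala
    // Hypothetical variant of `benchmark`: same shape, but timed with
    // System.nanoTime instead of Guava's Stopwatch.
    def benchmarkNano(runs: Int, warmupRuns: Int = 0)(f: => Unit): Seq[Long] = {
      // Warm-up iterations: executed but not timed.
      (0 until warmupRuns).foreach(_ => f)
    
      val timings = (0 until runs).map { i =>
        val start = System.nanoTime()
        f
        // Convert elapsed nanoseconds to whole milliseconds.
        val elapsed = (System.nanoTime() - start) / 1000000L
        println(s"Round $i: $elapsed ms")
        elapsed
      }
    
      println(s"Average: ${timings.sum.toDouble / runs} ms")
      timings
    }
    ```
    
    It is called the same way, e.g. `benchmarkNano(5, 5) { ... }`, and additionally returns the per-round timings so they can be post-processed.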


