[GitHub] spark pull request #16383: [SPARK-18980][SQL] implement Aggregator with Type...

cloud-fan Thu, 22 Dec 2016 06:03:46 -0800

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16383#discussion_r93627963
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala ---
    @@ -192,14 +192,14 @@ object DatasetBenchmark {
         benchmark2.run()
     
         /*
    -    OpenJDK 64-Bit Server VM 1.8.0_91-b14 on Linux 3.10.0-327.18.2.el7.x86_64
    -    Intel Xeon E3-12xx v2 (Ivy Bridge)
    +    Java HotSpot(TM) 64-Bit Server VM 1.8.0_60-b27 on Mac OS X 10.12.1
    +    Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
         aggregate:                               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         ------------------------------------------------------------------------------------------------
    -    RDD sum                                       1420 / 1523         70.4          14.2       1.0X
    -    DataFrame sum                                   31 /   49       3214.3           0.3      45.6X
    -    Dataset sum using Aggregator                  3216 / 3257         31.1          32.2       0.4X
    -    Dataset complex Aggregator                    7948 / 8461         12.6          79.5       0.2X
    +    RDD sum                                       1913 / 1942         52.3          19.1       1.0X
    +    DataFrame sum                                   46 /   61       2157.7           0.5      41.3X
    +    Dataset sum using Aggregator                  4656 / 4758         21.5          46.6       0.4X
    +    Dataset complex Aggregator                    6636 / 7039         15.1          66.4       0.3X
    --- End diff --
    
    Whether the aggregation is hash-based or sort-based only decides how we
    "group" the records, while this PR speeds up the "aggregating" part.
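
    The distinction can be illustrated with a minimal sketch in plain Scala
    collections. This is not Spark's implementation and uses no Spark APIs;
    the names SimpleAgg, SumAgg, groupByHash and groupBySort are hypothetical
    and exist only for this example. It shows two ways of forming groups
    feeding the same per-group aggregation step, which is the step the
    comment says the PR targets.

```scala
// Illustrative sketch only: plain Scala collections, no Spark APIs.
// SimpleAgg, SumAgg, groupByHash and groupBySort are hypothetical names.

// A pared-down stand-in for the shape of a typed aggregator:
// a zero value, a per-record reduce step, and a final projection.
trait SimpleAgg[IN, BUF, OUT] {
  def zero: BUF
  def reduce(buf: BUF, in: IN): BUF
  def finish(buf: BUF): OUT
}

// Sums the Long value of each (key, value) record.
object SumAgg extends SimpleAgg[(String, Long), Long, Long] {
  def zero: Long = 0L
  def reduce(buf: Long, in: (String, Long)): Long = buf + in._2
  def finish(buf: Long): Long = buf
}

object GroupVsAggregate {
  type Rec = (String, Long)

  // "Grouping", strategy 1: build a hash map from key to its records.
  def groupByHash(rows: Seq[Rec]): Map[String, Seq[Rec]] =
    rows.groupBy(_._1)

  // "Grouping", strategy 2: sort by key, then cut the sorted sequence
  // wherever the key changes.
  def groupBySort(rows: Seq[Rec]): Map[String, Seq[Rec]] = {
    val sorted = rows.sortBy(_._1)
    val runs = sorted.foldLeft(List.empty[(String, List[Rec])]) {
      case ((k, recs) :: rest, r) if k == r._1 => (k, r :: recs) :: rest
      case (acc, r)                            => (r._1, List(r)) :: acc
    }
    runs.map { case (k, recs) => k -> recs.reverse }.toMap
  }

  // "Aggregating": the same fold per group, however the groups were formed.
  def aggregate[B, O](groups: Map[String, Seq[Rec]],
                      agg: SimpleAgg[Rec, B, O]): Map[String, O] =
    groups.map { case (k, recs) =>
      k -> agg.finish(recs.foldLeft(agg.zero)(agg.reduce))
    }
}
```

    In this sketch, swapping groupByHash for groupBySort changes only how
    records are bucketed; the cost of aggregate is unaffected by that choice.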



