[GitHub] spark issue #19317: [SPARK-22098][CORE] Add new method aggregateByKeyLocally...

WeichenXu123 Fri, 22 Sep 2017 18:33:08 -0700

Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/19317
  
    I also have to point out that your implementation carries a high risk of 
causing OOM. The current implementation spills automatically when the local 
hash map grows too large, and it takes advantage of Spark's automatic memory 
management mechanism, which you should take a look at. 
    Another thing: `JHashMap` will perform slowly, and since the hash map is 
append-only here, it is better to use 
`org.apache.spark.util.collection.OpenHashSet`.
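
    To make the spilling concern concrete, here is a minimal sketch of 
aggregation with a bounded local map. It is not Spark's code: the class 
`SpillingSum`, its `maxEntries` threshold, and the in-memory `spills` list 
(standing in for files on disk) are all hypothetical names chosen for 
illustration. A real implementation would track estimated memory use rather 
than entry count, write spills to disk, and merge them as streams.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy per-key sum that "spills" its local map once it grows past a threshold. */
public class SpillingSum {
    private final int maxEntries;                      // hypothetical spill threshold
    private final Map<String, Long> current = new HashMap<>();
    // Stand-in for spill files; a real implementation would write these to disk.
    private final List<Map<String, Long>> spills = new ArrayList<>();

    public SpillingSum(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    /** Fold one value into the local map, spilling when the map gets too large. */
    public void insert(String key, long value) {
        current.merge(key, value, Long::sum);
        if (current.size() > maxEntries) {
            spills.add(new HashMap<>(current));        // move the partial result out
            current.clear();                           // local map is bounded again
        }
    }

    public int spillCount() {
        return spills.size();
    }

    /** Combine the live map with every spilled chunk (done in memory here for brevity). */
    public Map<String, Long> result() {
        Map<String, Long> merged = new HashMap<>(current);
        for (Map<String, Long> chunk : spills) {
            chunk.forEach((k, v) -> merged.merge(k, v, Long::sum));
        }
        return merged;
    }
}
```

    Without the size check in `insert`, the local map grows with the number 
of distinct keys in the partition, which is where the OOM risk comes from.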


