GitHub user ConeyLiu commented on the issue:

    https://github.com/apache/spark/pull/19317
  
    Test case:
    ```scala
    test("performance of aggregateByKeyLocally") {
      // Fixed seed for repeatable input.
      val random = new Random(1)

      // 10M pairs over 20 partitions with keys in [0, 100); cached so the
      // timed section measures only the aggregation.
      val pairs = sc.parallelize(0 until 10000000, 20)
        .map(p => (random.nextInt(100), p))
        .persist(StorageLevel.MEMORY_ONLY)

      // Materialize the cache before timing starts.
      pairs.count()

      val start = System.currentTimeMillis()
      // val jHashMap = pairs.aggregateByKeyLocallyWithJHashMap(new HashSet[Int]())(_ += _, _ ++= _).toArray
      val openHashMap = pairs.aggregateByKeyLocally(new HashSet[Int]())(_ += _, _ ++= _).toArray
      println(System.currentTimeMillis() - start)
    }
    ```
    
    Test result (elapsed time in ms for each of 10 runs):
    | Map | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Avg |
    | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
    | JHashMap | 2921 | 2920 | 2843 | 2950 | 2898 | 3316 | 2770 | 2994 | 3016 | 3005 | 2963.3 |
    | OpenHashMap | 3029 | 2884 | 3064 | 3023 | 3108 | 3194 | 3003 | 2961 | 3115 | 3023 | 3040.4 |
    
    The two implementations perform almost the same: the averages differ by about 77 ms (roughly 2.6%).
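
    To double-check the averages in the table, here is a minimal plain-Scala sketch that recomputes them from the ten timings per map. It does not need Spark; the object name `BenchmarkSummary` and the helper `mean` are made up for illustration and are not part of the PR.

    ```scala
    // Hypothetical helper, not part of the PR: recomputes the table's averages.
    object BenchmarkSummary {
      // Elapsed times in ms, copied from the table above.
      val jHashMap: Seq[Int] =
        Seq(2921, 2920, 2843, 2950, 2898, 3316, 2770, 2994, 3016, 3005)
      val openHashMap: Seq[Int] =
        Seq(3029, 2884, 3064, 3023, 3108, 3194, 3003, 2961, 3115, 3023)

      // Arithmetic mean of the run times.
      def mean(xs: Seq[Int]): Double = xs.sum.toDouble / xs.size

      def main(args: Array[String]): Unit = {
        val j = mean(jHashMap)
        val o = mean(openHashMap)
        println(f"JHashMap avg: $j%.1f ms")
        println(f"OpenHashMap avg: $o%.1f ms")
        // Difference of OpenHashMap relative to JHashMap, in percent.
        println(f"Relative difference: ${(o - j) / j * 100}%.1f%%")
      }
    }
    ```

    With only ten runs per map and a spread of several hundred ms within each row, a gap of this size is probably within run-to-run noise.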


