GitHub user andrewor14 commented on a diff in the pull request:
https://github.com/apache/incubator-spark/pull/612#discussion_r9925630
--- Diff: core/src/test/scala/org/apache/spark/util/collection/ExternalAppendOnlyMapSuite.scala ---
@@ -83,6 +83,28 @@ class ExternalAppendOnlyMapSuite extends FunSuite with BeforeAndAfter with Local
(3, Set[Int](30))))
}
+  test("insert with collision on hashCode Int.MaxValue") {
+    val conf = new SparkConf(false)
+    sc = new SparkContext("local", "test", conf)
+
+    val map = new ExternalAppendOnlyMap[Int, Int, ArrayBuffer[Int]](createCombiner,
+      mergeValue, mergeCombiners)
+
+    map.insert(Int.MaxValue, 10)
+    map.insert(2, 20)
+    map.insert(3, 30)
+    map.insert(Int.MaxValue, 100)
+    map.insert(2, 200)
+    map.insert(Int.MaxValue, 1000)
+    val it = map.iterator
+    assert(it.hasNext)
+    val result = it.toSet[(Int, ArrayBuffer[Int])].map(kv => (kv._1, kv._2.toSet))
+    assert(result == Set[(Int, Set[Int])](
+      (Int.MaxValue, Set[Int](10, 100, 1000)),
+      (2, Set[Int](20, 200)),
+      (3, Set[Int](30))))
--- End diff --
Even after setting the memory parameters, we still need to insert a large number of elements into the map to induce spilling. I have been able to trigger the exception you found with the following:
// Insert enough elements to push the map past the memory threshold and force spills
(1 until 100000).foreach { i => map.insert(i, i) }
// A key whose hashCode is Int.MaxValue exercises the collision path
map.insert(Int.MaxValue, Int.MaxValue)
// Draining the iterator merges the in-memory map with the spilled data,
// which is where the exception shows up
val it = map.iterator
while (it.hasNext) {
  it.next()
}
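
Roughly, I imagine the full test could be put together like the sketch below. I'm assuming the memory parameter being lowered is spark.shuffle.memoryFraction with some very small value like 0.001 (both the key and the value are my guesses here), and createCombiner / mergeValue / mergeCombiners are the suite's existing helpers from the diff above:

// Sketch only: lower the spill threshold so a modest number of inserts spills to disk
val conf = new SparkConf(false).set("spark.shuffle.memoryFraction", "0.001")
sc = new SparkContext("local", "test", conf)
val map = new ExternalAppendOnlyMap[Int, Int, ArrayBuffer[Int]](createCombiner,
  mergeValue, mergeCombiners)

// Bulk inserts to force spilling, plus a key that collides on hashCode Int.MaxValue
(1 until 100000).foreach { i => map.insert(i, i) }
map.insert(Int.MaxValue, Int.MaxValue)

// Drain the iterator so the in-memory map is merged with the spilled data
val it = map.iterator
while (it.hasNext) {
  it.next()
}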