How did you run the Spark command? Maybe the memory setting didn't actually apply? How much memory does the web ui say is available?
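One quick way to rule that out, assuming the job builds its own SparkContext (the object name below is just an illustration), is to print back the settings the application actually received:

    import org.apache.spark.{SparkConf, SparkContext}

    // Minimal sketch: print the memory-related settings the application actually received.
    // If spark.executor.memory comes back unset or at its default instead of 900g, the
    // values in spark-defaults.conf never made it to this job.
    object CheckMemorySettings {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("check-memory-settings"))
        Seq("spark.executor.memory",
            "spark.driver.memory",
            "spark.shuffle.memoryFraction",
            "spark.storage.memoryFraction").foreach { key =>
          println(s"$key = ${sc.getConf.getOption(key).getOrElse("<not set, default applies>")}")
        }
        sc.stop()
      }
    }

The Environment tab in the web UI shows the same properties without any code changes.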
BTW - I don't think any JVM can actually handle a 700G heap ... (maybe Zing).

On Thu, Mar 12, 2015 at 4:09 PM, Tom Hubregtsen <thubregt...@gmail.com> wrote:
> Hi all,
>
> I'm running the teraSort benchmark with a relatively small input set: 5GB.
> During profiling, I can see I am using a total of 68GB. I've got a terabyte
> of memory in my system, and set
>
>   spark.executor.memory 900g
>   spark.driver.memory 900g
>
> I use the defaults for
>
>   spark.shuffle.memoryFraction
>   spark.storage.memoryFraction
>
> I believe that I now have 0.2*900 = 180GB for shuffle and 0.6*900 = 540GB
> for storage.
>
> I noticed a lot of variation in runtime (under the same load), and tracked
> this down to this function in
> core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala:
>
>   private def spillToPartitionFiles(
>       collection: SizeTrackingPairCollection[(Int, K), C]): Unit = {
>     spillToPartitionFiles(collection.iterator)
>   }
>
> In a slow run it would loop through this function 12000 times, in a fast
> run only 700 times, even though the settings in both runs are the same and
> there are no other users on the system. When I look at the function calling
> this (insertAll, also in ExternalSorter), I see that spillToPartitionFiles
> is only called 700 times in both fast and slow runs, meaning that the
> function recursively calls itself very often. Because of the function name,
> I assume the system is spilling to disk. As I have sufficient memory, I
> assume that I forgot to set a certain memory setting. Does anybody have an
> idea which other setting I have to set in order to not spill data in this
> scenario?
>
> Thanks,
>
> Tom
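If the settings do turn out to be applied, one more thing to try (a sketch only, the sizes below are illustrative and not tuned for a 1 TB machine) is pinning the relevant options directly on the SparkConf, so nothing depends on spark-defaults.conf being picked up:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: set the memory-related options programmatically. Values are examples only.
    val conf = new SparkConf()
      .setAppName("terasort")
      .set("spark.executor.memory", "200g")        // heap per executor
      .set("spark.shuffle.memoryFraction", "0.4")  // fraction of heap for shuffle buffers
      .set("spark.storage.memoryFraction", "0.5")  // fraction of heap for cached blocks
    // .set("spark.shuffle.spill", "false")        // 1.x knob that disables spilling entirely,
                                                   // at the risk of an OOM if the size estimate is off
    val sc = new SparkContext(conf)

Also note that, if I remember correctly, the 1.x shuffle manager only uses memoryFraction * spark.shuffle.safetyFraction of the heap (roughly 0.2 * 0.8 by default), so the usable shuffle space is somewhat smaller than 0.2 * heap.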