Re: flatMap output on disk / flatMap memory overhead

2015-08-01 Thread Puneet Kapoor
-list.1001560.n3.nabble.com/flatMap-output-on-disk-flatMap-memory-overhead-tp23098p23108.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: flatMap output on disk / flatMap memory overhead

2015-06-09 Thread Imran Rashid
tried using reduceByKey, without success. I also tried this: rdd.persist(MEMORY_AND_DISK).flatMap(...).reduceByKey. However, I got the same error as before, namely the error described here: http://apache-spark-user-list.1001560.n3.nabble.com/flatMap-output-on-disk-flatMap-memory-overhead-td23098

Re: flatMap output on disk / flatMap memory overhead

2015-06-02 Thread Akhil Das
("spark.akka.frameSize", "1024")
conf.set("spark.executor.memory", "125g")
conf.set("spark.shuffle.file.buffer.kb", "1000")
conf.set("spark.shuffle.consolidateFiles", "true")
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/flatMap-output-on-disk-flatMap-memory

Re: flatMap output on disk / flatMap memory overhead

2015-06-02 Thread octavian.ganea
I tried using reduceByKey, without success. I also tried this: rdd.persist(MEMORY_AND_DISK).flatMap(...).reduceByKey. However, I got the same error as before, namely the error described here: http://apache-spark-user-list.1001560.n3.nabble.com/flatMap-output-on-disk-flatMap-memory-overhead

Re: flatMap output on disk / flatMap memory overhead

2015-06-02 Thread Richard Marscher
, namely the error described here: http://apache-spark-user-list.1001560.n3.nabble.com/flatMap-output-on-disk-flatMap-memory-overhead-td23098.html
My task is to count the frequencies of pairs of words that occur in a set of documents at least 5 times. I know that this final output is sparse
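For readers following the thread, the counting task described above can be sketched in plain Python (no Spark involved). This is only an illustration under stated assumptions, not the poster's actual job: it assumes a pair is counted once per document in which both words appear, that words are split on whitespace and lowercased, and that documents arrive already grouped into partitions. The names count_partition and frequent_pairs are hypothetical. The point it shows is that aggregating pairs locally per partition, before merging, keeps the intermediate data as counts rather than one record per emitted pair.

```python
from collections import Counter
from itertools import combinations


def count_partition(docs):
    """Count unordered word pairs within one partition of documents.

    Assumption: a pair is counted once per document, however many times
    the two words occur inside that document.
    """
    local = Counter()
    for doc in docs:
        # De-duplicate and sort so each pair has one canonical ordering.
        words = sorted(set(doc.lower().split()))
        local.update(combinations(words, 2))
    return local


def frequent_pairs(partitions, min_count=5):
    """Merge the per-partition counts and keep pairs seen at least min_count times."""
    totals = Counter()
    for part in partitions:
        totals.update(count_partition(part))
    return {pair: n for pair, n in totals.items() if n >= min_count}


# Hypothetical input: six tiny documents split into two partitions.
docs = ["spark flatMap memory"] * 5 + ["spark disk"]
partitions = [docs[:3], docs[3:]]
result = frequent_pairs(partitions)
```

In Spark terms, the per-partition step loosely corresponds to aggregating inside each partition before the shuffle, and the final filter corresponds to dropping pairs below the threshold after the counts are merged. Whether this helps with the error reported in the thread is not something the snippets above confirm.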

flatMap output on disk / flatMap memory overhead

2015-06-01 Thread octavian.ganea
)
conf.set("spark.shuffle.file.buffer.kb", "1000")
conf.set("spark.shuffle.consolidateFiles", "true")
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/flatMap-output-on-disk-flatMap-memory-overhead-tp23098.html