I tried using reduceByKey, without success.

I also tried this: rdd.persist(MEMORY_AND_DISK).flatMap(...).reduceByKey. However, I got the same error as before, namely the error described here:
http://apache-spark-user-list.1001560.n3.nabble.com/flatMap-output-on-disk-flatMap-memory-overhead-td23098.html
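For reference, a compilable sketch of that persist-based attempt (the flatMap(...) body is elided in the original, so extractPairs below is a hypothetical placeholder). Note that MEMORY_AND_DISK only controls where the cached input partitions live; it does not change where the shuffle output of reduceByKey is written.

    import org.apache.spark.rdd.RDD
    import org.apache.spark.storage.StorageLevel

    // Hypothetical stand-in for the elided flatMap(...) logic.
    def extractPairs(doc: String): Seq[((String, String), Int)] = Seq.empty

    def attempt(rdd: RDD[String]): RDD[((String, String), Int)] =
      rdd
        .persist(StorageLevel.MEMORY_AND_DISK) // keep partitions in memory, spill to disk when they don't fit
        .flatMap(extractPairs)
        .reduceByKey(_ + _)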
My task is to count the frequencies of pairs of words that occur in a set of documents at least 5 times. I know that this final output is sparse.
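To make the task concrete, here is a minimal sketch of the counting job, assuming the documents arrive as tokenized word sequences and that adjacent word pairs are the unit being counted (both are assumptions; the original does not show the pair-extraction logic):

    import org.apache.spark.rdd.RDD

    // Hypothetical pair extraction: adjacent word pairs within a document.
    def wordPairs(doc: Seq[String]): Iterator[((String, String), Long)] =
      doc.sliding(2).collect { case Seq(a, b) => ((a, b), 1L) }

    def frequentPairs(docs: RDD[Seq[String]]): RDD[((String, String), Long)] =
      docs.flatMap(wordPairs)               // one ((w1, w2), 1) per occurrence
        .reduceByKey(_ + _)                 // sum counts per pair
        .filter { case (_, n) => n >= 5 }   // keep pairs seen at least 5 times

Because the filter runs after reduceByKey, every intermediate pair still has to flow through the shuffle, which is why the intermediate data can be far larger than the sparse final output.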
My configuration:

conf.set("spark.akka.frameSize", "1024")
conf.set("spark.executor.memory", "125g")
conf.set("spark.shuffle.file.buffer.kb", "1000")
conf.set("spark.shuffle.consolidateFiles", "true")
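For completeness, a sketch of how those settings would sit in the driver setup (the app name is a placeholder; note that SparkConf.set takes string values for both key and value):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("word-pair-counts")               // placeholder name
      .set("spark.akka.frameSize", "1024")          // max driver/executor message size, in MB
      .set("spark.executor.memory", "125g")
      .set("spark.shuffle.file.buffer.kb", "1000")
      .set("spark.shuffle.consolidateFiles", "true")
    val sc = new SparkContext(conf)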
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/flatMap-output-on-disk-flatMap-memory-overhead-tp23098p23108.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org