I am testing how Spark behaves when the dataset size exceeds the
amount of memory available. I am running wordcount on a 4-node cluster
(Intel Xeon, 16 cores (32 threads), 256GB RAM per node). I limited
spark.executor.memory to 64g, so the executors have 256g of memory
available across the cluster. Wordcount fails with connection errors
during saveAsTextFile() when the input size is 1TB. I have tried
different timeouts and akka frame sizes, but the job still fails. Are
there any changes I should make to get the job to run successfully?

Here is my most recent config.

SPARK_WORKER_CORES=32
SPARK_WORKER_MEMORY=64g
SPARK_WORKER_INSTANCES=1
SPARK_DAEMON_MEMORY=1g

SPARK_JAVA_OPTS="-Dspark.executor.memory=64g
-Dspark.default.parallelism=128 -Dspark.deploy.spreadOut=true
-Dspark.storage.memoryFraction=0.5
-Dspark.shuffle.consolidateFiles=true -Dspark.akka.frameSize=200
-Dspark.akka.timeout=300
-Dspark.storage.blockManagerSlaveTimeoutMs=300000"
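
For reference, the aggregate numbers behind this setup can be checked
with a small shell sketch. The variable names are only illustrative
(they are not Spark settings), and I am assuming 1TB = 1024GB.

```shell
#!/bin/sh
# Illustrative variables only; these are not Spark settings.
NODES=4                  # 4-node cluster
WORKER_CORES=32          # SPARK_WORKER_CORES=32 on each node
EXECUTOR_MEMORY_GB=64    # spark.executor.memory=64g
INPUT_GB=1024            # 1TB input, assuming 1TB = 1024GB

# Total executor memory and cores across the cluster.
TOTAL_MEMORY_GB=$((NODES * EXECUTOR_MEMORY_GB))
TOTAL_CORES=$((NODES * WORKER_CORES))

# How many times larger the input is than the executor memory.
INPUT_TO_MEMORY=$((INPUT_GB / TOTAL_MEMORY_GB))

echo "total executor memory: ${TOTAL_MEMORY_GB}g"
echo "total worker cores: ${TOTAL_CORES}"
echo "input is ${INPUT_TO_MEMORY}x the executor memory"
```

So the input is about four times the executor memory, and the total
core count is the same as the spark.default.parallelism value of 128.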


Error logs:
14/03/17 13:07:52 WARN ExternalAppendOnlyMap: Spilling in-memory map of 584 MB to disk (1 time so far)
14/03/17 13:07:52 WARN ExternalAppendOnlyMap: Spilling in-memory map of 510 MB to disk (1 time so far)
14/03/17 13:08:03 INFO ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(node02,56673)
14/03/17 13:08:03 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(node02,56673)
14/03/17 13:08:03 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(node02,56673)
14/03/17 13:08:03 INFO ConnectionManager: Notifying org.apache.spark.network.ConnectionManager$MessageStatus@2c762242
14/03/17 13:08:03 INFO ConnectionManager: Notifying org.apache.spark.network.ConnectionManager$MessageStatus@7fc331db
14/03/17 13:08:03 INFO ConnectionManager: Notifying org.apache.spark.network.ConnectionManager$MessageStatus@220eecfa
14/03/17 13:08:03 INFO ConnectionManager: Notifying org.apache.spark.network.ConnectionManager$MessageStatus@286cca3
14/03/17 13:08:03 ERROR BlockFetcherIterator$BasicBlockFetcherIterator: Could not get block(s) from ConnectionManagerId(node02,56673)
14/03/17 13:08:03 ERROR BlockFetcherIterator$BasicBlockFetcherIterator: Could not get block(s) from ConnectionManagerId(node02,56673)
14/03/17 13:08:03 ERROR BlockFetcherIterator$BasicBlockFetcherIterator: Could not get block(s) from ConnectionManagerId(node02,56673)
14/03/17 13:08:03 INFO ConnectionManager: Notifying org.apache.spark.network.ConnectionManager$MessageStatus@2d1d6b0a
14/03/17 13:08:03 ERROR BlockFetcherIterator$BasicBlockFetcherIterator: Could not get block(s) from ConnectionManagerId(node02,56673)
