Hey guys,
After rebuilding from the master branch this morning, I’ve started to see these
errors that I’ve never gotten before while running connected components. Anyone
seen this before?
14/09/10 20:38:53 INFO collection.ExternalSorter: Thread 87 spilling in-memory
batch of 1020 MB to disk
Hey all,
I’m trying to run connected components in GraphX on about 400GB of data on 50
m3.xlarge nodes on EMR. I keep getting java.nio.channels.CancelledKeyException
when it gets to “mapPartitions at VertexRDD.scala:347”. I haven’t been able to
find much about this online, and nothing that
files (stored on S3) and it finishes in
about 12 minutes, but with all the data I’ve let it run up to 4 hours and it
still doesn’t complete. Does anyone have ideas for approaches to
troubleshooting this, Spark parameters that might need to be tuned, etc.?
Best Regards,
Jeffrey Picard