java.lang.ClassCastException: java.lang.Long cannot be cast to scala.Tuple2

2014-09-10 Thread Jeffrey Picard
Hey guys, after rebuilding from the master branch this morning, I’ve started to see errors that I’ve never gotten before while running connected components. Has anyone seen this before?
14/09/10 20:38:53 INFO collection.ExternalSorter: Thread 87 spilling in-memory batch of 1020 MB to disk

java.nio.channels.CancelledKeyException in Graphx Connected Components

2014-08-18 Thread Jeffrey Picard
Hey all, I’m trying to run connected components in GraphX on about 400GB of data on 50 m3.xlarge nodes on EMR. I keep getting java.nio.channels.CancelledKeyException when it gets to “mapPartitions at VertexRDD.scala:347”. I haven’t been able to find much about this online, and nothing that

GraphX Connected Components

2014-07-29 Thread Jeffrey Picard
files (stored on S3) and it finishes in about 12 minutes, but with all the data I’ve let it run up to 4 hours and it still doesn’t complete. Does anyone have ideas for approaches to troubleshooting this, Spark parameters that might need to be tuned, etc.? Best Regards, Jeffrey Picard