Spark intermittently fails to recover from a worker failure (in standalone mode)

2015-09-08 Thread Cheuk Lam
We have run into a problem where a Spark job is aborted after a worker is killed in a 2-worker standalone cluster. The problem is intermittent, but we can consistently reproduce it. It appears to happen only when we kill a worker, not when we kill an executor directly.

Re: in GraphX, program with Pregel runs slower and slower after several iterations

2015-06-03 Thread Cheuk Lam
I think you're exactly right. I once had 100 iterations in a single Pregel call, and ran into the lineage problem right there. I had to modify the Pregel function and checkpoint both the graph and the newVerts RDD there to cut off the lineage. If you draw out the dependency graph among the g,

Re: in GraphX, program with Pregel runs slower and slower after several iterations

2015-06-02 Thread Cheuk Lam
I've been encountering something similar too. I suspected it was related to the lineage growth of the graph/RDDs, so I now checkpoint the graph every 60 Pregel rounds. Since then my program no longer slows down (although every checkpoint takes some extra time).
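The periodic checkpointing described in this post could look roughly like the sketch below. It is not the poster's code: the object name, the shortestPaths helper, the toy graph, and the temporary checkpoint directory are all made up for illustration. It assumes a vertex program for which re-sending the initial message at the start of each chunk is harmless (true for the min-based shortest-path program used here, not true in general), and it does not unpersist graphs cached by earlier chunks.

```scala
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

// Illustrative sketch only: runs Pregel in chunks and checkpoints the graph
// between chunks so the lineage does not keep growing.
object PeriodicCheckpointSketch {

  // Single-source shortest paths, split into chunks of `roundsPerChunk` rounds.
  // Assumption: vertex attributes already hold 0.0 for the source and
  // Double.PositiveInfinity elsewhere, so the initial message is a no-op.
  def shortestPaths(graph: Graph[Double, Double],
                    totalRounds: Int,
                    roundsPerChunk: Int): Graph[Double, Double] = {
    var g = graph
    var done = 0
    while (done < totalRounds) {
      val rounds = math.min(roundsPerChunk, totalRounds - done)
      g = Pregel(g, Double.PositiveInfinity, maxIterations = rounds)(
        (id, dist, msg) => math.min(dist, msg),            // vertex program
        triplet =>                                          // send shorter distances
          if (triplet.srcAttr + triplet.attr < triplet.dstAttr)
            Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
          else Iterator.empty,
        (a, b) => math.min(a, b))                           // merge messages
      // Mark the graph for checkpointing, then run actions so the
      // checkpoint files are actually written and the lineage is cut.
      g.checkpoint()
      g.vertices.count()
      g.edges.count()
      done += rounds
    }
    g
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("checkpoint-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    // Hypothetical location; on a cluster this would be a shared path such as HDFS.
    sc.setCheckpointDir(Files.createTempDirectory("graphx-checkpoints").toString)

    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, 1.0), Edge(2L, 3L, 2.0), Edge(1L, 3L, 5.0)))
    val graph = Graph.fromEdges(edges, Double.PositiveInfinity)
      .mapVertices((id, _) => if (id == 1L) 0.0 else Double.PositiveInfinity)

    // 120 rounds in total, checkpointing every 60 rounds as in the post.
    val result = shortestPaths(graph, totalRounds = 120, roundsPerChunk = 60)
    result.vertices.collect().sortBy(_._1).foreach(println)
    sc.stop()
  }
}
```

Each checkpoint writes the vertex and edge RDDs to the checkpoint directory, which is where the extra time per checkpoint mentioned above would come from.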

How to delete graph checkpoints?

2015-01-21 Thread Cheuk Lam
in VertexRDD.scala: private[graphx] def partitionsRDD: RDD[ShippableVertexPartition[VD]] We would really appreciate it if anyone could shed some light on solving this problem, or if anyone who has come across a similar problem could share a solution or workaround. Thank you, Cheuk Lam

Re: Spark in cluster and errors

2014-10-17 Thread Cheuk Lam
I wasn't the original person who posted the question, but this helped me! :) Thank you. I had a similar issue today when I tried to connect using the IP address (spark://master_ip:7077). I got it resolved by replacing it with the URL displayed in the Spark web console - in my case it is

Performance with activeSetOpt in GraphImpl.mapReduceTriplets()

2014-10-09 Thread Cheuk Lam
When using the activeSetOpt in GraphImpl.mapReduceTriplets(), can we expect performance that is proportional only to the size of the active set and independent of the size of the original data set? Or is there still a fixed overhead that depends on the size of the original data set? Thank you!

Pregel messages serialized in local machine?

2014-09-25 Thread Cheuk Lam
This is a question on using the Pregel function in GraphX. Does a message get serialized and then de-serialized in the scenario where both the source and the destination vertices are in the same compute node/machine? Thank you!