Re: Timed out while stopping the job generator plus subsequent failures
I don't think that's the same issue I was seeing, but you can have a look at https://issues.apache.org/jira/browse/SPARK-4545 for more detail on my issue.

On Thu, Mar 12, 2015 at 12:51 AM, Tobias Pfeiffer t...@preferred.jp wrote:

> Sean,
>
> On Wed, Mar 11, 2015 at 7:43 PM, Tobias Pfeiffer t...@preferred.jp wrote:
>
>> it seems like I am unable to shut down my StreamingContext properly, both in local[n] and yarn-cluster mode. In addition, (only) in yarn-cluster mode, subsequent use of a new StreamingContext will raise an InvalidActorNameException.
>
> I was wondering if this is related to your question on spark-dev http://tinyurl.com/q5cd5px
>
> Did you get any additional insight into this issue?
>
> In my case the processing of the first batch completes, but I don't know whether there is anything wrong with the checkpoints. When I look at the corresponding checkpoint directory in HDFS, it doesn't seem like all state RDDs are persisted there, just a subset. Any ideas?
>
> Thanks
> Tobias
Re: Timed out while stopping the job generator plus subsequent failures
Sean,

On Wed, Mar 11, 2015 at 7:43 PM, Tobias Pfeiffer t...@preferred.jp wrote:

> it seems like I am unable to shut down my StreamingContext properly, both in local[n] and yarn-cluster mode. In addition, (only) in yarn-cluster mode, subsequent use of a new StreamingContext will raise an InvalidActorNameException.

I was wondering if this is related to your question on spark-dev http://tinyurl.com/q5cd5px

Did you get any additional insight into this issue?

In my case the processing of the first batch completes, but I don't know whether there is anything wrong with the checkpoints. When I look at the corresponding checkpoint directory in HDFS, it doesn't seem like all state RDDs are persisted there, just a subset. Any ideas?

Thanks
Tobias
Re: Timed out while stopping the job generator plus subsequent failures
Hi,

I discovered what caused my issue when running on YARN and was able to work around it.

On Wed, Mar 11, 2015 at 7:43 PM, Tobias Pfeiffer t...@preferred.jp wrote:

> The processing itself is complete, i.e., the batch currently processed at the time of stop() is finished and no further batches are processed. However, something keeps the streaming context from stopping properly. In local[n] mode, this is not actually a problem (other than that I have to wait 20 seconds for shutdown), but in yarn-cluster mode, I get an error akka.actor.InvalidActorNameException: actor name [JobGenerator] is not unique!

It seems that not all checkpointed RDDs are cleaned up (metadata cleared, checkpoint directories deleted, etc.?) at the time the streamingContext is stopped, but only afterwards. In particular, when I add `Thread.sleep(5000)` after my streamingContext.stop() call, it works and I can start a different streamingContext afterwards.

This is pretty ugly, so does anyone know a method to poll whether it's safe to continue or whether there are still some RDDs waiting to be cleaned up?

Thanks
Tobias
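
For illustration only, here is a minimal sketch of what "polling instead of a fixed sleep" could look like in general. It does not use Spark at all: the thread does not identify any Spark API that reports whether cleanup has finished, so the condition is supplied by the caller and is purely hypothetical. The class name `StopPoller` and the method `waitUntil` are made up for this example.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Spark: waits for a caller-supplied
// condition with an upper bound instead of sleeping for a fixed time.
final class StopPoller {
    private StopPoller() {}

    /**
     * Checks `condition` repeatedly, sleeping up to `intervalMs` between
     * checks, until it returns true or `timeoutMs` has elapsed.
     * Returns true if the condition held before the deadline, false otherwise.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMs, long intervalMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingNs = deadline - System.nanoTime();
            if (remainingNs <= 0) {
                return false;
            }
            // Never sleep past the deadline, and always sleep at least 1 ms.
            long remainingMs = Math.max(1L, remainingNs / 1_000_000L);
            Thread.sleep(Math.min(intervalMs, remainingMs));
        }
    }
}
```

After streamingContext.stop(), a call such as `StopPoller.waitUntil(() -> cleanupLooksDone(), 5000, 100)` would return as soon as the check passes and give up after 5 seconds, where `cleanupLooksDone()` stands for whatever check turns out to be reliable (it is an assumption here, not an existing function).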