Re: Timed out while stopping the job generator plus subsequent failures

2015-03-12 Thread Sean Owen
I don't think that's the same issue I was seeing, but you can have a
look at https://issues.apache.org/jira/browse/SPARK-4545 for more
detail on my issue.

On Thu, Mar 12, 2015 at 12:51 AM, Tobias Pfeiffer t...@preferred.jp wrote:
 Sean,

 On Wed, Mar 11, 2015 at 7:43 PM, Tobias Pfeiffer t...@preferred.jp wrote:

 it seems like I am unable to shut down my StreamingContext properly, both
 in local[n] and yarn-cluster mode. In addition, (only) in yarn-cluster mode,
 subsequent use of a new StreamingContext will raise an
 InvalidActorNameException.


 I was wondering if this is related to your question on spark-dev
   http://tinyurl.com/q5cd5px
 Did you get any additional insight into this issue?

 In my case the processing of the first batch completes, but I don't know
 whether there is anything wrong with the checkpoints. When I look at the
 corresponding checkpoint directory in HDFS, it doesn't seem like all state
 RDDs are persisted there, just a subset. Any ideas?

 Thanks
 Tobias





Re: Timed out while stopping the job generator plus subsequent failures

2015-03-11 Thread Tobias Pfeiffer
Sean,

On Wed, Mar 11, 2015 at 7:43 PM, Tobias Pfeiffer t...@preferred.jp wrote:

 it seems like I am unable to shut down my StreamingContext properly, both
 in local[n] and yarn-cluster mode. In addition, (only) in yarn-cluster
 mode, subsequent use of a new StreamingContext will raise
 an InvalidActorNameException.


I was wondering if this is related to your question on spark-dev
  http://tinyurl.com/q5cd5px
Did you get any additional insight into this issue?

In my case the processing of the first batch completes, but I don't know
whether there is anything wrong with the checkpoints. When I look at the
corresponding checkpoint directory in HDFS, it doesn't seem like all state
RDDs are persisted there, just a subset. Any ideas?

Thanks
Tobias


Re: Timed out while stopping the job generator plus subsequent failures

2015-03-11 Thread Tobias Pfeiffer
Hi,

I discovered what caused my issue when running on YARN and was able to work
around it.

On Wed, Mar 11, 2015 at 7:43 PM, Tobias Pfeiffer t...@preferred.jp wrote:

 The processing itself is complete, i.e., the batch currently processed at
 the time of stop() is finished and no further batches are processed.
 However, something keeps the streaming context from stopping properly. In
 local[n] mode, this is not actually a problem (other than that I have to wait
 20 seconds for shutdown), but in yarn-cluster mode, I get an error:

   akka.actor.InvalidActorNameException: actor name [JobGenerator] is not unique!


It seems that not all checkpointed RDDs have been cleaned up (metadata
cleared, checkpoint directories deleted etc.?) by the time
`streamingContext.stop()` returns; the cleanup only finishes afterwards. In
particular, when I add `Thread.sleep(5000)` after my `streamingContext.stop()`
call, it works and I can start a different StreamingContext afterwards.

This is pretty ugly, so does anyone know a method to poll whether it's safe
to continue or whether there are still some RDDs waiting to be cleaned up?
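
Until such a method turns up, the fixed sleep could at least be replaced by a bounded poll. The following is only a rough sketch in plain Java with no Spark dependency: the class `ShutdownWait`, the method `awaitCondition` and the condition passed to it are names made up for illustration, not Spark APIs. What the condition should actually check (for example, whether a checkpoint directory has disappeared) is exactly the open question above and is left to the caller.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not part of Spark: bounded polling instead of a fixed sleep. */
public final class ShutdownWait {
    private ShutdownWait() {}

    /**
     * Calls isDone repeatedly until it returns true or timeoutMillis has elapsed.
     * Returns true if the condition was met, false if the wait timed out.
     * The condition itself is an assumption supplied by the caller.
     */
    public static boolean awaitCondition(BooleanSupplier isDone,
                                         long timeoutMillis,
                                         long pollMillis) throws InterruptedException {
        if (pollMillis <= 0) {
            throw new IllegalArgumentException("pollMillis must be positive");
        }
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (isDone.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            // Sleep for the poll interval, but never past the deadline.
            long remainingMillis = Math.max(1L, remainingNanos / 1_000_000L);
            Thread.sleep(Math.min(pollMillis, remainingMillis));
        }
    }
}
```

After `streamingContext.stop()`, a call such as `ShutdownWait.awaitCondition(() -> cleanupLooksDone(), 5000, 100)` would then wait at most the same 5 seconds as the sleep, but return as soon as the (hypothetical) `cleanupLooksDone()` check passes.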

Thanks
Tobias