Re: SparkListener onApplicationEnd processing an RDD throws exception because of stopped SparkContext

2016-02-22 Thread Sumona Routh
Ok, I understand.

Yes, I will have to handle them in the main thread.

Thanks!
Sumona


Re: SparkListener onApplicationEnd processing an RDD throws exception because of stopped SparkContext

2016-02-17 Thread Shixiong(Ryan) Zhu
`onApplicationEnd` is posted when the SparkContext is stopping, and you cannot
submit any job to a stopping SparkContext. In general, a SparkListener is used
to monitor job progress and collect job information, and you should not submit
jobs from it. Why not submit your jobs in the main thread?
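
For example, a minimal sketch of that pattern (runJob and the keyspace/table
names are placeholders, and saveToCassandra assumes the
spark-cassandra-connector is on the classpath and configured):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.RDD
    import com.datastax.spark.connector._  // spark-cassandra-connector (assumed)

    object MyJob {
      // Placeholder for the application's real work.
      def runJob(sc: SparkContext): RDD[(String, String)] =
        sc.parallelize(Seq(("job-1", "processed")))

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("my-job"))
        try {
          val audit = runJob(sc)
          // Post-processing runs here, in the main thread, while the
          // context is still alive.
          audit.saveToCassandra("my_keyspace", "job_audit")
        } finally {
          // Stop only after every job, including post-processing, has run.
          sc.stop()
        }
      }
    }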



Re: SparkListener onApplicationEnd processing an RDD throws exception because of stopped SparkContext

2016-02-17 Thread Sumona Routh
Can anyone provide some insight into the flow of SparkListeners,
specifically onApplicationEnd? I'm having issues with the SparkContext
being stopped before my final processing can complete.

Thanks!
Sumona


SparkListener onApplicationEnd processing an RDD throws exception because of stopped SparkContext

2016-02-15 Thread Sumona Routh
Hi there,
I am trying to implement a listener that acts as a post-processor, storing
data about what was processed or erred. For this, I use an RDD that may or may
not change over the course of the application.

My thought was to use onApplicationEnd and then a saveToCassandra call to
persist this.
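
Concretely, the listener looks roughly like this (a sketch; the audit RDD
contents and the keyspace/table names are placeholders):

    import org.apache.spark.SparkContext
    import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd}
    import com.datastax.spark.connector._  // spark-cassandra-connector (assumed)

    // Sketch of the post-processing listener described above.
    class PostProcessingListener(sc: SparkContext) extends SparkListener {
      override def onApplicationEnd(end: SparkListenerApplicationEnd): Unit = {
        val audit = sc.parallelize(Seq(("job-1", "processed")))
        // This submits a new job, which is where the exception below is
        // thrown: by the time this event fires, the context is stopping.
        audit.saveToCassandra("my_keyspace", "audit")
      }
    }

    // Registered in the driver with:
    //   sc.addSparkListener(new PostProcessingListener(sc))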

From what I've gathered in my experiments, onApplicationEnd doesn't get called
until sparkContext.stop() is called; if I don't call stop in my code, the
listener is never invoked. This works in my local tests: stop gets called, the
listener runs, and the data is persisted to the db. However, when I run this
on our server, the code in onApplicationEnd throws the following exception:

Task serialization failed: java.lang.IllegalStateException: Cannot call
methods on a stopped SparkContext

What's the best way to resolve this? I can think of creating a new
SparkContext in the listener (I believe I would have to enable multiple
contexts, in case I create the new one before the old one is stopped). That
seems odd but might be doable. Alternatively, what if I were to simply
structure my job as sequential procedural blocks, doJob followed by
doPostProcessing? Does that guarantee the post-processing runs after the job?
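
In code, I mean something like this (doJob and doPostProcessing are just
placeholder names):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.RDD

    def doJob(sc: SparkContext): RDD[String] = ???          // placeholder: main processing
    def doPostProcessing(results: RDD[String]): Unit = ???  // placeholder: persist audit data

    val sc = new SparkContext(new SparkConf().setAppName("my-app"))
    val results = doJob(sc)    // any actions inside doJob block until their jobs finish
    doPostProcessing(results)  // so this runs strictly after doJob's work completes
    sc.stop()                  // stop last, once post-processing is done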

We are currently running Spark 1.2 in standalone mode.

Please let me know if you require more details. Thanks for the assistance!
Sumona