Re: SparkListener onApplicationEnd processing an RDD throws exception because of stopped SparkContext
Ok, I understand. Yes, I will have to handle them in the main thread. Thanks!
Sumona
Re: SparkListener onApplicationEnd processing an RDD throws exception because of stopped SparkContext
`onApplicationEnd` is posted when the SparkContext is stopping, and you cannot submit any job to a stopping SparkContext. In general, a SparkListener is used to monitor job progress and collect job information, and you should not submit jobs there. Why not submit your jobs in the main thread?
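The structure Shixiong suggests amounts to running the post-processing as the last step of the driver's main flow, before calling stop(). A minimal sketch of that ordering, using a toy stand-in for SparkContext rather than the real Spark API (ToyContext, do_job, and do_post_processing are hypothetical names invented for illustration):

```python
class ToyContext:
    """Toy stand-in for SparkContext: rejects work once stopped."""
    def __init__(self):
        self.stopped = False
        self.log = []

    def run_job(self, name):
        if self.stopped:
            raise RuntimeError("Cannot call methods on a stopped SparkContext")
        self.log.append(name)

    def stop(self):
        self.stopped = True


def do_job(sc):
    sc.run_job("main-job")        # the application's real work

def do_post_processing(sc):
    sc.run_job("save-status")     # e.g. the saveToCassandra-style write


sc = ToyContext()
do_job(sc)              # statements in the main thread run strictly in order,
do_post_processing(sc)  # so post-processing is guaranteed to follow the job
sc.stop()               # stop only after everything has been submitted

assert sc.log == ["main-job", "save-status"]
```

This also answers the "doJob, doPostProcessing" question from the original message: plain sequential statements in the main thread already guarantee that post-processing runs after the job completes, with no listener involved.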
Re: SparkListener onApplicationEnd processing an RDD throws exception because of stopped SparkContext
Can anyone provide some insight into the flow of SparkListeners, specifically onApplicationEnd? I'm having issues with the SparkContext being stopped before my final processing can complete.

Thanks!
Sumona
SparkListener onApplicationEnd processing an RDD throws exception because of stopped SparkContext
Hi there,
I am trying to implement a listener that performs as a post-processor, storing data about what was processed or erred. For this, I use an RDD that may or may not change during the course of the application.

My thought was to use onApplicationEnd and then a saveToCassandra call to persist this.

From what I've gathered in my experiments, onApplicationEnd doesn't get called until sparkContext.stop() is called. If I don't call stop in my code, the listener won't be called. This works fine in my local tests - stop gets called, the listener is called and the data is persisted to the db, and everything works. However, when I run this on our server, the code in onApplicationEnd throws the following exception:

Task serialization failed: java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext

What's the best way to resolve this? I can think of creating a new SparkContext in the listener (I think I'd have to enable allowing multiple contexts, in case I try to create one before the other one is stopped). It seems odd but might be doable. Additionally, what if I were to simply add the code into my job in some sort of procedural block: doJob, doPostProcessing - does that guarantee postProcessing will occur after the other?

We are currently using Spark 1.2 standalone.

Please let me know if you require more details. Thanks for the assistance!
Sumona
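The exception follows from the event ordering: onApplicationEnd is only posted while stop() is already tearing the context down, so any job submitted from the listener hits a context that has been marked stopped. A toy model of that flow (stand-in classes invented for illustration, not Spark's actual listener API; the exact point at which the real SparkContext marks itself stopped may vary by version, but this matches the behaviour observed in this thread on 1.2):

```python
class ToyContext:
    """Toy stand-in for SparkContext with a listener hook."""
    def __init__(self):
        self.stopped = False
        self.listeners = []

    def add_listener(self, listener):
        self.listeners.append(listener)

    def run_job(self, name):
        if self.stopped:
            raise RuntimeError("Cannot call methods on a stopped SparkContext")
        return name

    def stop(self):
        # In this model the context is marked stopped before the
        # onApplicationEnd listeners are invoked, so listener-submitted
        # jobs are rejected.
        self.stopped = True
        for listener in self.listeners:
            listener.on_application_end(self)


class PostProcessingListener:
    """Listener that tries to submit a job on application end."""
    def __init__(self):
        self.error = None

    def on_application_end(self, sc):
        try:
            sc.run_job("save-to-cassandra")  # fails: context already stopped
        except RuntimeError as e:
            self.error = e


sc = ToyContext()
listener = PostProcessingListener()
sc.add_listener(listener)
sc.stop()

assert "stopped SparkContext" in str(listener.error)
```

This is why the fix discussed above is to move the post-processing out of the listener and into the main thread, before stop() is ever called.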