This might be an issue with how pyspark propagates the error back to the AM. I'm pretty sure this does not happen for Scala / Java apps.

Have you filed a bug?
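In the meantime, you can check what the RM actually recorded for the
application, independently of what spark-submit prints. A rough sketch,
using the application id from your own logs (and assuming log aggregation
is enabled, since you said the application logs end up in HDFS):

    # ask the ResourceManager for the application report and the final
    # state it recorded for the app
    yarn application -status application_1437612288327_0013

    # pull the aggregated container logs, which should contain the Python
    # traceback from the failed driver
    yarn logs -applicationId application_1437612288327_0013

If you do file the bug, a minimal self-contained repro would probably help;
there is a rough sketch at the very end of this mail, below the quoted thread.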
On Tue, Jul 28, 2015 at 11:17 AM, Elkhan Dadashov <elkhan8...@gmail.com> wrote:

> Thanks Corey for your answer,
>
> Do you mean that "final status : SUCCEEDED" in the terminal logs means that
> the YARN RM could clean up the resources after the application finished
> (application finishing does not necessarily mean it succeeded or failed)?
>
> With that logic it totally makes sense.
>
> Basically the YARN logs do not say anything about the Spark job itself.
> They just say that the Spark job's resources were cleaned up and returned
> back to Yarn after the job completed.
>
> It would be great if the Yarn logs could also report the outcome of the
> job, because the user is more interested in the job's final status.
>
> Yarn related logs can be found in detail in the RM, NM, DN and NN log files.
>
> Thanks again.
>
> On Mon, Jul 27, 2015 at 7:45 PM, Corey Nolet <cjno...@gmail.com> wrote:
>
>> Elkhan,
>>
>> What does the ResourceManager say about the final status of the job?
>> Spark jobs that run as Yarn applications can fail but still successfully
>> clean up their resources and give them back to the Yarn cluster. Because
>> of this, there's a difference between your code throwing an exception in
>> an executor/driver and the Yarn application failing. Generally you'll see
>> a yarn application fail when there's a memory problem (too much memory
>> being allocated, or not enough, causing executors to fail multiple times
>> and not allowing your job to finish).
>>
>> What I'm seeing from your post is that you had an exception in your
>> application which was caught by the Spark framework, which then proceeded
>> to clean up the job and shut itself down, which it did successfully. When
>> you aren't running in the Yarn modes, you aren't seeing any Yarn status
>> telling you the Yarn application was successfully shut down; you are just
>> seeing the failure(s) from your drivers/executors.
>>
>> On Mon, Jul 27, 2015 at 2:11 PM, Elkhan Dadashov <elkhan8...@gmail.com>
>> wrote:
>>
>>> Any updates on this bug?
>>>
>>> Why do the Spark log results & the job final status not match? (one
>>> saying that the job has failed, the other stating that the job has
>>> succeeded)
>>>
>>> Thanks.
>>>
>>> On Thu, Jul 23, 2015 at 4:43 PM, Elkhan Dadashov <elkhan8...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> While running the Spark word count python example with an intentional
>>>> mistake in *Yarn cluster mode*, the Spark terminal states the final
>>>> status as SUCCEEDED, but the log files state the correct result,
>>>> indicating that the job failed.
>>>>
>>>> Why do the terminal log output & the application log output contradict
>>>> each other?
>>>>
>>>> If I run the same job in *local mode*, then the terminal logs and
>>>> application logs match, where both state that the job has failed due to
>>>> the expected error in the python script.
>>>>
>>>> More details: Scenario
>>>>
>>>> While running the Spark word count python example in *Yarn cluster
>>>> mode*, if I make an intentional error in wordcount.py by changing this
>>>> line (I'm using Spark 1.4.1, but this problem also exists in the Spark
>>>> 1.4.0 and 1.3.0 versions, which I tested):
>>>>
>>>> lines = sc.textFile(sys.argv[1], 1)
>>>>
>>>> into this line:
>>>>
>>>> lines = sc.textFile(*nonExistentVariable*, 1)
>>>>
>>>> where the nonExistentVariable variable was never created or initialized.
>>>>
>>>> Then I run the example with this command (I put README.md into HDFS
>>>> before running it):
>>>>
>>>> *./bin/spark-submit --master yarn-cluster wordcount.py /README.md*
>>>>
>>>> The job runs and finishes successfully according to the log printed in
>>>> the terminal:
>>>>
>>>> *Terminal logs*:
>>>> ...
>>>> 15/07/23 16:19:17 INFO yarn.Client: Application report for
>>>> application_1437612288327_0013 (state: RUNNING)
>>>> 15/07/23 16:19:18 INFO yarn.Client: Application report for
>>>> application_1437612288327_0013 (state: RUNNING)
>>>> 15/07/23 16:19:19 INFO yarn.Client: Application report for
>>>> application_1437612288327_0013 (state: RUNNING)
>>>> 15/07/23 16:19:20 INFO yarn.Client: Application report for
>>>> application_1437612288327_0013 (state: RUNNING)
>>>> 15/07/23 16:19:21 INFO yarn.Client: Application report for
>>>> application_1437612288327_0013 (state: FINISHED)
>>>> 15/07/23 16:19:21 INFO yarn.Client:
>>>> client token: N/A
>>>> diagnostics: Shutdown hook called before final status was reported.
>>>> ApplicationMaster host: 10.0.53.59
>>>> ApplicationMaster RPC port: 0
>>>> queue: default
>>>> start time: 1437693551439
>>>> final status: *SUCCEEDED*
>>>> tracking URL:
>>>> http://localhost:8088/proxy/application_1437612288327_0013/history/application_1437612288327_0013/1
>>>> user: edadashov
>>>> 15/07/23 16:19:21 INFO util.Utils: Shutdown hook called
>>>> 15/07/23 16:19:21 INFO util.Utils: Deleting directory
>>>> /tmp/spark-eba0a1b5-a216-4afa-9c54-a3cb67b16444
>>>>
>>>> But if I look at the log files generated for this application in HDFS,
>>>> they indicate failure of the job with the correct reason:
>>>>
>>>> *Application log files*:
>>>> ...
>>>> \00 stdout\00 179Traceback (most recent call last):
>>>> File "wordcount.py", line 32, in <module>
>>>> lines = sc.textFile(nonExistentVariable,1)
>>>> *NameError: name 'nonExistentVariable' is not defined*
>>>>
>>>> Why is the terminal output (final status: *SUCCEEDED*) not matching the
>>>> application log results, i.e. failure of the job (NameError: name
>>>> 'nonExistentVariable' is not defined)?
>>>>
>>>> Is this a bug? Is there a Jira ticket related to this issue? (Is someone
>>>> assigned to it?)
>>>>
>>>> If I run this wordcount.py example (with the mistaken line) in local
>>>> mode, then the terminal logs state that the job has failed as well.
>>>>
>>>> *./bin/spark-submit wordcount.py /README.md*
>>>>
>>>> *Terminal logs*:
>>>>
>>>> ...
>>>> 15/07/23 16:31:55 INFO scheduler.EventLoggingListener: Logging events
>>>> to hdfs:///app-logs/local-1437694314943
>>>> Traceback (most recent call last):
>>>> File "/home/edadashov/tools/myspark/spark/wordcount.py", line 32, in
>>>> <module>
>>>> lines = sc.textFile(nonExistentVariable,1)
>>>> NameError: name 'nonExistentVariable' is not defined
>>>> 15/07/23 16:31:55 INFO spark.SparkContext: Invoking stop() from
>>>> shutdown hook
>>>>
>>>> Thanks.
>>>
>>> --
>>>
>>> Best regards,
>>> Elkhan Dadashov
>>
>
> --
>
> Best regards,
> Elkhan Dadashov

--
Marcelo
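P.S. Here is roughly what a minimal self-contained repro could look like
(the file name is made up; the undefined-name line mirrors the change to
wordcount.py described in the quoted thread above):

    # repro_final_status.py - raises NameError before any RDD action runs
    from pyspark import SparkContext

    sc = SparkContext(appName="FinalStatusRepro")
    # 'nonExistentVariable' is intentionally never defined
    lines = sc.textFile(nonExistentVariable, 1)
    print(lines.count())
    sc.stop()

Submit it in cluster mode and compare the final status that spark-submit
reports with the traceback in the application logs:

    ./bin/spark-submit --master yarn-cluster repro_final_status.py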