[ https://issues.apache.org/jira/browse/SPARK-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125938#comment-14125938 ]
Matthew Farrellee commented on SPARK-2972:
------------------------------------------

> Thanks for answering. I guess it's a debatable question. I admit I expected
> the context to shut itself down at application exit, a bit in the way that
> files and other resources get closed.

i can understand that. those resources are ones that are cleaned up by the kernel, which doesn't have external dependencies on their cleanup, e.g. closing a file handle need not depend on writing to a log. it's always nice to have the lower level library handle things like this for you.

> Note that the way the examples are currently written (pi.py), an exception
> anywhere in the code would bypass sc.stop() and the Spark application
> disappears without leaving a trace in the history server. For this reason, my
> scripts all contain try/finally blocks around the application code, which
> seems like needless boilerplate that complicates life and can easily be
> forgotten.

you're right! imho, this means your program is written better than the examples. it would be good to enhance the examples w/ try/finally semantics. however,

> Is there any specific reason not to use the application shutdown hooks
> available in python/java to close the context(s)?

getting the shutdown semantics right is difficult, and may not apply broadly across applications. for instance, your application may want to catch a failure in stop() and retry to make sure that a history record is written. another application may be ok w/ best-effort writing of history events. still another application may want to exit w/o stop() to avoid having a history event written. asking the context creator to do context destruction shifts the burden to the application writer and maintains flexibility for applications.

that's my 2c

> APPLICATION_COMPLETE not created in Python unless context explicitly stopped
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-2972
>                 URL: https://issues.apache.org/jira/browse/SPARK-2972
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.0.2
>         Environment: Cloudera 5.1, yarn master on ubuntu precise
>            Reporter: Shay Rojansky
>
> If you don't explicitly stop a SparkContext at the end of a Python
> application with sc.stop(), an APPLICATION_COMPLETE file isn't created and
> the job doesn't get picked up by the history server.
> This can be easily reproduced with pyspark (but affects scripts as well).
> The current workaround is to wrap the entire script with a try/finally and
> stop manually.
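a minimal sketch of the try/finally pattern discussed above (the app name and job body are placeholders, not the actual pi.py example): the application, not the library, decides when the context stops, and the finally clause guarantees sc.stop() runs even when the job body raises, so the run still shows up in the history server.

from pyspark import SparkContext

sc = SparkContext(appName="ExampleApp")  # placeholder app name
try:
    # application code; an exception here would otherwise skip sc.stop()
    rdd = sc.parallelize(range(1000))
    print(rdd.map(lambda x: x * x).sum())
finally:
    # stop() flushes the event log and writes APPLICATION_COMPLETE, so the
    # application is picked up by the history server even after a failure
    sc.stop()

for comparison, the shutdown-hook approach the question asks about could be as small as atexit.register(sc.stop) right after the context is created; it covers normal interpreter exit, but it bakes one best-effort stop policy into every application, which is the flexibility trade-off argued above.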