[ https://issues.apache.org/jira/browse/SPARK-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125938#comment-14125938 ]
Matthew Farrellee commented on SPARK-2972:
------------------------------------------

> Thanks for answering. I guess it's a debatable question. I admit I expected
> the context to shut itself down at application exit, a bit in the way that
> files and other resources get closed.

i can understand that. those resources are ones that are cleaned up by the kernel, which doesn't have external dependencies on their cleanup, e.g. closing a file handle need not depend on writing to a log. it's always nice to have the lower level library handle things like this for you.

> Note that the way the examples are currently written (pi.py), an exception
> anywhere in the code would bypass sc.stop() and the Spark application
> disappears without leaving a trace in the history server. For this reason, my
> scripts all contain try/finally blocks around the application code, which
> seems like needless boilerplate that complicates life and can easily be
> forgotten.

you're right! imho, this means your program is written better than the examples. it would be good to enhance the examples w/ try/finally semantics. however,

> Is there any specific reason not to use the application shutdown hooks
> available in python/java to close the context(s)?

getting the shutdown semantics right is difficult, and may not apply broadly across applications. for instance, your application may want to catch a failure in stop() and retry to make sure that a history record is written. another application may be ok w/ best-effort writing of history events. still another application may want to exit w/o stop() to avoid having a history event written. asking the context creator to do context destruction shifts the burden to the application writer and maintains flexibility for applications.

that's my 2c

> APPLICATION_COMPLETE not created in Python unless context explicitly stopped
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-2972
>                 URL: https://issues.apache.org/jira/browse/SPARK-2972
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.0.2
>         Environment: Cloudera 5.1, yarn master on ubuntu precise
>            Reporter: Shay Rojansky
>
> If you don't explicitly stop a SparkContext at the end of a Python
> application with sc.stop(), an APPLICATION_COMPLETE file isn't created and
> the job doesn't get picked up by the history server.
> This can be easily reproduced with pyspark (but affects scripts as well).
> The current workaround is to wrap the entire script with a try/finally and
> stop manually.
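a minimal sketch of the try/finally pattern discussed above (the app name and job body are placeholders, not the actual pi.py example): the application, not the library, decides when the context stops, and the finally clause guarantees sc.stop() runs even when the job body raises, so the run still shows up in the history server.

from pyspark import SparkContext

sc = SparkContext(appName="ExampleApp")  # placeholder app name
try:
    # application code; an exception here would otherwise skip sc.stop()
    rdd = sc.parallelize(range(1000))
    print(rdd.map(lambda x: x * x).sum())
finally:
    # stop() flushes the event log and writes APPLICATION_COMPLETE, so the
    # application is picked up by the history server even after a failure
    sc.stop()

for comparison, the shutdown-hook approach the question asks about could be as small as atexit.register(sc.stop) right after the context is created; it covers normal interpreter exit, but it bakes one best-effort stop policy into every application, which is the flexibility trade-off argued above.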