[jira] [Commented] (SPARK-2972) APPLICATION_COMPLETE not created in Python unless context explicitly stopped
[ https://issues.apache.org/jira/browse/SPARK-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126686#comment-14126686 ] Shay Rojansky commented on SPARK-2972:
--
{quote}you're right! imho, this means your program is written better than the examples. it would be good to enhance the examples w/ try/finally semantics.{quote}
Then I can submit a pull request for that, no problem.
{quote}getting the shutdown semantics right is difficult, and may not apply broadly across applications. for instance, your application may want to catch a failure in stop() and retry to make sure that a history record is written. another application may be ok w/ best effort writing history events. still another application may want to exit w/o stop() to avoid having a history event written.{quote}
I don't think explicit stop() should be removed - of course users may choose to manually manage stop(), catch exceptions and retry, etc. For me it's just a question of what to do with a context that *didn't* get explicitly stopped at the end of the application. As for apps that need to exit without a history event - that's a requirement I find hard to imagine. At least with YARN/Mesos you will leave traces anyway, and those traces will be partial and difficult to understand, since the corresponding Spark traces haven't been produced.
{quote}asking the context creator to do context destruction shifts the burden to the application writer and maintains flexibility for applications.{quote}
I guess it's a question of how high-level a tool you want Spark to be. It seems a bit strange for Spark to handle so many troublesome low-level details while forcing the user to wrap all their programs in try/finally boilerplate. But I do understand the points you're making, and it can be argued both ways. At a minimum, I suggest having the context implement the language-specific dispose patterns ('using' in Java, 'with' in Python), so at least the code looks better?
APPLICATION_COMPLETE not created in Python unless context explicitly stopped

Key: SPARK-2972
URL: https://issues.apache.org/jira/browse/SPARK-2972
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 1.0.2
Environment: Cloudera 5.1, yarn master on ubuntu precise
Reporter: Shay Rojansky

If you don't explicitly stop a SparkContext at the end of a Python application with sc.stop(), an APPLICATION_COMPLETE file isn't created and the job doesn't get picked up by the history server. This can be easily reproduced with pyspark (but affects scripts as well). The current workaround is to wrap the entire script with a try/finally and stop manually.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
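The try/finally workaround described above looks like this in outline. FakeSparkContext is a stand-in so the control flow can run without a cluster; with PySpark installed, the same shape applies to a real SparkContext.

```python
# Stand-in for SparkContext so the control flow can run without a
# cluster; in real Spark, stop() is what triggers writing the
# APPLICATION_COMPLETE marker the history server looks for.
class FakeSparkContext:
    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True

sc = FakeSparkContext()
try:
    raise RuntimeError("job failed")  # simulate an exception mid-job
except RuntimeError:
    pass  # a real script might log and re-raise instead
finally:
    sc.stop()  # runs whether the job succeeded or raised

print(sc.stopped)  # True: the history record still gets written
```

Without the finally clause, the simulated exception would skip stop() entirely, which is exactly how an application vanishes from the history server.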
[ https://issues.apache.org/jira/browse/SPARK-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127016#comment-14127016 ] Matthew Farrellee commented on SPARK-2972:
--
{quote}I suggest having the context implement the language-specific dispose patterns ('using' in Java, 'with' in Python), so at least the code looks better?{quote}
that's a great idea. i'll spec this out for python. would you care to do it for java / scala?
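For reference, the Python side could look roughly like this. ManagedContext is a hypothetical stand-in sketching the context-manager protocol, not the real PySpark API; the actual Python work is tracked elsewhere (SPARK-3458).

```python
# Hypothetical stand-in showing the __enter__/__exit__ protocol a
# SparkContext could implement; not the real PySpark API.
class ManagedContext:
    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True  # real Spark would write the history record here

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.stop()   # runs on normal exit *and* when an exception escapes
        return False  # don't swallow the exception, just clean up

with ManagedContext() as ctx:
    pass  # job code goes here

print(ctx.stopped)  # True: the context stopped itself when the block ended
```

This gives the same guarantee as the try/finally wrapper but reads as one line of structure instead of three, and it cannot be forgotten once the `with` statement is used.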
[ https://issues.apache.org/jira/browse/SPARK-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127171#comment-14127171 ] Shay Rojansky commented on SPARK-2972:
--
I'd love to help on this, but I know zero Scala (I could have helped with the Python, though :)). A quick search shows that Scala has no equivalent of Python's 'with' statement or Java's Closeable. There are several third-party implementations out there, but it doesn't seem right to bring in a non-core library for this kind of thing. I think someone with real Scala knowledge should take a look at this. We can close this issue and open a separate one for the Scala closeability if you want.
[ https://issues.apache.org/jira/browse/SPARK-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127187#comment-14127187 ] Matthew Farrellee commented on SPARK-2972:
--
+1 close this and open 2 feature requests, one for java and one for scala, that mirror SPARK-3458
[ https://issues.apache.org/jira/browse/SPARK-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125938#comment-14125938 ] Matthew Farrellee commented on SPARK-2972:
--
{quote}Thanks for answering. I guess it's a debatable question. I admit I expected the context to shut itself down at application exit, a bit in the way that files and other resources get closed.{quote}
i can understand that. those resources are ones that are cleaned up by the kernel, which doesn't have external dependencies on their cleanup, e.g. closing a file handle need not depend on writing to a log. it's always nice to have the lower-level library handle things like this for you.
{quote}Note that the way the examples are currently written (pi.py), an exception anywhere in the code would bypass sc.stop() and the Spark application disappears without leaving a trace in the history server. For this reason, my scripts all contain try/finally blocks around the application code, which seems like needless boilerplate that complicates life and can easily be forgotten.{quote}
you're right! imho, this means your program is written better than the examples. it would be good to enhance the examples w/ try/finally semantics. however,
{quote}Is there any specific reason not to use the application shutdown hooks available in python/java to close the context(s)?{quote}
getting the shutdown semantics right is difficult, and may not apply broadly across applications. for instance, your application may want to catch a failure in stop() and retry to make sure that a history record is written. another application may be ok w/ best-effort writing of history events. still another application may want to exit w/o stop() to avoid having a history event written.
asking the context creator to do context destruction shifts the burden to the application writer and maintains flexibility for applications.
that's my 2c
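The shutdown-hook idea weighed in this exchange can be sketched with Python's atexit module. FakeContext and its printed message are illustrative stand-ins for a real SparkContext, and the caveat matters: atexit hooks are best-effort, since they don't fire on os._exit(), fatal signals, or a hard-killed process, which is part of why relying on them for history events is risky.

```python
import subprocess
import sys
import textwrap

# Run a small child interpreter: it registers stop() as an atexit hook
# and never calls it explicitly; the hook fires when the child exits
# normally. FakeContext stands in for SparkContext so this runs anywhere.
child = textwrap.dedent("""
    import atexit

    class FakeContext:
        def stop(self):
            print("stop() ran at interpreter exit")

    sc = FakeContext()
    atexit.register(sc.stop)
    # no explicit sc.stop() anywhere in the script
""")
out = subprocess.run([sys.executable, "-c", child],
                     capture_output=True, text=True).stdout
print(out.strip())
```

A hook like this would cover the common "forgot to stop" case, but it gives the application no chance to catch a stop() failure and retry, and no way to opt out, which is the flexibility argument made above.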
[ https://issues.apache.org/jira/browse/SPARK-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124872#comment-14124872 ] Matthew Farrellee commented on SPARK-2972:
--
[~roji] this was addressed for a pyspark shell in https://issues.apache.org/jira/browse/SPARK-2435. as for applications, it is the programmer's responsibility to stop the context before exit. this can be seen in all the example code provided with spark. are you looking for the SparkContext to stop itself?
[ https://issues.apache.org/jira/browse/SPARK-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124873#comment-14124873 ] Shay Rojansky commented on SPARK-2972:
--
Thanks for answering. I guess it's a debatable question. I admit I expected the context to shut itself down at application exit, a bit in the way that files and other resources get closed. Note that the way the examples are currently written (pi.py), an exception anywhere in the code would bypass sc.stop() and the Spark application disappears without leaving a trace in the history server. For this reason, my scripts all contain try/finally blocks around the application code, which seems like needless boilerplate that complicates life and can easily be forgotten. Is there any specific reason not to use the application shutdown hooks available in python/java to close the context(s)?