[jira] [Commented] (SPARK-2972) APPLICATION_COMPLETE not created in Python unless context explicitly stopped

2014-09-09 Thread Shay Rojansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126686#comment-14126686
 ] 

Shay Rojansky commented on SPARK-2972:
--

> you're right! imho, this means your program is written better than the
> examples. it would be good to enhance the examples w/ try/finally semantics.
> however,

Then I can submit a pull request for that, no problem.

> getting the shutdown semantics right is difficult, and may not apply broadly
> across applications. for instance, your application may want to catch a
> failure in stop() and retry to make sure that a history record is written.
> another application may be ok w/ best effort writing history events. still
> another application may want to exit w/o stop() to avoid having a history
> event written.

I don't think explicit stop() should be removed - of course users may choose to 
manually manage stop(), catch exceptions and retry, etc. For me it's just a 
question of what to do with a context that *didn't* get explicitly closed at 
the end of the application.

As for apps that need to exit without a history event - that's a requirement I 
find hard to imagine. At least with YARN/Mesos you will be leaving traces 
anyway, and those traces will be partial and difficult to understand, since the 
corresponding Spark traces haven't been produced.

> asking the context creator to do context destruction shifts burden to the
> application writer and maintains flexibility for applications.

I guess it's a question of how high-level a tool you want Spark to be. It seems 
a bit strange for Spark to handle so many of the troublesome low-level details 
while forcing the user to wrap all their programs in try/finally boilerplate.

But I do understand the points you're making, and it can be argued both ways. 
At a minimum, I suggest having context implement the language-specific dispose 
patterns ('using' in Java, 'with' in Python), so that at least the code looks 
better?
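
For the Python side, a rough sketch of what that pattern could look like today, 
as a user-level wrapper rather than anything SparkContext provides in 1.0.2:

{code:python}
# Sketch only: SparkContext has no built-in 'with' support in 1.0.2, so this
# wraps it with the stdlib contextmanager helper. Names are illustrative.
from contextlib import contextmanager
from pyspark import SparkContext

@contextmanager
def spark_context(*args, **kwargs):
    sc = SparkContext(*args, **kwargs)
    try:
        yield sc
    finally:
        sc.stop()  # runs even if the body raises, so the history event is written

# usage:
# with spark_context(appName="ExampleApp") as sc:
#     print(sc.parallelize(range(100)).count())
{code}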

> APPLICATION_COMPLETE not created in Python unless context explicitly stopped
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-2972
>                 URL: https://issues.apache.org/jira/browse/SPARK-2972
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.0.2
>         Environment: Cloudera 5.1, yarn master on ubuntu precise
>            Reporter: Shay Rojansky
>
> If you don't explicitly stop a SparkContext at the end of a Python
> application with sc.stop(), an APPLICATION_COMPLETE file isn't created and
> the job doesn't get picked up by the history server.
> This can be easily reproduced with pyspark (but affects scripts as well).
> The current workaround is to wrap the entire script with a try/finally and
> stop manually.
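
A minimal sketch of that workaround in a PySpark script (the job body here is 
illustrative only):

{code:python}
from pyspark import SparkContext

sc = SparkContext(appName="ExampleApp")
try:
    # application body; this particular job is just an example
    rdd = sc.parallelize(range(1000))
    print(rdd.map(lambda x: x * x).sum())
finally:
    # ensures APPLICATION_COMPLETE is written even if the body above raises
    sc.stop()
{code}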





[jira] [Commented] (SPARK-2972) APPLICATION_COMPLETE not created in Python unless context explicitly stopped

2014-09-09 Thread Matthew Farrellee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127016#comment-14127016
 ] 

Matthew Farrellee commented on SPARK-2972:
--

> I suggest having context implement the language-specific dispose patterns
> ('using' in Java, 'with' in Python), so at least the code looks better?

that's a great idea. i'll spec this out for python, would you care to do it for 
java / scala?




[jira] [Commented] (SPARK-2972) APPLICATION_COMPLETE not created in Python unless context explicitly stopped

2014-09-09 Thread Shay Rojansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127171#comment-14127171
 ] 

Shay Rojansky commented on SPARK-2972:
--

I'd love to help on this, but I know 0 Scala (I could have helped with the 
Python though :)).

A quick search shows that Scala has no built-in equivalent of Python's 'with' 
or Java's Closeable. There are several third-party implementations out there, 
but it doesn't seem right to bring in a non-core library for this kind of 
thing. I think someone with real Scala knowledge should take a look at this.

We can close this issue and open a separate one for the Scala closeability if 
you want.




[jira] [Commented] (SPARK-2972) APPLICATION_COMPLETE not created in Python unless context explicitly stopped

2014-09-09 Thread Matthew Farrellee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127187#comment-14127187
 ] 

Matthew Farrellee commented on SPARK-2972:
--

+1 - close this and open 2 feature requests, one for java and one for scala, 
that mirror SPARK-3458




[jira] [Commented] (SPARK-2972) APPLICATION_COMPLETE not created in Python unless context explicitly stopped

2014-09-08 Thread Matthew Farrellee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125938#comment-14125938
 ] 

Matthew Farrellee commented on SPARK-2972:
--

> Thanks for answering. I guess it's a debatable question. I admit I expected
> the context to shut itself down at application exit, a bit in the way that
> files and other resources get closed.

i can understand that. those resources are ones that are cleaned up by the 
kernel, which doesn't have external dependencies on their cleanup, e.g. closing 
a file handle need not depend on writing to a log. it's always nice to have the 
lower level library handle things like this for you.

> Note that the way the examples are currently written (pi.py), an exception
> anywhere in the code would bypass sc.stop() and the Spark application
> disappears without leaving a trace in the history server. For this reason, my
> scripts all contain try/finally blocks around the application code, which
> seems like needless boilerplate that complicates life and can easily be
> forgotten.

you're right! imho, this means your program is written better than the 
examples. it would be good to enhance the examples w/ try/finally semantics. 
however,

> Is there any specific reason not to use the application shutdown hooks
> available in python/java to close the context(s)?

getting the shutdown semantics right is difficult, and may not apply broadly 
across applications. for instance, your application may want to catch a failure 
in stop() and retry to make sure that a history record is written. another 
application may be ok w/ best effort writing history events. still another 
application may want to exit w/o stop() to avoid having a history event written.
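
as a sketch of that first case (nothing pyspark provides, just an illustration 
of an application owning its own shutdown semantics):

{code:python}
# Illustrative only: the application retries stop() so the history record
# is more likely to be written even if the first attempt fails.
import time
from pyspark import SparkContext

sc = SparkContext(appName="ExampleApp")
try:
    print(sc.parallelize(range(100)).count())
finally:
    for attempt in range(3):
        try:
            sc.stop()
            break
        except Exception:
            time.sleep(1)  # brief pause before retrying the stop
{code}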

asking the context creator to do context destruction shifts burden to the 
application writer and maintains flexibility for applications.

that's my 2c




[jira] [Commented] (SPARK-2972) APPLICATION_COMPLETE not created in Python unless context explicitly stopped

2014-09-07 Thread Matthew Farrellee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124872#comment-14124872
 ] 

Matthew Farrellee commented on SPARK-2972:
--

[~roji] this was addressed for a pyspark shell in 
https://issues.apache.org/jira/browse/SPARK-2435. as for applications, it is 
the programmer's responsibility to stop the context before exit. this can be 
seen in all the example code provided with spark. are you looking for the 
SparkContext to stop itself?




[jira] [Commented] (SPARK-2972) APPLICATION_COMPLETE not created in Python unless context explicitly stopped

2014-09-07 Thread Shay Rojansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124873#comment-14124873
 ] 

Shay Rojansky commented on SPARK-2972:
--

Thanks for answering. I guess it's a debatable question. I admit I expected the 
context to shut itself down at application exit, a bit in the way that files 
and other resources get closed.

Note that the way the examples are currently written (pi.py), an exception 
anywhere in the code would bypass sc.stop() and the Spark application 
disappears without leaving a trace in the history server. For this reason, my 
scripts all contain try/finally blocks around the application code, which seems 
like needless boilerplate that complicates life and can easily be forgotten.

Is there any specific reason not to use the application shutdown hooks 
available in python/java to close the context(s)?
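
As an illustration of the shutdown-hook idea in Python, something like the 
stdlib atexit module could be used (a sketch only, not existing PySpark 
behavior; note such hooks don't run if the process is killed by a signal or 
calls os._exit()):

{code:python}
import atexit
from pyspark import SparkContext

sc = SparkContext(appName="ExampleApp")
# best-effort: ask the interpreter to stop the context at normal exit
atexit.register(sc.stop)

print(sc.parallelize(range(100)).count())
# no explicit sc.stop(); the atexit hook attempts it at interpreter shutdown
{code}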
