[jira] [Commented] (SPARK-2972) APPLICATION_COMPLETE not created in Python unless context explicitly stopped
[ https://issues.apache.org/jira/browse/SPARK-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126686#comment-14126686 ] Shay Rojansky commented on SPARK-2972:
--
{quote}you're right! imho, this means your program is written better than the examples. it would be good to enhance the examples w/ try/finally semantics.{quote}
Then I can submit a pull request for that, no problem.
{quote}getting the shutdown semantics right is difficult, and may not apply broadly across applications. for instance, your application may want to catch a failure in stop() and retry to make sure that a history record is written. another application may be ok w/ best effort writing history events. still another application may want to exit w/o stop() to avoid having a history event written.{quote}
I don't think explicit stop() should be removed - of course users may choose to manually manage stop(), catch exceptions and retry, etc. For me it's just a question of what to do with a context that *didn't* get explicitly stopped at the end of the application. As for apps that need to exit without a history event - that's a requirement I find hard to imagine. At least with YARN/Mesos you will leave traces anyway, and those traces will be partial and difficult to understand, since the corresponding Spark traces haven't been produced.
{quote}asking the context creator to do context destruction shifts the burden to the application writer and maintains flexibility for applications.{quote}
I guess it's a question of how high-level a tool you want Spark to be. It seems a bit strange for Spark to handle so many troublesome low-level details while forcing the user to wrap all their programs in try/finally boilerplate. But I do understand the points you're making, and it can be argued both ways. At a minimum, I suggest having the context implement the language-specific dispose patterns ('using' in Java, 'with' in Python), so at least the code looks better?
APPLICATION_COMPLETE not created in Python unless context explicitly stopped

Key: SPARK-2972
URL: https://issues.apache.org/jira/browse/SPARK-2972
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 1.0.2
Environment: Cloudera 5.1, yarn master on ubuntu precise
Reporter: Shay Rojansky

If you don't explicitly stop a SparkContext at the end of a Python application with sc.stop(), an APPLICATION_COMPLETE file isn't created and the job doesn't get picked up by the history server. This can be easily reproduced with pyspark (but affects scripts as well). The current workaround is to wrap the entire script with a try/finally and stop manually.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
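The try/finally workaround described above looks like this in outline. FakeSparkContext is a stand-in so the control flow can run without a cluster; with PySpark installed, the same shape applies to a real SparkContext.

```python
# Stand-in for SparkContext so the control flow can run without a
# cluster; in real Spark, stop() is what triggers writing the
# APPLICATION_COMPLETE marker the history server looks for.
class FakeSparkContext:
    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True

sc = FakeSparkContext()
try:
    raise RuntimeError("job failed")  # simulate an exception mid-job
except RuntimeError:
    pass  # a real script might log and re-raise instead
finally:
    sc.stop()  # runs whether the job succeeded or raised

print(sc.stopped)  # True: the history record still gets written
```

Without the finally clause, the simulated exception would skip stop() entirely, which is exactly how an application vanishes from the history server.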
[ https://issues.apache.org/jira/browse/SPARK-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127016#comment-14127016 ] Matthew Farrellee commented on SPARK-2972:
--
{quote}I suggest having the context implement the language-specific dispose patterns ('using' in Java, 'with' in Python), so at least the code looks better?{quote}
that's a great idea. i'll spec this out for python. would you care to do it for java / scala?
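For reference, the Python side could look roughly like this. ManagedContext is a hypothetical stand-in sketching the context-manager protocol, not the real PySpark API; the actual Python work is tracked elsewhere (SPARK-3458).

```python
# Hypothetical stand-in showing the __enter__/__exit__ protocol a
# SparkContext could implement; not the real PySpark API.
class ManagedContext:
    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True  # real Spark would write the history record here

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.stop()   # runs on normal exit *and* when an exception escapes
        return False  # don't swallow the exception, just clean up

with ManagedContext() as ctx:
    pass  # job code goes here

print(ctx.stopped)  # True: the context stopped itself when the block ended
```

This gives the same guarantee as the try/finally wrapper but reads as one line of structure instead of three, and it cannot be forgotten once the `with` statement is used.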
[ https://issues.apache.org/jira/browse/SPARK-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127171#comment-14127171 ] Shay Rojansky commented on SPARK-2972:
--
I'd love to help on this, but I know zero Scala (I could have helped with the Python, though :)). A quick search shows that Scala has no equivalent of Python's 'with' statement or Java's Closeable. There are several third-party implementations out there, but it doesn't seem right to bring in a non-core library for this kind of thing. I think someone with real Scala knowledge should take a look at this. We can close this issue and open a separate one for the Scala closeability if you want.
[ https://issues.apache.org/jira/browse/SPARK-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127187#comment-14127187 ] Matthew Farrellee commented on SPARK-2972:
--
+1 close this and open 2 feature requests, one for java and one for scala, that mirror SPARK-3458
[ https://issues.apache.org/jira/browse/SPARK-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125938#comment-14125938 ] Matthew Farrellee commented on SPARK-2972:
--
{quote}Thanks for answering. I guess it's a debatable question. I admit I expected the context to shut itself down at application exit, a bit in the way that files and other resources get closed.{quote}
i can understand that. those resources are ones that are cleaned up by the kernel, which doesn't have external dependencies on their cleanup, e.g. closing a file handle need not depend on writing to a log. it's always nice to have the lower-level library handle things like this for you.
{quote}Note that the way the examples are currently written (pi.py), an exception anywhere in the code would bypass sc.stop() and the Spark application disappears without leaving a trace in the history server. For this reason, my scripts all contain try/finally blocks around the application code, which seems like needless boilerplate that complicates life and can easily be forgotten.{quote}
you're right! imho, this means your program is written better than the examples. it would be good to enhance the examples w/ try/finally semantics. however,
{quote}Is there any specific reason not to use the application shutdown hooks available in python/java to close the context(s)?{quote}
getting the shutdown semantics right is difficult, and may not apply broadly across applications. for instance, your application may want to catch a failure in stop() and retry to make sure that a history record is written. another application may be ok w/ best-effort writing of history events. still another application may want to exit w/o stop() to avoid having a history event written.
asking the context creator to do context destruction shifts the burden to the application writer and maintains flexibility for applications.
that's my 2c
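The shutdown-hook idea weighed in this exchange can be sketched with Python's atexit module. FakeContext and its printed message are illustrative stand-ins for a real SparkContext, and the caveat matters: atexit hooks are best-effort, since they don't fire on os._exit(), fatal signals, or a hard-killed process, which is part of why relying on them for history events is risky.

```python
import subprocess
import sys
import textwrap

# Run a small child interpreter: it registers stop() as an atexit hook
# and never calls it explicitly; the hook fires when the child exits
# normally. FakeContext stands in for SparkContext so this runs anywhere.
child = textwrap.dedent("""
    import atexit

    class FakeContext:
        def stop(self):
            print("stop() ran at interpreter exit")

    sc = FakeContext()
    atexit.register(sc.stop)
    # no explicit sc.stop() anywhere in the script
""")
out = subprocess.run([sys.executable, "-c", child],
                     capture_output=True, text=True).stdout
print(out.strip())
```

A hook like this would cover the common "forgot to stop" case, but it gives the application no chance to catch a stop() failure and retry, and no way to opt out, which is the flexibility argument made above.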
[ https://issues.apache.org/jira/browse/SPARK-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124872#comment-14124872 ] Matthew Farrellee commented on SPARK-2972:
--
[~roji] this was addressed for a pyspark shell in https://issues.apache.org/jira/browse/SPARK-2435. as for applications, it is the programmer's responsibility to stop the context before exit. this can be seen in all the example code provided with spark. are you looking for the SparkContext to stop itself?
[ https://issues.apache.org/jira/browse/SPARK-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124873#comment-14124873 ] Shay Rojansky commented on SPARK-2972:
--
Thanks for answering. I guess it's a debatable question. I admit I expected the context to shut itself down at application exit, a bit in the way that files and other resources get closed. Note that the way the examples are currently written (pi.py), an exception anywhere in the code would bypass sc.stop() and the Spark application disappears without leaving a trace in the history server. For this reason, my scripts all contain try/finally blocks around the application code, which seems like needless boilerplate that complicates life and can easily be forgotten. Is there any specific reason not to use the application shutdown hooks available in python/java to close the context(s)?