[jira] [Commented] (SPARK-15685) StackOverflowError (VirtualMachineError) or NoClassDefFoundError (LinkageError) should not System.exit() in local mode

2017-11-12 Thread Brett Randall (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16248929#comment-16248929
 ] 

Brett Randall commented on SPARK-15685:
---

SPARK-16304 fixes part of this in commit 
https://github.com/apache/spark/commit/773fbfef1929b64229fbf97a91c45cdb1ec1fb1f 
.  A {{StackOverflowError}} would still abort the VM.
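For anyone wanting to reproduce that remaining case, here is a minimal sketch (assumptions: Spark on the classpath, a {{local}} master, and the illustrative object/method names below) of a task whose unterminated recursion raises the kind of {{StackOverflowError}} described in the report:

{code}
// Minimal sketch: a local-mode task hits unterminated recursion and throws
// StackOverflowError; per the comment above, this can still abort the VM.
import org.apache.spark.{SparkConf, SparkContext}

object LocalModeStackOverflowRepro {
  def recurse(n: Long): Long = 1 + recurse(n + 1)  // never terminates

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("SPARK-15685-repro"))
    try {
      sc.parallelize(Seq(1L)).map(recurse).count()  // the task fails with StackOverflowError
    } finally {
      sc.stop()
    }
  }
}
{code}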

> StackOverflowError (VirtualMachineError) or NoClassDefFoundError 
> (LinkageError) should not System.exit() in local mode
> --
>
> Key: SPARK-15685
> URL: https://issues.apache.org/jira/browse/SPARK-15685
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
>Reporter: Brett Randall
> Fix For: 2.0.2
>
>
> Spark, when running in local mode, can encounter certain types of {{Error}} 
> exceptions in developer-code or third-party libraries and call 
> {{System.exit()}}, potentially killing a long-running JVM/service.  The 
> caller should decide on the exception-handling and whether the error should 
> be deemed fatal.
> *Consider this scenario:*
> * Spark is being used in local master mode within a long-running JVM 
> microservice, e.g. a Jetty instance.
> * A task is run.  The task errors with particular types of unchecked 
> throwables:
> ** a) there is some bad code and/or bad data that exposes a bug where there's 
> unterminated recursion, leading to a {{StackOverflowError}}, or
> ** b) a particular, not-often-used function is called - there's a packaging 
> error with the service, a third-party library is missing some dependencies, and a 
> {{NoClassDefFoundError}} is thrown.
> *Expected behaviour:* Since we are running in local mode, we might expect 
> some unchecked exception to be thrown, to be optionally-handled by the Spark 
> caller.  In the case of Jetty, a request thread or some other background 
> worker thread might handle the exception or not, the thread might exit or 
> note an error.  The caller should decide how the error is handled.
> *Actual behaviour:* {{System.exit()}} is called, the JVM exits and the Jetty 
> microservice is down and must be restarted.
> *Consequence:* Any local code or third-party library might cause Spark to 
> exit the long-running JVM/microservice, so Spark can be a problem in this 
> architecture.  I have seen this now on three separate occasions, leading to 
> service-down bug reports.
> *Analysis:*
> The line of code that seems to be the problem is: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L405
> {code}
> // Don't forcibly exit unless the exception was inherently fatal, to avoid
> // stopping other tasks unnecessarily.
> if (Utils.isFatalError(t)) {
> SparkUncaughtExceptionHandler.uncaughtException(t)
> }
> {code}
> [Utils.isFatalError()|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L1818]
>  first excludes Scala 
> [NonFatal|https://github.com/scala/scala/blob/2.12.x/src/library/scala/util/control/NonFatal.scala#L31],
>  which excludes everything except {{VirtualMachineError}}, {{ThreadDeath}}, 
> {{InterruptedException}}, {{LinkageError}} and {{ControlThrowable}}.  
> {{Utils.isFatalError()}} further excludes {{InterruptedException}}, 
> {{NotImplementedError}} and {{ControlThrowable}}.
> Remaining are {{Error}}s such as {{StackOverflowError extends 
> VirtualMachineError}} or {{NoClassDefFoundError extends LinkageError}}, which 
> occur in the aforementioned scenarios.  
> {{SparkUncaughtExceptionHandler.uncaughtException()}} proceeds to call 
> {{System.exit()}}.
> [Further 
> up|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L77]
>  in {{Executor}} we see exclusions for registering 
> {{SparkUncaughtExceptionHandler}} if in local mode:
> {code}
>   if (!isLocal) {
> // Setup an uncaught exception handler for non-local mode.
> // Make any thread terminations due to uncaught exceptions kill the entire
> // executor process to avoid surprising stalls.
> Thread.setDefaultUncaughtExceptionHandler(SparkUncaughtExceptionHandler)
>   }
> {code}
> This same exclusion must be applied in local mode for "fatal" errors - we 
> cannot afford to shut down the enclosing JVM (e.g. Jetty); the caller should 
> decide.
> A minimal test-case is supplied.  It installs a logging {{SecurityManager}} 
> to confirm that {{System.exit()}} was called from 
> {{SparkUncaughtExceptionHandler.uncaughtException}} via {{Executor}}.  It 
> also hints at the workaround - install your own {{SecurityManager}} and 
> inspect the current stack in {{checkExit()}} to prevent Spark from exiting 
> the JVM.
> Test-case: https://github.com/javabrett/SPARK-15685 .
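To make the hinted workaround concrete, here is a rough sketch (the class name and the stack-inspection heuristic are illustrative only, not part of Spark) of a permissive {{SecurityManager}} that vetoes exits requested by {{SparkUncaughtExceptionHandler}}:

{code}
// Illustrative sketch: allow everything except System.exit() calls whose stack
// passes through SparkUncaughtExceptionHandler.
import java.security.Permission

class BlockSparkExitSecurityManager extends SecurityManager {
  // Permissive: this manager exists only to intercept exit attempts.
  override def checkPermission(perm: Permission): Unit = ()
  override def checkPermission(perm: Permission, context: AnyRef): Unit = ()

  override def checkExit(status: Int): Unit = {
    val fromSparkHandler = Thread.currentThread.getStackTrace.exists(
      _.getClassName.contains("SparkUncaughtExceptionHandler"))
    if (fromSparkHandler) {
      // Veto the exit; the caller sees a SecurityException instead of a dead JVM.
      throw new SecurityException(s"Blocked System.exit($status) requested by Spark")
    }
  }
}

// Installed once at service start-up, e.g. before Jetty begins serving requests:
// System.setSecurityManager(new BlockSparkExitSecurityManager)
{code}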

[jira] [Resolved] (SPARK-15685) StackOverflowError (VirtualMachineError) or NoClassDefFoundError (LinkageError) should not System.exit() in local mode

2017-11-11 Thread Brett Randall (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brett Randall resolved SPARK-15685.
---
   Resolution: Fixed
Fix Version/s: 2.0.2

> StackOverflowError (VirtualMachineError) or NoClassDefFoundError 
> (LinkageError) should not System.exit() in local mode
> --
>
> Key: SPARK-15685
> URL: https://issues.apache.org/jira/browse/SPARK-15685
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
>Reporter: Brett Randall
> Fix For: 2.0.2
>
>




[jira] [Commented] (SPARK-15685) StackOverflowError (VirtualMachineError) or NoClassDefFoundError (LinkageError) should not System.exit() in local mode

2017-11-11 Thread Brett Randall (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16248776#comment-16248776
 ] 

Brett Randall commented on SPARK-15685:
---

[~srowen] this seems to be fixed in 2.0.2, but I don't know which commit is 
responsible.

> StackOverflowError (VirtualMachineError) or NoClassDefFoundError 
> (LinkageError) should not System.exit() in local mode
> --
>
> Key: SPARK-15685
> URL: https://issues.apache.org/jira/browse/SPARK-15685
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
>Reporter: Brett Randall
>





[jira] [Updated] (SPARK-15685) StackOverflowError (VirtualMachineError) or NoClassDefFoundError (LinkageError) should not System.exit() in local mode

2017-11-11 Thread Brett Randall (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brett Randall updated SPARK-15685:
--
Fix Version/s: 2.0.2

> StackOverflowError (VirtualMachineError) or NoClassDefFoundError 
> (LinkageError) should not System.exit() in local mode
> --
>
> Key: SPARK-15685
> URL: https://issues.apache.org/jira/browse/SPARK-15685
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
>Reporter: Brett Randall
>Priority: Critical
> Fix For: 2.0.2
>
>




[jira] [Commented] (SPARK-15723) SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ name

2016-06-05 Thread Brett Randall (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15315938#comment-15315938
 ] 

Brett Randall commented on SPARK-15723:
---

Thanks for merging.  And thanks for the Scala REPL test - I can confirm that 
this is driven by a combination of *both* default TimeZone and default Locale - 
the default Locale impacts the interpretation of the short TZ code, which makes 
sense.

{{Australia/Sydney/en_AU}} -> {color:red}*false*{color}
{noformat}
scala -J-Duser.timezone="Australia/Sydney" -J-Duser.country=AU < val time = (new 
java.text.SimpleDateFormat("-MM-dd'T'HH:mm:ss.SSSz")).parse("2015-02-20T17:21:17.190EST").getTime
time: Long = 1424413277190

scala> time == 1424470877190L
res0: Boolean = false
{noformat}

{{Australia/Sydney/en_US}} -> {color:red}*false*{color}
{noformat}
scala -J-Duser.timezone="Australia/Sydney" -J-Duser.country=US < val time = (new 
java.text.SimpleDateFormat("-MM-dd'T'HH:mm:ss.SSSz")).parse("2015-02-20T17:21:17.190EST").getTime
time: Long = 1424413277190

scala> time == 1424470877190L
res0: Boolean = false
{noformat}

{{America/New_York/en_US}} -> {color:green}*true*{color}
{noformat}
scala -J-Duser.timezone="America/New_York" -J-Duser.country=US < val time = (new 
java.text.SimpleDateFormat("-MM-dd'T'HH:mm:ss.SSSz")).parse("2015-02-20T17:21:17.190EST").getTime
time: Long = 1424470877190

scala> time == 1424470877190L
res0: Boolean = true
{noformat}

So you were correct - this _can_ be disambiguated by applying a bias to the SDF 
in the code, but this would necessarily be a fixed bias, and it has to be done 
with a {{Calendar}} not a {{TimeZone}}:

{code}
sdf.setCalendar(Calendar.getInstance(TimeZone.getTimeZone("America/New_York"), 
new Locale("en_US")))
{code}

I'm not certain this is better or more correct though, but it would remove any 
ambiguity in the short TZ codes - could be documented - all short TZ codes are 
evaluated as if they were in this default TZ/Locale.  That might upset someone 
deploying who wants {{MST}} = Malaysia Standard Time and not Mountain Time.  
Make a note here if you think it is worth pursuing further, but I suspect we 
just have to honour the local env defaults and discourage abbreviated TZs.  And 
the test fix is merged now, so all-good, thanks.

> SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ 
> name
> --
>
> Key: SPARK-15723
> URL: https://issues.apache.org/jira/browse/SPARK-15723
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
>Reporter: Brett Randall
>Assignee: Brett Randall
>Priority: Minor
>  Labels: test
> Fix For: 1.6.2, 2.0.0
>
>
> {{org.apache.spark.status.api.v1.SimpleDateParamSuite}} has this assertion:
> {code}
> new SimpleDateParam("2015-02-20T17:21:17.190EST").timestamp should be 
> (1424470877190L)
> {code}
> This test is fragile and fails when executing in an environment where the 
> local default timezone causes {{EST}} to be interpreted as something other 
> than US Eastern Standard Time.  If your local timezone is 
> {{Australia/Sydney}}, then {{EST}} equates to {{GMT+10}} and you will get:
> {noformat}
> date parsing *** FAILED ***
> 1424413277190 was not equal to 1424470877190 (SimpleDateParamSuite.scala:29)
> {noformat}
> In short, {{SimpleDateFormat}} is sensitive to the local default {{TimeZone}} 
> when interpreting short zone names.  According to the {{TimeZone}} javadoc, 
> they ought not be used:
> {quote}
> Three-letter time zone IDs
> For compatibility with JDK 1.1.x, some other three-letter time zone IDs (such 
> as "PST", "CTT", "AST") are also supported. However, their use is deprecated 
> because the same abbreviation is often used for multiple time zones (for 
> example, "CST" could be U.S. "Central Standard Time" and "China Standard 
> Time"), and the Java platform can then only recognize one of them.
> {quote}






[jira] [Comment Edited] (SPARK-15723) SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ name

2016-06-04 Thread Brett Randall (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15315725#comment-15315725
 ] 

Brett Randall edited comment on SPARK-15723 at 6/5/16 4:49 AM:
---

Unfortunately I don't think that will help in this case, or any other case 
where there is ambiguity for a client-supplied short-code TZ name 
as-interpreted by a certain server-side local default timezone, or as shown in 
this case, in a test.

{{SimpleDateFormat}} parsing has some unusual side-effects on its underlying 
{{Calendar}} instance, which are triggered by a {{parse()}}.  There is a 
warning of this behaviour in the JavaDoc for 
{{java.text.DateFormat.setTimeZone(TimeZone)}}:

{quote}
The {{TimeZone}} set by this method may be overwritten as a result of a call to 
the parse method.
{quote}

If the SDF's current {{Calendar}} has a {{TimeZone}} (which is either 
{{TimeZone.getDefault()}} or as later set as is done in the second attempt in 
{{SimpleDateParam}}) which does not match a timezone that is decoded from a 
short-code TZ in the parsed string, SDF makes the assumption that the best 
"local" interpretation of the abbreviation is the correct one, and updates its 
local {{Calendar}} instance to the new TZ.  So in my test-case, if my 
{{TimeZone.getDefault()}} is {{Australia/Sydney}}, which has a short-name alias 
of {{EST}}, when {{EST}} is encountered in the parsed date string, SDF says 
"they must mean Eastern (Australian) Standard Time, I'll interpret as that and 
update my calendar", whereas that code running with a different default TZ will 
pick {{America/New_York}} or {{-0500}}.  It does this during the parsing and 
before the computation of the datetime, so the result is in the guessed 
timezone, not any timezone you pre-seed with {{setTimeZone()}} or even 
{{setCalendar()}} - they will always be overwritten if the TZ is parsed.  You 
could block the update in an SDF sub-class, but you are still left with the 
same problem - how should {{EST}}, which is necessarily ambiguous according to 
the old list of short codes, be interpreted?

The JDK code is in method 
{{java.text.SimpleDateFormat.subParseZoneString\(String, int, 
CalendarBuilder\)}}, where it can be seen to call {{setTimeZone\(...\)}} on 
itself if it parses a TZ that differs from the existing SDF one.

So any pre-setting of a {{TimeZone}} or {{Calendar}} is moot if you allow the 
short-TZ code to be parsed - it is ambiguous for many values, but it is the 
caller's fault for using a deprecated short TZ form.  The result completely 
depends on {{TimeZone.getDefault()}}, which you don't want to change or 
rely-on.  It is unfortunate that SDF does this, but these short-forms are 
completely deprecated anyway.  Clients should not be using them - they should 
be using an RFC822 form, which avoids ambiguity.

Setting a fixed TZ and removing the opportunity to provide an ambiguous code 
would necessitate an API change, and you would no longer be able to specify a 
TZ.  That might be OK, but being an API change you might want deprecation, I 
don't know.
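A small sketch of the behaviour described above (plain JDK calls, nothing Spark-specific is assumed): even a {{TimeZone}} set explicitly on the format can be replaced by {{parse()}} when the input carries a short zone name, and which zone {{EST}} resolves to depends on the JVM's default {{TimeZone}}/{{Locale}}:

{code}
import java.text.SimpleDateFormat
import java.util.TimeZone

val sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz")
sdf.setTimeZone(TimeZone.getTimeZone("America/New_York"))

val time = sdf.parse("2015-02-20T17:21:17.190EST").getTime

// Parsing the short "EST" may have replaced the zone set above; the id printed
// here (and the value of time) depends on the default TimeZone/Locale.
println(sdf.getTimeZone.getID)
println(time)
{code}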



[jira] [Commented] (SPARK-15723) SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ name

2016-06-04 Thread Brett Randall (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15315725#comment-15315725
 ] 

Brett Randall commented on SPARK-15723:
---

Unfortunately I don't think that will help in this case, or any other case 
where there is ambiguity for a client-supplied short-code TZ name 
as-interpreted by a certain server-side local default timezone, or as shown in 
this case, in a test.

{{SimpleDateFormat}} parsing has some unusual side-effects on its underlying 
{{Calendar}} instance, which are triggered by a {{parse()}}.  There is a 
warning of this behaviour in the JavaDoc for 
{{java.text.DateFormat.setTimeZone(TimeZone)}}:

{quote}
The {{TimeZone}} set by this method may be overwritten as a result of a call to 
the parse method.
{quote}

If the SDF's current {{Calendar}} has a {{TimeZone}} (which is either 
{{TimeZone.getDefault()}} or as later set as is done in the second attempt in 
{{SimpleDateParam}}) which does not match a timezone that is decoded from a 
short-code TZ in the parsed string, SDF makes the assumption that the best 
"local" interpretation of the abbreviation is the correct one, and updates its 
local {{Calendar}} instance to the new TZ.  So in my test-case, if my 
{{TimeZone.getDefault()}} is {{Australia/Sydney}}, which has a short-name alias 
of {{EST}}, when {{EST}} is encountered in the parsed date string, SDF says 
"they must mean Eastern (Australian) Standard Time, I'll interpret as that and 
update my calendar", whereas that code running with a different default TZ will 
pick {{America/New_York}} or {{-0500}}.  It does this during the parsing and 
before the computation of the datetime, so the result is in the guessed 
timezone, not any timezone you pre-seed with {{setTimeZone()}} or even 
{{setCalendar()}} - they will always be overwritten if the TZ is parsed.  You 
could block the update in an SDF sub-class, but you are still left with the 
same problem - how should {{EST}}, which is necessarily ambiguous according to 
the old list of short codes, be interpreted?

The JDK code is in method 
{{java.text.SimpleDateFormat.subParseZoneString\(String, int, 
CalendarBuilder\)}}, where it can be seen to call {{setTimeZone\(...\)}} on 
itself if it parses a TZ that differs from the existing SDF one.

So any pre-setting of a {{TimeZone}} or {{Calendar}} is moot if you allow the 
short-TZ code to be parsed - it is ambiguous for many values, but it is the 
caller's fault for using a deprecated short TZ form.  The result completely 
depends on {{TimeZone.getDefault()}}, which you don't want to change or 
rely-on.  It is unfortunate that SDF does this, but these short-forms are 
completely deprecated anyway.  Clients should not be using them - they should 
be using an RFC822 form, which avoids ambiguity.


> SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ 
> name
> --
>
> Key: SPARK-15723
> URL: https://issues.apache.org/jira/browse/SPARK-15723
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
>Reporter: Brett Randall
>Priority: Minor
>  Labels: test
>






[jira] [Commented] (SPARK-15723) SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ name

2016-06-02 Thread Brett Randall (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15313479#comment-15313479
 ] 

Brett Randall commented on SPARK-15723:
---

Yes, the fix is to the test only.

I looked at where {{SimpleDateParam}} is used, which is to bind {{minDate}} and 
{{maxDate}} parameters in 
{{core/src/main/scala/org/apache/spark/status/api/v1/ApplicationListResource.scala}}.
  I'm not familiar with the use of the API, but I anticipate that these are 
values that could be client/user-supplied, in which case the onus would be 
on the caller to use non-deprecated and unambiguous TZ forms, in case the 
server was deployed in a region that could make their abbreviation ambiguous.  
{{HistoryServerSuite}} has a {{GMT}} example, which is fine.  Possibly the 
notes in {{docs/monitoring.md}} could be clarified in this regard.
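For illustration only, the kind of unambiguous forms a caller could use for {{minDate}}/{{maxDate}} - a sketch written as the suite would write it, assuming the same parsing the existing assertion exercises: the {{GMT}} spelling used by {{HistoryServerSuite}}, or an RFC 822 numeric offset that avoids zone-name lookup entirely:

{code}
// Same instant as 2015-02-20T17:21:17.190 US Eastern, expressed unambiguously.
val viaGmt    = new SimpleDateParam("2015-02-20T22:21:17.190GMT").timestamp
val viaOffset = new SimpleDateParam("2015-02-20T17:21:17.190-0500").timestamp
{code}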

> SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ 
> name
> --
>
> Key: SPARK-15723
> URL: https://issues.apache.org/jira/browse/SPARK-15723
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
>Reporter: Brett Randall
>Priority: Minor
>  Labels: test
>






[jira] [Commented] (SPARK-15685) StackOverflowError (VirtualMachineError) or NoClassDefFoundError (LinkageError) should not System.exit() in local mode

2016-06-01 Thread Brett Randall (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311801#comment-15311801
 ] 

Brett Randall commented on SPARK-15685:
---

I found an interesting commit for this topic: 
https://github.com/apache/spark/commit/af1f4da76268115c5a4cc3035d3236ad27f7240a 
for SPARK-13904 .  This talks about variations between different 
cluster managers, and the desire not to necessarily take down the parent 
process (AKA the JVM) during task failure or clean-up.

I think the same applies here for Spark running in local mode, so I wonder how 
this change can be applied to local mode.  {{Executor#killAllTasks}} has been 
coded, but is not yet used anywhere.  Is it just a matter of pulling up the 
{{exitExecutor()}} method into {{ExecutorBackend}} so that 
{{LocalSchedulerBackendEndpoint}} can override it and call {{killAllTasks}} 
rather than {{System.exit()}}?
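Purely to illustrate the idea (the trait and class names and the signatures below are hypothetical, not Spark's actual API): if the decision to kill the process sat on the backend, a local backend could cancel its tasks and surface the failure instead of exiting:

{code}
// Hypothetical sketch only - not Spark's real ExecutorBackend/LocalSchedulerBackend API.
trait ExecutorBackendLike {
  // Non-local deployments: the safest response to a fatal error is still to
  // terminate the (dedicated) executor process.
  def exitExecutor(code: Int, reason: String): Unit = {
    System.err.println(s"Exiting executor: $reason")
    System.exit(code)
  }
}

class LocalBackendLike(killAllTasks: () => Unit) extends ExecutorBackendLike {
  // Local mode: the "executor" is the caller's JVM, so abort the running tasks
  // and let the error propagate to the caller rather than exiting.
  override def exitExecutor(code: Int, reason: String): Unit = killAllTasks()
}
{code}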

> StackOverflowError (VirtualMachineError) or NoClassDefFoundError 
> (LinkageError) should not System.exit() in local mode
> --
>
> Key: SPARK-15685
> URL: https://issues.apache.org/jira/browse/SPARK-15685
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
>Reporter: Brett Randall
>Priority: Critical
>

[jira] [Commented] (SPARK-15723) SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ name

2016-06-01 Thread Brett Randall (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311717#comment-15311717
 ] 

Brett Randall commented on SPARK-15723:
---

https://github.com/apache/spark/pull/13462

> SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ 
> name
> --
>
> Key: SPARK-15723
> URL: https://issues.apache.org/jira/browse/SPARK-15723
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
>Reporter: Brett Randall
>Priority: Minor
>  Labels: test
>






[jira] [Commented] (SPARK-15723) SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ name

2016-06-01 Thread Brett Randall (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311701#comment-15311701
 ] 

Brett Randall commented on SPARK-15723:
---

I will propose a fix.

Note that to reproduce this problem in any locale/timezone, you can modify the 
{{scalatest-maven-plugin}} {{argLine}} to add a timezone:

{code:xml}-ea -Xmx3g -XX:MaxPermSize=${MaxPermGen} 
-XX:ReservedCodeCacheSize=${CodeCacheSize} 
-Duser.timezone="Australia/Sydney"
{code}

and run {{$ mvn test 
-DwildcardSuites=org.apache.spark.status.api.v1.SimpleDateParamSuite 
-Dtest=none}}.  Equally, this will fix it in an affected timezone:

{code:xml}-ea -Xmx3g -XX:MaxPermSize=${MaxPermGen} 
-XX:ReservedCodeCacheSize=${CodeCacheSize} 
-Duser.timezone="America/New_York"
{code}
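An equivalent in-code reproduction (a sketch; it simply forces the default zone rather than passing {{-Duser.timezone}} to the test JVM):

{code}
import java.text.SimpleDateFormat
import java.util.TimeZone

TimeZone.setDefault(TimeZone.getTimeZone("Australia/Sydney"))

val sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz")
val time = sdf.parse("2015-02-20T17:21:17.190EST").getTime

// Under Australia/Sydney, EST resolves to GMT+10, so this prints 1424413277190
// rather than the 1424470877190 the suite asserts.
println(time)
{code}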

> SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ 
> name
> --
>
> Key: SPARK-15723
> URL: https://issues.apache.org/jira/browse/SPARK-15723
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
>Reporter: Brett Randall
>Priority: Minor
>  Labels: test
>






[jira] [Created] (SPARK-15723) SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ name

2016-06-01 Thread Brett Randall (JIRA)
Brett Randall created SPARK-15723:
-

 Summary: SimpleDateParamSuite test is locale-fragile and relies on 
deprecated short TZ name
 Key: SPARK-15723
 URL: https://issues.apache.org/jira/browse/SPARK-15723
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.6.1
Reporter: Brett Randall
Priority: Minor


{{org.apache.spark.status.api.v1.SimpleDateParamSuite}} has this assertion:

{code}
new SimpleDateParam("2015-02-20T17:21:17.190EST").timestamp should be 
(1424470877190L)
{code}

This test is fragile and fails when executing in an environment where the local 
default timezone causes {{EST}} to be interpreted as something other than US 
Eastern Standard Time.  If your local timezone is {{Australia/Sydney}}, then 
{{EST}} equates to {{GMT+10}} and you will get:

{noformat}
date parsing *** FAILED ***
1424413277190 was not equal to 1424470877190 (SimpleDateParamSuite.scala:29)
{noformat}

In short, {{SimpleDateFormat}} is sensitive to the local default {{TimeZone}} 
when interpreting short zone names.  According to the {{TimeZone}} javadoc, 
they ought not be used:

{quote}
Three-letter time zone IDs

For compatibility with JDK 1.1.x, some other three-letter time zone IDs (such 
as "PST", "CTT", "AST") are also supported. However, their use is deprecated 
because the same abbreviation is often used for multiple time zones (for 
example, "CST" could be U.S. "Central Standard Time" and "China Standard 
Time"), and the Java platform can then only recognize one of them.
{quote}






[jira] [Commented] (SPARK-15685) StackOverflowError (VirtualMachineError) or NoClassDefFoundError (LinkageError) should not System.exit() in local mode

2016-06-01 Thread Brett Randall (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311149#comment-15311149
 ] 

Brett Randall commented on SPARK-15685:
---

Thanks [~srowen].

The {{SecurityManager}} is only in-play here for:

* Capturing the problem in a unit-test and making an assertion that Spark 
should not exit the JVM when running in local mode.  In the unit test it is 
removed again and replaced with the preexisting {{SecurityManager}} in the test 
finally block.
* As an unusual workaround in light of this problem, to prevent microservice 
server processes from being shut down by Spark as it handles an error.

There's no suggestion here to install a {{SecurityManager}}.  That said, I 
don't expect it to have any performance impact - it only comes into play when 
some code attempts to call {{System.exit}}.  It shouldn't be necessary other 
than to block callers of {{System.exit()}}, and it would displace an existing 
{{SecurityManager}}.

The long-running application here is, for example, a microservice 
orchestrating calculations which are performed by Spark.  So the service is 
long-running, but the Spark jobs are short-running asynchronous tasks.  This 
seems a reasonable use of Spark to me: short batch jobs running embedded in the 
local JVM, orchestrated by a long-running service.  For larger jobs, the 
service switches to use cluster mode and sends the Spark jobs off to a remote 
JVM e.g. via YARN.

I don't think it is typical for a calculation framework or batch-job framework 
to decide to shut down the JVM with {{System.exit()}} rather than simply 
aborting the task thread and throwing some exception or notification up the 
stack for the caller to handle.
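For illustration, the kind of caller-side handling the expected behaviour would allow (a sketch; {{runCalculation}}, {{respond}} and {{respondError}} are hypothetical service-layer stand-ins):

{code}
// Hypothetical service-layer stubs, only to make the sketch self-contained.
def runCalculation(): Long = ???   // e.g. a short Spark job on a local master
def respond(result: Long): Unit = println(s"result: $result")
def respondError(msg: String, t: Throwable): Unit = println(s"$msg: $t")

// The request thread decides how to handle the failure; the enclosing JVM keeps running.
def handleRequest(): Unit =
  try respond(runCalculation())
  catch {
    case e: StackOverflowError   => respondError("calculation failed", e)
    case e: NoClassDefFoundError => respondError("service packaging problem", e)
  }
{code}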

> StackOverflowError (VirtualMachineError) or NoClassDefFoundError 
> (LinkageError) should not System.exit() in local mode
> --
>
> Key: SPARK-15685
> URL: https://issues.apache.org/jira/browse/SPARK-15685
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
>Reporter: Brett Randall
>Priority: Critical
>

[jira] [Updated] (SPARK-15685) StackOverflowError (VirtualMachineError) or NoClassDefFoundError (LinkageError) should not System.exit() in local mode

2016-05-31 Thread Brett Randall (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brett Randall updated SPARK-15685:
--
Description: 
Spark, when running in local mode, can encounter certain types of {{Error}} 
exceptions in developer-code or third-party libraries and call 
{{System.exit()}}, potentially killing a long-running JVM/service.  The caller 
should decide on the exception-handling and whether the error should be deemed 
fatal.

*Consider this scenario:*

* Spark is being used in local master mode within a long-running JVM 
microservice, e.g. a Jetty instance.
* A task is run.  The task errors with particular types of unchecked throwables:
** a) there is some bad code and/or bad data that exposes a bug where there's 
unterminated recursion, leading to a {{StackOverflowError}}, or
** b) a particular, not-often-used function is called - there's a packaging 
error with the service, a third-party library is missing some dependencies, and a 
{{NoClassDefFoundError}} is thrown.

*Expected behaviour:* Since we are running in local mode, we might expect some 
unchecked exception to be thrown, to be optionally-handled by the Spark caller. 
 In the case of Jetty, a request thread or some other background worker thread 
might handle the exception or not, the thread might exit or note an error.  The 
caller should decide how the error is handled.

*Actual behaviour:* {{System.exit()}} is called, the JVM exits and the Jetty 
microservice is down and must be restarted.

*Consequence:* Any local code or third-party library might cause Spark to exit 
the long-running JVM/microservice, so Spark can be a problem in this 
architecture.  I have seen this now on three separate occasions, leading to 
service-down bug reports.

*Analysis:*

The line of code that seems to be the problem is: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L405

{code}
// Don't forcibly exit unless the exception was inherently fatal, to avoid
// stopping other tasks unnecessarily.
if (Utils.isFatalError(t)) {
SparkUncaughtExceptionHandler.uncaughtException(t)
}
{code}

[Utils.isFatalError()|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L1818]
 first excludes Scala 
[NonFatal|https://github.com/scala/scala/blob/2.12.x/src/library/scala/util/control/NonFatal.scala#L31],
 which excludes everything except {{VirtualMachineError}}, {{ThreadDeath}}, 
{{InterruptedException}}, {{LinkageError}} and {{ControlThrowable}}.  
{{Utils.isFatalError()}} further excludes {{InterruptedException}}, 
{{NotImplementedError}} and {{ControlThrowable}}.

Remaining are {{Error}}s such as {{StackOverflowError extends 
VirtualMachineError}} or {{NoClassDefFoundError extends LinkageError}}, which 
occur in the aforementioned scenarios.  
{{SparkUncaughtExceptionHandler.uncaughtException()}} proceeds to call 
{{System.exit()}}.

[Further 
up|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L77]
 in {{Executor}} we see exclusions for registering 
{{SparkUncaughtExceptionHandler}} if in local mode:

{code}
  if (!isLocal) {
// Setup an uncaught exception handler for non-local mode.
// Make any thread terminations due to uncaught exceptions kill the entire
// executor process to avoid surprising stalls.
Thread.setDefaultUncaughtExceptionHandler(SparkUncaughtExceptionHandler)
  }
{code}

This same exclusion must be applied in local mode for "fatal" errors - we cannot 
afford to shut down the enclosing JVM (e.g. Jetty); the caller should decide.

A minimal test-case is supplied.  It installs a logging {{SecurityManager}} to 
confirm that {{System.exit()}} was called from 
{{SparkUncaughtExceptionHandler.uncaughtException}} via {{Executor}}.  It also 
hints at the workaround (sketched after the link below): install your own 
{{SecurityManager}} and inspect the current stack in {{checkExit()}} to prevent 
Spark from exiting the JVM.

Test-case: https://github.com/javabrett/SPARK-15685 .
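
A rough sketch of that workaround (names and the stack-inspection heuristic are 
illustrative, not the exact code in the test-case) might look like:

{code}
// Illustrative only: veto System.exit() calls originating from
// SparkUncaughtExceptionHandler so the enclosing JVM stays up.
object ExitGuard extends SecurityManager {
  override def checkExit(status: Int): Unit = {
    val fromSparkHandler = Thread.currentThread.getStackTrace
      .exists(_.getClassName.contains("SparkUncaughtExceptionHandler"))
    if (fromSparkHandler) {
      throw new SecurityException(s"Blocked System.exit($status) from Spark")
    }
  }
  // Permit everything else.
  override def checkPermission(perm: java.security.Permission): Unit = ()
}

// Install early in service startup:
// System.setSecurityManager(ExitGuard)
{code}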


[jira] [Updated] (SPARK-15685) StackOverflowError (VirtualMachineError) or NoClassDefFoundError (LinkageError) should not System.exit() in local mode

2016-05-31 Thread Brett Randall (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brett Randall updated SPARK-15685:
--

[jira] [Created] (SPARK-15685) StackOverflowError (VirtualMachineError) or NoClassDefFoundError (LinkageError) should not System.exit() in local mode

2016-05-31 Thread Brett Randall (JIRA)
Brett Randall created SPARK-15685:
-

 Summary: StackOverflowError (VirtualMachineError) or 
NoClassDefFoundError (LinkageError) should not System.exit() in local mode
 Key: SPARK-15685
 URL: https://issues.apache.org/jira/browse/SPARK-15685
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.6.1
Reporter: Brett Randall
Priority: Critical




