[jira] [Commented] (SPARK-15685) StackOverflowError (VirtualMachineError) or NoClassDefFoundError (LinkageError) should not System.exit() in local mode
[ https://issues.apache.org/jira/browse/SPARK-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16248929#comment-16248929 ] Brett Randall commented on SPARK-15685: --- SPARK-16304 fixes part of this in commit https://github.com/apache/spark/commit/773fbfef1929b64229fbf97a91c45cdb1ec1fb1f . A {{StackOverflowError}} would still abort the VM. > StackOverflowError (VirtualMachineError) or NoClassDefFoundError > (LinkageError) should not System.exit() in local mode > -- > > Key: SPARK-15685 > URL: https://issues.apache.org/jira/browse/SPARK-15685 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 >Reporter: Brett Randall > Fix For: 2.0.2 > > > Spark, when running in local mode, can encounter certain types of {{Error}} > exceptions in developer-code or third-party libraries and call > {{System.exit()}}, potentially killing a long-running JVM/service. The > caller should decide on the exception-handling and whether the error should > be deemed fatal. > *Consider this scenario:* > * Spark is being used in local master mode within a long-running JVM > microservice, e.g. a Jetty instance. > * A task is run. The task errors with particular types of unchecked > throwables: > ** a) there some bad code and/or bad data that exposes a bug where there's > unterminated recursion, leading to a {{StackOverflowError}}, or > ** b) a particular not-often used function is called - there's a packaging > error with the service, a third-party library is missing some dependencies, a > {{NoClassDefFoundError}} is found. > *Expected behaviour:* Since we are running in local mode, we might expect > some unchecked exception to be thrown, to be optionally-handled by the Spark > caller. In the case of Jetty, a request thread or some other background > worker thread might handle the exception or not, the thread might exit or > note an error. The caller should decide how the error is handled. > *Actual behaviour:* {{System.exit()}} is called, the JVM exits and the Jetty > microservice is down and must be restarted. > *Consequence:* Any local code or third-party library might cause Spark to > exit the long-running JVM/microservice, so Spark can be a problem in this > architecture. I have seen this now on three separate occasions, leading to > service-down bug reports. > *Analysis:* > The line of code that seems to be the problem is: > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L405 > {code} > // Don't forcibly exit unless the exception was inherently fatal, to avoid > // stopping other tasks unnecessarily. > if (Utils.isFatalError(t)) { > SparkUncaughtExceptionHandler.uncaughtException(t) > } > {code} > [Utils.isFatalError()|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L1818] > first excludes Scala > [NonFatal|https://github.com/scala/scala/blob/2.12.x/src/library/scala/util/control/NonFatal.scala#L31], > which excludes everything except {{VirtualMachineError}}, {{ThreadDeath}}, > {{InterruptedException}}, {{LinkageError}} and {{ControlThrowable}}. > {{Utils.isFatalError()}} further excludes {{InterruptedException}}, > {{NotImplementedError}} and {{ControlThrowable}}. > Remaining are {{Error}} s such as {{StackOverflowError extends > VirtualMachineError}} or {{NoClassDefFoundError extends LinkageError}}, which > occur in the aforementioned scenarios. 
> {{SparkUncaughtExceptionHandler.uncaughtException()}} proceeds to call > {{System.exit()}}. > [Further > up|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L77] > in {{Executor}} we see exclusions for registering > {{SparkUncaughtExceptionHandler}} if in local mode: > {code} > if (!isLocal) { > // Setup an uncaught exception handler for non-local mode. > // Make any thread terminations due to uncaught exceptions kill the entire > // executor process to avoid surprising stalls. > Thread.setDefaultUncaughtExceptionHandler(SparkUncaughtExceptionHandler) > } > {code} > This same exclusion must be applied to "fatal" errors in local mode - we > cannot afford to shut down the enclosing JVM (e.g. Jetty); the caller should > decide. > A minimal test-case is supplied. It installs a logging {{SecurityManager}} > to confirm that {{System.exit()}} was called from > {{SparkUncaughtExceptionHandler.uncaughtException}} via {{Executor}}. It > also hints at the workaround - install your own {{SecurityManager}} and > inspect the current stack in {{checkExit()}} to prevent Spark from exiting > the JVM. > Test-case: https://github.com/javabrett/SPARK-15685 .
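For readers hitting this in an embedded deployment, the workaround mentioned above might look roughly like the following. This is a sketch only - the class name is illustrative, and whether vetoing exits this way is acceptable is a decision for the host service:
{code}
import java.security.Permission

// Sketch of the workaround: veto System.exit() only when it originates from
// SparkUncaughtExceptionHandler, so a "fatal" task error in local mode cannot
// take down the enclosing JVM (e.g. a Jetty-based microservice).
class BlockSparkExitSecurityManager extends java.lang.SecurityManager {
  override def checkExit(status: Int): Unit = {
    val fromSparkHandler = Thread.currentThread.getStackTrace
      .exists(_.getClassName.contains("SparkUncaughtExceptionHandler"))
    if (fromSparkHandler) {
      throw new SecurityException(s"Blocked System.exit($status) requested by Spark")
    }
  }
  // Allow everything else; a real service would likely be more restrictive.
  override def checkPermission(perm: Permission): Unit = ()
  override def checkPermission(perm: Permission, context: AnyRef): Unit = ()
}

// Installed during service bootstrap:
// System.setSecurityManager(new BlockSparkExitSecurityManager)
{code}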
[jira] [Resolved] (SPARK-15685) StackOverflowError (VirtualMachineError) or NoClassDefFoundError (LinkageError) should not System.exit() in local mode
[ https://issues.apache.org/jira/browse/SPARK-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brett Randall resolved SPARK-15685. --- Resolution: Fixed Fix Version/s: 2.0.2 > StackOverflowError (VirtualMachineError) or NoClassDefFoundError > (LinkageError) should not System.exit() in local mode > -- > > Key: SPARK-15685 > URL: https://issues.apache.org/jira/browse/SPARK-15685 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 >Reporter: Brett Randall > Fix For: 2.0.2 > > > Spark, when running in local mode, can encounter certain types of {{Error}} > exceptions in developer-code or third-party libraries and call > {{System.exit()}}, potentially killing a long-running JVM/service. The > caller should decide on the exception-handling and whether the error should > be deemed fatal. > *Consider this scenario:* > * Spark is being used in local master mode within a long-running JVM > microservice, e.g. a Jetty instance. > * A task is run. The task errors with particular types of unchecked > throwables: > ** a) there some bad code and/or bad data that exposes a bug where there's > unterminated recursion, leading to a {{StackOverflowError}}, or > ** b) a particular not-often used function is called - there's a packaging > error with the service, a third-party library is missing some dependencies, a > {{NoClassDefFoundError}} is found. > *Expected behaviour:* Since we are running in local mode, we might expect > some unchecked exception to be thrown, to be optionally-handled by the Spark > caller. In the case of Jetty, a request thread or some other background > worker thread might handle the exception or not, the thread might exit or > note an error. The caller should decide how the error is handled. > *Actual behaviour:* {{System.exit()}} is called, the JVM exits and the Jetty > microservice is down and must be restarted. > *Consequence:* Any local code or third-party library might cause Spark to > exit the long-running JVM/microservice, so Spark can be a problem in this > architecture. I have seen this now on three separate occasions, leading to > service-down bug reports. > *Analysis:* > The line of code that seems to be the problem is: > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L405 > {code} > // Don't forcibly exit unless the exception was inherently fatal, to avoid > // stopping other tasks unnecessarily. > if (Utils.isFatalError(t)) { > SparkUncaughtExceptionHandler.uncaughtException(t) > } > {code} > [Utils.isFatalError()|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L1818] > first excludes Scala > [NonFatal|https://github.com/scala/scala/blob/2.12.x/src/library/scala/util/control/NonFatal.scala#L31], > which excludes everything except {{VirtualMachineError}}, {{ThreadDeath}}, > {{InterruptedException}}, {{LinkageError}} and {{ControlThrowable}}. > {{Utils.isFatalError()}} further excludes {{InterruptedException}}, > {{NotImplementedError}} and {{ControlThrowable}}. > Remaining are {{Error}} s such as {{StackOverflowError extends > VirtualMachineError}} or {{NoClassDefFoundError extends LinkageError}}, which > occur in the aforementioned scenarios. > {{SparkUncaughtExceptionHandler.uncaughtException()}} proceeds to call > {{System.exit()}}. 
> [Further > up|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L77] > in in {{Executor}} we see exclusions for registering > {{SparkUncaughtExceptionHandler}} if in local mode: > {code} > if (!isLocal) { > // Setup an uncaught exception handler for non-local mode. > // Make any thread terminations due to uncaught exceptions kill the entire > // executor process to avoid surprising stalls. > Thread.setDefaultUncaughtExceptionHandler(SparkUncaughtExceptionHandler) > } > {code} > This same exclusion must be applied for local mode for "fatal" errors - > cannot afford to shutdown the enclosing JVM (e.g. Jetty), the caller should > decide. > A minimal test-case is supplied. It installs a logging {{SecurityManager}} > to confirm that {{System.exit()}} was called from > {{SparkUncaughtExceptionHandler.uncaughtException}} via {{Executor}}. It > also hints at the workaround - install your own {{SecurityManager}} and > inspect the current stack in {{checkExit()}} to prevent Spark from exiting > the JVM. > Test-case: https://github.com/javabrett/SPARK-15685 . -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr
[jira] [Commented] (SPARK-15685) StackOverflowError (VirtualMachineError) or NoClassDefFoundError (LinkageError) should not System.exit() in local mode
[ https://issues.apache.org/jira/browse/SPARK-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16248776#comment-16248776 ] Brett Randall commented on SPARK-15685: --- [~srowen] this seems to be fixed in 2.0.2, but I don't know which commit is responsible. > StackOverflowError (VirtualMachineError) or NoClassDefFoundError > (LinkageError) should not System.exit() in local mode > -- > > Key: SPARK-15685 > URL: https://issues.apache.org/jira/browse/SPARK-15685 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 >Reporter: Brett Randall > > Spark, when running in local mode, can encounter certain types of {{Error}} > exceptions in developer-code or third-party libraries and call > {{System.exit()}}, potentially killing a long-running JVM/service. The > caller should decide on the exception-handling and whether the error should > be deemed fatal. > *Consider this scenario:* > * Spark is being used in local master mode within a long-running JVM > microservice, e.g. a Jetty instance. > * A task is run. The task errors with particular types of unchecked > throwables: > ** a) there some bad code and/or bad data that exposes a bug where there's > unterminated recursion, leading to a {{StackOverflowError}}, or > ** b) a particular not-often used function is called - there's a packaging > error with the service, a third-party library is missing some dependencies, a > {{NoClassDefFoundError}} is found. > *Expected behaviour:* Since we are running in local mode, we might expect > some unchecked exception to be thrown, to be optionally-handled by the Spark > caller. In the case of Jetty, a request thread or some other background > worker thread might handle the exception or not, the thread might exit or > note an error. The caller should decide how the error is handled. > *Actual behaviour:* {{System.exit()}} is called, the JVM exits and the Jetty > microservice is down and must be restarted. > *Consequence:* Any local code or third-party library might cause Spark to > exit the long-running JVM/microservice, so Spark can be a problem in this > architecture. I have seen this now on three separate occasions, leading to > service-down bug reports. > *Analysis:* > The line of code that seems to be the problem is: > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L405 > {code} > // Don't forcibly exit unless the exception was inherently fatal, to avoid > // stopping other tasks unnecessarily. > if (Utils.isFatalError(t)) { > SparkUncaughtExceptionHandler.uncaughtException(t) > } > {code} > [Utils.isFatalError()|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L1818] > first excludes Scala > [NonFatal|https://github.com/scala/scala/blob/2.12.x/src/library/scala/util/control/NonFatal.scala#L31], > which excludes everything except {{VirtualMachineError}}, {{ThreadDeath}}, > {{InterruptedException}}, {{LinkageError}} and {{ControlThrowable}}. > {{Utils.isFatalError()}} further excludes {{InterruptedException}}, > {{NotImplementedError}} and {{ControlThrowable}}. > Remaining are {{Error}} s such as {{StackOverflowError extends > VirtualMachineError}} or {{NoClassDefFoundError extends LinkageError}}, which > occur in the aforementioned scenarios. > {{SparkUncaughtExceptionHandler.uncaughtException()}} proceeds to call > {{System.exit()}}. 
> [Further > up|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L77] > in in {{Executor}} we see exclusions for registering > {{SparkUncaughtExceptionHandler}} if in local mode: > {code} > if (!isLocal) { > // Setup an uncaught exception handler for non-local mode. > // Make any thread terminations due to uncaught exceptions kill the entire > // executor process to avoid surprising stalls. > Thread.setDefaultUncaughtExceptionHandler(SparkUncaughtExceptionHandler) > } > {code} > This same exclusion must be applied for local mode for "fatal" errors - > cannot afford to shutdown the enclosing JVM (e.g. Jetty), the caller should > decide. > A minimal test-case is supplied. It installs a logging {{SecurityManager}} > to confirm that {{System.exit()}} was called from > {{SparkUncaughtExceptionHandler.uncaughtException}} via {{Executor}}. It > also hints at the workaround - install your own {{SecurityManager}} and > inspect the current stack in {{checkExit()}} to prevent Spark from exiting > the JVM. > Test-case: https://github.com/javabrett/SPARK-15685 . -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (SPARK-15685) StackOverflowError (VirtualMachineError) or NoClassDefFoundError (LinkageError) should not System.exit() in local mode
[ https://issues.apache.org/jira/browse/SPARK-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brett Randall updated SPARK-15685: -- Fix Version/s: 2.0.2 > StackOverflowError (VirtualMachineError) or NoClassDefFoundError > (LinkageError) should not System.exit() in local mode > -- > > Key: SPARK-15685 > URL: https://issues.apache.org/jira/browse/SPARK-15685 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 >Reporter: Brett Randall >Priority: Critical > Fix For: 2.0.2 > > > Spark, when running in local mode, can encounter certain types of {{Error}} > exceptions in developer-code or third-party libraries and call > {{System.exit()}}, potentially killing a long-running JVM/service. The > caller should decide on the exception-handling and whether the error should > be deemed fatal. > *Consider this scenario:* > * Spark is being used in local master mode within a long-running JVM > microservice, e.g. a Jetty instance. > * A task is run. The task errors with particular types of unchecked > throwables: > ** a) there some bad code and/or bad data that exposes a bug where there's > unterminated recursion, leading to a {{StackOverflowError}}, or > ** b) a particular not-often used function is called - there's a packaging > error with the service, a third-party library is missing some dependencies, a > {{NoClassDefFoundError}} is found. > *Expected behaviour:* Since we are running in local mode, we might expect > some unchecked exception to be thrown, to be optionally-handled by the Spark > caller. In the case of Jetty, a request thread or some other background > worker thread might handle the exception or not, the thread might exit or > note an error. The caller should decide how the error is handled. > *Actual behaviour:* {{System.exit()}} is called, the JVM exits and the Jetty > microservice is down and must be restarted. > *Consequence:* Any local code or third-party library might cause Spark to > exit the long-running JVM/microservice, so Spark can be a problem in this > architecture. I have seen this now on three separate occasions, leading to > service-down bug reports. > *Analysis:* > The line of code that seems to be the problem is: > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L405 > {code} > // Don't forcibly exit unless the exception was inherently fatal, to avoid > // stopping other tasks unnecessarily. > if (Utils.isFatalError(t)) { > SparkUncaughtExceptionHandler.uncaughtException(t) > } > {code} > [Utils.isFatalError()|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L1818] > first excludes Scala > [NonFatal|https://github.com/scala/scala/blob/2.12.x/src/library/scala/util/control/NonFatal.scala#L31], > which excludes everything except {{VirtualMachineError}}, {{ThreadDeath}}, > {{InterruptedException}}, {{LinkageError}} and {{ControlThrowable}}. > {{Utils.isFatalError()}} further excludes {{InterruptedException}}, > {{NotImplementedError}} and {{ControlThrowable}}. > Remaining are {{Error}} s such as {{StackOverflowError extends > VirtualMachineError}} or {{NoClassDefFoundError extends LinkageError}}, which > occur in the aforementioned scenarios. > {{SparkUncaughtExceptionHandler.uncaughtException()}} proceeds to call > {{System.exit()}}. 
> [Further > up|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L77] > in in {{Executor}} we see exclusions for registering > {{SparkUncaughtExceptionHandler}} if in local mode: > {code} > if (!isLocal) { > // Setup an uncaught exception handler for non-local mode. > // Make any thread terminations due to uncaught exceptions kill the entire > // executor process to avoid surprising stalls. > Thread.setDefaultUncaughtExceptionHandler(SparkUncaughtExceptionHandler) > } > {code} > This same exclusion must be applied for local mode for "fatal" errors - > cannot afford to shutdown the enclosing JVM (e.g. Jetty), the caller should > decide. > A minimal test-case is supplied. It installs a logging {{SecurityManager}} > to confirm that {{System.exit()}} was called from > {{SparkUncaughtExceptionHandler.uncaughtException}} via {{Executor}}. It > also hints at the workaround - install your own {{SecurityManager}} and > inspect the current stack in {{checkExit()}} to prevent Spark from exiting > the JVM. > Test-case: https://github.com/javabrett/SPARK-15685 . -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-uns
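To make the proposed exclusion concrete: the description effectively argues for guarding the fatal-error branch in {{Executor}} with the same {{isLocal}} test that already guards handler registration. A sketch of that direction only, not the merged change:
{code}
// Sketch: in local mode, let the error propagate to the caller rather than
// handing it to SparkUncaughtExceptionHandler (which calls System.exit()).
if (!isLocal && Utils.isFatalError(t)) {
  SparkUncaughtExceptionHandler.uncaughtException(t)
}
{code}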
[jira] [Commented] (SPARK-15723) SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ name
[ https://issues.apache.org/jira/browse/SPARK-15723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15315938#comment-15315938 ] Brett Randall commented on SPARK-15723: --- Thanks for merging. And thanks for the Scala REPL test - I can confirm that this is driven by a combination of *both* default TimeZone and default Locale - the default Locale impacts the interpretation of the short TZ code, which makes sense.

{{Australia/Sydney/en_AU}} -> {color:red}*false*{color}
{noformat}
scala -J-Duser.timezone="Australia/Sydney" -J-Duser.country=AU

scala> val time = (new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz")).parse("2015-02-20T17:21:17.190EST").getTime
time: Long = 1424413277190

scala> time == 1424470877190L
res0: Boolean = false
{noformat}

{{Australia/Sydney/en_US}} -> {color:red}*false*{color}
{noformat}
scala -J-Duser.timezone="Australia/Sydney" -J-Duser.country=US

scala> val time = (new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz")).parse("2015-02-20T17:21:17.190EST").getTime
time: Long = 1424413277190

scala> time == 1424470877190L
res0: Boolean = false
{noformat}

{{America/New_York/en_US}} -> {color:green}*true*{color}
{noformat}
scala -J-Duser.timezone="America/New_York" -J-Duser.country=US

scala> val time = (new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz")).parse("2015-02-20T17:21:17.190EST").getTime
time: Long = 1424470877190

scala> time == 1424470877190L
res0: Boolean = true
{noformat}

So you were correct - this _can_ be disambiguated by applying a bias to the SDF in the code, but it would necessarily be a fixed bias, and it has to be done with a {{Calendar}}, not a {{TimeZone}}:
{code}
sdf.setCalendar(Calendar.getInstance(TimeZone.getTimeZone("America/New_York"), new Locale("en", "US")))
{code}
I'm not certain this is better or more correct though, but it would remove any ambiguity in the short TZ codes - it could be documented that all short TZ codes are evaluated as if they were in this default TZ/Locale. That might upset someone deploying who wants {{MST}} = Malaysia Standard Time and not Mountain Time. Make a note here if you think it is worth pursuing further, but I suspect we just have to honour the local env defaults and discourage abbreviated TZs. And the test fix is merged now, so all good, thanks. > SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ > name > -- > > Key: SPARK-15723 > URL: https://issues.apache.org/jira/browse/SPARK-15723 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 >Reporter: Brett Randall >Assignee: Brett Randall >Priority: Minor > Labels: test > Fix For: 1.6.2, 2.0.0 > > > {{org.apache.spark.status.api.v1.SimpleDateParamSuite}} has this assertion: > {code} > new SimpleDateParam("2015-02-20T17:21:17.190EST").timestamp should be > (1424470877190L) > {code} > This test is fragile and fails when executing in an environment where the > local default timezone causes {{EST}} to be interpreted as something other > than US Eastern Standard Time. If your local timezone is > {{Australia/Sydney}}, then {{EST}} equates to {{GMT+10}} and you will get: > {noformat} > date parsing *** FAILED *** > 1424413277190 was not equal to 1424470877190 (SimpleDateParamSuite.scala:29) > {noformat} > In short, {{SimpleDateFormat}} is sensitive to the local default {{TimeZone}} > when interpreting short zone names. 
According to the {{TimeZone}} javadoc, > they ought not be used: > {quote} > Three-letter time zone IDs > For compatibility with JDK 1.1.x, some other three-letter time zone IDs (such > as "PST", "CTT", "AST") are also supported. However, their use is deprecated > because the same abbreviation is often used for multiple time zones (for > example, "CST" could be U.S. "Central Standard Time" and "China Standard > Time"), and the Java platform can then only recognize one of them. > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
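As a small illustration of the ambiguity (a sketch, not part of any patch): with the same pattern, an explicit RFC 822 offset parses to the expected instant regardless of the JVM's default TimeZone and Locale, whereas the three-letter abbreviation does not:
{code}
import java.text.SimpleDateFormat

val sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz")
// Ambiguous: the result depends on how the default TimeZone/Locale resolve "EST".
val fromAbbreviation = sdf.parse("2015-02-20T17:21:17.190EST").getTime
// Unambiguous: an explicit offset always yields 1424470877190L.
val fromOffset = sdf.parse("2015-02-20T17:21:17.190-0500").getTime
{code}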
[jira] [Comment Edited] (SPARK-15723) SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ name
[ https://issues.apache.org/jira/browse/SPARK-15723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15315725#comment-15315725 ] Brett Randall edited comment on SPARK-15723 at 6/5/16 4:49 AM: --- Unfortunately I don't think that will help in this case, or any other case where there is ambiguity for a client-supplied short-code TZ name as-interpreted by a certain server-side local default timezone, or as shown in this case, in a test. {{SimpleDateFormat}} parsing has some unusual side-effects on its underlying {{Calendar}} instance, which are triggered by a {{parse()}}. There is a warning of this behaviour in the JavaDoc for {{java.text.DateFormat.setTimeZone(TimeZone)}}: {quote} The {{TimeZone}} set by this method may be overwritten as a result of a call to the parse method. {quote} If the SDF's current {{Calendar}} has a {{TimeZone}} (which is either {{TimeZone.getDefault()}} or as later set as is done in the second attempt in {{SimpleDateParam}}) which does not match a timezone that is decoded from a short-code TZ in the parsed string, SDF makes the assumption that the best "local" interpretation of the abbreviation is the correct one, and updates its local {{Calendar}} instance to the new TZ. So in my test-case, if my {{TimeZone.getDefault()}} is {{Australia/Sydney}}, which has a short-name alias of {{EST}}, when {{EST}} is encountered in the parsed date string, SDF says "they must mean Eastern (Australian) Standard Time, I'll interpret as that and update my calendar", whereas that code running with a different default TZ will pick {{America/New_York}} or {{-0500}}. It does this during the parsing and before the computation of the datetime, so the result is in the guessed timezone, not any timezone you pre-seed with {{setTimeZone()}} or even {{setCalendar()}} - they will always be overwritten if the TZ is parsed. You could block the update in an SDF sub-class, but you are still left with the same problem - how should {{EST}}, which is necessarily ambiguous according to the old list of short codes, be interpreted? The JDK code is in method {{java.text.SimpleDateFormat.subParseZoneString\(String, int, CalendarBuilder\)}}, where it can be seen to call {{setTimeZone\(...\)}} on itself if it parses a TZ that differs from the existing SDF one. So any pre-setting of a {{TimeZone}} or {{Calendar}} is moot if you allow the short-TZ code to be parsed - it is ambiguous for many values, but it is the caller's fault for using a deprecated short TZ form. The result completely depends on {{TimeZone.getDefault()}}, which you don't want to change or rely-on. It is unfortunate that SDF does this, but these short-forms are completely deprecated anyway. Clients should not be using them - they should be using an RFC822 form, which avoids ambiguity. Setting a fixed TZ and removing the opportunity to provide an ambiguous code would necessitate an API change, and you would no-longer be able to specify a TZ. That might be OK, but being an API change you might want deprecation, I don't know. was (Author: brett_s_r): Unfortunately I don't think that will help in this case, or any other case where there is ambiguity for a client-supplied short-code TZ name as-interpreted by a certain server-side local default timezone, or as shown in this case, in a test. {{SimpleDateFormat}} parsing has some unusual side-effects on its underlying {{Calendar}} instance, which are triggered by a {{parse()}}. 
There is a warning of this behaviour in the JavaDoc for {{java.text.DateFormat.setTimeZone(TimeZone)}}: {quote} The {{TimeZone}} set by this method may be overwritten as a result of a call to the parse method. {quote} If the SDF's current {{Calendar}} has a {{TimeZone}} (which is either {{TimeZone.getDefault()}} or as later set as is done in the second attempt in {{SimpleDateParam}}) which does not match a timezone that is decoded from a short-code TZ in the parsed string, SDF makes the assumption that the best "local" interpretation of the abbreviation is the correct one, and updates its local {{Calendar}} instance to the new TZ. So in my test-case, if my {{TimeZone.getDefault()}} is {{Australia/Sydney}}, which has a short-name alias of {{EST}}, when {{EST}} is encountered in the parsed date string, SDF says "they must mean Eastern (Australian) Standard Time, I'll interpret as that and update my calendar", whereas that code running with a different default TZ will pick {{America/New_York}} or {{-0500}}. It does this during the parsing and before the computation of the datetime, so the result is in the guessed timezone, not any timezone you pre-seed with {{setTimeZone()}} or even {{setCalendar()}} - they will always be overwritten if the TZ is parsed. You could block the update in an SDF sub-class, but you are still left with the same problem - how should {{ES
[jira] [Commented] (SPARK-15723) SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ name
[ https://issues.apache.org/jira/browse/SPARK-15723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15315725#comment-15315725 ] Brett Randall commented on SPARK-15723: --- Unfortunately I don't think that will help in this case, or any other case where there is ambiguity for a client-supplied short-code TZ name as-interpreted by a certain server-side local default timezone, or as shown in this case, in a test. {{SimpleDateFormat}} parsing has some unusual side-effects on its underlying {{Calendar}} instance, which are triggered by a {{parse()}}. There is a warning of this behaviour in the JavaDoc for {{java.text.DateFormat.setTimeZone(TimeZone)}}: {quote} The {{TimeZone}} set by this method may be overwritten as a result of a call to the parse method. {quote} If the SDF's current {{Calendar}} has a {{TimeZone}} (which is either {{TimeZone.getDefault()}} or as later set as is done in the second attempt in {{SimpleDateParam}}) which does not match a timezone that is decoded from a short-code TZ in the parsed string, SDF makes the assumption that the best "local" interpretation of the abbreviation is the correct one, and updates its local {{Calendar}} instance to the new TZ. So in my test-case, if my {{TimeZone.getDefault()}} is {{Australia/Sydney}}, which has a short-name alias of {{EST}}, when {{EST}} is encountered in the parsed date string, SDF says "they must mean Eastern (Australian) Standard Time, I'll interpret as that and update my calendar", whereas that code running with a different default TZ will pick {{America/New_York}} or {{-0500}}. It does this during the parsing and before the computation of the datetime, so the result is in the guessed timezone, not any timezone you pre-seed with {{setTimeZone()}} or even {{setCalendar()}} - they will always be overwritten if the TZ is parsed. You could block the update in an SDF sub-class, but you are still left with the same problem - how should {{EST}}, which is necessarily ambiguous according to the old list of short codes, be interpreted? The JDK code is in method {{java.text.SimpleDateFormat.subParseZoneString\(String, int, CalendarBuilder\)}}, where it can be seen to call {{setTimeZone\(...\)}} on itself if it parses a TZ that differs from the existing SDF one. So any pre-setting of a {{TimeZone}} or {{Calendar}} is moot if you allow the short-TZ code to be parsed - it is ambiguous for many values, but it is the caller's fault for using a deprecated short TZ form. The result completely depends on {{TimeZone.getDefault()}}, which you don't want to change or rely-on. It is unfortunate that SDF does this, but these short-forms are completely deprecated anyway. Clients should not be using them - they should be using an RFC822 form, which avoids ambiguity. > SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ > name > -- > > Key: SPARK-15723 > URL: https://issues.apache.org/jira/browse/SPARK-15723 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 >Reporter: Brett Randall >Priority: Minor > Labels: test > > {{org.apache.spark.status.api.v1.SimpleDateParamSuite}} has this assertion: > {code} > new SimpleDateParam("2015-02-20T17:21:17.190EST").timestamp should be > (1424470877190L) > {code} > This test is fragile and fails when executing in an environment where the > local default timezone causes {{EST}} to be interpreted as something other > than US Eastern Standard Time. 
If your local timezone is > {{Australia/Sydney}}, then {{EST}} equates to {{GMT+10}} and you will get: > {noformat} > date parsing *** FAILED *** > 1424413277190 was not equal to 1424470877190 (SimpleDateParamSuite.scala:29) > {noformat} > In short, {{SimpleDateFormat}} is sensitive to the local default {{TimeZone}} > when interpreting short zone names. According to the {{TimeZone}} javadoc, > they ought not be used: > {quote} > Three-letter time zone IDs > For compatibility with JDK 1.1.x, some other three-letter time zone IDs (such > as "PST", "CTT", "AST") are also supported. However, their use is deprecated > because the same abbreviation is often used for multiple time zones (for > example, "CST" could be U.S. "Central Standard Time" and "China Standard > Time"), and the Java platform can then only recognize one of them. > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
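A small sketch of the behaviour described above - a zone set on the formatter beforehand can be replaced during {{parse()}} when the input carries a short zone name, exactly as the {{DateFormat.setTimeZone}} javadoc warns:
{code}
import java.text.SimpleDateFormat
import java.util.TimeZone

val sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz")
sdf.setTimeZone(TimeZone.getTimeZone("America/New_York"))
sdf.parse("2015-02-20T17:21:17.190EST")
// On a JVM whose defaults resolve "EST" to another zone (e.g. Australia/Sydney),
// the formatter's zone may no longer be America/New_York at this point.
println(sdf.getTimeZone.getID)
{code}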
[jira] [Commented] (SPARK-15723) SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ name
[ https://issues.apache.org/jira/browse/SPARK-15723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15313479#comment-15313479 ] Brett Randall commented on SPARK-15723: --- Yes the fix is to the test only. I looked at where {{SimpleDateParam}} is used, which is to bind {{minDate}} and {{maxDate}} parameters in {{core/src/main/scala/org/apache/spark/status/api/v1/ApplicationListResource.scala}}. Not being familiar with the use of the API, but I anticipate that these are some values that could be client/user supplied, in which case the onus would be on the caller to use non-deprecated and unambiguous TZ forms, in case the server was deployed in region that could make their abbreviation ambiguous. {{HistoryServerSuite}} has a {{GMT}} example, which is fine. Possibly the notes in {{docs/monitoring.md}} could be clarified in this regard. > SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ > name > -- > > Key: SPARK-15723 > URL: https://issues.apache.org/jira/browse/SPARK-15723 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 >Reporter: Brett Randall >Priority: Minor > Labels: test > > {{org.apache.spark.status.api.v1.SimpleDateParamSuite}} has this assertion: > {code} > new SimpleDateParam("2015-02-20T17:21:17.190EST").timestamp should be > (1424470877190L) > {code} > This test is fragile and fails when executing in an environment where the > local default timezone causes {{EST}} to be interpreted as something other > than US Eastern Standard Time. If your local timezone is > {{Australia/Sydney}}, then {{EST}} equates to {{GMT+10}} and you will get: > {noformat} > date parsing *** FAILED *** > 1424413277190 was not equal to 1424470877190 (SimpleDateParamSuite.scala:29) > {noformat} > In short, {{SimpleDateFormat}} is sensitive to the local default {{TimeZone}} > when interpreting short zone names. According to the {{TimeZone}} javadoc, > they ought not be used: > {quote} > Three-letter time zone IDs > For compatibility with JDK 1.1.x, some other three-letter time zone IDs (such > as "PST", "CTT", "AST") are also supported. However, their use is deprecated > because the same abbreviation is often used for multiple time zones (for > example, "CST" could be U.S. "Central Standard Time" and "China Standard > Time"), and the Java platform can then only recognize one of them. > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
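For API callers, the practical guidance above amounts to passing {{minDate}}/{{maxDate}} with an unambiguous zone. An illustrative request (assuming the standard {{/api/v1/applications}} endpoint described in {{docs/monitoring.md}}):
{noformat}
GET /api/v1/applications?minDate=2015-02-20T17:21:17.190GMT
{noformat}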
[jira] [Commented] (SPARK-15685) StackOverflowError (VirtualMachineError) or NoClassDefFoundError (LinkageError) should not System.exit() in local mode
[ https://issues.apache.org/jira/browse/SPARK-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311801#comment-15311801 ] Brett Randall commented on SPARK-15685: --- I found an interesting commit for this topic: https://github.com/apache/spark/commit/af1f4da76268115c5a4cc3035d3236ad27f7240a for SPARK-13904 . This talks about variations between different cluster-managers, and the desire not to necessarily take-down the parent process (AKA the JVM) during task failure or clean-up. I think the same applies here for Spark running in local mode, so I wonder how this change can be applied to local mode. {{Executor#killAllTasks}} has been coded, but is not yet used anywhere. Is it just a matter of pulling-up the {{exitExecutor()}} method into {{ExecutorBackend}} so that {{LocalSchedulerBackendEndpoint}} can override it and call {{killAllTasks}} rather than {{System.exit()}}? > StackOverflowError (VirtualMachineError) or NoClassDefFoundError > (LinkageError) should not System.exit() in local mode > -- > > Key: SPARK-15685 > URL: https://issues.apache.org/jira/browse/SPARK-15685 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 >Reporter: Brett Randall >Priority: Critical > > Spark, when running in local mode, can encounter certain types of {{Error}} > exceptions in developer-code or third-party libraries and call > {{System.exit()}}, potentially killing a long-running JVM/service. The > caller should decide on the exception-handling and whether the error should > be deemed fatal. > *Consider this scenario:* > * Spark is being used in local master mode within a long-running JVM > microservice, e.g. a Jetty instance. > * A task is run. The task errors with particular types of unchecked > throwables: > ** a) there some bad code and/or bad data that exposes a bug where there's > unterminated recursion, leading to a {{StackOverflowError}}, or > ** b) a particular not-often used function is called - there's a packaging > error with the service, a third-party library is missing some dependencies, a > {{NoClassDefFoundError}} is found. > *Expected behaviour:* Since we are running in local mode, we might expect > some unchecked exception to be thrown, to be optionally-handled by the Spark > caller. In the case of Jetty, a request thread or some other background > worker thread might handle the exception or not, the thread might exit or > note an error. The caller should decide how the error is handled. > *Actual behaviour:* {{System.exit()}} is called, the JVM exits and the Jetty > microservice is down and must be restarted. > *Consequence:* Any local code or third-party library might cause Spark to > exit the long-running JVM/microservice, so Spark can be a problem in this > architecture. I have seen this now on three separate occasions, leading to > service-down bug reports. > *Analysis:* > The line of code that seems to be the problem is: > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L405 > {code} > // Don't forcibly exit unless the exception was inherently fatal, to avoid > // stopping other tasks unnecessarily. 
> if (Utils.isFatalError(t)) { > SparkUncaughtExceptionHandler.uncaughtException(t) > } > {code} > [Utils.isFatalError()|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L1818] > first excludes Scala > [NonFatal|https://github.com/scala/scala/blob/2.12.x/src/library/scala/util/control/NonFatal.scala#L31], > which excludes everything except {{VirtualMachineError}}, {{ThreadDeath}}, > {{InterruptedException}}, {{LinkageError}} and {{ControlThrowable}}. > {{Utils.isFatalError()}} further excludes {{InterruptedException}}, > {{NotImplementedError}} and {{ControlThrowable}}. > Remaining are {{Error}} s such as {{StackOverflowError extends > VirtualMachineError}} or {{NoClassDefFoundError extends LinkageError}}, which > occur in the aforementioned scenarios. > {{SparkUncaughtExceptionHandler.uncaughtException()}} proceeds to call > {{System.exit()}}. > [Further > up|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L77] > in in {{Executor}} we see exclusions for registering > {{SparkUncaughtExceptionHandler}} if in local mode: > {code} > if (!isLocal) { > // Setup an uncaught exception handler for non-local mode. > // Make any thread terminations due to uncaught exceptions kill the entire > // executor process to avoid surprising stalls. > Thread.setDefaultUncaughtExceptionHandler(SparkUncaughtExceptionHandler) > } > {code} > This same exclusion must be
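To sketch the shape of that suggestion (hypothetical names and signatures only - this is not Spark's actual internal API):
{code}
// Cluster-mode backends keep the existing behaviour: a fatal task error
// aborts the executor process via the uncaught-exception handler.
trait ExecutorBackendLike {
  def exitExecutor(t: Throwable): Unit =
    SparkUncaughtExceptionHandler.uncaughtException(t)
}

// A local-mode backend could instead kill the running tasks and rethrow,
// leaving the enclosing JVM up and surfacing the error to the caller.
class LocalBackendLike(killAllTasks: () => Unit) extends ExecutorBackendLike {
  override def exitExecutor(t: Throwable): Unit = {
    killAllTasks()
    throw t
  }
}
{code}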
[jira] [Commented] (SPARK-15723) SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ name
[ https://issues.apache.org/jira/browse/SPARK-15723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311717#comment-15311717 ] Brett Randall commented on SPARK-15723: --- https://github.com/apache/spark/pull/13462 > SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ > name > -- > > Key: SPARK-15723 > URL: https://issues.apache.org/jira/browse/SPARK-15723 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 >Reporter: Brett Randall >Priority: Minor > Labels: test > > {{org.apache.spark.status.api.v1.SimpleDateParamSuite}} has this assertion: > {code} > new SimpleDateParam("2015-02-20T17:21:17.190EST").timestamp should be > (1424470877190L) > {code} > This test is fragile and fails when executing in an environment where the > local default timezone causes {{EST}} to be interpreted as something other > than US Eastern Standard Time. If your local timezone is > {{Australia/Sydney}}, then {{EST}} equates to {{GMT+10}} and you will get: > {noformat} > date parsing *** FAILED *** > 1424413277190 was not equal to 1424470877190 (SimpleDateParamSuite.scala:29) > {noformat} > In short, {{SimpleDateFormat}} is sensitive to the local default {{TimeZone}} > when interpreting short zone names. According to the {{TimeZone}} javadoc, > they ought not be used: > {quote} > Three-letter time zone IDs > For compatibility with JDK 1.1.x, some other three-letter time zone IDs (such > as "PST", "CTT", "AST") are also supported. However, their use is deprecated > because the same abbreviation is often used for multiple time zones (for > example, "CST" could be U.S. "Central Standard Time" and "China Standard > Time"), and the Java platform can then only recognize one of them. > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15723) SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ name
[ https://issues.apache.org/jira/browse/SPARK-15723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311701#comment-15311701 ] Brett Randall commented on SPARK-15723: --- I will propose a fix. Note that to reproduce this problem in any locale/timezone, you can modify the {{scalatest-maven-plugin}} {{argLine}} to add a timezone: {code:xml}-ea -Xmx3g -XX:MaxPermSize=${MaxPermGen} -XX:ReservedCodeCacheSize=${CodeCacheSize} -Duser.timezone="Australia/Sydney" {code} and run {{$ mvn test -DwildcardSuites=org.apache.spark.status.api.v1.SimpleDateParamSuite -Dtest=none}}. Equally, this will fix it in an affected timezone: {code:xml}-ea -Xmx3g -XX:MaxPermSize=${MaxPermGen} -XX:ReservedCodeCacheSize=${CodeCacheSize} -Duser.timezone="America/New_York" {code} > SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ > name > -- > > Key: SPARK-15723 > URL: https://issues.apache.org/jira/browse/SPARK-15723 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 >Reporter: Brett Randall >Priority: Minor > Labels: test > > {{org.apache.spark.status.api.v1.SimpleDateParamSuite}} has this assertion: > {code} > new SimpleDateParam("2015-02-20T17:21:17.190EST").timestamp should be > (1424470877190L) > {code} > This test is fragile and fails when executing in an environment where the > local default timezone causes {{EST}} to be interpreted as something other > than US Eastern Standard Time. If your local timezone is > {{Australia/Sydney}}, then {{EST}} equates to {{GMT+10}} and you will get: > {noformat} > date parsing *** FAILED *** > 1424413277190 was not equal to 1424470877190 (SimpleDateParamSuite.scala:29) > {noformat} > In short, {{SimpleDateFormat}} is sensitive to the local default {{TimeZone}} > when interpreting short zone names. According to the {{TimeZone}} javadoc, > they ought not be used: > {quote} > Three-letter time zone IDs > For compatibility with JDK 1.1.x, some other three-letter time zone IDs (such > as "PST", "CTT", "AST") are also supported. However, their use is deprecated > because the same abbreviation is often used for multiple time zones (for > example, "CST" could be U.S. "Central Standard Time" and "China Standard > Time"), and the Java platform can then only recognize one of them. > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15723) SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ name
Brett Randall created SPARK-15723: - Summary: SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ name Key: SPARK-15723 URL: https://issues.apache.org/jira/browse/SPARK-15723 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.6.1 Reporter: Brett Randall Priority: Minor {{org.apache.spark.status.api.v1.SimpleDateParamSuite}} has this assertion: {code} new SimpleDateParam("2015-02-20T17:21:17.190EST").timestamp should be (1424470877190L) {code} This test is fragile and fails when executing in an environment where the local default timezone causes {{EST}} to be interpreted as something other than US Eastern Standard Time. If your local timezone is {{Australia/Sydney}}, then {{EST}} equates to {{GMT+10}} and you will get: {noformat} date parsing *** FAILED *** 1424413277190 was not equal to 1424470877190 (SimpleDateParamSuite.scala:29) {noformat} In short, {{SimpleDateFormat}} is sensitive to the local default {{TimeZone}} when interpreting short zone names. According to the {{TimeZone}} javadoc, they ought not be used: {quote} Three-letter time zone IDs For compatibility with JDK 1.1.x, some other three-letter time zone IDs (such as "PST", "CTT", "AST") are also supported. However, their use is deprecated because the same abbreviation is often used for multiple time zones (for example, "CST" could be U.S. "Central Standard Time" and "China Standard Time"), and the Java platform can then only recognize one of them. {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
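One way to make that assertion locale-independent (a sketch, not necessarily the fix that was merged) is to state the zone as an explicit GMT offset instead of the deprecated abbreviation:
{code}
// Parses to 1424470877190L regardless of the JVM's default TimeZone/Locale.
new SimpleDateParam("2015-02-20T17:21:17.190GMT-05:00").timestamp should be (1424470877190L)
{code}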
[jira] [Commented] (SPARK-15685) StackOverflowError (VirtualMachineError) or NoClassDefFoundError (LinkageError) should not System.exit() in local mode
[ https://issues.apache.org/jira/browse/SPARK-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311149#comment-15311149 ] Brett Randall commented on SPARK-15685: --- Thanks [~srowen]. The {{SecurityManager}} is only in play here for: * Capturing the problem in a unit-test and making an assertion that Spark should not exit the JVM when running in local mode. In the unit test it is removed again and replaced with the preexisting {{SecurityManager}} in the test finally block. * As an unusual workaround in light of this problem, to prevent microservice server processes from being shut down by Spark as it handles an error. There's no suggestion here that Spark itself should install a {{SecurityManager}}. That said, I don't expect it to have any performance impact - it only comes into play when some code attempts to call {{System.exit}}. It shouldn't be necessary other than to block callers of {{System.exit()}}, and it would displace any existing {{SecurityManager}}. The long-running application here is, for example, a microservice orchestrating calculations that are performed by Spark. So the service is long-running, but the Spark jobs are short-running asynchronous tasks. This seems a reasonable use of Spark to me: short batch jobs running embedded in the local JVM, orchestrated by a long-running service. For larger jobs, the service switches to use cluster mode and sends the Spark jobs off to a remote JVM e.g. via YARN. I don't think it is typical for a calculations framework or batch-job framework to decide to shut down the JVM with {{System.exit()}} rather than simply aborting the task thread and throwing some exception or notification up the stack for the caller to handle. > StackOverflowError (VirtualMachineError) or NoClassDefFoundError > (LinkageError) should not System.exit() in local mode > -- > > Key: SPARK-15685 > URL: https://issues.apache.org/jira/browse/SPARK-15685 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 >Reporter: Brett Randall >Priority: Critical > > Spark, when running in local mode, can encounter certain types of {{Error}} > exceptions in developer-code or third-party libraries and call > {{System.exit()}}, potentially killing a long-running JVM/service. The > caller should decide on the exception-handling and whether the error should > be deemed fatal. > *Consider this scenario:* > * Spark is being used in local master mode within a long-running JVM > microservice, e.g. a Jetty instance. > * A task is run. The task errors with particular types of unchecked > throwables: > ** a) there some bad code and/or bad data that exposes a bug where there's > unterminated recursion, leading to a {{StackOverflowError}}, or > ** b) a particular not-often used function is called - there's a packaging > error with the service, a third-party library is missing some dependencies, a > {{NoClassDefFoundError}} is found. > *Expected behaviour:* Since we are running in local mode, we might expect > some unchecked exception to be thrown, to be optionally-handled by the Spark > caller. In the case of Jetty, a request thread or some other background > worker thread might handle the exception or not, the thread might exit or > note an error. The caller should decide how the error is handled. > *Actual behaviour:* {{System.exit()}} is called, the JVM exits and the Jetty > microservice is down and must be restarted. 
> *Consequence:* Any local code or third-party library might cause Spark to > exit the long-running JVM/microservice, so Spark can be a problem in this > architecture. I have seen this now on three separate occasions, leading to > service-down bug reports. > *Analysis:* > The line of code that seems to be the problem is: > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L405 > {code} > // Don't forcibly exit unless the exception was inherently fatal, to avoid > // stopping other tasks unnecessarily. > if (Utils.isFatalError(t)) { > SparkUncaughtExceptionHandler.uncaughtException(t) > } > {code} > [Utils.isFatalError()|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L1818] > first excludes Scala > [NonFatal|https://github.com/scala/scala/blob/2.12.x/src/library/scala/util/control/NonFatal.scala#L31], > which excludes everything except {{VirtualMachineError}}, {{ThreadDeath}}, > {{InterruptedException}}, {{LinkageError}} and {{ControlThrowable}}. > {{Utils.isFatalError()}} further excludes {{InterruptedException}}, > {{NotImplementedError}} and {{ControlThrowable}}. > Remaining are {{Error}} s such as {{StackOverflowError extend
[jira] [Updated] (SPARK-15685) StackOverflowError (VirtualMachineError) or NoClassDefFoundError (LinkageError) should not System.exit() in local mode
[ https://issues.apache.org/jira/browse/SPARK-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brett Randall updated SPARK-15685: -- Description: Spark, when running in local mode, can encounter certain types of {{Error}} exceptions in developer-code or third-party libraries and call {{System.exit()}}, potentially killing a long-running JVM/service. The caller should decide on the exception-handling and whether the error should be deemed fatal. *Consider this scenario:* * Spark is being used in local master mode within a long-running JVM microservice, e.g. a Jetty instance. * A task is run. The task errors with particular types of unchecked throwables: ** a) there some bad code and/or bad data that exposes a bug where there's unterminated recursion, leading to a {{StackOverflowError}}, or ** b) a particular not-often used function is called - there's a packaging error with the service, a third-party library is missing some dependencies, a {{NoClassDefFoundError}} is found. *Expected behaviour:* Since we are running in local mode, we might expect some unchecked exception to be thrown, to be optionally-handled by the Spark caller. In the case of Jetty, a request thread or some other background worker thread might handle the exception or not, the thread might exit or note an error. The caller should decide how the error is handled. *Actual behaviour:* {{System.exit()}} is called, the JVM exits and the Jetty microservice is down and must be restarted. *Consequence:* Any local code or third-party library might cause Spark to exit the long-running JVM/microservice, so Spark can be a problem in this architecture. I have seen this now on three separate occasions, leading to service-down bug reports. *Analysis:* The line of code that seems to be the problem is: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L405 {code} // Don't forcibly exit unless the exception was inherently fatal, to avoid // stopping other tasks unnecessarily. if (Utils.isFatalError(t)) { SparkUncaughtExceptionHandler.uncaughtException(t) } {code} [Utils.isFatalError()|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L1818] first excludes Scala [NonFatal|https://github.com/scala/scala/blob/2.12.x/src/library/scala/util/control/NonFatal.scala#L31], which excludes everything except {{VirtualMachineError}}, {{ThreadDeath}}, {{InterruptedException}}, {{LinkageError}} and {{ControlThrowable}}. {{Utils.isFatalError()}} further excludes {{InterruptedException}}, {{NotImplementedError}} and {{ControlThrowable}}. Remaining are {{Error}} s such as {{StackOverflowError extends VirtualMachineError}} or {{NoClassDefFoundError extends LinkageError}}, which occur in the aforementioned scenarios. {{SparkUncaughtExceptionHandler.uncaughtException()}} proceeds to call {{System.exit()}}. [Further up|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L77] in in {{Executor}} we see exclusions for registering {{SparkUncaughtExceptionHandler}} if in local mode: {code} if (!isLocal) { // Setup an uncaught exception handler for non-local mode. // Make any thread terminations due to uncaught exceptions kill the entire // executor process to avoid surprising stalls. Thread.setDefaultUncaughtExceptionHandler(SparkUncaughtExceptionHandler) } {code} This same exclusion must be applied for local mode for "fatal" errors - cannot afford to shutdown the enclosing JVM (e.g. 
Jetty), the caller should decide. A minimal test-case is supplied. It installs a logging {{SecurityManager}} to confirm that {{System.exit()}} was called from {{SparkUncaughtExceptionHandler.uncaughtException}} via {{Executor}}. It also hints at the workaround - install your own {{SecurityManager}} and inspect the current stack in {{checkExit()}} to prevent Spark from exiting the JVM. Test-case: https://github.com/javabrett/SPARK-15685 . was: Spark, when running in local mode, can encounter certain types of {{Error}} exceptions in developer-code or third-party libraries and call {{System.exit()}}, potentially killing a long-running JVM/service. The caller should decide on the exception-handling and whether the error should be deemed fatal. *Consider this scenario:* * Spark is being used in local master mode within a long-running JVM microservice, e.g. a Jetty instance. * A task is run. The task errors with particular types of unchecked throwables: ** a) there some bad code and/or bad data that exposes a bug where there's unterminated recursion, leading to a {{StackOverflowError}}, or ** b) a particular not-often used function is called - there's a packaging error with the service, a third-party library is missing some dependencies, a {{NoClassDefFoundError}} is found. *Expected behavi
[jira] [Updated] (SPARK-15685) StackOverflowError (VirtualMachineError) or NoClassDefFoundError (LinkageError) should not System.exit() in local mode
[ https://issues.apache.org/jira/browse/SPARK-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brett Randall updated SPARK-15685:
--
Description:

Spark, when running in local mode, can encounter certain types of {{Error}} exceptions in developer code or third-party libraries and call {{System.exit()}}, potentially killing a long-running JVM/service. The caller should decide on the exception handling and whether the error should be deemed fatal.

*Consider this scenario:*
* Spark is being used in local master mode within a long-running JVM microservice, e.g. a Jetty instance.
* A task is run. The task errors with particular types of unchecked throwables:
** a) there is some bad code and/or bad data that exposes a bug involving unterminated recursion, leading to a {{StackOverflowError}}, or
** b) a particular, not-often-used function is called - there is a packaging error with the service or a third-party library is missing some dependencies, and a {{NoClassDefFoundError}} is thrown.

*Expected behaviour:* Since we are running in local mode, we might expect an unchecked exception to be thrown, to be optionally handled by the Spark caller. In the case of Jetty, a request thread or some other background worker thread might handle the exception or not; the thread might exit or note an error. The caller should decide how the error is handled.

*Actual behaviour:* {{System.exit()}} is called, the JVM exits, and the Jetty microservice is down and must be restarted.

*Consequence:* Any local code or third-party library might cause Spark to exit the long-running JVM/microservice, so Spark can be a problem in this architecture. I have seen this now on three separate occasions, leading to service-down bug reports.

*Analysis:*

The line of code that seems to be the problem is: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L405

{code}
// Don't forcibly exit unless the exception was inherently fatal, to avoid
// stopping other tasks unnecessarily.
if (Utils.isFatalError(t)) {
  SparkUncaughtExceptionHandler.uncaughtException(t)
}
{code}

[Utils.isFatalError()|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L1818] first excludes Scala [NonFatal|https://github.com/scala/scala/blob/2.12.x/src/library/scala/util/control/NonFatal.scala#L31], which matches everything except {{VirtualMachineError}}, {{ThreadDeath}}, {{InterruptedException}}, {{LinkageError}} and {{ControlThrowable}}. {{Utils.isFatalError()}} further excludes {{InterruptedException}}, {{NotImplementedError}} and {{ControlThrowable}}.

Remaining are {{Error}}s such as {{StackOverflowError extends VirtualMachineError}} or {{NoClassDefFoundError extends LinkageError}}, which occur in the aforementioned scenarios. {{SparkUncaughtExceptionHandler.uncaughtException()}} proceeds to call {{System.exit()}}.

[Further up|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L77] in {{Executor}} we see that {{SparkUncaughtExceptionHandler}} is only registered when not in local mode:

{code}
if (!isLocal) {
  // Setup an uncaught exception handler for non-local mode.
  // Make any thread terminations due to uncaught exceptions kill the entire
  // executor process to avoid surprising stalls.
  Thread.setDefaultUncaughtExceptionHandler(SparkUncaughtExceptionHandler)
}
{code}

This same exclusion must be applied in local mode for "fatal" errors - we cannot afford to shut down the enclosing JVM (e.g. Jetty); the caller should decide.

A minimal test-case is supplied. It installs a logging {{SecurityManager}} to confirm that {{System.exit()}} was called from {{SparkUncaughtExceptionHandler.uncaughtException}} via {{Executor}}. It also hints at the workaround - install your own {{SecurityManager}} and inspect the current stack in {{checkExit()}} to prevent Spark from exiting the JVM.

Test-case: https://github.com/javabrett/SPARK-15685 .

was:
Spark, when running in local mode, can encounter certain types of {{Error}} exceptions in developer-code or third-party libraries and call {{System.exit()}}, potentially killing a long-running JVM/service. The caller should decide on the exception-handling and whether the error should be deemed fatal. *Consider this scenario:* * Spark is being used in local master mode within a long-running JVM microservice, e.g. a Jetty instance. * A task is run. The task errors with particular types of unchecked throwables: ** a) there some bad code and/or bad data that exposes a bug where there's unterminated recursion, leading to a {{StackOverflowError}}, or ** b) a particular not-often used function is called - there's a packaging error with the service, a third-party library is missing some dependencies, a {{NoClassDefFoundError}} is found. *Expected behavio
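One possible shape for the exclusion argued for above, shown purely as an illustrative sketch and not as the change Spark actually adopted: consult the same {{isLocal}} flag that {{Executor}} already uses when registering the handler before escalating a "fatal" error. {{EscalationPolicy}} and its methods are hypothetical names; the {{isFatalError}} body simply restates the classification described in the analysis.

{code}
import scala.util.control.{ControlThrowable, NonFatal}

// Hypothetical sketch (not the committed Spark change): decide whether a
// throwable should escalate to SparkUncaughtExceptionHandler (and hence
// System.exit()), or be left to the embedding application in local mode.
object EscalationPolicy {

  // Restates the Utils.isFatalError() classification described above.
  def isFatalError(t: Throwable): Boolean = t match {
    case NonFatal(_) | _: InterruptedException | _: NotImplementedError |
         _: ControlThrowable => false
    case _ => true
  }

  // Proposed behaviour: in local mode, never exit the JVM; the caller
  // (e.g. a Jetty worker thread) decides how to handle the error.
  def shouldExitJvm(isLocal: Boolean, t: Throwable): Boolean =
    !isLocal && isFatalError(t)
}

// e.g. EscalationPolicy.shouldExitJvm(isLocal = true, new StackOverflowError) == false
{code}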
[jira] [Created] (SPARK-15685) StackOverflowError (VirtualMachineError) or NoClassDefFoundError (LinkageError) should not System.exit() in local mode
Brett Randall created SPARK-15685:
-
Summary: StackOverflowError (VirtualMachineError) or NoClassDefFoundError (LinkageError) should not System.exit() in local mode
Key: SPARK-15685
URL: https://issues.apache.org/jira/browse/SPARK-15685
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.6.1
Reporter: Brett Randall
Priority: Critical

Spark, when running in local mode, can encounter certain types of {{Error}} exceptions in developer code or third-party libraries and call {{System.exit()}}, potentially killing a long-running JVM/service. The caller should decide on the exception handling and whether the error should be deemed fatal.

*Consider this scenario:*
* Spark is being used in local master mode within a long-running JVM microservice, e.g. a Jetty instance.
* A task is run. The task errors with particular types of unchecked throwables:
** a) there is some bad code and/or bad data that exposes a bug involving unterminated recursion, leading to a {{StackOverflowError}}, or
** b) a particular, not-often-used function is called - there is a packaging error with the service or a third-party library is missing some dependencies, and a {{NoClassDefFoundError}} is thrown.

*Expected behaviour:* Since we are running in local mode, we might expect an unchecked exception to be thrown, to be optionally handled by the Spark caller. In the case of Jetty, a request thread or some other background worker thread might handle the exception or not; the thread might exit or note an error. The caller should decide how the error is handled.

*Actual behaviour:* {{System.exit()}} is called, the JVM exits, and the Jetty microservice is down and must be restarted.

*Consequence:* Any local code or third-party library might cause Spark to exit the long-running JVM/microservice, so Spark can be a problem in this architecture. I have seen this now on three separate occasions, leading to service-down bug reports.

*Analysis:*

The line of code that seems to be the problem is: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L405

{code}
// Don't forcibly exit unless the exception was inherently fatal, to avoid
// stopping other tasks unnecessarily.
if (Utils.isFatalError(t)) {
  SparkUncaughtExceptionHandler.uncaughtException(t)
}
{code}

[Utils.isFatalError()|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L1818] first excludes Scala [NonFatal|https://github.com/scala/scala/blob/2.12.x/src/library/scala/util/control/NonFatal.scala#L31], which matches everything except {{VirtualMachineError}}, {{ThreadDeath}}, {{InterruptedException}}, {{LinkageError}} and {{ControlThrowable}}. {{Utils.isFatalError()}} further excludes {{InterruptedException}}, {{NotImplementedError}} and {{ControlThrowable}}.

Remaining are {{Error}}s such as {{StackOverflowError extends VirtualMachineError}} or {{NoClassDefFoundError extends LinkageError}}, which occur in the aforementioned scenarios. {{SparkUncaughtExceptionHandler.uncaughtException()}} proceeds to call {{System.exit()}}.

[Further up|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L77] in {{Executor}} we see that {{SparkUncaughtExceptionHandler}} is only registered when not in local mode:

{code}
if (!isLocal) {
  // Setup an uncaught exception handler for non-local mode.
  // Make any thread terminations due to uncaught exceptions kill the entire
  // executor process to avoid surprising stalls.
  Thread.setDefaultUncaughtExceptionHandler(SparkUncaughtExceptionHandler)
}
{code}

This same exclusion must be applied in local mode for "fatal" errors - we cannot afford to shut down the enclosing JVM (e.g. Jetty); the caller should decide.

A minimal test-case is supplied. It installs a logging {{SecurityManager}} to confirm that {{System.exit()}} was called from {{SparkUncaughtExceptionHandler.uncaughtException}} via {{Executor}}. It also hints at the workaround - install your own {{SecurityManager}} and inspect the current stack in {{checkExit()}} to prevent Spark from exiting the JVM.
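The workaround hinted at above can be sketched roughly as follows. Class and object names are illustrative, and this is only one way to inspect the current stack from {{checkExit()}}; it is not code taken from the linked test-case.

{code}
import java.security.Permission

// Illustrative sketch of the workaround: veto System.exit() calls that
// originate from SparkUncaughtExceptionHandler, so a "fatal" task error in
// local mode cannot take down the enclosing JVM (e.g. Jetty).
class SparkExitBlockingSecurityManager extends SecurityManager {

  override def checkExit(status: Int): Unit = {
    val fromSparkHandler = Thread.currentThread.getStackTrace.exists { frame =>
      frame.getClassName.contains("SparkUncaughtExceptionHandler")
    }
    if (fromSparkHandler) {
      // Throwing here aborts the exit; the SecurityException surfaces on the
      // offending thread instead of terminating the whole process.
      throw new SecurityException(
        s"Blocked System.exit($status) from SparkUncaughtExceptionHandler")
    }
  }

  // Permit everything else so the rest of the application is unaffected.
  override def checkPermission(perm: Permission): Unit = ()
  override def checkPermission(perm: Permission, context: AnyRef): Unit = ()
}

// Installed once at service start-up, before any Spark tasks run:
//   System.setSecurityManager(new SparkExitBlockingSecurityManager)
{code}

A {{SecurityManager}} is a process-wide setting, so this is best treated as a stop-gap for services embedding Spark in local mode rather than as a general fix.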