[jira] [Commented] (SPARK-1148) Suggestions for exception handling (avoid potential bugs)
[ https://issues.apache.org/jira/browse/SPARK-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249028#comment-14249028 ] Ding Yuan commented on SPARK-1148:
----------------------------------

Of course, please just close this one then. Thanks!

> Suggestions for exception handling (avoid potential bugs)
> ---------------------------------------------------------
>
>                 Key: SPARK-1148
>                 URL: https://issues.apache.org/jira/browse/SPARK-1148
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 0.9.0
>            Reporter: Ding Yuan
>
> Hi Spark developers,
> We are a group of researchers working on software reliability. We recently did a study and found that the majority of the most severe failures in data-analytic systems are caused by bugs in exception handlers, since it is hard to anticipate all the possible real-world error scenarios. We therefore built a simple checking tool that automatically detects bug patterns that have caused very severe real-world failures. I am reporting a few cases here. Any feedback is much appreciated!
> Ding
>
> =========================================
> Case 1:
> Line: 1249, File: "org/apache/spark/SparkContext.scala"
> {noformat}
> 1244:       val scheduler = try {
> 1245:         val clazz = Class.forName("org.apache.spark.scheduler.cluster.YarnClusterScheduler")
> 1246:         val cons = clazz.getConstructor(classOf[SparkContext])
> 1247:         cons.newInstance(sc).asInstanceOf[TaskSchedulerImpl]
> 1248:       } catch {
> 1249:         // TODO: Enumerate the exact reasons why it can fail
> 1250:         // But irrespective of it, it means we cannot proceed !
> 1251:         case th: Throwable => {
> 1252:           throw new SparkException("YARN mode not available ?", th)
> 1253:         }
> {noformat}
> The comment suggests the specific exceptions should be enumerated here.
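For illustration only, here is an editor's sketch (not code from the issue) of what enumerating the failure modes of such a reflective instantiation could look like. `SchedulerInitException` is a hypothetical stand-in for `SparkException`, and the sketch uses a no-arg constructor so it stays self-contained:

```scala
import java.lang.reflect.InvocationTargetException

// Hypothetical stand-in for SparkException, so this sketch is self-contained.
class SchedulerInitException(msg: String, cause: Throwable)
  extends RuntimeException(msg, cause)

// Reflectively instantiate a class via its no-arg constructor, naming each
// failure mode instead of matching a bare Throwable.
def loadScheduler(className: String): AnyRef = {
  try {
    val clazz = Class.forName(className)
    val cons = clazz.getConstructor()
    cons.newInstance().asInstanceOf[AnyRef]
  } catch {
    case e: ClassNotFoundException =>
      throw new SchedulerInitException(s"$className not on classpath", e)
    case e: NoSuchMethodException =>
      throw new SchedulerInitException(s"$className lacks the expected constructor", e)
    case e: InvocationTargetException =>
      // The constructor itself threw; report its cause, not the reflection wrapper.
      throw new SchedulerInitException(s"Constructor of $className failed", e.getCause)
    case e @ (_: InstantiationException | _: IllegalAccessException |
              _: SecurityException | _: ClassCastException) =>
      throw new SchedulerInitException(s"Cannot instantiate $className", e)
  }
}
```

Unlike a catch-all on `Throwable`, fatal JVM errors such as `OutOfMemoryError` now propagate instead of being rewrapped into a misleading "YARN mode not available ?" message.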
> The try block could throw the following exceptions: ClassNotFoundException, NegativeArraySizeException, NoSuchMethodException, SecurityException, InstantiationException, IllegalAccessException, IllegalArgumentException, InvocationTargetException, ClassCastException.
> =========================================
> Case 2:
> Line: 282, File: "org/apache/spark/executor/Executor.scala"
> {noformat}
> 265:        case t: Throwable => {
> 266:          val serviceTime = (System.currentTimeMillis() - taskStart).toInt
> 267:          val metrics = attemptedTask.flatMap(t => t.metrics)
> 268:          for (m <- metrics) {
> 269:            m.executorRunTime = serviceTime
> 270:            m.jvmGCTime = gcTime - startGCTime
> 271:          }
> 272:          val reason = ExceptionFailure(t.getClass.getName, t.toString, t.getStackTrace, metrics)
> 273:          execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))
> 274:
> 275:          // TODO: Should we exit the whole executor here? On the one hand, the failed task may
> 276:          // have left some weird state around depending on when the exception was thrown, but on
> 277:          // the other hand, maybe we could detect that when future tasks fail and exit then.
> 278:          logError("Exception in task ID " + taskId, t)
> 279:          //System.exit(1)
> 280:        }
> 281:      } finally {
> 282:        // TODO: Unregister shuffle memory only for ResultTask
> 283:        val shuffleMemoryMap = env.shuffleMemoryMap
> 284:        shuffleMemoryMap.synchronized {
> 285:          shuffleMemoryMap.remove(Thread.currentThread().getId)
> 286:        }
> 287:        runningTasks.remove(taskId)
> 288:      }
> {noformat}
> Does the comment in this Throwable handler suggest that the system should just exit?
> =========================================
> Case 3:
> Line: 70, File: "org/apache/spark/network/netty/FileServerHandler.java"
> {noformat}
> 66:    try {
> 67:      ctx.write(new DefaultFileRegion(new FileInputStream(file)
> 68:          .getChannel(), fileSegment.offset(), fileSegment.length()));
> 69:    } catch (Exception e) {
> 70:      LOG.error("Exception: ", e);
> 71:    }
> {noformat}
> "Exception" is too general.
> The try block only throws "FileNotFoundException". Although there is nothing wrong with this now, if the code evolves later this catch might silently swallow other exceptions.
> =========================================

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
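As an editor's note on the pattern behind Cases 2 and 3 (a hedged sketch with made-up helper names, not code from the issue): Scala's `scala.util.control.NonFatal` extractor lets a handler report recoverable task failures while allowing fatal JVM errors to propagate, and catching only the exception a block can actually throw today keeps future failure modes visible:

```scala
import java.io.{FileInputStream, FileNotFoundException}
import scala.util.control.NonFatal

// Case 2 pattern: report non-fatal task failures, but let fatal JVM errors
// (OutOfMemoryError, InterruptedException, ...) propagate instead of
// matching a bare Throwable.
def runTask(body: => Int): Either[String, Int] = {
  try Right(body)
  catch {
    case NonFatal(t) => Left(s"Task failed: ${t.getClass.getName}: ${t.getMessage}")
  }
}

// Case 3 pattern: catch only the exception this block can currently throw,
// so exceptions introduced by later changes are not silently swallowed.
def openChannelSize(path: String): Long = {
  try {
    val in = new FileInputStream(path)
    try in.getChannel.size() finally in.close()
  } catch {
    case e: FileNotFoundException => -1L
  }
}
```

If the body of `openChannelSize` later gains calls that can throw other `IOException` subclasses, the compiler-visible narrow catch makes that change a conscious decision rather than an accidental swallow.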
[ https://issues.apache.org/jira/browse/SPARK-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248941#comment-14248941 ] Sean Owen commented on SPARK-1148:
----------------------------------

[~d.yuan] I don't know that these have been addressed; some issues like them have been, if I recall correctly. Whatever the status, I think it is useful to focus on master and/or later branches. Would it be OK to close this in favor of SPARK-4863?
[ https://issues.apache.org/jira/browse/SPARK-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248707#comment-14248707 ] Ding Yuan commented on SPARK-1148:
----------------------------------

Glad to know these cases are addressed! As for version 0.9.0, I have reported all the cases (only three of them). I have opened another JIRA, https://issues.apache.org/jira/browse/SPARK-4863, to report the warnings on Spark 1.1.1.
[ https://issues.apache.org/jira/browse/SPARK-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224305#comment-14224305 ] Sean Owen commented on SPARK-1148:
----------------------------------

[~d.yuan] Several problems like this have been fixed since. Can you open a PR to resolve any others that your tool has found, or is it all fixed now?