[jira] [Commented] (SPARK-1148) Suggestions for exception handling (avoid potential bugs)

2014-12-16 Thread Ding Yuan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249028#comment-14249028
 ] 

Ding Yuan commented on SPARK-1148:
--

Of course, please just close this one then. Thanks!

> Suggestions for exception handling (avoid potential bugs)
> -
>
> Key: SPARK-1148
> URL: https://issues.apache.org/jira/browse/SPARK-1148
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 0.9.0
>Reporter: Ding Yuan
>
> Hi Spark developers,
> We are a group of researchers working on software reliability. We recently did a 
> study and found that the majority of the most severe failures in data-analytics 
> systems are caused by bugs in exception handlers, since it is hard to 
> anticipate all the possible real-world error scenarios. We therefore built a 
> simple checking tool that automatically detects bug patterns that have 
> caused very severe real-world failures. I am reporting a few cases here. 
> Any feedback is much appreciated!
> Ding
> =
> Case 1:
> Line: 1249, File: "org/apache/spark/SparkContext.scala"
> {noformat}
> 1244: val scheduler = try {
> 1245:   val clazz = Class.forName("org.apache.spark.scheduler.cluster.YarnClusterScheduler")
> 1246:   val cons = clazz.getConstructor(classOf[SparkContext])
> 1247:   cons.newInstance(sc).asInstanceOf[TaskSchedulerImpl]
> 1248: } catch {
> 1249:   // TODO: Enumerate the exact reasons why it can fail
> 1250:   // But irrespective of it, it means we cannot proceed !
> 1251:   case th: Throwable => {
> 1252:     throw new SparkException("YARN mode not available ?", th)
> 1253:   }
> {noformat}
> The TODO comment suggests that the specific exceptions should be enumerated here.
> The try block could throw the following exceptions (a hedged sketch of such an
> enumeration follows the list below):
> ClassNotFoundException
> NegativeArraySizeException
> NoSuchMethodException
> SecurityException
> InstantiationException
> IllegalAccessException
> IllegalArgumentException
> InvocationTargetException
> ClassCastException
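> As a minimal illustrative sketch only (reusing sc, TaskSchedulerImpl and SparkException from
> the snippet above; this is not the actual Spark fix), the handler could enumerate exactly
> those failures and let anything else, such as fatal JVM errors, propagate unchanged:
> {noformat}
> val scheduler = try {
>   val clazz = Class.forName("org.apache.spark.scheduler.cluster.YarnClusterScheduler")
>   val cons = clazz.getConstructor(classOf[SparkContext])
>   cons.newInstance(sc).asInstanceOf[TaskSchedulerImpl]
> } catch {
>   // Reflection or casting failures mean YARN support is missing or broken.
>   case e @ (_: ClassNotFoundException | _: NegativeArraySizeException |
>             _: NoSuchMethodException | _: SecurityException |
>             _: InstantiationException | _: IllegalAccessException |
>             _: IllegalArgumentException | _: java.lang.reflect.InvocationTargetException |
>             _: ClassCastException) =>
>     throw new SparkException("YARN mode not available?", e)
>   // Anything else (e.g. OutOfMemoryError) is deliberately not swallowed here.
> }
> {noformat}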
> ==
> =
> Case 2:
> Line: 282, File: "org/apache/spark/executor/Executor.scala"
> {noformat}
> 265: case t: Throwable => {
> 266:   val serviceTime = (System.currentTimeMillis() - taskStart).toInt
> 267:   val metrics = attemptedTask.flatMap(t => t.metrics)
> 268:   for (m <- metrics) {
> 269:     m.executorRunTime = serviceTime
> 270:     m.jvmGCTime = gcTime - startGCTime
> 271:   }
> 272:   val reason = ExceptionFailure(t.getClass.getName, t.toString, t.getStackTrace, metrics)
> 273:   execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))
> 274:
> 275:   // TODO: Should we exit the whole executor here? On the one hand, the failed task may
> 276:   // have left some weird state around depending on when the exception was thrown, but on
> 277:   // the other hand, maybe we could detect that when future tasks fail and exit then.
> 278:   logError("Exception in task ID " + taskId, t)
> 279:   //System.exit(1)
> 280: }
> 281:   } finally {
> 282:     // TODO: Unregister shuffle memory only for ResultTask
> 283:     val shuffleMemoryMap = env.shuffleMemoryMap
> 284:     shuffleMemoryMap.synchronized {
> 285:       shuffleMemoryMap.remove(Thread.currentThread().getId)
> 286:     }
> 287:     runningTasks.remove(taskId)
> 288:   }
> {noformat}
> The TODO comment in this Throwable handler seems to suggest that the whole
> executor should perhaps just exit here; should it?
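> Purely as an illustrative sketch (runTask and reportFailure are hypothetical stand-ins, not
> actual Executor methods), one possible policy is to keep the executor alive for non-fatal
> exceptions but exit on fatal JVM errors, where internal state is most likely corrupted:
> {noformat}
> import scala.util.control.NonFatal
>
> try {
>   runTask()
> } catch {
>   case NonFatal(t) =>
>     // Ordinary task failure: report it and keep the executor alive.
>     reportFailure(t)
>   case t: Throwable =>
>     // Fatal error (e.g. OutOfMemoryError): executor state may be corrupted,
>     // so report the failure and terminate the whole executor process.
>     reportFailure(t)
>     System.exit(1)
> }
> {noformat}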
> ==
> =
> Case 3:
> Line: 70, File: "org/apache/spark/network/netty/FileServerHandler.java"
> {noformat}
> 66:   try {
> 67:     ctx.write(new DefaultFileRegion(new FileInputStream(file)
> 68:       .getChannel(), fileSegment.offset(), fileSegment.length()));
> 69:   } catch (Exception e) {
> 70:     LOG.error("Exception: ", e);
> 71:   }
> {noformat}
> "Exception" is too general. The try block only throws "FileNotFoundException".
> Although there is nothing wrong with it now, but later if code evolves this
> might cause some other exceptions to be swallowed.
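> As an illustrative sketch only (written in Scala here, although the real handler is Java;
> writeFileRegion, file, and log are hypothetical stand-ins), catching just the exception the
> block is known to throw keeps future exceptions from being silently swallowed:
> {noformat}
> import java.io.{FileInputStream, FileNotFoundException}
>
> try {
>   // Hypothetical stand-in for the ctx.write(new DefaultFileRegion(...)) call above.
>   writeFileRegion(new FileInputStream(file).getChannel, offset, length)
> } catch {
>   case e: FileNotFoundException =>
>     // Only the failure this block is known to produce is handled here; anything
>     // unexpected introduced by later changes will propagate instead of being swallowed.
>     log.error("File not found: " + file, e)
> }
> {noformat}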
> ==



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1148) Suggestions for exception handling (avoid potential bugs)

2014-12-16 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248941#comment-14248941
 ] 

Sean Owen commented on SPARK-1148:
--

[~d.yuan] I don't know that these have been addressed; some issues like them have 
been, if I recall correctly. Whatever the status, I think it's useful to focus 
on master and/or later branches. Would it be OK to close this in favor of 
SPARK-4863?

[jira] [Commented] (SPARK-1148) Suggestions for exception handling (avoid potential bugs)

2014-12-16 Thread Ding Yuan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248707#comment-14248707
 ] 

Ding Yuan commented on SPARK-1148:
--

Glad to know these cases are addressed! As for version 0.9.0, I have reported 
all the cases (only three of them). I have opened another JIRA, 
https://issues.apache.org/jira/browse/SPARK-4863, to report the warnings for 
Spark 1.1.1.

[jira] [Commented] (SPARK-1148) Suggestions for exception handling (avoid potential bugs)

2014-11-25 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224305#comment-14224305
 ] 

Sean Owen commented on SPARK-1148:
--

[~d.yuan] Several problems like this have been fixed since. Can you open a PR 
to resolve any others that your tool has found or is it all fixed now?
