[ https://issues.apache.org/jira/browse/SPARK-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248941#comment-14248941 ]

Sean Owen commented on SPARK-1148:
----------------------------------

[~d.yuan] I don't know that these have been addressed; some issues like them 
have been, if I recall correctly. Whatever the status, I think it's useful to 
focus on master and/or later branches. Would it be OK to close this in favor 
of SPARK-4863?

> Suggestions for exception handling (avoid potential bugs)
> ---------------------------------------------------------
>
>                 Key: SPARK-1148
>                 URL: https://issues.apache.org/jira/browse/SPARK-1148
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 0.9.0
>            Reporter: Ding Yuan
>
> Hi Spark developers,
> We are a group of researchers working on software reliability. We recently 
> did a study and found that the majority of the most severe failures in 
> data-analytic systems are caused by bugs in exception handlers: it is hard 
> to anticipate all the possible real-world error scenarios. We therefore built 
> a simple checking tool that automatically detects bug patterns that have 
> caused some very severe real-world failures. I am reporting a few cases here. 
> Any feedback is much appreciated!
> Ding
> =========================
> Case 1:
> Line: 1249, File: "org/apache/spark/SparkContext.scala"
> {noformat}
> 1244:         val scheduler = try {
> 1245:           val clazz = Class.forName("org.apache.spark.scheduler.cluster.YarnClusterScheduler")
> 1246:           val cons = clazz.getConstructor(classOf[SparkContext])
> 1247:           cons.newInstance(sc).asInstanceOf[TaskSchedulerImpl]
> 1248:         } catch {
> 1249:           // TODO: Enumerate the exact reasons why it can fail
> 1250:           // But irrespective of it, it means we cannot proceed !
> 1251:           case th: Throwable => {
> 1252:             throw new SparkException("YARN mode not available ?", th)
> 1253:           }
> {noformat}
> The comment suggests that the specific exceptions should be enumerated here.
> The try block could throw the following exceptions (a sketch of an enumerated
> catch follows this list):
> ClassNotFoundException
> NegativeArraySizeException
> NoSuchMethodException
> SecurityException
> InstantiationException
> IllegalAccessException
> IllegalArgumentException
> InvocationTargetException
> ClassCastException
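> Below is a minimal sketch, not Spark's actual fix, of what an enumerated
> catch could look like; it assumes the names in scope in the excerpt above
> (sc, SparkContext, TaskSchedulerImpl, SparkException), and the remaining
> exceptions from the list could be added to the same alternative pattern:
> {noformat}
> import java.lang.reflect.InvocationTargetException
>
> val scheduler = try {
>   val clazz = Class.forName("org.apache.spark.scheduler.cluster.YarnClusterScheduler")
>   val cons = clazz.getConstructor(classOf[SparkContext])
>   cons.newInstance(sc).asInstanceOf[TaskSchedulerImpl]
> } catch {
>   // Reflection failed: the YARN scheduler class is missing or unusable.
>   case e @ (_: ClassNotFoundException | _: NoSuchMethodException |
>             _: InstantiationException | _: IllegalAccessException |
>             _: InvocationTargetException | _: ClassCastException) =>
>     throw new SparkException("YARN mode not available ?", e)
> }
> {noformat}
> Anything not matched (e.g. an unexpected Error) then propagates unchanged
> instead of being re-wrapped as a YARN availability problem.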
> ==========================================
> =========================
> Case 2:
> Line: 282, File: "org/apache/spark/executor/Executor.scala"
> {noformat}
> 265:         case t: Throwable => {
> 266:           val serviceTime = (System.currentTimeMillis() - taskStart).toInt
> 267:           val metrics = attemptedTask.flatMap(t => t.metrics)
> 268:           for (m <- metrics) {
> 269:             m.executorRunTime = serviceTime
> 270:             m.jvmGCTime = gcTime - startGCTime
> 271:           }
> 272:           val reason = ExceptionFailure(t.getClass.getName, t.toString, t.getStackTrace, metrics)
> 273:           execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))
> 274:
> 275:           // TODO: Should we exit the whole executor here? On the one hand, the failed task may
> 276:           // have left some weird state around depending on when the exception was thrown, but on
> 277:           // the other hand, maybe we could detect that when future tasks fail and exit then.
> 278:           logError("Exception in task ID " + taskId, t)
> 279:           //System.exit(1)
> 280:         }
> 281:       } finally {
> 282:         // TODO: Unregister shuffle memory only for ResultTask
> 283:         val shuffleMemoryMap = env.shuffleMemoryMap
> 284:         shuffleMemoryMap.synchronized {
> 285:           shuffleMemoryMap.remove(Thread.currentThread().getId)
> 286:         }
> 287:         runningTasks.remove(taskId)
> 288:       }
> {noformat}
> The comment in this Throwable handler seems to suggest that the whole
> executor should just exit. Should it? One option is sketched below.
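> One middle ground, sketched below: treat ordinary exceptions as a per-task
> failure but exit the JVM on fatal errors. handleTaskFailure is a hypothetical
> helper (not existing Spark API) and the stderr logging stands in for
> logError; scala.util.control.NonFatal classifies VirtualMachineError
> (e.g. OutOfMemoryError), LinkageError and similar as fatal:
> {noformat}
> import scala.util.control.NonFatal
>
> def handleTaskFailure(t: Throwable): Unit = t match {
>   case NonFatal(e) =>
>     // Ordinary exception: report the task as failed, keep the executor alive.
>     System.err.println(s"Task failed, executor continues: $e")
>   case fatal =>
>     // Fatal JVM error: in-process state may be corrupted, so stop the
>     // whole executor instead of running further tasks in it.
>     System.err.println(s"Fatal error, exiting executor: $fatal")
>     System.exit(1)
> }
> {noformat}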
> ==========================================
> =========================
> Case 3:
> Line: 70, File: "org/apache/spark/network/netty/FileServerHandler.java"
> {noformat}
> 66:       try {
> 67:         ctx.write(new DefaultFileRegion(new FileInputStream(file)
> 68:           .getChannel(), fileSegment.offset(), fileSegment.length()));
> 69:       } catch (Exception e) {
> 70:           LOG.error("Exception: ", e);
> 71:       }
> {noformat}
> "Exception" is too general. The try block only throws "FileNotFoundException".
> Although there is nothing wrong with it now, but later if code evolves this
> might cause some other exceptions to be swallowed.
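> A minimal sketch of the narrower catch, written in Scala like the sketches
> above (the original handler is Java); writeFileRegion is a hypothetical
> helper and the stderr logging stands in for LOG.error:
> {noformat}
> import java.io.{File, FileInputStream, FileNotFoundException}
>
> def writeFileRegion(file: File): Unit = {
>   try {
>     val channel = new FileInputStream(file).getChannel
>     // ... hand the channel to the network layer, as in the excerpt above ...
>     channel.close()
>   } catch {
>     // The only failure the body declares today; anything added later surfaces.
>     case e: FileNotFoundException =>
>       System.err.println(s"File not found: $e")
>   }
> }
> {noformat}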
> ==========================================


