[ https://issues.apache.org/jira/browse/SPARK-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248941#comment-14248941 ]
Sean Owen commented on SPARK-1148:
----------------------------------

[~d.yuan] I don't know that these have been addressed; some issues like it have been, if I recall correctly. Whatever the status, I think it's useful to focus on master and/or later branches. Would it be OK to close this in favor of SPARK-4863?

> Suggestions for exception handling (avoid potential bugs)
> ----------------------------------------------------------
>
>                 Key: SPARK-1148
>                 URL: https://issues.apache.org/jira/browse/SPARK-1148
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 0.9.0
>            Reporter: Ding Yuan
>
> Hi Spark developers,
> We are a group of researchers working on software reliability. Recently we did a study and found that the majority of the most severe failures in data-analytic systems are caused by bugs in exception handlers; it is hard to anticipate all the possible real-world error scenarios. We therefore built a simple checking tool that automatically detects bug patterns that have caused very severe real-world failures. I am reporting a few cases here.
> Any feedback is much appreciated!
> Ding
> =========================
> Case 1:
> Line: 1249, File: "org/apache/spark/SparkContext.scala"
> {noformat}
> 1244: val scheduler = try {
> 1245:   val clazz = Class.forName("org.apache.spark.scheduler.cluster.YarnClusterScheduler")
> 1246:   val cons = clazz.getConstructor(classOf[SparkContext])
> 1247:   cons.newInstance(sc).asInstanceOf[TaskSchedulerImpl]
> 1248: } catch {
> 1249:   // TODO: Enumerate the exact reasons why it can fail
> 1250:   // But irrespective of it, it means we cannot proceed !
> 1251:   case th: Throwable => {
> 1252:     throw new SparkException("YARN mode not available ?", th)
> 1253:   }
> {noformat}
> The comment suggests the specific exceptions should be enumerated here.
> The try block could throw the following exceptions:
> ClassNotFoundException
> NegativeArraySizeException
> NoSuchMethodException
> SecurityException
> InstantiationException
> IllegalAccessException
> IllegalArgumentException
> InvocationTargetException
> ClassCastException
> ==========================================
> =========================
> Case 2:
> Line: 282, File: "org/apache/spark/executor/Executor.scala"
> {noformat}
> 265: case t: Throwable => {
> 266:   val serviceTime = (System.currentTimeMillis() - taskStart).toInt
> 267:   val metrics = attemptedTask.flatMap(t => t.metrics)
> 268:   for (m <- metrics) {
> 269:     m.executorRunTime = serviceTime
> 270:     m.jvmGCTime = gcTime - startGCTime
> 271:   }
> 272:   val reason = ExceptionFailure(t.getClass.getName, t.toString, t.getStackTrace, metrics)
> 273:   execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))
> 274:
> 275:   // TODO: Should we exit the whole executor here? On the one hand, the failed task may
> 276:   // have left some weird state around depending on when the exception was thrown, but on
> 277:   // the other hand, maybe we could detect that when future tasks fail and exit then.
> 278:   logError("Exception in task ID " + taskId, t)
> 279:   //System.exit(1)
> 280: }
> 281: } finally {
> 282:   // TODO: Unregister shuffle memory only for ResultTask
> 283:   val shuffleMemoryMap = env.shuffleMemoryMap
> 284:   shuffleMemoryMap.synchronized {
> 285:     shuffleMemoryMap.remove(Thread.currentThread().getId)
> 286:   }
> 287:   runningTasks.remove(taskId)
> 288: }
> {noformat}
> The comment in this Throwable handler suggests the executor should perhaps just exit here, but the decision is left open.
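The narrowing suggested in Case 1 can be illustrated with a minimal, self-contained Java sketch (the class and helper names here are hypothetical, not Spark's actual code): enumerate the reflection exceptions in a multi-catch instead of catching Throwable, which would also trap errors such as OutOfMemoryError that the handler cannot meaningfully recover from.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;

public class ReflectiveLoad {
    // Hypothetical helper mirroring the pattern in Case 1: load a class
    // by name and call its no-arg constructor, enumerating the specific
    // reflection exceptions instead of catching Throwable.
    static Object instantiate(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            Constructor<?> cons = clazz.getConstructor();
            return cons.newInstance();
        } catch (ClassNotFoundException | NoSuchMethodException
                | InstantiationException | IllegalAccessException
                | InvocationTargetException | SecurityException
                | IllegalArgumentException e) {
            // Wrap with context, as the original wraps in SparkException.
            throw new RuntimeException("Class not available: " + className, e);
        }
    }

    public static void main(String[] args) {
        // Succeeds: the class exists and has a no-arg constructor.
        System.out.println(instantiate("java.lang.StringBuilder").getClass().getName());
        // Fails with a wrapped ClassNotFoundException.
        try {
            instantiate("no.such.Clazz");
        } catch (RuntimeException e) {
            System.out.println(e.getCause().getClass().getSimpleName());
        }
    }
}
```

Note that the multi-catch syntax requires Java 7; on Java 6 (which Spark 0.9 still supported) each exception would need its own catch clause, but the point is the same: name the failure modes the try block can actually produce.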
> ==========================================
> =========================
> Case 3:
> Line: 70, File: "org/apache/spark/network/netty/FileServerHandler.java"
> {noformat}
> 66: try {
> 67:   ctx.write(new DefaultFileRegion(new FileInputStream(file)
> 68:     .getChannel(), fileSegment.offset(), fileSegment.length()));
> 69: } catch (Exception e) {
> 70:   LOG.error("Exception: ", e);
> 71: }
> {noformat}
> "Exception" is too general. The try block only throws "FileNotFoundException". Although there is nothing wrong with this now, if the code evolves this catch might silently swallow other exceptions.
> ==========================================

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org