We already have SparkException, indeed. The ID is an interesting idea; simple to implement and might help disambiguate.
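The ID part of the idea really is simple: an AtomicLong counter stamped onto each instance and appended to the message. A minimal sketch of how that could look (hypothetical code, not what Spark's actual SparkException does today; the log-in-constructor part of the proposal is omitted to keep it self-contained):

```scala
import java.util.concurrent.atomic.AtomicLong

// Sketch only: a root exception that tags each instance with a
// monotonically increasing ID and appends it to the message, so that
// repeated log lines for the same failure can be correlated.
object ExceptionId {
  private val counter = new AtomicLong(0L)
  def next(): Long = counter.incrementAndGet()
}

class SparkException(message: String, cause: Throwable = null)
    extends Exception(message, cause) {

  val id: Long = ExceptionId.next()

  // Put the ID at the end of the error message, as proposed.
  override def getMessage: String = s"$message [exceptionId=$id]"
}

// Wrapping a 3rd-party exception so only SparkException propagates:
// try { ... } catch {
//   case e: java.io.IOException =>
//     throw new SparkException("shuffle read failed", e)
// }
```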
Does it solve a lot of problems of this form? If something is squelching Exception or SparkException, the result will be the same. #2 is something we can sniff out with static analysis pretty easily, but not so much #1. Ideally we'd just fix blocks like this, but I bet there are lots of them.

I like the idea for a different reason, though: it's probably best to control the exceptions that propagate from the public API, since in some cases they're a meaningful part of the API (see https://issues.apache.org/jira/browse/SPARK-8393, which I'm hoping to fix now). And the catch there is that throwing checked exceptions from Scala code in a way that Java code can catch requires annotating lots of methods.

On Mon, Apr 18, 2016 at 8:16 PM, Reynold Xin <r...@databricks.com> wrote:

> Josh's pull request on rpc exception handling got me to think ...
>
> In my experience, there have been a few things related to exceptions that
> created a lot of trouble for us in production debugging:
>
> 1. Some exception is thrown, but is caught by some try/catch that does
> not do any logging nor rethrow.
> 2. Some exception is thrown, but is caught by some try/catch that does
> not do any logging, but does rethrow. However, the original exception is
> now masked.
> 3. Multiple exceptions are logged at different places close to each
> other, but we don't know whether they are caused by the same problem or
> not.
>
> To mitigate some of the above, here's an idea ...
>
> (1) Create a common root class for all the exceptions used in Spark
> (e.g. call it SparkException). We should make sure that every time we
> catch an exception from a 3rd-party library, we rethrow it as a
> SparkException (a lot of places already do that). In SparkException's
> constructor, log the exception and the stack trace.
>
> (2) SparkException has a monotonically increasing ID, and this ID
> appears in the exception error message (say, at the end).
>
> I think (1) will eliminate most of the cases in which an exception gets
> swallowed.
> The main downside I can think of is that we might log an exception
> multiple times. However, I'd argue exceptions should be rare, and it is
> not that big of a deal to log them twice or three times. The unique ID
> (2) can help us correlate exceptions if they do appear multiple times.
>
> Thoughts?
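On the checked-exception interop point above: Scala itself has no checked exceptions, so a Java caller can only catch a Scala-thrown exception as a *checked* exception if the Scala method declares it with the `@throws` annotation, and that annotation would have to be added method by method across the public API. An illustrative sketch (class and method names are made up):

```scala
import java.io.IOException

// Without @throws, Java sees load() as declaring no checked exceptions.
// With it, the generated bytecode carries a "throws IOException" clause,
// so Java callers can (and must) catch it.
class Repository {
  @throws[IOException]("if the underlying read fails")
  def load(path: String): String = {
    if (path.isEmpty) throw new IOException("empty path")
    s"contents of $path"
  }
}

// Java caller:
//   try { repo.load("data.txt"); }
//   catch (IOException e) { /* handle */ }
```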