Antonio Murgia created SPARK-11350:
--------------------------------------

             Summary: There is no best practice to handle warnings or messages produced by Executors in a distributed manner
                 Key: SPARK-11350
                 URL: https://issues.apache.org/jira/browse/SPARK-11350
             Project: Spark
          Issue Type: Wish
          Components: Spark Core
            Reporter: Antonio Murgia

I looked around on the web and couldn't find any way to deal, in a distributed manner, with malformed or faulty records during computation. All I was able to find was the flatMap/Some/None technique plus logging.

I'm facing this problem because I have a processing algorithm that extracts more than one value from each record but can fail to extract any one of those values, and I want to keep track of those failures. Logging is not feasible because this "warning" happens so frequently that the logs would become overwhelming and impossible to read.

Since my processing has 3 possible outcomes, I modeled it with this class hierarchy, which holds a result and/or warnings:

http://i.imgur.com/NIesYUm.png?1

Since Result implements Traversable, it can be used in a flatMap, discarding all warnings and failure results. On the other hand, if we want to keep track of the warnings, we can process them and output them as needed.
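
A minimal sketch of a result type with three outcomes, for illustration only. The linked diagram is an image, so every name here (`Complete`, `Partial`, `Failed`, `Person`, `parse`, `tally`, the warning codes) is hypothetical and not taken from the original hierarchy. To stay independent of the Scala collections version, this sketch exposes `toOption` instead of implementing Traversable, and it runs on plain Scala collections rather than an RDD.

```scala
import scala.util.Try

object ResultSketch {
  // Three possible outcomes of processing one record (hypothetical names).
  sealed trait Result[+A] {
    def warnings: List[String]
    def toOption: Option[A]
  }

  // Every value was extracted.
  final case class Complete[+A](value: A) extends Result[A] {
    val warnings: List[String] = Nil
    def toOption: Option[A] = Some(value)
  }

  // A usable value was produced, but some fields could not be extracted.
  final case class Partial[+A](value: A, warnings: List[String]) extends Result[A] {
    def toOption: Option[A] = Some(value)
  }

  // Nothing usable could be extracted.
  final case class Failed(warnings: List[String]) extends Result[Nothing] {
    def toOption: Option[Nothing] = None
  }

  // Hypothetical record type: two values are extracted from each input line.
  final case class Person(name: String, age: Option[Int])

  // Parses "name,age"; a bad age yields a warning, a bad line yields a failure.
  def parse(line: String): Result[Person] =
    line.split(",", -1).map(_.trim) match {
      case Array(name, rawAge) if name.nonEmpty =>
        Try(rawAge.toInt).toOption match {
          case Some(age) => Complete(Person(name, Some(age)))
          case None      => Partial(Person(name, None), List("age-unparseable"))
        }
      case _ => Failed(List("malformed-record"))
    }

  // Counts warnings per code so they can be reported in aggregate instead of logged one by one.
  def tally(results: Seq[Result[Any]]): Map[String, Int] =
    results.flatMap(_.warnings).groupBy(identity).map { case (code, hits) => code -> hits.size }

  def main(args: Array[String]): Unit = {
    val results = Seq("ann,31", "bob,unknown", "garbage").map(parse)
    // flatMap keeps only usable values, dropping failures and warnings.
    println(results.flatMap(_.toOption))
    // The same results can also be summarised by warning code.
    println(tally(results))
  }
}
```

On an RDD the same two steps would presumably be `rdd.map(parse)` followed by `flatMap(_.toOption)` for the values, and `flatMap(_.warnings).countByValue()` for the per-code counts. That is an assumption about how the sketch would be wired into Spark, not something the issue specifies.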