Re: Best practices to handle corrupted records
Either[FailureResult[T], Either[SuccessWithWarnings[T], SuccessResult[T]]] maybe?

On Thu, Oct 15, 2015 at 5:31 PM, Antonio Murgia <antonio.murg...@studio.unibo.it> wrote:

> 'Either' does not cover the case where the outcome was successful but
> generated warnings. I already looked into it, and also at 'Try', which
> inspired my design. Thanks for pointing it out anyway!
>
> #A.M.
>
> On 15 Oct 2015, at 16:19, Erwan ALLAIN <eallain.po...@gmail.com> wrote:
>
> What about http://www.scala-lang.org/api/2.9.3/scala/Either.html ?
>
> On Thu, Oct 15, 2015 at 2:57 PM, Roberto Congiu wrote:
>
>> I came to a similar solution to a similar problem. I deal with a lot of
>> CSV files from many different sources and they are often malformed.
>> However, I just have success/failure. Maybe you should make
>> SuccessWithWarnings a subclass of Success, or get rid of it altogether
>> and make the warnings optional.
>> I was thinking of making this cleaning/conforming library open source if
>> you're interested.
>>
>> R.
>>
>> 2015-10-15 5:28 GMT-07:00 Antonio Murgia:
>>
>>> Hello,
>>> I looked around on the web and I couldn’t find any structured way to
>>> deal with malformed/faulty records during computation. All I was able
>>> to find was the flatMap/Some/None technique plus logging.
>>> I’m facing this problem because I have a processing algorithm that
>>> extracts more than one value from each record, but can fail to extract
>>> one of those values, and I want to keep track of such failures. Logging
>>> is not feasible because this “warning” happens so frequently that the
>>> logs would become overwhelming and impossible to read.
>>> Since my processing has 3 different possible outcomes, I modeled it
>>> with this class hierarchy:
>>> [image: class hierarchy]
>>> It holds the result and/or the warnings.
>>> Since Result implements Traversable it can be used in a flatMap,
>>> discarding all warnings and failure results. On the other hand, if we
>>> want to keep track of warnings, we can process them and output them
>>> if we need to.
>>>
>>> Kind Regards
>>> #A.M.
>>
>> --
>> "Good judgment comes from experience.
>> Experience comes from bad judgment"
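The nested Either suggested at the top of this reply could be sketched roughly as follows. This is only an illustration under assumptions: the three type names come from the reply, but their fields, the `Parsed` alias, the `parse` function and the comma-separated record format are hypothetical and not part of the thread.

```scala
import scala.util.Try

object NestedEitherSketch {
  // Names follow the reply above; the fields are assumptions for illustration.
  final case class FailureResult[T](error: String)
  final case class SuccessWithWarnings[T](value: T, warnings: List[String])
  final case class SuccessResult[T](value: T)

  // Failure on the outer Left; the two success flavours nested on the Right.
  type Outcome[T] =
    Either[FailureResult[T], Either[SuccessWithWarnings[T], SuccessResult[T]]]

  // Hypothetical record: "first,second". The first field is mandatory,
  // the second is optional and only produces a warning when unreadable.
  type Parsed = (Int, Option[Int])

  def parse(line: String): Outcome[Parsed] = {
    val fields = line.split(",", -1).toList.map(_.trim)
    val first  = fields.headOption.flatMap(f => Try(f.toInt).toOption)
    val second = fields.lift(1).flatMap(f => Try(f.toInt).toOption)
    (first, second) match {
      case (None, _) =>
        Left(FailureResult[Parsed](s"unparseable record: $line"))
      case (Some(a), None) =>
        Right(Left(SuccessWithWarnings[Parsed]((a, None), List("second field missing or invalid"))))
      case (Some(a), Some(b)) =>
        Right(Right(SuccessResult[Parsed]((a, Some(b)))))
    }
  }

  // Consumers have to unwrap two levels to tell the three outcomes apart.
  def describe[T](outcome: Outcome[T]): String = outcome match {
    case Left(FailureResult(err))                => s"failure: $err"
    case Right(Left(SuccessWithWarnings(v, ws))) => s"ok with ${ws.size} warning(s): $v"
    case Right(Right(SuccessResult(v)))          => s"ok: $v"
  }
}
```

The type works, but every producer and consumer pays for the double wrapping, which may be one reason a dedicated three-case hierarchy reads better.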
Re: Best practices to handle corrupted records
+1 Erwan.

Maybe a trivial solution like this:

    // Record is not defined in this message; the shape below is an assumption.
    class Record(val key: String, val val1: String, val val2: String)

    class Result(msg: String, record: Record)
    class Success(msgSuccess: String, val msg: String, val record: Record)
      extends Result(msg, record)
    class Failure(msgFailure: String, val msg: String, val record: Record)
      extends Result(msg, record)
    trait Warning {}
    class SuccessWithWarning(msgWarning: String, val msgSuccess: String,
                             override val msg: String, override val record: Record)
      extends Success(msgSuccess, msg, record) with Warning

    val record1 = new Record("k1", "val11", "val21")
    val record2 = new Record("k2", "val12", "val22")
    val record3 = new Record("k3", "val13", "val23")
    val record4 = new Record("k4", "val14", "val24")
    val records: List[Record] = List(record1, record2, record3, record4)

    // "k1" is treated as a failed record; everything else goes to successHandler.
    def processRecord(record: Record): Either[Result, Result] =
      if (record.key == "k1") Left(new Failure("failed", "result", record))
      else successHandler(record)

    // Note: a plain Success is returned on the Left, like a Failure, so callers
    // must inspect the runtime class to tell the two apart.
    def successHandler(record: Record): Either[Result, Result] =
      if (record.key == "k2") Left(new Success("success", "result", record))
      else Right(new SuccessWithWarning("warning", "success", "result", record))

    for (record <- records) {
      println(processRecord(record))
    }
Re: Best practices to handle corrupted records
Unfortunately, Either accepts only two type parameters, not three, so the Either solution is not viable. My solution is pretty similar to Ravindra's. The point of this post was to find out whether there is a common, established solution to this problem in the Spark world.
Re: Best practices to handle corrupted records
I came to a similar solution to a similar problem. I deal with a lot of CSV files from many different sources and they are often malformed. However, I just have success/failure. Maybe you should make SuccessWithWarnings a subclass of Success, or get rid of it altogether and make the warnings optional.

I was thinking of making this cleaning/conforming library open source if you're interested.

R.

--
"Good judgment comes from experience.
Experience comes from bad judgment"
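The "just success/failure, with optional warnings" idea from this message might look something like the sketch below. It is a guess at the shape, not the actual library mentioned above: `Person`, `clean`, the field names and the "name,age" row layout are all hypothetical.

```scala
import scala.util.Try

object OptionalWarningsSketch {
  // Two outcomes only; warnings ride along on Success and default to none.
  sealed trait Result[+T]
  final case class Success[+T](value: T, warnings: List[String] = Nil) extends Result[T]
  final case class Failure(reason: String) extends Result[Nothing]

  // Hypothetical cleaned record for a "name,age" CSV row.
  final case class Person(name: String, age: Option[Int])

  // A missing name makes the row unusable (Failure);
  // an unreadable age is tolerated and reported as a warning.
  def clean(row: String): Result[Person] = {
    val cols = row.split(",", -1).toList.map(_.trim)
    cols match {
      case name :: rest if name.nonEmpty =>
        val rawAge = rest.headOption.getOrElse("")
        Try(rawAge.toInt).toOption match {
          case Some(age) => Success(Person(name, Some(age)))
          case None      => Success(Person(name, None), List(s"invalid age '$rawAge'"))
        }
      case _ =>
        Failure(s"missing name in row: '$row'")
    }
  }
}
```

With this shape a caller that does not care about warnings can ignore the second field of `Success`, and there is no third case to match on.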
Re: Best practices to handle corrupted records
'Either' does not cover the case where the outcome was successful but generated warnings. I already looked into it, and also at 'Try', which inspired my design. Thanks for pointing it out anyway!

#A.M.
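The class hierarchy from the original post was attached as an image and is not available, so the following is only a guess at what a Try-inspired, three-outcome Result could look like. All names, fields and the `extract` function are assumptions. The original design implemented Traversable, which is a deprecated alias since Scala 2.13; this sketch swaps in a `toOption` method that gives the same flatMap behaviour.

```scala
import scala.util.Try

object ResultSketch {
  sealed trait Result[+T] {
    // Keeps the value for both success flavours, drops failures,
    // so a Result can be fed to flatMap the way an Option can.
    def toOption: Option[T] = this match {
      case Success(v)                => Some(v)
      case SuccessWithWarnings(v, _) => Some(v)
      case Failure(_)                => None
    }
    // Warnings are empty unless the outcome is SuccessWithWarnings.
    def warnings: List[String] = this match {
      case SuccessWithWarnings(_, msgs) => msgs
      case _                            => Nil
    }
  }
  final case class Success[+T](value: T) extends Result[T]
  final case class SuccessWithWarnings[+T](value: T, messages: List[String]) extends Result[T]
  final case class Failure(error: String) extends Result[Nothing]

  // Hypothetical extraction: pull every integer token out of a record.
  // Unreadable tokens become warnings; a record with no integers fails.
  def extract(record: String): Result[List[Int]] = {
    val tokens      = record.split("\\s+").toList.filter(_.nonEmpty)
    val (good, bad) = tokens.partition(t => Try(t.toInt).isSuccess)
    if (good.isEmpty) Failure(s"no usable values in '$record'")
    else if (bad.isEmpty) Success(good.map(_.toInt))
    else SuccessWithWarnings(good.map(_.toInt), bad.map(t => s"skipped '$t'"))
  }

  val results: List[Result[List[Int]]] = List("1 2 3", "4 x 6", "oops").map(extract)

  // Discards failures and warnings, keeping only the extracted values.
  val allValues: List[List[Int]] = results.flatMap(_.toOption)

  // Collects the warnings separately instead of logging each one.
  val allWarnings: List[String] = results.flatMap(_.warnings)
}
```

The same two `flatMap` calls should carry over to a Spark RDD of results, since the functions passed to them only return an Option or a List.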
Re: Best practices to handle corrupted records
What about http://www.scala-lang.org/api/2.9.3/scala/Either.html ?