Re: Best practices to handle corrupted records

2015-10-16 Thread Erwan ALLAIN
Either[FailureResult[T], Either[SuccessWithWarnings[T], SuccessResult[T]]]
maybe ?
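
For illustration, a minimal sketch of how the nested Either above could be used. The three case classes, their fields, and the "name,age" line format are assumptions made for this example; the thread only gives the type names.

```scala
import scala.util.Try

object NestedEitherSketch {
  // Hypothetical result types: the thread names them but never shows their fields.
  final case class FailureResult[T](error: String)
  final case class SuccessWithWarnings[T](value: T, warnings: List[String])
  final case class SuccessResult[T](value: T)

  // Left = failure; Right = a second Either that separates the two kinds of success.
  type Outcome[T] =
    Either[FailureResult[T], Either[SuccessWithWarnings[T], SuccessResult[T]]]

  // Hypothetical parser for lines of the form "name,age".
  def parse(line: String): Outcome[(String, Option[Int])] =
    line.split(",", -1).map(_.trim) match {
      case Array(name, age) if name.nonEmpty =>
        Try(age.toInt).toOption match {
          case Some(n) =>
            Right(Right(SuccessResult((name, Some(n)))))
          case None =>
            // The record is still usable, so keep it and attach a warning.
            Right(Left(SuccessWithWarnings((name, None), List("age is not a number: " + age))))
        }
      case _ =>
        Left(FailureResult("malformed line: " + line))
    }

  // All three outcomes can be told apart with one pattern match.
  def describe[T](o: Outcome[T]): String = o match {
    case Left(FailureResult(e))                 => "failure: " + e
    case Right(Left(SuccessWithWarnings(v, w))) => "warning: " + v + " " + w.mkString("; ")
    case Right(Right(SuccessResult(v)))         => "ok: " + v
  }
}
```

The nesting keeps everything inside the standard library, at the cost of a double wrapper (Right(Right(...))) at every construction site.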


On Thu, Oct 15, 2015 at 5:31 PM, Antonio Murgia <
antonio.murg...@studio.unibo.it> wrote:

> 'Either' does not cover the case where the outcome was successful but
> generated warnings. I already looked into it and also at 'Try' from which I
> got inspired. Thanks for pointing it out anyway!
>
> #A.M.
>
> On 15 Oct 2015, at 16:19, Erwan ALLAIN <
> eallain.po...@gmail.com> wrote:
>
> What about http://www.scala-lang.org/api/2.9.3/scala/Either.html ?
>
>
> On Thu, Oct 15, 2015 at 2:57 PM, Roberto Congiu 
> wrote:
>
>> I came to a similar solution to a similar problem. I deal with a lot of
>> CSV files from many different sources and they are often malformed.
>> However, I just have success/failure. Maybe you should make
>> SuccessWithWarnings a subclass of success, or get rid of it altogether
>> and make the warnings optional.
>> I was thinking of making this cleaning/conforming library open source if
>> you're interested.
>>
>> R.
>>
>> 2015-10-15 5:28 GMT-07:00 Antonio Murgia:
>>
>>> Hello,
>>> I looked around on the web and couldn’t find any structured way to deal
>>> with malformed/faulty records during computation. All I was able to find
>>> was the flatMap/Some/None technique plus logging.
>>> I’m facing this problem because I have a processing algorithm that
>>> extracts more than one value from each record, but it can fail to extract
>>> one of those values, and I want to keep track of such failures. Logging is
>>> not feasible because this “warning” happens so frequently that the logs
>>> would become overwhelming and impossible to read.
>>> Since my processing has 3 different possible outcomes, I modeled it with
>>> a class hierarchy (attached as an image in the original message) that
>>> holds the result and/or the warnings.
>>> Since Result implements Traversable, it can be used in a flatMap,
>>> discarding all warnings and failure results. On the other hand, if we want
>>> to keep track of warnings, we can process them and output them if needed.
>>>
>>> Kind Regards
>>> #A.M.


Re: Best practices to handle corrupted records

2015-10-16 Thread Ravindra
+1 Erwan..

Maybe a trivial solution like this:

    // Record was not defined in the original snippet; this shape is assumed
    // from how it is constructed and used below.
    case class Record(key: String, val1: String, val2: String)

    class Result(val msg: String, val record: Record)

    class Success(val msgSuccess: String, msg: String, record: Record)
      extends Result(msg, record)

    class Failure(val msgFailure: String, msg: String, record: Record)
      extends Result(msg, record)

    // Marker trait for outcomes that carry a warning.
    trait Warning

    class SuccessWithWarning(val msgWarning: String, msgSuccess: String,
                             msg: String, record: Record)
      extends Success(msgSuccess, msg, record) with Warning

    val record1 = new Record("k1", "val11", "val21")
    val record2 = new Record("k2", "val12", "val22")
    val record3 = new Record("k3", "val13", "val23")
    val record4 = new Record("k4", "val14", "val24")
    val records: List[Record] = List(record1, record2, record3, record4)

    // Left carries a Failure; Right carries a Success, possibly with a warning.
    def processRecord(record: Record): Either[Result, Result] =
      if (record.key == "k1")
        Left(new Failure("failed", "result", record))
      else
        successHandler(record)

    // Both success cases go on the Right, so that Left always means failure.
    def successHandler(record: Record): Either[Result, Result] =
      if (record.key == "k2")
        Right(new Success("success", "result", record))
      else
        Right(new SuccessWithWarning("warning", "success", "result", record))

    for (record <- records) {
      println(processRecord(record))
    }
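
A possible way to consume such results, sketched under assumptions: the class shapes are simplified relative to the snippet above, the Either is typed as Either[Failure, Success] rather than Either[Result, Result], and the rules in process (empty key fails, empty value warns) are invented for the example.

```scala
object EitherPartitionSketch {
  // Simplified, hypothetical versions of the classes discussed in this thread.
  final case class Record(key: String, value: String)

  sealed trait Result { def record: Record }
  final case class Failure(reason: String, record: Record) extends Result
  class Success(val record: Record) extends Result
  final class SuccessWithWarning(record: Record, val warning: String)
    extends Success(record)

  // Invented rules: an empty key is a failure, an empty value only a warning.
  def process(r: Record): Either[Failure, Success] =
    if (r.key.isEmpty) Left(Failure("empty key", r))
    else if (r.value.isEmpty) Right(new SuccessWithWarning(r, "empty value"))
    else Right(new Success(r))

  // Returns (number of failures, number of successes, successes with a warning).
  def summarize(records: List[Record]): (Int, Int, Int) = {
    val results   = records.map(process)
    val failures  = results.collect { case Left(f) => f }
    val successes = results.collect { case Right(s) => s }
    val warned    = successes.collect { case w: SuccessWithWarning => w }
    (failures.size, successes.size, warned.size)
  }
}
```

Because SuccessWithWarning is a subclass of Success, code that only cares about usable records can ignore the distinction, while code that reports warnings can pick them out with a type pattern.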




Re: Best practices to handle corrupted records

2015-10-16 Thread Antonio Murgia
Unfortunately Either accepts only 2 type parameters, not 3, so the Either
solution is not viable.
My solution is pretty similar to Ravindra’s. The point of this post was to find
out whether there is a common, established solution to this problem in the Spark
“world”.
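
The class hierarchy from the original post was an attachment and did not survive in the archive, so the following is only a guess at its shape: a three-case Result whose values method plays the role that implementing Traversable plays in the post. All names, fields, and the comma-separated integer format are assumptions.

```scala
import scala.util.Try

object ResultSketch {
  // Hypothetical reconstruction, not the poster's actual classes.
  sealed trait Result[+T] {
    // Both success cases yield their value; a failure yields nothing.
    def values: List[T] = this match {
      case SuccessResult(v)          => List(v)
      case SuccessWithWarnings(v, _) => List(v)
      case FailureResult(_)          => Nil
    }
    // Only the middle case carries warnings.
    def warnings: List[String] = this match {
      case SuccessWithWarnings(_, w) => w
      case _                         => Nil
    }
  }
  final case class SuccessResult[T](value: T) extends Result[T]
  final case class SuccessWithWarnings[T](value: T, warns: List[String]) extends Result[T]
  final case class FailureResult(error: String) extends Result[Nothing]

  // Extracts several integers from one line; unparsable fields become warnings.
  def extract(line: String): Result[List[Int]] = {
    val fields = line.split(",", -1).toList.map(_.trim)
    val parsed = fields.map(f => Try(f.toInt).toOption)
    val good   = parsed.flatten
    val bad    = fields.zip(parsed).collect { case (f, None) => "not a number: '" + f + "'" }
    if (good.isEmpty) FailureResult("no usable field in: " + line)
    else if (bad.isEmpty) SuccessResult(good)
    else SuccessWithWarnings(good, bad)
  }

  // flatMap keeps the usable values; a second pass collects the warnings.
  def run(lines: List[String]): (List[List[Int]], List[String]) = {
    val results = lines.map(extract)
    (results.flatMap(_.values), results.flatMap(_.warnings))
  }
}
```

This runs on plain Scala lists. The same flatMap(_.values) shape should carry over to a Spark RDD, since its flatMap accepts a function returning a collection, but that has not been run here.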

Re: Best practices to handle corrupted records

2015-10-15 Thread Roberto Congiu
I came to a similar solution to a similar problem. I deal with a lot of CSV
files from many different sources and they are often malformed.
However, I just have success/failure. Maybe you should make
SuccessWithWarnings a subclass of success, or get rid of it altogether
and make the warnings optional.
I was thinking of making this cleaning/conforming library open source if
you're interested.

R.
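
A minimal sketch of the "warnings optional" variant suggested above, with only two outcomes and a warnings list that defaults to empty. The type names and the "id,amount" row format are assumptions for the example, not taken from the library mentioned.

```scala
import scala.util.Try

object OptionalWarningsSketch {
  // Two outcomes only; a Success may or may not carry warnings.
  sealed trait Outcome[+T]
  final case class Success[T](value: T, warnings: List[String] = Nil) extends Outcome[T]
  final case class Failure(reason: String) extends Outcome[Nothing]

  // Hypothetical CSV row "id,amount": a bad amount is tolerated, a bad shape is not.
  def parseRow(row: String): Outcome[(String, Double)] =
    row.split(",", -1).map(_.trim) match {
      case Array(id, amount) if id.nonEmpty =>
        Try(amount.toDouble).toOption match {
          case Some(a) => Success((id, a))
          case None    => Success((id, 0.0), List("amount defaulted to 0.0 for id " + id))
        }
      case _ =>
        Failure("malformed row: " + row)
    }
}
```

With this shape a clean success is just a Success whose warnings list is empty, so callers that do not care about warnings never have to mention them.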




-- 
--
"Good judgment comes from experience.
Experience comes from bad judgment"
--


Re: Best practices to handle corrupted records

2015-10-15 Thread Antonio Murgia
'Either' does not cover the case where the outcome was successful but generated 
warnings. I already looked into it and also at 'Try' from which I got inspired. 
Thanks for pointing it out anyway!

#A.M.



Re: Best practices to handle corrupted records

2015-10-15 Thread Erwan ALLAIN
What about http://www.scala-lang.org/api/2.9.3/scala/Either.html ?

