hey guys
One of my input records has a problem that makes the code fail.
var demoRddFilter = demoRdd.filter(line =>
  !line.contains("ISR$CASE$I_F_COD$FOLL_SEQ") ||
  !line.contains("primaryid$caseid$caseversion"))
var demoRddFilterMap = demoRddFilter.map(line => line.split('$')(0) + "~" +
var demoRddFilterMap = demoRddFilter.map(line
('$')(5) + "~" +
line.split('$')(11) + "~" + line.split('$')(12)
}
})
From: Sanjay Subramanian sanjaysubraman...@yahoo.com.INVALID
To: user@spark.apache.org user@spark.apache.org
Sent: Wednesday, December 24, 2014 8:28 AM
Subject: How to identify erroneous input record ?
Subject: Re: How to identify erroneous input record ?
I don't believe that works since your map function does not return a
value for lines shorter than 13 tokens. You should use flatMap and
Some/None. (You probably want to not parse the string 5 times too.)
val demoRddFilterMap
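For illustration, here is a minimal sketch of the flatMap plus Some/None idea, splitting each line only once. It is not the code from the reply. The name parseLine and the sample lines are made up, a plain Seq stands in for the RDD so the snippet runs without Spark, and only the field indices visible in the truncated snippet (0, 5, 11, 12) are used; the original selects more fields than that.

```scala
// Hypothetical helper: parse one '$'-delimited line, splitting it only once.
// Lines with fewer than 13 fields yield None instead of throwing an
// ArrayIndexOutOfBoundsException.
def parseLine(line: String): Option[String] = {
  val fields = line.split('$')
  if (fields.length >= 13)
    // Only the indices visible in the thread; the full field list is cut off.
    Some(Seq(fields(0), fields(5), fields(11), fields(12)).mkString("~"))
  else
    None
}

// Made-up sample input: one well-formed line and one short line.
val lines = Seq(
  "a0$a1$a2$a3$a4$a5$a6$a7$a8$a9$a10$a11$a12",
  "too$short"
)

// flatMap drops the None results, so only well-formed lines survive.
val parsed = lines.flatMap(line => parseLine(line))

// The lines that failed to parse are the records to inspect.
val rejected = lines.filter(line => parseLine(line).isEmpty)
```

On the RDD itself the same function should slot in as demoRddFilter.flatMap(line => parseLine(line)), and something like demoRddFilter.filter(line => parseLine(line).isEmpty).take(10) would pull a few of the offending records back to the driver for a look.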