How to identify erroneous input record ?

2014-12-24 Thread Sanjay Subramanian
Hey guys, one of my input records has a problem that makes the code fail. var demoRddFilter = demoRdd.filter(line => !line.contains("ISR$CASE$I_F_COD$FOLL_SEQ") || !line.contains("primaryid$caseid$caseversion")) var demoRddFilterMap = demoRddFilter.map(line => line.split('$')(0) + "~" +

Re: How to identify erroneous input record ?

2014-12-24 Thread Sanjay Subramanian
: How to identify erroneous input record ? Hey guys, one of my input records has a problem that makes the code fail. var demoRddFilter = demoRdd.filter(line => !line.contains("ISR$CASE$I_F_COD$FOLL_SEQ") || !line.contains("primaryid$caseid$caseversion")) var demoRddFilterMap = demoRddFilter.map(line

Re: How to identify erroneous input record ?

2014-12-24 Thread Sean Owen
('$')(5) + "~" + line.split('$')(11) + "~" + line.split('$')(12) } }) From: Sanjay Subramanian sanjaysubraman...@yahoo.com.INVALID To: user@spark.apache.org user@spark.apache.org Sent: Wednesday, December 24, 2014 8:28 AM Subject: How to identify erroneous

Re: How to identify erroneous input record ?

2014-12-24 Thread Sanjay Subramanian
AM Subject: Re: How to identify erroneous input record ? I don't believe that works since your map function does not return a value for lines shorter than 13 tokens. You should use flatMap and Some/None. (You probably want to not parse the string 5 times too.) val demoRddFilterMap
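
For reference, the flatMap and Some/None suggestion quoted above could look roughly like the sketch below. This is not the code from the thread, which is cut off in this listing. The object name ParseDemo, the helper parseLine, and the sample lines are hypothetical, and only the field indices visible in the snippets (0, 5, 11, 12) are used. The sketch runs on a plain Scala Seq so it needs no Spark setup; the same parseLine function could presumably be passed to an RDD's flatMap and filter in the same way.

```scala
// Minimal sketch, assuming records are '$'-delimited and need at least 13 tokens.
// ParseDemo and parseLine are hypothetical names, not taken from the thread.
object ParseDemo {
  val MinTokens = 13

  // Split the line once; return None for records that are too short,
  // so that flatMap drops them instead of the job failing on a bad index.
  def parseLine(line: String): Option[String] = {
    val tokens = line.split('$')
    if (tokens.length >= MinTokens)
      Some(Seq(tokens(0), tokens(5), tokens(11), tokens(12)).mkString("~"))
    else
      None
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample input: one well-formed record and one short record.
    val lines = Seq(
      "a$b$c$d$e$f$g$h$i$j$k$l$m",
      "too$short"
    )

    // Keep only the records that parsed.
    val parsed = lines.flatMap(l => parseLine(l))

    // Collect the records that did not parse, to see which input is erroneous.
    val rejected = lines.filter(l => parseLine(l).isEmpty)

    parsed.foreach(println)
    rejected.foreach(l => println("bad record: " + l))
  }
}
```

Listing the rejected lines separately is one way to find the erroneous record; on a large input it is probably better to sample or count them than to print them all.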