RDD Manipulation in Scala.

2014-03-04 Thread trottdw
Hello, I am using Spark with Scala and I am attempting to understand the
different filtering and mapping capabilities available.  I haven't found an
example of the specific task I would like to do.

I am trying to read in a tab-separated text file and filter specific entries.
I would like this filter to be applied to individual columns rather than to
whole lines. I was using the following to split the data, but my attempts to
filter by column afterwards are not working.
-
   val data = sc.textFile("test_data.txt")
   var parsedData = data.map( _.split("\t").map(_.toString))
--

To try to give a more concrete example of my goal,
Suppose the data file is:
A1  A2  A3  A4
B1  B2  A3  A4
C1  A2  C2  C3


How would I filter the data on the second column to return only those
entries that have A2 in column two, so that the resulting RDD would just
be:

A1  A2  A3  A4
C1  A2  C2  C3

Is there a convenient way to do this?  Any suggestions or assistance would
be appreciated.
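
For anyone reading the archive, here is a minimal sketch of one way the column filter could look. It is not a definitive answer. It splits each line on tabs and keeps only the rows whose second field (index 1) equals A2. To keep it runnable without a cluster, a plain Scala Seq stands in for the RDD; the names ColumnFilter, lines, hasValueAt and kept are made up for illustration. Because an RDD offers map and filter with the same shape, the same two calls should carry over to the result of sc.textFile.

```scala
object ColumnFilter {
  // Stand-in for the lines of test_data.txt (assumption: with Spark this
  // would be sc.textFile("test_data.txt") instead of a local Seq).
  val lines: Seq[String] = Seq(
    "A1\tA2\tA3\tA4",
    "B1\tB2\tA3\tA4",
    "C1\tA2\tC2\tC3"
  )

  // Hypothetical helper: true when the row has a field at `index` equal to `value`.
  // The length check guards against short or malformed rows.
  def hasValueAt(fields: Array[String], index: Int, value: String): Boolean =
    fields.length > index && fields(index) == value

  // Split each line into columns, then keep rows whose second column (index 1) is "A2".
  val kept: Seq[Array[String]] =
    lines.map(_.split("\t")).filter(fields => hasValueAt(fields, 1, "A2"))

  // Join the columns back together with tabs for display.
  def main(args: Array[String]): Unit =
    kept.map(_.mkString("\t")).foreach(println)
}
```

The predicate works on the already-split Array[String], so the filter looks at one column instead of the whole line.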




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/RDD-Manipulation-in-Scala-tp2285.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: RDD Manipulation in Scala.

2014-03-04 Thread trottdw
Thanks Sean, I think that is doing what I needed.  It was much simpler than
what I had been attempting.

Is it possible to filter with an OR condition?  For example, so that a row is
kept if column 2 contains A2 or column 3 contains A4?
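
A rough sketch of how such an OR filter might be written, under a few assumptions: the predicate passed to filter is an ordinary Boolean expression, so the two column tests can be combined with ||. A plain Scala Seq stands in for the RDD here so the example runs on its own, and the names OrFilter, lines, matches and kept are made up for illustration. The last sample row (D1 ...) is hypothetical and is not from the original data; it is added only so that the column 3 test matches something.

```scala
object OrFilter {
  // Sample rows from the question, plus one hypothetical row (D1 ...) added here
  // so that the second half of the OR condition has something to match.
  val lines: Seq[String] = Seq(
    "A1\tA2\tA3\tA4",
    "B1\tB2\tA3\tA4",
    "C1\tA2\tC2\tC3",
    "D1\tD2\tA4\tD4"
  )

  // Keep a row when column 2 (index 1) is "A2" OR column 3 (index 2) is "A4".
  // The length check skips rows with fewer than three columns.
  def matches(fields: Array[String]): Boolean =
    fields.length > 2 && (fields(1) == "A2" || fields(2) == "A4")

  // Split each line on tabs, then apply the combined predicate.
  val kept: Seq[Array[String]] = lines.map(_.split("\t")).filter(matches)

  // Join the columns back together with tabs for display.
  def main(args: Array[String]): Unit =
    kept.map(_.mkString("\t")).foreach(println)
}
```

With an RDD of split rows, the same matches function should be usable as the argument to filter, since RDD.filter also takes a function from the element type to Boolean.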




