I have a large number of files within HDFS that I would like to do a group by
statement on, à la:

val table = sc.textFile("hdfs://....")
val tabs = table.map(_.split("\t"))

I'm trying to do something similar to
tabs.map(c => (c(167), c(110), c(200)))

where I create a new RDD that only has those three columns,
but I can't quite get the syntax right because I'm not really manipulating tuples; each split row is an Array[String].
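Here's a minimal sketch of what I'm trying to end up with (assuming every row has at least 201 tab-separated fields; keying the group by on column 110 is just an example, not necessarily my real key):

// pull out only the three columns of interest as a tuple
val selected = tabs.map(c => (c(167), c(110), c(200)))

// group by one of them, e.g. column 110, using plain (key, value) pairs
val grouped = selected
  .map { case (a, key, b) => (key, (a, b)) }
  .groupByKey()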

BTW, I cannot use Spark SQL / case classes right now because my table has 200
columns (and on Scala 2.10.3, case classes are limited to 22 fields)

Thanks!
Denny
