I have a large number of files within HDFS that I would like to do a group by on, along the lines of:
```
val table = sc.textFile("hdfs://....")
val tabs = table.map(_.split("\t"))
```

I'm trying to do something like `tabs.map(c => (c(167), c(110), c(200)))`, where I create a new RDD that only has those columns, but that isn't quite right because I'm not really manipulating sequences.

BTW, I cannot use SparkSQL / case classes right now because my table has 200 columns (and I'm on Scala 2.10.3, where case classes are limited to 22 fields).

Thanks!
Denny
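For concreteness, here is a minimal sketch of what I'm after, assuming the file is tab-delimited and every row has enough fields for the indices used (the path and column indices are placeholders from my real table):

```scala
// Sketch only: select three columns from a wide tab-delimited file
// and group on the first of them. Assumes `sc` is an existing
// SparkContext and the HDFS path is a placeholder.
val table = sc.textFile("hdfs://....")      // RDD[String], one line per row
val tabs  = table.map(_.split("\t"))        // RDD[Array[String]]

// Arrays are indexed with apply, i.e. c(167) -- the `_1`/`_2` accessor
// syntax is for tuples, not arrays. This yields an
// RDD[(String, String, String)].
val selected = tabs.map(c => (c(167), c(110), c(200)))

// Then a group by on, say, the first selected column:
val grouped = selected.groupBy(_._1)
```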