Hi, I don't see what the problem is. Mapping each row to the selected columns looks like the way to go given the context. What's not working?
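In case it helps, here is a minimal sketch of the mapping I have in mind. It assumes the column positions from your mail are 0-based and that every line has at least 201 tab-separated fields. `ColumnSelect` and `project` are names made up for illustration, and the Spark calls are left as comments so the snippet compiles without a cluster.

```scala
object ColumnSelect {
  // split returns Array[String], so a field is read with c(i), not c._(i).
  // The -1 limit keeps trailing empty fields, so positions stay stable
  // even when the last columns of a row are blank.
  def project(line: String): (String, String, String) = {
    val c = line.split("\t", -1)
    (c(167), c(110), c(200))
  }
}

// Hypothetical usage on the RDD from the question:
//   val table = sc.textFile("hdfs://....")
//   val selected = table.map(ColumnSelect.project)
//   // group on the first two selected columns (an arbitrary choice of key)
//   val grouped = selected.map { case (a, b, d) => ((a, b), d) }.groupByKey()
```

A tuple of three fields stays well under the 22-field limit in Scala 2.10, so no case class is needed for this step. Rows shorter than 201 fields would throw an `ArrayIndexOutOfBoundsException`, so you may want to filter those out first.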
Kr, Gerard

On Dec 14, 2014 5:17 PM, "Denny Lee" <denny.g....@gmail.com> wrote:

> I have a large number of files within HDFS that I would like to do a group by
> statement ala
>
> val table = sc.textFile("hdfs://....")
> val tabs = table.map(_.split("\t"))
>
> I'm trying to do something similar to
> tabs.map(c => (c._(167), c._(110), c._(200))
>
> where I create a new RDD that only has
> but that isn't quite right because I'm not really manipulating sequences.
>
> BTW, I cannot use SparkSQL / case right now because my table has 200
> columns (and I'm on Scala 2.10.3)
>
> Thanks!
> Denny