Re: Limit the # of columns in Spark Scala

Gerard Maas Sun, 14 Dec 2014 08:57:07 -0800

Hi,

I don't get what the problem is. That map to selected columns looks like
the way to go given the context. What's not working?
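
For reference, here is a minimal sketch of what I mean by mapping to the
selected columns. It assumes each element of `tabs` is an Array[String]
(which is what split returns) and that the indices are zero-based. The
`ColumnSelect` object and `selectColumns` helper are hypothetical names
used only for illustration; they are not part of Spark.

```scala
// Hypothetical helper (not from the original thread): keep only the
// requested zero-based column indices from one already-split line.
object ColumnSelect {
  def selectColumns(fields: Array[String], indices: Seq[Int]): Seq[String] =
    indices.map(i => fields(i))
}

// Assumed usage in spark-shell with the `tabs` RDD from the quoted message:
//   val narrow  = tabs.map(f => ColumnSelect.selectColumns(f, Seq(167, 110, 200)))
//   val triples = tabs.map(c => (c(167), c(110), c(200)))  // fixed-size tuple
```

One thing to watch: split("\t") drops trailing empty fields, so rows that
end in empty columns can come out shorter than expected and a high index
may go out of bounds. split("\t", -1) keeps those trailing fields.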


Kr, Gerard
On Dec 14, 2014 5:17 PM, "Denny Lee" <denny.g....@gmail.com> wrote:

> I have a large number of files within HDFS that I would like to run a
> group by statement on, a la
>
> val table = sc.textFile("hdfs://....")
> val tabs = table.map(_.split("\t"))
>
> I'm trying to do something similar to
> tabs.map(c => (c._(167), c._(110), c._(200)))
>
> where I create a new RDD that only has those three columns,
> but that isn't quite right because I'm not really manipulating sequences.
>
> BTW, I cannot use SparkSQL / case classes right now because my table has
> 200 columns (and I'm on Scala 2.10.3)
>
> Thanks!
> Denny
>
>
