You can do something like:

import org.apache.spark.sql.Row

df.collect().map {
  case Row(name: String, age1: Int, _, _, age2: Int, _) => (name, age1, age2)
}
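
For context, a fuller end-to-end sketch; this assumes a Spark 1.3-style
SQLContext in the shell (sc already defined), hypothetical Parquet paths,
and that age was stored as an Int:

import org.apache.spark.sql.{Row, SQLContext}

val sqlContext = new SQLContext(sc)
// each file has columns: name, age, town
val df1 = sqlContext.parquetFile("people1.parquet")
val df2 = sqlContext.parquetFile("people2.parquet")

// joined rows have the layout [name, age, town, name, age, town],
// matching your res4 output
val df = df1.join(df2, df1("name") === df2("name"))

val result = df.collect().map {
  case Row(name: String, age1: Int, _, _, age2: Int, _) => (name, age1, age2)
}

A Row is not a String, so map(_.split(",")) won't compile; pattern matching
(or row.getString(0), row.getInt(1)) is how you pull fields out. For large
data you can also skip collect() and project on the DataFrame itself, e.g.
df.select(df1("name"), df1("age"), df2("age")).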

On Tue, Mar 31, 2015 at 4:05 PM, roni <roni.epi...@gmail.com> wrote:

> I have 2 Parquet files with the format e.g. name, age, town
> I read them and then join them to get all the names which appear in both
> towns.
> the resultant dataset is
>
> res4: Array[org.apache.spark.sql.Row] = Array([name1, age1,
> town1,name2,age2,town2]....)
>
> name1 and name2 are the same since I am joining on name.
> Now I want to get only the format (name, age1, age2).
>
> But I can't seem to manipulate the spark.sql.Row.
>
> Trying something like map(_.split(",")).map(a => (a(0),
> a(1).trim().toInt)) does not work.
>
> Can you suggest a way?
>
> Thanks
> -R
>
>
