Thanks Michael.

I should make my question clearer. This is the data type:

StructType(Seq(
   StructField("uid", LongType),
   StructField("infos", ArrayType(
      StructType(Seq(
         StructField("cid", LongType),
         StructField("cnt", LongType)
      ))
   ))
))

I want to explode “infos” to get three columns: “uid”, “index” and “info”.
The only way I have figured out is to explode the whole nested data type into
a tuple of primitive types like this:

df.explode("infos") { (r: Row) =>
    val arr = r.getSeq[Row](0)
    arr.zipWithIndex.map {
      case (info, idx) =>
        (idx, info.getLong(0), info.getLong(1))
    }
}

What I really want is to keep info as a struct type.

df.explode("infos") { (r: Row) =>
    val arr = r.getSeq[Row](0)
    arr.zipWithIndex.map {
      case (info, idx) =>
        (idx, info)
    }
}

Unfortunately the current DataFrame API doesn’t support this: the explode
methods try to infer the schema for the exploded data via reflection, but
cannot handle an Any or Row type, and the caller has no way to pass a schema
for the exploded data explicitly.
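One workaround might be to drop down to the RDD level, do the zipWithIndex
there, and rebuild the DataFrame via createDataFrame with an explicitly
supplied schema, since that path lets the caller provide the struct type
directly. As a minimal sketch of the transformation shape itself, using plain
Scala collections and hypothetical Info/Record case classes in place of Rows
(not the actual DataFrame API):

```scala
// Hypothetical stand-ins for the Row structures in the schema above.
case class Info(cid: Long, cnt: Long)
case class Record(uid: Long, infos: Seq[Info])

// For each record, pair every element of `infos` with its position,
// producing one (uid, index, info) row per element. The info struct
// is carried through intact rather than flattened into primitives.
def explodeWithIndex(records: Seq[Record]): Seq[(Long, Int, Info)] =
  records.flatMap { r =>
    r.infos.zipWithIndex.map { case (info, idx) => (r.uid, idx, info) }
  }
```

On a real DataFrame the same flatMap would run over df.rdd, with the result
converted back using a hand-built StructType for the (uid, index, info)
columns.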

On Fri, Oct 23, 2015 at 12:44 PM, Michael Armbrust <mich...@databricks.com>
wrote:

> The user facing type mapping is documented here:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#data-types
>
> On Fri, Oct 23, 2015 at 12:10 PM, Benyi Wang <bewang.t...@gmail.com>
> wrote:
>
>> If I have two columns
>>
>> StructType(Seq(
>>   StructField("id", LongType),
>>   StructField("phones", ArrayType(StringType))))
>>
>> I want to add index for “phones” before I explode it.
>>
>> Can this be implemented as GenericUDF?
>>
>> I tried DataFrame.explode. It worked for simple types like string, but I
>> could not figure out how to handle a nested type like StructType.
>>
>> Can somebody shed a light?
>>
>> I’m using spark 1.5.1.
>>
>
>
