Re: How to iterate the element of an array in DataFrame?

2016-10-24 Thread Yan Facai
scala> mblog_tags.dtypes
res13: Array[(String, String)] = Array((tags,ArrayType(StructType(StructField(category,StringType,true), StructField(weight,StringType,true)),true)))

scala> val testUDF = udf{ s: Seq[Tags] => s(0).weight }
testUDF: org.apache.spark.sql.expressions.UserDefinedFunction =

Re: How to iterate the element of an array in DataFrame?

2016-10-24 Thread Yan Facai
Thanks, Cheng Lian.

I tried to use a case class:

scala> case class Tags (category: String, weight: String)

scala> val testUDF = udf{ s: Seq[Tags] => s(0).weight }
testUDF: org.apache.spark.sql.expressions.UserDefinedFunction =
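A note on the UDF above: as far as I know, in Spark 2.x the elements of an array-of-struct column are passed to a Scala UDF as `Row` objects, not as case class instances, so a UDF typed as `Seq[Tags]` can be defined without complaint and still fail with a `ClassCastException` once it is applied. A minimal sketch of a `Row`-based alternative follows. The local `SparkSession`, the stand-in DataFrame `mblogTags`, and the name `firstWeight` are assumptions made for illustration, not part of the original thread.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

// Assumption: a local session for experimenting; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("tags-udf-sketch")
  .getOrCreate()

// Hypothetical stand-in for mblog_tags: one column `tags` holding an
// array of structs with string fields `category` and `weight`.
val mblogTags = spark.sql(
  """SELECT array(
    |  named_struct('category', 'tagCategory_060', 'weight', '0.8'),
    |  named_struct('category', 'tagCategory_029', 'weight', '0.7')
    |) AS tags""".stripMargin)

// Each struct arrives as a Row, so read the field by name.
// Option(...) guards against a null array; headOption against an empty one.
val firstWeight = udf { tags: Seq[Row] =>
  Option(tags).flatMap(_.headOption).map(_.getAs[String]("weight"))
}

val result = mblogTags.select(firstWeight(col("tags")).as("first_weight"))
```

Calling `result.show()` should display the weight of the first tag in each row. Returning an `Option[String]` makes the output column nullable instead of throwing on empty input.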

Re: How to iterate the element of an array in DataFrame?

2016-10-21 Thread Cheng Lian
You may either use the SQL functions "array" and "named_struct", or define a case class with the expected field names.

Cheng

On 10/21/16 2:45 AM, 颜发才(Yan Facai) wrote:

My expectation is:

root
 |-- tag: vector

namely, I want to extract from:

[[tagCategory_060, 0.8], [tagCategory_029, 0.7]]|

to:

Re: Re: How to iterate the element of an array in DataFrame?

2016-10-21 Thread Yan Facai
My expectation is:

root
 |-- tag: vector

Namely, I want to go from:

[[tagCategory_060, 0.8], [tagCategory_029, 0.7]]|

to:

Vectors.sparse(100, Array(60, 29), Array(0.8, 0.7))

I believe it needs two steps:

1. val tag2vec = {tag: Array[Structure] => Vector}
2. mblog_tags.withColumn("vec",
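The core of the first step can be sketched in plain Scala, independent of Spark. The helper names `tagIndex` and `toSparseParts` are hypothetical, and the sketch assumes every category string has the form `tagCategory_NNN` and every weight parses as a double. It sorts by index because, as far as I know, the array-based `Vectors.sparse` overload expects strictly increasing indices.

```scala
// Hypothetical helper: "tagCategory_060" -> 60.
// Assumes the "tagCategory_" prefix is always present.
def tagIndex(category: String): Int =
  category.stripPrefix("tagCategory_").toInt

// Convert (category, weight) string pairs into parallel index and value
// arrays, sorted by index so they suit a sparse-vector constructor.
def toSparseParts(tags: Seq[(String, String)]): (Array[Int], Array[Double]) = {
  val sorted = tags
    .map { case (category, weight) => (tagIndex(category), weight.toDouble) }
    .sortBy(_._1)
  (sorted.map(_._1).toArray, sorted.map(_._2).toArray)
}

val (indices, values) = toSparseParts(
  Seq(("tagCategory_060", "0.8"), ("tagCategory_029", "0.7")))
// indices: Array(29, 60), values: Array(0.7, 0.8)
```

Inside a UDF, the two arrays could then be handed to `Vectors.sparse(100, indices, values)`, where the size 100 is taken from the example in the message.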

Re: Re: How to iterate the element of an array in DataFrame?

2016-10-21 Thread lk_spark
How about changing the schema from:

root
 |-- category.firstCategory: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- category: string (nullable = true)
 |    |    |-- weight: string (nullable = true)

to:

root
 |-- category: string (nullable = true)
 |-- weight:
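One way such a flattening might be done is with `explode`, which emits one output row per array element. This is only a sketch: the local `SparkSession`, the stand-in DataFrame `mblogTags`, and the column name `tags` are assumptions (the schema quoted above uses `category.firstCategory`, which would need backtick quoting because of the dot).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

// Assumption: a local session for experimenting; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("flatten-tags-sketch")
  .getOrCreate()

// Hypothetical stand-in for the original DataFrame.
val mblogTags = spark.sql(
  """SELECT array(
    |  named_struct('category', 'tagCategory_060', 'weight', '0.8'),
    |  named_struct('category', 'tagCategory_029', 'weight', '0.7')
    |) AS tags""".stripMargin)

// explode turns each array element into its own row; the struct fields
// are then selected as top-level columns.
val flat = mblogTags
  .select(explode(col("tags")).as("tag"))
  .select(col("tag.category"), col("tag.weight"))
```

Note that exploding drops the grouping of tags per original row unless an identifier column is carried along in the first `select`, which matters if the tags are later recombined into one vector per row.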

Re: How to iterate the element of an array in DataFrame?

2016-10-21 Thread Yan Facai
I don't know how to construct `array<struct<category:string,weight:string>>`. Could anyone help me?

I tried to get the array with:

scala> mblog_tags.map(_.getSeq[(String, String)](0))

while the result is:

res40: org.apache.spark.sql.Dataset[Seq[(String, String)]] = [value:

How to iterate the element of an array in DataFrame?

2016-10-20 Thread Yan Facai
Hi,

I want to extract the `weight` attribute from each element of an array and combine the values to construct a sparse vector.

### My data is like this:

scala> mblog_tags.printSchema
root
 |-- category.firstCategory: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- category: