scala> mblog_tags.dtypes
res13: Array[(String, String)] =
Array((tags,ArrayType(StructType(StructField(category,StringType,true),
StructField(weight,StringType,true)),true)))
scala> val testUDF = udf{ s: Seq[Tags] => s(0).weight }
testUDF: org.apache.spark.sql.expressions.UserDefinedFunction =
Thanks, Cheng Lian.
I tried to use a case class:
scala> case class Tags (category: String, weight: String)
scala> val testUDF = udf{ s: Seq[Tags] => s(0).weight }
testUDF: org.apache.spark.sql.expressions.UserDefinedFunction =
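A note on the UDF above: as far as I can tell, in Spark 2.x the elements of an array-of-struct column reach a Scala UDF as `Row` objects, so a `Seq[Tags]` parameter may compile and then fail at execution time. A minimal sketch that reads the field by name from a `Seq[Row]` instead is shown below. The object name `FirstWeightSketch`, the sample data, and the column name `tags` (taken from the `dtypes` output) are assumptions for illustration only.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object FirstWeightSketch {
  // Mirrors the case class from the thread; used here only to build sample data.
  case class Tags(category: String, weight: String)

  // Each struct element is read as a Row, so the field is looked up by name.
  // Returns null when the array is null or empty.
  val firstWeight = udf { tags: Seq[Row] =>
    if (tags == null || tags.isEmpty) null
    else tags.head.getAs[String]("weight")
  }

  // Builds a one-row sample DataFrame and applies the UDF to it.
  def run(spark: SparkSession): Seq[String] = {
    import spark.implicits._
    val mblog_tags = Seq(
      Tuple1(Seq(Tags("tagCategory_060", "0.8"), Tags("tagCategory_029", "0.7")))
    ).toDF("tags")
    mblog_tags.select(firstWeight(col("tags")).as("weight")).as[String].collect().toSeq
  }
}
```

In a spark-shell session, `FirstWeightSketch.run(spark)` applies the UDF to the sample row.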
You may either use SQL function "array" and "named_struct" or define a
case class with expected field names.
Cheng
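The two options Cheng mentions could look roughly like the sketch below. The object name `BuildTagsSketch`, the method names, and the sample values are assumptions for illustration; only `array`, `named_struct`, and the `Tags` case class come from the thread.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object BuildTagsSketch {
  // Field names must match the expected struct fields.
  case class Tags(category: String, weight: String)

  // Option 1: build the array of structs with the SQL functions
  // array and named_struct.
  def viaSql(spark: SparkSession): DataFrame =
    spark.sql(
      """SELECT array(
        |  named_struct('category', 'tagCategory_060', 'weight', '0.8'),
        |  named_struct('category', 'tagCategory_029', 'weight', '0.7')
        |) AS tags""".stripMargin)

  // Option 2: build it from a case class with the expected field names.
  def viaCaseClass(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      Tuple1(Seq(Tags("tagCategory_060", "0.8"), Tags("tagCategory_029", "0.7")))
    ).toDF("tags")
  }
}
```

Both variants should yield a `tags` column holding an array of structs with `category` and `weight` string fields, although the nullability flags in the two schemas may differ.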
On 10/21/16 2:45 AM, 颜发才(Yan Facai) wrote:
My expectation is:
root
|-- tag: vector
namely, I want to extract from:
[[tagCategory_060, 0.8], [tagCategory_029, 0.7]]|
to:
Vectors.sparse(100, Array(60, 29), Array(0.8, 0.7))
I believe it needs two steps:
1. val tag2vec = {tag: Array[Structure] => Vector}
2. mblog_tags.withColumn("vec",
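The first of those steps could be sketched in plain Scala as below, with no Spark dependency. The object `TagParsing` and its methods are hypothetical helpers, and the sketch assumes every category string ends in `_` followed by a numeric index (as in `tagCategory_060`) and every weight parses as a double. The pairs are sorted by index because, as far as I know, Spark's sparse vectors expect indices in increasing order.

```scala
object TagParsing {
  // Hypothetical helper: "tagCategory_060" -> 60.
  // Takes everything after the last underscore and parses it as an Int.
  def categoryIndex(category: String): Int =
    category.substring(category.lastIndexOf('_') + 1).toInt

  // Converts (category, weight) string pairs into (index, value) pairs,
  // sorted by index.
  def toSortedPairs(tags: Seq[(String, String)]): Seq[(Int, Double)] =
    tags
      .map { case (category, weight) => (categoryIndex(category), weight.toDouble) }
      .sortBy(_._1)
}
```

The resulting pairs can then be passed to `Vectors.sparse(size, pairs)`, the overload that takes a `Seq[(Int, Double)]`, inside a UDF for the second step.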
How about changing the schema from
root
|-- category.firstCategory: array (nullable = true)
||-- element: struct (containsNull = true)
|||-- category: string (nullable = true)
|||-- weight: string (nullable = true)
to:
root
|-- category: string (nullable = true)
|-- weight:
I don't know how to construct
`array>`.
Could anyone help me?
I tried to get the array with:
scala> mblog_tags.map(_.getSeq[(String, String)](0))
while the result is:
res40: org.apache.spark.sql.Dataset[Seq[(String, String)]] = [value:
Hi, I want to extract the `weight` attribute from each element of an array
and combine the values into a sparse vector.
### My data is like this:
scala> mblog_tags.printSchema
root
|-- category.firstCategory: array (nullable = true)
||-- element: struct (containsNull = true)
|||-- category: