Is it possible with Spark SQL to merge columns whose types are Arrays or
Sets?

My use case would be something like this:

DataFrame schema:
id: String
words: Array[String]

I would want to do something like

df.groupBy('id).agg(merge_arrays('words)) -> list of all words
df.groupBy('id).agg(merge_sets('words)) -> list of distinct words
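To make that concrete, here is a minimal sketch with made-up data (the ids and words are just placeholders), along with the results I would expect from each aggregation:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .master("local[*]")
  .appName("merge-arrays-example")
  .getOrCreate()
import spark.implicits._

// Example input: several rows per id, each carrying an Array[String]
val df = Seq(
  ("a", Array("hello", "world")),
  ("a", Array("world", "spark")),
  ("b", Array("foo"))
).toDF("id", "words")

// Desired output of merge_arrays (arrays concatenated, duplicates kept):
//   a -> [hello, world, world, spark]
//   b -> [foo]
// Desired output of merge_sets (arrays concatenated, duplicates dropped):
//   a -> [hello, world, spark]
//   b -> [foo]

In other words, the aggregation should flatten the grouped arrays per id, with merge_sets additionally dropping duplicates.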

Thanks,
-- 
Pedro Rodriguez
PhD Student in Distributed Machine Learning | CU Boulder
UC Berkeley AMPLab Alumni

ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
Github: github.com/EntilZha | LinkedIn:
https://www.linkedin.com/in/pedrorodriguezscience
