Is it possible with Spark SQL to merge columns whose types are Arrays or
Sets?

My use case would be something like this:

DataFrame schema:
id: String
words: Array[String]

I would want to do something like

df.groupBy('id).agg(merge_arrays('words)) -> list of all words
df.groupBy('id).agg(merge_sets('words)) -> list of distinct words
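To make that concrete, here is a minimal sketch with made-up data (the ids and words are just placeholders), along with the results I would expect from each aggregation:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .master("local[*]")
  .appName("merge-arrays-example")
  .getOrCreate()
import spark.implicits._

// Example input: several rows per id, each carrying an Array[String]
val df = Seq(
  ("a", Array("hello", "world")),
  ("a", Array("world", "spark")),
  ("b", Array("foo"))
).toDF("id", "words")

// Desired output of merge_arrays (arrays concatenated, duplicates kept):
//   a -> [hello, world, world, spark]
//   b -> [foo]
// Desired output of merge_sets (arrays concatenated, duplicates dropped):
//   a -> [hello, world, spark]
//   b -> [foo]

In other words, the aggregation should flatten the grouped arrays per id, with merge_sets additionally dropping duplicates.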

Thanks,
-- 
Pedro Rodriguez
PhD Student in Distributed Machine Learning | CU Boulder
UC Berkeley AMPLab Alumni

ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
Github: github.com/EntilZha | LinkedIn:
https://www.linkedin.com/in/pedrorodriguezscience
