Please see if this works -- aggregate the array into a map holding the element count:

SELECT aggregate(
  array(1, 2, 3, 4, 5),
  map('cnt', 0),
  (acc, x) -> map('cnt', acc['cnt'] + 1)
) AS array_count
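The snippet above only tracks a single overall count, though. For the per-element counts the question asks about, one possible merge step is to rebuild the accumulator map on every element. This is a minimal, untested sketch for Spark 3.1; it assumes the built-in `map_filter` (3.0+) and `map_concat` (2.4+) functions, and the `map_filter` guard is there to avoid the duplicate-key error raised under the default `spark.sql.mapKeyDedupPolicy=EXCEPTION`:

```sql
SELECT aggregate(
  array('a', 'b', 'a'),
  -- empty map as the "start" value; the cast pins the value type so + 1 type-checks
  cast(map() as map<string, int>),
  -- merge: drop any existing entry for x, then re-add x with its count incremented
  (acc, x) -> map_concat(
    map_filter(acc, (k, v) -> k != x),
    map(x, coalesce(acc[x], 0) + 1)
  )
) AS element_counts
-- should yield a map counting each element: a -> 2, b -> 1
```

`acc[x]` returns NULL for a key not yet in the map, so `coalesce(acc[x], 0) + 1` covers both the "new entry" and "increment existing" cases in one expression.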
thanks
Vijay

On 2023/05/05 19:32:04 Yong Zhang wrote:
> Hi, this is on a Spark 3.1 environment.
>
> For some reason, I can ONLY do this in Spark SQL, not in either a Scala or
> PySpark environment.
>
> I want to aggregate an array into a map of element counts, within that
> array, but in Spark SQL.
> I know that there is an aggregate function available:
>
> aggregate(expr, start, merge [, finish])
>
> But I want to know if this can be done in Spark SQL only, and:
>
> * How to represent an empty map as the "start" element above
> * How to merge each element (as String type) into the map (adding to the
>   count if the element already exists in the map, or adding
>   (element -> 1) as a new entry if it does not)
>
> Like the following example ->
> https://docs.databricks.com/sql/language-manual/functions/aggregate.html
>
> SELECT aggregate(array(1, 2, 3, 4),
>                  named_struct('sum', 0, 'cnt', 0),
>                  (acc, x) -> named_struct('sum', acc.sum + x,
>                                           'cnt', acc.cnt + 1),
>                  acc -> acc.sum / acc.cnt) AS avg
>
> I wonder:
>
> select
>   aggregate(
>     array('a','b','a'),
>     map('', 0),
>     (acc, x) -> ???,
>     acc -> acc) as output
>
> How do I write the logic after "(acc, x) ->" so that it outputs a map of
> the count of each element in the array?
> I know I can "explode", then group by + count, but since I have multiple
> array columns that need to be transformed, I want to do it in a more
> higher-order-function way, and in pure Spark SQL.
>
> Thanks