Hi, take a look at this pull request, which is not merged yet: https://github.com/apache/spark/pull/16329 . It contains examples in Java and Scala that may be helpful.
Best regards,
Anton Okolnychyi

On Jan 4, 2017 23:23, "Anil Langote" <anillangote0...@gmail.com> wrote:

> Hi All,
>
> I have been working on a use case where I have a DataFrame with 25 columns:
> 24 columns are of type string and the last column is an array of doubles. For a
> given set of columns I have to apply a group by and sum the arrays of doubles.
> I have implemented a UDAF which works fine, but it's expensive. To tune the
> solution I came across Aggregators, which can be implemented and used with the
> agg function. My question is: how can we implement an Aggregator that takes an
> array of doubles as input and returns an array of doubles?
>
> I learned that it's not possible to implement the Aggregator in Java; it can
> be done only in Scala. How can I define an Aggregator that takes an array of
> doubles as input? Note that my input is a Parquet file.
>
> Any pointers are highly appreciated. I read that a Spark UDAF is slow and
> Aggregators are the way to go.
>
> Best Regards,
>
> Anil Langote
>
> +1-425-633-9747
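For reference, here is a minimal Scala sketch of the kind of Aggregator the question describes: an element-wise sum over arrays of doubles. This is illustrative only, not taken from the linked PR. It assumes every array within a group has the same length, the object name `SumArrayOfDoubles` is made up, and `ExpressionEncoder` is an internal Spark API that was a common workaround for collection encoders in Spark 2.x:

```scala
import org.apache.spark.sql.Encoder
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical sketch: sums Seq[Double] values element-wise within each group.
// Assumes all arrays in a group have equal length.
object SumArrayOfDoubles extends Aggregator[Seq[Double], Seq[Double], Seq[Double]] {

  // Start with an empty buffer; the first reduce call adopts the input as-is,
  // so we never need to know the array length up front.
  def zero: Seq[Double] = Seq.empty[Double]

  // Fold one input row into the running buffer.
  def reduce(buffer: Seq[Double], input: Seq[Double]): Seq[Double] =
    if (buffer.isEmpty) input
    else buffer.zip(input).map { case (a, b) => a + b }

  // Combine two partial buffers from different partitions.
  def merge(b1: Seq[Double], b2: Seq[Double]): Seq[Double] =
    if (b1.isEmpty) b2
    else if (b2.isEmpty) b1
    else b1.zip(b2).map { case (a, b) => a + b }

  // The final result is just the accumulated buffer.
  def finish(reduction: Seq[Double]): Seq[Double] = reduction

  // Encoders for the intermediate and output types; ExpressionEncoder is
  // internal API, used here as a pragmatic workaround for Seq encoders.
  def bufferEncoder: Encoder[Seq[Double]] = ExpressionEncoder()
  def outputEncoder: Encoder[Seq[Double]] = ExpressionEncoder()
}
```

With the typed Dataset API this could then be used roughly as `ds.groupByKey(row => ...).agg(SumArrayOfDoubles.toColumn)`; how well an Aggregator integrates with an untyped DataFrame `agg` call depends on the Spark version, which is the subject of the PR linked above.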