What version of Spark are you using? We introduced a Spark native collect_list in 2.0.
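
For example, here is a minimal sketch of the rollup, assuming Spark 2.0+ and a plain SparkSession with Hive support not enabled. The app name, the sample data, and the column names (`user`, `item`, `items`) are made up for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.collect_list

object CollectListExample {
  def main(args: Array[String]): Unit = {
    // Plain SparkSession: enableHiveSupport() is deliberately not called.
    val spark = SparkSession.builder()
      .appName("collect-list-example") // hypothetical app name
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: one row per (user, item) pair.
    val orders = Seq(
      ("alice", "apple"),
      ("alice", "pear"),
      ("bob", "plum")
    ).toDF("user", "item")

    // Roll the items for each user up into a single array column.
    val rolledUp = orders
      .groupBy($"user")
      .agg(collect_list($"item").as("items"))

    // toJSON serializes each row as a JSON object; "items" becomes a JSON array.
    rolledUp.toJSON.collect().foreach(println)

    spark.stop()
  }
}
```

Note that the order of elements inside each array is not guaranteed, so sort explicitly if the order matters to you.
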
It still has the usual caveats, but it should be quite a bit faster.

On Tue, Oct 25, 2016 at 6:16 AM, Matt Smith <matt.smith...@gmail.com> wrote:

> Is there an alternative function or design pattern for the collect_list
> UDAF that can be used without taking a dependency on HiveContext? How does
> one typically roll things up into an array when outputting JSON?