super duper :)
On Tue, Jul 24, 2018 at 7:11 PM, Patrick McCarthy <
pmccar...@dstillery.com.invalid> wrote:
> Thanks Bryan. I think it was ultimately groupings that were too large -
> after setting spark.sql.shuffle.partitions to a much higher number I was
> able to get the UDF to execute.
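
(For readers of the archive: a minimal sketch of the fix Patrick
describes. The value 2000 below is illustrative, not taken from this
thread.)

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # The default is 200; a higher count spreads the shuffled groups
    # across more, smaller tasks.
    spark.conf.set("spark.sql.shuffle.partitions", "2000")

Note that each group is still materialized as a whole pandas DataFrame
inside one task, so this helps most when there are many medium-sized
groups; a single huge group has to be split instead (see Bryan's point
below).
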
On Fri, Jul 20, 2018 at 12:45 AM, Bryan Cutler wrote:
> Hi Patrick,
>
> It looks like it's failing in Scala before it even gets to Python to
> execute your udf, which is why it doesn't seem to matter what's in your
> udf. Since you are doing a grouped map udf maybe your group sizes are too
> big or skewed? Could you try to reduce the size of your groups by [...]
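
(Bryan's message is cut off above. One common way to shrink grouped-map
groups is to salt the grouping key; this is my illustration, not
necessarily what he went on to suggest, and `df`, `key`, and the bucket
count of 16 are all hypothetical:)

    from pyspark.sql import functions as F
    from pyspark.sql.functions import pandas_udf, PandasUDFType

    # Hypothetical: df has a grouping column "key". Adding a random
    # salt splits each key into ~16 smaller (key, salt) groups.
    df_salted = df.withColumn("salt", (F.rand() * 16).cast("int"))

    @pandas_udf(df_salted.schema, PandasUDFType.GROUPED_MAP)
    def process(pdf):
        # Per-group pandas logic goes here; it must be valid on a
        # partial group, since salting breaks each key's rows apart.
        return pdf

    result = df_salted.groupBy("key", "salt").apply(process)
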
[Original message from Patrick McCarthy:]

PySpark 2.3.1 on YARN, Python 3.6, PyArrow 0.8.

I'm trying to run a pandas UDF, but I seem to get nonsensical exceptions in
the last stage of the job regardless of my output type.

The problem I'm trying to solve:
I have a column of scalar values, and each value on the same row has a
sorted [...]
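
(The original message is also truncated. For context, a minimal
self-contained grouped map pandas UDF under the PySpark 2.3 API named
above; the data, schema, and per-group logic are invented for
illustration:)

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf, PandasUDFType

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [("a", 1.0), ("a", 2.0), ("b", 5.0)], ["key", "value"])

    # A grouped map UDF declares its output schema up front, and each
    # group arrives as one whole pandas.DataFrame.
    @pandas_udf("key string, value double", PandasUDFType.GROUPED_MAP)
    def center(pdf):
        # Subtract the group mean from each value.
        return pdf.assign(value=pdf.value - pdf.value.mean())

    df.groupby("key").apply(center).show()
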