Re: Arrow type issue with Pandas UDF

2018-07-24 Thread Gourav Sengupta

super duper :)

On Tue, Jul 24, 2018 at 7:11 PM, Patrick McCarthy <pmccar...@dstillery.com.invalid> wrote:
> Thanks Bryan. I think it was ultimately groupings that were too large -
> after setting spark.sql.shuffle.partitions to a much higher number I was
> able to get the UDF to execute.
>
> On

Re: Arrow type issue with Pandas UDF

2018-07-24 Thread Patrick McCarthy

Thanks Bryan. I think it was ultimately groupings that were too large - after setting spark.sql.shuffle.partitions to a much higher number I was able to get the UDF to execute.

On Fri, Jul 20, 2018 at 12:45 AM, Bryan Cutler wrote:
> Hi Patrick,
>
> It looks like it's failing in Scala before it
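For readers following along, the spark.sql.shuffle.partitions change mentioned in this message might look roughly like the sketch below. It assumes an already-created PySpark session is passed in; the helper name and the default of 2000 are hypothetical, since the thread does not say which value was used.

```python
def raise_shuffle_partitions(spark, num_partitions=2000):
    """Raise spark.sql.shuffle.partitions on an existing session.

    `spark` is assumed to be a pyspark.sql.SparkSession. The default of
    2000 is only an illustrative value, not the one used in the thread.
    Returns the previous setting so it can be restored later.
    """
    key = "spark.sql.shuffle.partitions"
    previous = spark.conf.get(key)
    # Runtime SQL configs are stored as strings, so pass the value as one.
    spark.conf.set(key, str(num_partitions))
    return previous
```

The setting has to be changed before the grouped operation runs, because it controls how many partitions the shuffle for the groupBy produces.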

Re: Arrow type issue with Pandas UDF

2018-07-19 Thread Bryan Cutler

Hi Patrick,

It looks like it's failing in Scala before it even gets to Python to execute your udf, which is why it doesn't seem to matter what's in your udf. Since you are doing a grouped map udf maybe your group sizes are too big or skewed? Could you try to reduce the size of your groups by
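One way to check whether group sizes are too big or skewed is to count rows per grouping key and compare the largest group with the average. The sketch below does this in plain Python on a hypothetical list of keys; the function name and sample data are illustrative only. In PySpark the analogous check would be a groupBy on the grouping column followed by count().

```python
from collections import Counter


def group_size_summary(keys):
    """Summarize group sizes for an iterable of grouping keys.

    Returns (number_of_groups, largest_group_size, mean_group_size).
    A largest group far above the mean suggests skewed groups.
    """
    sizes = Counter(keys)
    largest = max(sizes.values())
    mean = sum(sizes.values()) / len(sizes)
    return len(sizes), largest, mean


# Hypothetical sample: one key dominates the data.
sample_keys = ["a"] * 8 + ["b", "c"]
print(group_size_summary(sample_keys))
```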

Arrow type issue with Pandas UDF

2018-07-19 Thread Patrick McCarthy

PySpark 2.3.1 on YARN, Python 3.6, PyArrow 0.8. I'm trying to run a pandas UDF, but I seem to get nonsensical exceptions in the last stage of the job regardless of my output type. The problem I'm trying to solve: I have a column of scalar values, and each value on the same row has a sorted

4 matches