Re: Pandas UDF for PySpark error. Big Dataset

2018-05-29 Thread Bryan Cutler

Can you share some of the code used, or at least the pandas_udf plus the stack trace? Also, does decreasing your dataset size fix the OOM?

On Mon, May 28, 2018, 4:22 PM Traku traku wrote:

> Hi.
>
> I'm trying to use the new feature but I can't use it with a big dataset
> (about 5 million rows).

Pandas UDF for PySpark error. Big Dataset

2018-05-28 Thread Traku traku

Hi.

I'm trying to use the new feature but I can't use it with a big dataset (about 5 million rows). I tried increasing executor memory, driver memory, and the number of partitions, but none of these solved the problem. One of the executor tasks keeps increasing its shuffle memory until it fails. Error is