Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/23021
  
    > I'd break the pandas udf one into smaller pieces too, as you suggested. 
We should also investigate why the runtime didn't improve ...
    
    One suspicion from my investigation is that it requires stopping and 
starting the context for each test, which is costly. I also expected it to 
slightly decrease the time, but it actually looks like it slightly increased 
the time (I guess 2 ~ 3 mins in total? - shouldn't be a big deal).
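
    To illustrate the suspicion above, here is a minimal sketch using only the standard library. `FakeContext`, `ContextTestCase`, and `count_starts` are hypothetical names made up for this example; they are not Spark or PySpark APIs, and the sketch assumes the context is created once per test class. It only counts how often the context is started when the same eight tests live in one class versus four smaller classes.

```python
import io
import unittest


class FakeContext:
    """Hypothetical stand-in for a context that is expensive to start."""
    starts = 0  # counts how many times a context was created

    def __init__(self):
        FakeContext.starts += 1

    def stop(self):
        pass


class ContextTestCase(unittest.TestCase):
    """Assumed pattern: one context per test class."""

    @classmethod
    def setUpClass(cls):
        cls.ctx = FakeContext()

    @classmethod
    def tearDownClass(cls):
        cls.ctx.stop()


def count_starts(num_classes, tests_per_class):
    """Run num_classes generated test classes and return the number of context starts."""
    FakeContext.starts = 0
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for i in range(num_classes):
        # Each generated test just checks that the class-level context exists.
        methods = {
            "test_%d" % j: (lambda self: self.assertIsNotNone(self.ctx))
            for j in range(tests_per_class)
        }
        case = type("Case%d" % i, (ContextTestCase,), methods)
        suite.addTests(loader.loadTestsFromTestCase(case))
    # Send runner output to a buffer so only the counts are printed.
    unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    return FakeContext.starts


if __name__ == "__main__":
    print("one class, 8 tests:", count_starts(1, 8), "context start(s)")
    print("four classes, 2 tests each:", count_starts(4, 2), "context start(s)")
```

    In this toy setup the number of starts grows with the number of classes, so any per-start cost is paid more often after a split. Whether that fully explains the measured increase is still only a guess.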


