Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/23021

> I'd break the pandas udf one into smaller pieces too, as you suggested. We should also investigate why the runtime didn't improve ...

One suspicion from my investigation is that the context has to be stopped and started for each test, which is costly. I also expected this to slightly decrease the time, but it actually looks like it slightly increased it (I guess 2 ~ 3 mins in total? That shouldn't be a big deal).