Hyukjin Kwon created SPARK-28536:
------------------------------------

             Summary: Reduce shuffle partitions in Python UDF tests in SQLQueryTestSuite
                 Key: SPARK-28536
                 URL: https://issues.apache.org/jira/browse/SPARK-28536
             Project: Spark
          Issue Type: Test
          Components: PySpark, SQL
    Affects Versions: 3.0.0
            Reporter: Hyukjin Kwon


Currently, some SQL tests with Python UDFs take a long time.

On my local machine:


{code:java}
[info] SQLQueryTestSuite:
[info] - udf/udf-window.sql - Scala UDF (58 seconds, 558 milliseconds)
[info] - udf/udf-window.sql - Regular Python UDF (58 seconds, 371 milliseconds)
[info] - udf/udf-window.sql - Scalar Pandas UDF (1 minute, 8 seconds){code}

On Jenkins, these tests currently take up to 9 minutes.



In Python UDF tests, the number of shuffle partitions considerably affects the
test time, because each partition requires forking and communicating with
external Python processes. We should reduce the number of shuffle partitions in
these tests.
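To illustrate why the partition count matters, here is a minimal, self-contained sketch in plain Python (no Spark involved). It models a Python UDF by launching one external Python process per partition and exchanging rows with it over stdin/stdout as JSON. The worker script and the helpers `split_into_partitions` and `run_udf_per_partition` are hypothetical; Spark's real Python worker management is more involved (for example, workers can be reused), so this only shows how per-partition process overhead scales with the number of partitions.

```python
import json
import subprocess
import sys
import time

# Hypothetical external "worker": reads a JSON list of rows from stdin,
# applies a toy UDF (x + 1), and writes the results back as JSON on stdout.
WORKER_SRC = (
    "import sys, json\n"
    "rows = json.load(sys.stdin)\n"
    "json.dump([x + 1 for x in rows], sys.stdout)\n"
)


def split_into_partitions(rows, num_partitions):
    """Distribute rows round-robin into num_partitions lists (some may be empty)."""
    parts = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        parts[i % num_partitions].append(row)
    return parts


def run_udf_per_partition(rows, num_partitions):
    """Apply the toy UDF by launching one external Python process per partition.

    Returns (results, processes_launched). Results are grouped by partition,
    not in the original input order.
    """
    results = []
    launched = 0
    for part in split_into_partitions(rows, num_partitions):
        # Every partition, even an empty one, pays the cost of starting a
        # process and serializing rows to and from it.
        proc = subprocess.run(
            [sys.executable, "-c", WORKER_SRC],
            input=json.dumps(part),
            capture_output=True,
            text=True,
            check=True,
        )
        launched += 1
        results.extend(json.loads(proc.stdout))
    return results, launched


if __name__ == "__main__":
    data = list(range(1000))
    for n in (20, 4):
        start = time.perf_counter()
        _, launched = run_udf_per_partition(data, n)
        elapsed = time.perf_counter() - start
        print(f"{n} partitions: {launched} processes, {elapsed:.2f}s")
```

With the same data, the smaller partition count launches far fewer processes, so the fixed per-process cost shrinks proportionally. For comparison, Spark's default `spark.sql.shuffle.partitions` is 200.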



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
