Hyukjin Kwon created SPARK-28536:
------------------------------------

             Summary: Reduce shuffle partitions in Python UDF tests in SQLQueryTestSuite
                 Key: SPARK-28536
                 URL: https://issues.apache.org/jira/browse/SPARK-28536
             Project: Spark
          Issue Type: Test
          Components: PySpark, SQL
    Affects Versions: 3.0.0
            Reporter: Hyukjin Kwon

Currently, some SQL tests with Python UDFs take a long time. On my local machine:

{code:java}
[info] SQLQueryTestSuite:
[info] - udf/udf-window.sql - Scala UDF (58 seconds, 558 milliseconds)
[info] - udf/udf-window.sql - Regular Python UDF (58 seconds, 371 milliseconds)
[info] - udf/udf-window.sql - Scalar Pandas UDF (1 minute, 8 seconds)
{code}

and they currently take up to 9 minutes in Jenkins.

In Python UDF tests, the number of shuffle partitions matters considerably for test time, because running them requires forking external processes and communicating with them. We should reduce the number of shuffle partitions.
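
The cost described above can be illustrated with a toy sketch in plain Python. It is not Spark code and does not reproduce how Spark actually manages its Python workers. It only assumes, for illustration, that each partition is handed to its own external process over stdin/stdout. The names `WORKER_SOURCE`, `split_into_partitions`, and `run_with_external_workers` are hypothetical. The point is that the result is the same for any partition count, while the number of external processes started grows with it.

```python
import subprocess
import sys

# Hypothetical stand-in for a Python UDF worker: reads one integer per line
# from stdin and prints the sum of their squares.
WORKER_SOURCE = (
    "import sys\n"
    "print(sum(int(line) ** 2 for line in sys.stdin if line.strip()))\n"
)


def split_into_partitions(rows, num_partitions):
    """Distribute rows round-robin into num_partitions lists (some may be empty)."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions


def run_with_external_workers(rows, num_partitions):
    """Process every partition in its own external process.

    Returns (total, processes_started). This is a toy model only: one new
    process per partition is an assumption made for illustration.
    """
    total = 0
    processes_started = 0
    for partition in split_into_partitions(rows, num_partitions):
        payload = "\n".join(str(r) for r in partition)
        completed = subprocess.run(
            [sys.executable, "-c", WORKER_SOURCE],
            input=payload,
            capture_output=True,
            text=True,
            check=True,
        )
        processes_started += 1
        total += int(completed.stdout.strip())
    return total, processes_started


if __name__ == "__main__":
    rows = list(range(10))
    for n in (8, 2):
        total, started = run_with_external_workers(rows, n)
        print(f"partitions={n} processes={started} total={total}")
```

In Spark SQL the partition count after a shuffle is controlled by the `spark.sql.shuffle.partitions` setting, so for small test inputs a lower value should mean fewer tasks that each need to talk to an external Python process.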