[ https://issues.apache.org/jira/browse/SPARK-25344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16605250#comment-16605250 ]
Imran Rashid commented on SPARK-25344:
--------------------------------------

Kinda related, maybe this should get its own JIRA: when you run the "pyspark-sql" tests, it also somehow runs {{SparkSubmitTests}}, which really should only run in the "pyspark-core" module. For me they take 80s, so it would be nice to eliminate that.

I don't really understand why they get run in that module, but it does seem that if I comment out the import in sql/tests.py, they don't get run that extra time. We can't really do that, as the import is needed for the {{HiveSparkSubmitTests}}. But we should figure out why just importing it makes them run, and whether we can avoid that.

> Break large tests.py files into smaller files
> ---------------------------------------------
>
>                 Key: SPARK-25344
>                 URL: https://issues.apache.org/jira/browse/SPARK-25344
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 2.4.0
>            Reporter: Imran Rashid
>            Priority: Major
>              Labels: newbie
>
> We've got a ton of tests in one humongous tests.py file, rather than breaking
> it out into smaller files.
> Having one huge file doesn't seem great for code organization, and it also
> makes the test parallelization in run-tests.py not work as well. On my
> laptop, tests.py takes 150s, and the next longest test file takes only 20s.
> There are similarly large files in other pyspark modules, e.g. sql/tests.py,
> ml/tests.py, mllib/tests.py, streaming/tests.py.
> It seems that at least some of these files are already broken into
> independent test classes, so it shouldn't be too hard to just move them into
> their own files.
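
On the question in the comment of why merely importing {{SparkSubmitTests}} makes them run: one plausible explanation, which I have not verified against the Spark test runner, is that unittest's default loader collects every TestCase subclass it finds in a module's namespace, no matter where the class was defined. The sketch below only demonstrates that loader behavior. All names in it (SubmitTests, HiveSubmitTests, fake_core, fake_sql_*) are hypothetical stand-ins, and it assumes the import is needed because the Hive tests subclass the imported class.

```python
import types
import unittest


class SubmitTests(unittest.TestCase):
    """Hypothetical stand-in for a test class owned by the core module."""

    def test_submit(self):
        self.assertTrue(True)


class HiveSubmitTests(SubmitTests):
    """Hypothetical stand-in for a sql-module test class that reuses the base class."""

    def test_hive(self):
        self.assertTrue(True)


def make_module(name, **namespace):
    """Build an in-memory module with the given names in its namespace."""
    module = types.ModuleType(name)
    module.__dict__.update(namespace)
    return module


def collected(module):
    """Return (sorted TestCase class names, total test count) found by the default loader."""
    suite = unittest.defaultTestLoader.loadTestsFromModule(module)
    names = sorted({type(test).__name__ for class_suite in suite for test in class_suite})
    return names, suite.countTestCases()


fake_core = make_module("fake_core", SubmitTests=SubmitTests)

# Like `from fake_core import SubmitTests`: the class itself is a module global.
sql_class_import = make_module(
    "fake_sql_class_import", SubmitTests=SubmitTests, HiveSubmitTests=HiveSubmitTests
)

# Like `import fake_core`: only the module object is a module global.
sql_module_import = make_module(
    "fake_sql_module_import", fake_core=fake_core, HiveSubmitTests=HiveSubmitTests
)

print(collected(sql_class_import))   # (['HiveSubmitTests', 'SubmitTests'], 3)
print(collected(sql_module_import))  # (['HiveSubmitTests'], 2)
```

If this is what is happening, importing the module instead of the class (or deleting the imported name once the subclass is defined) would keep the base class out of the sql module's collection. Note that the subclass still inherits and runs the base class's test methods in both cases, so this removes only one of the extra runs.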