GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/16266
[SPARK-18842][TESTS][LAUNCHER] De-duplicate paths in classpaths in processes for local-cluster mode to work around the path length limitation on Windows

## What changes were proposed in this pull request?

Currently, some tests fail and hang on Windows because of this problem. As described in SPARK-18718, some tests using `local-cluster` mode were disabled on Windows because of the length limit on the paths given to classpaths.

The limit seems to be roughly 32K characters (see the [blog in MS](https://blogs.msdn.microsoft.com/oldnewthing/20031210-00/?p=41553/) and [another reference](https://support.thoughtworks.com/hc/en-us/articles/213248526-Getting-around-maximum-command-line-length-is-32767-characters-on-Windows)). In `local-cluster` mode, however, executors were being launched as processes (only in tests) with a command such as [this one](https://gist.github.com/HyukjinKwon/5bc81061c250d4af5a180869b59d42ea). That command is roughly 40K long because of the classpaths given to the `java` command.

More than half of those paths seem to be duplicates. If we de-duplicate them, the command shrinks to roughly 20K, as in [this command](https://gist.github.com/HyukjinKwon/dad0c8db897e5e094684a2dc6a417790).

We may need to revisit this as more paths are added in the future, but for now it seems better than disabling all the tests, and it keeps the changes minimal.

Therefore, this PR proposes to de-duplicate the paths in classpaths when launching executors as processes in `local-cluster` mode.

## How was this patch tested?
Existing tests in `ShuffleSuite` and `BroadcastJoinSuite`, run manually via AppVeyor.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark disable-local-cluster-tests

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16266.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16266

----
commit 41752b8c8b84552c01a78591e81cc89a25af6ec5
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-12-13T13:35:42Z

    Deduplicate paths in classpath to workaround length limitation on Windows

----
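
To illustrate the de-duplication idea described in the PR description, here is a minimal Java sketch. It is not the code from this pull request: the class `ClasspathDedup`, the method `dedupe`, and the sample paths are all hypothetical. It assumes that keeping the first occurrence of each entry is enough to preserve class lookup order, and it compares entries as plain strings without normalising case or resolving relative paths.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Spark.
public class ClasspathDedup {

    // Removes repeated classpath entries while keeping the first
    // occurrence of each one, so the relative lookup order is unchanged.
    // Empty entries (e.g. from a doubled separator) are skipped.
    static String dedupe(String classpath, String separator) {
        Set<String> seen = new LinkedHashSet<>();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            if (!entry.isEmpty()) {
                seen.add(entry); // a repeated entry is ignored by the set
            }
        }
        return String.join(separator, seen);
    }

    public static void main(String[] args) {
        // Made-up Windows-style entries with duplicates.
        String cp = String.join(";",
            "C:\\spark\\core\\target\\classes",
            "C:\\jars\\a.jar",
            "C:\\spark\\core\\target\\classes",
            "C:\\jars\\a.jar");
        String deduped = dedupe(cp, ";");
        System.out.println("before: " + cp.length() + " chars");
        System.out.println("after:  " + deduped.length() + " chars");
        System.out.println(deduped);
    }
}
```

A `LinkedHashSet` is used here because it drops duplicates and still iterates in insertion order, which matters for a classpath where earlier entries take precedence.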