GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/16266

    [SPARK-18842][TESTS][LAUNCHER] De-duplicate paths in classpaths in 
processes for local-cluster mode to work around the path length limitation on 
Windows

    ## What changes were proposed in this pull request?
    
    Currently, some tests fail or hang on Windows because of the command 
length limitation described below. For the reason given in SPARK-18718, some 
tests that use `local-cluster` mode were disabled on Windows because the paths 
passed as classpaths exceed that limit.
    
    The limit appears to be roughly 32K characters (see the [blog in 
MS](https://blogs.msdn.microsoft.com/oldnewthing/20031210-00/?p=41553/) and 
[another 
reference](https://support.thoughtworks.com/hc/en-us/articles/213248526-Getting-around-maximum-command-line-length-is-32767-characters-on-Windows)),
 but in `local-cluster` mode (in tests only), executors are launched as 
processes with a command such as the one 
[here](https://gist.github.com/HyukjinKwon/5bc81061c250d4af5a180869b59d42ea).
    
    That command is roughly 40K characters long because of the classpath 
passed to the `java` command. However, more than half of it appears to consist 
of duplicated paths. If we de-duplicate the paths, the command shrinks to 
roughly 20K characters, as shown 
[here](https://gist.github.com/HyukjinKwon/dad0c8db897e5e094684a2dc6a417790).
    
    We may need to revisit this if more paths are added in the future, but for 
now it seems better than disabling all of these tests, and the change is 
minimal.
    
    Therefore, this PR proposes to de-duplicate the paths in classpaths when 
launching executors as processes in `local-cluster` mode.
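
    As a rough illustration of the idea only (not the code changed in this 
PR), order-preserving de-duplication of a classpath string could look like the 
sketch below. The class `ClasspathDedup` and its methods are hypothetical 
names, and the 32767 figure is taken from the reference linked above.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Spark's launcher.
final class ClasspathDedup {
    // Command-line length limit on Windows, per the reference cited above.
    static final int WINDOWS_MAX_COMMAND_LENGTH = 32767;

    // Drops repeated entries and keeps the first occurrence of each one,
    // so the relative order of the remaining entries is unchanged.
    static String dedupe(String classpath, String separator) {
        Set<String> unique = new LinkedHashSet<>();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            if (!entry.isEmpty()) {
                unique.add(entry);
            }
        }
        return String.join(separator, unique);
    }

    // Returns true if the full command would fit within the Windows limit.
    static boolean fitsOnWindows(String command) {
        return command.length() <= WINDOWS_MAX_COMMAND_LENGTH;
    }
}
```

    A `LinkedHashSet` is used here because classpath order affects which copy 
of a class gets loaded, so keeping the first occurrence of each entry leaves 
the effective lookup order the same.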
    
    
    ## How was this patch tested?
    
    Ran the existing tests in `ShuffleSuite` and `BroadcastJoinSuite` manually 
via AppVeyor.
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark disable-local-cluster-tests

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16266.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16266
    
----
commit 41752b8c8b84552c01a78591e81cc89a25af6ec5
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-12-13T13:35:42Z

    Deduplicate paths in classpath to workaround length limitation on Windows

----


