GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/19934

    [SPARK-3685][CORE] Prints explicit warnings when configured local 
directories are set to URIs

    ## What changes were proposed in this pull request?
    
    This PR proposes to print warnings before creating local directories via `java.io.File`.
    
    I don't think we can simply disallow such cases (e.g. `hdfs:/tmp/foo`) and 
throw an exception, because that might break compatibility. Note that 
`hdfs:/tmp/foo` currently creates a local directory called `hdfs:/`.
    
    There has been much discussion about whether we should support this for 
other file systems or not; however, since the JIRA targets "Spark's local dir 
should accept only local paths", I simply print warnings here.
    
    I think we could open a separate JIRA and write a design doc if this is 
something we should support.
    
    **Before**
    
    ```
    ./bin/spark-shell --conf spark.local.dir=file:/a/b/c
    ```
    
    This creates a local directory as below:
    
    ```
     file:/
    └── a
        └── b
            └── c
            ...
    ```
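    
    To illustrate why this happens, here is a minimal sketch (not part of this 
patch; the class `UriAsLocalPath` and the helper `topLevelComponent` are 
hypothetical names). It relies only on the fact that `java.io.File` does not 
interpret URI schemes, so `file:/a/b/c` is handled as an ordinary relative path:
    
    ```java
    import java.io.File;

    final class UriAsLocalPath {
        // Walks up the parent chain and returns the first path component
        // that java.io.File sees in the given string.
        static String topLevelComponent(String path) {
            File current = new File(path);
            while (current.getParentFile() != null) {
                current = current.getParentFile();
            }
            return current.getPath();
        }

        public static void main(String[] args) {
            File dir = new File("file:/a/b/c");
            // java.io.File does not parse the "file:" scheme; the string is
            // just a pathname that does not start at the file system root.
            System.out.println("absolute? " + dir.isAbsolute());
            System.out.println("top-level component: " + topLevelComponent("file:/a/b/c"));
        }
    }
    ```
    
    On a Unix-like system this should report the path as relative, with `file:` 
as its first component, which corresponds to the directory tree shown above.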
    
    **After**
    
    ```bash
    ./bin/spark-shell --conf spark.local.dir=file:/a/b/c
    ```
    
    Now, it prints a warning as below:
    
    ```
    ...
    17/12/09 21:58:49 WARN Utils: The configured local directories are not 
expected to be URIs; however, got suspicious values [file:/a/b/c]. Please check 
your configured local directories.
    ...
    ```
    
    It also works with comma-separated values:
    
    ```bash
    ./bin/spark-shell --conf spark.local.dir=file:/a/b/c,/tmp/a/b/c,hdfs:/a/b/c
    ```
    
    ```
    ...
    17/12/09 22:05:01 WARN Utils: The configured local directories are not 
expected to be URIs; however, got suspicious values [file:/a/b/c, hdfs:/a/b/c]. 
Please check your configured local directories.
    ...
    ```
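    
    The kind of check described above could be sketched roughly as follows. 
This is an illustration under stated assumptions, not the code in this patch: 
the class `LocalDirCheck`, its methods, and the scheme-matching regex are all 
hypothetical. The heuristic assumed here treats a value as URI-like when it 
starts with a scheme of at least two characters followed by `:`, so that 
Windows drive letters such as `C:` are not flagged.
    
    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Pattern;

    final class LocalDirCheck {
        // Assumed heuristic: a scheme of two or more characters followed by ':'.
        // Requiring two characters keeps drive letters such as "C:" out.
        private static final Pattern URI_LIKE =
            Pattern.compile("^[A-Za-z][A-Za-z0-9+.-]+:");

        // Splits a comma-separated setting and keeps the URI-like entries.
        static List<String> suspiciousDirs(String confValue) {
            List<String> suspicious = new ArrayList<>();
            for (String dir : confValue.split(",")) {
                if (URI_LIKE.matcher(dir).find()) {
                    suspicious.add(dir);
                }
            }
            return suspicious;
        }

        // Builds a message in the same shape as the warning shown above.
        static String warningFor(List<String> suspicious) {
            return "The configured local directories are not expected to be URIs; "
                + "however, got suspicious values " + suspicious
                + ". Please check your configured local directories.";
        }

        public static void main(String[] args) {
            List<String> found = suspiciousDirs("file:/a/b/c,/tmp/a/b/c,hdfs:/a/b/c");
            if (!found.isEmpty()) {
                System.out.println("WARN " + warningFor(found));
            }
        }
    }
    ```
    
    With this heuristic, plain local paths such as `/tmp/a/b/c`, `a/b/c` and 
`C:\a\b\c` are not reported, while `file:/a/b/c` and `hdfs:/a/b/c` are.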
    
    
    ## How was this patch tested?
    
     Manually tested:
    
     ```
     ./bin/spark-shell --conf spark.local.dir=C:\\a\\b\\c
     ./bin/spark-shell --conf spark.local.dir=/tmp/a/b/c
     ./bin/spark-shell --conf spark.local.dir=a/b/c
     ./bin/spark-shell --conf spark.local.dir=a/b/c,/tmp/a/b/c,C:\\a\\b\\c
     ```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-3685

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19934.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19934
    
----
commit 0db4bf1c1b447ce39f790d7c81fc3bb2619e156a
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2017-12-09T13:10:00Z

    Prints explicit warnings when configured local directories are set to URIs

----


