GitHub user jinxing64 opened a pull request:

    https://github.com/apache/spark/pull/20812

    [SPARK-23669] Executors fetch jars and name the jars with md5 prefix

    ## What changes were proposed in this pull request?
    
    In our cluster there are many UDF jars, and some of them have the same 
filename but different paths, for example:
    ```
    hdfs://A/B/udf.jar  -> udfA
    hdfs://C/D/udf.jar  -> udfB
    ```
    When a user uses udfA and udfB in the same SQL query, the executor fetches 
both `hdfs://A/B/udf.jar` and `hdfs://C/D/udf.jar` to its local directory. 
Because both files are named `udf.jar`, they conflict.
    
    Can we add a configuration option to fetch jars and save them under a 
filename with an MD5 prefix, so that there is no conflict?
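
To illustrate the naming idea only, here is a minimal Python sketch; it is not the code in this patch. It assumes the MD5 is computed over the full jar URI string (the description does not say whether the URI or the file content is hashed), and `md5_prefixed_name` is a hypothetical helper name.

```python
import hashlib
import posixpath


def md5_prefixed_name(uri: str) -> str:
    """Build a local file name: MD5 hex digest of the full URI, then the base name.

    Assumption: the digest is taken over the URI string, not the jar contents.
    """
    digest = hashlib.md5(uri.encode("utf-8")).hexdigest()
    base_name = posixpath.basename(uri)
    return f"{digest}-{base_name}"


# The two jars from the example share a base name but live at different paths.
uris = ["hdfs://A/B/udf.jar", "hdfs://C/D/udf.jar"]
local_names = [md5_prefixed_name(u) for u in uris]

for uri, name in zip(uris, local_names):
    print(uri, "->", name)
```

Because the two URIs differ, their digests differ, so both jars can sit in the same local directory while keeping the readable `udf.jar` suffix.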
    
    ## How was this patch tested?
    Unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jinxing64/spark SPARK-23669

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20812.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20812
    
----
commit 5791edb4d325f24be63485032bf01125cc2aa28b
Author: jinxing <jinxing6042@...>
Date:   2018-03-13T14:15:56Z

    [SPARK-23669] Executors fetch jars and name the jars with md5 prefix

----


---

