Manos Tsagkias created SPARK-32134:
--------------------------------------

             Summary: YARN: archives rename with # doesn't work for https
                 Key: SPARK-32134
                 URL: https://issues.apache.org/jira/browse/SPARK-32134
             Project: Spark
          Issue Type: Bug
          Components: YARN
    Affects Versions: 2.3.0
            Reporter: Manos Tsagkias


This is related to SPARK-10858.

The YARN distributed cache lets you rename an archive passed via --archives 
by appending # and a link name, but this does not work with the http(s) scheme:


{{--archives http://mirror.sfo12.us.leaseweb.net/centos/6.10/isos/i386/sha1sum.txt#sha1sum}}

This is because URLs can have fragments, so the # and the text after it are 
interpreted as the URL's fragment instead of as the link name. We could use the 
same trick we already use for the other two schemes, file:// and hdfs://: first 
remove the last fragment, then parse the URL, and finally reattach the 
fragment. The [code 
exists|https://github.com/apache/spark/pull/9035/files] but it is not applied 
to http(s) URLs.
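
To illustrate the strip, parse, and reattach idea, here is a minimal sketch in 
Python. It is not the Spark code from the linked pull request; the function 
name parse_archive_spec and its return shape are hypothetical and were chosen 
only for this example.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def parse_archive_spec(spec):
    """Hypothetical helper: split 'uri#name' into its parts.

    Returns (uri_without_fragment, link_name, uri_with_fragment).
    """
    # Step 1: remove the last fragment, i.e. cut at the final '#'.
    base, sep, fragment = spec.rpartition("#")
    if not sep:
        # No '#' present: the whole string is the URI and there is no rename.
        base, fragment = spec, ""

    # Step 2: parse the URI now that the rename suffix is out of the way.
    parts = urlsplit(base)

    # The link name is the fragment if one was given, else the file name.
    link_name = fragment or posixpath.basename(parts.path)

    # Step 3: reattach the fragment to the parsed URI.
    reattached = urlunsplit(parts._replace(fragment=fragment))
    return base, link_name, reattached


if __name__ == "__main__":
    spec = ("http://mirror.sfo12.us.leaseweb.net/centos/6.10/isos/i386/"
            "sha1sum.txt#sha1sum")
    print(parse_archive_spec(spec))
```

With the URL from this report, the sketch would fetch the URI without the 
fragment and use sha1sum as the link name, which is the behaviour already 
described for file:// and hdfs://.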

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
