[ 
https://issues.apache.org/jira/browse/SPARK-33425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33425:
------------------------------------

    Assignee:     (was: Apache Spark)

> Credentials are not passed in `doFetchFile` when running spark-submit with 
> an https URL
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-33425
>                 URL: https://issues.apache.org/jira/browse/SPARK-33425
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 3.0.1
>            Reporter: Piotr Przetacznik
>            Priority: Minor
>
> I'm running spark-submit with an https URL that contains a username and 
> password. The documentation 
> ([https://spark.apache.org/docs/latest/submitting-applications.html]) says:
> {quote}(Note that credentials for password-protected repositories can be 
> supplied in some cases in the repository URI, such as in 
> {{https://user:password@host/...}}. Be careful when supplying credentials 
> this way.)
> {quote}
> However, when using that, I receive the following error:
>  
> {code:java}
> INFO - 20/11/11 12:59:25 WARN NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> INFO - Exception in thread "main" java.io.IOException: Server returned HTTP 
> response code: 401 for URL: 
> https://username:*****@host.com/my_app/pipeline.jar
> INFO - at 
> java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1924)
> INFO - at 
> java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1520)
> INFO - at 
> java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:250)
> INFO - at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:729)
> INFO - at 
> org.apache.spark.deploy.DependencyUtils$.downloadFile(DependencyUtils.scala:138)
> INFO - at 
> org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$8(SparkSubmit.scala:376)
> INFO - at scala.Option.map(Option.scala:230)
> INFO - at 
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:376)
> INFO - at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> INFO - at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> INFO - at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> INFO - at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> INFO - at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> INFO - at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> INFO - at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}
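>  
> My understanding (an assumption on my part, I have not traced the Spark 
> source) is that {{java.net.URL.openConnection()}} ignores the 
> {{user:password@}} userinfo part of the URL, so the request made in 
> `doFetchFile` carries no Authorization header and the server answers 401 
> immediately. A minimal sketch of what sending the credentials preemptively 
> could look like (the URL and credentials are placeholders):
> {code:java}
> import java.io.InputStream;
> import java.net.HttpURLConnection;
> import java.net.URL;
> import java.nio.charset.StandardCharsets;
> import java.util.Base64;
>
> public class PreemptiveBasicAuth {
>     public static void main(String[] args) throws Exception {
>         // Placeholder URL, mirroring the spark-submit case above.
>         URL url = new URL("https://username:password@host.com/my_app/pipeline.jar");
>         HttpURLConnection conn = (HttpURLConnection) url.openConnection();
>         // HttpURLConnection does not use the userinfo part on its own, so
>         // build the Basic Authorization header explicitly (this is what
>         // wget --auth-no-challenge puts on the wire in the first request).
>         String userInfo = url.getUserInfo();
>         if (userInfo != null) {
>             String encoded = Base64.getEncoder()
>                 .encodeToString(userInfo.getBytes(StandardCharsets.UTF_8));
>             conn.setRequestProperty("Authorization", "Basic " + encoded);
>         }
>         try (InputStream in = conn.getInputStream()) {
>             System.out.println("HTTP " + conn.getResponseCode());
>         }
>     }
> }
> {code}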
>  
> When I download the file manually with wget, the first request gets a 401, 
> but wget then retries with the credentials:
> {code:java}
> HTTP request sent, awaiting response... 401 Unauthorized
> Authentication selected: Basic realm="Restricted"
> Reusing existing connection to host.com:443.
> HTTP request sent, awaiting response... 200 OK
> {code}
> When I use `--auth-no-challenge` with wget, the credentials are sent 
> directly in the first request and I get 200 OK. Without that flag, wget 
> first tries to download the file without credentials and only sends them 
> after the 401 challenge, so the download takes two steps. My issue looks 
> similar in that the credentials are not passed in the first request.
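>  
> For comparison, the JDK can be made to answer the 401 challenge the way 
> plain wget does, via {{java.net.Authenticator}}. A sketch of that 
> challenge-response variant (placeholder credentials again, not Spark's 
> actual code):
> {code:java}
> import java.net.Authenticator;
> import java.net.PasswordAuthentication;
> import java.net.URL;
>
> public class ChallengeBasicAuth {
>     public static void main(String[] args) throws Exception {
>         // Supply credentials when the server challenges with 401, which is
>         // wget's default two-step behaviour.
>         Authenticator.setDefault(new Authenticator() {
>             @Override
>             protected PasswordAuthentication getPasswordAuthentication() {
>                 return new PasswordAuthentication("username", "password".toCharArray());
>             }
>         });
>         // The retry with credentials happens inside the URL connection handling.
>         new URL("https://host.com/my_app/pipeline.jar").openStream().close();
>     }
> }
> {code}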


