[ https://issues.apache.org/jira/browse/SPARK-33425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230036#comment-17230036 ]
Apache Spark commented on SPARK-33425:
--------------------------------------

User 'pprzetacznik' has created a pull request for this issue:
https://github.com/apache/spark/pull/30337

> Credentials are not passed in the `doFetchFile` when running spark-submit
> with https url
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-33425
>                 URL: https://issues.apache.org/jira/browse/SPARK-33425
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 3.0.1
>            Reporter: Piotr Przetacznik
>            Priority: Minor
>
> I'm running spark-submit with an https URL that contains a username and
> password. The documentation
> (https://spark.apache.org/docs/latest/submitting-applications.html) says:
> {quote}(Note that credentials for password-protected repositories can be
> supplied in some cases in the repository URI, such as in
> {{https://user:password@host/...}}. Be careful when supplying credentials
> this way.)
> {quote}
> However, when using that, I receive the following error:
> {code:java}
> INFO - 20/11/11 12:59:25 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> INFO - Exception in thread "main" java.io.IOException: Server returned HTTP response code: 401 for URL: https://username:*****@host.com/my_app/pipeline.jar
> INFO - at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1924)
> INFO - at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1520)
> INFO - at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:250)
> INFO - at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:729)
> INFO - at org.apache.spark.deploy.DependencyUtils$.downloadFile(DependencyUtils.scala:138)
> INFO - at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$8(SparkSubmit.scala:376)
> INFO - at scala.Option.map(Option.scala:230)
> INFO - at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:376)
> INFO - at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> INFO - at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> INFO - at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> INFO - at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> INFO - at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> INFO - at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> INFO - at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}
>
> When I download the file manually with wget, the first request gets a 401,
> but wget then retries with credentials:
> {code:java}
> HTTP request sent, awaiting response... 401 Unauthorized
> Authentication selected: Basic realm="Restricted"
> Reusing existing connection to host.com:443.
> HTTP request sent, awaiting response... 200 OK
> {code}
> When I use `--auth-no-challenge` with wget, the credentials are sent
> directly in the first request and I receive 200 OK.
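>
> Judging from the stack trace, Utils.doFetchFile opens the URL through plain
> java.net.HttpURLConnection, which ignores the user:password part of the URL
> entirely. As a rough sketch of the kind of fix I have in mind (this is not
> the patch in the pull request above; the helper openWithUserInfo and the
> example URL are illustrative only), the URL's userinfo could be attached as
> a preemptive Basic auth header:
> {code:scala}
> import java.net.URL
> import java.nio.charset.StandardCharsets
> import java.util.Base64
>
> // Sketch only: attach the URL's userinfo ("user:password") as a
> // preemptive Basic Authorization header, so the very first request is
> // already authenticated. Percent-encoded userinfo would need decoding
> // first; that is omitted here for brevity.
> def openWithUserInfo(url: URL): java.net.URLConnection = {
>   val conn = url.openConnection()
>   Option(url.getUserInfo).foreach { userInfo =>
>     val encoded = Base64.getEncoder
>       .encodeToString(userInfo.getBytes(StandardCharsets.UTF_8))
>     conn.setRequestProperty("Authorization", s"Basic $encoded")
>   }
>   conn
> }
>
> // With this, the server sees "Authorization: Basic ..." immediately
> // instead of answering an unauthenticated request with 401.
> val in = openWithUserInfo(
>   new URL("https://user:password@host.com/my_app/pipeline.jar")).getInputStream
> {code}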
> The problem with the first wget run is that it tries to download the file
> without credentials; only after the 401 is it challenged to supply them, so
> the download happens in two steps. My issue looks similar: the credentials
> are never passed in the first request, but unlike wget, Spark does not
> retry with them after the challenge.
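>
> For comparison, HttpURLConnection can follow the same two-step exchange as
> wget, but only when a java.net.Authenticator supplying the credentials is
> registered; the fatal 401 above suggests nothing supplies them on this code
> path. A small sketch (username, password and URL are placeholders):
> {code:scala}
> import java.net.{Authenticator, PasswordAuthentication, URL}
>
> // Registering a default Authenticator makes HttpURLConnection answer
> // the server's 401 challenge with Basic credentials, i.e. the same
> // two-step exchange wget performs by default.
> Authenticator.setDefault(new Authenticator {
>   override def getPasswordAuthentication: PasswordAuthentication =
>     new PasswordAuthentication("username", "password".toCharArray)
> })
>
> // The first request gets a 401; the retry carries the credentials.
> val in = new URL("https://host.com/my_app/pipeline.jar").openStream()
> {code}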