[ 
https://issues.apache.org/jira/browse/SPARK-42425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17699248#comment-17699248
 ] 

Arseniy Tashoyan commented on SPARK-42425:
------------------------------------------

The doc says to declare this dependency as provided, which assumes the jar is 
bundled in the Spark distribution. Either the doc is wrong or the distribution 
is missing the library.
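
For reference, the setup in question would look roughly like this in sbt. This 
is only a sketch: the sbt syntax, the 3.3.1 version (taken from the affected 
version of this issue) and the artifact name as written here are assumptions, 
not copied from the doc.

```scala
// build.sbt (sketch, not taken verbatim from the Spark docs)
// Assumption: the artifact is published as org.apache.spark:spark-hadoop-cloud
// with a Scala binary suffix, which %% appends from scalaVersion.
// "Provided" keeps the jar out of the application assembly and expects the
// Spark distribution to supply it at runtime, which is what this issue disputes.
libraryDependencies += "org.apache.spark" %% "spark-hadoop-cloud" % "3.3.1" % Provided
```

With this scope the application compiles, but the classes are only available 
at runtime if the distribution actually ships the jar.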

> spark-hadoop-cloud is not provided in the default Spark distribution
> --------------------------------------------------------------------
>
>                 Key: SPARK-42425
>                 URL: https://issues.apache.org/jira/browse/SPARK-42425
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 3.3.1
>            Reporter: Arseniy Tashoyan
>            Priority: Major
>
> The spark-hadoop-cloud library is absent from the default Spark distribution 
> (as are its dependencies, such as hadoop-aws). Therefore the dependency 
> management section described in [Integration with Cloud 
> Infrastructures|https://spark.apache.org/docs/3.3.1/cloud-integration.html#installation]
>  is invalid: the libraries for cloud integration are not actually provided.
> A naive workaround would be to add the spark-hadoop-cloud library as a 
> compile-scope dependency. However, this does not work because of the Spark 
> classpath hierarchy: the Spark system classloader does not see classes loaded 
> by the application classloader.
> A proper fix would therefore be to enable the hadoop-cloud build profile by 
> default: -Phadoop-cloud



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
