[jira] [Commented] (SPARK-42537) Remove obsolete/superfluous imports in spark-hadoop-cloud module

Steve Loughran (Jira) Thu, 23 Feb 2023 02:21:10 -0800


    [ 
https://issues.apache.org/jira/browse/SPARK-42537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692617#comment-17692617
 ]


Steve Loughran commented on SPARK-42537:
----------------------------------------

FYI +[~dannycjones].
I'm hitting build issues when compiling Spark against the Hadoop 3.3.5 
RC1 because Spark's jackson-cbor Maven download is playing up, *even though it 
hasn't been needed for years*

> Remove obsolete/superfluous imports in spark-hadoop-cloud module
> ----------------------------------------------------------------
>
>                 Key: SPARK-42537
>                 URL: https://issues.apache.org/jira/browse/SPARK-42537
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 3.4.0
>            Reporter: Steve Loughran
>            Priority: Minor
>
> The explicit imports into hadoop-cloud are obsolete:
> * the hadoop-cloud-storage pom is a cut-down export of the bindings to the 
> various cloud stores in their hadoop-* modules
> * it's been shipping since hadoop 2.10
> * it's grown to include cos and aliyun support
> * it's fairly well tested
> * it actually drops support for modules once they are withdrawn 
> (hadoop-openstack). Hadoop 3.3.5 has done this, leaving a stub jar there just 
> to avoid breaking downstream builds like spark's current setup.
> hadoop-cloud-storage *should* be all that's needed.
> I know that the spark hadoop-2 profile still references the (long unsupported) 
> 2.7.x line, but if you are using those releases then you really aren't going 
> to talk to cloud infra:
> * no abfs connector
> * the s3n connector won't authenticate with any of the aws regions launched 
> in the past 5-8 years
> * the gcs connector won't work (it's java11+; hadoop 3.2.x is the minimum for 
> java11 clients)
> * none of the new Chinese cloud services
> * the s3a connector is very outdated
> * the s3a connector uses an unshaded aws client which is unlikely to work with 
> versions of jackson or httpclient written in the last 5 years, has trouble on 
> java8 etc.
> Proposed:
> * hadoop-2 profile to keep only the minimal hadoop-aws and hadoop-azure 
> dependencies in the code today. Cutting to the empty set would be better, but 
> a bit more radical
> * hadoop-3 profile to pull in hadoop-cloud-storage (excluding aws sdk as 
> today), *and nothing else*
> This will simplify everyone's life as there are fewer dependencies to 
> reconcile. 
> See also SPARK-39969, which proposes making the hadoop-aws version of the 
> aws-sdk-bundle the normative one, as it is now newer than the spark-kinesis 
> import and more broadly tested
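
For illustration only, the proposed hadoop-3 profile might look roughly like the fragment below. This is a sketch, not the actual patch: the profile id, the ${hadoop.version} property, and the exact coordinates of the excluded AWS SDK artifact are assumptions and would need checking against the real spark-hadoop-cloud pom.

```xml
<!-- Hypothetical sketch of the proposed hadoop-3 profile. -->
<profile>
  <id>hadoop-3</id>
  <dependencies>
    <!-- Single aggregate dependency replacing the explicit per-store imports. -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-cloud-storage</artifactId>
      <!-- Assumed property name; use whatever the parent pom defines. -->
      <version>${hadoop.version}</version>
      <exclusions>
        <!-- "Excluding aws sdk as today": assumed coordinates of the SDK bundle. -->
        <exclusion>
          <groupId>com.amazonaws</groupId>
          <artifactId>aws-java-sdk-bundle</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
  </dependencies>
</profile>
```

With a layout like this, connectors withdrawn upstream (such as hadoop-openstack) would drop out of the build without any change on the spark side.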



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
