[ https://issues.apache.org/jira/browse/SPARK-42537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692617#comment-17692617 ]
Steve Loughran commented on SPARK-42537:
----------------------------------------

FYI +[~dannycjones]. I'm getting build issues compiling Spark against the Hadoop 3.3.5 RC1 because Spark's jackson-cbor Maven download is playing up, *even though it's not been needed for years*.

> Remove obsolete/superfluous imports in spark-hadoop-cloud module
> ----------------------------------------------------------------
>
> Key: SPARK-42537
> URL: https://issues.apache.org/jira/browse/SPARK-42537
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.4.0
> Reporter: Steve Loughran
> Priority: Minor
>
> The explicit imports into hadoop-cloud are obsolete:
> * the hadoop-cloud-storage POM is a cut-down export of the bindings to the various cloud stores in their hadoop-* modules
> * it's been shipping since Hadoop 2.10
> * it's grown to include COS and Aliyun support
> * it's fairly well tested
> * it actually drops support for connectors when they are withdrawn (hadoop-openstack). Hadoop 3.3.5 has done this, leaving a stub JAR there just to avoid breaking downstream builds like Spark's current setup.
>
> hadoop-cloud-storage *should* be all that's needed.
>
> I know that the Spark hadoop-2 profile still references the long-unsupported Hadoop 2.7.x releases, but if you are using those releases then really you aren't going to talk to cloud infra:
> * no ABFS connector
> * an s3n connector which won't authenticate with any of the AWS regions launched in the past 5-8 years
> * the GCS connector won't work (it's Java 11+; Hadoop 3.2.x is the minimum for Java 11 clients)
> * none of the new Chinese cloud services
> * a very outdated s3a connector
> * an s3a connector using an unshaded AWS client, which is unlikely to work with versions of Jackson or httpclient written in the last 5 years, and has trouble on Java 8, etc.
>
> Proposed:
> * hadoop-2 profile to be the minimal hadoop-aws and hadoop-azure dependencies in the code today. Cutting to the empty set would be better, but a bit more radical.
> * hadoop-3 profile to pull in hadoop-cloud-storage (excluding the AWS SDK, as today), *and nothing else*.
>
> This will simplify everyone's life, as there are fewer dependencies to reconcile.
>
> See also SPARK-39969, proposing making the hadoop-aws version of the aws-sdk-bundle the normative one, as it is now newer than the spark-kinesis import and more broadly tested.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
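A minimal sketch of what the proposed hadoop-3 profile could look like in a Maven pom.xml, assuming the profile keeps the id `hadoop-3` and that a `hadoop.version` property is defined elsewhere in the build. The excluded coordinates `com.amazonaws:aws-java-sdk-bundle` are an assumption for illustration; this is not Spark's actual hadoop-cloud POM. The idea is that a single dependency on `org.apache.hadoop:hadoop-cloud-storage` replaces the explicit per-store imports, and one exclusion keeps the AWS SDK out.

```xml
<!-- Hypothetical sketch, not Spark's real POM: the hadoop-3 profile
     depends on hadoop-cloud-storage and nothing else. -->
<profile>
  <id>hadoop-3</id>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-cloud-storage</artifactId>
      <!-- Assumes a hadoop.version property defined in a parent POM. -->
      <version>${hadoop.version}</version>
      <exclusions>
        <!-- Keep the AWS SDK out, as today; the exact artifact
             excluded here is an assumption. -->
        <exclusion>
          <groupId>com.amazonaws</groupId>
          <artifactId>aws-java-sdk-bundle</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
  </dependencies>
</profile>
```

Maven applies an `<exclusion>` to the whole transitive graph below the dependency that declares it, so the SDK bundle that hadoop-aws would otherwise bring in through hadoop-cloud-storage is dropped as well.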