Steve Loughran created SPARK-42537:
--------------------------------------

             Summary: Remove obsolete/superfluous imports in spark-hadoop-cloud module
                 Key: SPARK-42537
                 URL: https://issues.apache.org/jira/browse/SPARK-42537
             Project: Spark
          Issue Type: Improvement
          Components: Build
    Affects Versions: 3.4.0
            Reporter: Steve Loughran
The explicit imports into hadoop-cloud are obsolete:

* the hadoop-cloud-storage pom is a cut-down export of the bindings to the various cloud stores in their hadoop-* modules
* it has been shipping since Hadoop 2.10
* it has grown to include COS and Aliyun support
* it is fairly well tested
* it actually cuts removed support (hadoop-openstack) when a connector is withdrawn. Hadoop 3.3.5 has done this, leaving a stub jar there just to avoid breaking downstream builds like Spark's current setup.

hadoop-cloud-storage *should* be all that is needed.

I know that the Spark hadoop-2 profile still references the (long unsupported) 2.7.x releases, but if you are using those releases then you really aren't going to be talking to cloud infrastructure:

* no abfs connector
* the s3n connector won't authenticate with any of the AWS regions launched in the past 5-8 years
* the gcs connector won't work (it is Java 11+; Hadoop 3.2.x is the minimum for Java 11 clients)
* none of the new Chinese cloud services
* the s3a connector is very outdated
* the s3a connector uses an unshaded AWS client, which is unlikely to work with versions of jackson and httpclient written in the last 5 years, has trouble on Java 8, etc.

Proposed:

* hadoop-2 profile: keep to the minimal hadoop-aws and hadoop-azure dependencies in the code today. Cutting to the empty set would be better, but a bit more radical.
* hadoop-3 profile: pull in hadoop-cloud-storage (excluding the AWS SDK, as today), *and nothing else*

This will simplify everyone's life, as there are fewer dependencies to reconcile.

See also SPARK-39969, which proposes making the hadoop-aws version of the aws-sdk-bundle the normative one, as it is now newer than the spark-kinesis import and more broadly tested.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
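The proposed hadoop-3 profile could be sketched roughly as the Maven fragment below. This is illustrative only: it assumes the Hadoop 3.3.x coordinates, where org.apache.hadoop:hadoop-cloud-storage aggregates the cloud connectors and com.amazonaws:aws-java-sdk-bundle is the AWS SDK artifact that the Spark build already excludes; the property name hadoop.version stands in for whatever the Spark pom actually uses.

```xml
<!-- Sketch only: a hadoop-3 profile pulling in hadoop-cloud-storage and
     nothing else, with the AWS SDK bundle excluded as the build does today.
     Coordinates assume Hadoop 3.3.x; ${hadoop.version} is a placeholder. -->
<profile>
  <id>hadoop-3</id>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-cloud-storage</artifactId>
      <version>${hadoop.version}</version>
      <exclusions>
        <exclusion>
          <groupId>com.amazonaws</groupId>
          <artifactId>aws-java-sdk-bundle</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
  </dependencies>
</profile>
```

With a single aggregate dependency like this, connector additions and removals (such as the hadoop-openstack withdrawal) are tracked upstream in the hadoop-cloud-storage pom rather than hand-maintained in Spark's build.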