[GitHub] spark pull request #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile wi...

2018-04-03 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/20923#discussion_r178823511
  
--- Diff: pom.xml ---
@@ -2671,6 +2671,15 @@
   
 
 
+
+  hadoop-3
+  
+3.1.0-SNAPSHOT
--- End diff --

RC0 is up for testing right now! @leftnoteasy is managing the release.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile wi...

2018-03-30 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/20923#discussion_r178251635
  
--- Diff: pom.xml ---
@@ -2671,6 +2671,15 @@
   
 
 
+
+  hadoop-3
+  
+3.1.0-SNAPSHOT
--- End diff --

Hey @steveloughran, what is the expected release date for Hadoop 3.1.0?





[GitHub] spark pull request #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile wi...

2018-03-29 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20923#discussion_r178195279
  
--- Diff: hadoop-cloud/pom.xml ---
@@ -141,13 +93,98 @@
   httpcore
   ${hadoop.deps.scope}
 
+
   
 
   
 
+
+
+  hadoop-2.6
+  
+true
--- End diff --

I think that's ok as an initial step. It would be better if you could, in 
profiles, customize independent dependencies (e.g. in the hadoop-3 profile 
exclude some transitive deps), but I'm not sure whether maven would complain 
about something like that.

`jackson-dataformat-cbor` can become interesting if Spark decides to 
upgrade Jackson, since the GitHub page for that project says it's been removed 
in 2.8.
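
A rough sketch of what per-profile exclusions might look like, written as a POM fragment. Maven does accept a `<dependencies>` section inside a `<profile>`. The exclusions shown are taken from the `hadoop-aws` entry quoted in the diff; which of them actually belong in the `hadoop-3` profile is an assumption, not something this thread settles.

```xml
<!-- Sketch only: a profile-scoped dependency carrying its own exclusions. -->
<profile>
  <id>hadoop-3</id>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-aws</artifactId>
      <version>${hadoop.version}</version>
      <scope>${hadoop.deps.scope}</scope>
      <exclusions>
        <!-- Assumed unwanted on Hadoop 3 builds; check against dependency:tree. -->
        <exclusion>
          <groupId>org.codehaus.jackson</groupId>
          <artifactId>jackson-mapper-asl</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.codehaus.jackson</groupId>
          <artifactId>jackson-core-asl</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
  </dependencies>
</profile>
```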





[GitHub] spark pull request #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile wi...

2018-03-29 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/20923#discussion_r178072258
  
--- Diff: hadoop-cloud/pom.xml ---
@@ -177,6 +214,188 @@
 
   
 
+
+
+  org.apache.hadoop
+  hadoop-aws
+  ${hadoop.version}
+  ${hadoop.deps.scope}
+  
+
+  org.apache.hadoop
+  hadoop-common
+
+
+  commons-logging
+  commons-logging
+
+
+  org.codehaus.jackson
+  jackson-mapper-asl
+
+
+  org.codehaus.jackson
+  jackson-core-asl
+
+
+  com.fasterxml.jackson.core
+  jackson-core
+
+
+  com.fasterxml.jackson.core
+  jackson-databind
+
+
+  com.fasterxml.jackson.core
+  jackson-annotations
+
+  
+
+
+  org.apache.hadoop
+  hadoop-openstack
+  ${hadoop.version}
+  ${hadoop.deps.scope}
+  
+
+  org.apache.hadoop
+  hadoop-common
+
+
+  commons-logging
+  commons-logging
+
+
+  junit
+  junit
+
+
+  org.mockito
+  mockito-all
+
+  
+
+
+
+
+  joda-time
+  joda-time
+  ${hadoop.deps.scope}
+
+
+
+  com.fasterxml.jackson.core
+  jackson-databind
+  ${hadoop.deps.scope}
+
+
+  com.fasterxml.jackson.core
+  jackson-annotations
+  ${hadoop.deps.scope}
+
+
+  com.fasterxml.jackson.dataformat
+  jackson-dataformat-cbor
+  ${fasterxml.jackson.version}
+
+
+
+  org.apache.httpcomponents
+  httpclient
+  ${hadoop.deps.scope}
+
+
+
+  org.apache.httpcomponents
+  httpcore
+  ${hadoop.deps.scope}
+
+  
+
+
+
+
+  hadoop-3
+  
+src/hadoop-3/main/scala
+
src/hadoop-3/test/scala
+  
+
+  
+
+  
+  
+org.codehaus.mojo
+build-helper-maven-plugin
+
+  
+add-scala-sources
+generate-sources
+
+  add-source
+
+
+  
+${extra.source.dir}
+  
+
+  
+  
+add-scala-test-sources
+generate-test-sources
+
+  add-test-source
+
+
+  
+${extra.testsource.dir}
+  
+
+  
+
+  
+
+
+  
+  
+
+

[GitHub] spark pull request #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile wi...

2018-03-29 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/20923#discussion_r178060744
  
--- Diff: hadoop-cloud/pom.xml ---
@@ -141,13 +93,98 @@
   httpcore
   ${hadoop.deps.scope}
 
+
   
 
   
 
+
+
+  hadoop-2.6
+  
+true
--- End diff --

Hmmm. There's another option which is to leave all those in the standard 
list, and you get a few extra dependencies which aren't needed for the 3.x line:

```
[INFO] +- com.fasterxml.jackson.core:jackson-databind:jar:2.6.7.1:compile  *
[INFO] |  \- com.fasterxml.jackson.core:jackson-core:jar:2.6.7:compile  *
[INFO] +- com.fasterxml.jackson.core:jackson-annotations:jar:2.6.7:compile  *
[INFO] +- com.fasterxml.jackson.dataformat:jackson-dataformat-cbor:jar:2.6.7:compile  *
[INFO] +- org.apache.httpcomponents:httpclient:jar:4.5.4:compile
[INFO] |  +- commons-logging:commons-logging:jar:1.2:compile
[INFO] |  \- commons-codec:commons-codec:jar:1.10:compile
[INFO] +- org.apache.httpcomponents:httpcore:jar:4.4.8:compile
[INFO] +- org.apache.hadoop:hadoop-aws:jar:3.0.2-SNAPSHOT:compile
[INFO] |  \- com.amazonaws:aws-java-sdk-bundle:jar:1.11.271:compile
[INFO] +- org.apache.hadoop:hadoop-openstack:jar:3.0.2-SNAPSHOT:compile
[INFO] +- joda-time:joda-time:jar:2.9.3:compile  *
[INFO] +- org.apache.hadoop:hadoop-cloud-storage:jar:3.0.2-SNAPSHOT:compile
[INFO] |  +- org.apache.hadoop:hadoop-aliyun:jar:3.0.2-SNAPSHOT:compile
[INFO] |  |  \- com.aliyun.oss:aliyun-sdk-oss:jar:2.8.3:compile
[INFO] |  |     \- org.jdom:jdom:jar:1.1:compile
[INFO] |  +- org.apache.hadoop:hadoop-azure:jar:3.0.2-SNAPSHOT:compile
[INFO] |  |  +- com.microsoft.azure:azure-storage:jar:5.4.0:compile
[INFO] |  |  |  \- com.microsoft.azure:azure-keyvault-core:jar:0.8.0:compile
[INFO] |  |  \- org.eclipse.jetty:jetty-util-ajax:jar:9.3.19.v20170502:compile
[INFO] |  \- org.apache.hadoop:hadoop-azure-datalake:jar:3.0.2-SNAPSHOT:compile
[INFO] |     \- com.microsoft.azure:azure-data-lake-store-sdk:jar:2.2.5:compile
```

`jackson-dataformat-cbor` is the funny one: this is its sole declaration 
within Spark, and with the shaded AWS JAR it's not needed at all.
The rest all make their way into the Spark assembly through other routes.

What do you think? Leave them as the default and not worry about it? It 
would remove the duplication in the 2.7 profile and, apart from the 
extra dependencies on hadoop-3 builds, it's harmless.





[GitHub] spark pull request #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile wi...

2018-03-29 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/20923#discussion_r178057319
  
--- Diff: hadoop-cloud/pom.xml ---
@@ -177,6 +214,188 @@
 
   
 
+
+
+  org.apache.hadoop
+  hadoop-aws
+  ${hadoop.version}
+  ${hadoop.deps.scope}
+  
+
+  org.apache.hadoop
+  hadoop-common
+
+
+  commons-logging
+  commons-logging
+
+
+  org.codehaus.jackson
+  jackson-mapper-asl
+
+
+  org.codehaus.jackson
+  jackson-core-asl
+
+
+  com.fasterxml.jackson.core
+  jackson-core
+
+
+  com.fasterxml.jackson.core
+  jackson-databind
+
+
+  com.fasterxml.jackson.core
+  jackson-annotations
+
+  
+
+
+  org.apache.hadoop
+  hadoop-openstack
+  ${hadoop.version}
+  ${hadoop.deps.scope}
+  
+
+  org.apache.hadoop
+  hadoop-common
+
+
+  commons-logging
+  commons-logging
+
+
+  junit
+  junit
+
+
+  org.mockito
+  mockito-all
+
+  
+
+
+
+
+  joda-time
+  joda-time
+  ${hadoop.deps.scope}
+
+
+
+  com.fasterxml.jackson.core
+  jackson-databind
+  ${hadoop.deps.scope}
+
+
+  com.fasterxml.jackson.core
+  jackson-annotations
+  ${hadoop.deps.scope}
+
+
+  com.fasterxml.jackson.dataformat
+  jackson-dataformat-cbor
+  ${fasterxml.jackson.version}
+
+
+
+  org.apache.httpcomponents
+  httpclient
+  ${hadoop.deps.scope}
+
+
+
+  org.apache.httpcomponents
+  httpcore
+  ${hadoop.deps.scope}
+
+  
+
+
+
+
+  hadoop-3
+  
+src/hadoop-3/main/scala
+
src/hadoop-3/test/scala
+  
+
+  
+
+  
+  
+org.codehaus.mojo
+build-helper-maven-plugin
+
+  
+add-scala-sources
+generate-sources
+
+  add-source
+
+
+  
+${extra.source.dir}
+  
+
+  
+  
+add-scala-test-sources
+generate-test-sources
+
+  add-test-source
+
+
+  
+${extra.testsource.dir}
+  
+
+  
+
+  
+
+
+  
+  
+
+

[GitHub] spark pull request #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile wi...

2018-03-29 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/20923#discussion_r178054506
  
--- Diff: hadoop-cloud/pom.xml ---
@@ -177,6 +214,188 @@
 
   
 
+
+
+  org.apache.hadoop
+  hadoop-aws
+  ${hadoop.version}
+  ${hadoop.deps.scope}
+  
+
+  org.apache.hadoop
+  hadoop-common
+
+
+  commons-logging
+  commons-logging
+
+
+  org.codehaus.jackson
+  jackson-mapper-asl
+
+
+  org.codehaus.jackson
+  jackson-core-asl
+
+
+  com.fasterxml.jackson.core
+  jackson-core
+
+
+  com.fasterxml.jackson.core
+  jackson-databind
+
+
+  com.fasterxml.jackson.core
+  jackson-annotations
+
+  
+
+
+  org.apache.hadoop
+  hadoop-openstack
+  ${hadoop.version}
+  ${hadoop.deps.scope}
+  
+
+  org.apache.hadoop
+  hadoop-common
+
+
+  commons-logging
+  commons-logging
+
+
+  junit
+  junit
+
+
+  org.mockito
+  mockito-all
+
+  
+
+
+
+
+  joda-time
+  joda-time
+  ${hadoop.deps.scope}
+
+
+
+  com.fasterxml.jackson.core
+  jackson-databind
+  ${hadoop.deps.scope}
+
+
+  com.fasterxml.jackson.core
+  jackson-annotations
+  ${hadoop.deps.scope}
+
+
+  com.fasterxml.jackson.dataformat
+  jackson-dataformat-cbor
+  ${fasterxml.jackson.version}
+
+
+
+  org.apache.httpcomponents
+  httpclient
+  ${hadoop.deps.scope}
+
+
+
+  org.apache.httpcomponents
+  httpcore
+  ${hadoop.deps.scope}
+
+  
+
+
+
+
+  hadoop-3
+  
+src/hadoop-3/main/scala
+
src/hadoop-3/test/scala
+  
+
+  
+
+  
+  
+org.codehaus.mojo
+build-helper-maven-plugin
+
+  
+add-scala-sources
+generate-sources
+
+  add-source
+
+
+  
+${extra.source.dir}
+  
+
+  
+  
+add-scala-test-sources
+generate-test-sources
+
+  add-test-source
+
+
+  
+${extra.testsource.dir}
+  
+
+  
+
+  
+
+
--- End diff --

done





[GitHub] spark pull request #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile wi...

2018-03-29 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/20923#discussion_r178054451
  
--- Diff: hadoop-cloud/pom.xml ---
@@ -177,6 +214,188 @@
 
   
 
+
+
+  org.apache.hadoop
+  hadoop-aws
+  ${hadoop.version}
+  ${hadoop.deps.scope}
+  
+
+  org.apache.hadoop
+  hadoop-common
+
+
+  commons-logging
+  commons-logging
+
+
+  org.codehaus.jackson
+  jackson-mapper-asl
+
+
+  org.codehaus.jackson
+  jackson-core-asl
+
+
+  com.fasterxml.jackson.core
+  jackson-core
+
+
+  com.fasterxml.jackson.core
+  jackson-databind
+
+
+  com.fasterxml.jackson.core
+  jackson-annotations
+
+  
+
+
+  org.apache.hadoop
+  hadoop-openstack
+  ${hadoop.version}
+  ${hadoop.deps.scope}
+  
+
+  org.apache.hadoop
+  hadoop-common
+
+
+  commons-logging
+  commons-logging
+
+
+  junit
+  junit
+
+
+  org.mockito
+  mockito-all
+
+  
+
+
+
+
+  joda-time
+  joda-time
+  ${hadoop.deps.scope}
+
+
+
+  com.fasterxml.jackson.core
+  jackson-databind
+  ${hadoop.deps.scope}
+
+
+  com.fasterxml.jackson.core
+  jackson-annotations
+  ${hadoop.deps.scope}
+
+
+  com.fasterxml.jackson.dataformat
+  jackson-dataformat-cbor
+  ${fasterxml.jackson.version}
+
+
+
+  org.apache.httpcomponents
+  httpclient
+  ${hadoop.deps.scope}
+
+
+
+  org.apache.httpcomponents
+  httpcore
+  ${hadoop.deps.scope}
+
+  
+
+
+
+
+  hadoop-3
+  
+src/hadoop-3/main/scala
+
src/hadoop-3/test/scala
+  
+
+  
+
+  
--- End diff --

My bad, a cut-and-paste error. I'll make explicit what it's really doing.





[GitHub] spark pull request #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile wi...

2018-03-28 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20923#discussion_r177853335
  
--- Diff: hadoop-cloud/pom.xml ---
@@ -177,6 +214,188 @@
 
   
 
+
+
+  org.apache.hadoop
+  hadoop-aws
+  ${hadoop.version}
+  ${hadoop.deps.scope}
+  
+
+  org.apache.hadoop
+  hadoop-common
+
+
+  commons-logging
+  commons-logging
+
+
+  org.codehaus.jackson
+  jackson-mapper-asl
+
+
+  org.codehaus.jackson
+  jackson-core-asl
+
+
+  com.fasterxml.jackson.core
+  jackson-core
+
+
+  com.fasterxml.jackson.core
+  jackson-databind
+
+
+  com.fasterxml.jackson.core
+  jackson-annotations
+
+  
+
+
+  org.apache.hadoop
+  hadoop-openstack
+  ${hadoop.version}
+  ${hadoop.deps.scope}
+  
+
+  org.apache.hadoop
+  hadoop-common
+
+
+  commons-logging
+  commons-logging
+
+
+  junit
+  junit
+
+
+  org.mockito
+  mockito-all
+
+  
+
+
+
+
+  joda-time
+  joda-time
+  ${hadoop.deps.scope}
+
+
+
+  com.fasterxml.jackson.core
+  jackson-databind
+  ${hadoop.deps.scope}
+
+
+  com.fasterxml.jackson.core
+  jackson-annotations
+  ${hadoop.deps.scope}
+
+
+  com.fasterxml.jackson.dataformat
+  jackson-dataformat-cbor
+  ${fasterxml.jackson.version}
+
+
+
+  org.apache.httpcomponents
+  httpclient
+  ${hadoop.deps.scope}
+
+
+
+  org.apache.httpcomponents
+  httpcore
+  ${hadoop.deps.scope}
+
+  
+
+
+
+
+  hadoop-3
+  
+src/hadoop-3/main/scala
+
src/hadoop-3/test/scala
+  
+
+  
+
+  
+  
+org.codehaus.mojo
+build-helper-maven-plugin
+
+  
+add-scala-sources
+generate-sources
+
+  add-source
+
+
+  
+${extra.source.dir}
+  
+
+  
+  
+add-scala-test-sources
+generate-test-sources
+
+  add-test-source
+
+
+  
+${extra.testsource.dir}
+  
+
+  
+
+  
+
+
+  
+  
+
+

[GitHub] spark pull request #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile wi...

2018-03-28 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20923#discussion_r177852191
  
--- Diff: hadoop-cloud/pom.xml ---
@@ -177,6 +214,188 @@
 
   
 
+
+
+  org.apache.hadoop
+  hadoop-aws
+  ${hadoop.version}
+  ${hadoop.deps.scope}
+  
+
+  org.apache.hadoop
+  hadoop-common
+
+
+  commons-logging
+  commons-logging
+
+
+  org.codehaus.jackson
+  jackson-mapper-asl
+
+
+  org.codehaus.jackson
+  jackson-core-asl
+
+
+  com.fasterxml.jackson.core
+  jackson-core
+
+
+  com.fasterxml.jackson.core
+  jackson-databind
+
+
+  com.fasterxml.jackson.core
+  jackson-annotations
+
+  
+
+
+  org.apache.hadoop
+  hadoop-openstack
+  ${hadoop.version}
+  ${hadoop.deps.scope}
+  
+
+  org.apache.hadoop
+  hadoop-common
+
+
+  commons-logging
+  commons-logging
+
+
+  junit
+  junit
+
+
+  org.mockito
+  mockito-all
+
+  
+
+
+
+
+  joda-time
+  joda-time
+  ${hadoop.deps.scope}
+
+
+
+  com.fasterxml.jackson.core
+  jackson-databind
+  ${hadoop.deps.scope}
+
+
+  com.fasterxml.jackson.core
+  jackson-annotations
+  ${hadoop.deps.scope}
+
+
+  com.fasterxml.jackson.dataformat
+  jackson-dataformat-cbor
+  ${fasterxml.jackson.version}
+
+
+
+  org.apache.httpcomponents
+  httpclient
+  ${hadoop.deps.scope}
+
+
+
+  org.apache.httpcomponents
+  httpcore
+  ${hadoop.deps.scope}
+
+  
+
+
+
+
+  hadoop-3
+  
+src/hadoop-3/main/scala
+
src/hadoop-3/test/scala
+  
+
+  
+
+  
+  
+org.codehaus.mojo
+build-helper-maven-plugin
+
+  
+add-scala-sources
+generate-sources
+
+  add-source
+
+
+  
+${extra.source.dir}
+  
+
+  
+  
+add-scala-test-sources
+generate-test-sources
+
+  add-test-source
+
+
+  
+${extra.testsource.dir}
+  
+
+  
+
+  
+
+
--- End diff --

nit: remove





[GitHub] spark pull request #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile wi...

2018-03-28 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20923#discussion_r177852057
  
--- Diff: hadoop-cloud/pom.xml ---
@@ -177,6 +214,188 @@
 
   
 
+
+
+  org.apache.hadoop
+  hadoop-aws
+  ${hadoop.version}
+  ${hadoop.deps.scope}
+  
+
+  org.apache.hadoop
+  hadoop-common
+
+
+  commons-logging
+  commons-logging
+
+
+  org.codehaus.jackson
+  jackson-mapper-asl
+
+
+  org.codehaus.jackson
+  jackson-core-asl
+
+
+  com.fasterxml.jackson.core
+  jackson-core
+
+
+  com.fasterxml.jackson.core
+  jackson-databind
+
+
+  com.fasterxml.jackson.core
+  jackson-annotations
+
+  
+
+
+  org.apache.hadoop
+  hadoop-openstack
+  ${hadoop.version}
+  ${hadoop.deps.scope}
+  
+
+  org.apache.hadoop
+  hadoop-common
+
+
+  commons-logging
+  commons-logging
+
+
+  junit
+  junit
+
+
+  org.mockito
+  mockito-all
+
+  
+
+
+
+
+  joda-time
+  joda-time
+  ${hadoop.deps.scope}
+
+
+
+  com.fasterxml.jackson.core
+  jackson-databind
+  ${hadoop.deps.scope}
+
+
+  com.fasterxml.jackson.core
+  jackson-annotations
+  ${hadoop.deps.scope}
+
+
+  com.fasterxml.jackson.dataformat
+  jackson-dataformat-cbor
+  ${fasterxml.jackson.version}
+
+
+
+  org.apache.httpcomponents
+  httpclient
+  ${hadoop.deps.scope}
+
+
+
+  org.apache.httpcomponents
+  httpcore
+  ${hadoop.deps.scope}
+
+  
+
+
+
+
+  hadoop-3
+  
+src/hadoop-3/main/scala
+
src/hadoop-3/test/scala
+  
+
+  
+
+  
--- End diff --

Not really based on the Scala version, right?





[GitHub] spark pull request #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile wi...

2018-03-28 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20923#discussion_r177854961
  
--- Diff: hadoop-cloud/pom.xml ---
@@ -141,13 +93,98 @@
   httpcore
   ${hadoop.deps.scope}
 
+
   
 
   
 
+
+
+  hadoop-2.6
+  
+true
--- End diff --

`activeByDefault` is a little misleading. It only enables the profile if 
you don't explicitly activate any other profiles. 

So if you enable any other profile in the build, this won't be enabled 
automatically. And since the cloud module itself is already under a profile, I 
don't think you can ever trigger this.

Probably will need to be documented in the build docs, or maybe you can 
think of a different solution like enabling the cloud profile via a property 
instead.
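
One possible reading of the property-based alternative, sketched as a POM fragment. The profile id and the property name `hadoop-cloud` are assumptions for illustration; the thread does not say what the existing profile is called. Whether the module's `activeByDefault` profile then stays active would need to be checked against a real build.

```xml
<!-- Sketch only: activate the cloud module by property (-Dhadoop-cloud)
     instead of by explicit profile id (-Phadoop-cloud). -->
<profile>
  <id>hadoop-cloud</id>
  <activation>
    <property>
      <!-- Hypothetical property name; the profile is active whenever it is defined. -->
      <name>hadoop-cloud</name>
    </property>
  </activation>
  <modules>
    <module>hadoop-cloud</module>
  </modules>
</profile>
```

With a fragment like this, the build would be started as `mvn -Dhadoop-cloud package` rather than with `-P`.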





[GitHub] spark pull request #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile wi...

2018-03-28 Thread steveloughran
GitHub user steveloughran opened a pull request:

https://github.com/apache/spark/pull/20923

[SPARK-23807][BUILD][WIP] Add Hadoop 3 profile with relevant POM fix ups, 
cloud-storage artifacts and binding

## What changes were proposed in this pull request?

1. Adds a `hadoop-3` profile build depending on the hadoop-3.1 artifacts. 
It's tagged as WiP because Hadoop-3.1 isn't out the door yet; it's depending on 
the hadoop 3.1-SNAPSHOT.
1. In the hadoop-cloud module, adds an explicit hadoop-3 profile which 
switches from explicitly pulling in cloud connectors (hadoop-openstack, 
hadoop-aws, hadoop-azure) to depending on the hadoop-cloud-storage POM artifact, 
which pulls these in, has pre-excluded things like hadoop-common, and stays up 
to date with new connectors (hadoop-azure-datalake, hadoop-aliyun). Goal: 
keeping this clean becomes the Hadoop project's homework, and the Spark 
project doesn't need to handle new Hadoop releases adding more dependencies. 
It also lines Spark up for switching to a shaded hadoop-cloud-storage bundle 
when that is implemented.
1. In the hadoop-cloud module, adds new source and tests for connecting to 
the `PathOutputCommitter` factory mechanism of Hadoop 3.1.
1. Increases the curator and zookeeper versions to match those in hadoop-3, 
fixing spark core to build in sbt with the hadoop-3 dependencies.
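
As a sketch of the second item, the `hadoop-3` profile in the hadoop-cloud module might reduce to a single dependency along these lines. The coordinates are those of the cloud-storage POM artifact described above; the surrounding layout and the reuse of `${hadoop.version}` and `${hadoop.deps.scope}` are assumptions based on the other entries in the diff.

```xml
<!-- Sketch only: one aggregate dependency instead of one per connector. -->
<profile>
  <id>hadoop-3</id>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-cloud-storage</artifactId>
      <version>${hadoop.version}</version>
      <scope>${hadoop.deps.scope}</scope>
    </dependency>
  </dependencies>
</profile>
```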

Why 3.1-SNAPSHOT over 3.0.1?

* 3.0.0 has to be viewed as an early release of the code; 3.1 should be the 
stable one.
* The committer changes are only in the forthcoming 3.1.0 and 3.0.2 
releases.
* The cloud-storage dependencies are still unstable in the 3.0.x line (too 
many transitive dependencies, omitted hadoop-aliyun). The hadoop-3 profile does 
exclude the transitive cruft, for anyone who does want to use branch-3.0 builds.

Hadoop 3.1 should be viewed as the version where Hadoop 3.x is really ready 
to play.

## How was this patch tested?

* There are some minimal unit tests of the new source in the hadoop-cloud 
module when built with the hadoop-3 connector.
* Everything has been built and tested against both ASF Hadoop 
branch-3.1 and Hadoop trunk.

The Spark hive JAR has problems here, as its version check logic fails for 
Hadoop versions > 2.

This can be avoided in either of two ways:

* Building the Hadoop JARs so that they declare their version as Hadoop 2.11: 
`mvn install -DskipTests -DskipShade -Ddeclared.hadoop.version=2.11`. This is 
safe for local test runs, not for deployment (HDFS is very strict about 
cross-version deployment).
* Using a modified version of spark hive whose version check switch statement 
is happy with Hadoop 3.

I've done both, with maven and SBT. 

Two issues surfaced:

1. A spark-core test failure, fixed in SPARK-23787. 
1. SBT only: Zookeeper not being found in spark-core. Somehow curator 
2.12.0 triggers some slightly different dependency resolution logic from 
previous versions, and Ivy was missing zookeeper.jar entirely. This patch adds 
the explicit declaration for all spark profiles, setting the ZooKeeper version 
to 3.4.9 for hadoop-3.
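
The explicit ZooKeeper declaration described in the second item could be sketched as the two fragments below. The versions come from the text above; the property names `zookeeper.version` and `curator.version` are assumptions about how the POM parameterizes them.

```xml
<!-- Fragment 1, in the shared <dependencies>: declare ZooKeeper explicitly
     so SBT/Ivy resolves it rather than relying on Curator to pull it in. -->
<dependency>
  <groupId>org.apache.zookeeper</groupId>
  <artifactId>zookeeper</artifactId>
  <version>${zookeeper.version}</version>
</dependency>

<!-- Fragment 2, in the hadoop-3 profile: pin the versions named above. -->
<profile>
  <id>hadoop-3</id>
  <properties>
    <curator.version>2.12.0</curator.version>
    <zookeeper.version>3.4.9</zookeeper.version>
  </properties>
</profile>
```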

The integration tests against real infrastructure live [on 
github](https://github.com/hortonworks-spark/cloud-integration/tree/master/cloud-examples). 
These verify that S3, Azure WASB, Azure Data Lake and OpenStack Swift stores 
can be used as the source and destination of work.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/steveloughran/spark cloud/SPARK-23807-hadoop-31

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20923.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20923


commit 29e73242cba9797ed24127b24bb0380c69a608d3
Author: Steve Loughran 
Date:   2018-03-28T17:38:57Z

SPARK-23807 Add Hadoop 3 profile with relevant POM fix ups, cloud-storage 
artifacts and binding

Change-Id: Ia4526f184ced9eef5b67aee9e91eced0dd38d723



