Re: Hadoop version(s) compatible with spark-2.4.3-bin-without-hadoop-scala-2.12

2019-05-22 Thread Steve Loughran
Hadoop is still on the Avro 1.7.7 branch. A move to 1.9 would probably be as painful as a move to 1.8.x, so submit a patch for Hadoop trunk. The last PR there wasn't quite ready, and I didn't get any follow-up to the "what is this going to break" question: https://issues.apache.org/jira/browse/HADOOP-13386

2019-05-21 Thread Sean Owen
Tough one. Yes, it's because Hive is still 'included' with the no-Hadoop build. I think the Avro scope is deliberate: it's meant to use the version in the larger Hadoop installation it will run on. But I suspect you'll find 1.7 doesn't work. Yes, there's a rat's nest of compatibility

2019-05-21 Thread Michael Heuer
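The scopes for avro-1.8.2.jar and avro-mapred-1.8.2-hadoop2.jar are different:
    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro</artifactId>
      <version>${avro.version}</version>
      <scope>${hadoop.deps.scope}</scope>
    </dependency>
    ...
    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-mapred</artifactId>
      <version>${avro.version}</version>
      <classifier>${avro.mapred.classifier}</classifier>
      <scope>${hive.deps.scope}</scope>
    </dependency>
What needs to be done then? At a minimum,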
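To see why one jar ships and the other does not, the scope placeholders on the two avro dependencies can be resolved by hand. The short Python sketch below does that. The property values are assumptions, not read from Spark's actual build: it assumes a hadoop-provided build sets `hadoop.deps.scope` to `provided` while `hive.deps.scope` keeps a `compile` default. The helpers `resolve` and `effective_scopes` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# The two avro dependencies from this message, trimmed to a bare
# <dependencies> element (a real pom.xml also declares the POM namespace).
POM_FRAGMENT = """
<dependencies>
  <dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro</artifactId>
    <version>${avro.version}</version>
    <scope>${hadoop.deps.scope}</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-mapred</artifactId>
    <version>${avro.version}</version>
    <classifier>${avro.mapred.classifier}</classifier>
    <scope>${hive.deps.scope}</scope>
  </dependency>
</dependencies>
"""

# ASSUMED property values for a hadoop-provided build: only the Hadoop
# scope is switched to "provided"; the Hive scope keeps a compile default.
PROPERTIES = {
    "avro.version": "1.8.2",
    "avro.mapred.classifier": "hadoop2",
    "hadoop.deps.scope": "provided",
    "hive.deps.scope": "compile",
}

# Matches Maven-style ${property.name} placeholders.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve(text, props):
    """Substitute ${name} placeholders; unknown names are left as-is."""
    return PLACEHOLDER.sub(lambda m: props.get(m.group(1), m.group(0)), text)


def effective_scopes(pom_xml, props):
    """Map each dependency's artifactId to its resolved scope."""
    root = ET.fromstring(pom_xml.strip())
    scopes = {}
    for dep in root.iter("dependency"):
        artifact = dep.findtext("artifactId")
        # Maven's default scope is "compile" when <scope> is absent.
        scopes[artifact] = resolve(dep.findtext("scope", default="compile"), props)
    return scopes


if __name__ == "__main__":
    for artifact, scope in effective_scopes(POM_FRAGMENT, PROPERTIES).items():
        print(f"{artifact}: {scope}")
```

Under those assumed values it prints `avro: provided` and `avro-mapred: compile`, which would be consistent with only avro-mapred ending up in the hadoop-provided tarball.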

2019-05-20 Thread Koert Kuipers
It's somewhat weird because avro-mapred-1.8.2-hadoop2.jar is included in the hadoop-provided distro, but avro-1.8.2.jar is not. I tried to fix it, but I am not too familiar with the pom file. Regarding jline, you only run into this if you use spark-shell (and it isn't always reproducible, it seems).
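One way to spot the mismatch described here is to scan the `jars/` directory of an unpacked distribution. The sketch below is a hypothetical helper, not part of Spark; its jar-name pattern assumes the `avro-<version>.jar` and `avro-mapred-<version>-<classifier>.jar` naming seen in this thread.

```python
import re
from pathlib import Path

# ASSUMED naming: "avro-1.8.2.jar", "avro-mapred-1.8.2-hadoop2.jar", etc.
AVRO_JAR = re.compile(r"^(avro(?:-mapred|-ipc)?)-(\d+(?:\.\d+)*)(?:-[\w.]+)?\.jar$")


def avro_jars(jars_dir):
    """Return {artifact: version} for Avro jars in jars_dir.

    If several versions of one artifact are present, only one is kept.
    """
    found = {}
    for path in Path(jars_dir).glob("*.jar"):
        m = AVRO_JAR.match(path.name)
        if m:
            found[m.group(1)] = m.group(2)
    return found


def check_avro(jars_dir):
    """List problems: avro-mapred without core avro, or mismatched versions."""
    jars = avro_jars(jars_dir)
    core, mapred = jars.get("avro"), jars.get("avro-mapred")
    problems = []
    if mapred and not core:
        problems.append(f"avro-mapred {mapred} present but avro core jar missing")
    elif mapred and core and mapred != core:
        problems.append(f"version mismatch: avro {core} vs avro-mapred {mapred}")
    return problems
```

For example, calling `check_avro` on the `jars/` directory of an unpacked spark-2.4.3-bin-without-hadoop-scala-2.12 distribution would, per the report above, flag the missing core avro jar.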

2019-05-20 Thread Sean Owen
Re: 1), I think we tried to fix that on the build side, and it requires flags that not all tar versions (e.g. OS X) have. But that's tangential. I think the Avro + Parquet dependency situation is generally problematic; see JIRA for some details. But yes, I'm not surprised if Spark has a different

2019-05-20 Thread Koert Kuipers
We run it without issues on Hadoop 2.6 through 2.8, off the top of my head. However, we do some post-processing on the tarball: 1) we fix the ownership of the files inside the tar.gz file (it should be uid/gid 0/0; otherwise untarring as root can leave files owned by an unknown user). 2) we add avro-1.8.2.jar and
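The ownership fix and the added jar could be done with Python's `tarfile` module instead of tar command-line flags. This is a sketch, not the script described above: `normalize_tarball` is a hypothetical name, it writes a fresh .tar.gz rather than editing in place, and the archive path for any extra jar (for instance somewhere under the distribution's `jars/` directory) is an assumption the caller supplies.

```python
import tarfile


def normalize_tarball(src, dst, extra_files=()):
    """Copy src (.tar.gz) to dst with every entry owned by uid/gid 0/0.

    extra_files: iterable of (local_path, arcname) pairs to append,
    e.g. an avro jar placed under the distribution's jars/ directory.
    """
    with tarfile.open(src, "r:gz") as tin, tarfile.open(dst, "w:gz") as tout:
        for member in tin.getmembers():
            member.uid = member.gid = 0
            member.uname = member.gname = "root"
            # Only regular files carry data; dirs and links are header-only.
            data = tin.extractfile(member) if member.isfile() else None
            tout.addfile(member, data)
        for local_path, arcname in extra_files:
            info = tout.gettarinfo(local_path, arcname=arcname)
            info.uid = info.gid = 0
            info.uname = info.gname = "root"
            with open(local_path, "rb") as fh:
                tout.addfile(info, fh)
```

Rewriting the archive in Python avoids relying on ownership-override flags which, as noted elsewhere in this thread, not every tar implementation supports.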

Hadoop version(s) compatible with spark-2.4.3-bin-without-hadoop-scala-2.12

2019-05-20 Thread Michael Heuer
Hello, Which Hadoop version or versions are compatible with Spark 2.4.3 and Scala 2.12? The binary distribution spark-2.4.3-bin-without-hadoop-scala-2.12.tgz is missing avro-1.8.2.jar, so when attempting to run with Hadoop 2.7.7 there are classpath conflicts at runtime, as Hadoop 2.7.7