Building with -Phadoop-2.7 didn’t help, and if I remember correctly, building with -Phadoop-2.8 worked with hadoop-aws in the 2.3.0 release, so it appears something has changed since then.
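To make the version-mixing concern in Marcelo's quoted reply concrete: the hadoop-aws artifact passed to --packages should come from the same Hadoop release the Spark build bundles. Below is a minimal Python sketch of that sanity check; it is not something Spark or Flintrock ships. It assumes a Spark binary distribution exposes its bundled Hadoop version through a jar named like hadoop-common-<version>.jar in $SPARK_HOME/jars, and the helper names (parse_coordinate, bundled_hadoop_version, hadoop_aws_matches_build) are hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

# Assumption: the bundled Hadoop version shows up in a jar name such as
# "hadoop-common-2.7.3.jar" inside the distribution's jars/ directory.
_HADOOP_COMMON_JAR = re.compile(r"^hadoop-common-(\d+(?:\.\d+)*)\.jar$")


def parse_coordinate(coord: str) -> Tuple[str, str, str]:
    """Split a Maven coordinate 'group:artifact:version' as given to --packages."""
    parts = coord.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected group:artifact:version, got {coord!r}")
    return parts[0], parts[1], parts[2]


def bundled_hadoop_version(jar_names: Iterable[str]) -> Optional[str]:
    """Return the version of the first hadoop-common jar found, or None."""
    for name in jar_names:
        match = _HADOOP_COMMON_JAR.match(name)
        if match:
            return match.group(1)
    return None


def hadoop_aws_matches_build(coord: str, jar_names: Iterable[str]) -> bool:
    """True only if the requested hadoop-aws version equals the bundled Hadoop version."""
    group, artifact, version = parse_coordinate(coord)
    if (group, artifact) != ("org.apache.hadoop", "hadoop-aws"):
        raise ValueError(f"not a hadoop-aws coordinate: {coord!r}")
    # A missing hadoop-common jar yields None, which never equals a version string.
    return bundled_hadoop_version(jar_names) == version
```

For example, with a jar list containing hadoop-common-2.7.3.jar, asking for org.apache.hadoop:hadoop-aws:2.8.4 returns False, which is the kind of mismatch discussed in this thread. A passing check only rules out a version mismatch; it does not address the Ivy resolution errors in the log quoted below.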
I wasn’t familiar with -Phadoop-cloud, but I can try that.

My goal here is simply to confirm that this release of Spark works with hadoop-aws like past releases did, particularly for Flintrock users who use Spark with S3A. We currently provide -hadoop2.6, -hadoop2.7, and -without-hadoop builds with every Spark release. If the -hadoop2.7 release build won’t work with hadoop-aws anymore, are there plans to provide a new build type that will?

Apologies if the question is poorly formed; I’m batting a bit outside my league here. Again, my goal is simply to confirm that my users and I still have a way to use s3a://. In the past, that way was simply to call pyspark --packages org.apache.hadoop:hadoop-aws:2.8.4 or something very similar. If that no longer works, I’m trying to confirm that the change in behavior is intentional or acceptable (as a review for the Spark project) and to figure out what I need to change (as due diligence for Flintrock’s users).

Nick

On Fri, Jun 1, 2018 at 8:21 PM Marcelo Vanzin <van...@cloudera.com> wrote:
> Using the hadoop-aws package is probably going to be a little more
> complicated than that. The best bet is to use a custom build of Spark
> that includes it (use -Phadoop-cloud). Otherwise you're probably
> looking at some nasty dependency issues, especially if you end up
> mixing different versions of Hadoop.
>
> On Fri, Jun 1, 2018 at 4:01 PM, Nicholas Chammas
> <nicholas.cham...@gmail.com> wrote:
> > I was able to successfully launch a Spark cluster on EC2 at 2.3.1 RC4 using
> > Flintrock. However, trying to load the hadoop-aws package gave me some
> > errors.
> >
> > $ pyspark --packages org.apache.hadoop:hadoop-aws:2.8.4
> >
> > <snipped>
> >
> > :: problems summary ::
> > :::: WARNINGS
> >     [NOT FOUND ] com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle) (2ms)
> >   ==== local-m2-cache: tried
> >     file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-json/1.9/jersey-json-1.9.jar
> >     [NOT FOUND ] com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle) (0ms)
> >   ==== local-m2-cache: tried
> >     file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar
> >     [NOT FOUND ] org.codehaus.jettison#jettison;1.1!jettison.jar(bundle) (1ms)
> >   ==== local-m2-cache: tried
> >     file:/home/ec2-user/.m2/repository/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar
> >     [NOT FOUND ] com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar (0ms)
> >   ==== local-m2-cache: tried
> >     file:/home/ec2-user/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar
> >
> > I’d guess I’m probably using the wrong version of hadoop-aws, but I called
> > make-distribution.sh with -Phadoop-2.8 so I’m not sure what else to try.
> >
> > Any quick pointers?
> >
> > Nick
> >
> >
> > On Fri, Jun 1, 2018 at 6:29 PM Marcelo Vanzin <van...@cloudera.com> wrote:
> >>
> >> Starting with my own +1 (binding).
> >>
> >> On Fri, Jun 1, 2018 at 3:28 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
> >> > Please vote on releasing the following candidate as Apache Spark version
> >> > 2.3.1.
> >> >
> >> > Given that I expect at least a few people to be busy with Spark Summit next
> >> > week, I'm taking the liberty of setting an extended voting period. The vote
> >> > will be open until Friday, June 8th, at 19:00 UTC (that's 12:00 PDT).
> >> >
> >> > It passes with a majority of +1 votes, which must include at least 3 +1 votes
> >> > from the PMC.
> >> >
> >> > [ ] +1 Release this package as Apache Spark 2.3.1
> >> > [ ] -1 Do not release this package because ...
> >> >
> >> > To learn more about Apache Spark, please see http://spark.apache.org/
> >> >
> >> > The tag to be voted on is v2.3.1-rc4 (commit 30aaa5a3):
> >> > https://github.com/apache/spark/tree/v2.3.1-rc4
> >> >
> >> > The release files, including signatures, digests, etc., can be found at:
> >> > https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-bin/
> >> >
> >> > Signatures used for Spark RCs can be found in this file:
> >> > https://dist.apache.org/repos/dist/dev/spark/KEYS
> >> >
> >> > The staging repository for this release can be found at:
> >> > https://repository.apache.org/content/repositories/orgapachespark-1272/
> >> >
> >> > The documentation corresponding to this release can be found at:
> >> > https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-docs/
> >> >
> >> > The list of bug fixes going into 2.3.1 can be found at the following URL:
> >> > https://issues.apache.org/jira/projects/SPARK/versions/12342432
> >> >
> >> > FAQ
> >> >
> >> > =========================
> >> > How can I help test this release?
> >> > =========================
> >> >
> >> > If you are a Spark user, you can help us test this release by taking
> >> > an existing Spark workload, running it on this release candidate, and
> >> > reporting any regressions.
> >> >
> >> > If you're working in PySpark, you can set up a virtual env, install
> >> > the current RC, and see if anything important breaks. In Java/Scala,
> >> > you can add the staging repository to your project's resolvers and test
> >> > with the RC (make sure to clean up the artifact cache before/after so
> >> > you don't end up building with an out-of-date RC going forward).
> >> >
> >> > ===========================================
> >> > What should happen to JIRA tickets still targeting 2.3.1?
> >> > ===========================================
> >> >
> >> > The current list of open tickets targeted at 2.3.1 can be found at:
> >> > https://s.apache.org/Q3Uo
> >> >
> >> > Committers should look at those and triage. Extremely important bug
> >> > fixes, documentation, and API tweaks that impact compatibility should
> >> > be worked on immediately. Please retarget everything else to an
> >> > appropriate release.
> >> >
> >> > ==================
> >> > But my bug isn't fixed?
> >> > ==================
> >> >
> >> > In order to make timely releases, we will typically not hold the
> >> > release unless the bug in question is a regression from the previous
> >> > release. That being said, if there is a regression that has not been
> >> > correctly targeted, please ping me or a committer to help target the
> >> > issue.
> >> >
> >> >
> >> > --
> >> > Marcelo
> >>
> >>
> >>
> >> --
> >> Marcelo
>
>
>
> --
> Marcelo