+1

On Sat, Jun 2, 2018 at 4:53 PM Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
> I'll give that a try, but I'll still have to figure out what to do if none
> of the release builds work with hadoop-aws, since Flintrock deploys Spark
> release builds to set up a cluster. Building Spark is slow, so we only do
> it if the user specifically requests a Spark version by git hash. (This is
> basically how spark-ec2 did things, too.)
>
> On Sat, Jun 2, 2018 at 6:54 PM Marcelo Vanzin <van...@cloudera.com> wrote:
>
>> If you're building your own Spark, definitely try the hadoop-cloud
>> profile. Then you don't even need to pull anything at runtime,
>> everything is already packaged with Spark.
>>
>> On Fri, Jun 1, 2018 at 6:51 PM, Nicholas Chammas
>> <nicholas.cham...@gmail.com> wrote:
>> > pyspark --packages org.apache.hadoop:hadoop-aws:2.7.3 didn’t work for me
>> > either (even building with -Phadoop-2.7). I guess I’ve been relying on an
>> > unsupported pattern and will need to figure something else out going
>> > forward in order to use s3a://.
>> >
>> > On Fri, Jun 1, 2018 at 9:09 PM Marcelo Vanzin <van...@cloudera.com> wrote:
>> >>
>> >> I have personally never tried to include hadoop-aws that way. But at
>> >> the very least, I'd try to use the same version of Hadoop as the Spark
>> >> build (2.7.3 IIRC). I don't really expect a different version to work,
>> >> and if it did in the past it definitely was not by design.
>> >>
>> >> On Fri, Jun 1, 2018 at 5:50 PM, Nicholas Chammas
>> >> <nicholas.cham...@gmail.com> wrote:
>> >> > Building with -Phadoop-2.7 didn’t help, and if I remember correctly,
>> >> > building with -Phadoop-2.8 worked with hadoop-aws in the 2.3.0 release,
>> >> > so it appears something has changed since then.
>> >> >
>> >> > I wasn’t familiar with -Phadoop-cloud, but I can try that.
>> >> >
>> >> > My goal here is simply to confirm that this release of Spark works with
>> >> > hadoop-aws like past releases did, particularly for Flintrock users who
>> >> > use Spark with S3A.
>> >> >
>> >> > We currently provide -hadoop2.6, -hadoop2.7, and -without-hadoop builds
>> >> > with every Spark release. If the -hadoop2.7 release build won’t work with
>> >> > hadoop-aws anymore, are there plans to provide a new build type that will?
>> >> >
>> >> > Apologies if the question is poorly formed. I’m batting a bit outside my
>> >> > league here. Again, my goal is simply to confirm that I/my users still
>> >> > have a way to use s3a://. In the past, that way was simply to call
>> >> > pyspark --packages org.apache.hadoop:hadoop-aws:2.8.4 or something very
>> >> > similar. If that will no longer work, I’m trying to confirm that the
>> >> > change of behavior is intentional or acceptable (as a review for the
>> >> > Spark project) and figure out what I need to change (as due diligence
>> >> > for Flintrock’s users).
>> >> >
>> >> > Nick
>> >> >
>> >> > On Fri, Jun 1, 2018 at 8:21 PM Marcelo Vanzin <van...@cloudera.com> wrote:
>> >> >>
>> >> >> Using the hadoop-aws package is probably going to be a little more
>> >> >> complicated than that. The best bet is to use a custom build of Spark
>> >> >> that includes it (use -Phadoop-cloud). Otherwise you're probably
>> >> >> looking at some nasty dependency issues, especially if you end up
>> >> >> mixing different versions of Hadoop.
>> >> >>
>> >> >> On Fri, Jun 1, 2018 at 4:01 PM, Nicholas Chammas
>> >> >> <nicholas.cham...@gmail.com> wrote:
>> >> >> > I was able to successfully launch a Spark cluster on EC2 at 2.3.1 RC4
>> >> >> > using Flintrock. However, trying to load the hadoop-aws package gave
>> >> >> > me some errors.
>> >> >> >
>> >> >> > $ pyspark --packages org.apache.hadoop:hadoop-aws:2.8.4
>> >> >> >
>> >> >> > <snipped>
>> >> >> >
>> >> >> > :: problems summary ::
>> >> >> > :::: WARNINGS
>> >> >> > [NOT FOUND ] com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle) (2ms)
>> >> >> > ==== local-m2-cache: tried
>> >> >> > file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-json/1.9/jersey-json-1.9.jar
>> >> >> > [NOT FOUND ] com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle) (0ms)
>> >> >> > ==== local-m2-cache: tried
>> >> >> > file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar
>> >> >> > [NOT FOUND ] org.codehaus.jettison#jettison;1.1!jettison.jar(bundle) (1ms)
>> >> >> > ==== local-m2-cache: tried
>> >> >> > file:/home/ec2-user/.m2/repository/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar
>> >> >> > [NOT FOUND ] com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar (0ms)
>> >> >> > ==== local-m2-cache: tried
>> >> >> > file:/home/ec2-user/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar
>> >> >> >
>> >> >> > I’d guess I’m probably using the wrong version of hadoop-aws, but I
>> >> >> > called make-distribution.sh with -Phadoop-2.8 so I’m not sure what
>> >> >> > else to try.
>> >> >> >
>> >> >> > Any quick pointers?
>> >> >> >
>> >> >> > Nick
>> >> >> >
>> >> >> > On Fri, Jun 1, 2018 at 6:29 PM Marcelo Vanzin <van...@cloudera.com> wrote:
>> >> >> >>
>> >> >> >> Starting with my own +1 (binding).
>> >> >> >>
>> >> >> >> On Fri, Jun 1, 2018 at 3:28 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
>> >> >> >> > Please vote on releasing the following candidate as Apache Spark
>> >> >> >> > version 2.3.1.
>> >> >> >> >
>> >> >> >> > Given that I expect at least a few people to be busy with Spark Summit
>> >> >> >> > next week, I'm taking the liberty of setting an extended voting period.
>> >> >> >> > The vote will be open until Friday, June 8th, at 19:00 UTC (that's
>> >> >> >> > 12:00 PDT).
>> >> >> >> >
>> >> >> >> > It passes with a majority of +1 votes, which must include at least 3
>> >> >> >> > +1 votes from the PMC.
>> >> >> >> >
>> >> >> >> > [ ] +1 Release this package as Apache Spark 2.3.1
>> >> >> >> > [ ] -1 Do not release this package because ...
>> >> >> >> >
>> >> >> >> > To learn more about Apache Spark, please see http://spark.apache.org/
>> >> >> >> >
>> >> >> >> > The tag to be voted on is v2.3.1-rc4 (commit 30aaa5a3):
>> >> >> >> > https://github.com/apache/spark/tree/v2.3.1-rc4
>> >> >> >> >
>> >> >> >> > The release files, including signatures, digests, etc. can be found at:
>> >> >> >> > https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-bin/
>> >> >> >> >
>> >> >> >> > Signatures used for Spark RCs can be found in this file:
>> >> >> >> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >> >> >> >
>> >> >> >> > The staging repository for this release can be found at:
>> >> >> >> > https://repository.apache.org/content/repositories/orgapachespark-1272/
>> >> >> >> >
>> >> >> >> > The documentation corresponding to this release can be found at:
>> >> >> >> > https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-docs/
>> >> >> >> >
>> >> >> >> > The list of bug fixes going into 2.3.1 can be found at the following URL:
>> >> >> >> > https://issues.apache.org/jira/projects/SPARK/versions/12342432
>> >> >> >> >
>> >> >> >> > FAQ
>> >> >> >> >
>> >> >> >> > =========================
>> >> >> >> > How can I help test this release?
>> >> >> >> > =========================
>> >> >> >> >
>> >> >> >> > If you are a Spark user, you can help us test this release by taking
>> >> >> >> > an existing Spark workload and running on this release candidate, then
>> >> >> >> > reporting any regressions.
>> >> >> >> >
>> >> >> >> > If you're working in PySpark you can set up a virtual env and install
>> >> >> >> > the current RC and see if anything important breaks; in Java/Scala
>> >> >> >> > you can add the staging repository to your project's resolvers and
>> >> >> >> > test with the RC (make sure to clean up the artifact cache before/after
>> >> >> >> > so you don't end up building with an out-of-date RC going forward).
>> >> >> >> >
>> >> >> >> > ===========================================
>> >> >> >> > What should happen to JIRA tickets still targeting 2.3.1?
>> >> >> >> > ===========================================
>> >> >> >> >
>> >> >> >> > The current list of open tickets targeted at 2.3.1 can be found at:
>> >> >> >> > https://s.apache.org/Q3Uo
>> >> >> >> >
>> >> >> >> > Committers should look at those and triage. Extremely important bug
>> >> >> >> > fixes, documentation, and API tweaks that impact compatibility should
>> >> >> >> > be worked on immediately. Everything else please retarget to an
>> >> >> >> > appropriate release.
>> >> >> >> >
>> >> >> >> > ==================
>> >> >> >> > But my bug isn't fixed?
>> >> >> >> > ==================
>> >> >> >> >
>> >> >> >> > In order to make timely releases, we will typically not hold the
>> >> >> >> > release unless the bug in question is a regression from the previous
>> >> >> >> > release.
>> >> >> >> > That being said, if there is something which is a regression that
>> >> >> >> > has not been correctly targeted please ping me or a committer to
>> >> >> >> > help target the issue.
>> >> >> >> >
>> >> >> >> > --
>> >> >> >> > Marcelo
>> >> >> >>
>> >> >> >> --
>> >> >> >> Marcelo
>> >> >>
>> >> >> --
>> >> >> Marcelo
>> >>
>> >> --
>> >> Marcelo
>>
>> --
>> Marcelo
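
For anyone scripting the hadoop-aws choice discussed in the thread above, here is a minimal sketch of the "use the same version of Hadoop as the Spark build" advice. It assumes release build names ending in a suffix like -hadoop2.7 (for example spark-2.3.1-bin-hadoop2.7); the function names are hypothetical and not part of Spark or Flintrock. A matching version is only a sanity check: as reported above, a matching hadoop-aws version still failed to resolve in at least one case.

```python
import re
from typing import Optional


def hadoop_line_of_build(build_name: str) -> Optional[str]:
    """Return the Hadoop major.minor a release build targets, e.g. '2.7'.

    Assumes the build name ends in '-hadoopX.Y'. Returns None for builds
    such as '-without-hadoop' that do not name a Hadoop version.
    """
    match = re.search(r"-hadoop(\d+\.\d+)$", build_name)
    return match.group(1) if match else None


def hadoop_aws_matches(build_name: str, hadoop_aws_version: str) -> bool:
    """True if the hadoop-aws version is on the same major.minor line
    as the Hadoop version named in the Spark build."""
    line = hadoop_line_of_build(build_name)
    return line is not None and hadoop_aws_version.startswith(line + ".")


def packages_argument(hadoop_aws_version: str) -> str:
    """Maven coordinate to pass to 'pyspark --packages'."""
    return "org.apache.hadoop:hadoop-aws:" + hadoop_aws_version


if __name__ == "__main__":
    build = "spark-2.3.1-bin-hadoop2.7"  # assumed build name, for illustration
    version = "2.7.3"
    if hadoop_aws_matches(build, version):
        print("pyspark --packages " + packages_argument(version))
    else:
        print("hadoop-aws " + version + " does not match " + build)
```

This only flags an obvious mismatch such as hadoop-aws 2.8.4 against a -hadoop2.7 build; it does not resolve transitive dependencies, which is where the errors reported in the thread came from.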