Hey Guys, OK flight landed, so have a sec to reply in more detail:
-----Original Message-----
From: Patrick Wendell <pwend...@gmail.com>
Reply-To: "dev@spark.incubator.apache.org" <dev@spark.incubator.apache.org>
Date: Thursday, September 26, 2013 7:02 PM
To: "dev@spark.incubator.apache.org" <dev@spark.incubator.apache.org>
Subject: Re: Spark 0.8.0: bits need to come from ASF infrastructure

>Chris et al,
>
>I'm -1 on this because it has many negative consequences for our existing
>users:
>
>1. Users who do automated downloads based on our posted URL's (of
>which we get many thousands each release) will no longer work.

Apache also has many tens of thousands of downloads each release, depending on the project. The best example I can think of is Open Office, which receives 10M downloads/day IIRC -- yes, OOo has some special downloading infra help too, but beyond that, popular projects like Apache Lucene and Solr regularly see 4000+ downloads per day and the ASF mirroring system works fine.

> Now if
>they do "wget XXX" with our posted link, it will fail in a weird way
>due to the redirect page. Is there a version of the closer.cgi
>script which just performs 302 redirects instead of asking me to click
>on a link?

You can do something like e.g.,

curl "http://www.apache.org/dyn/closer.cgi/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz" | grep http | grep tgz | sort -n

(and then some HTML strip magic)

Even better if you have Apache Tika installed, something like:

tika -t "http://www.apache.org/dyn/closer.cgi/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz" | grep http | grep tgz

produces:

http://mirror.nexcess.net/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://mirror.nexcess.net/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://apache.cs.utah.edu/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://apache.tradebit.com/pub/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://www.carfab.com/apachesoftware/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://apache.petsads.us/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://www.trieuvan.com/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://mirrors.ibiblio.org/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://mirror.olnevhost.net/pub/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://psg.mtu.edu/pub/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://apache.claz.org/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://mirror.metrocast.net/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://apache.mirrors.lucidnetworks.net/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://mirrors.gigenet.com/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://www.poolsaboveground.com/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://www.bizdirusa.com/mirrors/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://mirror.sdunix.com/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://download.nextag.com/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://www.motorlogy.com/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://mirror.cc.columbia.edu/pub/software/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://mirror.tcpdiag.net/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://apache.mirrors.hoobly.com/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://www.eng.lsu.edu/mirrors/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://apache.mesi.com.ar/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://mirror.symnds.com/software/Apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://mirror.reverse.net/pub/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://apache.osuosl.org/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://www.interior-dsgn.com/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://mirror.cogentco.com/pub/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://apache.spinellicreations.com/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://www.dsgnwrld.com/am/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://apache.mirrors.pair.com/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://mirrors.sonic.net/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://apache.mirrors.tds.net/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://www.gtlib.gatech.edu/pub/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://www.eu.apache.org/dist/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://www.us.apache.org/dist/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz

>
>2. All other users have to click through an additional page to
>download the software.

That's just to target a mirror, but if they link directly to the actual artifact using the naming convention, it's not a big deal.

>
>3. Amazon Cloudfront is, as a whole, much more reliable and higher
>bandwidth than the mirror network.

Based on what facts? Not trying to be argumentative but would love to see some quantitative data on that statement.

>
>These are my concerns, that basically we're causing our users to have
>a much worse experience. I've identified these concerns with moving to
>the apache mirror, but perhaps I've overlooked some benefits that
>would counteract these. Are there benefits?

The benefit is that users are pointed to the bits as they are provided by Apache's mirroring system. Apache Spark (incubating)'s official download home is the Apache mirroring system.

>
>I completely agree that we need to send users to the signatures and
>hashes at the Apache release site (to verify the release). So I did
>add the link to this directly adjacent to the download.

Thanks for considering all the options, Patrick. Hope the above helps to explain.

Cheers,
Chris

>
>- Patrick
>
>On Thu, Sep 26, 2013 at 3:50 PM, Chris Mattmann <mattm...@apache.org>
>wrote:
>> Hey Guys,
>>
>> Yep the link should be the dyn/closer.cgi link on the website and +1
>> to Roman's comment about auditing spark-project.org links to be replaced
>> with ASF counterparts.
>>
>> Cheers,
>> Chris
>>
>>
>>
>> -----Original Message-----
>> From: Patrick Wendell <pwend...@gmail.com>
>> Reply-To: "dev@spark.incubator.apache.org"
>><dev@spark.incubator.apache.org>
>> Date: Wednesday, September 25, 2013 4:08 PM
>> To: "dev@spark.incubator.apache.org" <dev@spark.incubator.apache.org>
>> Subject: Re: Spark 0.8.0: bits need to come from ASF infrastructure
>>
>>>Yep, we definitely need to just directly point people to the location at
>>>apache.org where they can find the hashes. I just updated the release
>>>notes and downloads page to point to that site.
>>>
>>>I just wanted to point out that mirroring these through a CDN seems
>>>philosophically the same as mirroring through Apache, since in neither
>>>case do we expect the users to trust the artifact they download. We
>>>just need to be more explicit that we are, indeed, mirroring and
>>>explain that the trusted root is at apache.org
>>>
>>>- Patrick
>>>
>>>On Wed, Sep 25, 2013 at 3:56 PM, Roman Shaposhnik <r...@apache.org>
>>>wrote:
>>>> On Wed, Sep 25, 2013 at 3:48 PM, Patrick Wendell <pwend...@gmail.com>
>>>>wrote:
>>>>> Hey we've actually distributed our artifacts through amazon
>>>>>cloudfront
>>>>> in the past (and that is where the website links redirect to).
>>>>>
>>>>> Since the apache mirrors don't distribute signatures anyways,
>>>>
>>>> True, but apache dist does. IOW, it is not uncommon for those
>>>> having an automated build/fetching systems to get bits from
>>>> one of the mirrors and then get the hashes directly from dist.
>>>>
>>>> In your current case, I don't think I know of a way to do that.
>>>>
>>>> Now, you may say that the current CDN you guys are using
>>>> is functioning like a mirror -- well, I'd say that it needs to be
>>>> called out like one then.
>>>>
>>>> Otherwise, as a naive user I *really* have to guess where
>>>> to get the hashes.
>>>>
>>>>> what is the difference between linking to an apache mirror vs using a
>>>>>more
>>>>> robust CDN? If people want to verify the downloads they need to go to
>>>>> the apache root in either case.
>>>>>
>>>>> Is this just a cultural thing or is there some security reason?
>>>>
>>>> A bit of both I guess.
>>>>
>>>> Thanks,
>>>> Roman.
>>
>>
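
For anyone scripting this, the "HTML strip magic" after the curl/grep step earlier in this message could look roughly like the sketch below. It is only an illustration, not what the ASF tooling does: it assumes the mirror page lists each mirror as a plain <a href="..."> link ending in the artifact file name. The names MirrorLinkParser and extract_mirror_links and the example.org/example.net hosts in the sample are made up for the example.

```python
from html.parser import HTMLParser


class MirrorLinkParser(HTMLParser):
    """Collect absolute http(s) hrefs that end with the artifact file name.

    Hypothetical helper: assumes mirror links appear as ordinary <a href> tags.
    """

    def __init__(self, artifact_name):
        super().__init__()
        self.artifact_name = artifact_name
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name != "href" or not value:
                continue
            is_http = value.startswith(("http://", "https://"))
            if is_http and value.endswith(self.artifact_name):
                # Keep first-seen order and drop repeats of the same mirror.
                if value not in self.links:
                    self.links.append(value)


def extract_mirror_links(html, artifact_name):
    """Return the de-duplicated mirror URLs for artifact_name found in html."""
    parser = MirrorLinkParser(artifact_name)
    parser.feed(html)
    return parser.links


# Made-up stand-in for a mirror selection page; real pages will differ.
ARTIFACT = "spark-0.8.0-incubating-bin-cdh4.tgz"
SAMPLE_HTML = """
<html><body>
<p>We suggest the following mirror site for your download:</p>
<a href="http://mirror.example.org/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz">suggested</a>
<a href="http://mirror.example.org/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz">same again</a>
<a href="http://backup.example.net/dist/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz">backup</a>
<a href="http://www.apache.org/dist/incubator/spark/KEYS">KEYS</a>
<a href="/dyn/closer.cgi">relative link, ignored</a>
</body></html>
"""

if __name__ == "__main__":
    for url in extract_mirror_links(SAMPLE_HTML, ARTIFACT):
        print(url)
```

In a real script the html argument would come from fetching the closer.cgi URL (for instance with urllib.request.urlopen), and the hashes and signatures would still be fetched from the Apache dist site rather than from the chosen mirror, as Roman describes.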