Hey Guys,

OK, my flight landed, so I have a sec to reply in more detail:



-----Original Message-----
From: Patrick Wendell <pwend...@gmail.com>
Reply-To: "dev@spark.incubator.apache.org" <dev@spark.incubator.apache.org>
Date: Thursday, September 26, 2013 7:02 PM
To: "dev@spark.incubator.apache.org" <dev@spark.incubator.apache.org>
Subject: Re: Spark 0.8.0: bits need to come from ASF infrastructure

>Chris et al,
>
>I'm -1 on this because it has many negative consequences for our existing
>users:
>
>1. Automated downloads based on our posted URLs (of
>which we get many thousands each release) will no longer work.

Apache also sees many tens of thousands of downloads per release,
depending on the project. The best example I can think of is OpenOffice,
which receives 10M downloads/day IIRC. Yes, OOo has some special download
infra help too, but beyond that, popular projects like Apache Lucene and
Solr regularly see 4000+ downloads per day and the ASF mirroring system
works fine.


> Now if
>they do "wget XXX" with our posted link, it will fail in a weird way
>due to the redirect page. Is there a version of the closer.cgi
>script which just performs 302 redirects instead of asking me to click
>on a link?

You can do something like, e.g.:

curl "http://www.apache.org/dyn/closer.cgi/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz" | grep http | grep tgz | sort -n

(and then some HTML strip magic)
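
The "HTML strip magic" could look roughly like the sketch below. It
assumes the closer.cgi page lists each mirror as a plain href="..."
attribute ending in .tgz; the function name extract_tgz_links is made up
for illustration, and I have not checked it against every page layout.

```shell
# Hypothetical helper: read closer.cgi HTML on stdin and print each
# distinct .tgz link once. Assumes links appear as href="http...tgz".
extract_tgz_links() {
  grep -o 'href="http[^"]*\.tgz"' |   # keep only the matching href attributes
    sed -e 's/^href="//' -e 's/"$//' | # strip the attribute wrapper
    sort -u                            # drop duplicate mirror entries
}

# Usage sketch (needs network access):
# curl -s "http://www.apache.org/dyn/closer.cgi/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz" | extract_tgz_links
```

Anchoring on the closing quote after .tgz keeps signature and hash links
(for example a name ending in .tgz.asc) out of the list.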

Even better, if you have Apache Tika installed, something like:

tika -t "http://www.apache.org/dyn/closer.cgi/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz" | grep http | grep tgz

produces:

http://mirror.nexcess.net/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://mirror.nexcess.net/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://apache.cs.utah.edu/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://apache.tradebit.com/pub/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://www.carfab.com/apachesoftware/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://apache.petsads.us/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://www.trieuvan.com/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://mirrors.ibiblio.org/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://mirror.olnevhost.net/pub/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://psg.mtu.edu/pub/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://apache.claz.org/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://mirror.metrocast.net/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://apache.mirrors.lucidnetworks.net/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://mirrors.gigenet.com/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://www.poolsaboveground.com/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://www.bizdirusa.com/mirrors/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://mirror.sdunix.com/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://download.nextag.com/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://www.motorlogy.com/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://mirror.cc.columbia.edu/pub/software/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://mirror.tcpdiag.net/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://apache.mirrors.hoobly.com/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://www.eng.lsu.edu/mirrors/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://apache.mesi.com.ar/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://mirror.symnds.com/software/Apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://mirror.reverse.net/pub/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://apache.osuosl.org/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://www.interior-dsgn.com/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://mirror.cogentco.com/pub/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://apache.spinellicreations.com/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://www.dsgnwrld.com/am/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://apache.mirrors.pair.com/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://mirrors.sonic.net/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://apache.mirrors.tds.net/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://www.gtlib.gatech.edu/pub/apache/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://www.eu.apache.org/dist/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz
http://www.us.apache.org/dist/incubator/spark/spark-0.8.0-incubating/spark-0.8.0-incubating-bin-cdh4.tgz


>
>2. All other users have to click through an additional page to
>download the software.

That page is just there to pick a mirror; if they link directly to the
actual artifact using the naming convention, it's not a big deal.
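
As a rough illustration of the naming convention, here is a small sketch
that builds the direct artifact URL for a chosen mirror. The layout is
inferred from the mirror listing above, and the function name
artifact_url and its three arguments are made up for this example.

```shell
# Hypothetical helper: print the direct download URL for a Spark binary
# release on a given mirror, following the layout seen in the listing:
#   <mirror>/incubator/spark/spark-<version>/spark-<version>-bin-<flavor>.tgz
artifact_url() {
  mirror=$1    # mirror base, e.g. http://www.us.apache.org/dist
  version=$2   # release name, e.g. 0.8.0-incubating
  flavor=$3    # binary flavor, e.g. cdh4
  # ${mirror%/} drops one trailing slash so the path has no "//"
  printf '%s/incubator/spark/spark-%s/spark-%s-bin-%s.tgz\n' \
    "${mirror%/}" "$version" "$version" "$flavor"
}

# Usage sketch:
# wget "$(artifact_url http://www.us.apache.org/dist 0.8.0-incubating cdh4)"
```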

>
>3. Amazon Cloudfront is, as a whole, much more reliable and higher
>bandwidth than the mirror network.

Based on what facts? Not trying to be argumentative, but I would love to
see some quantitative data behind that statement.

>
>These are my concerns, that basically we're causing our users to have
>a much worse experience. I've identified these concerns with moving to
>the apache mirror, but perhaps I've overlooked some benefits that
>would counteract these. Are there benefits?

The benefit is pointing users to the bits as they are provided by Apache's
mirroring system. Apache Spark (incubating)'s official download home is
the Apache mirroring system.


>
>I completely agree that we need to send users to the signatures and
>hashes at the Apache release site (to verify the release). So I did
>add the link to this directly adjacent to the download.

Thanks for considering all the options Patrick. Hope the above helps to
explain.

Cheers,
Chris

>
>- Patrick
>
>On Thu, Sep 26, 2013 at 3:50 PM, Chris Mattmann <mattm...@apache.org>
>wrote:
>> Hey Guys,
>>
>> Yep the link should be the dyn/closer.cgi link on the website and +1
>> to Roman's comment about auditing spark-project.org links to be replaced
>> with ASF counterparts.
>>
>> Cheers,
>> Chris
>>
>>
>>
>> -----Original Message-----
>> From: Patrick Wendell <pwend...@gmail.com>
>> Reply-To: "dev@spark.incubator.apache.org"
>><dev@spark.incubator.apache.org>
>> Date: Wednesday, September 25, 2013 4:08 PM
>> To: "dev@spark.incubator.apache.org" <dev@spark.incubator.apache.org>
>> Subject: Re: Spark 0.8.0: bits need to come from ASF infrastructure
>>
>>>Yep, we definitely need to just directly point people to the location at
>>>apache.org where they can find the hashes. I just updated the release
>>>notes and downloads page to point to that site.
>>>
>>>I just wanted to point out that mirroring these through a CDN seems
>>>philosophically the same as mirroring through Apache, since in neither
>>>case do we expect the users to trust the artifact they download. We
>>>just need to be more explicit that we are, indeed, mirroring and
>>>explain that the trusted root is at apache.org
>>>
>>>- Patrick
>>>
>>>On Wed, Sep 25, 2013 at 3:56 PM, Roman Shaposhnik <r...@apache.org>
>>>wrote:
>>>> On Wed, Sep 25, 2013 at 3:48 PM, Patrick Wendell <pwend...@gmail.com>
>>>>wrote:
>>>>> Hey we've actually distributed our artifacts through amazon
>>>>>cloudfront
>>>>> in the past (and that is where the website links redirect to).
>>>>>
>>>>> Since the apache mirrors don't distribute signatures anyways,
>>>>
>>>> True, but apache dist does. IOW, it is not uncommon for those
>>>> having automated build/fetching systems to get bits from
>>>> one of the mirrors and then get the hashes directly from dist.
>>>>
>>>> In your current case, I don't think I know of a way to do that.
>>>>
>>>> Now, you may say that the current CDN you guys are using
>>>> is functioning like a mirror -- well, I'd say that it needs to be
>>>> called out like one then.
>>>>
>>>> Otherwise, as a naive user I *really* have to guess where
>>>> to get the hashes.
>>>>
>>>>> what is the difference between linking to an apache mirror vs using a
>>>>>more
>>>>> robust CDN? If people want to verify the downloads they need to go to
>>>>> the apache root in either case.
>>>>>
>>>>> Is this just a cultural thing or is there some security reason?
>>>>
>>>> A bit of both I guess.
>>>>
>>>> Thanks,
>>>> Roman.
>>
>>

