This was actually an AMPLab bucket.
On Feb 27, 2018, 6:04 PM +1300, Holden Karau <holden.ka...@gmail.com>, wrote:

> Thanks Nick, we deprecated this during the rollover to the new release managers. I assume this bucket was maintained by someone at Databricks, so maybe they can chime in.
>
> > On Feb 26, 2018 8:57 PM, "Nicholas Chammas" <nicholas.cham...@gmail.com> wrote:
> >
> > > If you go to the Downloads page and download Spark 2.2.1, you’ll get a link to an Apache mirror. It didn’t use to be this way. As recently as Spark 2.2.0, downloads were served via CloudFront, which was backed by an S3 bucket named spark-related-packages.
> > >
> > > It seems that we’ve stopped using CloudFront, and the S3 bucket behind it has stopped receiving updates (e.g. Spark 2.2.1 isn’t there). I’m guessing this is part of an effort to use the Apache mirror network, like other Apache projects do.
> > >
> > > From a user perspective, the Apache mirror network is several steps down from using a modern CDN. Let me summarize why:
> > >
> > > 1. Apache mirrors are often slow. Apache does not impose any performance requirements on its mirrors. The difference between getting a good mirror and a bad one is downloading Spark in less than a minute vs. 20 minutes. The problem is so bad that I’ve thought about adding an Apache mirror blacklist to Flintrock to avoid getting one of these dud mirrors.
> > > 2. Apache mirrors are inconvenient to use. When you download something from an Apache mirror, you get a link like this one. Instead of automatically redirecting you to your download, though, you need to process the results you get back to find your download target. And you need to handle the high download failure rate, since sometimes the mirror you get doesn’t have the file it claims to have.
> > > 3. Apache mirrors are incomplete. Apache mirrors only keep around the latest releases, save for a few “archive” mirrors, which are often slow. So if you want to download anything but the latest version of Spark, you are out of luck.
> > >
> > > Some of these problems can be mitigated by picking a specific mirror that works well and hardcoding it in your scripts, but that defeats the purpose of dynamically selecting a mirror and makes you a “bad” user of the mirror network.
> > >
> > > I raised some of these issues over on INFRA-10999. The ticket sat for a year before I heard anything back, and the bottom line was that none of the above problems have a solution on the horizon. It’s fine. I understand that Apache is a volunteer organization and that the infrastructure team has a lot to manage as it is. I still find it disappointing that an organization of Apache’s stature doesn’t have a better solution for this in collaboration with a third party. Python serves PyPI downloads using Fastly, and Homebrew serves packages using Bintray. They both work really, really well. Why don’t we have something as good for Apache projects? Anyway, that’s a separate discussion.
> > >
> > > What I want to say is this:
> > >
> > > Dear whoever owns the spark-related-packages S3 bucket,
> > >
> > > Please keep the bucket up-to-date with the latest Spark releases, alongside the past releases that are already on there. It’s a huge help to the Flintrock project, and it’s an equally big help to those of us writing infrastructure automation scripts that deploy Spark in other contexts.
> > > I understand that hosting this stuff is not free, and that I am not paying anything for this service. If it needs to go, so be it. But I wanted to take this opportunity to lay out the benefits I’ve enjoyed thanks to having this bucket around, and to make sure that if it did die, it didn’t die a quiet death.
> > >
> > > Sincerely,
> > > Nick
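
For anyone who needs to script around the mirrors in the meantime, the chooser-then-fallback dance Nick describes in points 2 and 3 looks roughly like the sketch below. This is a minimal sketch based on my own assumptions (the as_json form of the closer.lua mirror chooser and archive.apache.org as the fallback), not anything Apache infra has blessed, so adjust the path for your release and Hadoop build:

    import json
    import urllib.error
    import urllib.request

    # Assumed endpoints; nothing in this thread confirms these details.
    APACHE_CHOOSER = 'https://www.apache.org/dyn/closer.lua'
    APACHE_ARCHIVE = 'https://archive.apache.org/dist/'

    def spark_download_url(version, package):
        """Ask the Apache mirror chooser for a URL serving the given Spark
        release, falling back to the (slow) archive if the chosen mirror
        doesn't actually have the file."""
        path = 'spark/spark-{v}/spark-{v}-bin-{p}.tgz'.format(v=version, p=package)
        with urllib.request.urlopen(
                '{}?path={}&as_json=1'.format(APACHE_CHOOSER, path)) as response:
            suggestion = json.loads(response.read().decode('utf-8'))
        # The chooser hands back a preferred mirror plus the path to append,
        # rather than redirecting you straight to the download.
        mirror_url = suggestion['preferred'] + suggestion['path_info']
        try:
            # Mirrors only carry recent releases, so verify before trusting it.
            urllib.request.urlopen(
                urllib.request.Request(mirror_url, method='HEAD'))
            return mirror_url
        except urllib.error.HTTPError:
            return APACHE_ARCHIVE + path

    # Example: spark_download_url('2.2.1', 'hadoop2.7')

It works, but it is exactly the kind of per-project boilerplate that a CDN-backed bucket makes unnecessary.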