See http://apache-spark-developers-list.1001551.n3.nabble.com/What-is-d3kbcqa49mib13-cloudfront-net-td22427.html -- it was 'retired', yes.
Agree with all that, though they're intended for occasional individual use and not a case where performance and uptime matter. For that, I think you'd want to just host your own copy of the bits you need.

The notional problem was that the S3 bucket wasn't obviously controlled/blessed by the ASF and yet was a source of official bits. It was another set of third-party credentials to hand around to release managers, which was IIRC a little problematic.

Homebrew does host distributions of ASF projects, like Spark, FWIW.

On Mon, Feb 26, 2018 at 10:57 PM Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> If you go to the Downloads <http://spark.apache.org/downloads.html> page
> and download Spark 2.2.1, you’ll get a link to an Apache mirror. It didn’t
> use to be this way. As recently as Spark 2.2.0, downloads were served via
> CloudFront <https://aws.amazon.com/cloudfront/>, which was backed by an
> S3 bucket named spark-related-packages.
>
> It seems that we’ve stopped using CloudFront, and the S3 bucket behind it
> has stopped receiving updates (e.g. Spark 2.2.1 isn’t there). I’m guessing
> this is part of an effort to use the Apache mirror network, like other
> Apache projects do.
>
> From a user perspective, the Apache mirror network is several steps down
> from using a modern CDN. Let me summarize why:
>
> 1. *Apache mirrors are often slow.* Apache does not impose any
>    performance requirements on its mirrors
>    <https://issues.apache.org/jira/browse/INFRA-10999?focusedCommentId=15717950&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15717950>.
>    The difference between getting a good mirror and a bad one means
>    downloading Spark in less than a minute vs. 20 minutes. The problem is so
>    bad that I’ve thought about adding an Apache mirror blacklist
>    <https://github.com/nchammas/flintrock/issues/84#issuecomment-185038678>
>    to Flintrock to avoid getting one of these dud mirrors.
> 2. *Apache mirrors are inconvenient to use.* When you download
>    something from an Apache mirror, you get a link like this one
>    <https://www.apache.org/dyn/closer.lua/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz>.
>    Instead of automatically redirecting you to your download, though, you need
>    to process the results you get back
>    <https://github.com/nchammas/flintrock/blob/67bf84a1b7cfa1c276cf57ecd8a0b27613ad2698/flintrock/scripts/download-hadoop.py#L21-L42>
>    to find your download target. And you need to handle the high download
>    failure rate, since sometimes the mirror you get doesn’t have the file it
>    claims to have.
> 3. *Apache mirrors are incomplete.* Apache mirrors only keep around
>    the latest releases, save for a few “archive” mirrors, which are often
>    slow. So if you want to download anything but the latest version of Spark,
>    you are out of luck.
>
> Some of these problems can be mitigated by picking a specific mirror that
> works well and hardcoding it in your scripts, but that defeats the purpose
> of dynamically selecting a mirror and makes you a “bad” user of the mirror
> network.
>
> I raised some of these issues over on INFRA-10999
> <https://issues.apache.org/jira/browse/INFRA-10999>. The ticket sat for a
> year before I heard anything back, and the bottom line was that none of the
> above problems have a solution on the horizon. It’s fine. I understand that
> Apache is a volunteer organization and that the infrastructure team has a
> lot to manage as it is. I still find it disappointing that an organization
> of Apache’s stature doesn’t have a better solution for this in
> collaboration with a third party. Python serves PyPI downloads using
> Fastly <https://www.fastly.com/> and Homebrew serves packages using
> Bintray <https://bintray.com/>. They both work really, really well. Why
> don’t we have something as good for Apache projects? Anyway, that’s a
> separate discussion.
>
> What I want to say is this:
>
> Dear whoever owns the spark-related-packages S3 bucket
> <https://s3.amazonaws.com/spark-related-packages/>,
>
> Please keep the bucket up-to-date with the latest Spark releases,
> alongside the past releases that are already on there. It’s a huge help to
> the Flintrock <https://github.com/nchammas/flintrock> project, and it’s
> an equally big help to those of us writing infrastructure automation
> scripts that deploy Spark in other contexts.
>
> I understand that hosting this stuff is not free, and that I am not paying
> anything for this service. If it needs to go, so be it. But I wanted to
> take this opportunity to lay out the benefits I’ve enjoyed thanks to having
> this bucket around, and to make sure that if it did die, it didn’t die a
> quiet death.
>
> Sincerely,
> Nick
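
For anyone scripting around the "process the results you get back" step in item 2 of the quoted message, here is a rough Python sketch of one way to do it. It is not the Flintrock code linked there. It assumes the mirror selector can return a JSON reply (I believe via an as_json query parameter on the closer.lua URL) containing "preferred", "http", and "path_info" fields; treat those field names as assumptions and check them against a real reply. The archive.apache.org fallback and the name candidate_urls are my own illustrative choices. The function only builds an ordered list of URLs to try, so a caller can move on to the next one when a mirror turns out not to have the file.

```python
import json

# Assumption: the ASF archive host is used as the last-resort fallback.
ARCHIVE_BASE = "https://archive.apache.org/dist/"


def candidate_urls(closer_json):
    """Return download URLs to try, in order, from a closer.lua JSON reply.

    Assumed (not verified here) reply fields:
      "preferred": base URL of the suggested mirror
      "http":      optional list of other mirror base URLs
      "path_info": path of the requested file relative to a mirror base
    """
    info = json.loads(closer_json)
    path = info["path_info"].lstrip("/")

    # Preferred mirror first, then any other mirrors, then the archive.
    bases = [info["preferred"]] + list(info.get("http", [])) + [ARCHIVE_BASE]

    urls = []
    seen = set()
    for base in bases:
        url = base.rstrip("/") + "/" + path
        if url not in seen:  # skip duplicates, keep first-seen order
            seen.add(url)
            urls.append(url)
    return urls
```

A download loop would then try each URL in turn and stop at the first one that succeeds, which covers the case where the chosen mirror doesn't actually have the file.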