Hi,

On 03.04.2012 at 18:17, Roberto Galoppini wrote:
> We at SourceForge have worked the last ten days to line-up dedicated
> infrastructure (including CDN services) to support the upcoming AOO
> download serving test.

I can hardly believe what I'm reading! What's going on? We have an existing (and
well-working) mirror network that handles any required load just fine. It's
proven and time-tested. It has survived all releases with ease. By all
calculations, and by practical experience, the combined upload capacity of the
mirrors is sufficient to satisfy the peak download demand as well as the 
sustained demand. By the way, the "peak download demand" doesn't really differ 
a lot from the day-to-day download demand, contrary to public belief. The 
mirrors are numerous and spread around the world, and the chance of a client 
being sent to a close and fast mirror is good - better than with a handful of 
mirrors as is the case with the Sourceforge mirror network. Sourceforge 
specializes in something different - providing a myriad of small files by a set 
of specialized mirrors. "Normal", plain, simple mirrors can't take part in this
network as far as I can tell. Even though the network was considerably extended
a few years ago, from 10 (under 10?) to >20 mirrors, this is still a small
number of mirrors. (These are power mirrors, admittedly, but those are part of
our existing mirror network just as well.)

With our mirror network, mirrors can mirror partial content, so they can 
provide what's important in their region, like certain language packs only. 
This greatly increases the likelihood of finding mirrors in remote areas that
don't have hundreds of gigabytes to spare. It's also unnecessary that mirrors
carry old releases that are infrequently downloaded. Mirrors can run whatever 
HTTP software they prefer, not only Apache httpd, or even FTP servers. Mirrors 
can decide to offer mirroring only in their network/autonomous system/country 
to limit the share of requests they get, and from where they get it. Many 
mirrors don't have good international connectivity, but can be used well with 
us nevertheless. We provide cryptohashes, Metalinks, even P2P links, all fully 
automatically. That's very important for these unusually large files. 
Downloading without error correction is not fun. We select mirrors by GeoIP, 
but also by geographical distance as well as network topology, whatever gives a 
close match, and we already support IPv6.
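
To illustrate why the cryptohashes matter for large files, here is a minimal
Python sketch of checking a finished download against a published hash. It
assumes the published hash is a SHA-256 hex digest; the function names are
made up for this example and are not part of any redirector software.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks.

    Reading in chunks keeps memory use flat even for files of several
    hundred megabytes.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_is_intact(path, expected_hex):
    """Compare the file's digest with the published one (assumed SHA-256)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A client that gets False here knows the file is damaged before the user tries
to install it. Metalink clients go further and can repair only the broken
pieces, which this sketch does not attempt.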

It has taken some years to build all this, and a lot of the features were 
triggered directly by the work on the OpenOffice.org redirector. Built for
OpenOffice.org.

The software is one kind of work that went into it; finding and collecting
mirrors, and building trust and lasting relationships, is the other. A mirror
network isn't built overnight.

I think there is a danger that the Apache mirror network is equated with the 
OOo mirror network. This is a mistake in my view. The large files that we have 
are a totally different challenge. There is a huge difference between
downloading 6MB tarballs and 200MB files, both from the user's perspective
("why does my file not work, that I waited so long for!?") and from the
mirror's perspective ("what are these 200 connections from Chinese IPs on my
mirror server!?").
It is important to be able to give mirrors different weight, because they
differ vastly in their capabilities, which can range from 4 Gbit of bandwidth
down to a brittle 50 Mbit somewhere else. Even inside an "Internet country"
like Germany you'll see differences from 100 Mbit to multiple Gbit, and you
want to utilize the bandwidth well. We have this working well!
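
As a rough illustration of what weighting means, here is a small Python sketch
that picks a mirror with probability proportional to its weight. The mirror
names and weights are invented, and this is not necessarily how our redirector
computes its choice; it only shows the principle.

```python
import random


def pick_weighted(mirrors, rng=random):
    """Pick one mirror name, with probability proportional to its weight.

    `mirrors` is a list of (name, weight) pairs. The weights are
    hypothetical numbers, e.g. roughly the usable bandwidth in Mbit.
    """
    if not mirrors:
        raise ValueError("no mirrors to choose from")
    names = [name for name, _ in mirrors]
    weights = [weight for _, weight in mirrors]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical example: one 4 Gbit mirror and one 50 Mbit mirror.
example_mirrors = [("big-mirror", 4000), ("small-mirror", 50)]
```

With these weights the small mirror still gets requests, but only about one
in eighty, so it is not flattened by 200MB downloads.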

OpenOffice.org used a piece of software called "Bouncer", one of the simpler
solutions, before switching to MirrorBrain. I think everybody (who has been in
the project a few years) will agree that we don't want to go back.

So I see that Sourceforge wants to beef up their network by renting a Content 
Delivery Network (CDN). Is that needed? Yes, because they don't have enough
bandwidth in mirrors. Is that a good idea? I don't think so, but I'm biased,
because 1) I don't like advertisements and 2) I have both feet firmly planted
in the mirror community.

In the mirror community, there is a kind of pride among the more ambitious
mirror admins: they believe that commercial CDNs do not need to step in to
handle even the peak download demand of the most popular Open Source
software. And they work hard for it. Together, we have proven that the help of
commercial CDNs is *not* needed, both with OpenOffice.org and with 
OpenSUSE.org. Mirrors have served > 20 GByte per second together. The bandwidth 
is there!
(In the past, Akamai was used during release peaks with OpenSUSE.org, so I have 
been there, and also got interesting insight and numbers there.)

I tried the currently configured download from 
http://www.openoffice.org/download today (from a real crappy end user box ;). 
It was slow and didn't start downloading immediately, but showed a page full of
advertisements that had no relation to OpenOffice.org and wanted to open a
popup (MS IE said so and blocked it). When the download started, it came
from the Swiss mirror, but I'm in Germany! What's that? Thrown 3 years back in
time? Sub-optimal. (I can guess who pays for the CDN that is rented to help
out: advertising.)


Do you really want to ditch what we have built? Ditching the system that 
improved downloading OpenOffice.org in the farthest corners of the world? 
Exchanging it for a handful of Sourceforge mirrors, and 250 Apache mirrors,
many of which lack the capability? Some are big, but many will be far from 
having the bandwidth to deliver large files. 

Something that Apache's mirror system also can't do is sending me to my local 
mirror (my very ISP in my city runs a mirror, and my home IP is in their 
netblock). The Apache mirror system sends me to *any* mirror in my country, while
our current solution recognizes the network topology and lets me download from 
the local mirror. Especially with large files, that's very nice both for the 
ISP and for me as a user. Sourceforge can theoretically do this (because they use
a part of MirrorBrain for that purpose!) but don't have enough mirrors to make
it pay off. This is not only useful with single ISPs that run a mirror; it's
also useful with autonomous systems (AS) of networks that share a backbone,
like most German universities in AS680.
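
The preference order described above (same netblock, then same autonomous
system, then same country, then anything) can be sketched in Python as
follows. This is only an illustration under my own assumptions, not the actual
redirector code: the Mirror fields and function names are invented, the
addresses are documentation ranges, and the AS numbers are reserved example
values.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Mirror:
    name: str
    prefix: str   # netblock the mirror sits in, e.g. "192.0.2.0/24"
    asn: int      # autonomous system number of the mirror's network
    country: str  # two-letter country code


def match_rank(mirror, client_ip, client_asn, client_country):
    """Lower is closer: 0 = same netblock, 1 = same AS, 2 = same country."""
    if ipaddress.ip_address(client_ip) in ipaddress.ip_network(mirror.prefix):
        return 0
    if mirror.asn == client_asn:
        return 1
    if mirror.country == client_country:
        return 2
    return 3


def closest_mirrors(mirrors, client_ip, client_asn, client_country):
    """Return all mirrors that share the best (lowest) rank for this client."""
    if not mirrors:
        return []
    ranked = [(match_rank(m, client_ip, client_asn, client_country), m)
              for m in mirrors]
    best = min(rank for rank, _ in ranked)
    return [m for rank, m in ranked if rank == best]


# Hypothetical mirrors: my ISP's mirror, a university mirror, a Swiss mirror.
example_mirrors = [
    Mirror("isp-mirror", "192.0.2.0/24", 64500, "de"),
    Mirror("uni-mirror", "198.51.100.0/24", 64501, "de"),
    Mirror("swiss-mirror", "203.0.113.0/24", 64502, "ch"),
]
```

A client at 192.0.2.55 gets only the ISP mirror here, while a redirector that
looks at the country alone would treat both German mirrors as equal.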


So we will have a *technically inferior* solution in the future? That's not
the Apache way, is it?

I have been told more than once, on this list, that "it will be the Apache 
mirror system and nothing else". I didn't understand the reasons (except for 
policy, no special treatment for individual projects), but it won't work that 
way IMO.

Now it seems to me that the Apache mirror system has sought the help of
Sourceforge.net. If that means that some doubts crept up, then I share those 
doubts. But I don't see Sourceforge.net as the solution either, as explained 
above. They have their merits, and I like their dedication and the specialized 
system they've built (with features that I'm envious of!), but I think our 
existing solution is better suited. And not only that, IMO it is a very 
important prerequisite of being successful. No well-working downloads, no luck 
with distributing FOSS that consists of large files. 

Thanks for reading this far,
Peter
