Re: CDN Test Results - Should We Continue Using a CDN?
On Sun, Mar 10, 2019 at 08:47:59PM -0700, Chris Marusich wrote:
> In addition, CloudFront reports that traffic came from the following
> locations (sorted by bytes transferred):
>
> Location        Request Count  Request %  Bytes
> ------------------------------------------------
> [...]
> China           17,841         0.48%      16.45 GB

Looks like someone was benefitting from the CDN in China:

https://lists.gnu.org/archive/html/guix-devel/2019-03/msg00222.html

I vote that we continue using it. During the Guix Days, the availability of substitutes was identified as a frequent problem with Guix. Anything we can do to increase availability and download speeds is a good thing for the project.
Re: CDN Test Results - Should We Continue Using a CDN?
Hello Ludovic!

Ludovic Courtès writes:

> Hi Maxim,
>
> Maxim Cournoyer writes:
>
>> Pardon me for asking, but how does using a CDN free up resources?
>> Isn't the usual infrastructure preserved (e.g., ci.guix.info)? It
>> seems it'd be an extra layer to maintain?
>
> One of the motivations for this is that berlin.guixsd.org
> aka. ci.guix.info is a single machine, the head of our main build farm.
> If that machine goes down, we have no substitutes. Having a cache like
> a CDN provides some redundancy: if the build farm goes down, we’ll at
> least still have cached substitutes, which leaves us time to fix the
> build farm.

I see. I understand that having the service continue running smoothly while fixing ci.guix.info must be a good stress reliever.

[...]

>> I'd rather see this (even modest) amount put into the hands of a
>> motivated hacker to work on a distributed solution instead of
>> encouraging a company which does not share our free software ideals.
>
> As discussed before, I definitely sympathize with this. Heck, if
> someone had told me I’d argue in favor of a CDN after all this time
> spent filling in Cloudflare CAPTCHAs just because Cloudflare decided that
> user privacy doesn’t matter and that Tor users should be penalized, I’d
> have laughed. ;-)
>
> So it’s definitely not an easy decision. Nevertheless, we have to
> acknowledge the fact that our current substitute delivery infrastructure
> is fragile. If people volunteer to maintain a set of mirrors with some
> load balancing, that’s great, I’m all for it. But for now, we don’t
> have that at all, hence the CDN.

Right. I understand the motivation behind the CDN better now; thank you for taking the time to explain. Resiliency is indeed welcome and maybe even necessary until better things come.

Maxim
Re: CDN Test Results - Should We Continue Using a CDN?
Hi Maxim,

Maxim Cournoyer writes:

> Pardon me for asking, but how does using a CDN free up resources?
> Isn't the usual infrastructure preserved (e.g., ci.guix.info)? It
> seems it'd be an extra layer to maintain?

One of the motivations for this is that berlin.guixsd.org aka. ci.guix.info is a single machine, the head of our main build farm. If that machine goes down, we have no substitutes. Having a cache like a CDN provides some redundancy: if the build farm goes down, we’ll at least still have cached substitutes, which leaves us time to fix the build farm.

We can have a cache that’s not a CDN, like we did with mirror.hydra.gnu.org, which runs an nginx caching proxy for hydra.gnu.org. However, that’s another machine to take care of (that’s not much work in practice, but still, we must be able to quickly respond to outages), and another single point of failure.

> The heaviest bandwidth usage appears to originate from areas already well
> served by the current infrastructure (mirror.hydra.gnu.org -> North
> America, ci.guix.info -> Europe), so I'm not sure spending resources on
> a CDN is worthwhile in this context.

I think the good bandwidth is the second motivation for the CDN, but it’s true that it still benefits the same groups of people; in particular we know that CloudFront is unavailable in China. Nevertheless the extra performance is welcome IMO. I think substitute delivery plays an important role in the user experience, so if we can improve it, all the better.

> I'd rather see this (even modest) amount put into the hands of a
> motivated hacker to work on a distributed solution instead of
> encouraging a company which does not share our free software ideals.

As discussed before, I definitely sympathize with this. Heck, if someone had told me I’d argue in favor of a CDN after all this time spent filling in Cloudflare CAPTCHAs just because Cloudflare decided that user privacy doesn’t matter and that Tor users should be penalized, I’d have laughed. ;-)

So it’s definitely not an easy decision. Nevertheless, we have to acknowledge the fact that our current substitute delivery infrastructure is fragile. If people volunteer to maintain a set of mirrors with some load balancing, that’s great, I’m all for it. But for now, we don’t have that at all, hence the CDN.

Longer term, I do hope for IPFS to become our main delivery mechanism. I’ve posted a proof-of-concept that I think should allow us to get started, play with the idea, and find out how that works in practice.

Thanks,
Ludo’.
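
The redundancy argument in this message (cached substitutes stay available while the build farm is down) can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyOrigin`, `ToyCache`, and `OriginDown` are hypothetical names invented for the illustration, and nothing here reflects how CloudFront or the nginx proxy is actually configured.

```python
class OriginDown(Exception):
    """Raised by the toy origin when the build farm is unreachable."""


class ToyOrigin:
    """Hypothetical stand-in for the build farm head (the single machine)."""

    def __init__(self, items):
        self.items = dict(items)
        self.up = True

    def fetch(self, path):
        if not self.up:
            raise OriginDown(path)
        return self.items[path]


class ToyCache:
    """Hypothetical stand-in for a caching layer in front of the origin."""

    def __init__(self, origin):
        self.origin = origin
        self.store = {}

    def get(self, path):
        # Cache hit: answer without contacting the origin at all.
        if path in self.store:
            return self.store[path]
        # Cache miss: the origin must be reachable, otherwise this raises.
        body = self.origin.fetch(path)
        self.store[path] = body
        return body


origin = ToyOrigin({"/abc.narinfo": "narinfo for abc"})
cache = ToyCache(origin)
cache.get("/abc.narinfo")          # first request warms the cache
origin.up = False                  # simulate the build farm going down
print(cache.get("/abc.narinfo"))   # still served, from the cache
```

Items that were never requested before the outage are not covered: in this toy model a miss during the outage still raises `OriginDown`, which matches the point that a cache buys time rather than replacing the build farm.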
Re: CDN Test Results - Should We Continue Using a CDN?
Hi Maxim and others,

Maxim Cournoyer writes:

> Chris Marusich writes:
>
>> [...] Starting on February 23rd, 2019 we conducted a test using
>> Amazon CloudFront. [...] The test concluded on March 23rd [...].
>
> Am I living in the past, or did you mean another date than March 23rd?
> :-)

No, you're right: I mixed up my months. The test actually began on January 23rd, 2019, and concluded on February 23rd (31 days total).

By the way, I've double-checked the other statistics. They're all accurate except for the test duration, which was actually 31 days. I just mixed up the months in my head. Sorry for the confusion!

--
Chris
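
The corrected duration can be checked with a few lines of date arithmetic. This is a minimal sketch using only Python's standard library; the helper name `days_between` is made up for the illustration.

```python
from datetime import date


def days_between(start, end):
    """Number of days from start to end (end date not counted)."""
    return (end - start).days


# Corrected test period: January 23rd to February 23rd, 2019.
print(days_between(date(2019, 1, 23), date(2019, 2, 23)))  # 31

# Period as originally (mistakenly) reported: February 23rd to March 23rd.
print(days_between(date(2019, 2, 23), date(2019, 3, 23)))  # 28
```

The second figure shows where the "28 days" in the original report comes from: it is the length of the mistaken February-to-March range, not of the actual test.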
Re: CDN Test Results - Should We Continue Using a CDN?
Hello Chris!

Chris Marusich writes:

> Hi Guix!
>
> Recently, the Guix project experimented with using a CDN to improve
> substitute availability and performance. This email summarizes the
> results of the test for your review. I also hope this email will start
> a discussion about whether or not we should continue to use a CDN.
>
> First, I'll summarize what we did. Starting on February 23rd, 2019 we
> conducted a test using Amazon CloudFront. We configured ci.guix.info so
> that all requests for substitutes via that domain name would go through
> an Amazon CloudFront distribution that we set up for this purpose. The
> test concluded on March 23rd, and the CDN is not currently being used.

Am I living in the past, or did you mean another date than March 23rd? :-)

> Amazon CloudFront provides us with billing information and aggregate
> usage statistics. Here's the information for the duration of the test:
>
> Duration:             28 days (February 23rd - March 23rd)
> Expense:              156.88 US Dollars
> Requests received:    3,732,919
> Average request size: 490 KB
> Bytes transferred:    1,744.5724 GB
> Bytes from misses:    684.3992 GB
> Hits:                 2.14 M (57.44%)
> Misses:               0.99 M (26.41%)
> Errors:               602.91 K (16.15%)
> 2xx:                  2,983.24 K (79.92%)
> 3xx:                  146.753 K (3.93%)
> 4xx:                  593.159 K (15.89%)
> 5xx:                  9.471 K (0.25%)
> [...]
> Location        Request Count  Request %  Bytes
> ------------------------------------------------
> United States   933,448        25.01%     562.52 GB
> Germany         687,548        18.42%     174.53 GB
> France          341,573        9.15%      167.36 GB
> Canada          179,630        4.81%      96.31 GB

[...]

> Since the test has concluded, we are not currently using a CDN. Going
> forward, we need to decide if we want to continue to use a CDN. Did you
> notice an improvement in download speed or substitute availability
> during the test period? Do you have metrics of your own that you can
> share with us? If so, please share the information so we can understand
> whether it's worth continuing to pay for a CDN.

I haven't noticed a big difference on ci.guix.info; but then my WiFi link seems to saturate around 1 MiB/s or so at home, so I'm not a very demanding user ;-). Things felt as zippy as usual.

> One of the reasons why we wanted to use a CDN in the first place was to
> free up resources so that the community could spend more time working on
> better solutions.

Pardon me for asking, but how does using a CDN free up resources? Isn't the usual infrastructure preserved (e.g., ci.guix.info)? It seems it'd be an extra layer to maintain?

The heaviest bandwidth usage appears to originate from areas already well served by the current infrastructure (mirror.hydra.gnu.org -> North America, ci.guix.info -> Europe), so I'm not sure spending resources on a CDN is worthwhile in this context.

I'd rather see this (even modest) amount put into the hands of a motivated hacker to work on a distributed solution instead of encouraging a company which does not share our free software ideals.

I'm hoping this doesn't come across as too negative! Thanks for sharing this interesting information with us.

Maxim
Re: CDN Test Results - Should We Continue Using a CDN?
Thank you for correcting my false assumptions and sharing that link.

Ricardo Wurmus writes:

> mikadoZero writes:
>
>> In "14.4.1 Software Freedom" of the Guix manual it says that Guix is free
>> software and follows the free software distribution guidelines.
>>
>> Is using a proprietary non-free CDN as a core part of Guix's
>> infrastructure in conflict with Guix's software freedom?
>
> Two things:
>
> 1) It is not a core part of Guix’s infrastructure. People who want to
> bypass the CDN can do so by fetching substitutes from berlin.guixsd.org
> instead of ci.guix.info. People can also opt out of getting substitutes
> altogether or choose to get them from some other build farm. (The
> build farm is little more than another Guix user.)
>
> 2) “proprietary” / “non-free” terminology does not apply to services.
> See also
> https://www.gnu.org/philosophy/network-services-arent-free-or-nonfree.html
>
> This is a case of “Service as a Hardware Substitute” where we pay to use
> hardware that we do not physically control to substitute for having to
> own and maintain hardware at a large number of physical locations in the
> world.
>
>> Using a proprietary CDN has the potential for an unplanned increase in
>> workload. This is because of the combination of vendor lock-in and
>> product line discontinuation. Which could create unplanned rework of
>> setting up a CDN elsewhere. This hinders Guix's resource planning by
>> introducing the potential for surprise rework.
>
> There is no vendor lock-in. We can drop and have dropped the use of a
> CDN without service interruption. If the CDN service were to be
> discontinued we would simply revert to not offering package distribution
> via CDN.
Re: CDN Test Results - Should We Continue Using a CDN?
mikadoZero writes:

> In "14.4.1 Software Freedom" of the Guix manual it says that Guix is free
> software and follows the free software distribution guidelines.
>
> Is using a proprietary non-free CDN as a core part of Guix's
> infrastructure in conflict with Guix's software freedom?

Two things:

1) It is not a core part of Guix’s infrastructure. People who want to bypass the CDN can do so by fetching substitutes from berlin.guixsd.org instead of ci.guix.info. People can also opt out of getting substitutes altogether or choose to get them from some other build farm. (The build farm is little more than another Guix user.)

2) “proprietary” / “non-free” terminology does not apply to services. See also https://www.gnu.org/philosophy/network-services-arent-free-or-nonfree.html

This is a case of “Service as a Hardware Substitute” where we pay to use hardware that we do not physically control to substitute for having to own and maintain hardware at a large number of physical locations in the world.

> Using a proprietary CDN has the potential for an unplanned increase in
> workload. This is because of the combination of vendor lock-in and
> product line discontinuation. Which could create unplanned rework of
> setting up a CDN elsewhere. This hinders Guix's resource planning by
> introducing the potential for surprise rework.

There is no vendor lock-in. We can drop and have dropped the use of a CDN without service interruption. If the CDN service were to be discontinued we would simply revert to not offering package distribution via CDN.

--
Ricardo
Re: CDN Test Results - Should We Continue Using a CDN?
Chris Marusich writes:

> Since the test has concluded, we are not currently using a CDN. Going
> forward, we need to decide if we want to continue to use a CDN.

In "14.4.1 Software Freedom" of the Guix manual it says that Guix is free software and follows the free software distribution guidelines.

Is using a proprietary non-free CDN as a core part of Guix's infrastructure in conflict with Guix's software freedom?

Using a proprietary CDN has the potential for an unplanned increase in workload, because of the combination of vendor lock-in and product line discontinuation. That could force unplanned rework to set up a CDN elsewhere, which hinders Guix's resource planning.

Are there any free software content delivery networks?

> One of the reasons why we wanted to use a CDN in the first place was to
> free up resources so that the community could spend more time working on
> better solutions. For example, some people have expressed an interest
> in a distributed or peer-to-peer substitute mechanism using IPFS or
> GNUnet. In fact, Ludo paved the way for this by submitting patches to
> distribute substitutes over IPFS:
>
> https://issues.guix.info/issue/33899
>
> However, it seems his work hasn't succeeded in exciting people enough to
> carry the momentum forward. We need more people who are interested in
> this and can work on it! Otherwise, it may never become a reality. So
> if you care about distributed or peer-to-peer substitutes, please help!

This is interesting. Peer-to-peer substitutes using free software are well aligned with Guix as a free software project. I would want to use this method if it were available.

Has there been any progress on this since the end of that thread?

Any guesses about how difficult this may be to complete and how much work might be required?
CDN Test Results - Should We Continue Using a CDN?
Hi Guix!

Recently, the Guix project experimented with using a CDN to improve substitute availability and performance. This email summarizes the results of the test for your review. I also hope this email will start a discussion about whether or not we should continue to use a CDN.

First, I'll summarize what we did. Starting on February 23rd, 2019 we conducted a test using Amazon CloudFront. We configured ci.guix.info so that all requests for substitutes via that domain name would go through an Amazon CloudFront distribution that we set up for this purpose. The test concluded on March 23rd, and the CDN is not currently being used.

Amazon CloudFront provides us with billing information and aggregate usage statistics. Here's the information for the duration of the test:

  Duration:             28 days (February 23rd - March 23rd)
  Expense:              156.88 US Dollars
  Requests received:    3,732,919
  Average request size: 490 KB
  Bytes transferred:    1,744.5724 GB
  Bytes from misses:    684.3992 GB
  Hits:                 2.14 M (57.44%)
  Misses:               0.99 M (26.41%)
  Errors:               602.91 K (16.15%)
  2xx:                  2,983.24 K (79.92%)
  3xx:                  146.753 K (3.93%)
  4xx:                  593.159 K (15.89%)
  5xx:                  9.471 K (0.25%)

Usage was fairly constant throughout the test. This means that the daily statistics for requests received, bytes transferred, HTTP response distribution, and hit rate neither grew nor fell significantly.

The average request size (490 KB) may seem small, since usually one might expect substitutes to be large binaries. However, the size is reasonable because narinfo files are small, and error responses (e.g., 404) are probably being included in the average.

The cache hit rate (57.44%) may also seem low, but it's also reasonable because it's aggregated over all of CloudFront's points of presence worldwide. If one request in Seattle is a cache hit, and one request in London is a cache miss, then that results in an overall cache hit rate of 50%. Different points of presence don't generally share caches.
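
The two observations above (hit rate aggregated across points of presence, and the small average request size) can be reproduced with a little arithmetic. This is a rough sketch: the per-location counts are the hypothetical Seattle/London example from the text, the function name is made up, and the reading of "GB" and "KB" as binary units is an assumption that merely happens to agree with the reported 490 KB.

```python
def overall_hit_rate(per_pop):
    """Aggregate hit rate over all points of presence.

    per_pop maps a location name to a (hits, misses) pair.
    """
    hits = sum(h for h, _ in per_pop.values())
    total = sum(h + m for h, m in per_pop.values())
    return hits / total


# The example from the text: one hit in Seattle, one miss in London.
print(overall_hit_rate({"Seattle": (1, 0), "London": (0, 1)}))  # 0.5

# Average request size from the reported totals.
REQUESTS = 3_732_919
TRANSFERRED_GB = 1_744.5724

# Assumption: binary units (GiB and KiB). This gives roughly 490.
avg_binary = TRANSFERRED_GB * 2**30 / REQUESTS / 2**10
# For comparison, decimal units (10^9 and 10^3) give roughly 467.
avg_decimal = TRANSFERRED_GB * 1e9 / REQUESTS / 1e3

print(round(avg_binary), round(avg_decimal))
```

Either way, the average is simply total bytes divided by total requests, so the many small narinfo and error responses pull it well below the size of a typical binary substitute.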
According to Amazon CloudFront, 11.75% of requests received came from "Bot/Crawler", which CloudFront defines as "primarily requests from search engines that are indexing your content".

In addition, CloudFront reports that traffic came from the following locations (sorted by bytes transferred):

  Location                         Request Count  Request %  Bytes
  -----------------------------------------------------------------
  United States                    933,448        25.01%     562.52 GB
  Germany                          687,548        18.42%     174.53 GB
  France                           341,573        9.15%      167.36 GB
  Canada                           179,630        4.81%      96.31 GB
  Russian Federation               252,738        6.77%      94.28 GB
  United Kingdom                   177,328        4.75%      81.55 GB
  Spain                            38,476         1.03%      70.49 GB
  Netherlands                      118,902        3.19%      61.55 GB
  Belgium                          64,427         1.73%      54.16 GB
  Australia                        101,173        2.71%      51.33 GB
  Brazil                           71,174         1.91%      31.01 GB
  Czech Republic                   48,514         1.30%      29.60 GB
  Sweden                           45,446         1.22%      23.12 GB
  Switzerland                      41,804         1.12%      21.85 GB
  South Africa                     42,508         1.14%      17.94 GB
  Poland                           46,049         1.23%      17.12 GB
  China                            17,841         0.48%      16.45 GB
  Israel                           84,443         2.26%      14.78 GB
  Norway                           26,171         0.70%      14.49 GB
  Japan                            14,013         0.38%      13.73 GB
  Reunion                          19,144         0.51%      11.21 GB
  India                            19,751         0.53%      11.11 GB
  Denmark                          30,390         0.81%      10.24 GB
  Belarus                          25,943         0.69%      9.43 GB
  Italy                            25,359         0.68%      8.56 GB
  Ecuador                          13,321         0.36%      8.41 GB
  Ukraine                          68,807         1.84%      7.91 GB
  Bolivia, Plurinational State of  8,932          0.24%      6.51 GB
  Hungary                          21,374         0.57%      5.99 GB
  Romania                          13,187         0.35%      5.65 GB
  Mexico                           7,299          0.20%      4.25 GB
  Ireland                          7,239          0.19%      4.05 GB
  Greece                           7,946          0.21%      3.98 GB
  Iran, Islamic Republic of        7,730          0.21%      3.84 GB
  Slovenia                         19,901         0.53%      3.62 GB
  Argentina                        8,687          0.23%      3.57 GB
  Finland                          5,105          0.14%      3.51 GB
  Turkey