Re: CDN Test Results - Should We Continue Using a CDN?

2019-03-14 Thread Leo Famulari
On Sun, Mar 10, 2019 at 08:47:59PM -0700, Chris Marusich wrote:
> In addition, CloudFront reports that traffic came from the following
> locations (sorted by bytes transferred):
> 
> Location                        Request Count  Request %  Bytes
> -------------------------------------------------------------------
[...]
> China                                  17,841      0.48%   16.45 GB

Looks like someone was benefitting from the CDN in China:

https://lists.gnu.org/archive/html/guix-devel/2019-03/msg00222.html

I vote that we continue using it. During the Guix Days, the availability
of substitutes was identified as a frequent problem with Guix. Anything
we can do to increase availability and download speeds is a good thing
for the project.




Re: CDN Test Results - Should We Continue Using a CDN?

2019-03-12 Thread Maxim Cournoyer
Hello Ludovic!

Ludovic Courtès writes:

> Hi Maxim,
>
> Maxim Cournoyer writes:
>
>> Pardon me for asking, but how does using a CDN free up resources?
>> Isn't the usual infrastructure preserved (e.g., ci.guix.info)? It
>> seems it'd be an extra layer to maintain?
>
> One of the motivations for this is that berlin.guixsd.org
> aka. ci.guix.info is a single machine, the head of our main build farm.
> If that machine goes down, we have no substitutes.  Having a cache like
> a CDN provides some redundancy: if the build farm goes down, we’ll at
> least still have cached substitutes, which leaves us time to fix the
> build farm.

I see. I understand that having the service continue running smoothly
while fixing ci.guix.info must be a good stress reliever.

[...]

>> I'd rather see this (even modest) amount put into the hands of a
>> motivated hacker to work on a distributed solution instead of
>> encouraging a company which does not share our free software ideals.
>
> As discussed before, I definitely sympathize with this.  Heck, if
> someone had told me I’d argue in favor of a CDN after all this time
> spent filling in Cloudflare CAPTCHAs just because Cloudflare decided that
> user privacy doesn’t matter and that Tor users should be penalized, I’d
> have laughed.  ;-)
>
> So it’s definitely not an easy decision.  Nevertheless, we have to
> acknowledge the fact that our current substitute delivery infrastructure
> is fragile.  If people volunteer to maintain a set of mirrors with some
> load balancing, that’s great, I’m all for it.  But for now, we don’t
> have that at all, hence the CDN.

Right. I understand better the motivation behind the CDN now, thank you
for taking the time to explain. Resiliency is indeed welcome and maybe
even necessary until better things come.

Maxim



Re: CDN Test Results - Should We Continue Using a CDN?

2019-03-12 Thread Ludovic Courtès
Hi Maxim,

Maxim Cournoyer writes:

> Pardon me for asking, but how does using a CDN free up resources?
> Isn't the usual infrastructure preserved (e.g., ci.guix.info)? It
> seems it'd be an extra layer to maintain?

One of the motivations for this is that berlin.guixsd.org
aka. ci.guix.info is a single machine, the head of our main build farm.
If that machine goes down, we have no substitutes.  Having a cache like
a CDN provides some redundancy: if the build farm goes down, we’ll at
least still have cached substitutes, which leaves us time to fix the
build farm.

We can have a cache that’s not a CDN, like we did with
mirror.hydra.gnu.org, which runs an nginx caching proxy for
hydra.gnu.org.  However, that’s another machine to take care of (that’s
not much work in practice, but still, we must be able to quickly respond
to outages), and another single point of failure.
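
As a rough illustration of the redundancy argument above, here is a toy
Python sketch of a cache sitting in front of a single origin.  It is not
how CloudFront or nginx is implemented: the class names and paths are
made up for illustration, and cache expiry is ignored entirely.

```python
from typing import Callable, Dict


class CachingFrontEnd:
    """Toy stand-in for a CDN or caching proxy in front of one origin."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch_from_origin = fetch_from_origin
        self._cache: Dict[str, bytes] = {}

    def get(self, path: str) -> bytes:
        if path in self._cache:
            # Cache hit: the origin is not contacted at all.
            return self._cache[path]
        # Cache miss: this raises if the origin is unreachable.
        body = self._fetch_from_origin(path)
        self._cache[path] = body
        return body


class Origin:
    """Hypothetical single-machine origin that can be switched off."""

    def __init__(self, items: Dict[str, bytes]) -> None:
        self.items = items
        self.up = True

    def fetch(self, path: str) -> bytes:
        if not self.up:
            raise ConnectionError("origin unreachable")
        return self.items[path]


origin = Origin({"/a.narinfo": b"narinfo for a", "/b.narinfo": b"narinfo for b"})
front = CachingFrontEnd(origin.fetch)

front.get("/a.narinfo")   # warms the cache while the origin is up
origin.up = False         # simulate the build farm going down

print(front.get("/a.narinfo"))  # still served, from the cache
try:
    front.get("/b.narinfo")     # never cached, so it fails
except ConnectionError as err:
    print("uncached item unavailable:", err)
```

The sketch also shows the limit of the approach: only items that were
requested before the outage remain available.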

> The heaviest bandwidth usage appears to originate from areas already well
> served by the current infrastructure (mirror.hydra.gnu.org -> North
> America, ci.guix.info -> Europe), so I'm not sure spending resources on
> a CDN is worthwhile in this context.

I think the good bandwidth is the second motivation for the CDN, but
it’s true that it still benefits the same groups of people; in
particular we know that CloudFront is unavailable in China.

Nevertheless the extra performance is welcome IMO.  I think substitute
delivery plays an important role in the user experience, so if we can
improve it, so much the better.

> I'd rather see this (even modest) amount put into the hands of a
> motivated hacker to work on a distributed solution instead of
> encouraging a company which does not share our free software ideals.

As discussed before, I definitely sympathize with this.  Heck, if
someone had told me I’d argue in favor of a CDN after all this time
spent filling in Cloudflare CAPTCHAs just because Cloudflare decided that
user privacy doesn’t matter and that Tor users should be penalized, I’d
have laughed.  ;-)

So it’s definitely not an easy decision.  Nevertheless, we have to
acknowledge the fact that our current substitute delivery infrastructure
is fragile.  If people volunteer to maintain a set of mirrors with some
load balancing, that’s great, I’m all for it.  But for now, we don’t
have that at all, hence the CDN.

Longer term, I do hope for IPFS to become our main delivery mechanism.
I’ve posted a proof-of-concept that I think should allow us to get
started, play with the idea, and find out how that works in practice.

Thanks,
Ludo’.



Re: CDN Test Results - Should We Continue Using a CDN?

2019-03-11 Thread Chris Marusich
Hi Maxim and others,

Maxim Cournoyer writes:

> Chris Marusich writes:
>
>> [...]  Starting on February 23rd, 2019 we conducted a test using
>> Amazon CloudFront.  [...] The test concluded on March 23rd [...].
>
> Am I living in the past, or did you mean another date than March 23rd?
> :-)

No, you're right: I mixed up my months.  The test actually began on
January 23rd, 2019, and concluded on February 23rd (31 days total).

By the way, I've double checked the other statistics.  They're all
accurate except for the test duration, which was actually 31 days.  I
just mixed up the months in my head.  Sorry for the confusion!

-- 
Chris




Re: CDN Test Results - Should We Continue Using a CDN?

2019-03-11 Thread Maxim Cournoyer
Hello Chris!

Chris Marusich writes:

> Hi Guix!
>
> Recently, the Guix project experimented with using a CDN to improve
> substitute availability and performance.  This email summarizes the
> results of the test for your review.  I also hope this email will start
> a discussion about whether or not we should continue to use a CDN.
>
> First, I'll summarize what we did.  Starting on February 23rd, 2019 we
> conducted a test using Amazon CloudFront.  We configured ci.guix.info so
> that all requests for substitutes via that domain name would go through
> an Amazon CloudFront distribution that we set up for this purpose.  The
> test concluded on March 23rd, and the CDN is not currently being used.

Am I living in the past, or did you mean another date than March 23rd?
:-)

> Amazon CloudFront provides us with billing information and aggregate
> usage statistics.  Here's the information for the duration of the test:
>
> Duration: 28 days (February 23rd - March 23rd)
> Expense: 156.88 US Dollars
> Requests received: 3,732,919
> Average request size: 490 KB
> Bytes transferred: 1,744.5724 GB
> Bytes from misses: 684.3992 GB
> Hits: 2.14 M (57.44%)
> Misses: 0.99 M (26.41%)
> Errors: 602.91 K (16.15%)
> 2xx: 2,983.24 K (79.92%)
> 3xx: 146.753 K (3.93%)
> 4xx: 593.159 K (15.89%)
> 5xx: 9.471 K (0.25%)
>

[...]

> Location                        Request Count  Request %  Bytes
> -------------------------------------------------------------------
> United States                         933,448     25.01%  562.52 GB
> Germany                               687,548     18.42%  174.53 GB
> France                                341,573      9.15%  167.36 GB
> Canada                                179,630      4.81%   96.31 GB

[...]

> Since the test has concluded, we are not currently using a CDN.  Going
> forward, we need to decide if we want to continue to use a CDN.  Did you
> notice an improvement in download speed or substitute availability
> during the test period?  Do you have metrics of your own that you can
> share with us?  If so, please share the information so we can understand
> whether it's worth continuing to pay for a CDN.

I haven't noticed a big difference on ci.guix.info; but then my WiFi
link seems to saturate around 1 MiB or so at home, so I'm not a very
demanding user ;-). Things felt as zippy as usual.

> One of the reasons why we wanted to use a CDN in the first place was to
> free up resources so that the community could spend more time working on
> better solutions.

Pardon me for asking, but how does using a CDN free up resources?
Isn't the usual infrastructure preserved (e.g., ci.guix.info)? It
seems it'd be an extra layer to maintain?

The heaviest bandwidth usage appears to originate from areas already well
served by the current infrastructure (mirror.hydra.gnu.org -> North
America, ci.guix.info -> Europe), so I'm not sure spending resources on
a CDN is worthwhile in this context.

I'd rather see this (even modest) amount put into the hands of a
motivated hacker to work on a distributed solution instead of
encouraging a company which does not share our free software ideals.

I'm hoping this doesn't come across as too negative! Thanks for sharing
this interesting information with us.

Maxim



Re: CDN Test Results - Should We Continue Using a CDN?

2019-03-11 Thread mikadoZero
Thank you for correcting my false assumptions and sharing that link.

Ricardo Wurmus writes:

> mikadoZero writes:
>
>> In "14.4.1 Software Freedom" of the Guix manual it says that Guix is free
>> software and follows the free software distribution guidelines.
>>
>> Is using a proprietary, non-free CDN as a core part of Guix's
>> infrastructure in conflict with Guix's software freedom?
>
> Two things:
>
> 1) It is not a core part of Guix’s infrastructure.  People who want to
> bypass the CDN can do so by fetching substitutes from berlin.guixsd.org
> instead of ci.guix.info.  People can also opt out of getting substitutes
> altogether or choose to get them from some other build farm.  (The
> build farm is little more than another Guix user.)
>
> 2) “proprietary” / “non-free” terminology does not apply to services.
> See also
> https://www.gnu.org/philosophy/network-services-arent-free-or-nonfree.html
>
> This is a case of “Service as a Hardware Substitute” where we pay to use
> hardware that we do not physically control to substitute for having to
> own and maintain hardware at a large number of physical locations in the
> world.
>
>> Using a proprietary CDN has the potential for an unplanned increase in
>> workload.  This is because of the combination of vendor lock-in and
>> product line discontinuation, which could force unplanned rework to set
>> up a CDN elsewhere.  This hinders Guix's resource planning by
>> introducing the potential for surprise rework.
>
> There is no vendor lock-in.  We can drop and have dropped the use of a
> CDN without service interruption.  If the CDN service were to be
> discontinued we would simply revert to not offering package distribution
> via CDN.




Re: CDN Test Results - Should We Continue Using a CDN?

2019-03-11 Thread Ricardo Wurmus


mikadoZero writes:

> In "14.4.1 Software Freedom" of the Guix manual it says that Guix is free
> software and follows the free software distribution guidelines.
>
> Is using a proprietary, non-free CDN as a core part of Guix's
> infrastructure in conflict with Guix's software freedom?

Two things:

1) It is not a core part of Guix’s infrastructure.  People who want to
bypass the CDN can do so by fetching substitutes from berlin.guixsd.org
instead of ci.guix.info.  People can also opt out of getting substitutes
altogether or choose to get them from some other build farm.  (The
build farm is little more than another Guix user.)

2) “proprietary” / “non-free” terminology does not apply to services.
See also
https://www.gnu.org/philosophy/network-services-arent-free-or-nonfree.html

This is a case of “Service as a Hardware Substitute” where we pay to use
hardware that we do not physically control to substitute for having to
own and maintain hardware at a large number of physical locations in the
world.
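
To make point 1 concrete, here is a small Python sketch of how the same
substitute metadata can be requested from either host name.  It assumes
the usual layout in which a substitute server exposes metadata at
<hash>.narinfo, where <hash> is the 32-character prefix of the store
item's file name.  The function name and the example store path are
made up for illustration.

```python
def narinfo_url(base_url: str, store_path: str) -> str:
    """Build the narinfo URL for a store item on a given substitute server."""
    item = store_path.rstrip("/").rsplit("/", 1)[-1]
    hash_part, dash, _name = item.partition("-")
    if not dash or len(hash_part) != 32:
        raise ValueError(f"not a store item: {store_path!r}")
    return f"{base_url.rstrip('/')}/{hash_part}.narinfo"


# Placeholder store path; the hash is made up for illustration.
store_path = "/gnu/store/" + "0" * 32 + "-hello-2.10"

print(narinfo_url("https://ci.guix.info", store_path))       # may go through a CDN
print(narinfo_url("https://berlin.guixsd.org", store_path))  # the build farm directly
```

Only the host part differs between the two URLs, which is why switching
the substitute URL is enough to bypass the CDN.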

> Using a proprietary CDN has the potential for an unplanned increase in
> workload.  This is because of the combination of vendor lock-in and
> product line discontinuation, which could force unplanned rework to set
> up a CDN elsewhere.  This hinders Guix's resource planning by
> introducing the potential for surprise rework.

There is no vendor lock-in.  We can drop and have dropped the use of a
CDN without service interruption.  If the CDN service were to be
discontinued we would simply revert to not offering package distribution
via CDN.

--
Ricardo




Re: CDN Test Results - Should We Continue Using a CDN?

2019-03-11 Thread mikadoZero


Chris Marusich writes:

> Since the test has concluded, we are not currently using a CDN.  Going
> forward, we need to decide if we want to continue to use a CDN.

In "14.4.1 Software Freedom" of the Guix manual it says that Guix is free
software and follows the free software distribution guidelines.

Is using a proprietary, non-free CDN as a core part of Guix's
infrastructure in conflict with Guix's software freedom?

Using a proprietary CDN has the potential for an unplanned increase in
workload.  This is because of the combination of vendor lock-in and
product line discontinuation, which could force unplanned rework to set
up a CDN elsewhere.  This hinders Guix's resource planning by
introducing the potential for surprise rework.

Are there any free software content delivery networks?

> One of the reasons why we wanted to use a CDN in the first place was to
> free up resources so that the community could spend more time working on
> better solutions.  For example, some people have expressed an interest
> in a distributed or peer-to-peer substitute mechanism using IPFS or
> GNUnet.  In fact, Ludo paved the way for this by submitting patches to
> distribute substitutes over IPFS:
>
> https://issues.guix.info/issue/33899
>
> However, it seems his work hasn't succeeded in exciting people enough to
> carry the momentum forward.  We need more people who are interested in
> this and can work on it!  Otherwise, it may never become a reality.  So
> if you care about distributed or peer-to-peer substitutes, please help!

This is interesting.  Peer-to-peer substitutes using free software is
well aligned with Guix as a free software project.  I would want to use
this method if it were available.

Has there been any progress on this since the end of that thread?

Any guesses about how difficult this may be to complete and how much
work might be required?



CDN Test Results - Should We Continue Using a CDN?

2019-03-10 Thread Chris Marusich
Hi Guix!

Recently, the Guix project experimented with using a CDN to improve
substitute availability and performance.  This email summarizes the
results of the test for your review.  I also hope this email will start
a discussion about whether or not we should continue to use a CDN.

First, I'll summarize what we did.  Starting on February 23rd, 2019 we
conducted a test using Amazon CloudFront.  We configured ci.guix.info so
that all requests for substitutes via that domain name would go through
an Amazon CloudFront distribution that we set up for this purpose.  The
test concluded on March 23rd, and the CDN is not currently being used.

Amazon CloudFront provides us with billing information and aggregate
usage statistics.  Here's the information for the duration of the test:

Duration: 28 days (February 23rd - March 23rd)
Expense: 156.88 US Dollars
Requests received: 3,732,919
Average request size: 490 KB
Bytes transferred: 1,744.5724 GB
Bytes from misses: 684.3992 GB
Hits: 2.14 M (57.44%)
Misses: 0.99 M (26.41%)
Errors: 602.91 K (16.15%)
2xx: 2,983.24 K (79.92%)
3xx: 146.753 K (3.93%)
4xx: 593.159 K (15.89%)
5xx: 9.471 K (0.25%)

Usage was fairly constant throughout the test.  This means that the
daily statistics for requests received, bytes transferred, HTTP response
distribution, and hit rate neither grew nor fell significantly.

The average request size (490 KB) may seem small, since usually one
might expect substitutes to be large binaries.  However, the size is
reasonable because narinfo files are small, and error responses (e.g.,
404) are probably being included in the average.
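
As a rough cross-check of the figures above, the following Python
sketch recomputes a few derived numbers from the reported totals.  Two
things are assumptions here rather than anything CloudFront states in
the report: that "GB" and "KB" are binary units (1 GB = 2**20 KB), and
that "Bytes from misses" counts bytes that had to be fetched from the
origin.

```python
# Figures copied from the CloudFront report above.
expense_usd = 156.88
requests_received = 3_732_919
bytes_transferred_gb = 1_744.5724
bytes_from_misses_gb = 684.3992

# Assumption: "GB" and "KB" are binary units, so 1 GB = 2**20 KB.
KB_PER_GB = 2**20

avg_request_kb = bytes_transferred_gb * KB_PER_GB / requests_received
cost_per_gb = expense_usd / bytes_transferred_gb
# Share of transferred bytes that did not come from cache misses.
byte_hit_ratio = 1 - bytes_from_misses_gb / bytes_transferred_gb

print(f"average request size: {avg_request_kb:.0f} KB")
print(f"cost per GB served:   {cost_per_gb:.4f} USD")
print(f"byte hit ratio:       {byte_hit_ratio:.2%}")
```

Under the binary-unit assumption the average comes out at the reported
490 KB; with decimal units it would be closer to 467 KB.  The byte hit
ratio (about 61%) is somewhat higher than the request hit rate of
57.44%.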

The cache hit rate (57.44%) may also seem low, but it is reasonable
because it's aggregated over all of CloudFront's points of presence
worldwide.  If one request in Seattle is a cache hit, and one request in
London is a cache miss, then that results in an overall cache hit rate
of 50%.  Different points of presence don't generally share caches.
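
The effect of separate caches on the aggregate hit rate can be sketched
in a few lines of Python.  This is a toy model, not CloudFront's actual
behavior: the location names, the object name, and the "shared cache"
comparison are purely illustrative.

```python
from typing import Dict, Iterable, Set, Tuple


def hit_rate(requests: Iterable[Tuple[str, str]], shared_cache: bool = False) -> float:
    """Return the overall hit rate for (point_of_presence, object) requests."""
    caches: Dict[str, Set[str]] = {}
    hits = 0
    total = 0
    for pop, obj in requests:
        # With separate caches, each point of presence misses once per object.
        cache = caches.setdefault("shared" if shared_cache else pop, set())
        if obj in cache:
            hits += 1
        else:
            cache.add(obj)
        total += 1
    return hits / total if total else 0.0


requests = [
    ("Seattle", "hello.nar"),
    ("Seattle", "hello.nar"),
    ("London", "hello.nar"),
    ("London", "hello.nar"),
]

print(hit_rate(requests))                     # separate caches per location
print(hit_rate(requests, shared_cache=True))  # one hypothetical global cache
```

With separate caches the four requests produce two misses (one per
location) and a 50% hit rate; a single shared cache would have missed
only once.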

According to Amazon CloudFront, 11.75% of requests received came from
"Bot/Crawler", which CloudFront defines as "primarily requests from
search engines that are indexing your content".

In addition, CloudFront reports that traffic came from the following
locations (sorted by bytes transferred):

Location                        Request Count  Request %  Bytes
-------------------------------------------------------------------
United States                         933,448     25.01%  562.52 GB
Germany                               687,548     18.42%  174.53 GB
France                                341,573      9.15%  167.36 GB
Canada                                179,630      4.81%   96.31 GB
Russian Federation                    252,738      6.77%   94.28 GB
United Kingdom                        177,328      4.75%   81.55 GB
Spain                                  38,476      1.03%   70.49 GB
Netherlands                           118,902      3.19%   61.55 GB
Belgium                                64,427      1.73%   54.16 GB
Australia                             101,173      2.71%   51.33 GB
Brazil                                 71,174      1.91%   31.01 GB
Czech Republic                         48,514      1.30%   29.60 GB
Sweden                                 45,446      1.22%   23.12 GB
Switzerland                            41,804      1.12%   21.85 GB
South Africa                           42,508      1.14%   17.94 GB
Poland                                 46,049      1.23%   17.12 GB
China                                  17,841      0.48%   16.45 GB
Israel                                 84,443      2.26%   14.78 GB
Norway                                 26,171      0.70%   14.49 GB
Japan                                  14,013      0.38%   13.73 GB
Reunion                                19,144      0.51%   11.21 GB
India                                  19,751      0.53%   11.11 GB
Denmark                                30,390      0.81%   10.24 GB
Belarus                                25,943      0.69%    9.43 GB
Italy                                  25,359      0.68%    8.56 GB
Ecuador                                13,321      0.36%    8.41 GB
Ukraine                                68,807      1.84%    7.91 GB
Bolivia, Plurinational State of         8,932      0.24%    6.51 GB
Hungary                                21,374      0.57%    5.99 GB
Romania                                13,187      0.35%    5.65 GB
Mexico                                  7,299      0.20%    4.25 GB
Ireland                                 7,239      0.19%    4.05 GB
Greece                                  7,946      0.21%    3.98 GB
Iran, Islamic Republic of               7,730      0.21%    3.84 GB
Slovenia                               19,901      0.53%    3.62 GB
Argentina                               8,687      0.23%    3.57 GB
Finland                                 5,105      0.14%    3.51 GB
Turkey