Re: Using a CDN or some other mirror?

2018-12-14 Thread Ludovic Courtès
Hello,

Hartmut Goebel  skribis:

> Am 09.12.2018 um 14:58 schrieb Ludovic Courtès:
>>> I could try and ask a few organizations in my area, but I would need
>>> figures for this.
>> What would you need to know?  ‘guix weather’ can provide info about
>> storage size.
>
> I don't know yet which info the admins need for a decision. From my
> point of view I'd say: disk space and the traffic to be expected.
>
> `guix weather` only provides the disk space, but even this is not
> obvious to me:
>
>   13912.1 MiB of nars (compressed)
>   41176.6 MiB on disk (uncompressed)
>
> From reading the manual, I assume 13.9 GB are required on the server
> (which is quite a lot IMHO). Is this correct?

If you’re running a caching proxy, you’ll need 13G.  However, note that
it’s only for one architecture and one revision of Guix.  The total
space needed is obviously a function of time (the number of Guix
revisions served) and the number of architectures.

You could choose an expiration time in your caching proxy that satisfies
your disk space constraints, though.

Our machines that run Cuirass + ‘guix publish --cache’ need roughly
41+13G since they contain both /gnu/store and /var/cache/guix/publish.
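
For a rough upper bound, you can multiply that per-revision,
per-architecture figure by what the proxy is expected to retain; the
retention numbers below are made up for illustration (and it really is
an upper bound, since store items that don’t change between revisions
are only cached once):

--8<---cut here---start->8---
# back-of-the-envelope cache sizing; adjust the placeholders
per_rev_gib=13        # compressed nars per revision and architecture (see above)
architectures=2       # e.g. x86_64-linux + i686-linux
revisions_kept=7      # say, a week of daily revisions

echo "upper bound: $((per_rev_gib * architectures * revisions_kept)) GiB"
--8<---cut here---end--->8---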

HTH!

Ludo’.



Re: Using a CDN or some other mirror?

2018-12-14 Thread Chris Marusich
Hi Giovanni,

Thank you for sharing some data with us!

Giovanni Biscuolo  writes:

> measures from my office network: Italy, 20Km north Milan, FTTC
> (90Mbit/sec measured bandwidth)
>
> measure from Berlin:
>
> url_effective: 
> https://berlin.guixsd.org/nar/gzip/1bq783rbkzv9z9zdhivbvfzhsz2s5yac-linux-libre-4.19
>   http_code: 200
>   num_connects: 1
>   num_redirects: 0
>   remote_ip: 141.80.181.40
>   remote_port: 443
>   size_download: 69899433 B
>   speed_download: 9051388,000 B/s

That's about 72 megabits per second.

>   time_appconnect: 0,229271 s
>   time_connect: 0,110443 s
>   time_namelookup: 0,061754 s

Latency was about 49 milliseconds (after the name lookup).

>   [...]
> latency measured with mtr:
>
> HOST: roquette  Loss%   Snt   Last   Avg  Best  Wrst StDev
>   1.|-- 10.38.2.1        0.0%    10    0.3   0.4   0.3   0.4   0.0
>
> [...]
>
>  18.|-- 141.80.181.40    0.0%    10  112.5  77.1  55.6 201.7  47.1
>
>
>
> from your mirror (third download):
>
> url_effective: 
> https://berlin-mirror.marusich.info/nar/gzip/1bq783rbkzv9z9zdhivbvfzhsz2s5yac-linux-libre-4.19
>   http_code: 200
>   num_connects: 1
>   num_redirects: 0
>   remote_ip: 54.230.102.61
>   remote_port: 443
>   size_download: 69899433 B
>   speed_download: 9702091,000 B/s

That's about 78 megabits per second, which is about 7% more than 72.

>   time_appconnect: 0,172660 s
>   time_connect: 0,037833 s
>   time_namelookup: 0,003772 s

Latency was 34 milliseconds, which is 31% less than 49.

>   [...]
>
> latency measured with mtr:
>
> HOST: roquette   Loss%   Snt   Last   Avg  Best  Wrst StDev
>   1.|-- 10.38.2.1        0.0%    10    0.4   0.4   0.4   0.4   0.0
>
> [...]
>
>  11.|-- ???             100.0    10    0.0   0.0   0.0   0.0   0.0
>  12.|-- ???             100.0    10    0.0   0.0   0.0   0.0   0.0
>  13.|-- ???             100.0    10    0.0   0.0   0.0   0.0   0.0
>  14.|-- ???             100.0    10    0.0   0.0   0.0   0.0   0.0
>  15.|-- 52.93.58.190     0.0%    10   36.1  34.6  32.9  37.1   1.2
>  16.|-- ???             100.0    10    0.0   0.0   0.0   0.0   0.0
>
> 100% loss?

Yes, mtr's output here is a bit surprising.

On my end, also, mtr reported similar "loss" for intermediate hops, but
in my case the final hop did not report any loss.  Deprioritization of
ICMP traffic is common in many networks, so tools like mtr and
traceroute will sometimes report surprisingly high latency or packet
loss even when the network is just fine.

The mechanism used by tools like mtr and traceroute is to repeatedly
send "probes" with monotonically increasing TTL values.  The measurement
(even when using TCP probes) relies on (1) intermediate hops correctly
returning an ICMP "time exceeded" message when the packet lands on that
hop and the TTL expires, and (2) the ICMP "time exceeded" message
getting successfully delivered back to the mtr process.
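
If one wants to double-check whether ICMP deprioritization is the
culprit, a reasonably recent mtr can send its probes as TCP SYNs toward
the HTTPS port, which middleboxes tend to treat more like real traffic
(this is just a sketch, not something I ran for the numbers above):

--8<---cut here---start->8---
# 10 report-mode cycles using TCP probes to port 443 instead of ICMP echo.
mtr --report --report-cycles 10 --tcp --port 443 berlin.guixsd.org
--8<---cut here---end--->8---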

In any case, the "100% loss" metric is clearly inaccurate, since you
successfully downloaded the file at an impressive speed.  If a hop were
truly dropping 100% of the traffic, the download would have failed.  In
addition, the latency that mtr does report seems comparable to the
latency calculated from the measure_get output (which is not influenced
by the vagaries of ICMP deprioritization).

> from here it seems Berlin is as performant as CloudFront

Yes, it seems you are already well connected to the build farm!  But
still, when you used CloudFront, your throughput went up by 7%, and your
latency went down by 31%.  Even more importantly, when you downloaded
the file from CloudFront, it placed zero additional load on the build
farm because it was served from CloudFront's cache.

Again, thank you for sharing!  This is useful information.

-- 
Chris




Re: Using a CDN or some other mirror?

2018-12-14 Thread Pierre Neidhardt
Speaking of which, I recently discussed with Ludovic the idea of compressing
nars with lzip instead of gzip.

http://lzip.nongnu.org/lzip.html (benchmark included)

I can work on Guile bindings for lzip; it should be quite easy, and then we
could save some 10-50% (!!!) of disk usage.
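
In the meantime, one quick way to gauge the potential gain on a real nar
is to recompress one locally (the URL below just reuses the linux-libre
nar mentioned elsewhere in this thread; lzip's -9 is its highest
compression level):

--8<---cut here---start->8---
# Fetch a gzip-compressed nar, decompress it, then recompress with both tools.
curl -s -o nar.gz \
  "https://berlin.guixsd.org/nar/gzip/1bq783rbkzv9z9zdhivbvfzhsz2s5yac-linux-libre-4.19"
gunzip -k nar.gz              # produces 'nar', keeps nar.gz
gzip -9 -c nar > nar.gz9
lzip -9 -c nar > nar.lz
ls -l nar.gz nar.gz9 nar.lz   # compare the compressed sizes
--8<---cut here---end--->8---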

-- 
Pierre Neidhardt
https://ambrevar.xyz/




Re: Using a CDN or some other mirror?

2018-12-14 Thread Hartmut Goebel
Am 09.12.2018 um 14:58 schrieb Ludovic Courtès:
>> I could try and ask a few organizations in my area, but I would need
>> figures for this.
> What would you need to know?  ‘guix weather’ can provide info about
> storage size.

I don't know yet which info the admins need for a decision. From my point
of view I'd say: disk space and the traffic to be expected.

`guix weather` only provides the disk space, but even this is not
obvious to me:

  13912.1 MiB of nars (compressed)
  41176.6 MiB on disk (uncompressed)

From reading the manual, I assume 13.9 GB are required on the server
(which is quite a lot IMHO). Is this correct?

-- 
+++hartmut

| Hartmut Goebel|   |
| hart...@goebel-consult.de | www.goebel-consult.de |



Re: Using a CDN or some other mirror?

2018-12-13 Thread Giovanni Biscuolo
Hi Chris,

thank you for your CDN testing environment!

Chris Marusich  writes:

[...]

> For experimentation, I've set up a CloudFront distribution at
> berlin-mirror.marusich.info that uses berlin.guixsd.org as its origin
> server.  Let's repeat these steps to measure the performance of the
> distribution from my machine's perspective (before I did this, I made
> sure the GET would result in a cache hit by downloading the substitute
> once before and verifying that the same remote IP address was used):

[...]

> It would be interesting to see what the performance is for others.

[...]

measures from my office network: Italy, 20Km north Milan, FTTC
(90Mbit/sec measured bandwidth)

measure from Berlin:

--8<---cut here---start->8---
url_effective: 
https://berlin.guixsd.org/nar/gzip/1bq783rbkzv9z9zdhivbvfzhsz2s5yac-linux-libre-4.19
  http_code: 200
  num_connects: 1
  num_redirects: 0
  remote_ip: 141.80.181.40
  remote_port: 443
  size_download: 69899433 B
  speed_download: 9051388,000 B/s
  time_appconnect: 0,229271 s
  time_connect: 0,110443 s
  time_namelookup: 0,061754 s
  time_pretransfer: 0,229328 s
  time_redirect: 0,00 s
  time_starttransfer: 0,326907 s
  time_total: 7,722509 s
--8<---cut here---end--->8---

latency measured with mtr:

--8<---cut here---start->8---
HOST: roquette  Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 10.38.2.1        0.0%    10    0.3   0.4   0.3   0.4   0.0

[...]

 18.|-- 141.80.181.40    0.0%    10  112.5  77.1  55.6 201.7  47.1
--8<---cut here---end--->8---


from your mirror (third download):

--8<---cut here---start->8---
url_effective: 
https://berlin-mirror.marusich.info/nar/gzip/1bq783rbkzv9z9zdhivbvfzhsz2s5yac-linux-libre-4.19
  http_code: 200
  num_connects: 1
  num_redirects: 0
  remote_ip: 54.230.102.61
  remote_port: 443
  size_download: 69899433 B
  speed_download: 9702091,000 B/s
  time_appconnect: 0,172660 s
  time_connect: 0,037833 s
  time_namelookup: 0,003772 s
  time_pretransfer: 0,173263 s
  time_redirect: 0,00 s
  time_starttransfer: 0,212716 s
  time_total: 7,204574 s
--8<---cut here---end--->8---

latency measured with mtr:

--8<---cut here---start->8---
HOST: roquette   Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 10.38.2.1        0.0%    10    0.4   0.4   0.4   0.4   0.0

[...]

 11.|-- ???             100.0    10    0.0   0.0   0.0   0.0   0.0
 12.|-- ???             100.0    10    0.0   0.0   0.0   0.0   0.0
 13.|-- ???             100.0    10    0.0   0.0   0.0   0.0   0.0
 14.|-- ???             100.0    10    0.0   0.0   0.0   0.0   0.0
 15.|-- 52.93.58.190     0.0%    10   36.1  34.6  32.9  37.1   1.2
 16.|-- ???             100.0    10    0.0   0.0   0.0   0.0   0.0
--8<---cut here---end--->8---

100% loss?

from here it seems Berlin is as performant as CloudFront

HTH!
Gio

-- 
Giovanni Biscuolo

Xelera IT Infrastructures






Re: Using a CDN or some other mirror?

2018-12-11 Thread Giovanni Biscuolo
Hi all,

my two cents...

(I still can't help with a public cache; I hope to soon...)

Ludovic Courtès  writes:

[...]

>> TL;DR: A CDN is a centralized infrastructure, allowing the collection of
>> valuable vulnerability information about almost all Guix users and
>> systems. This might become a threat to freedom of speech, human rights,
>> democracy and economics. Guix should build on a decentralized
>> infrastructure.

I completely agree with you: decentralization is the solution.

unfortunately the **only functioning** way is to avoid the current
Internet, since it's broken (https://youbroketheinternet.org/); I see
GuixSD as an integral part of The Project Map™:
https://youbroketheinternet.org/map

...but to fix the situation we need a substantial GNUnet(work) effect,
and for that we _need_ GuixSD substitutes to be easily and quickly
downloadable (can we avoid asking potential adopters to be patient or to
build from source?)

maybe we should divide this task into two steps:

1. distributed substitutes: caching servers hosted by a network of
friendly institutions and companies donating them to GNU/GuixSD, with a
haproxy frontend for geolocated load-balancing [1]

2. decentralized substitutes: caching servers on IPFS or, better (since
it allows complete anonymity), on GNUnet

> Heck it would be ironic to find myself arguing in favor of centralized
> commercial services.  So I won’t do that.  :-)

I see no problem with commercial services; _unfortunately_ nowadays
this *almost* always means centralized silos, usually exploited for
global surveillance (since the Internet is broken)

[...]

> The operator of a substitute server (or caching proxy), in general,
> knows which IPs downloaded vulnerable software.  This is the main
> threat.

on the Internet, sure, but on IPFS? (sorry for my ignorance)

on GNUnet, file sharing can be completely anonymous, but the performance
is degraded (so we need a large network effect here)

> This can be mitigated by talking to nearby mirrors and not just
> ci.guix.info, a feature we implemented a year ago (see
> ),
> or by using several substitute servers, or by not using (or not always
> using) substitutes.  Few distros have all these options.
>
> We might also be able to somehow balance requests between several CDNs
> or mirrors.

did anyone explore an haproxy (with geolocation) solution?

is there a wip-haproxy attempt?

[...]

HTH
Giovanni


[1] in the next few weeks I'm going to test an haproxy instance with
geolocated ACLs following these directions:
https://www.haproxy.com/blog/use-geoip-database-within-haproxy/

-- 
Giovanni Biscuolo

Xelera IT Infrastructures




Re: Using a CDN or some other mirror?

2018-12-09 Thread Ludovic Courtès
Hi Hartmut,

Hartmut Goebel  skribis:

> Am 09.12.2018 um 04:33 schrieb Chris Marusich:
>> Instead, we would be using a CDN as a performance optimization that is
>> transparent to a Guix user.  You seem unsettled by the idea of
>> entrusting any part of substitute delivery to a third party, but
>> concretely what risks do you foresee?
>
> I have serious privacy concerns.
>
> TL;DR: A CDN is a centralized infrastructure, allowing the collection of
> valuable vulnerability information about almost all Guix users and
> systems. This might become a threat to freedom of speech, human rights,
> democracy and economics. Guix should build on a decentralized
> infrastructure.

Heck it would be ironic to find myself arguing in favor of centralized
commercial services.  So I won’t do that.  :-)

Clearly, I do understand the concerns you list.  As a maintainer, I’m
looking for solutions that can address real problems (availability of
substitutes and bandwidth) while not being a threat to our users’
privacy and security.

The operator of a substitute server (or caching proxy), in general,
knows which IPs downloaded vulnerable software.  This is the main
threat.

This can be mitigated by talking to nearby mirrors and not just
ci.guix.info, a feature we implemented a year ago (see
),
or by using several substitute servers, or by not using (or not always
using) substitutes.  Few distros have all these options.
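
Concretely, pointing a client at several substitute servers is just a
matter of listing them (the mirror URL below is a placeholder); each
extra server’s signing key also needs to be authorized with ‘guix
archive --authorize’:

--8<---cut here---start->8---
# Daemon-wide: servers are tried in the order given.
guix-daemon --build-users-group=guixbuild \
  --substitute-urls='https://mirror.example.org https://ci.guix.info'

# Or for a one-off command:
guix build hello \
  --substitute-urls='https://mirror.example.org https://ci.guix.info'
--8<---cut here---end--->8---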

We might also be able to somehow balance requests between several CDNs
or mirrors.

But again, medium- to long-term, the goal is to move towards IPFS or
GNUnet/Bittorrent.  IPFS is attractive because it would probably require
no modifications to ‘guix substitutes’ and only minor changes to ‘guix
publish’ since the IPFS daemon has an HTTP interface.
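
As a rough sketch of what that HTTP interface looks like (not an actual
integration): once an IPFS daemon runs, anything added to it can be
fetched back over plain HTTP from the local gateway:

--8<---cut here---start->8---
ipfs init                        # one-time setup
ipfs daemon &                    # gateway listens on 127.0.0.1:8080 by default
cid=$(ipfs add -Q some-file)     # add a file, print only its content ID
curl -o copy "http://127.0.0.1:8080/ipfs/$cid"
--8<---cut here---end--->8---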

>> Regarding your suggestion to ask universities to host mirrors (really,
>> caching proxies), I think it could be a good idea.  As Leo mentioned,
>> the configuration to set up an NGINX caching proxy of Hydra (or berlin)
>> is freely available in maintenance.git.  Do you think we could convince
>> some universities to host caching proxies that just run an NGINX web
>> server using those configurations?
>
> The difference is: for a traditional "ftp" mirror, an organization just
> needs to add another source to its existing configuration and administer
> it the same way as all other mirrors. Whereas for a caching proxy they
> need to change the setup of the web server and learn how to administer
> the cache. This difference might make it difficult to convince
> organizations to mirror.
>
> I could try and ask a few organizations in my area, but I would need
> figures for this.

What would you need to know?  ‘guix weather’ can provide info about
storage size.

Thanks,
Ludo’.



Re: Using a CDN or some other mirror?

2018-12-09 Thread Hartmut Goebel
Am 09.12.2018 um 04:33 schrieb Chris Marusich:
> Instead, we would be using a CDN as a performance optimization that is
> transparent to a Guix user.  You seem unsettled by the idea of
> entrusting any part of substitute delivery to a third party, but
> concretely what risks do you foresee?

I have serious privacy concerns.

TL;DR: A CDN is a centralized infrastructure, allowing the collection of
valuable vulnerability information about almost all Guix users and
systems. This might become a threat to freedom of speech, human rights,
democracy and economics. Guix should build on a decentralized
infrastructure.

A distribution provider gets a notion of which system is running which
software in which version. In the case of Guix, the provider even gets
the exact version of the software and all of its dependencies. Combining
this with the rise of IPv6, which by default uses the MAC address as part
of the IP address, actually allows identifying a single system.

This information is extremely valuable for all kinds of attackers as it
makes attacking a system much easier. This becomes a threat

  * to opposition members, dissidents and human rights activists, as
    intelligence agencies can target these persons much more precisely,
  * to companies all over the world, as many countries engage in
    industrial espionage.

This becomes even worse when using a CDN, since the CDN is a centralized
system: a single CDN provider gains knowledge about almost all systems
all over the world. Which means: this valuable vulnerability information
is collected in a single place. Intelligence agencies might be keen on
getting access to this information, and a centralized system makes it
easy for them. And there is evidence they actually collect this
information [*].

This gets even worse when the CDN belongs to one of these companies
compiling personal profiles, like Google, Facebook or Tencent. Amazon
belongs to this group.

I have the strong opinion that Guix should build on a decentralized
infrastructure to support keeping the freedom of speech, democracy and
human rights.

[*] Actually it is known that the US-American intelligence agencies have
equipment placed at Verizon to collect all kinds of data [1]. One can
reason that the same is true for other big providers in the US. The USA
has the FISA Act [3], which AFAIU forces US companies to collaborate in
industrial espionage. In Germany it is known that the BND is extracting
high-volume data at the central internet exchange (DE-CIX) [2]. One can
reason that the same happens in other countries, esp. members of the
Five Eyes, France, Russia, China, Israel, Saudi Arabia, Iran, Iraq, etc.

> Regarding your suggestion to ask universities to host mirrors (really,
> caching proxies), I think it could be a good idea.  As Leo mentioned,
> the configuration to set up an NGINX caching proxy of Hydra (or berlin)
> is freely available in maintenance.git.  Do you think we could convince
> some universities to host caching proxies that just run an NGINX web
> server using those configurations?

The difference is: for a traditional "ftp" mirror, an organization just
needs to add another source to its existing configuration and administer
it the same way as all other mirrors. Whereas for a caching proxy they
need to change the setup of the web server and learn how to administer
the cache. This difference might make it difficult to convince
organizations to mirror.

I could try and ask a few organizations in my area, but I would need
figures for this.


[1] https://www.bbc.com/news/world-us-canada-23123964 or search the
internet for e.g. "cia verizon espionage"
[2]
https://www.heise.de/newsticker/meldung/Gerichtsurteil-BND-darf-weiterhin-Internet-Knoten-De-CIX-anzapfen-4061494.html
[3] https://en.wikipedia.org/wiki/Foreign_Intelligence_Surveillance_Act

-- 
+++hartmut

| Hartmut Goebel|   |
| hart...@goebel-consult.de | www.goebel-consult.de |



Re: Using a CDN or some other mirror?

2018-12-09 Thread Hartmut Goebel
Hi Ludo,

Am 07.12.2018 um 15:05 schrieb Ludovic Courtès:
> However, Guix is very different from these: on the build farm, we build
> several new store items per minute, and we aim to distribute them to our
> users. 
> […]  
> For Guix I think a caching proxy […] is a better 
> fit:

In this setup an old-fashioned "ftp" mirror does not work (since we
would need to convince the admins to change their setup, not just to add
another source), and the idea is void.

OTOH, maybe it's worth rethinking the premises. If some region has low
bandwidth, it would still be better if one could fetch 90% of the
substitutes from nearby. Thus a daily or twice-a-day rsync would help. It
would also help if some day GuixSD got some kind of "stable" (and
"testing") branch that users could subscribe to, which would not have so
many changes per day.

I'm now bowing out of this discussion, as it is up to you (the core team)
to decide. I just wanted to share the idea and the arguments.

-- 
Regards
Hartmut Goebel

| Hartmut Goebel  | h.goe...@crazy-compilers.com   |
| www.crazy-compilers.com | compilers which you thought are impossible |




Re: Using a CDN or some other mirror?

2018-12-08 Thread Chris Marusich
Hi everyone,

l...@gnu.org (Ludovic Courtès) writes:

> Ludovic Courtès  skribis:
>
> [...] I’m thinking about using a similar setup, but hosting the mirror
> on some Big Corp CDN or similar.  Chris Marusich came up with a setup
> along these lines a while back:
>
>   https://lists.gnu.org/archive/html/guix-devel/2016-03/msg00312.html
>
> Compared to Chris’s setup, given that ‘guix publish’ now provides
> ‘Cache-Control’ headers (that was not the case back then, see
> ),
> caching in the proxy should Just Work.
>
> I would like us to set up such a mirror for berlin and then have
> ci.guix.info point to that.  The project should be able to pay the
> hosting fees.
>
> Thoughts?

Regarding DNS, it would be nice if we could use an official GNU
subdomain.  If we can't use a GNU subdomain, we should at least make
sure we have some kind of DNS auto-renewal set up so that nobody can
poach our domain names.  And the operators should take appropriate
precautions when sharing any credentials used for managing it all.

Regarding CDNs, I definitely think it's worth a try!  Even Debian is
using CloudFront (cloudfront.debian.net).  In fact, email correspondence
suggests that as of 2013, Amazon may even have been paying for it!

https://lists.debian.org/debian-cloud/2013/05/msg00071.html

I wonder if Amazon would be willing to pay for our CloudFront
distribution if we asked them nicely?

In any case, before deciding to use Amazon CloudFront for ci.guix.info,
it would be prudent to estimate the cost.  CloudFront, like most Amazon
AWS services, is a "pay for what you use" model.  The pricing is here:

https://aws.amazon.com/cloudfront/pricing

To accurately estimate the cost, we need to know how many requests we
expect to receive, and how many bytes we expect to transfer out, during
a single month.  Do we have information like this for berlin today?
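
Once we have those two numbers, a rough estimate is simple arithmetic;
the traffic figures and unit prices below are placeholders to be
replaced with our real numbers and the current figures from the pricing
page:

--8<---cut here---start->8---
# very rough monthly cost estimate for CloudFront-style pricing
gb_out=5000               # GB transferred out per month (placeholder)
https_requests=2000000    # HTTPS requests per month (placeholder)
price_per_gb=0.085        # USD per GB (placeholder; check the pricing page)
price_per_10k=0.0120      # USD per 10,000 HTTPS requests (placeholder)

awk -v g="$gb_out" -v r="$https_requests" \
    -v pg="$price_per_gb" -v pr="$price_per_10k" \
    'BEGIN { printf "~%.2f USD/month\n", g * pg + (r / 10000) * pr }'
--8<---cut here---end--->8---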

Although I don't doubt that a CDN will perform better than what we have
now, I do think it would be good to measure the performance so that we
know for sure the money spent is actually providing a benefit.  It would
be nice to have some data before and after to measure how availability
and performance have changed.  Apart from anecdotes, what data do we
have to determine whether performance has improved after introducing a
CDN?  For example, the following information could be useful:

  * Network load on the origin server(s)
  * Clients' latency to (the addresses pointed to by) ci.guix.info
  * Clients' throughput while downloading substitutes from ci.guix.info

We don't log or collect client metrics, and that's fine.  It could be
useful to add code to Guix to measure things like this when the user
asks to do so, but perhaps it isn't necessary.  It may be good enough if
people just volunteer to manually gather some information and share it.
For example, you can define a shell function like this:

--8<---cut here---start->8---
measure_get () {
curl -L \
 -o /dev/null \
 -w "url_effective: %{url_effective}\\n\
http_code: %{http_code}\\n\
num_connects: %{num_connects}\\n\
num_redirects: %{num_redirects}\\n\
remote_ip: %{remote_ip}\\n\
remote_port: %{remote_port}\\n\
size_download: %{size_download} B\\n\
speed_download: %{speed_download} B/s\\n\
time_appconnect: %{time_appconnect} s\\n\
time_connect: %{time_connect} s\\n\
time_namelookup: %{time_namelookup} s\\n\
time_pretransfer: %{time_pretransfer} s\\n\
time_redirect: %{time_redirect} s\\n\
time_starttransfer: %{time_starttransfer} s\\n\
time_total: %{time_total} s\\n" \
"$1"
}
--8<---cut here---end--->8---

See "man curl" for the meaning of each metric.

You can then use this function to measure a substitute download.  Here's
an example in which I download a large substitute (linux-libre) from one
of my machines in Seattle:

--8<---cut here---start->8---
$ measure_get 
https://berlin.guixsd.org/nar/gzip/1bq783rbkzv9z9zdhivbvfzhsz2s5yac-linux-libre-4.19
 2>/dev/null
url_effective: 
https://berlin.guixsd.org/nar/gzip/1bq783rbkzv9z9zdhivbvfzhsz2s5yac-linux-libre-4.19
http_code: 200
num_connects: 1
num_redirects: 0
remote_ip: 141.80.181.40
remote_port: 443
size_download: 69899433 B
speed_download: 4945831.000 B/s
time_appconnect: 0.885277 s
time_connect: 0.459667 s
time_namelookup: 0.254210 s
time_pretransfer: 0.885478 s
time_redirect: 0.00 s
time_starttransfer: 1.273994 s
time_total: 14.133584 s
$ 
--8<---cut here---end--->8---

Here, it took 0.459667 - 0.254210 = 0.205457 seconds (about 205 ms) to
establish the TCP connection after the DNS lookup.  The average
throughput was 4945831 bytes per second (about 40 megabits per second,
where 1 megabit = 10^6 bits).  It seems my connection to berlin is
already pretty good!
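
Here is the same arithmetic in script form, in case anyone wants to
post-process their own measure_get output (the three values are copied
from the run above):

--8<---cut here---start->8---
# derive TCP connect time (ms) and average throughput (Mbit/s)
time_connect=0.459667
time_namelookup=0.254210
speed_download=4945831      # B/s

awk -v c="$time_connect" -v n="$time_namelookup" -v s="$speed_download" \
    'BEGIN { printf "connect: %.0f ms, throughput: %.1f Mbit/s\n",
             (c - n) * 1000, s * 8 / 1e6 }'
--8<---cut here---end--->8---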

We can get more information about latency by using a tool like mtr:

[...]

Re: Using a CDN or some other mirror?

2018-12-07 Thread Ludovic Courtès
Hi Hartmut,

Hartmut Goebel  skribis:

> One could get in touch with the administrators of some of the mirrors
> used by other distributions. [1] has a list of Mirrors for Fedora,
> including location and bandwidth. [2] is a list of mirrors used by the
> community driven distribution Mageia, additionally including the
> "upstream" - which might help prioritizing.

It’s tempting to follow the lead of traditional distros when it comes to
mirroring.

However, Guix is very different from these: on the build farm, we build
several new store items per minute, and we aim to distribute them to our
users.  This is quite different from distros that upload .deb, .rpm,
etc. files much less frequently.  Their mirroring process is thus fairly
similar to good ol’ synchronization over rsync and the like.

For Guix I think a caching proxy and a CDN kind of model is a better
fit: it’s a good fit for the publication rate of store items, and a good
fit for ‘guix publish’, which uses HTTP.  Furthermore, the build farm
doesn’t expose anything like rsync, which makes it hard to imagine a
“traditional” mirroring process.
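
For reference, what such a proxy or CDN would sit in front of is simply
a ‘guix publish’ instance along these lines (values illustrative;
‘--ttl’ is what produces the ‘Cache-Control’ headers a proxy keys its
expiry on):

--8<---cut here---start->8---
# Serve substitutes over HTTP, caching compressed nars on disk and
# advertising how long downstream caches may keep them.
guix publish --port=8080 \
             --cache=/var/cache/guix/publish \
             --ttl=2d
--8<---cut here---end--->8---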

WDYT?

Thanks,
Ludo’.



Re: Using a CDN or some other mirror?

2018-12-05 Thread Hartmut Goebel
Am 04.12.2018 um 15:05 schrieb Ludovic Courtès:
> We shouldn’t take it for granted that public institutes will
> happily host our stuff and donate hardware

I would not expect, nor take for granted, that some organization will
host and even donate a build farm. This is indeed very generous of the MDC.

As you wrote, we are in need of storage. What I'm suggesting is:

Many, many organizations support open source projects by donating
disk space and bandwidth in the form of a mirror. Supporters include
internet providers, universities, publicly funded organizations and
companies. These used to be ftp mirrors (later with an HTTP interface);
meanwhile the larger ones support rsync, too. Many software projects use
this kind of server to mirror their software. The only restriction is
that these mirrors serve static files only. (I assume you all know this.)

> In the meantime, we need redundant storage, high bandwidth, and high
> availability.  If you know of non-profit organizations that can provide
> such services, please let us know; if not, we’ll resort to a commercial
> service.  The bottom line is: we cannot reasonably pretend to offer such
> a service ourselves.

One could get in touch with the administrators of some of the mirrors
used by other distributions. [1] has a list of mirrors for Fedora,
including location and bandwidth. [2] is a list of mirrors used by the
community-driven distribution Mageia, additionally including the
"upstream" - which might help with prioritizing.

When approaching some of these server admins, we should pass them some
relevant information, like the expected storage demand, the expected
transfer traffic from upstream, how often updates are published, how long
to hold copies, whether we can provide rsync server access, etc.

I can approach two or three organizations I know of (but don't have any
contacts at) if I have some information to provide. A cover letter would
help a lot. For getting in touch with universities, somebody working at
a university might have better chances :-)

[1] Fedora 28:
https://admin.fedoraproject.org/mirrormanager/mirrors/Fedora/28
[2] Mageia: https://mirrors.mageia.org/
[3] Debian "Secondary mirrors" in https://www.debian.org/mirror/list

-- 
+++hartmut

| Hartmut Goebel|   |
| hart...@goebel-consult.de | www.goebel-consult.de |





Re: Using a CDN or some other mirror?

2018-12-05 Thread ng0
Thompson, David transcribed 1.2K bytes:
> On Tue, Dec 4, 2018 at 4:15 PM  wrote:
> >
> > Hartmut Goebel transcribed 771 bytes:
> > > Am 03.12.2018 um 17:12 schrieb Ludovic Courtès:
> > > > Thus, I’m thinking about using a similar setup, but hosting the mirror
> > > > on some Big Corp CDN or similar.
> > >
> > > Isn't this a contradiction: building a free infrastructure relying on
> > > servers from some Big Corporation? Let alone the privacy concerns
> > > raised when delivering data via some Big Corporation.
> > >
> > > If delivering "packages" works via static data without requiring any
> > > additional service, we could ask universities to host Guix, too. IMHO
> > > this is a much preferred solution since this is a decentralized publish
> > > infrastructure already in place for many GNU/Linux distributions.
> >
> > Regardless of me agreeing with Hartmut here, I suggest https://wasabi.com/
> > as an S3 compatible storage (, which is not run by Amazon.).
> 
> But can Wasabi provide what CloudFront does?  CloudFront is distinct
> from S3, and can fetch and cache data from any origin. Additionally,
> it stores data on edge nodes across the globe and routes requests
> accordingly to maximize download speed.
> 
> - Dave
> 

Probably not, for some reason I thought you were discussing S3.
My bad.



Re: Using a CDN or some other mirror?

2018-12-04 Thread Leo Famulari
On Wed, Dec 05, 2018 at 10:32:02AM +0800, Meiyo Peng wrote:
> If at some point we need to setup traditional mirrors like other major
> Gnu/Linux distros, I can contact my friends in China to setup mirrors in
> several universities. I was a member of LUG@USTC, which provides the
> largest FLOSS mirror in China.

That would be cool, especially if it's hard to reach our servers from
within China.

The Nginx configuration for our mirrors is available in the Guix
"maintenance" repo:

https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/nginx/mirror.conf
https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/nginx/mirror-locations.conf

... along with similar files 'berlin.conf' and 'bayfront.conf' for those
servers.

You can use those files to run your own mirror by changing at least
'server_name', 'ssl_certificate', and 'ssl_certificate_key'.
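
A minimal sketch of the steps on a foreign distro that already runs
nginx (the clone URL is the usual Savannah one; the target paths and the
certificate handling are up to you):

--8<---cut here---start->8---
git clone https://git.savannah.gnu.org/git/guix/maintenance.git
# Adjust at least server_name, ssl_certificate and ssl_certificate_key
# in mirror.conf before enabling it.
cp maintenance/hydra/nginx/mirror*.conf /etc/nginx/
$EDITOR /etc/nginx/mirror.conf
nginx -t && systemctl reload nginx   # syntax check, then reload
--8<---cut here---end--->8---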




Re: Using a CDN or some other mirror?

2018-12-04 Thread Meiyo Peng
Hi,

l...@gnu.org (Ludovic Courtès) writes:

> As you know, berlin.guixsd.org is hosted at the Max Delbrück Center in
> Berlin, a public research institute.  So in a way, we’re already doing
> that.  We shouldn’t take it for granted that public institutes will
> happily host our stuff and donate hardware: without Ricardo’s work and
> the generosity of the MDC, we wouldn’t have anything there.
>
> I understand the reluctance regarding “Big Corp” hosting, and I actually
> share it to some extent.  However, having put much thought into it (and
> also much sweat in build farm sysadmin…), I think the alternative is:
> commercial hosting, or peer-to-peer.
>
> Florian has been looking at the latter approach with IPFS, and perhaps
> we’ll be able to put it in production in a few months and be happy with
> it (I have good hopes given what Florian already demonstrated.)
>
> In the meantime, we need redundant storage, high bandwidth, and high
> availability.  If you know of non-profit organizations that can provide
> such services, please let us know; if not, we’ll resort to a commercial
> service.  The bottom line is: we cannot reasonably pretend to offer such
> a service ourselves.
>
> (Note that we’re just talking about substitute delivery—I wouldn’t want
> to *build* packages on one of these commercial hosting services.)
>
> I hope this clarifies my position.

When I started trying Guix several months ago, the network speed to
substitute servers from China was very slow (<100 kB/s). I don't know what
has changed, but recently the network speed is about 1 MB/s. Thank you all
for the improvements. Hopefully a CDN will make the network even better.
I am not against using a commercial service as long as we only use it
to distribute signed packages rather than to build packages.

I like the idea of IPFS. We should try it. It would be great if it works
well.

If at some point we need to set up traditional mirrors like other major
GNU/Linux distros, I can contact my friends in China to set up mirrors at
several universities. I was a member of LUG@USTC, which provides the
largest FLOSS mirror in China.

--
Meiyo Peng



Re: Using a CDN or some other mirror?

2018-12-04 Thread Thompson, David
On Tue, Dec 4, 2018 at 4:15 PM  wrote:
>
> Hartmut Goebel transcribed 771 bytes:
> > Am 03.12.2018 um 17:12 schrieb Ludovic Courtès:
> > > Thus, I’m thinking about using a similar setup, but hosting the mirror
> > > on some Big Corp CDN or similar.
> >
> > Isn't this a contradiction: building a free infrastructure relying on
> > servers from some Big Corporation? Let alone the privacy concerns
> > raised when delivering data via some Big Corporation.
> >
> > If delivering "packages" works via static data without requiring any
> > additional service, we could ask universities to host Guix, too. IMHO
> > this is a much preferred solution since this is a decentralized publish
> > infrastructure already in place for many GNU/Linux distributions.
>
> Regardless of me agreeing with Hartmut here, I suggest https://wasabi.com/
> as an S3 compatible storage (, which is not run by Amazon.).

But can Wasabi provide what CloudFront does?  CloudFront is distinct
from S3, and can fetch and cache data from any origin. Additionally,
it stores data on edge nodes across the globe and routes requests
accordingly to maximize download speed.

- Dave



Re: Using a CDN or some other mirror?

2018-12-04 Thread ng0
Hartmut Goebel transcribed 771 bytes:
> Am 03.12.2018 um 17:12 schrieb Ludovic Courtès:
> > Thus, I’m thinking about using a similar setup, but hosting the mirror
> > on some Big Corp CDN or similar.
> 
> Isn't this a contradiction: building a free infrastructure relying on
> servers from some Big Corporation? Let alone the privacy concerns
> raised when delivering data via some Big Corporation.
> 
> If delivering "packages" works via static data without requiring any
> additional service, we could ask universities to host Guix, too. IMHO
> this is a much preferred solution since this is a decentralized publish
> infrastructure already in place for many GNU/Linux distributions.

Regardless of my agreeing with Hartmut here, I suggest https://wasabi.com/
as an S3-compatible storage service (which is not run by Amazon).



Re: Using a CDN or some other mirror?

2018-12-04 Thread Thompson, David
On Tue, Dec 4, 2018 at 9:06 AM Ludovic Courtès  wrote:
>
> Hi Hartmut,
>
> Hartmut Goebel  skribis:
>
> > Am 03.12.2018 um 17:12 schrieb Ludovic Courtès:
> >> Thus, I’m thinking about using a similar setup, but hosting the mirror
> >> on some Big Corp CDN or similar.
> >
> > Isn't this a contradiction: building a free infrastructure relying on
> > servers from some Big Corporation? Let alone the privacy concerns
> > raised when delivering data via some Big Corporation.
> >
> > If delivering "packages" works via static data without requiring any
> > additional service, we could ask universities to host Guix, too. IMHO
> > this is a much preferred solution since this is a decentralized publish
> > infrastructure already in place for many GNU/Linux distributions.
>
> As you know, berlin.guixsd.org is hosted at the Max Delbrück Center in
> Berlin, a public research institute.  So in a way, we’re already doing
> that.  We shouldn’t take it for granted that public institutes will
> happily host our stuff and donate hardware: without Ricardo’s work and
> the generosity of the MDC, we wouldn’t have anything there.
>
> I understand the reluctance regarding “Big Corp” hosting, and I actually
> share it to some extent.  However, having put much thought into it (and
> also much sweat in build farm sysadmin…), I think the alternative is:
> commercial hosting, or peer-to-peer.
>
> Florian has been looking at the latter approach with IPFS, and perhaps
> we’ll be able to put it in production in a few months and be happy with
> it (I have good hopes given what Florian already demonstrated.)
>
> In the meantime, we need redundant storage, high bandwidth, and high
> availability.  If you know of non-profit organizations that can provide
> such services, please let us know; if not, we’ll resort to a commercial
> service.  The bottom line is: we cannot reasonably pretend to offer such
> a service ourselves.
>
> (Note that we’re just talking about substitute delivery—I wouldn’t want
> to *build* packages on one of these commercial hosting services.)
>
> I hope this clarifies my position.

Using CloudFront with a custom (non-S3) origin sounds like a
reasonable solution to me, though I understand the hesitance to use a
commercial service.

If AWS CloudFront is the path chosen, it may be worthwhile to follow
the "infrastructure as code" practice and use CloudFormation to
provision the CloudFront distribution and any other supporting
resources. The benefit is that there would be a record of exactly
*how* the project is using these commercial services and the setup
could be easily reproduced.  The timing is interesting here because I
just attended the annual AWS conference on behalf of my employer and
while I was there I felt inspired to write a Guile API for building
CloudFormation "stacks".  You can see a small sample of what it does
here: https://gist.github.com/davexunit/db4b9d3e67902216fbdbc66cd9c6413e

- Dave



Re: Using a CDN or some other mirror?

2018-12-04 Thread Pjotr Prins
On Tue, Dec 04, 2018 at 03:05:44PM +0100, Ludovic Courtès wrote:
> Florian has been looking at the latter approach with IPFS, and perhaps
> we’ll be able to put it in production in a few months and be happy with
> it (I have good hopes given what Florian already demonstrated.)
> 
> In the meantime, we need redundant storage, high bandwidth, and high
> availability.  If you know of non-profit organizations that can provide
> such services, please let us know; if not, we’ll resort to a commercial
> service.  The bottom line is: we cannot reasonably pretend to offer such
> a service ourselves.
> 
> (Note that we’re just talking about substitute delivery—I wouldn’t want
> to *build* packages on one of these commercial hosting services.)

IPFS would be great because it will allow anyone to look up and share
substitutes with a low barrier to entry, and it allows for redundancy
too. Unlike BitTorrent, it comes with a built-in web interface, and it is
a Merkle tree with local deduplication.

I think IPFS is a pretty solid proposition since so many are building
solutions on it.

Pj.



Re: Using a CDN or some other mirror?

2018-12-04 Thread Ludovic Courtès
Hi Hartmut,

Hartmut Goebel  skribis:

> Am 03.12.2018 um 17:12 schrieb Ludovic Courtès:
>> Thus, I’m thinking about using a similar setup, but hosting the mirror
>> on some Big Corp CDN or similar.
>
> Isn't this a contradiction: building a free infrastructure relying on
> servers from some Big Corporation? Let alone the privacy concerns
> raised when delivering data via some Big Corporation.
>
> If delivering "packages" works via static data without requiring any
> additional service, we could ask universities to host Guix, too. IMHO
> this is a much preferred solution since this is a decentralized publish
> infrastructure already in place for many GNU/Linux distributions.

As you know, berlin.guixsd.org is hosted at the Max Delbrück Center in
Berlin, a public research institute.  So in a way, we’re already doing
that.  We shouldn’t take it for granted that public institutes will
happily host our stuff and donate hardware: without Ricardo’s work and
the generosity of the MDC, we wouldn’t have anything there.

I understand the reluctance regarding “Big Corp” hosting, and I actually
share it to some extent.  However, having put much thought into it (and
also much sweat in build farm sysadmin…), I think the alternative is:
commercial hosting, or peer-to-peer.

Florian has been looking at the latter approach with IPFS, and perhaps
we’ll be able to put it in production in a few months and be happy with
it (I have good hopes given what Florian already demonstrated.)

In the meantime, we need redundant storage, high bandwidth, and high
availability.  If you know of non-profit organizations that can provide
such services, please let us know; if not, we’ll resort to a commercial
service.  The bottom line is: we cannot reasonably pretend to offer such
a service ourselves.

(Note that we’re just talking about substitute delivery—I wouldn’t want
to *build* packages on one of these commercial hosting services.)

I hope this clarifies my position.

Ludo’.



Re: Using a CDN or some other mirror?

2018-12-04 Thread Hartmut Goebel
Am 03.12.2018 um 17:12 schrieb Ludovic Courtès:
> Thus, I’m thinking about using a similar setup, but hosting the mirror
> on some Big Corp CDN or similar.

Isn't this a contradiction: building a free infrastructure relying on
servers from some Big Corporation? Let alone the privacy concerns raised
when delivering data via some Big Corporation.

If delivering "packages" works via static data without requiring any
additional service, we could ask universities to host Guix, too. IMHO
this is a much preferred solution since this is a decentralized publish
infrastructure already in place for many GNU/Linux distributions.

-- 
+++hartmut

| Hartmut Goebel|   |
| hart...@goebel-consult.de | www.goebel-consult.de |




Re: Using a CDN or some other mirror?

2018-12-03 Thread Ricardo Wurmus


Ludovic Courtès  writes:

> Hello,
>
> Ludovic Courtès  skribis:
>
>> These patches (actually the last one) switch Guix to default to
>>  for substitutes, in preparation for the
>> upcoming 0.16.0 release (hopefully this week!).
>
> Right now, ci.guix.info points to berlin.guixsd.org, the front-end of
> the build farm hosted at the MDC.
>
> The previous setup was that mirror.hydra.gnu.org mirrors hydra.gnu.org
> (the actual build farm front-end) using an nginx proxy:
>
>   
> https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/nginx/mirror.conf
>
> This provides a bit of redundancy that we don’t have currently for
> berlin.
>
> Thus, I’m thinking about using a similar setup, but hosting the mirror
> on some Big Corp CDN or similar.  Chris Marusich came up with a setup
> along these lines a while back:
>
>   https://lists.gnu.org/archive/html/guix-devel/2016-03/msg00312.html

Large ISPs also provide CDN services.  I already contacted Deutsche
Telekom so that we can compare their CDN offer with the Amazon CloudFront
setup that Chris has configured.

--
Ricardo




Using a CDN or some other mirror?

2018-12-03 Thread Ludovic Courtès
Hello,

Ludovic Courtès  skribis:

> These patches (actually the last one) switch Guix to default to
>  for substitutes, in preparation for the
> upcoming 0.16.0 release (hopefully this week!).

Right now, ci.guix.info points to berlin.guixsd.org, the front-end of
the build farm hosted at the MDC.

The previous setup was that mirror.hydra.gnu.org mirrors hydra.gnu.org
(the actual build farm front-end) using an nginx proxy:

  
https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/nginx/mirror.conf

This provides a bit of redundancy that we don’t have currently for
berlin.

Thus, I’m thinking about using a similar setup, but hosting the mirror
on some Big Corp CDN or similar.  Chris Marusich came up with a setup
along these lines a while back:

  https://lists.gnu.org/archive/html/guix-devel/2016-03/msg00312.html

Compared to Chris’s setup, given that ‘guix publish’ now provides
‘Cache-Control’ headers (that was not the case back then, see
),
caching in the proxy should Just Work.
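
One can check this from any client by looking at the headers on a
narinfo (the store hash below is just an example item; the exact
max-age depends on the ‘--ttl’ passed to ‘guix publish’):

--8<---cut here---start->8---
# Inspect the caching headers attached to a narinfo.
curl -sI https://ci.guix.info/1bq783rbkzv9z9zdhivbvfzhsz2s5yac.narinfo \
  | grep -i '^cache-control'
# expected shape:  Cache-Control: max-age=...
--8<---cut here---end--->8---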

I would like us to set up such a mirror for berlin and then have
ci.guix.info point to that.  The project should be able to pay the
hosting fees.

Thoughts?

Would someone like to get started?  You’ll undoubtedly get all the
appreciation of each one of us and a beverage of your choice next time
we meet!  :-)

Thanks,
Ludo’.