Follow-up Comment #10, sr #111374 (group administration):

> One of the cgit mirror sites was stale for groff by about two whole weeks!

Anything I say here won't help my case. I should just stand up and own up
that the mirrors were stale, and I will, because I had one job, keeping
the mirrors in sync, and clearly that failed. But the logs at the time
only showed the mirror being stale by about 4 days, not 2 weeks. Why the
difference? I think git may have muddied things here: if one commits
locally, that's the recorded date. A few days later the commit is pushed,
and the date the git log shows is the date of the commit, not of the
push. So the log may show the last commit to be 2 weeks old, but that is
not a direct measure of how long it has been since the last successful
mirror sync. Still, I had one job, and when that broke down it is a
failure all the same. Sorry!

Fortunately, git does the right thing when fetching, and I counted upon
that in the design. If a mirror is behind and git is asked to fetch from
it, nothing happens. Later, when a fetch hits an updated mirror, git of
course fetches the new bits. As the RR-DNS rotates through the addresses,
things work out for the automated CI/CD systems.
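
In any existing clone whose origin points at the RR-DNS name, that
convergence can be watched directly. A minimal sketch, assuming
origin/HEAD is set as it is after a normal clone:

while true; do
    git fetch                    # a stale mirror is a harmless no-op
    git rev-parse origin/HEAD    # only ever moves forward
    sleep 300
done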

How does Round-Robin DNS work? Yes, as you surmise, there are completely
independent mirror systems running in parallel. There are currently 7
systems, with two more in the queue to be added. It's growing into a
somewhat large collection to manage. The primary is the upstream system
used for git push. The mirrors shield the primary from the load of the AI
scraper bots and DDoS attacks.
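
The address pool itself is visible with an ordinary DNS query. For
example (the addresses returned will of course change as mirrors are
added and removed):

dig +short cgit.git.savannah.gnu.org A
dig +short cgit.git.savannah.gnu.org AAAA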

RR-DNS is only somewhat randomly distributed, and it depends upon the
client implementation. Every DNS query rotates through the list of
addresses, but clients such as web browsers cache the DNS lookup and
therefore stick with one address rather than rotating through the list.
There are other limitations too. The technique is described more
thoroughly on Wikipedia.

https://en.wikipedia.org/wiki/Round-robin_DNS
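
The rotation can be observed by asking more than once and looking at
which address comes back first. Whether the order actually changes per
query depends upon the resolvers in the path, which is exactly the
client dependence described above:

for i in 1 2 3; do
    dig +short cgit.git.savannah.gnu.org A | head -n1
done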

The advantage is that it is a completely independent, distributed
technique. There is no other fully independent distribution technique
available to us using standard servers, networking, and DNS. Vendors such
as Cloudflare and Akamai use stronger infrastructure methods of
distribution which are not readily available to us. So even though RR-DNS
is not without drawbacks, it is the best system readily available to us.

I know that HA (High Availability) systems such as haproxy, traefik, and
nginx proxying are often suggested, but those are not distributed across
sites. They are great HA solutions within a single datacenter location.
They are not designed for distributing across a heterogeneous collection
of volunteer-contributed systems spread over many datacenters, though
haproxy is an excellent solution within a datacenter site. I say this
just to get ahead of the suggestion.

> I'm not seeing any hostname at the top of any cgit pages, and don't remember
> having ever seen it. Also, I would expect such a hostname in the footer:

There are only a few options available when using cgit, and one of them
is the cgit root-desc string, so I embedded the hostname in that string.
It's not perfect but it is at least a clue. Here is an automated way to
print that string; it shows which mirror is answering each wget run.

wget -O- -q https://cgit.git.savannah.gnu.org/cgit/ | grep cgit.browser

Since wget does not cache the DNS lookup, it performs a new DNS query
each time, operating just as git itself operates: it rotates through the
RR-DNS address list. A web browser such as Firefox or Chromium will cache
the result, though, and stick with one address until the cache expires in
the browser.
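
Looping that same command makes the rotation visible; each run should
(eventually) report a different hostname in the root-desc string:

for i in 1 2 3 4 5; do
    wget -O- -q https://cgit.git.savannah.gnu.org/cgit/ | grep cgit.browser
done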

> ... this didn't work due to HTTPS default and inappropriate or missing SSL
> certificate matches

When it comes to probing individual IP addresses, git-daemon cloning is
the easiest way. For example, picking a single address from the list, a
selected mirror can be cloned like this.

git clone --depth=1 git://15.204.9.231/test-project.git
git clone --depth=1 git://"[2604:2dc0:202:300::5d3]"/test-project.git

Don't fixate on that IP address. It's dynamic. That one in particular is going
to be removed from the pool soon. But it should be online today.

I have a checker which tests that the service is online using the above
technique. It reports whether a mirror is online, but it does not know
whether it is stale. I need to set up something different which will test
for stale repositories. There are a thousand-plus repositories, however,
so it will need to be something which is a surrogate for the collection.
It is not practical to test each of the thousand individually all of the
time.
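
A rough sketch of what such a staleness check might look like, using a
single surrogate repository and comparing the HEAD each mirror reports
over git-daemon against the primary. The primary hostname below is only a
placeholder; substitute whatever the real upstream is:

want=$(git ls-remote git://primary.example.org/test-project.git HEAD | cut -f1)
for ip in $(dig +short cgit.git.savannah.gnu.org A); do
    have=$(git ls-remote "git://$ip/test-project.git" HEAD | cut -f1)
    if [ "$have" = "$want" ]; then
        echo "$ip current"
    else
        echo "$ip stale or unreachable"
    fi
done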

To dig into a cgit page one needs something a little more invasive.
Saying cgit of course implies the HTTPS protocol. Without using a chroot
and overriding /etc/hosts, or creating an nsswitch module, we just have
to use the address directly and disable certificate checking, because the
certificate won't match the address. (I suppose we could now add IP
certificates.)

wget -q -O- --no-check-certificate --header="Host: cgit.git.savannah.gnu.org" \
  https://15.204.9.231/cgit/test-project.git/commit/ | grep -F '/commit/?id='

That gets messier very quickly. That's also part of my automated testing to
ensure that the gitweb, cgit, http, https services are all operating.

I don't know what else to add here so I am just going to post this and keep
working on things.



    _______________________________________________________

Reply to this item at:

  <https://savannah.nongnu.org/support/?111374>

_______________________________________________
Message sent via Savannah
https://savannah.nongnu.org/
