Re: Pool dried up

2021-03-29 Thread Martin Dobrev


On 29/03/2021 11:47, Marcel Waldvogel wrote:

> > > > > Looking at the cached metadata it appears that when the spider ran,
> > > > > the pod02.fleetstreetops node was unavailable, as was pgpkeys.co.uk
> >
> > > Apologies, I didn't mean to cast doubt on the reliability of your node,
> > > but rather on that of the spider. It does not maintain much of a
> > > historical record, and so depends on a single measurement per node each
> >
> > yes, that's the case with sks.infcs.de as well.
> >
> > That server is sometimes so busy handling a reconciliation task that it
> > does not return the stats page in time. According to the log, the server
> > is running the whole time, but the spider cannot always retrieve the
> > stats page, because of the sequential design of the SKS server.
>
> Would you "core server people" consider caching the status page in the
> proxy?
> (I do fetch the status page every 5 minutes from the backend servers
> to create a "homogeneous" view of the pool, as each is configured
> slightly differently; and serve that statically from the load balancer.)
>
> -Marcel



I'm doing similar things in my cluster. All data gets cached for 5 
minutes and, looking at the metrics, this is good enough for a cache 
hit ratio of about 22-30%.


Regards,
Martin





Re: Pool dried up

2021-03-29 Thread Andrew Gallagher

On 29/03/2021 11:47, Marcel Waldvogel wrote:
> Would you "core server people" consider caching the status page in the
> proxy?
> (I do fetch the status page every 5 minutes from the backend servers to
> create a "homogeneous" view of the pool, as each is configured slightly
> differently; and serve that statically from the load balancer.)


I recently changed sks.pgpkeys.eu to serve the status page from the 
recon instance rather than the load-balanced workers - this appears to 
have improved its apparent reliability (from the spider POV), but it's 
still not perfect (it blipped this morning).


I think caching (of everything) is the way to go, and will experiment 
with it later.


--
Andrew Gallagher





Re: Pool dried up

2021-03-29 Thread Marcel Waldvogel
> > > > Looking at the cached metadata it appears that when the spider ran,
> > > > the pod02.fleetstreetops node was unavailable, as was pgpkeys.co.uk
>
> > Apologies, I didn't mean to cast doubt on the reliability of your node,
> > but rather on that of the spider. It does not maintain much of a
> > historical record, and so depends on a single measurement per node each
>
> yes, that's the case with sks.infcs.de as well.
>
> That server is sometimes so busy handling a reconciliation task that it
> does not return the stats page in time. According to the log, the server
> is running the whole time, but the spider cannot always retrieve the
> stats page, because of the sequential design of the SKS server.

Would you "core server people" consider caching the status page in the
proxy?
(I do fetch the status page every 5 minutes from the backend servers to
create a "homogeneous" view of the pool, as each is configured slightly
differently; and serve that statically from the load balancer.)
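
For illustration only, the caching idea above could look roughly like the
following. This is a minimal sketch, not what any pool member actually runs:
the class name StatusPageCache, the fetch callback and the injectable clock
are all hypothetical, and the 300-second TTL is simply the 5 minutes
mentioned above. It also assumes that fetch failures surface as OSError
(which covers timeouts and the urllib error types). When a backend is too
busy to answer, the last good copy is served instead of an error.

```python
import time


class StatusPageCache:
    """Serve a cached copy of each backend's status page for `ttl` seconds.

    `fetch(backend)` is a caller-supplied function (hypothetical here) that
    returns the page body or raises OSError on timeout/connection failure.
    """

    def __init__(self, fetch, ttl=300.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # backend -> (fetched_at, body)

    def get(self, backend):
        now = self._clock()
        entry = self._entries.get(backend)
        # Fresh enough: answer from the cache without touching the backend.
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        try:
            body = self._fetch(backend)
        except OSError:
            # Backend busy or unreachable: fall back to the stale copy,
            # if there is one, rather than reporting the node as down.
            if entry is not None:
                return entry[1]
            raise
        self._entries[backend] = (now, body)
        return body
```

A load balancer front end would call get() for every incoming status
request, so a spider hitting the proxy never waits on a backend that is in
the middle of a recon run.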

-Marcel





Re: Pool dried up

2021-03-29 Thread Steffen Kaiser
On 23.03.21 10:38, Andrew Gallagher wrote:
> On 23/03/2021 03:37, Todd Fleisher wrote:
>>> On Mar 22, 2021, at 13:28, Andrew Gallagher wrote:
>>>
>>> 1 pgpkeys.uk[@]
>>> 2 sks.pod01.fleetstreetops.com[@]
>>> 3 sks.pod02.fleetstreetops.com[@]
>
> BTW it has just happened again:
>
> 1 pgpkeys.eu[@]
> 2 pgpkeys.uk[@]
> 3 sks.pod02.fleetstreetops.com[@]
>
>>> Looking at the cached metadata it appears that when the spider ran,
>>> the pod02.fleetstreetops node was unavailable, as was pgpkeys.co.uk

> Apologies, I didn't mean to cast doubt on the reliability of your node,
> but rather on that of the spider. It does not maintain much of a
> historical record, and so depends on a single measurement per node each


yes, that's the case with sks.infcs.de as well.

That server is sometimes so busy handling a reconciliation task that it
does not return the stats page in time. According to the log, the server
is running the whole time, but the spider cannot always retrieve the
stats page, because of the sequential design of the SKS server.

Kind regards,


-- 
Steffen





Re: Pool dried up

2021-03-23 Thread Todd Fleisher
> On Mar 23, 2021, at 02:38, Andrew Gallagher wrote:
> 
> Hi, Todd.
> 
> On 23/03/2021 03:37, Todd Fleisher wrote:
>>> On Mar 22, 2021, at 13:28, Andrew Gallagher wrote:
>>> 
>>> I happened to check the pool just now, and there are only three nodes in it:
>>> 
>>> 1 pgpkeys.uk[@]
>>> 2 sks.pod01.fleetstreetops.com[@]
>>> 3 sks.pod02.fleetstreetops.com[@]
> 
> BTW it has just happened again:
> 
> 1 pgpkeys.eu[@]
> 2 pgpkeys.uk[@]
> 3 sks.pod02.fleetstreetops.com[@]
> 
>>> Looking at the cached metadata it appears that when the spider ran, 
>>> the pod02.fleetstreetops node was unavailable, as was pgpkeys.co.uk 
>>> (the domain registration has expired).
>> I can’t speak for pgpkeys.co.uk, but I have not seen any issues with 
>> sks.pod02.fleetstreetops.com (nor hkps.pool.sks-keyservers.net, which 
>> it powers) today.
> 
> Apologies, I didn't mean to cast doubt on the reliability of your node, but 
> rather on that of the spider. It does not maintain much of a historical 
> record, and so depends on a single measurement per node each hour for its 
> operation. This makes it very vulnerable to transient issues, such as 
> connection timeouts. One dropped connection, and 90% of the pool disappears 
> for an hour, even if all the nodes stay up.

No worries. I was neither offended nor trying to dispute that at times the 
pools run thin (if not fully bottom out). I was just trying to keep things 
in perspective. My nodes also suffer from issues periodically, and as a 
result I have seen them become non-responsive at times. This usually isn’t 
visible to the public on account of the load-balanced setup.

> There are a few interlocking design issues at work here IMO, none of which 
> are the responsibility of individual operators.

Indeed.

-T

> --
> Andrew Gallagher
> 





Re: Pool dried up

2021-03-23 Thread Andrew Gallagher

Hi, Todd.

On 23/03/2021 03:37, Todd Fleisher wrote:
>> On Mar 22, 2021, at 13:28, Andrew Gallagher wrote:
>>
>> I happened to check the pool just now, and there are only three nodes
>> in it:
>>
>> 1 pgpkeys.uk[@]
>> 2 sks.pod01.fleetstreetops.com[@]
>> 3 sks.pod02.fleetstreetops.com[@]

BTW it has just happened again:

1 pgpkeys.eu[@]
2 pgpkeys.uk[@]
3 sks.pod02.fleetstreetops.com[@]

>> Looking at the cached metadata it appears that when the spider ran,
>> the pod02.fleetstreetops node was unavailable, as was pgpkeys.co.uk
>> (the domain registration has expired).
>
> I can’t speak for pgpkeys.co.uk, but I have not seen any issues with
> sks.pod02.fleetstreetops.com (nor hkps.pool.sks-keyservers.net, which
> it powers) today.


Apologies, I didn't mean to cast doubt on the reliability of your node, 
but rather on that of the spider. It does not maintain much of a 
historical record, and so depends on a single measurement per node each 
hour for its operation. This makes it very vulnerable to transient 
issues, such as connection timeouts. One dropped connection, and 90% of 
the pool disappears for an hour, even if all the nodes stay up.
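
To make the point concrete, one possible mitigation is for the spider to
keep a short history per node and judge on that, rather than on the latest
probe alone. The following is only a sketch of that idea, not how the
spider actually works: the class name NodeHistory, the window of six hourly
runs and the threshold of two successes are all assumed values.

```python
from collections import defaultdict, deque


class NodeHistory:
    """Remember the last `window` probe results for each node.

    A node stays in the pool as long as at least `min_successes` of its
    recent probes succeeded, so one dropped connection does not evict it.
    """

    def __init__(self, window=6, min_successes=2):
        self._min_successes = min_successes
        # deque(maxlen=...) silently discards the oldest result when full.
        self._results = defaultdict(lambda: deque(maxlen=window))

    def record(self, node, ok):
        self._results[node].append(bool(ok))

    def in_pool(self, node):
        # .get() avoids creating an empty history for unknown nodes.
        return sum(self._results.get(node, ())) >= self._min_successes
```

With these assumed settings a node that answered in the previous runs
survives a single timeout, while a node that has failed in most recent runs
still drops out.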


There are a few interlocking design issues at work here IMO, none of 
which are the responsibility of individual operators.


--
Andrew Gallagher





Re: Pool dried up

2021-03-22 Thread Todd Fleisher
> On Mar 22, 2021, at 13:28, Andrew Gallagher wrote:
> 
> I happened to check the pool just now, and there are only three nodes in it:
> 
> 1 pgpkeys.uk[@]
> 2 sks.pod01.fleetstreetops.com[@]
> 3 sks.pod02.fleetstreetops.com[@]
> 
> Looking at the cached metadata it appears that when the spider ran, 
> the pod02.fleetstreetops node was unavailable, as was pgpkeys.co.uk (the 
> domain registration has expired).

I can’t speak for pgpkeys.co.uk, but I have not seen any issues with 
sks.pod02.fleetstreetops.com (nor hkps.pool.sks-keyservers.net, which it 
powers) today.

> pod01.fleetstreetops does not advertise any peers,

This happens intermittently when the non-recon node services the status 
request, but it is a multi-node pool so rest assured there is external 
reconciliation configured.

> This demonstrates that connectivity is a serious issue with the pool right 
> now. A few key nodes going down can orphan all other nodes, no matter how 
> well-behaved.

While the core idea behind this take is valid, I do not believe the current 
state of affairs is as dire as it is being portrayed.

> It should probably also be noted that pod02.fleetstreetops has been the only 
> node in the HKPS pool now for some time. This certainly can't be good for its 
> load.

Yes, this has been the case since last summer. This is completely reliant on 
Kristian’s continued signing of CSRs for member nodes.

-T





Re: Pool dried up

2021-03-22 Thread Andrew Gallagher

On 22/03/2021 20:45, Martin Dobrev wrote:
> Is it not time to extend the list of initial servers then?
>
> $initial_servers = array("keys2.kfwebs.net", "zimmermann.mayfirst.org", "keyserver.kim-minh.com",
> "pgp.circl.lu", "keys.niif.hu", "sks.b4ckbone.de", "keyserver.opensuse.org");
>
> keyserver.dobrev.eu is peered to two of them and yet dropped from the list.


Of those, only zimmermann is functional, and it isn't peered with any 
other functional servers. I'm pretty sure the running copy of the spider 
is using a local config file with many more seeds than in the repo copy.


--
Andrew Gallagher





Re: Pool dried up

2021-03-22 Thread Martin Dobrev

Is it not time to extend the list of initial servers then?


$initial_servers = array("keys2.kfwebs.net", "zimmermann.mayfirst.org", "keyserver.kim-minh.com", 
"pgp.circl.lu", "keys.niif.hu", "sks.b4ckbone.de", "keyserver.opensuse.org");

keyserver.dobrev.eu is peered to two of them and yet dropped from the list.

Regards,
Martin

On 22/03/2021 20:28, Andrew Gallagher wrote:
> I happened to check the pool just now, and there are only three nodes
> in it:
>
> 1    pgpkeys.uk[@]
> 2    sks.pod01.fleetstreetops.com[@]
> 3    sks.pod02.fleetstreetops.com[@]
>
> Looking at the cached metadata it appears that when the spider ran,
> the pod02.fleetstreetops node was unavailable, as was pgpkeys.co.uk (the
> domain registration has expired).
>
> pod01.fleetstreetops does not advertise any peers, and of pgpkeys.uk's
> peers, only pgpkeys.eu (which does not yet pretend to be SKS) and
> pgp.surf.nl (which is several thousand keys behind) were reachable. At
> second remove, of surf.nl's peers only keys.andreas-puls.de was
> available, but was just outside the reduced max delta of 300, and it
> does not advertise any further peers. So the spiders ran dry at this
> point.
>
> This demonstrates that connectivity is a serious issue with the pool
> right now. A few key nodes going down can orphan all other nodes, no
> matter how well-behaved.
>
> It should probably also be noted that pod02.fleetstreetops has been
> the only node in the HKPS pool now for some time. This certainly can't
> be good for its load.




Pool dried up

2021-03-22 Thread Andrew Gallagher

I happened to check the pool just now, and there are only three nodes in it:

1   pgpkeys.uk[@]
2   sks.pod01.fleetstreetops.com[@]
3   sks.pod02.fleetstreetops.com[@]

Looking at the cached metadata it appears that when the spider ran, 
the pod02.fleetstreetops node was unavailable, as was pgpkeys.co.uk (the 
domain registration has expired).


pod01.fleetstreetops does not advertise any peers, and of pgpkeys.uk's 
peers, only pgpkeys.eu (which does not yet pretend to be SKS) and 
pgp.surf.nl (which is several thousand keys behind) were reachable. At 
second remove, of surf.nl's peers only keys.andreas-puls.de was 
available, but was just outside the reduced max delta of 300, and it 
does not advertise any further peers. So the spiders ran dry at this point.
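
As a rough model of the walk described above (not the spider's real code,
which is PHP), the crawl can be pictured as a breadth-first search over
advertised peers. Everything named here is an assumption for illustration:
crawl_pool and probe are hypothetical, probe(host) is taken to return
(key_count, peers) or None when the host is unreachable, and the delta is
assumed to be measured against the highest key count seen.

```python
from collections import deque


def crawl_pool(seeds, probe, max_delta=300):
    """Walk the peer graph from `seeds` and return the eligible hosts.

    Unreachable hosts contribute no peers, so anything reachable only
    through them is never discovered (it is "orphaned").
    """
    queue = deque(dict.fromkeys(seeds))  # de-duplicate, keep order
    seen = set(queue)
    key_counts = {}
    while queue:
        host = queue.popleft()
        result = probe(host)
        if result is None:
            continue  # down or timed out: its peers stay unknown
        count, peers = result
        key_counts[host] = count
        for peer in peers:
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    if not key_counts:
        return []
    best = max(key_counts.values())
    # Hosts too far behind the best key count are crawled but not listed.
    return sorted(h for h, n in key_counts.items() if best - n <= max_delta)
```

In this model a handful of well-connected hosts failing their probe is
enough to cut off every node behind them, whatever the state of those
nodes.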


This demonstrates that connectivity is a serious issue with the pool 
right now. A few key nodes going down can orphan all other nodes, no 
matter how well-behaved.


It should probably also be noted that pod02.fleetstreetops has been the 
only node in the HKPS pool now for some time. This certainly can't be 
good for its load.


--
Andrew Gallagher


