Re: help needed to manage s390x host for ci.debian.net
Hi,

On 12.02.23 22:38, Paul Gevers wrote:
> I have munin [1], but as said, I'm not a trained sysadmin. I don't know
> what I'm looking for if you ask "statistics on the network".

This is more of a software development / devops question than a sysadmin question, but alas. What I am interested in is *application-level* logging on reconnects. Presumably the connection to RabbitMQ is outbound? Is it tunneled? Does your application log somewhere when a reconnect happens? Does it say when it successfully connected?

I'd expect good software to log something like this:

  [10:00:00] Connecting to broker "rabbitmq.debci.debian.net:12345"...
  [10:00:05] Connected to broker "rabbitmq.debci.debian.net:12345".

And also:

  [10:00:00] Connecting to broker "rabbitmq.debci.debian.net:12345"...
  [10:00:01] Connection to broker "rabbitmq.debci.debian.net:12345" failed: Connection refused

Kind regards
Philipp Kern
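For illustration, log lines of the kind described above could be produced by a small wrapper around the connection attempt. This is only a sketch under assumptions: it uses a plain TCP connect from the Python standard library rather than whatever client library debci actually uses, the names `connect_logged` and `log` are made up here, and the elapsed-time suffix is an addition that is not part of the example lines in the message.

```python
import logging
import socket
import time

# Hypothetical logger name; the real application may use a different one.
log = logging.getLogger("broker")


def connect_logged(host, port, timeout=10.0, connect=socket.create_connection):
    """Open a TCP connection and log the attempt, the outcome and the duration.

    `connect` is injectable so the logging can be exercised without a network.
    Returns the socket on success and None on failure.
    """
    target = f'"{host}:{port}"'
    log.info("Connecting to broker %s...", target)
    started = time.monotonic()
    try:
        sock = connect((host, port), timeout)
    except OSError as exc:
        # Covers refused connections, timeouts and name resolution errors.
        elapsed = time.monotonic() - started
        log.warning("Connection to broker %s failed after %.1fs: %s",
                    target, elapsed, exc)
        return None
    elapsed = time.monotonic() - started
    log.info("Connected to broker %s after %.1fs.", target, elapsed)
    return sock
```

Configuring the logger with `logging.basicConfig(format="[%(asctime)s] %(message)s", datefmt="%H:%M:%S", level=logging.INFO)` would give timestamps in the same `[10:00:00]` style. Logging both the attempt and the outcome is what makes it possible to tell a slow connect from a refused one afterwards.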
RE: help needed to manage s390x host for ci.debian.net
I am not a CI/networking expert, but I will be more than happy to assist. My timezone is +0530 and I am widely available.

Thanks,
-Dipak Zope
Debian s390 porting team

On 13/02/23, 3:09 AM, "Paul Gevers" wrote:

Hi Phil and all others offering help,

On 12-02-2023 20:32, Philipp Kern wrote:
> On 11.02.23 18:18, Paul Gevers wrote:
>> * [suspect 1] network issues between the s390x and the main ci.d.n
>> server (the results (log files) of the autopkgtests are transferred to
>> the main server). Our ppc64el hosts are also located at Marist, so I
>> would expect commonality here, but also ppc64el isn't performing
>> great, so maybe part of the problem is common.
>
> Do you have any kind of statistics on the network connections? I.e. how
> often it reconnects and how long it takes to reconnect? The Marist
> network has a very weird firewall inbound (e.g. if I do too many SSH
> requests in a row, I'm blackholed) - so I would not be surprised if there
> is some weirdness there.

I have munin [1], but as said, I'm not a trained sysadmin. I don't know what I'm looking for if you ask "statistics on the network".

Also, I have no experience with s390x except for deploying the Debian software on the server set up by Phil. All the quirks of s390x are beyond me.

I can provide logging from the host, but I'll need detailed instructions of what people find useful to look at. Recently Antonio taught me a trick to provide temporary access to a lxc container on any of our hosts, so if it helps to be on the host (but inside lxc) we can provide for that.

Paul

[1] https://ci.debian.net/munin/ci-worker-s390x-01/ci-worker-s390x-01/index.html
Re: help needed to manage s390x host for ci.debian.net
Hi Phil and all others offering help,

On 12-02-2023 20:32, Philipp Kern wrote:
> On 11.02.23 18:18, Paul Gevers wrote:
>> * [suspect 1] network issues between the s390x and the main ci.d.n
>> server (the results (log files) of the autopkgtests are transferred to
>> the main server). Our ppc64el hosts are also located at Marist, so I
>> would expect commonality here, but also ppc64el isn't performing
>> great, so maybe part of the problem is common.
>
> Do you have any kind of statistics on the network connections? I.e. how
> often it reconnects and how long it takes to reconnect? The Marist
> network has a very weird firewall inbound (e.g. if I do too many SSH
> requests in a row, I'm blackholed) - so I would not be surprised if there
> is some weirdness there.

I have munin [1], but as said, I'm not a trained sysadmin. I don't know what I'm looking for if you ask "statistics on the network".

Also, I have no experience with s390x except for deploying the Debian software on the server set up by Phil. All the quirks of s390x are beyond me.

I can provide logging from the host, but I'll need detailed instructions of what people find useful to look at. Recently Antonio taught me a trick to provide temporary access to a lxc container on any of our hosts, so if it helps to be on the host (but inside lxc) we can provide for that.

Paul

[1] https://ci.debian.net/munin/ci-worker-s390x-01/ci-worker-s390x-01/index.html
Re: help needed to manage s390x host for ci.debian.net
On 11.02.23 18:18, Paul Gevers wrote:
> * [suspect 1] network issues between the s390x and the main ci.d.n
> server (the results (log files) of the autopkgtests are transferred to
> the main server). Our ppc64el hosts are also located at Marist, so I
> would expect commonality here, but also ppc64el isn't performing
> great, so maybe part of the problem is common.

Do you have any kind of statistics on the network connections? I.e. how often it reconnects and how long it takes to reconnect? The Marist network has a very weird firewall inbound (e.g. if I do too many SSH requests in a row, I'm blackholed) - so I would not be surprised if there is some weirdness there.

Kind regards
Philipp Kern
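To put numbers on "how often it reconnects and how long it takes to reconnect", one option is to post-process the worker's log. The sketch below assumes, hypothetically, timestamped lines of the form `[HH:MM:SS] Connecting to broker ...`, `[HH:MM:SS] Connected to broker ...` and `[HH:MM:SS] Connection to broker ... failed: ...`. Nothing in this thread confirms that the debci worker logs in this format, and the function name `reconnect_stats` is made up for illustration.

```python
import re
from datetime import datetime

# Matches the assumed line prefix: "[HH:MM:SS] <event> broker ..."
LINE = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2})\] (Connecting to|Connected to|Connection to) broker"
)


def reconnect_stats(lines):
    """Count connection attempts and failures, and collect connect durations.

    Each "Connecting to" line opens an attempt; the next "Connected to" or
    "Connection to ... failed" line closes it. Lines in any other format are
    ignored. Timestamps carry no date, so an attempt that spans midnight
    would yield a wrong (negative) duration.
    """
    attempts = 0
    failures = 0
    durations = []
    pending = None  # timestamp of the attempt currently in progress
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        kind = match.group(2)
        if kind == "Connecting to":
            attempts += 1
            pending = stamp
        elif pending is not None:
            if kind == "Connected to":
                durations.append((stamp - pending).total_seconds())
            else:
                failures += 1
            pending = None
    return {
        "attempts": attempts,
        "failures": failures,
        "connect_seconds": durations,
    }
```

Feeding it an open log file, e.g. `reconnect_stats(open(path))` with `path` pointing at the worker's log, would give the attempt count, the failure count and the list of connect times, which is roughly the kind of statistic asked for here.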