Hi,

I’ve run into this “start request repeated too quickly” issue before, though 
IIRC with the pmsensors service rather than pmcollector, for instance when the GUI was set up 
against Spectrum Scale nodes on which the gpfs.gss.pmsensors RPM was not 
properly installed. That is, something was misconfigured at the cluster level, 
and not necessarily on the node for which the service is failing. Your issue 
might point at something similar but on the other end of the spectrum (sic).

In this case the issue is usually resolved by deleting/recreating the 
performance monitoring configuration for the whole cluster:

mmchnode --noperfmon -N all   # required before deleting the perfmon config
mmperfmon config delete --all
# start the pmcollector service on the GUI nodes
mmperfmon config generate --collectors <GUINODES>
mmchnode --perfmon -N all  # start the pmsensors service on all nodes
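
For anyone wanting to rehearse the sequence before touching the cluster, here is a minimal sketch that wraps the four commands above in a dry-run helper. The names DRY_RUN, run and reset_perfmon are made up for this example and are not part of Spectrum Scale; by default the helper only prints what it would execute.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the four commands from this mail in a dry-run helper.
# DRY_RUN, run and reset_perfmon are hypothetical names, not Spectrum Scale tools.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reset_perfmon() {
    local collectors="$1"   # comma-separated collector (GUI) node names
    if [ -z "$collectors" ]; then
        echo "usage: reset_perfmon <collector-nodes>" >&2
        return 2
    fi
    # Stop at the first failing step so a half-reset config is noticed.
    run mmchnode --noperfmon -N all || return 1
    run mmperfmon config delete --all || return 1
    run mmperfmon config generate --collectors "$collectors" || return 1
    run mmchnode --perfmon -N all || return 1
}
```

Calling "reset_perfmon gui1,gui2" (placeholder node names) prints the four commands; setting DRY_RUN=0 beforehand would actually run them.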

It might work when targeting individual nodes instead, though again the problem 
might be caused by cluster inconsistencies.
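
Since a missing gpfs.gss.pmsensors RPM is one such inconsistency, a rough way to spot affected nodes is sketched below. It assumes passwordless ssh to every node; REMOTE_SH and nodes_missing_sensors are hypothetical names for this example only.

```shell
#!/usr/bin/env bash
# Sketch only: list nodes on which the gpfs.gss.pmsensors RPM is not installed.
# REMOTE_SH and nodes_missing_sensors are hypothetical names for this example.

REMOTE_SH="${REMOTE_SH:-ssh}"   # assumption: passwordless ssh to each node

nodes_missing_sensors() {
    local node
    for node in "$@"; do
        # rpm -q exits non-zero when the package is not installed.
        # An unreachable node is reported as well, since the remote call fails.
        if ! "$REMOTE_SH" "$node" rpm -q gpfs.gss.pmsensors >/dev/null 2>&1 </dev/null; then
            echo "$node"
        fi
    done
}
```

For example, "nodes_missing_sensors node1 node2 node3" (placeholder node names) prints only the nodes where the query fails.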

HTH

--
Nicolas Calimet, PhD | HPC System Architect | Lenovo ISG | Meitnerstrasse 9, 
D-70563 Stuttgart, Germany | +49 71165690146 | https://www.lenovo.com/dssg

From: [email protected] 
<[email protected]> On Behalf Of Oesterlin, Robert
Sent: Monday, November 15, 2021 19:44
To: gpfsug main discussion list <[email protected]>
Subject: [External] [gpfsug-discuss] Pmcollector fails to start

Any idea why pmcollector fails to start via service? If I start it manually, it 
runs just fine. Scale 5.1.1.4

This works from the command line:

/opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon

“service pmcollector start” - fails:

Redirecting to /bin/systemctl status pmcollector.service
● pmcollector.service - zimon collector daemon
   Loaded: loaded (/usr/lib/systemd/system/pmcollector.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Mon 2021-11-15 13:22:34 EST; 10min ago
  Process: 2055 ExecStart=/opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon (code=exited, status=203/EXEC)
Main PID: 2055 (code=exited, status=203/EXEC)

Nov 15 13:22:33 nrg1-zimon1 systemd[1]: Unit pmcollector.service entered failed state.
Nov 15 13:22:33 nrg1-zimon1 systemd[1]: pmcollector.service failed.
Nov 15 13:22:34 nrg1-zimon1 systemd[1]: pmcollector.service holdoff time over, scheduling restart.
Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Stopped zimon collector daemon.
Nov 15 13:22:34 nrg1-zimon1 systemd[1]: start request repeated too quickly for pmcollector.service
Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Failed to start zimon collector daemon.
Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Unit pmcollector.service entered failed state.
Nov 15 13:22:34 nrg1-zimon1 systemd[1]: pmcollector.service failed.
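
A note on the status above: systemd reports 203/EXEC when it could not execute the ExecStart binary at all, so the daemon never got as far as reading its configuration. Below is a minimal sketch of the basic file checks one might run against the binary; check_exec is a hypothetical helper name, and it does not cover other possible causes such as SELinux labels.

```shell
#!/usr/bin/env bash
# Sketch only: basic checks for a binary that systemd reports as 203/EXEC.
# check_exec is a hypothetical helper name.

check_exec() {
    local bin="$1"
    if [ ! -e "$bin" ]; then
        echo "missing: $bin"
        return 1
    fi
    if [ ! -f "$bin" ]; then
        echo "not a regular file: $bin"
        return 1
    fi
    if [ ! -x "$bin" ]; then
        echo "not executable: $bin"
        return 1
    fi
    echo "ok: $bin"
}
```

Here it would be called as "check_exec /opt/IBM/zimon/sbin/pmcollector". If that prints ok, as is likely when the manual start works, the file's SELinux context (ls -Z) may be worth a look; after fixing the cause, "systemctl reset-failed pmcollector" clears the start-limit state.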


Bob Oesterlin
Sr Principal Storage Engineer
Nuance Communications
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
