Hi,

I’ve run into this “start request repeated too quickly” issue before, though IIRC for the pmsensors service rather than pmcollector, for instance when the GUI was set up against Spectrum Scale nodes on which the gpfs.gss.pmsensors RPM was not properly installed. That is, something was misconfigured at the cluster level, and not necessarily on the node where the service was failing. Your issue might point at something similar but on the other end of the spectrum (sic).
In this case the issue is usually resolved by deleting and recreating the performance monitoring configuration for the whole cluster:

    mmchnode --noperfmon -N all                          # required before deleting the perfmon config
    mmperfmon config delete --all
    mmperfmon config generate --collectors <GUINODES>    # start the pmcollector service on the GUI nodes
    mmchnode --perfmon -N all                            # start the pmsensors service on all nodes

It might work when targeting individual nodes instead, though again the problem might be caused by cluster inconsistencies.

HTH

--
Nicolas Calimet, PhD | HPC System Architect | Lenovo ISG | Meitnerstrasse 9, D-70563 Stuttgart, Germany | +49 71165690146 | https://www.lenovo.com/dssg

From: [email protected] <[email protected]> On Behalf Of Oesterlin, Robert
Sent: Monday, November 15, 2021 19:44
To: gpfsug main discussion list <[email protected]>
Subject: [External] [gpfsug-discuss] Pmcollector fails to start

Any idea why pmcollector fails to start via service? If I start it manually, it runs just fine. Scale 5.1.1.4

This works from the command line:

    /opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon

“service pmcollector start” - fails:

    Redirecting to /bin/systemctl status pmcollector.service
    ● pmcollector.service - zimon collector daemon
       Loaded: loaded (/usr/lib/systemd/system/pmcollector.service; enabled; vendor preset: disabled)
       Active: failed (Result: start-limit) since Mon 2021-11-15 13:22:34 EST; 10min ago
      Process: 2055 ExecStart=/opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon (code=exited, status=203/EXEC)
     Main PID: 2055 (code=exited, status=203/EXEC)

    Nov 15 13:22:33 nrg1-zimon1 systemd[1]: Unit pmcollector.service entered failed state.
    Nov 15 13:22:33 nrg1-zimon1 systemd[1]: pmcollector.service failed.
    Nov 15 13:22:34 nrg1-zimon1 systemd[1]: pmcollector.service holdoff time over, scheduling restart.
    Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Stopped zimon collector daemon.
    Nov 15 13:22:34 nrg1-zimon1 systemd[1]: start request repeated too quickly for pmcollector.service
    Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Failed to start zimon collector daemon.
    Nov 15 13:22:34 nrg1-zimon1 systemd[1]: Unit pmcollector.service entered failed state.
    Nov 15 13:22:34 nrg1-zimon1 systemd[1]: pmcollector.service failed.

Bob Oesterlin
Sr Principal Storage Engineer
Nuance Communications
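For reference, the delete/recreate sequence from the reply can be wrapped in a small dry-run script. This is only a sketch under a few assumptions: the `step` and `perfmon_reset` function names and the `RUN` and `COLLECTORS` variables are made up for illustration, and `<GUINODES>` stays a placeholder for the actual GUI node names. By default it only prints the four commands in order; nothing is executed unless `RUN=1` is set.

```shell
#!/bin/sh
# Dry-run sketch of the perfmon delete/recreate sequence from the reply.
# COLLECTORS and RUN are hypothetical variables introduced for this example.
COLLECTORS="${COLLECTORS:-<GUINODES>}"   # placeholder: replace with the GUI node names
RUN="${RUN:-0}"                          # 0 = print the commands only, 1 = execute them

# Print the command (dry run) or execute it (RUN=1).
step() {
    if [ "$RUN" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

perfmon_reset() {
    # Refuse to execute anything while the collector list is still the placeholder.
    if [ "$RUN" = "1" ] && [ "$COLLECTORS" = "<GUINODES>" ]; then
        echo "set COLLECTORS to the GUI node names first" >&2
        return 1
    fi
    # Same order as in the reply; stop at the first command that fails.
    step mmchnode --noperfmon -N all &&
    step mmperfmon config delete --all &&
    step mmperfmon config generate --collectors "$COLLECTORS" &&
    step mmchnode --perfmon -N all
}

perfmon_reset
```

Reviewing the printed commands before rerunning with `RUN=1` and a real `COLLECTORS` value seems prudent, since the sequence changes the configuration of the whole cluster.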
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
