Add this line to the ticket/record information when opening a service ticket:
fput failed: Version mismatch on conditional put (err 805)
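For context on this error: the `-c <version>` argument in the `mmccr fput` calls logged further down suggests a conditional (compare-and-set) write, which only succeeds if the file is still at the version the writer last read. The Python sketch below models that pattern under this assumption. `ConditionalStore`, `VersionMismatch` and `update_with_retry` are hypothetical names, not the CCR or `mmccr` API. It shows how a second writer that keeps updating the same file between one writer's read and its put produces an endless series of version mismatches, which is one of the causes listed in the reply that follows.

```python
class VersionMismatch(Exception):
    """Raised when the writer's expected version is stale (cf. err 805)."""


class ConditionalStore:
    """Hypothetical in-memory versioned store; NOT the CCR API.

    It only models compare-and-set semantics like those implied by
    `mmccr fput <file> <tmpfile> -c <version>` in the log.
    """

    def __init__(self):
        self._files = {}  # name -> (version, content)

    def get(self, name):
        # Version 0 means the file has never been written.
        return self._files.get(name, (0, None))

    def conditional_put(self, name, content, expected_version):
        current, _ = self.get(name)
        if current != expected_version:
            raise VersionMismatch(
                f"{name}: expected {expected_version}, found {current}")
        self._files[name] = (current + 1, content)
        return current + 1


def update_with_retry(store, name, compute, max_attempts=5):
    """Read-modify-write loop: on a mismatch, re-read and start a new cycle.

    Returns (new_version, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        version, content = store.get(name)
        try:
            return store.conditional_put(name, compute(content), version), attempt
        except VersionMismatch:
            continue  # another writer bumped the version first
    raise RuntimeError(f"gave up after {max_attempts} attempts")


# Demo: a competing writer updates "collectors" between every read and put.
store = ConditionalStore()
store.conditional_put("collectors", "initial list", 0)  # now at version 1


def racing_compute(old_content):
    # Simulates another node writing the file after our read, before our put.
    version, _ = store.get("collectors")
    store.conditional_put("collectors", "other node's list", version)
    return "our list"


try:
    update_with_retry(store, "collectors", racing_compute, max_attempts=3)
except RuntimeError as exc:
    print(exc)  # gave up after 3 attempts
```

The retry count is bounded here only for the demo; as long as the competing writer keeps going, each new cycle fails the same way, much like the repeating "New version received, start new collectors update cycle" lines in the log.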
----- Original message -----
From: "Luis Bolinches" <[email protected]>
Sent by: [email protected]
To: [email protected]
CC: [email protected]
Subject: [EXTERNAL] Re: [gpfsug-discuss] gpfsgui in a core dump/restart loop
Date: Tue, Nov 30, 2021 14:30
Hi

Not really a solution... first disable the systemd service:

systemctl disable gpfsgui

So at least it does not go on in this loop.

This can be indicative of a few issues going on: 2 or more nodes trying to modify the same file; removed nodes that were perfmon nodes; "too many" collectors under certain conditions; ... and probably many others.

I strongly suggest you get the last round of generated dump data and open a case with IBM (assuming this is IBM; with whoever else the vendor is if not). Maybe include a snap with it to speed things up, so there is a clear picture of the cluster, the CCR nodes and the collectors.
--
Ystävällisin terveisin / Kind regards / Saludos cordiales / Salutations / Salutacions
Luis Bolinches
IBM Spectrum Scale development
Mobile Phone: +358503112585
Ab IBM Finland Oy
Laajalahdentie 23
00330 Helsinki
Uusimaa - Finland
"If you always give you will always have" -- Anonymous
----- Original message -----
From: "Losen, Stephen C (scl)" <[email protected]>
Sent by: [email protected]
To: "gpfsug main discussion list" <[email protected]>
Cc:
Subject: [EXTERNAL] [gpfsug-discuss] gpfsgui in a core dump/restart loop
Date: Tue, Nov 30, 2021 14:48
Hi folks,
Our gpfsgui service keeps crashing and restarting. About every three minutes we get files like these in /var/crash/scalemgmt:
-rw------- 1 scalemgmt scalemgmt 1067843584 Nov 30 06:54 core.20211130.065414.59174.0001.dmp
-rw-r--r-- 1 scalemgmt scalemgmt 2636747 Nov 30 06:54 javacore.20211130.065414.59174.0002.txt
-rw-r--r-- 1 scalemgmt scalemgmt 1903304 Nov 30 06:54 Snap.20211130.065414.59174.0003.trc
-rw-r--r-- 1 scalemgmt scalemgmt 202 Nov 30 06:54 jitdump.20211130.065414.59174.0004.dmp
The core.*.dmp files are cores from the java command.
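To check the "about every three minutes" interval over a longer period, the dump file names can be parsed. The layout `core.<YYYYMMDD>.<HHMMSS>.<pid>.<seq>.dmp` is inferred from the four files listed above rather than from documentation, and treating the third field as the JVM process ID is an assumption; `core_dump_times` and `restart_intervals` are hypothetical helper names. A minimal Python sketch:

```python
import re
from datetime import datetime
from pathlib import Path

# Pattern inferred from the listing above, e.g.
# core.20211130.065414.59174.0001.dmp -> date, time, (assumed) pid, sequence.
CORE_RE = re.compile(r"^core\.(\d{8})\.(\d{6})\.(\d+)\.\d+\.dmp$")


def core_dump_times(names):
    """Return sorted (timestamp, pid) pairs for every core.*.dmp name."""
    found = []
    for name in names:
        m = CORE_RE.match(name)
        if m:
            stamp = datetime.strptime(m.group(1) + m.group(2), "%Y%m%d%H%M%S")
            found.append((stamp, int(m.group(3))))
    return sorted(found)


def restart_intervals(names):
    """Seconds between consecutive core dumps (roughly one per JVM crash)."""
    times = [t for t, _ in core_dump_times(names)]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    crash_dir = Path("/var/crash/scalemgmt")
    names = [p.name for p in crash_dir.iterdir()] if crash_dir.is_dir() else []
    for seconds in restart_intervals(names):
        print(f"{seconds / 60:.1f} min between crashes")
```

javacore, Snap and jitdump files are ignored on purpose, so each crash is counted once.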
And the below errors keep repeating in /var/adm/ras/mmsysmonitor.log.
Any suggestions? Thanks for any help.
2021-11-30_07:25:09.944-0500: [W] ET_gui Event=gui_down identifier= arg0=started arg1=stopped
2021-11-30_07:25:09.961-0500: [I] ET_gui state_change for service: gui to FAILED at 2021.11.30 07.25.09.961572
2021-11-30_07:25:09.963-0500: [I] ClientThread-4 received command: 'thresholds refresh collectors 4021694'
2021-11-30_07:25:09.964-0500: [I] ClientThread-4 reload collectors
2021-11-30_07:25:09.964-0500: [I] ClientThread-4 read_collectors
2021-11-30_07:25:10.059-0500: [W] ClientThread-4 QueryHandler: query response has no data results
2021-11-30_07:25:10.059-0500: [W] ClientThread-4 QueryProcessor::execute: Error sending query in execute, quitting
2021-11-30_07:25:10.060-0500: [W] ClientThread-4 QueryHandler: query response has no data results
2021-11-30_07:25:10.060-0500: [W] ClientThread-4 QueryProcessor::execute: Error sending query in execute, quitting
2021-11-30_07:25:10.061-0500: [I] ClientThread-4 _activate_rules_scheduler completed
2021-11-30_07:25:10.147-0500: [I] ET_gui Event=component_state_change identifier= arg0=GUI arg1=FAILED
2021-11-30_07:25:10.148-0500: [I] ET_gui StateChange: change_to=FAILED nodestate=DEGRADED CESState=UNKNOWN
2021-11-30_07:25:10.148-0500: [I] ET_gui Service gui state changed. isInRunningState=True, wasInRunningState=True. New state=4
2021-11-30_07:25:10.148-0500: [I] ET_gui Monitor: LocalState:FAILED Events:607 Entities:0 RT: 0.83
2021-11-30_07:25:11.975-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmpq4ac8o', '-c 4021693']
2021-11-30_07:25:11.975-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805)
- CCRProxy._run_ccr_command:256
2021-09-29_20:03:53.322-0500: [I] MainThread ---------------------------------
2021-11-30_07:25:04.553-0500: [D] ET_perfmon File collectors has no newer version than 4021693 - CCRProxy.getFile:119
2021-11-30_07:25:11.975-0500: [W] ET_perfmon Conditional put for file collectors with version 4021693 failed
2021-11-30_07:25:11.975-0500: [W] ET_perfmon New version received, start new collectors update cycle
2021-11-30_07:25:11.976-0500: [I] ET_perfmon read_collectors
2021-11-30_07:25:12.077-0500: [I] ET_perfmon write_collectors
2021-11-30_07:25:13.333-0500: [I] ClientThread-20 received command: 'thresholds refresh collectors 4021695'
2021-11-30_07:25:13.334-0500: [I] ClientThread-20 reload collectors
2021-11-30_07:25:13.335-0500: [I] ClientThread-20 read_collectors
2021-11-30_07:25:13.453-0500: [W] ClientThread-20 QueryHandler: query response has no data results
2021-11-30_07:25:13.454-0500: [W] ClientThread-20 QueryProcessor::execute: Error sending query in execute, quitting
2021-11-30_07:25:13.463-0500: [W] ClientThread-20 QueryHandler: query response has no data results
2021-11-30_07:25:13.463-0500: [W] ClientThread-20 QueryProcessor::execute: Error sending query in execute, quitting
2021-11-30_07:25:13.464-0500: [I] ClientThread-20 _activate_rules_scheduler completed
2021-11-30_07:25:15.528-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmpKTN69I', '-c 4021694']
2021-11-30_07:25:15.528-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805)
- CCRProxy._run_ccr_command:256
2021-09-29_20:03:53.322-0500: [I] MainThread ---------------------------------
2021-11-30_07:25:12.076-0500: [D] ET_perfmon File collectors has no newer version than 4021694 - CCRProxy.getFile:119
2021-11-30_07:25:15.529-0500: [W] ET_perfmon Conditional put for file collectors with version 4021694 failed
2021-11-30_07:25:15.529-0500: [W] ET_perfmon New version received, start new collectors update cycle
2021-11-30_07:25:15.529-0500: [I] ET_perfmon read_collectors
2021-11-30_07:25:15.626-0500: [I] ET_perfmon write_collectors
2021-11-30_07:25:16.594-0500: [I] ClientThread-3 received command: 'thresholds refresh collectors 4021696'
2021-11-30_07:25:16.595-0500: [I] ClientThread-3 reload collectors
2021-11-30_07:25:16.595-0500: [I] ClientThread-3 read_collectors
2021-11-30_07:25:19.780-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmp3joeUB', '-c 4021695']
2021-11-30_07:25:19.780-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805)
- CCRProxy._run_ccr_command:256
2021-09-29_20:03:53.322-0500: [I] MainThread ---------------------------------
2021-11-30_07:25:15.625-0500: [D] ET_perfmon File collectors has no newer version than 4021695 - CCRProxy.getFile:119
2021-11-30_07:25:16.781-0500: [D] ClientThread-3 File zmrules.json has no newer version than 1 - CCRProxy.getFile:119
2021-11-30_07:25:19.780-0500: [W] ET_perfmon Conditional put for file collectors with version 4021695 failed
2021-11-30_07:25:19.781-0500: [W] ET_perfmon New version received, start new collectors update cycle
2021-11-30_07:25:19.781-0500: [I] ET_perfmon read_collectors
2021-11-30_07:25:19.881-0500: [I] ET_perfmon write_collectors
2021-11-30_07:25:21.238-0500: [I] ClientThread-7 received command: 'thresholds refresh collectors 4021697'
2021-11-30_07:25:21.239-0500: [I] ClientThread-7 reload collectors
2021-11-30_07:25:21.239-0500: [I] ClientThread-7 read_collectors
2021-11-30_07:25:21.324-0500: [W] NMES monitor event arrived while still busy for perfmon
2021-11-30_07:25:21.481-0500: [I] ET_threshold Event=thresh_monitor_del_active identifier=active_thresh_monitor arg0=active_thresh_monitor
2021-11-30_07:25:21.482-0500: [I] ET_threshold Monitor: LocalState:HEALTHY Events:1 Entities:1 RT: 0.16
2021-11-30_07:25:24.211-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmp8HAusb', '-c 4021696']
2021-11-30_07:25:24.211-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805)
- CCRProxy._run_ccr_command:256
2021-09-29_20:03:53.322-0500: [I] MainThread ---------------------------------
2021-11-30_07:25:19.881-0500: [D] ET_perfmon File collectors has no newer version than 4021696 - CCRProxy.getFile:119
2021-11-30_07:25:21.411-0500: [D] ClientThread-7 File zmrules.json has no newer version than 1 - CCRProxy.getFile:119
2021-11-30_07:25:24.211-0500: [W] ET_perfmon Conditional put for file collectors with version 4021696 failed
2021-11-30_07:25:24.212-0500: [W] ET_perfmon New version received, start new collectors update cycle
2021-11-30_07:25:24.212-0500: [I] ET_perfmon read_collectors
2021-11-30_07:25:24.314-0500: [I] ET_perfmon write_collectors
2021-11-30_07:25:24.543-0500: [I] ET_gui ServiceMonitor => out=Type=notify
And then gpfsgui apparently crashes and systemd automatically restarts it.
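One way to confirm how long this loop has been repeating is to pull the failed conditional puts out of /var/adm/ras/mmsysmonitor.log. The Python sketch below matches only the "Conditional put for file ... with version ... failed" line format visible in the excerpt above; `failed_puts` and `summarize` are hypothetical helper names. A failure every few seconds with a steadily climbing version is consistent with something else updating the collectors file between each read and put, although it does not identify which node is doing it.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2021-11-30_07:25:11.975-0500: [W] ET_perfmon Conditional put for file
#   collectors with version 4021693 failed
FAILED_PUT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}_\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4}): \[W\] \S+ "
    r"Conditional put for file (\S+) with version (\d+) failed"
)


def failed_puts(lines):
    """Yield (timestamp, ccr_file, version) for each failed conditional put."""
    for line in lines:
        m = FAILED_PUT_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d_%H:%M:%S.%f%z")
            yield ts, m.group(2), int(m.group(3))


def summarize(lines):
    """Count failures and check whether the rejected version keeps climbing."""
    events = list(failed_puts(lines))
    versions = [v for _, _, v in events]
    climbing = all(b > a for a, b in zip(versions, versions[1:]))
    span = (events[-1][0] - events[0][0]).total_seconds() if events else 0.0
    return {"failures": len(events), "versions_climbing": climbing,
            "span_seconds": span}
```

Pass `summarize` an open file handle for the log (for example inside a `with open(...)` block) on the node running the GUI.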
Steve Losen
Research Computing
University of Virginia
[email protected] 434-924-0640
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
Ellei edellä ole toisin mainittu: / Unless stated otherwise above:
Oy IBM Finland Ab
PL 265, 00101 Helsinki, Finland
Business ID, Y-tunnus: 0195876-3
Registered in Finland
