Hi folks,
Our gpfsgui service keeps crashing and restarting. About every three minutes we
get a set of files like these in /var/crash/scalemgmt:

-rw------- 1 scalemgmt scalemgmt 1067843584 Nov 30 06:54 core.20211130.065414.59174.0001.dmp
-rw-r--r-- 1 scalemgmt scalemgmt    2636747 Nov 30 06:54 javacore.20211130.065414.59174.0002.txt
-rw-r--r-- 1 scalemgmt scalemgmt    1903304 Nov 30 06:54 Snap.20211130.065414.59174.0003.trc
-rw-r--r-- 1 scalemgmt scalemgmt        202 Nov 30 06:54 jitdump.20211130.065414.59174.0004.dmp

The core.*.dmp files are core dumps from the java command.
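
To check the "about every three minutes" estimate, the timestamps embedded in the dump file names can be compared. The following is only a minimal sketch: it assumes every file follows the <kind>.<YYYYMMDD>.<HHMMSS>.<pid>.<seq>.<ext> layout seen in the listing above, and the function names dump_times and crash_intervals are made up for illustration.

```python
from datetime import datetime


def dump_times(filenames):
    """Return the sorted, de-duplicated timestamps encoded in dump file names."""
    times = set()
    for name in filenames:
        parts = name.split(".")
        # Assumed layout: <kind>.<YYYYMMDD>.<HHMMSS>.<pid>.<seq>.<ext>
        if len(parts) < 6:
            continue
        try:
            times.add(datetime.strptime(parts[1] + parts[2], "%Y%m%d%H%M%S"))
        except ValueError:
            # Skip names whose date/time fields do not parse.
            continue
    return sorted(times)


def crash_intervals(filenames):
    """Return the gaps, in seconds, between consecutive crash timestamps."""
    times = dump_times(filenames)
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]
```

Passing the output of os.listdir("/var/crash/scalemgmt") to crash_intervals would give one gap per pair of consecutive crashes, since the four files written by a single crash share one timestamp.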

The errors below keep repeating in /var/adm/ras/mmsysmonitor.log.

Any suggestions? Thanks for any help.


2021-11-30_07:25:09.944-0500: [W] ET_gui          Event=gui_down identifier= arg0=started arg1=stopped
2021-11-30_07:25:09.961-0500: [I] ET_gui          state_change for service: gui to FAILED at 2021.11.30 07.25.09.961572
2021-11-30_07:25:09.963-0500: [I] ClientThread-4  received command: 'thresholds  refresh  collectors  4021694'
2021-11-30_07:25:09.964-0500: [I] ClientThread-4  reload collectors
2021-11-30_07:25:09.964-0500: [I] ClientThread-4  read_collectors
2021-11-30_07:25:10.059-0500: [W] ClientThread-4  QueryHandler: query response has no data results
2021-11-30_07:25:10.059-0500: [W] ClientThread-4  QueryProcessor::execute: Error sending query in execute, quitting
2021-11-30_07:25:10.060-0500: [W] ClientThread-4  QueryHandler: query response has no data results
2021-11-30_07:25:10.060-0500: [W] ClientThread-4  QueryProcessor::execute: Error sending query in execute, quitting
2021-11-30_07:25:10.061-0500: [I] ClientThread-4  _activate_rules_scheduler completed
2021-11-30_07:25:10.147-0500: [I] ET_gui          Event=component_state_change identifier= arg0=GUI arg1=FAILED
2021-11-30_07:25:10.148-0500: [I] ET_gui          StateChange: change_to=FAILED nodestate=DEGRADED CESState=UNKNOWN
2021-11-30_07:25:10.148-0500: [I] ET_gui          Service gui state changed. isInRunningState=True, wasInRunningState=True. New state=4
2021-11-30_07:25:10.148-0500: [I] ET_gui          Monitor: LocalState:FAILED Events:607 Entities:0 RT:  0.83
2021-11-30_07:25:11.975-0500: [W] ET_perfmon      got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmpq4ac8o', '-c 4021693']
2021-11-30_07:25:11.975-0500: [E] ET_perfmon      fput failed: Version mismatch on conditional put (err 805)
 - CCRProxy._run_ccr_command:256
2021-09-29_20:03:53.322-0500: [I] MainThread      ---------------------------------
2021-11-30_07:25:04.553-0500: [D] ET_perfmon      File collectors has no newer version than 4021693  - CCRProxy.getFile:119
2021-11-30_07:25:11.975-0500: [W] ET_perfmon      Conditional put for file collectors with version 4021693 failed
2021-11-30_07:25:11.975-0500: [W] ET_perfmon      New version received, start new collectors update cycle
2021-11-30_07:25:11.976-0500: [I] ET_perfmon      read_collectors
2021-11-30_07:25:12.077-0500: [I] ET_perfmon      write_collectors
2021-11-30_07:25:13.333-0500: [I] ClientThread-20 received command: 'thresholds  refresh  collectors  4021695'
2021-11-30_07:25:13.334-0500: [I] ClientThread-20 reload collectors
2021-11-30_07:25:13.335-0500: [I] ClientThread-20 read_collectors
2021-11-30_07:25:13.453-0500: [W] ClientThread-20 QueryHandler: query response has no data results
2021-11-30_07:25:13.454-0500: [W] ClientThread-20 QueryProcessor::execute: Error sending query in execute, quitting
2021-11-30_07:25:13.463-0500: [W] ClientThread-20 QueryHandler: query response has no data results
2021-11-30_07:25:13.463-0500: [W] ClientThread-20 QueryProcessor::execute: Error sending query in execute, quitting
2021-11-30_07:25:13.464-0500: [I] ClientThread-20 _activate_rules_scheduler completed
2021-11-30_07:25:15.528-0500: [W] ET_perfmon      got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmpKTN69I', '-c 4021694']
2021-11-30_07:25:15.528-0500: [E] ET_perfmon      fput failed: Version mismatch on conditional put (err 805)
 - CCRProxy._run_ccr_command:256
2021-09-29_20:03:53.322-0500: [I] MainThread      ---------------------------------
2021-11-30_07:25:12.076-0500: [D] ET_perfmon      File collectors has no newer version than 4021694  - CCRProxy.getFile:119
2021-11-30_07:25:15.529-0500: [W] ET_perfmon      Conditional put for file collectors with version 4021694 failed
2021-11-30_07:25:15.529-0500: [W] ET_perfmon      New version received, start new collectors update cycle
2021-11-30_07:25:15.529-0500: [I] ET_perfmon      read_collectors
2021-11-30_07:25:15.626-0500: [I] ET_perfmon      write_collectors
2021-11-30_07:25:16.594-0500: [I] ClientThread-3  received command: 'thresholds  refresh  collectors  4021696'
2021-11-30_07:25:16.595-0500: [I] ClientThread-3  reload collectors
2021-11-30_07:25:16.595-0500: [I] ClientThread-3  read_collectors
2021-11-30_07:25:19.780-0500: [W] ET_perfmon      got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmp3joeUB', '-c 4021695']
2021-11-30_07:25:19.780-0500: [E] ET_perfmon      fput failed: Version mismatch on conditional put (err 805)
 - CCRProxy._run_ccr_command:256
2021-09-29_20:03:53.322-0500: [I] MainThread      ---------------------------------
2021-11-30_07:25:15.625-0500: [D] ET_perfmon      File collectors has no newer version than 4021695  - CCRProxy.getFile:119
2021-11-30_07:25:16.781-0500: [D] ClientThread-3  File zmrules.json has no newer version than 1      - CCRProxy.getFile:119
2021-11-30_07:25:19.780-0500: [W] ET_perfmon      Conditional put for file collectors with version 4021695 failed
2021-11-30_07:25:19.781-0500: [W] ET_perfmon      New version received, start new collectors update cycle
2021-11-30_07:25:19.781-0500: [I] ET_perfmon      read_collectors
2021-11-30_07:25:19.881-0500: [I] ET_perfmon      write_collectors
2021-11-30_07:25:21.238-0500: [I] ClientThread-7  received command: 'thresholds  refresh  collectors  4021697'
2021-11-30_07:25:21.239-0500: [I] ClientThread-7  reload collectors
2021-11-30_07:25:21.239-0500: [I] ClientThread-7  read_collectors
2021-11-30_07:25:21.324-0500: [W] NMES            monitor event arrived while still busy for perfmon
2021-11-30_07:25:21.481-0500: [I] ET_threshold    Event=thresh_monitor_del_active identifier=active_thresh_monitor arg0=active_thresh_monitor
2021-11-30_07:25:21.482-0500: [I] ET_threshold    Monitor: LocalState:HEALTHY Events:1 Entities:1 RT:  0.16
2021-11-30_07:25:24.211-0500: [W] ET_perfmon      got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmp8HAusb', '-c 4021696']
2021-11-30_07:25:24.211-0500: [E] ET_perfmon      fput failed: Version mismatch on conditional put (err 805)
 - CCRProxy._run_ccr_command:256
2021-09-29_20:03:53.322-0500: [I] MainThread      ---------------------------------
2021-11-30_07:25:19.881-0500: [D] ET_perfmon      File collectors has no newer version than 4021696  - CCRProxy.getFile:119
2021-11-30_07:25:21.411-0500: [D] ClientThread-7  File zmrules.json has no newer version than 1      - CCRProxy.getFile:119
2021-11-30_07:25:24.211-0500: [W] ET_perfmon      Conditional put for file collectors with version 4021696 failed
2021-11-30_07:25:24.212-0500: [W] ET_perfmon      New version received, start new collectors update cycle
2021-11-30_07:25:24.212-0500: [I] ET_perfmon      read_collectors
2021-11-30_07:25:24.314-0500: [I] ET_perfmon      write_collectors
2021-11-30_07:25:24.543-0500: [I] ET_gui          ServiceMonitor => out=Type=notify

And then gpfsgui apparently crashes and systemd automatically restarts it.
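
In the excerpt, each failed conditional put seems to carry a "collectors" version one lower than the version named in the preceding "thresholds refresh collectors" command. That is only a reading of the log, not a confirmed diagnosis. A rough sketch for pulling both sets of version numbers out of a longer log follows; the function names collectors_versions and stale_puts are hypothetical, and the patterns assume the message wording shown above.

```python
import re


def collectors_versions(log_text):
    """Return (refreshed, attempted) version lists found in the log text."""
    # Versions named in "thresholds refresh collectors <N>" commands.
    # \s+ also spans line breaks, so hard-wrapped log text still matches.
    refreshed = [int(v) for v in
                 re.findall(r"refresh\s+collectors\s+(\d+)", log_text)]
    # Versions passed as '-c <N>' to the mmccr fput conditional put.
    attempted = [int(v) for v in re.findall(r"'-c\s+(\d+)'", log_text)]
    return refreshed, attempted


def stale_puts(log_text):
    """Return attempted put versions older than the newest refreshed version."""
    refreshed, attempted = collectors_versions(log_text)
    if not refreshed:
        return []
    newest = max(refreshed)
    return [v for v in attempted if v < newest]
```

Reading /var/adm/ras/mmsysmonitor.log into a string and calling stale_puts on it lists the put attempts that were already out of date relative to the newest refresh seen in the same text.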


Steve Losen
Research Computing
University of Virginia
s...@virginia.edu   434-924-0640

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
