No, this is not a known issue.  Please send more details.  Is there
anything relevant in the munge log files after a crash?

-Chris


On Sun, 2008-05-18 at 08:10am EDT, Jeff Squyres wrote:
> 
> Greetings.
> 
> I have been using munge in a SLURM environment on my MPI development  
> cluster at Cisco for quite a while.  I have noticed over the past year  
> or so that a munge 0.5.8 daemon on a compute node sometimes just  
> randomly dies, leaving the slurmd on that node unable to communicate  
> with the slurmctld on the cluster head node.  SLURM therefore thinks  
> that the node is down.
> 
> Over the past month, this has been happening to about a dozen nodes a  
> week.  To this point, I haven't been paying closer attention than that.
> 
> Is this a known issue?  I can send more details if it is not.
> 
> FWIW, here's a summary of my setup:
> 
> - RHEL4U4 on all machines
> - SLURM 1.3.1 (SLURM has been steadily upgraded over time, staying  
> more-or-less current)
> - Using Perceus to image/provision the back-end nodes
> - Dell 1950 Intel Xeon servers (a few different specific flavors)
> 
> -- 
> Jeff Squyres
> Cisco Systems

_______________________________________________
munge-users mailing list
https://mail.gna.org/listinfo/munge-users

Reply via email to