Is reload-agents known to be unreliable if mcollective has lost its STOMP connection?

Let me explain:

When I run "/etc/init.d/mcollective reload-agents", it sends a USR1 signal to mcollectived to cause it to reload its agents.
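The reload-agents action described above amounts to delivering SIGUSR1 to the running daemon. A minimal Python sketch of the same step (for example, to call from a wrapper script) is below. The PID file location is an assumption that varies between distributions and packages, and `read_pid`/`reload_agents` are hypothetical helper names, not part of MCollective.

```python
import os
import signal

# Assumed location of mcollectived's PID file; check your init script or
# package for the real path on your system.
PID_FILE = "/var/run/mcollectived.pid"


def read_pid(pid_file: str = PID_FILE) -> int:
    """Return the PID stored in pid_file (its first whitespace-delimited token)."""
    with open(pid_file) as f:
        return int(f.read().split()[0])


def reload_agents(pid: int) -> None:
    """Send SIGUSR1 to the given process, which mcollectived treats as 'reload agents'."""
    os.kill(pid, signal.SIGUSR1)
```

On a host running mcollectived, `reload_agents(read_pid())` should have the same effect as the init script's reload-agents action, provided the PID file path matches.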

Usually, this works fine. But if I do this when mcollectived has lost its STOMP connection (because I restarted the RabbitMQ server at around the same time), the results are unreliable. It may work okay, or it may leave mcollectived with some agents/plugins missing. For example, here is a fragment from /var/log/mcollective.log in a failure case:

I, [2015-11-18T15:15:23.735468 #11689]  INFO -- : rabbitmq.rb:15:in `on_connected' Connected to stomp://mcollective@ms1:61613
E, [2015-11-18T15:19:50.982806 #11689] ERROR -- : rabbitmq.rb:30:in `on_miscerr' Unexpected error on connection stomp://mcollective@ms1:61613: es_recv: connection.receive returning EOF as nil - resetting connection.
I, [2015-11-18T15:19:50.985885 #11689]  INFO -- : rabbitmq.rb:10:in `on_connecting' TCP Connection attempt 0 to stomp://mcollective@ms1:61613
I, [2015-11-18T15:19:50.993417 #11689]  INFO -- : rabbitmq.rb:25:in `on_connectfail' TCP Connection to stomp://mcollective@ms1:61613 failed on attempt 0
I, [2015-11-18T15:19:56.398467 #11689]  INFO -- : runner.rb:24:in `initialize' Reloading all agents after receiving USR1 signal
E, [2015-11-18T15:19:56.400925 #11689] ERROR -- : rabbitmq.rb:30:in `on_miscerr' Unexpected error on connection stomp://mcollective@ms1:61613: es_oldrecv: receive failed: Stomp::Error::NoCurrentConnection
I, [2015-11-18T15:19:56.401329 #11689]  INFO -- : rabbitmq.rb:10:in `on_connecting' TCP Connection attempt 0 to stomp://mcollective@ms1:61613
I, [2015-11-18T15:19:56.444731 #11689]  INFO -- : rabbitmq.rb:15:in `on_connected' Connected to stomp://mcollective@ms1:61613
E, [2015-11-18T15:19:57.778045 #11689] ERROR -- : agents.rb:138:in `dispatch' Execution of rpcutil failed: No plugin rpcutil_agent defined
E, [2015-11-18T15:19:57.778889 #11689] ERROR -- : agents.rb:139:in `dispatch' /usr/lib/ruby/site_ruby/1.8/mcollective/pluginmanager.rb:73:in `[]'

In that case, I restarted the RabbitMQ server, ran "/etc/init.d/mcollective reload-agents", and then ran an mco command that tried to use the rpcutil agent.

You'll notice that after it had supposedly reloaded all agents, mcollective no longer seemed to have the "rpcutil_agent" plugin. This situation persisted until I ran reload-agents again.
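To check whether a given reload coincided with a dropped connection, the log can be scanned for a "Reloading all agents" line that falls between a connection error and the next `on_connected`. The sketch below assumes one log entry per line, as in the log file itself, and treats the `on_miscerr`, `on_connectfail` and `on_connected` messages seen in this excerpt as the only connection-state signals. It is a heuristic built from this log, not something MCollective provides, and `reloads_while_disconnected` is a hypothetical name.

```python
import re
from typing import Iterable, List

# Matches MCollective log entries like the ones in the excerpt, e.g.
# "I, [2015-11-18T15:19:56.398467 #11689]  INFO -- : runner.rb:24:in `initialize' Reloading ..."
LINE_RE = re.compile(r"^[A-Z], \[(?P<ts>\S+) #\d+\]\s+\w+ -- : (?P<msg>.*)$")


def reloads_while_disconnected(lines: Iterable[str]) -> List[str]:
    """Return timestamps of USR1 agent reloads logged while STOMP looked disconnected.

    The connection is considered down after an on_miscerr or on_connectfail
    entry and up again after an on_connected entry.
    """
    connected = True  # assume the log starts with a working connection
    suspicious = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # skip continuation lines, backtraces, etc.
        msg = m.group("msg")
        if "`on_connected'" in msg:
            connected = True
        elif "`on_miscerr'" in msg or "`on_connectfail'" in msg:
            connected = False
        elif "Reloading all agents" in msg and not connected:
            suspicious.append(m.group("ts"))
    return suspicious
```

Passing an open handle on /var/log/mcollective.log to `reloads_while_disconnected` lists any reloads that match the failure pattern above; for this excerpt it would flag the reload at 15:19:56.398467, after which running reload-agents again cleared the problem.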

Has anyone seen anything like this?  Is this a known bug?

I can't find an existing bug for this. There was an old one, way back, where the process actually died in similar circumstances: https://projects.puppetlabs.com/issues/8753

There is another, unrelated ticket where the first comment mentions that the USR1 handling "doesn't work too well anyway because ruby": https://tickets.puppetlabs.com/browse/MCO-328

Any suggestions? Is there some way I can work around this?

Thanks in advance for any ideas or information on this.

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Users" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/puppet-users/3da97f6f-15fb-4248-b2ff-c9d0fb670937%40googlegroups.com.