[Puppet-dev] Corrupt YAML client_state due to high load - how to fix?

Peter Meier Tue, 09 Dec 2008 11:33:45 -0800

Hi

it looks like the node YAML for a given node can become corrupt. This
has happened to me a handful of times now, and each time only a few
(2-3) nodes were affected.


So what's the actual problem?

Suddenly I find log entries like:

Tue Dec 09 15:34:27 +0100 2008 Puppet (err): Could not read YAML data
for node foobar: syntax error on line 11, col 14: `  xen_domains: "3"'

in the puppetmaster.log, and the master can't compile the node, so the
node won't get newer manifests. It also looks like the node itself ends
up in a corrupted state and is unable to apply a cached manifest.

I can fix this problem by deleting the YAML file of the affected node in
$puppetmaster_dir/yaml/node/ .
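
To find the affected files without waiting for the error to show up in
the log, something like the following might help. It is only a minimal
sketch: the helper name corrupt_yaml_files is made up for illustration,
it assumes a Ruby where the YAML library is Psych (so syntax errors are
raised as Psych::SyntaxError), and it only checks syntax. It does not
load the node objects the way the master does.

```ruby
require 'yaml'

# Return the paths of all *.yaml files directly under dir that fail a
# syntax-only parse. YAML.parse builds a node tree without instantiating
# Ruby objects, so tagged documents are only checked for well-formedness.
# (corrupt_yaml_files is a hypothetical helper, not part of Puppet.)
def corrupt_yaml_files(dir)
  Dir.glob(File.join(dir, '*.yaml')).sort.select do |path|
    begin
      YAML.parse(File.read(path))
      false
    rescue Psych::SyntaxError
      true
    end
  end
end
```

Pointing it at the yaml/node directory of the master and printing the
result (puts corrupt_yaml_files(dir)) would list the candidates for
deletion.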

The master often seems to be under high load when this corruption
occurs. I haven't yet found a way to reproduce it, but from discussion
on IRC it looks like other people also hit this problem at random:
it's not always the same node that is affected, and it happens very
rarely.

So this certainly looks like a bug. However, I was unsure whether the
data I have gathered so far is sufficient to file one. Since this is
still more of a something-happens-magically situation, I'd rather
investigate a bit more and maybe even come up with a solution, or at
least an idea for one.

It looks like the YAML data got broken, possibly because the high load
caused problems during transmission or writing. Deleting the corrupt
YAML file fixes the problem, and as far as I can see it doesn't have any
impact on the next run of the node.
After examining the logs on the master and the client, it looks like the
problem first occurs on the master. It seems plausible that the master
was under very high load when it first happened.

One solution I thought of is to simply delete the YAML file on the
master. The client would then exit with an error (like the present one),
and on its next run everything would be fine.
But this might not be the right fix, as I can't yet see when the YAML
file is transferred, nor what impact it actually has on compiling the
manifest etc. We could also simply delete it and restart the client-run
procedure (if that is possible), so that the problem gets fixed within
a single client run (maybe with a maximum of 3 retries).
Another option might be to check whether the YAML data was stored
correctly and, if not, rewrite it if the YAML in memory is still
correct, or otherwise request it again from the client.
Another idea I had is that it might be a problem in Ruby's YAML lib or
the like.
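
To make the "check that it got stored correctly" idea a bit more
concrete, here is a rough sketch of a write-then-verify approach. This
is not how Puppet writes the file; write_yaml_atomically is a made-up
name, and it assumes Psych as the YAML library and a POSIX filesystem
where rename within one directory is atomic. The data is written to a
temporary file next to the target, read back and syntax-checked, and
only then renamed into place, so a reader should never see a
half-written file.

```ruby
require 'yaml'

# Hypothetical helper: write yaml_text to path via a temp file in the
# same directory. The temp file is read back and syntax-checked before
# it replaces the target; on any failure the old target is left as is.
def write_yaml_atomically(path, yaml_text)
  tmp_path = "#{path}.#{Process.pid}.tmp"
  begin
    File.open(tmp_path, 'w') do |f|
      f.write(yaml_text)
      f.fsync # flush Ruby's buffer and ask the OS to write to disk
    end
    # Raises Psych::SyntaxError if what was written does not parse.
    YAML.parse(File.read(tmp_path))
    File.rename(tmp_path, path)
  ensure
    # Only still present if something failed before the rename.
    File.delete(tmp_path) if File.exist?(tmp_path)
  end
end
```

A caller that still has the data in memory could rescue the error and
retry the write a limited number of times. The read-back will most
likely be served from the OS cache, so this mainly guards against
truncated or interleaved writes, not against every kind of disk problem.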

So do you think this is definitely a bug? Where would be the best place
to look for the actual problem, and what might be the best solution for
it?

Testing the solution would be very easy: simply corrupt the YAML file
and see whether puppet behaves as expected.
However, I'm still really unsure how to reproduce the actual cause.
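
For the test itself, a reproducibly corrupt file could be produced by
cutting a valid one off in the middle, as an interrupted write might do.
The sketch below is only an illustration: both helper names are made up,
it assumes Psych as the YAML library, and it does not exercise Puppet at
all. It only shows how to get a file that a YAML parser rejects.

```ruby
require 'yaml'

# Hypothetical helper: true if the file passes a syntax-only YAML parse.
def yaml_syntax_ok?(path)
  YAML.parse(File.read(path))
  true
rescue Psych::SyntaxError
  false
end

# Hypothetical helper: simulate an interrupted write by cutting the
# file off after keep_bytes bytes.
def truncate_yaml(path, keep_bytes)
  File.truncate(path, keep_bytes)
end
```

Note that not every cut point gives invalid YAML. Truncating at a line
boundary can leave a file that is shorter but still parses, so for a
reliable test the cut should land inside a quoted value or similar.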

Thanks for any additional ideas or information. Once I have a more
concrete idea of what the actual source of the problem is and what the
best way to fix it might be, I'll be more confident about filing a bug.

cheers pete
