Re: [Puppet-dev] Handling failures in prefetch

Nan Liu Thu, 23 Jul 2015 10:43:41 -0700

On Thu, Jul 23, 2015 at 6:09 AM, Trevor Vaughan <[email protected]>
wrote:


> Just out of curiosity, how difficult would it be to defer the resource
> chain until the end of the catalog if a prefetch fails?
>

Rather then retrying I would be happy if prefetch can specify a specific
resource dependency. It will resolve some of the issues I've seen where I
have to split puppet into multiple catalogs or multiple runs.

It may not work in all cases, but it's certainly feasable that the prefetch
> is failing because of some other item in the catalog that hasn't been
> applied and, given the complexity of the system, the amount of effort that
> it would involve to proper chain all resources just isn't worth the cost of
> doing two runs to fix the issue. I'm particularly thinking of things like
> package repositories here where you really don't want to artificially bind
> all packages to your repositories for no good reason and/or you may be
> pulling data from a different source that providers are going to rely on
> (RH Satellite, Pulp, something else?).
>
> Since it's a prefetch, and nothing has yet changed on the system, could
> that execution chain just be deferred until just before the post-execution
> section of the catalog? If this were possible, the impact of a catalog
> failure would, in theory, be minimized with everything that could be
> applied happening before those items that could not.
>
> Even then, I think that all resource failures should be localized to the
> resource, and resource chain, to which they apply. The *best* thing about
> Puppet is the fact that it does as much as it can on every run. That really
> shouldn't be broken in my opinion. My classic example is "Puppet applies my
> security settings across the system, regardless of Apache failing to
> install or configure". In this case, Puppet should apply my security
> settings regardless of some random resource having a prefetch failure.
>
> Circling this all the way around to my favorite pet bug that I never have
> time to work on PUP-4002, in the case where prefetch fails, but the
> provider (or a dependent of the provider) has a way to handle this later,
> recording it for all instance of the resource to query later would be a
> nice to have. I'm thinking of the 'cron' type when I write this since the
> prefetch phase of cron appears to fail if the crontab file for a user does
> not exist. But, it doesn't matter in the least because the provider merrily
> carries on and does what it is supposed to do after raising an alarming,
> but harmless, error message. (Note: I haven't checked the 4.X behavior on
> this yet).
>
> Thanks,
>
> Trevor
>
> On Thu, Jul 23, 2015 at 8:37 AM, Gareth Rushgrove <
> [email protected]> wrote:
>
>> On 23 July 2015 at 03:44, Kylo Ginsberg <[email protected]> wrote:
>> > On Wed, Jul 22, 2015 at 1:37 PM, Branan Riley <[email protected]>
>> wrote:
>> >>
>> >> On Wed, Jul 22, 2015 at 12:44 PM Kylo Ginsberg <[email protected]>
>> >> wrote:
>> >>>
>> >>> On Tue, Jul 21, 2015 at 12:40 PM, Branan Riley <[email protected]
>> >
>> >>> wrote:
>> >>>>
>> >>>> <This is email 2/2 inspired by PUP-4813>
>> >>>>
>> >>>> With puppet 4, we made a change to how prefetch failures are handled
>> >>>> by the transaction[1]. This is actually a pretty good change, as it
>> >>>> prevents puppet from some nasty misbehaviors.
>> >>>>
>> >>>> I still don't think it's the "correct" behavior, though. Aborting the
>> >>>> entire catalog because one resource type is broken doesn't seem like
>> >>>> the best behavior, especially when it comes to reporting[2]
>> >>>>
>> >>>> I feel like the correct behavior is something like this:
>> >>>>
>> >>>> * If an exception escapes a prefetch, mark all resources of that type
>> >>>>   in the catalog as failed and do not try to apply them
>> >>>
>> >>>
>> >>> Background for a question:
>> >>>
>> >>> Wrt exceptions in prefetch, as of 4.0 there is a distinction between
>> the
>> >>> expected/supported exceptions (there are two: Puppet::MissingCommand
>> and
>> >>> LoadError) and everything else.
>> >>>
>> >>> I'm not sure this is *super* well documented, but it is mentioned as a
>> >>> breaking change in the 4.0 release notes:
>> >>>
>> >>>
>> >>>
>> https://docs.puppetlabs.com/puppet/4.0/reference/release_notes.html#stuff-a-lot-of-people-will-notice
>> >>>
>> >>> and the rescue in question is here:
>> >>>
>> >>>
>> >>>
>> https://github.com/puppetlabs/puppet/blob/810624fe6f88d520339173c0e811c5ede4009fac/lib/puppet/transaction.rb#L322
>> >>>
>> >>> So with that in mind, is the proposal here about the handling of the
>> two
>> >>> supported exceptions or the handling of other exceptions?
>> >>>
>> >>>
>> >>> For the two supported exceptions, this seems like a reasonable
>> behavior
>> >>> change and could also speed up catalog application in this failure
>> case.
>> >>>
>> >>>
>> >>> For *other* exceptions, I'm not sure. I walked away from PUP-3656
>> >>> thinking that those two exceptions were now part of the contract we're
>> >>> specifying for providers. If so, a provider that throws any *other*
>> >>> exception has a bug which should be fixed, and we could be making
>> matters
>> >>> worse by trying to soldier on. I.e. this seems like a situation where
>> it's
>> >>> better to fail hard and the user should file a bug, upgrade/downgrade
>> the
>> >>> module, etc.
>> >>
>> >>
>> >> I would say this should be for ALL exceptions. We absolutely should
>> stop
>> >> puppet from soldiering on in ways that could cause problems, but I
>> don't
>> >> think trying to apply other resource types in the catalog is one of
>> those
>> >> ways. Prefetch failure should be a localized catastrophe, not a global
>> one.
>> >>
>> >> The move away from rescuing everything was definitely good, but
>> aborting
>> >> the entire transaction when only a single type is unprefetchable is a
>> bit of
>> >> an over-reaction. We know that instances of that type (really just
>> ones of
>> >> that type/provider pair) are bad, and we absolutely shouldn't apply
>> them.
>> >> The rest of the catalog is still fine, though.
>> >>
>> >> The two exceptions that are supported allow a Puppet run to install any
>> >> packages/gems that a provider might need in order to prefetch.
>> Skipping only
>> >> the resources that would have been prefetched instead of the entire
>> catalog
>> >> serves that same purpose.
>> >
>> >
>> > I can see this argument. I'm still concerned that this would mean we
>> aren't
>> > surfacing bugs that really should be exposed to the light of day. I
>> suppose
>> > we could report the one case with a Warning and the other with an Error.
>> >
>> > But assuming we go ahead and do something like this (I see I'm the only
>> one
>> > with the slightest reservation :)), I'm thinking we should audit how
>> puppet
>> > handles exceptions in providers during catalog application but *outside*
>> > prefetch. It would be nice to be consistent throughout catalog
>> application.
>> >
>> >>
>> >> Aborting the entire catalog also makes remediation with Puppet (when
>> >> puppet could be used for such) impossible, which seems super bad to me.
>> >
>> >
>> > Is it impossible? Couldn't you remediate with a catalog that doesn't
>> include
>> > resources that will tickle prefetch from that provider? Not trying to
>> > quibble (i.e. yes, that could still be a PITA), but making sure I'm not
>> > missing something.
>> >
>> >>
>> >>
>> >>>
>> >>> Btw (and perhaps I should have asked this from the get-go before
>> opining
>> >>> above): what was the exception in PUP-4813 that precipitated this
>> thread?
>> >>
>> >>
>> >> The bad catalog from that ticket set the `target` field of mount to a
>> >> directory, which causes some sort of IO exception to be raised when
>> >> ParsedFile tries to access that directory as a regular file.
>> >
>> >
>> > Ooh fun. So we should fix that to be more robust. Separate ticket than
>> this
>> > thread though.
>> >
>> >>
>> >>
>> >> This is what inspired my bullet point below - ParsedFile should be
>> smart
>> >> enough to know which resources have a bad target and which are OK, and
>> >> appropriately fail only the ones it can't work with.
>> >>
>> >>>>
>> >>>>
>> >>>> * Make it possible to fail resources from *within* a prefetch. This
>> >>>>   lets a sufficiently smart prefetch handle the case when only some
>> of
>> >>>>   its resources are hosed (parsedfile resources with multiple
>> targets,
>> >>>>   for example, might be only partially unfetchable)
>> >>>
>> >>>
>> >>> How would this work? A new hook for providers to implement? If so,
>> it'd
>> >>> be nice to flesh out a couple examples use cases so that we can think
>> >>> through the API with something concrete.
>> >>
>> >>
>> >> It's probably already "doable" by having prefetch mark the returned
>> >> provider instances as bad in some way, and the provider implementation
>> for
>> >> flush could raise an exception. I'd prefer to see something at the
>> >> transaction layer, though. I'll think through what this might look like
>> >> more.
>> >
>> >
>> > Well, definitely some charm to not adding new methods that we have to
>> > sprinkle respond_to? tests around for. So the former approach might just
>> > fly.
>> >
>> > I'd also like to think through whether there are additional motivating
>> use
>> > cases.
>> >
>>
>> For some relevant colour we hit this problem hard with the AWS module.
>> We're prefetching a lot of complex data in some cases and we're going
>> it over the wire - so it going wrong is pretty likely.
>>
>> We worked around that at the time by creating our own Exception
>> inherited from Exception which does get caught by Puppet:
>>
>>
>> https://github.com/puppetlabs/puppetlabs-aws/blob/master/lib/puppet_x/puppetlabs/aws.rb#L7-L23
>>
>> And then raising that in cases of StandardError based exceptions being
>> raised in prefetch.
>>
>>
>> https://github.com/puppetlabs/puppetlabs-aws/blob/master/lib/puppet/provider/ec2_instance/v2.rb#L34-L36
>>
>> Some more details in this issue too about the changes in 4.0.
>>
>> https://jira.puppetlabs.com/browse/PUP-3656
>>
>> Gareth
>>
>> > Kylo
>> >
>> > --
>> > Kylo Ginsberg | [email protected] | irc: kylo | twitter: @kylog
>> >
>> > PuppetConf 2015 is coming to Portland, Oregon! Join us October 5-9.
>> > Register now to take advantage of the Early Bird discount —save $249!
>> >
>> > --
>> > You received this message because you are subscribed to the Google
>> Groups
>> > "Puppet Developers" group.
>> > To unsubscribe from this group and stop receiving emails from it, send
>> an
>> > email to [email protected].
>> > To view this discussion on the web visit
>> >
>> https://groups.google.com/d/msgid/puppet-dev/CALsUZFHyPv0mGjeE5Xhn-sv0WYXJ2QbDzY802sBe2JJQR%3DLT5g%40mail.gmail.com
>> .
>> >
>> > For more options, visit https://groups.google.com/d/optout.
>>
>>
>>
>> --
>> Gareth Rushgrove
>> @garethr
>>
>> devopsweekly.com
>> morethanseven.net
>> garethrushgrove.com
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Puppet Developers" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/puppet-dev/CAFi_6y%2BxV9Gk5f_C7S4%2BQ6Oxyk5JmdNsRY%3D2XchPTRqa29ywBQ%40mail.gmail.com
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> --
> Trevor Vaughan
> Vice President, Onyx Point, Inc
> (410) 541-6699
>
> -- This account not approved for unencrypted proprietary information --
>
> --
> You received this message because you are subscribed to the Google Groups
> "Puppet Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/puppet-dev/CANs%2BFoVVkP7fo1hesBUn4TSijCqzrHZggfTOvgu5j6SV44DftQ%40mail.gmail.com
> <https://groups.google.com/d/msgid/puppet-dev/CANs%2BFoVVkP7fo1hesBUn4TSijCqzrHZggfTOvgu5j6SV44DftQ%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/puppet-dev/CACqVBqCdPApZKPX1EF1N4ouxxV1ocfVXXFqQwjwQ6EnNGoa3SQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: [Puppet-dev] Handling failures in prefetch

Reply via email to