On 04.04.2016 at 03:21, Henrik Lindberg wrote:
>>> We are happy if we initially only get 5-10% out of this...
>>
>> And this is where I currently disagree. Very often I invest lots of time
>> for just 1%. But being able to run without a fragile caching layer could
>> be worth even 50%, as long as I'm able to scale. When someone has to stop
>> a deployment chain because he needs to troubleshoot a caching layer,
>> lots of people are sitting around and cannot work. Ask them whether
>> they would have preferred to buy more hardware.
>>
> 
> We have to start somewhere and when doing so we want to apply the KISS
> principle. The intent is for puppet server to automatically keep the XPP
> files in sync. There may be no need for "caching" - it is simply done as
> a step in atomic deploy of modified puppet code.

Details aside, it seems that what we disagree on is "caching"; more on
this (and a related proposal) below. Btw, so far I have experienced "real"
atomic deploys only in non-standard environments. Anything that would
automagically let files pop up in module directories has a very good
chance of causing trouble for a lot of environments with something I'd
like to call "custom-tuned" deployments.

>> So, where will this transition phase lead to? That's IMO the question that
>> many would love to see answered. Will ruby still be there? So where is
>> the transition? If it won't, what would its successor look like? I guess
>> you know what I mean, please enlighten us!
>>
> 
> It is too premature to describe this in detail. Happy to share the ideas
> which we plan to pursue though.
> 
> At this point we have decided to try an approach where the c++ compiler
> will use an RPC mechanism to talk to co-processors. When doing so it
> will use the same serialization technology that is used in XPP
> (basically based on the Puppet Type System). (Rationale: linking native
> things into the same memory image is complex and creates vulnerabilities).

Honestly, I expected a little bit more on this. An "RPC mechanism to
talk to co-processors" is pretty far from what I'd call an idea of how
it should work in the future. Sorry for insisting, but this is IMO one of
the most essential questions the "we are moving to C++" strategy should
be able to answer.

To me, faster compilation at the cost of slower data lookups might not
turn out to be a very good deal, just to give one example of my
concerns. The same goes for "forking" a replacement for custom functions.
Running co-processors sounds good at first, but after a single catalog
build they would be as dirty as they are now. And there will still be
different environments and versions for the very same "function".
Everything but a new fork at every run would have its own drawbacks and
issues, wouldn't it?
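
To illustrate what I mean by "dirty" with a toy sketch (plain Ruby,
nothing Puppet-specific, all names made up): anything a long-lived
co-process memoizes survives into the next catalog build, which a fresh
fork per run would avoid.

    # Toy sketch: state in a long-lived worker outlives a single catalog build
    module BackendList
      @cache = {}

      def self.lookup(cluster)
        # stand-in for an external call (SQL, LDAP, ...); once memoized, the
        # value sticks around until the worker is restarted or forked anew
        @cache[cluster] ||= Time.now.to_f
      end
    end

    puts BackendList.lookup('web')  # first catalog build: fresh value
    puts BackendList.lookup('web')  # later build, same worker: same, possibly stale value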

As long as "custom functions" or their replacement are able to
generate resources (which many of them do), they will have a strong
influence on the generated catalog. They are also the main reason why
caching catalogs rarely made any sense. Many of the external factors
have an unpredictable influence on them, with Facter and Hiera of course
being the most prominent ones. But back to what this section was all
about: the "ruby successor".

It doesn't have to be immediately, but please try to figure out whether
you can tell us a little bit more about this. To an outsider it
currently feels like this is still very, very unclear. And going forward
while hoping that this issue will silently vanish over time won't work,
I guess.

If no decision has been taken yet, why not share some details about the
possible variants that are still in the game? I guess quite a few people
would love to help out with their ideas, shaped by their very own,
completely different circumstances. Extensibility and ease of
customization have, to me, been key factors of Puppet's success
story. No DSL could ever replace this.

> That pretty much leaves functions written in Ruby, and hiera backends.
> As a hiera backend/data provider can be thought of as a function as well,
> we believe that the RPC based approach will work fine. This is also to be
> continued after XPP (as we then have the serialization/deserialization
> parts in place in both the c++ and ruby implementations).

RPC like in XML-RPC? Like in forking a plugin? Like in forking a plugin
through a preforking daemon?
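
Just to make the question concrete, here is a purely hypothetical sketch
of the last variant: a forked worker answering function calls over a
socket pair, one JSON line per call. Nothing here is taken from an actual
Puppet design; the wire format and all names are invented for illustration.

    require 'socket'
    require 'json'

    # hypothetical preforked-worker style RPC: one JSON line per function call
    parent_end, worker_end = UNIXSocket.pair

    worker = fork do
      parent_end.close
      while (line = worker_end.gets)
        call = JSON.parse(line)
        # a real worker would dispatch to the loaded Ruby function here
        worker_end.puts JSON.dump('result' => "handled #{call['function']}")
      end
    end

    worker_end.close
    parent_end.puts JSON.dump('function' => 'mymodule::backends', 'args' => ['web'])
    puts JSON.parse(parent_end.gets)['result']   # => "handled mymodule::backends"
    parent_end.close
    Process.wait(worker)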

> In the long run, in general, we want it to be possible to express as
> much as possible using the Puppet Language itself, and where that is not
> practical, that it is easy to integrate an implementation (written in
> c++, ruby, or whatever the logic is best written in for the target).

People tend to use custom functions for the most awful hacks you have
ever seen. But it works for them; in the end it solves their very own
problems. That's what they need Puppet for: getting work done. Sometimes
dirty work. There will hardly ever be SQL adapters, memcached, message
queues, LDAP and so on in the Puppet language.
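
For the sake of argument, a hypothetical example of such a function using
the Puppet 4.x Ruby function API (the module name, function name and the
directory lookup are made up); this is exactly the kind of glue that will
keep living outside the Puppet language:

    # would live in mymodule/lib/puppet/functions/mymodule/ldap_members.rb
    Puppet::Functions.create_function(:'mymodule::ldap_members') do
      dispatch :ldap_members do
        param 'String', :group
      end

      def ldap_members(group)
        # external state: the catalog changes whenever the directory changes,
        # which is exactly why cached catalogs rarely made sense;
        # real code would use net/ldap or similar here
        ['alice', 'bob']
      end
    end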

> Some clues above to what we are thinking. Cannot promise when we
> have something more concrete to talk about - would love to be able to do
> so around next PuppetConf.

I'll be there :)

>> I would mostly agree, but experience teaches me not to trust such
>> statements. And your problem is: an AST is not data. It cannot be
>> represented in a defined structure. And we are in a phase where even
>> data types are still subject to change, with lots of new related
>> features in 4.4. All this would affect an AST, wouldn't it?
>>
> The AST is indeed a data structure, not even a very complicated one.
> The rate of change has dramatically gone down. We rarely touch the
> grammar and the AST itself, and the last couple of changes have been
> additions. This is the benefit of the "expression based approach" taken
> in the "future parser" - the semantics are not implemented in the
> grammar, and they are not implemented as methods/behavior inside the AST
> objects.

We are back to XPP. Sorry, my wording wasn't precise enough, I guess. The
non-data "thing" I meant to talk about was the already parsed and
validated AST. For example, I didn't distinguish between lexing and
parsing. What I intended to refer to when I talked about "AST as data" was
more "what's written to the XPP file". And from what I understood, that
will at least be lexed & parsed & validated.

Probably not evaluated, because from my understanding that's where the
"it's no longer data" part starts. If I'm wrong on that: nice. If not, just
out of curiosity: is evaluation in Ruby expensive?
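
To show where I draw that line, a toy example (plain Ruby, not Puppet
internals): the parsed tree serializes fine as data, while evaluating it
needs live, per-node context.

    require 'json'

    Node = Struct.new(:op, :left, :right)

    # "parsed & validated" stage: plain data, serializes without problems
    ast = Node.new(:concat, 'role-', Node.new(:fact, 'hostname', nil))
    puts JSON.dump(op: ast.op, left: ast.left,
                   right: { op: ast.right.op, left: ast.right.left })

    # "evaluated" stage: no longer cacheable data, the result depends on
    # whatever facts the node reports at compile time
    def evaluate(node, facts)
      case node.op
      when :concat then "#{node.left}#{evaluate(node.right, facts)}"
      when :fact   then facts[node.left]
      end
    end

    puts evaluate(ast, 'hostname' => 'web01')   # => "role-web01"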

> The "shipped with modules" part seems to be what most have concerns
> about, and where a "produce all of them at deploy time" approach is
> perceived as far less complex.

Let me throw in one more idea. This "produce all of them at deploy time"
will probably only work fine if "deploy" describes a specific (atomic,
as mentioned before) process. Any possible user interference could be
troublesome. Users do not want to see those files, and they do not want
them polluting their Git working directories.

So why not hide them completely? Think of a bytecode cache more like
opcache in PHP than .pyc files in Python. It doesn't even have to mirror
the module directory structure. It could be flat, structured differently,
perhaps even binary... Store the "XPP" in a dedicated place,
vardir/wherever, with that "place" referring to exactly one specific
environment (or module) in a specific version.
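
A minimal sketch of what I have in mind, with a completely made-up
directory layout and naming scheme:

    require 'digest'

    # made-up layout: keep compiled artifacts out of the module tree, keyed by
    # environment plus module name/version plus the checksum of the source file
    def xpp_cache_path(vardir, environment, mod_name, mod_version, manifest_path)
      source_key = Digest::SHA256.file(manifest_path).hexdigest
      File.join(vardir, 'xpp-cache', environment,
                "#{mod_name}-#{mod_version}", "#{source_key}.xpp")
    end

    # e.g. <vardir>/xpp-cache/production/apache-1.8.0/<sha256>.xpp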

> Basically extrapolated from benchmarks of small/medium catalog
> compilation doing non crazy stuff. It assumes though that very long
> compilation times are more of the same rather than user "design flaws"
> (managing lots of small things vs. larger, poor design of data lookup,
> poor algorithms used for data transformation etc.).

That's what I have experienced too. Catalog compilation is slow, but for me
it never turned out to be the root cause of the issues I've run into. Sure,
it wouldn't hurt if it were a fraction of a second instead of "a few" or
"a little bit more than a few" seconds. But I never arrived at a point
where I would have said "OMG, we need a faster compiler, otherwise we
are lost".

So, I have absolutely no problem with any optimization that gets catalogs
compiled A LOT faster. But I do not want to pay for this with the
potential trouble "yet another caching layer" could bring. I see no
problem with "this is the bytecode cache for module X in version Y". But
I see a lot of problems with "we store related cache files directly in
our module directories". Imagine someone going there, manually, running
"git checkout v4.0.3" for a specific module. Sure, he (or his tool) is
then doing it wrong. But that's going to be hard to argue, I guess.
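
If the cache lived outside the module tree, that scenario could simply
degrade into a cache miss. A hypothetical guard, just to sketch the idea:

    require 'digest'

    # hypothetical guard: record the checksum of the .pp a cache entry was
    # built from, and treat any mismatch (e.g. after a manual
    # "git checkout v4.0.3") as a cache miss instead of serving stale results
    def cached_xpp_usable?(pp_path, checksum_path)
      return false unless File.exist?(checksum_path)
      File.read(checksum_path).strip == Digest::SHA256.file(pp_path).hexdigest
    end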

> To be continued over beers somewhere...

I'd love to join you :)

Thomas

