On 04/04/16 19:42, Thomas Gelf wrote:
On 04.04.2016 at 03:21, Henrik Lindberg wrote:
We are happy if we initially only get 5-10% out of this...

And this is where I currently disagree. Very often I invest lots of time
for just 1%. But being able to run without a fragile caching layer could
be worth even 50% as long as I'm able to scale. When someone has to stop
a deployment chain because he needs to troubleshoot a caching layer,
lots of people are sitting around and cannot work. Ask them whether
they would have preferred to buy more hardware.


We have to start somewhere and when doing so we want to apply the KISS
principle. The intent is for puppet server to automatically keep the XPP
files in sync. There may be no need for "caching" - it is simply done as
a step in an atomic deploy of modified Puppet code.

Details aside, it seems that what we disagree on is "caching", more on
this (and a related proposal) below. Btw, until now I have experienced
"real" atomic deploys only in non-standard environments. Everything that would
currently automagically let files pop up in module directories would
have a very good chance to cause trouble for a lot of environments with
something I'd like to name "custom-tuned" deployments.


I am saying it may be thought of as "non caching" if each start of an environment produces XPP for every .pp file. Since the C++ parser is much faster we should still come out ahead. It could perhaps do that brute force every time. If it is faster to skip doing it when files are up to date, then we may do that.
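
To illustrate the "skip if up to date" part (a minimal sketch only; the layout and the real mechanism - e.g. a checksum embedded in the XPP header - are undecided, and all names below are made up):

    # Hypothetical sketch - xpp_path_for and compile_to_xpp do not exist.
    def xpp_stale?(pp_path, xpp_path)
      return true unless File.exist?(xpp_path)
      File.mtime(pp_path) > File.mtime(xpp_path)
    end

    env_dir = '/etc/puppetlabs/code/environments/production'   # example path
    Dir.glob("#{env_dir}/**/*.pp").each do |pp|
      compile_to_xpp(pp) if xpp_stale?(pp, xpp_path_for(pp))
    end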

So, where will this transition phase lead to? That's IMO the question that
many would love to see answered. Will Ruby still be there? If so, where is
the transition? If it won't, what would its successor look like? I guess
you know what I mean, please enlighten us!


It is too premature to describe this in detail. Happy to share the ideas
which we plan to pursue, though.

At this point we have decided to try an approach where the c++ compiler
will use an RPC mechanism to talk to co-processors. When doing so it
will use the same serialization technology that is used in XPP
(basically based on the Puppet Type System). (Rationale: linking native
things into the same memory image is complex and creates vulnerabilities).

Honestly, I expected a little bit more on this. An "RPC mechanism to
talk to co-processors" is pretty far from what I'd call an idea of how
it should work in future. Sorry for insisting, but this is IMO one of
the most essential questions the "we are moving to C++" strategy should
be able to answer.


It is too early to talk about, as there are experiments to carry out, measurements to be made, etc., and things to think through and express in words.

To me, faster compilation at the cost of slower data lookups might
in the end not be a very good deal, just to give one example of my
concerns. Same for "forking" a replacement for custom functions. Running
co-processors sounds good at first, but after a single catalog build
they would be as dirty as they are now. And there will still be
different environments and versions for the very same "function".
Everything but a new fork at every run would have its own drawbacks and
issues, wouldn't it?


First, we have new hiera 4 based data providers that now live inside of puppet (for json and yaml). They will be reimplemented in C++ and live inside of the main process. I don't necessarily think that a lookup using an RPC will be any slower than one that is currently doing the same work in Ruby. I think it will come out on par.
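
Purely to illustrate the shape of such a call (every name and the format below are assumptions; neither the wire format nor the transport is decided): the compiler would hand the co-processor a function or lookup name plus type-tagged arguments, and get a type-tagged result back, along the lines of:

    # Hypothetical request/response - names and format are assumptions only.
    request = {
      'op'   => 'invoke',
      'name' => 'mymodule::my_backend_lookup',   # made-up function name
      'args' => [
        { 'type' => 'String',  'value' => 'ntp::servers' },
        { 'type' => 'Integer', 'value' => 2 }
      ]
    }

    response = {
      'type'  => 'Array[String]',
      'value' => ['0.pool.ntp.org', '1.pool.ntp.org']
    }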

Regarding the life cycle of co-processors; this is not yet designed. I am inclined to keep things simple: compilation is a one-shot, co-processors hang around until compilation is done, then everything is torn down. That fits well with Ruby in general as it starts fast but runs slowly and bloats quickly. Experiments and measurements to support the ideas are naturally required.
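
As a sketch of that one-shot life cycle (transport, protocol and all names below are assumptions): a co-processor is started when a compilation begins, answers requests until it is told the compilation is done, and is then torn down:

    # Hypothetical co-processor main loop - nothing here is the actual design.
    require 'socket'
    require 'json'

    server = UNIXServer.new('/var/run/puppet/coprocessor.sock')  # assumed path
    conn = server.accept                      # the compiler connects once
    while (line = conn.gets)                  # one JSON message per line
      msg = JSON.parse(line)
      break if msg['op'] == 'shutdown'        # compilation done, tear down
      conn.puts(JSON.generate(handle(msg)))   # handle() is hypothetical
    end
    conn.close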

As long as "custom functions" or their replacement will be able to
generate resources (what many of them do), they will have strong
influence on the generated catalog. They are also the main reason why
caching catalogs rarely made any sense. Many of the external factors
have an unpredictable influence on them, with Facter and Hiera of course
being the most prominent ones. But back to what this section was all
about: the "ruby successor".

It doesn't have to be immediately, but please try to figure out whether
you could tell us a little bit more on this. Currently for an outsider
it feels like this is still very, very unclear. But going forward while
hoping that this issue will silently vanish over time wouldn't work, I
guess.


When it is presented I prefer it to be coherent and backed by some facts and experiments. I can opine, but I don't think that is particularly valuable as I am also prepared to change my opinion as we learn what will work.

If no decision has yet been taken, why not share some details about the
possible variants that are still in the game? I guess quite some people
would love to help out with their ideas, influenced by their very own
completely different circumstances. Extensibility and ease of
customization to me was one of the key factors of Puppet's success
story. No DSL could ever replace this.


At this point we are more inclined to favor smaller things talking to each other than a new big ball of wax. We are also going to be focused on APIs. Cannot say that we have completely ruled anything out. At the moment we are focusing on:

* Getting the c++ based parser to be on par with the Ruby impl (nothing much will work unless that is done)
* Trying to provide value to users sooner rather than later (XPP).

That pretty much leaves functions written in Ruby, and hiera backends.
As a hiera backend/data provider can be thought of as a function as well,
we believe that the RPC based approach will work fine. This is also to be
continued after XPP (as we then have the serialization/deserialization
parts in place in both the c++ and ruby implementations).
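
For context, a hiera 4 style function based data provider already looks roughly like this on the Ruby side (module name and data are made up); it is exactly the kind of "function" such a channel would have to drive:

    # Illustrative only - module name and data are invented.
    # lib/puppet/functions/mymodule/data.rb
    Puppet::Functions.create_function(:'mymodule::data') do
      def data
        {
          'mymodule::port'   => 8080,
          'mymodule::server' => 'app01.example.com'
        }
      end
    end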

RPC like in XML-RPC? Like in forking a Plugin? Like in forking a plugin
through a preforking daemon?


Too early to talk about. Very unlikely that it will involve XML ;-)
It needs to be something that is very fast (i.e. this is not based on REST) - technically some kind of IPC mechanism, or possibly a socket.
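
One possible shape of "very fast, not REST" (again purely an assumption, nothing is decided): length-prefixed messages over a local socket, with no HTTP in between:

    # Hypothetical framing - a 4-byte big-endian length prefix, then the
    # serialized payload; transport and format are assumptions.
    def write_message(io, payload)
      io.write([payload.bytesize].pack('N'))
      io.write(payload)
    end

    def read_message(io)
      len = io.read(4).unpack('N').first
      io.read(len)
    end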

In the long run, in general, we want it to be possible to express as
much as possible using the Puppet Language itself, and where that is not
practical, that it is easy to integrate an implementation (written in
c++, ruby, or whatever the logic is best written in for the target).

People tend to use custom functions for the most awful hacks you have
ever seen. But it works for them, by the end it solves their very own
problems. That's what they need Puppet for: getting work done. Sometimes
dirty work. There will hardly be SQL-Adapters, memcaches, message
queues, LDAP and so on in the Puppet language.


True. What people invent and share though are typically not advanced things like that. (90% of stdlib can probably be replaced with puppet logic today).

Some clues above to what we are thinking. Cannot promise when we will
have something more concrete to talk about - would love to be able to do
so around next Puppet Conf.

I'll be there :)

I would mostly agree, but experience teaches me to not trust such
statements. And your problem is: an AST is not data. It cannot be
represented in a defined structure. And we are in a phase where even
data types are still subject to change, with lots of new related
features in 4.4. All this would affect an AST, wouldn't it?

The AST is indeed a data structure, not even a very complicated one.
The rate of change has dramatically gone down. We rarely touch the
grammar and the AST itself, and the last couple of changes have been
additions. This is the benefit of the "expression based approach" taken
in the "future parser" - the semantics are not implemented in the
grammar, and they are not implemented as methods/behavior inside the AST
objects.
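
To make "the AST is data" concrete, a toy illustration (the node names are invented for the example, they are not the actual Puppet AST model): an expression like $x = 1 + 2 is, as data, just a small tree:

    # Toy illustration only - not the real AST classes.
    ast = {
      'AssignmentExpression' => {
        'left'  => { 'Variable' => 'x' },
        'right' => {
          'ArithmeticExpression' => {
            'operator' => '+',
            'left'  => { 'LiteralInteger' => 1 },
            'right' => { 'LiteralInteger' => 2 }
          }
        }
      }
    }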

We are back to XPP. Sorry, my wording wasn't precise enough I guess. The
non-data "thing" I meant to talk about was the already parsed and
validated AST. So for example I didn't distinguish between lexing and
parsing. What I intended to name when I talked about "AST as data" was
more "what's written to the XPP file". And from what I understood that
will at least be lexed & parsed & validated.


Yes, lexed, parsed, validated.

Probably not evaluated, because that's where from my understanding the
"it's no longer data" starts. If I'm wrong on that: nice. If not, just
out of curiosity: is evaluation in Ruby expensive?


No, not evaluated - that is done as part of compilation.

The "shipped with modules" is what seems to be what most have concerns
about and where it seems that a "produce all of them at deploy time" is
perceived as far less complex.

Let me throw in one more idea. This "produce all of them at deploy time"
will probably only work fine if "deploy" describes a specific (atomic,
as mentioned before) process. Every possible user interference could be
troublesome. Users do not want to see those files, they do not want to
pollute their GIT workdirs.


Yes, exactly.

So why not "hiding" them completely? Think more of a bytecode-cache like
opcache in PHP, rather than .pyc in Python. Doesn't even have to mirror
the module directory structure. Could be flat, structured differently,
possibly even binary...  Store "XPP" in a dedicated place, vardir/whatever,
with that "place" referring exactly one specific environment (or module)
in a specific version.


Yes, we will probably do something like that.
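
As a minimal sketch of what "something like that" could mean (paths, keying and hashing below are assumptions, not a decision): key the cached XPP on the environment and a content hash of the source, so that a manually checked out older module version simply misses the cache instead of picking up a stale file:

    # Hypothetical cache layout - everything here is an assumption.
    require 'digest'

    def xpp_cache_path(vardir, environment, pp_path)
      digest = Digest::SHA256.file(pp_path).hexdigest
      File.join(vardir, 'xpp-cache', environment, "#{digest}.xpp")
    end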

Basically extrapolated from benchmarks of small/medium catalog
compilation doing non crazy stuff. It assumes though that very long
compilation times are more of the same rather than user "design flaws"
(managing lots of small things vs. larger, poor design of data lookup,
poor algorithms used for data transformation etc.).

That's what I experienced too. Catalog compilation is slow, but for me
it never turned out to be the root cause of the issues I've met. Sure,
it wouldn't hurt if it was a fraction of a second instead of "a few" or
"a little bit more than a few" seconds. But I never arrived to a point
where I would have said "OMG, we need a faster compiler, otherwise we
are lost".


It is also a matter of scale. The single threaded performance may not be that important in itself (whether it is 5 or 10 seconds), but when that translates to "twice the cost", or "you cannot have that many agents on a single master", it becomes a real problem.
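
As a rough illustration of the arithmetic: at 10 seconds per catalog a single compiler thread can produce at most 6 catalogs per minute, at 5 seconds it is 12 - for the same run interval that is roughly twice the agents per master, or half the hardware.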

So, I have absolutely no problem with any optimizations getting catalogs
compiled A LOT faster. But I do not want to pay for this with the
potential trouble "yet another caching layer" could bring. I see no
problem with "this is the bytecode cache for module X in version Y". But
I see a lot of problems with "we store related cache-files directly to
our module directories". Imagine someone going there, manually, running
"git checkout v4.0.3" for a specific module. Sure, he (or his tool) is
then doing it wrong. But that's gonna be hard to argue I guess.

To be continued over beers somewhere...

I'd love to join you :)

Cheers.

- henrik

--

Visit my Blog "Puppet on the Edge"
http://puppet-on-the-edge.blogspot.se/
