Hi Henrik, thanks a lot for your response!

On 01.04.2016 at 20:21, Henrik Lindberg wrote:
> The C++ implementation is several orders of magnitude faster than the
> Ruby implementation, i.e. something silly like tens of thousands of
> times faster.

No doubt on this, I believe you without any benchmark.

> The Ruby lexing/parsing and validation alone can take minutes on a
> complex setup. We have shown earlier through benchmarks that lexing
> alone is a bottleneck in any catalog compilation - every optimization
> there contributes greatly to the bottom line.

Could you share some details on this? What kind of catalogs are you talking about? How many resources and parameters, how large are they - and what makes them so large and slow? Still, no doubt that C++ will be able to lex the same catalogs in a fraction of the time.

> We have already measured the approach. The benefit on the Ruby side is
> that the lexing is delegated to a native implementation that reads
> binary. A spike was performed with Ruby Marshal, which was also compared
> to a native MsgPack.

OK, so basically a linked C-based lexer could give the same performance boost? Yes, I know: JRuby. But still, could this be true? (See the little Marshal/MsgPack sketch below.)

> The main point here is that we are transitioning to a full
> implementation of the Puppet catalog compiler to C++ ... The use of XPP
> makes this possible.

This is where I started to feel no longer comfortable while reading the proposal. No caching mechanism that ever helped me in Puppet comes to mind, but I could immediately tell a lot of anecdotes involving severe Puppet issues breaking whole environments just because of caching problems.

> We are happy if we initially only get 5-10% out of this...

And this is where I currently disagree. Very often I invest lots of time for just 1%. But being able to run without a fragile caching layer could be worth even 50%, as long as I'm able to scale. When someone has to stop a deployment chain because he needs to troubleshoot a caching layer, lots of people are sitting around and cannot work. Ask them whether they would have preferred to buy more hardware.

> We are hoping for more though.

I hope you're pretty confident on this ;)

> That is not the intent (drop all Ruby things) - our plan is to make a
> smooth transition. All "boil the ocean" strategies tend to fail, so
> backwards compatibility and gradual change is important to us.

Eeeeeeh... Sorry, this is slightly OT, but in the end it isn't. This "transition" is the root cause for a proposal with the potential for a lot of additional trouble. You do not want to "drop all the Ruby things", but you want a smooth transition. Without knowing where this transition should lead, it sounds like a contradiction to me. So, where will this transition phase lead? That's IMO the question many would love to see answered. Will Ruby still be there? Then where is the transition? If it won't, what would its successor look like? I guess you know what I mean, please enlighten us!

>> In a current Puppet ecosystem a C++ parser able to generate an AST from
>> a .pp file to me still seems far from anything that could completely
>> replace the current Ruby-based parser in a helpful way very soon. At
>> least not in a real-world environment with lots of modules, custom
>> functions and external data sources, often provided by custom lookup
>> functions. At least not in a way that would bring any benefit to the
>> average Puppet user.
>
> The goal is to do this transparently.

Sorry, I couldn't follow you. Referring to what?
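Just to make the deserialization argument concrete: here is a minimal sketch of the comparison I have in mind, assuming the msgpack gem and a made-up nested hash standing in for a serialized AST (the real XPP payload will of course look different):

    require 'benchmark'
    require 'msgpack'  # gem install msgpack

    # Made-up nested structure standing in for a serialized AST.
    ast = {
      'type' => 'Program',
      'body' => Array.new(10_000) do |i|
        { 'type'       => 'ResourceExpression',
          'title'      => "file_#{i}",
          'parameters' => { 'ensure' => 'present', 'mode' => '0644' } }
      end
    }

    marshal_blob = Marshal.dump(ast)
    msgpack_blob = ast.to_msgpack

    Benchmark.bm(8) do |bm|
      bm.report('Marshal') { 100.times { Marshal.load(marshal_blob) } }
      bm.report('MsgPack') { 100.times { MessagePack.unpack(msgpack_blob) } }
    end

Whatever the exact numbers on a given box: the unpacking side stays in Ruby and keeps its cost, no matter how fast the C++ side produces the blob.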
>> So, to me the former one remains a key question to the performance
>> benefit we could get from all this. As long as the Ruby runtime is
>> supported, I do not really see how this could work out. But this is just
>> a blind guess, please prove me wrong on this. ... But then we should
>> add something else to the big picture: how should we build custom
>> extensions and interfaces to custom data in the future? Forking plugins?
>
> That topic is indeed a big topic, and one that will continue as we are
> working towards a C++ based environment. The key here is
> interoperability, where extensions are supported in Ruby, or in a
> language it makes sense to implement them in.

Shouldn't those questions be answered first? Aren't external data lookups, Hiera, database persistence, pluginsync, file shipping and all the rest still far more expensive than the lexer? I would love to understand how I should expect to do my daily work in a world after the "smooth transition away from Ruby". It's hard to judge the value of a brick without knowing what the finished building is expected to look like.

> Expect to see a lot more about this later in the game.

I'm sure I will. But Eric asked for feedback on XPP right now ;)

> It is anything but academic. What we could do, but are reluctant to do,
> is to link the C++ parser into the Ruby runtime...

I would immediately support that approach! (See the little FFI sketch below.)

> ...it would still need to pass the native/ruby object barrier - which
> XPP is handling - if linked into memory it would just be an internal
> affair.

Correct. I have no problem with this part of "XPP". Eric presented it as "something like pyc", being pre-parsed and therefore behaving like a caching layer. This week I worked for a European national bank: I brought them Puppet Enterprise, deployments are rare and well planned. It would work for them. In ten days I'll be working for a customer where I see Puppetfile commits every two minutes, r10k and more, open source Puppet, all environments changing and moving all the time. Not only wouldn't they benefit from some "intelligent" caching layer - I bet they would suffer. Badly.

So: C++ lexer -> fine. Linked into Ruby -> my preferred variant. Using an XPP-like interface -> also fine. ".pyc"-like precaching -> no. This is what I'm completely against right now; this is where I see no real advantage. Please postpone it, do not even suggest to store those files. Let the lexer grow and mature, then let's re-evaluate whether polluting our modules (or mirrored structures) with all those files would make any sense.

>> But please do not forget that the extensibility of a tool is one of the
>> key features of any open source software. ...breaking them is a no-go.
>
> Backwards compatibility and interop is of the utmost concern. We believe
> that breaking things apart, specifying good APIs and providing
> well-performing communication between the various parts of the system is
> key to moving away from the now quite complicated and slow monolithic
> implementation in Ruby.

Cool! I know I repeat myself, but could you already leak some details on how this Ruby-less "interop" will look?
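Coming back to the linked-parser variant I said I'd support: to illustrate how small the Ruby-facing surface could be, here is a rough sketch using the ffi gem. The library name and both function names are pure invention on my part, just to show the shape such a binding could take:

    require 'ffi'  # gem install ffi

    # Hypothetical binding to a natively linked lexer/parser.
    module NativeParser
      extend FFI::Library
      ffi_lib 'puppet_lexer'  # invented; would resolve to libpuppet_lexer.so

      # Invented signatures: parse a manifest and return a pointer to a
      # serialized AST blob (e.g. MsgPack) plus its size in bytes.
      attach_function :puppet_parse_file, [:string], :pointer
      attach_function :puppet_blob_size,  [:pointer], :size_t
    end

    blob_ptr = NativeParser.puppet_parse_file('site.pp')
    blob     = blob_ptr.read_string(NativeParser.puppet_blob_size(blob_ptr))
    # From here on it is "just" deserialization into Ruby objects - the
    # same native/Ruby barrier, but as a purely in-memory affair.

No files on disk, nothing that can go stale - which is exactly why I would prefer this variant over any ".pyc"-like store.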
>> * longevity of file formats: ... An AST would by definition be a lot
>> more fragile. Why should we believe that those cache files would survive
>> longer?
>
> Because they are well defined, as opposed to how things were earlier,
> where things just happened to be a certain way because of how they were
> implemented. Knowing what something means is the foundation that allows
> it to be transformed. And when something is "all data" as opposed to
> "all messy code", it can be processed by tools.

I would mostly agree, but experience teaches me not to trust such statements. And your problem is: an AST is not data. It cannot simply be pinned down in a defined, stable structure. And we are in a phase where even data types are still subject to change, with lots of new related features in 4.4. All of this would affect an AST, wouldn't it? This wouldn't be an issue for the "C++ is our lexer" approach, but it obviously becomes essential when XPP is used for cache files designed to be shipped with modules.

> As an example - what makes things expensive in Ruby is the creation of
> many objects and garbage collection. (In lexing, each and every character
> in the source needs to be individually processed... When this is done
> with a C++ serializer, all of the cost is on the serializing side...

Ruby didn't impress me with its deserialization speed either. So some cost will still be there in our overall picture. I blindly believe that the C++ lexer is way faster. But the only number I'm interested in is the difference between "catalog built and shipped by Ruby" and "catalog built while being lexed with C++, serialized, deserialized with Ruby and shipped with Clojure". That's the real saving.

> (These secondary effects have not been benchmarked in Puppet, but have
> proven to be very beneficial in implementations we have used in the past.)

Would be interesting. Languages behaving similarly have proven to outperform "better" ones in specific use cases, even while wasting a lot more memory. But honestly, no, it doesn't really interest me. What I'd love to learn more about is what kind of catalogs you were talking about when you were facing minutes (!) of lexing time. Even lots of .pp files summing up to a few thousand single resources shouldn't require more than 10-30 MB of lexing memory (blind guess, didn't measure) and more than 3 seconds of parsing/validation time in Ruby. None of the large environments I'm playing with are facing such issues. Disclaimer: all "my" large ones are still running 3.x, so no idea whether 4.x and/or Puppet Server is so much slower - but I don't think so.

And usually, when catalogs tend to have tens of thousands of resources, the root cause is quickly identified and easily replaced with a cheaper approach. Something like "use a custom function, aggregate on the master, ship a single file instead of thousands" has more than once helped to bring Puppet runs from lasting more than half an hour down to 10 seconds (a tiny sketch of that pattern follows below).
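Since I keep referring to that pattern, here is a tiny sketch of such a master-side aggregation function using the Puppet 4 Ruby function API - the function name, module layout and fragment path are made up for illustration:

    # Hypothetical example; would live in a module under
    # lib/puppet/functions/profile/aggregate_fragments.rb
    Puppet::Functions.create_function(:'profile::aggregate_fragments') do
      dispatch :aggregate do
        param 'String', :directory
      end

      # Read many small fragment files on the master and return one
      # string, so the agent gets a single file resource instead of
      # thousands of individually managed ones.
      def aggregate(directory)
        Dir.glob(File.join(directory, '*.conf')).sort.map do |fragment|
          File.read(fragment)
        end.join("\n")
      end
    end

Used as content => profile::aggregate_fragments('/etc/fragments') on one file resource, the whole aggregation happens on the master during compilation.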
Back to my question: could you let us know what kind of catalogs tend to require minutes of lexing time?

> These concerns are shared. It is the overall process, more than the
> lower-level technical things, that I worry about getting right. :)
>
> The requirements and exactly how/when/where XPPs get created and used
> will require an extra round or two of thought and debate.

Agreed. C++ lexer, AST handed over to Ruby, linked or not: go for it. XPPs on my disk: please not. Not yet. Not unless we have more experience with the new lexing construct. Not unless we know how to tackle the various potential caching pitfalls in endless customized variants of Puppet module deployments.

> Thank you Thomas for all of the valuable comments and insights.

Thank you for reading all this, Henrik - and thanks a lot for sharing your thoughts!

Cheers,
Thomas
