Hi Henrik, thanks a lot for your response!

On 01.04.2016 at 20:21, Henrik Lindberg wrote:
> The C++ implementation is several orders of magnitude faster than the
> Ruby implementation, i.e. something silly like tens of thousands of
> times faster.

No doubt on this, I believe you without any benchmark.

> The Ruby lexing/parsing and validation alone can take minutes on a
> complex setup. We have shown earlier through benchmarks that lexing
> alone is a bottleneck in any catalog compilation - every optimization
> there contributes greatly to the bottom line.

Could you share some details on this? What kind of catalogs are you talking about? How many resources and parameters, how large are they - and what makes them so large and slow? Still, no doubt that C++ will be able to lex the same catalogs in a fraction of the time.

> We have already measured the approach. The benefit on the Ruby side is
> that the lexing is delegated to a native implementation that reads
> binary. A spike was performed with Ruby Marshal, which was also compared
> to a native MsgPack.

OK, so basically a linked C-based lexer could give the same performance boost? Yes, I know: JRuby. But still, could this be true? (See the little Marshal/MsgPack sketch below.)

> The main point here is that we are transitioning to a full
> implementation of the Puppet catalog compiler to C++ ... The use of XPP
> makes this possible.

This is where I started to feel no longer comfortable while reading the proposal. No caching mechanism that ever helped me in Puppet comes to mind, but I could immediately tell a lot of anecdotes involving severe Puppet issues breaking whole environments just because of caching problems.

> We are happy if we initially only get 5-10% out of this...

And this is where I currently disagree. Very often I invest lots of time for just 1%. But being able to run without a fragile caching layer could be worth even 50%, as long as I'm able to scale. When someone has to stop a deployment chain because he needs to troubleshoot a caching layer, lots of people are sitting around and cannot work. Ask them whether they would have preferred to buy more hardware.

> We are hoping for more though.

I hope you're pretty confident on this ;)

> That is not the intent (drop all Ruby things) - our plan is to make a
> smooth transition. All "boil the ocean" strategies tend to fail, so
> backwards compatibility and gradual change is important to us.

Eeeeeeh... Sorry, this is slightly OT, but in the end it isn't. This "transition" is the root cause for a proposal with the potential for a lot of additional trouble. You do not want to "drop all the Ruby things", but you want a smooth transition. Without knowing where this transition should lead, it sounds like a contradiction to me. So, where will this transition phase lead? That's IMO the question many would love to see answered. Will Ruby still be there? Then where is the transition? If it won't, what would its successor look like? I guess you know what I mean, please enlighten us!

>> In a current Puppet ecosystem a C++ parser able to generate an AST from
>> a .pp file to me still seems far from anything that could completely
>> replace the current Ruby-based parser in a helpful way very soon. At
>> least not in a real-world environment with lots of modules, custom
>> functions and external data sources, often provided by custom lookup
>> functions. At least not in a way that would bring any benefit to the
>> average Puppet user.
>
> The goal is to do this transparently.

Sorry, I couldn't follow you. Referring to what?
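Just to make the deserialization argument concrete: here is a minimal sketch of the comparison I have in mind, assuming the msgpack gem and a made-up nested hash standing in for a serialized AST (the real XPP payload will of course look different):

    require 'benchmark'
    require 'msgpack'  # gem install msgpack

    # Made-up nested structure standing in for a serialized AST.
    ast = {
      'type' => 'Program',
      'body' => Array.new(10_000) do |i|
        { 'type'       => 'ResourceExpression',
          'title'      => "file_#{i}",
          'parameters' => { 'ensure' => 'present', 'mode' => '0644' } }
      end
    }

    marshal_blob = Marshal.dump(ast)
    msgpack_blob = ast.to_msgpack

    Benchmark.bm(8) do |bm|
      bm.report('Marshal') { 100.times { Marshal.load(marshal_blob) } }
      bm.report('MsgPack') { 100.times { MessagePack.unpack(msgpack_blob) } }
    end

Whatever the exact numbers on a given box: the unpacking side stays in Ruby and keeps its cost, no matter how fast the C++ side produces the blob.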
>> So, to me the former one remains a key question to the performance
>> benefit we could get from all this. As long as the Ruby runtime is
>> supported, I do not really see how this could work out. But this is just
>> a blind guess, please prove me wrong on this. ... But then we should
>> add something else to the big picture: how should we build custom
>> extensions and interfaces to custom data in the future? Forking plugins?
>
> That topic is indeed a big topic, and one that will continue as we are
> working towards a C++ based environment. The key here is
> interoperability, where extensions are supported in Ruby, or in a
> language it makes sense to implement them in.

Shouldn't those questions be answered first? Aren't external data lookups, Hiera, database persistence, pluginsync, file shipping and all the rest still far more expensive than the lexer? I would love to understand how I should expect to do my daily work in a world after the "smooth transition away from Ruby". It's hard to judge the value of a brick without knowing what the finished building is expected to look like.

> Expect to see a lot more about this later in the game.

I'm sure I will. But Eric asked for feedback on XPP right now ;)

> It is anything but academic. What we could do, but are reluctant to do,
> is to link the C++ parser into the Ruby runtime...

I would immediately support that approach! (See the little FFI sketch below.)

> ...it would still need to pass the native/ruby object barrier - which
> XPP is handling - if linked into memory it would just be an internal
> affair.

Correct. I have no problem with this part of "XPP". Eric presented it as "something like pyc", being pre-parsed and therefore behaving like a caching layer. This week I worked for a European national bank: I brought them Puppet Enterprise, deployments are rare and well planned. It would work for them. In ten days I'll be working for a customer where I see Puppetfile commits every two minutes, r10k and more, open source Puppet, all environments changing and moving all the time. Not only wouldn't they benefit from some "intelligent" caching layer - I bet they would suffer. Badly.

So: C++ lexer -> fine. Linked into Ruby -> my preferred variant. Using an XPP-like interface -> also fine. ".pyc"-like precaching -> no. This is what I'm completely against right now; this is where I see no real advantage. Please postpone it, do not even suggest to store those files. Let the lexer grow and mature, then let's re-evaluate whether polluting our modules (or mirrored structures) with all those files would make any sense.

>> But please do not forget that the extensibility of a tool is one of the
>> key features of any open source software. ...breaking them is a no-go.
>
> Backwards compatibility and interop is of the utmost concern. We believe
> that breaking things apart, specifying good APIs and providing
> well-performing communication between the various parts of the system is
> key to moving away from the now quite complicated and slow monolithic
> implementation in Ruby.

Cool! I know I repeat myself, but could you already leak some details on how this Ruby-less "interop" will look?
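Coming back to the linked-parser variant I said I'd support: to illustrate how small the Ruby-facing surface could be, here is a rough sketch using the ffi gem. The library name and both function names are pure invention on my part, just to show the shape such a binding could take:

    require 'ffi'  # gem install ffi

    # Hypothetical binding to a natively linked lexer/parser.
    module NativeParser
      extend FFI::Library
      ffi_lib 'puppet_lexer'  # invented; would resolve to libpuppet_lexer.so

      # Invented signatures: parse a manifest and return a pointer to a
      # serialized AST blob (e.g. MsgPack) plus its size in bytes.
      attach_function :puppet_parse_file, [:string], :pointer
      attach_function :puppet_blob_size,  [:pointer], :size_t
    end

    blob_ptr = NativeParser.puppet_parse_file('site.pp')
    blob     = blob_ptr.read_string(NativeParser.puppet_blob_size(blob_ptr))
    # From here on it is "just" deserialization into Ruby objects - the
    # same native/Ruby barrier, but as a purely in-memory affair.

No files on disk, nothing that can go stale - which is exactly why I would prefer this variant over any ".pyc"-like store.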
>> * longevity of file formats: ... An AST would by definition be a lot
>> more fragile. Why should we believe that those cache files would survive
>> longer?
>
> Because they are well defined, as opposed to how things were earlier,
> where things just happened to be a certain way because of how they were
> implemented. Knowing what something means is the foundation that allows
> it to be transformed. And when something is "all data" as opposed to
> "all messy code", it can be processed by tools.

I would mostly agree, but experience teaches me not to trust such statements. And your problem is: an AST is not data. It cannot simply be pinned down in a defined, stable structure. And we are in a phase where even data types are still subject to change, with lots of new related features in 4.4. All of this would affect an AST, wouldn't it? This wouldn't be an issue for the "C++ is our lexer" approach, but it obviously becomes essential when XPP is used for cache files designed to be shipped with modules.

> As an example - what makes things expensive in Ruby is the creation of
> many objects and garbage collection. (In lexing, each and every character
> in the source needs to be individually processed... When this is done
> with a C++ serializer, all of the cost is on the serializing side...

Ruby didn't impress me with its deserialization speed either. So some cost will still be there in our overall picture. I blindly believe that the C++ lexer is way faster. But the only number I'm interested in is the difference between "catalog built and shipped by Ruby" and "catalog built while being lexed with C++, serialized, deserialized with Ruby and shipped with Clojure". That's the real saving.

> (These secondary effects have not been benchmarked in Puppet, but have
> proven to be very beneficial in implementations we have used in the past.)

Would be interesting. Languages behaving similarly have proven to outperform "better" ones in specific use cases, even while wasting a lot more memory. But honestly, no, it doesn't really interest me. What I'd love to learn more about is what kind of catalogs you were talking about when you were facing minutes (!) of lexing time. Even lots of .pp files summing up to a few thousand single resources shouldn't require more than 10-30 MB of lexing memory (blind guess, didn't measure) and more than 3 seconds of parsing/validation time in Ruby. None of the large environments I'm playing with are facing such issues. Disclaimer: all "my" large ones are still running 3.x, so no idea whether 4.x and/or Puppet Server is so much slower - but I don't think so.

And usually, when catalogs tend to have tens of thousands of resources, the root cause is quickly identified and easily replaced with a cheaper approach. Something like "use a custom function, aggregate on the master, ship a single file instead of thousands" has more than once helped to bring Puppet runs from lasting more than half an hour down to 10 seconds (a tiny sketch of that pattern follows below).
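Since I keep referring to that pattern, here is a tiny sketch of such a master-side aggregation function using the Puppet 4 Ruby function API - the function name, module layout and fragment path are made up for illustration:

    # Hypothetical example; would live in a module under
    # lib/puppet/functions/profile/aggregate_fragments.rb
    Puppet::Functions.create_function(:'profile::aggregate_fragments') do
      dispatch :aggregate do
        param 'String', :directory
      end

      # Read many small fragment files on the master and return one
      # string, so the agent gets a single file resource instead of
      # thousands of individually managed ones.
      def aggregate(directory)
        Dir.glob(File.join(directory, '*.conf')).sort.map do |fragment|
          File.read(fragment)
        end.join("\n")
      end
    end

Used as content => profile::aggregate_fragments('/etc/fragments') on one file resource, the whole aggregation happens on the master during compilation.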
Back to my question: could you let us know what kind of catalogs tend to require minutes of lexing time?

> These concerns are shared. It is the overall process, more than the
> lower-level technical things, that I worry about getting right. :)
>
> The requirements and exactly how/when/where XPPs get created and used
> will require an extra round or two of thought and debate.

Agreed. C++ lexer, AST handed over to Ruby, linked or not: go for it. XPPs on my disk: please not. Not yet. Not unless we have more experience with the new lexing construct. Not unless we know how to tackle the various potential caching pitfalls in endless customized variants of Puppet module deployments.

> Thank you Thomas for all of the valuable comments and insights.

Thank you for reading all this, Henrik - and thanks a lot for sharing your thoughts!

Cheers,
Thomas
