On Fri, Oct 24, 2014 at 2:47 AM, Erik Dalén <[email protected]> wrote:
> On 24 October 2014 03:24, Henrik Lindberg <[email protected]>
> wrote:
>
>> On 2014-24-10 2:04, Andy Parker wrote:
>>
>>> A while ago we removed support for puppet to *send* YAML on the network.
>>> At the same time we converted to using safe_yaml for receiving YAML in
>>> order to keep compatibility with existing agents. Instead of YAML all of
>>> the communication was done with PSON, which is a variant of JSON that
>>> has been in use in puppet since at least 2010. As far as I understand
>>> PSON started out as simply a vendored version of json_pure. The name
>>> PSON was apparently because rails would try to patch anything named
>>> JSON, and so they needed to name it something different to stop that
>>> from happening (that is all hearsay, so I don't know how truthful it is).
>>>
>>> Over time PSON started to evolve. Little changes were made to it here
>>> and there. The largest change came about because of
>>> http://projects.puppetlabs.com/issues/5261. The changes for that ticket
>>> removed the restriction that only valid UTF-8 could be sent in PSON,
>>> which opened the door to a) binary data as file contents and b)
>>> absolutely no control over what encodings puppet was using. Over time
>>> there have been a large number of issues that have been related to not
>>> keeping track of what encoding puppet is dealing with.
>>>
>>> I'd like to move us away from PSON and onto a standard format. YAML is
>>> out of the question because it is either slow and unsafe (all of the
>>> YAML vulnerabilities) or extremely slow and safe (safe_yaml).
>>> MessagePack might be nice. It is pretty well specified, has a fairly
>>> large number of libraries written for it, but it doesn't do much to help
>>> us solve the wild west of encoding in puppet. In MessagePack there
>>> aren't really any enforcements of string encodings and everything is
>>> treated as an array of bytes.
>>>
>>> In order to keep consistency across various puppet projects we'll be
>>> going with JSON.
>>> JSON requires that everything is valid UTF-8, which
>>> gives us a nice deliberateness to handling data. JSON is pretty fast
>>> (not as fast as MessagePack) and there are a lot of libraries if it
>>> turns out that the built-in json isn't fast enough (puppet-server could
>>> use jrjackson, for instance).
>>>
>>> So what all would be changing?
>>>
>>> 1. Network communication that is using PSON would move to JSON
>>> 2. YAML files that the master and agent write would move to JSON
>>> (node, facts, last_run_summary, state, etc.).
>>> 3. A new exec node terminus would be written to handle JSON, or the
>>> existing one would be updated (check the first byte for '{').
>>>
>>> That is just some of the changes that will need to happen. There will be
>>> a ripple of other changes based on the fact that JSON has to be UTF-8.
>>>
>>> 1. A new "encoding" parameter on File and a base64() function. This
>>> will allow transferring non-UTF-8 data as file content until we can get
>>> a new catalog structure that allows tracking data types and more changes
>>> to the language to differentiate Strings from Blobs.
>>>
>>
>> I would like us to add a Binary datatype upfront instead of doing the
>> base64 encoding in the puppet code. Instead, it is the serialization
>> format's responsibility to transform it into a form that can be transported.
>> A JSON in text form can then do the base64 encoding. A MsgPack / JSON can
>> instead use the binary directly.
>>
>> Even if our first cut of this always performs a base64 encoding the user
>> logic does not have to change.
>>
>> Thus, instead of calling base64(content) and setting the encoding in the
>> File resource, a Binary is created directly with a binary(encoding,
>> content) function.
>>
>
> How do you differentiate between an encoded binary string and a regular
> string in the JSON though?
> You would need some sort of annotation, and if that is inside the string
> (which it is in the content parameter of files already btw) you might need
> a way to escape it to be able to have a regular string that contains that
> annotation stuff.
>

I talked to Henrik about this and his idea is that we make file content a
special case. We write a binary() function that takes a String and produces
a hash of { "encoding" => ..., "data" => ... } (or something like that) in
the serialized form. Then the file content is written to allow either a
string or a hash of that structure. We could even implement this as a type
in the puppet language and update the serializer to do that.

Perhaps we should also create a new binary_file() function so that
non-UTF-8 values don't leak in via file().

>
> --
> Erik Dalén
>

--
Andrew Parker
[email protected]
Freenode: zaphod42
Twitter: @aparker42

Software Developer

--
You received this message because you are subscribed to the Google Groups "Puppet Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/puppet-dev/CANhgQXv41VrftSCBTQmXhqqzFwPbaN0yTawX3ULgyRLsDW9bDw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
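
To make the binary() idea from the reply above concrete, here is a rough Ruby sketch. It is only an illustration under stated assumptions, not anything that exists in puppet: the helper names binary and file_content_bytes are hypothetical, the "encoding"/"data" keys follow the hash mentioned in the message, and the "base64" label for the encoding value is a guess. It shows how non-UTF-8 bytes could travel through JSON as a small hash while plain strings pass through unchanged.

```ruby
require 'json'

# Hypothetical binary() helper: wraps raw bytes in the hash shape mentioned
# in the thread. The "base64" label is an assumption made for this sketch.
def binary(bytes)
  # pack('m0') is strict base64 without line breaks.
  { 'encoding' => 'base64', 'data' => [bytes].pack('m0') }
end

# Hypothetical consumer: file content may be a plain String or the hash form.
def file_content_bytes(content)
  case content
  when String
    content
  when Hash
    unless content['encoding'] == 'base64'
      raise ArgumentError, "unsupported encoding: #{content['encoding'].inspect}"
    end
    # unpack1('m0') decodes strict base64 and raises on malformed input.
    content['data'].unpack1('m0')
  else
    raise ArgumentError, 'content must be a String or a Hash'
  end
end

raw = "\xFF\xFE\x00binary".b                       # bytes that are not valid UTF-8
wire = JSON.generate({ 'content' => binary(raw) }) # safe to send as JSON text
decoded = file_content_bytes(JSON.parse(wire)['content'])
```

Because the binary value is a hash rather than a specially marked string, a regular string that happens to look like base64 needs no escaping, which is the concern Erik raised.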
