On Fri, Oct 24, 2014 at 2:47 AM, Erik Dalén <[email protected]>
wrote:

> On 24 October 2014 03:24, Henrik Lindberg <[email protected]>
> wrote:
>
>> On 2014-24-10 2:04, Andy Parker wrote:
>>
>>> A while ago we removed support for puppet to *send* YAML on the network.
>>> At the same time we converted to using safe_yaml for receiving YAML in
>>> order to keep compatibility with existing agents. Instead of YAML all of
>>> the communication was done with PSON, which is a variant of JSON that
>>> has been in use in puppet since at least 2010. As far as I understand
>>> PSON started out as simply a vendored version of json_pure. The name
>>> PSON was apparently because rails would try to patch anything named
>>> JSON, and so they needed to name it something different to stop that
>>> from happening (that is all hearsay, so I don't know how truthful it is).
>>>
>>> Over time PSON started to evolve. Little changes were made to it here
>>> and there. The largest change came about because of
>>> http://projects.puppetlabs.com/issues/5261. The changes for that ticket
>>> removed the restriction that only valid UTF-8 could be sent in PSON,
>>> which opened the door to a) binary data as file contents and b)
>>> absolutely no control over what encodings puppet was using. Over time
>>> there have been a large number of issues that have been related to not
>>> keeping track of what encoding puppet is dealing with.
>>>
>>> I'd like to move us away from PSON and onto a standard format. YAML is
>>> out of the question because it is either slow and unsafe (all of the
>>> YAML vulnerabilities) or extremely slow and safe (safe_yaml).
>>> MessagePack might be nice. It is pretty well specified and has a fairly
>>> large number of libraries written for it, but it doesn't do much to help
>>> us solve the wild west of encoding in puppet. MessagePack doesn't really
>>> enforce any string encoding; everything is treated as an array of
>>> bytes.
>>>
>>> In order to keep consistency across various puppet projects we'll be
>>> going with JSON. JSON requires that everything is valid UTF-8, which
>>> gives us a nice deliberateness to handling data. JSON is pretty fast
>>> (though not as fast as MessagePack), and there are a lot of libraries to
>>> choose from if it turns out that the built-in json isn't fast enough
>>> (puppet-server could use jrjackson, for instance).
>>>
>>> So what all would be changing?
>>>
>>>    1. Network communication that is using PSON would move to JSON
>>>    2. YAML files that the master and agent write would move to JSON
>>> (node, facts, last_run_summary, state, etc.).
>>>    3. A new exec node terminus would be written to handle JSON, or the
>>> existing one would be updated (check the first byte for '{').
>>>
>>> Those are just some of the changes that will need to happen. There will be
>>> a ripple of other changes based on the fact that JSON has to be UTF-8.
>>>
>>>    1. A new "encoding" parameter on File and a base64() function. This
>>> will allow transferring non-UTF-8 data as file content until we can get
>>> a new catalog structure that allows tracking data types and more changes
>>> to the language to differentiate Strings from Blobs.
>>>
>>
>> I would like us to add a Binary datatype up front instead of doing the
>> base64 encoding in the puppet code. It is then the serialization
>> format's responsibility to transform it into a form that can be transported.
>> A JSON in text form can then do the base64 encoding. A MsgPack / JSON can
>> instead use the binary directly.
>>
>> Even if our first cut of this always performs a base64 encoding the user
>> logic does not have to change.
>>
>> Thus, instead of calling base64(content) and setting the encoding in the
>> File resource, a Binary is created directly with a binary(encoding,
>> content) function.
>>
>
> How do you differentiate between an encoded binary string and a regular
> string in the JSON though?
> You would need some sort of annotation, and if that is inside the string
> (which it is in the content parameter of files already btw) you might need
> a way to escape it to be able to have a regular string that contains that
> annotation stuff.
>

I talked to Henrik about this, and his idea is that we make file content a
special case. We write a binary() function that takes a String and produces
a hash of { "encoding" => ..., "data" => ... } (or something like that) in
the serialized form. The File content parameter is then changed to accept
either a string or a hash of that structure. We could even implement this
as a type in the puppet language and update the serializer to emit that
hash. Perhaps we should also create a new binary_file() function so that
non-UTF-8 values don't leak in via file().
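
A minimal sketch of that idea, written in Python for brevity rather than
puppet's Ruby. The helper names binary() and file_content_bytes(), the
"base64" encoding value, and the sample keys are assumptions made for
illustration, not an agreed design:

```python
import base64
import json


def binary(raw: bytes) -> dict:
    """Wrap raw bytes in a JSON-safe hash (hypothetical shape: encoding + data)."""
    return {"encoding": "base64", "data": base64.b64encode(raw).decode("ascii")}


def file_content_bytes(content) -> bytes:
    """Return the bytes to write for a content value that is a string or a binary hash."""
    if isinstance(content, str):
        # A regular string is always treated as UTF-8 text.
        return content.encode("utf-8")
    if isinstance(content, dict) and content.get("encoding") == "base64":
        # A hash marks binary data; decode it back to the original bytes.
        return base64.b64decode(content["data"])
    raise ValueError("unsupported content value")


# Round trip through JSON: one text value and one non-UTF-8 value.
serialized = json.dumps({
    "motd": "hello\n",
    "blob": binary(b"\xff\xfe\x00"),
})
decoded = json.loads(serialized)
restored = {name: file_content_bytes(value) for name, value in decoded.items()}
```

Because the binary value is a JSON object and a regular string stays a JSON
string, the two are told apart by type, so nothing inside a string needs an
annotation or escaping.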


>
> --
> Erik Dalén
>
> --
> You received this message because you are subscribed to the Google Groups
> "Puppet Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/puppet-dev/CAAAzDLeNghCj3M61S%2BYVEdEEUkDB-Nkq3iVaOJBpvyCE1LAn2A%40mail.gmail.com
>
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Andrew Parker
[email protected]
Freenode: zaphod42
Twitter: @aparker42
Software Developer


