Re: [Puppet-dev] Native encodings support in PSON

Daniel Pittman Thu, 30 Sep 2010 04:29:22 -0700

Brice Figureau <[email protected]> writes:
> On Thu, 2010-09-30 at 10:57 +0200, Felix Frank wrote:
>> [discussion elided]
>> The central question is - are such manifests to be acceptable or not?
>> If I'm reading you correctly, you suggest that all non-ASCII content
>> should be required to be escaped. Is that notion shared by the community?


For what it is worth, and based on dealing with a whole lot of systems
handling international and 8-bit locale data over the years:

The path of least insanity is to standardize on a single encoding inside
the product, and to do encoding and decoding only at the very edges, ideally
as infrequently as possible.
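
Here is a minimal Ruby sketch of that rule. It assumes a file whose 8-bit encoding is already known (ISO-8859-1 is only an example). Bytes are decoded to UTF-8 once on the way in, everything in between works on UTF-8 strings, and text is encoded back only on the way out. The helper names read_as_utf8 and write_from_utf8 are hypothetical and are not part of Puppet.

```ruby
require "tmpdir"

# Input edge (hypothetical helper): label the raw bytes with the known
# external encoding, reject malformed input, and transcode once to UTF-8.
def read_as_utf8(path, external)
  raw = File.binread(path).force_encoding(external)
  raise ArgumentError, "#{path} is not valid #{external}" unless raw.valid_encoding?
  raw.encode(Encoding::UTF_8)
end

# Output edge (hypothetical helper): transcode the internal UTF-8 string
# to the external encoding and write the bytes verbatim. Characters with
# no mapping in the target raise Encoding::UndefinedConversionError.
def write_from_utf8(path, text, external)
  File.binwrite(path, text.encode(external))
end

# Usage: round-trip "mëtäl" through a Latin-1 file.
Dir.mktmpdir do |dir|
  src = File.join(dir, "in")
  dst = File.join(dir, "out")
  File.binwrite(src, [0x6D, 0xEB, 0x74, 0xE4, 0x6C].pack("C*"))

  text = read_as_utf8(src, "ISO-8859-1")
  puts text.encoding # prints UTF-8

  write_from_utf8(dst, text, "ISO-8859-1")
  puts File.binread(dst) == File.binread(src) # prints true
end
```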

> To me, the PSON stuff is a transport system. It should be completely
> agnostic of what you put in your manifest.

*nod*  I absolutely agree with this: string content here should sensibly be
either UTF-8 or "raw bytes without encoding".  Allowing a *range* of
encodings invites pain, suffering, and confusion.
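
Ruby's string model (1.9 onward) already covers exactly these two cases. A String is either tagged UTF-8, or tagged ASCII-8BIT (alias BINARY), where the bytes carry no character encoding at all. The sketch below assumes the sender states which kind it is sending, so the receiver never has to guess. The normalize_content name and the :text / :binary flag are hypothetical.

```ruby
# Hypothetical helper: the caller declares whether content is text or
# opaque binary; nothing is guessed.
def normalize_content(bytes, kind)
  case kind
  when :text
    # Text must already be valid UTF-8; anything else is an error,
    # not a cue to try Latin-1.
    s = bytes.dup.force_encoding(Encoding::UTF_8)
    raise ArgumentError, "text content is not valid UTF-8" unless s.valid_encoding?
    s
  when :binary
    # Raw bytes: tag as BINARY (ASCII-8BIT), i.e. no character encoding.
    bytes.dup.force_encoding(Encoding::BINARY)
  else
    raise ArgumentError, "unknown content kind: #{kind.inspect}"
  end
end

greeting = normalize_content("héllo".b, :text)
blob     = normalize_content([0xFF, 0x00, 0x80].pack("C*"), :binary)
p greeting.encoding # prints #<Encoding:UTF-8>
p blob.bytes        # prints [255, 0, 128]
```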


Oh, and since PSON is JSON, the *standard* (RFC 4627) says the text must be
"Unicode", with UTF-8 as the default.  That allows UTF-16 and friends, but
*not* Latin-1 out of the box.
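
RFC 4627 section 3 also spells out how a receiver can tell which Unicode encoding it has been handed. The first two characters of a JSON text are always ASCII, so the pattern of zero bytes in the first four octets identifies UTF-8, UTF-16 or UTF-32 and the byte order. Below is a Ruby sketch of that rule; the function names are hypothetical. Latin-1 input simply shows up as malformed UTF-8 and is rejected.

```ruby
# RFC 4627, section 3: because the first two characters of a JSON text
# are always ASCII, the positions of zero bytes among the first four
# octets reveal the encoding.
def detect_json_encoding(bytes)
  zeros = bytes.byteslice(0, 4).bytes.map(&:zero?)
  return Encoding::UTF_8 if zeros.length < 4 # too short to tell; use the default

  case zeros
  when [true,  true,  true,  false] then Encoding::UTF_32BE # 00 00 00 xx
  when [true,  false, true,  false] then Encoding::UTF_16BE # 00 xx 00 xx
  when [false, true,  true,  true]  then Encoding::UTF_32LE # xx 00 00 00
  when [false, true,  false, true]  then Encoding::UTF_16LE # xx 00 xx 00
  else Encoding::UTF_8                                      # xx xx xx xx
  end
end

# Decode a JSON text to UTF-8, refusing anything that is not valid in
# the detected Unicode encoding (e.g. Latin-1 bytes posing as UTF-8).
def decode_json_text(bytes)
  enc = detect_json_encoding(bytes)
  text = bytes.dup.force_encoding(enc)
  raise ArgumentError, "JSON text is not valid #{enc}" unless text.valid_encoding?
  text.encode(Encoding::UTF_8)
end

p detect_json_encoding('{"a":1}'.encode("UTF-16LE").b) # prints #<Encoding:UTF-16LE>
```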

> We should be able to put binary data like this and Puppet should do what is
> necessary to transport the data verbatim to the other end:
>
> file {
>  "/tmp/binary": content => file("binary-file")
> }
>
> Note, we have the exact same issue with templates. We should be able to
> transport templates verbatim independently of their encodings.

*nod*  To my mind the best approach would be:

1. Allow some way to specify the manifest file (*.pp) encoding, but decode
   into whatever internal Unicode representation your Ruby uses[1] while
   reading, so that everything (AST, resources, etc.) is internal Unicode.

2. Allow some specification of the encoding for types and providers:

   # ...or do I want 'filename_encoding' and 'content_encoding'?
   file { "/etc/mëtäl": encoding => 'ISO-8859-1', content => 'foo' }
   File { encoding => 'ISO-8859-15' }

3. Do our level best to make it easy to get encoding handling right when you
   write Ruby code in the Puppet world, get it right in the shipped types and
   providers, and just assume Unicode everywhere else.
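
Taken together, items 1 and 2 mean a resource stays UTF-8 all the way through and only becomes target-encoded bytes at the provider. The sketch below covers that last step. It uses the filename_encoding / content_encoding split floated in the comment above. FileResource, render_for_disk and both parameters are hypothetical; they are not existing Puppet types or attributes.

```ruby
# Hypothetical resource shape: path and content are internal UTF-8
# strings; the two encoding fields are nil when no override is given.
FileResource = Struct.new(:path, :content, :filename_encoding, :content_encoding)

# Provider edge: transcode each field to its target encoding and return
# raw bytes (ASCII-8BIT) ready for the filesystem. Characters with no
# mapping in the target raise Encoding::UndefinedConversionError rather
# than being silently mangled.
def render_for_disk(res)
  path_bytes    = res.path.encode(res.filename_encoding || Encoding::UTF_8).b
  content_bytes = res.content.encode(res.content_encoding || Encoding::UTF_8).b
  [path_bytes, content_bytes]
end

metal = FileResource.new("/etc/mëtäl", "foo", "ISO-8859-1", nil)
path, _content = render_for_disk(metal)
p path.bytes.last(4) # prints [235, 116, 228, 108]

# The euro sign exists in ISO-8859-15 (byte 0xA4) but not in ISO-8859-1.
euro = FileResource.new("/tmp/price", "€5", nil, "ISO-8859-15")
p render_for_disk(euro)[1].bytes # prints [164, 53]
```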

> Now, we should be clear in the documentation (if that's not the case) if we
> require double or single quoted string in manifests to be UTF-8 or ascii
> only. Of course, it would be great if puppet could auto-detect the manifests
> encoding :)

I would strongly suggest a rule of UTF-8 unless the file carries an explicit
encoding tag, or the user has otherwise specified an encoding explicitly.
Magic tends toward pain and suffering here.
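
That rule fits in a few lines. The sketch below makes several assumptions. The first-line tag is hypothetical, modelled on Ruby's own "# encoding: NAME" magic comment. The file encoding is assumed to be ASCII-compatible. A tag in the file is assumed to win over a user-supplied setting. The names manifest_encoding and read_manifest are made up. The content itself is never sniffed.

```ruby
# Hypothetical tag syntax, borrowed from Ruby's magic comment.
ENCODING_TAG = /\A#.*?\bencoding:\s*([\w.-]+)/

# Pick the manifest encoding: explicit tag, else explicit user setting,
# else UTF-8. Encoding.find raises ArgumentError for an unknown name.
def manifest_encoding(raw, user_setting = nil)
  first_line = raw.b.each_line.first.to_s
  if (m = ENCODING_TAG.match(first_line))
    Encoding.find(m[1])
  elsif user_setting
    Encoding.find(user_setting)
  else
    Encoding::UTF_8
  end
end

# Decode the manifest once, at the edge, into internal UTF-8; invalid
# input is an error rather than something to guess about.
def read_manifest(raw, user_setting = nil)
  enc = manifest_encoding(raw, user_setting)
  text = raw.dup.force_encoding(enc)
  raise ArgumentError, "manifest is not valid #{enc}" unless text.valid_encoding?
  text.encode(Encoding::UTF_8)
end

tagged = "# encoding: ISO-8859-1\nfile { '/etc/mëtäl': }\n"
p manifest_encoding(tagged.encode("ISO-8859-1")) # prints #<Encoding:ISO-8859-1>
```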


Thankfully the world is getting better at just using Unicode of some sort,
so local 8-bit encodings are less and less common.  Supporting them without
going insane is ... painful, at best, even when you control both ends of the
spectrum.

        Daniel

Just be thankful y'all are using Ruby, a language of the Unicode generation,
rather than Perl, a language that predates it and so is full of a thousand
half-baked solutions to little parts of it, all treading on each other and
breaking things all over the place.

Footnotes: 
[1]  I assume that MRI and JRuby both use the same internal encoding, but
     have not actually checked.

-- 
✣ Daniel Pittman            ✉ [email protected]            ☎ +61 401 155 707
               ♽ made with 100 percent post-consumer electrons

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Developers" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/puppet-dev?hl=en.
