Issue #4832 has been updated by Daniel Pittman.

Just a side note: RFC 4627, the JSON standard, requires that the content be 
Unicode text (UTF-8 by default), nothing else.
(This is one of the reasons that JSON is not a terribly efficient protocol for 
non-textual data :)

My personal preference would be to require UTF-8, and find some other solution 
to binary data transport, but I doubt that will fly.

The next least-worst solution would be to encode data that violates the JSON 
spec (e.g. binary data, including 8-bit encodings) either in the encoder or in 
the layer above it.  (Doing it in the encoder is probably the worse choice, 
because it forces any future encoder to be hacked to support this as well.)
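
To make the "layer above the encoder" idea concrete, here is a minimal sketch 
in Python rather than Puppet's Ruby. The names wrap_for_json, unwrap_from_json 
and the "__binary__" marker key are made up for illustration and are not part 
of PSON or Puppet. The stock JSON encoder is left untouched; only values that 
are not valid UTF-8 get wrapped as base64.

```python
import base64
import json

# Hypothetical marker key; nothing in PSON or Puppet defines this.
BINARY_KEY = "__binary__"

def wrap_for_json(value):
    """Return a JSON-safe form of a str or bytes value."""
    if isinstance(value, str):
        return value
    try:
        # Bytes that are valid UTF-8 pass through as ordinary text.
        return value.decode("utf-8")
    except UnicodeDecodeError:
        # Anything else is carried as base64 inside a marker object.
        return {BINARY_KEY: base64.b64encode(value).decode("ascii")}

def unwrap_from_json(value):
    """Reverse wrap_for_json for values that were wrapped as binary."""
    if isinstance(value, dict) and set(value) == {BINARY_KEY}:
        return base64.b64decode(value[BINARY_KEY])
    return value

# The standard encoder and parser are used unchanged.
payload = json.dumps({"content": wrap_for_json(b"\xff\xfe raw bytes")})
restored = unwrap_from_json(json.loads(payload)["content"])
```

Note the asymmetry in this sketch: bytes that happen to be valid UTF-8 come 
back as text, not bytes, so a real design would have to decide whether that 
is acceptable.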

(It is a pity that PSON insists on encoding valid UTF-8 content in the 
generator, since that is another inefficiency in using JSON, especially given 
that the parser will accept content outside that space.  It is an ugly 
work-around for other people writing poor code. :/ )
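
Assuming the remark above refers to the generator writing non-ASCII 
characters as \uXXXX escapes (that reading is an assumption on my part), the 
cost can be illustrated with Python's standard json module, which offers both 
behaviours. This is only an analogy, not PSON's code.

```python
import json

text = "na\u00efve caf\u00e9"  # two non-ASCII characters

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(text)

# Alternative: emit the characters directly, to be sent as UTF-8 bytes.
direct = json.dumps(text, ensure_ascii=False)

escaped_size = len(escaped.encode("utf-8"))
direct_size = len(direct.encode("utf-8"))

# Both forms parse back to the same string; only the wire size differs.
same = json.loads(escaped) == json.loads(direct) == text
```

The escaped form is larger for any text with non-ASCII characters, while a 
conforming parser accepts either form.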

----------------------------------------
Feature #4832: Character encodings support for PSON
https://projects.puppetlabs.com/issues/4832

Author: Felix Frank
Status: Ready for Testing
Priority: Normal
Assignee: Markus Roberts
Category: parser
Target version: 2.6.3
Affected Puppet version: 2.6.1
Keywords: encoding, pson, serialization, utf8
Branch: MarkusQ:tickets/2.6.x/4832


PSON is currently hardcoded to expect UTF8 in manifests and to output UTF8 on 
client machines.

The former is a genuine problem when manifests include non-UTF8 non-ASCII 
characters (e.g. in content arguments for files); the latter is annoying if 
client OSes or applications have no UTF8 support.

I'm working on a patch that will allow the user to control (via puppet.conf):
a) what non-UTF8 encoding the puppet master will accept as a fallback encoding 
(PSON currently throws errors when trying to serialize non-UTF8 characters);
b) to which encoding PSON will deserialize in the client.

I'm gunning for a "fallback" solution in the master, because a minimal change 
will leave the UTF8<->UTF16 as is and add an additional iconv layer. However, 
in most cases UTF8 is fine, because it handles pure ASCII without problems. So 
the chosen native encoding is really a fallback for strings that raise UTF8 
errors (because they are in fact not UTF8).
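
The fallback idea can be sketched roughly as follows, in Python rather than 
Ruby/iconv. The function names and the "latin-1" default are placeholders I 
picked for illustration; the actual patch and its puppet.conf settings may 
look quite different.

```python
def decode_with_fallback(raw, fallback="latin-1"):
    """Master side: try UTF-8 first, use the fallback only on error."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # The bytes are not valid UTF-8, so assume the configured encoding.
        return raw.decode(fallback)

def encode_for_client(text, target="latin-1"):
    """Client side: convert deserialized text to the local encoding."""
    # Characters the target encoding cannot represent become '?'.
    return text.encode(target, errors="replace")

utf8_input = decode_with_fallback(b"caf\xc3\xa9")   # valid UTF-8
latin1_input = decode_with_fallback(b"caf\xe9")     # not UTF-8, falls back
client_bytes = encode_for_client(latin1_input)
```

Pure ASCII and valid UTF-8 never reach the fallback branch, which matches the 
point that the native encoding only matters for strings that raise UTF8 
errors. One caveat: some non-UTF8 byte sequences happen to be valid UTF-8 and 
would be decoded that way without any error.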

Does this even make sense, or am I on the road to crazy town?

