Issue #4832 has been updated by Felix Frank.
To clarify, my current prototype patch against the 0.25.5 master is:
<pre>
--- puppet/external/pson/pure/generator.rb.orig 2010-09-20 17:59:26.000000000 +0200
+++ puppet/external/pson/pure/generator.rb.latin1 2010-09-22 14:46:13.000000000 +0200
@@ -374,6 +374,8 @@
# \u????.
def to_pson(*)
'"' << PSON.utf8_to_pson(self) << '"'
+ rescue GeneratorError
+ '"' << PSON.utf8_to_pson(Latin1toUTF8.iconv(self)) << '"'
end
# Module that holds the extinding methods if, the String module is
--- puppet/external/pson/pure.rb.orig 2010-09-22 13:59:25.000000000 +0200
+++ puppet/external/pson/pure.rb.latin1 2010-09-22 14:00:29.000000000 +0200
@@ -9,6 +9,8 @@
UTF16toUTF8 = Iconv.new('utf-8', 'utf-16be') # :nodoc:
# An iconv instance to convert from UTF16 Big Endian to UTF8.
UTF8toUTF16 = Iconv.new('utf-16be', 'utf-8') # :nodoc:
+ # An iconv instance to convert from Latin1 to UTF8.
+ Latin1toUTF8 = Iconv.new('utf-8', 'LATIN1') # :nodoc:
UTF8toUTF16.iconv('no bom')
rescue LoadError
# We actually don't care
</pre>
It allows for manifests containing Latin1 strings. A configurable patch should
work along the same lines.
Also note that the first prototype applied the Latin1 recoder unconditionally
to every string, which had hideous results for master performance.
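For readers who want to try the fallback idea outside of Puppet, here is a minimal sketch. It is not the patch itself: it assumes a Ruby with String#encode (1.9 or later) rather than the Iconv instances used above, and the helper name to_utf8_with_fallback is made up for illustration. Like the rescue clause in the patch, it only recodes when the input is not already valid UTF8.

```ruby
# Hypothetical helper, not part of Puppet or PSON.
# Returns a UTF-8 string. The fallback encoding is only applied when the
# bytes are not valid UTF-8, so well-formed input pays for a validity
# check and nothing more.
def to_utf8_with_fallback(str, fallback = Encoding::ISO_8859_1)
  utf8 = str.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  # Reinterpret the raw bytes as the fallback encoding, then transcode.
  str.dup.force_encoding(fallback).encode(Encoding::UTF_8)
end

latin1_input = "caf\xE9".b      # e-acute as a single Latin1 byte
utf8_input   = "caf\xC3\xA9".b  # the same character as two UTF-8 bytes

recoded   = to_utf8_with_fallback(latin1_input)
unchanged = to_utf8_with_fallback(utf8_input)
```

Both calls should yield the same UTF8 string. Note that a Latin1 string whose bytes happen to form valid UTF8 would pass through unrecoded, which is an inherent limit of any fallback scheme of this kind.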
----------------------------------------
Feature #4832: Character encodings support for PSON
http://projects.puppetlabs.com/issues/4832
Author: Felix Frank
Status: Unreviewed
Priority: Normal
Assignee:
Category: parser
Target version: 2.6.x
Affected version: 2.6.1
Keywords: encoding, pson, serialization, utf8
Branch:
PSON is currently hardcoded to expect UTF8 in manifests and output UTF8 on
client machines.
The former is a genuine problem when manifests include non-UTF8 non-ASCII
characters (e.g. in content arguments for files); the latter is annoying if
client OSes or applications have no UTF8 support.
I'm working on a patch that will allow the user to control (via puppet.conf):
a) which non-UTF8 encoding the puppet master will accept as a fallback encoding
(PSON currently throws errors when trying to serialize non-UTF8 characters)
b) which encoding PSON will deserialize to on the client
I'm gunning for a "fallback" solution in the master, because a minimal change
will leave the UTF8<->UTF16 as is and add an additional iconv layer. However,
in most cases UTF8 is fine, because it handles pure ASCII without problems. So
the chosen native encoding is really a fallback for strings that raise UTF8
errors (because they are in fact not UTF8).
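Point b), the client side, could look roughly like the sketch below. This is an assumption about how such a conversion might be done, not existing PSON behaviour: the helper name from_utf8 is hypothetical, the target encoding is passed as a plain argument rather than read from puppet.conf, and characters the target encoding cannot represent are replaced with "?" (a real patch might prefer to raise instead).

```ruby
# Hypothetical helper, not part of Puppet or PSON.
# Converts a UTF-8 string to the encoding the client wants to work with.
# Characters that do not exist in the target encoding become "?".
def from_utf8(str, target = Encoding::ISO_8859_1)
  str.encode(target, invalid: :replace, undef: :replace, replace: "?")
end

representable   = from_utf8("caf\u00E9")  # e-acute exists in Latin1
unrepresentable = from_utf8("\u20AC")     # the euro sign does not
```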
Does this even make sense, or am I on the road to crazy town?