Issue #4832 has been updated by Felix Frank.

To clarify, my current prototype patch against the 0.25.5 master is:
<pre>
--- puppet/external/pson/pure/generator.rb.orig 2010-09-20 17:59:26.000000000 +0200
+++ puppet/external/pson/pure/generator.rb.latin1       2010-09-22 14:46:13.000000000 +0200
@@ -374,6 +374,8 @@
           # \u????.
           def to_pson(*)
             '"' << PSON.utf8_to_pson(self) << '"'
+          rescue GeneratorError
+            '"' << PSON.utf8_to_pson(Latin1toUTF8.iconv(self)) << '"'
           end
 
           # Module that holds the extinding methods if, the String module is
--- puppet/external/pson/pure.rb.orig   2010-09-22 13:59:25.000000000 +0200
+++ puppet/external/pson/pure.rb.latin1 2010-09-22 14:00:29.000000000 +0200
@@ -9,6 +9,8 @@
     UTF16toUTF8 = Iconv.new('utf-8', 'utf-16be') # :nodoc:
     # An iconv instance to convert from UTF16 Big Endian to UTF8.
     UTF8toUTF16 = Iconv.new('utf-16be', 'utf-8') # :nodoc:
+    # An iconv instance to convert from Latin1 to UTF8.
+    Latin1toUTF8 = Iconv.new('utf-8', 'LATIN1') # :nodoc:
     UTF8toUTF16.iconv('no bom')
   rescue LoadError
     # We actually don't care
</pre>
It allows for manifests containing Latin1: the Latin1 recoder only runs when 
the plain UTF8 conversion raises a GeneratorError. A configurable patch should 
work along the same lines.

Also note that the first prototype applied the Latin1 recoder unconditionally, 
which had hideous results for master performance.
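
For readers on a current Ruby, where Iconv is no longer part of the standard 
library, the same "try UTF8 first, recode only on failure" idea can be sketched 
with String#encode. This is only an illustration under assumptions: the helper 
name to_utf8_with_fallback is hypothetical and not part of PSON, and it checks 
validity up front instead of rescuing GeneratorError as the patch does.

```ruby
# Hypothetical helper (not part of PSON): return a valid UTF-8 copy of +str+.
# The string is first taken as UTF-8; only if its bytes are not valid UTF-8
# is it reinterpreted in the fallback encoding and transcoded.
def to_utf8_with_fallback(str, fallback = Encoding::ISO_8859_1)
  utf8 = str.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  # Slow path: treat the raw bytes as the fallback encoding, then transcode.
  str.dup.force_encoding(fallback).encode(Encoding::UTF_8)
end

latin1_bytes = "caf\xE9".b            # "café" as raw Latin1 bytes
to_utf8_with_fallback(latin1_bytes)   # => "café" (UTF-8)
to_utf8_with_fallback("plain ascii")  # returned as is, no transcoding
```

Valid UTF8 (including pure ASCII) never reaches the transcoding step, which is 
the property that avoids the performance hit of recoding every string.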
----------------------------------------
Feature #4832: Character encodings support for PSON
http://projects.puppetlabs.com/issues/4832

Author: Felix Frank
Status: Unreviewed
Priority: Normal
Assignee: 
Category: parser
Target version: 2.6.x
Affected version: 2.6.1
Keywords: encoding, pson, serialization, utf8
Branch: 


PSON is currently hardcoded to expect UTF8 in manifests and output UTF8 on 
client machines.

The former is a genuine problem when manifests include non-UTF8 non-ASCII 
characters (e.g. in content arguments for files); the latter is annoying if 
client OSes or applications have no UTF8 support.

I'm working on a patch that will allow the user to control (via puppet.conf):
a) which non-UTF8 encoding the puppet master will accept as a fallback encoding 
(PSON currently throws errors when trying to serialize non-UTF8 characters)
b) which encoding PSON will deserialize to on the client
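
As a rough illustration of point b), the client side could re-encode the UTF8 
strings it gets from PSON into a configured target encoding. The sketch below 
is an assumption, not existing Puppet or PSON code: the helper name 
encode_for_client is hypothetical, the target would come from whatever 
puppet.conf setting the patch ends up adding, and replacing unmappable 
characters with "?" is just one possible policy.

```ruby
# Hypothetical helper (not part of PSON): convert a UTF-8 string into the
# client's native encoding. Characters that do not exist in the target
# encoding are replaced with "?" instead of raising an error.
def encode_for_client(utf8_str, target = Encoding::ISO_8859_1)
  utf8_str.encode(target, undef: :replace, replace: "?")
end

encode_for_client("caf\u00E9").bytes  # => [99, 97, 102, 233]
encode_for_client("\u20AC").bytes     # => [63], the euro sign is not in Latin1
```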

I'm gunning for a "fallback" solution in the master, because a minimal change 
will leave the UTF8<->UTF16 conversion as is and just add an additional iconv 
layer. In most cases UTF8 is fine, because pure ASCII is already valid UTF8. So 
the chosen native encoding is really a fallback for strings that raise UTF8 
errors (because they are in fact not UTF8).

Does this even make sense, or am I on the road to crazy town?

