Issue #4832 has been updated by Felix Frank.
Status changed from Needs design decision to Ready for Testing
After some discussion on the dev list and digging in the pson parser code, it's
becoming clear to me that this is actually a bug.
My guess is that reporting invalid UTF8 bytes is sound under ruby1.9, which
can enforce string encodings. Under ruby1.8, the generator should pass binary
non-UTF8 data through untouched and forgo any substitution.
I propose the following patch:
--- a/lib/puppet/external/pson/pure/generator.rb
+++ b/lib/puppet/external/pson/pure/generator.rb
@@ -69,10 +69,8 @@ module PSON
                         [\xc2-\xdf][\x80-\xbf]    |
                         [\xe0-\xef][\x80-\xbf]{2} |
                         [\xf0-\xf4][\x80-\xbf]{3}
-                      )+ |
-                      [\x80-\xc1\xf5-\xff]       # invalid
-                    )/nx) { |c|
-        c.size == 1 and raise GeneratorError, "invalid utf8 byte: '#{c}'"
+                      )+
+                    )/nx) { |c|
         s = PSON::UTF8toUTF16.iconv(c).unpack('H*')[0]
         s.gsub!(/.{4}/n, '\\\\u\&')
       }
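
To illustrate the intended effect of the patch, here is a rough standalone
sketch for a modern Ruby (1.9 or later). It is not the PSON code itself:
the name escape_multibyte is hypothetical, and String#encode stands in for
the PSON::UTF8toUTF16 iconv converter used in the generator. Byte sequences
that look like multibyte UTF8 are escaped as \uXXXX, and every other byte,
including stray high bytes, is left alone instead of raising an error.

```ruby
# Hypothetical helper, not part of PSON. Same byte pattern as in the patch,
# without the "invalid" alternative that the patch removes.
UTF8_MULTIBYTE = /(?:
    [\xc2-\xdf][\x80-\xbf]    |
    [\xe0-\xef][\x80-\xbf]{2} |
    [\xf0-\xf4][\x80-\xbf]{3}
  )+/nx

def escape_multibyte(string)
  # Work on raw bytes so the byte-oriented regex can be applied safely.
  bytes = string.dup.force_encoding(Encoding::BINARY)
  bytes.gsub(UTF8_MULTIBYTE) do |c|
    # Assumption: String#encode replaces the iconv UTF8->UTF16 step.
    hex = c.dup.force_encoding(Encoding::UTF_8)
           .encode(Encoding::UTF_16BE).unpack('H*')[0]
    # Prefix every group of four hex digits with a literal \u.
    hex.gsub(/.{4}/, '\\\\u\&')
  end
end
```

The byte pattern is only an approximation of valid UTF8 (it accepts some
overlong forms, for example), so String#encode can still raise on such input.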
----------------------------------------
Feature #4832: Character encodings support for PSON
http://projects.puppetlabs.com/issues/4832
Author: Felix Frank
Status: Ready for Testing
Priority: Normal
Assignee:
Category: parser
Target version: 2.6.x
Affected version: 2.6.1
Keywords: encoding, pson, serialization, utf8
Branch:
PSON is currently hardcoded to expect UTF8 in manifests and output UTF8 on
client machines.
The former is a genuine problem when manifests include non-ASCII characters
in a non-UTF8 encoding (e.g. in content arguments for files); the latter is
annoying if client OSes or applications have no UTF8 support.
I'm working on a patch that will allow the user to control (via puppet.conf):
a) which non-UTF8 encoding the puppet master will accept as a fallback encoding
   (PSON currently throws errors when trying to serialize non-UTF8 characters)
b) to which encoding PSON will deserialize on the client
I'm gunning for a "fallback" solution in the master, because it is a minimal
change: the UTF8<->UTF16 conversion stays as is and only an additional iconv
layer is added. In most cases UTF8 is fine, because it handles pure ASCII
without problems. So the chosen native encoding is really only a fallback for
strings that raise UTF8 errors (because they are in fact not UTF8).
Does this even make sense, or am I on the road to crazy town?
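
A minimal sketch of the fallback idea described above, written for a modern
Ruby (1.9 or later) rather than for the iconv layer the actual patch would
add. The method name to_utf8_with_fallback and the ISO-8859-1 default are
assumptions for illustration only. In the real patch the fallback encoding
would come from puppet.conf.

```ruby
# Hypothetical helper: the name and the default fallback are assumptions.
def to_utf8_with_fallback(string, fallback = 'ISO-8859-1')
  candidate = string.dup.force_encoding(Encoding::UTF_8)
  # Pure ASCII and well-formed UTF8 are accepted unchanged.
  return candidate if candidate.valid_encoding?

  # Otherwise treat the same bytes as the fallback encoding and transcode.
  string.dup.force_encoding(fallback).encode(Encoding::UTF_8)
end
```

With this sketch, to_utf8_with_fallback("caf\xE9".b) returns the UTF8 string
"café", while input that is already valid UTF8 comes back unchanged.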