On 09/30/2010 05:40 PM, Markus Roberts wrote:
> Felix --
>
>> What I don't really understand is how the serialization really works.
>
>> If I read this right, each 16bit word (UTF16be encoded) is sent as
>> '\uXXXX' where XXXX is the hex representation?
>> So, "invalid" bytes should get a '\xXX' representation instead? (Without
>> attempting to convert to UTF16 of course)
>>
>
> Basically, yes. Part of this is handled by iconv which a) isn't supported
> on all platforms and b) doesn't work consistently on the platforms where it
> is. I think we may need to just write a pure ruby handler and be done with
> it, but haven't overcome the mental hurdle of "but we shouldn't have to do
> that!"
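
Interesting: I had another look at the PSON parser, and apparently
unescaped bytes are fine in the data stream. The attached patch (against
2.6.0) passes any byte that is not part of a valid UTF-8 sequence through
verbatim instead of raising GeneratorError, and it Just Works.
This is somewhat anticlimactic ;-)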
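
For anyone who wants to try the idea without iconv, here is a rough
pure-Ruby sketch of the same escaping rule. It is not the PSON code: the
names NON_ASCII and escape_non_ascii are made up for illustration, and it
assumes a Ruby with String#encode (1.9 or later) rather than the 1.8
install the patch targets. Multi-byte UTF-8 runs become \uXXXX words
(UTF-16BE, hex); single stray bytes are sent verbatim.

```ruby
# Hypothetical sketch, not part of PSON. Assumes Ruby >= 1.9 (String#encode).

# Same byte classes as the pattern in generator.rb: runs of multi-byte
# UTF-8 sequences, or a single byte that cannot start such a sequence.
NON_ASCII = /(?:
    (?:
      [\xc2-\xdf][\x80-\xbf]    |
      [\xe0-\xef][\x80-\xbf]{2} |
      [\xf0-\xf4][\x80-\xbf]{3}
    )+ |
    [\x80-\xc1\xf5-\xff]        # non-UTF8, send verbatim
  )/nx

def escape_non_ascii(string)
  # Work on raw bytes so that c.size below counts bytes, not characters.
  bytes = string.dup.force_encoding(Encoding::BINARY)
  bytes.gsub(NON_ASCII) do |c|
    if c.size == 1
      c # stray byte: pass through unchanged
    else
      # Convert the run to UTF-16BE and emit each 16-bit word as \uXXXX.
      # Note: overlong or surrogate encodings that slip past the pattern
      # would make encode raise here; this sketch does not handle that.
      hex = c.force_encoding(Encoding::UTF_8).encode(Encoding::UTF_16BE).unpack('H*')[0]
      hex.gsub(/.{4}/, '\\\\u\&')
    end
  end
end

puts escape_non_ascii("caf\u00e9")  # prints caf\u00e9
```

Characters outside the BMP come out as a surrogate pair (two \uXXXX
words), since the hex dump of the UTF-16BE bytes is simply chopped into
four-digit groups.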
Cheers,
Felix
--- /usr/lib/ruby/1.8/puppet/external/pson/pure/generator.rb.orig 2010-09-30 16:55:19.000000000 +0200
+++ /usr/lib/ruby/1.8/puppet/external/pson/pure/generator.rb 2010-09-30 17:38:50.000000000 +0200
@@ -70,11 +70,14 @@
[\xe0-\xef][\x80-\xbf]{2} |
[\xf0-\xf4][\x80-\xbf]{3}
)+ |
- [\x80-\xc1\xf5-\xff] # invalid
+ [\x80-\xc1\xf5-\xff] # non-UTF8, send verbatim
)/nx) { |c|
- c.size == 1 and raise GeneratorError, "invalid utf8 byte: '#{c}'"
- s = PSON::UTF8toUTF16.iconv(c).unpack('H*')[0]
- s.gsub!(/.{4}/n, '\\\\u\&')
+ if c.size == 1
+ c
+ else
+ s = PSON::UTF8toUTF16.iconv(c).unpack('H*')[0]
+ s.gsub!(/.{4}/n, '\\\\u\&')
+ end
}
string
rescue Iconv::Failure => e