On 09/30/2010 05:40 PM, Markus Roberts wrote:
> Felix --
> 
>>  What I don't really understand is how the serialization really works.
> 
>> If I read this right, each 16bit word (UTF16be encoded) is sent as
>> '\uXXXX' where XXXX is the hex representation?
>> So, "invalid" bytes should get a '\xXX' representation instead? (Without
>> attempting to convert to UTF16 of course)
>>
> 
> Basically, yes.  Part of this is handled by iconv which a) isn't supported
> on all platforms and b) doesn't work consistently on the platforms where it
> is.  I think we may need to just write a pure ruby handler and be done with
> it, but haven't overcome the mental hurdle of "but we shouldn't have to do
> that!"
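
For reference, the '\uXXXX' escaping described above can be done without
iconv. The following is only a sketch of what a pure-Ruby handler might look
like, not the PSON implementation. It assumes a Ruby with String#encode
(1.9 or later, so not the 1.8 install this thread targets) and input that is
already valid UTF-8. The name escape_utf16 is made up for illustration.

```ruby
# Hypothetical helper, not part of PSON: replace every non-ASCII character
# of a valid UTF-8 string with its UTF-16BE code units written as \uXXXX.
def escape_utf16(str)
  str.gsub(/[^\x00-\x7f]/) do |ch|
    # 'n*' unpacks big-endian 16-bit words, i.e. the UTF-16BE code units.
    units = ch.encode('UTF-16BE').unpack('n*')
    units.map { |unit| format('\u%04x', unit) }.join
  end
end

puts escape_utf16("caf\u00e9")     # prints caf\u00e9 (a literal backslash-u)
puts escape_utf16("\u{1F600}")     # prints \ud83d\ude00 (a surrogate pair)
```

Characters outside the Basic Multilingual Plane come out as two escapes,
because UTF-16 encodes them as a surrogate pair.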

Interesting: I had another look at the PSON parser, and apparently it
accepts unescaped bytes in the data stream. The attached patch (against
2.6.0) passes non-UTF-8 bytes through verbatim instead of raising a
GeneratorError, and it Just Works.

This is somewhat anticlimactic ;-)
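
To illustrate the behaviour the patch is aiming for, here is a rough
standalone sketch. It is not the generator code itself: escape_lenient is a
hypothetical name, it uses String#encode instead of iconv (so it needs Ruby
1.9 or later), and it treats any byte that is not part of a valid UTF-8
character as "send verbatim", which is somewhat broader than the byte ranges
the patch matches.

```ruby
# Hypothetical stand-in for the patched logic, not the PSON generator:
# valid multibyte UTF-8 characters become \uXXXX, anything else is copied as-is.
def escape_lenient(bytes)
  out = String.new                         # empty binary (ASCII-8BIT) buffer
  bytes.dup.force_encoding('UTF-8').each_char do |ch|
    if ch.valid_encoding? && !ch.ascii_only?
      units = ch.encode('UTF-16BE').unpack('n*')
      out << units.map { |unit| format('\u%04x', unit) }.join
    else
      out << ch.b                          # ASCII or stray byte: keep verbatim
    end
  end
  out
end

# "caf" + UTF-8 e-acute + space + a stray 0xFF byte
result = escape_lenient("caf\xC3\xA9 \xFF".b)
p result.bytes.last                        # the 0xFF byte survives unchanged
```

The result is a binary string: the stray bytes are still in it, so it is no
longer guaranteed to be valid UTF-8.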

Cheers,
Felix


--- /usr/lib/ruby/1.8/puppet/external/pson/pure/generator.rb.orig	2010-09-30 16:55:19.000000000 +0200
+++ /usr/lib/ruby/1.8/puppet/external/pson/pure/generator.rb	2010-09-30 17:38:50.000000000 +0200
@@ -70,11 +70,14 @@
           [\xe0-\xef][\x80-\xbf]{2} |
           [\xf0-\xf4][\x80-\xbf]{3}
             )+ |
-            [\x80-\xc1\xf5-\xff]       # invalid
+            [\x80-\xc1\xf5-\xff]       # non-UTF8, send verbatim
               )/nx) { |c|
-        c.size == 1 and raise GeneratorError, "invalid utf8 byte: '#{c}'"
-        s = PSON::UTF8toUTF16.iconv(c).unpack('H*')[0]
-        s.gsub!(/.{4}/n, '\\\\u\&')
+        if c.size == 1
+          c
+        else
+          s = PSON::UTF8toUTF16.iconv(c).unpack('H*')[0]
+          s.gsub!(/.{4}/n, '\\\\u\&')
+        end
       }
       string
     rescue Iconv::Failure => e
