Issue Type: Bug
Affects Versions: JRuby 1.7.0
Assignee: Unassigned
Components: Encoding
Created: 16/Nov/12 10:07 PM
Description:

While tracking down an encoding problem in the Mail gem, I found a case where transcoding from Windows-1252 to UTF-8 loses information.

Specifically:

# MRI Ruby 1.9.3
str = "=96".unpack("M")[0].force_encoding('Windows-1252').encode('UTF-8')
str.codepoints.to_a
=> [8211]

# JRuby 1.7.0
str = "=96".unpack("M")[0].force_encoding('Windows-1252').encode('UTF-8')
str.codepoints.to_a
=> [150]

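JRuby converts the Windows-1252 byte \x96 (an en dash) to \u0096, which is a control character, instead of \u2013 (en dash). In other words, the byte value 0x96 (150) is carried over unchanged as the codepoint instead of being mapped through the Windows-1252 table.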
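
A minimal sketch of a check that could be run on both interpreters to confirm the behaviour. The helper name cp1252_codepoints is hypothetical and not part of the Mail gem or JRuby; the extra byte values are standard Windows-1252 mappings added only for illustration.

```ruby
# Hypothetical helper: interpret raw bytes as Windows-1252 and return
# the codepoints of the UTF-8 transcoded result.
def cp1252_codepoints(bytes)
  bytes.dup.force_encoding('Windows-1252').encode('UTF-8').codepoints.to_a
end

# Windows-1252 bytes in the 0x80-0x9F range and the Unicode codepoints
# they are defined to map to.
EXPECTED = {
  0x80 => 0x20AC, # euro sign
  0x93 => 0x201C, # left double quotation mark
  0x96 => 0x2013, # en dash (the case from this report)
  0x97 => 0x2014  # em dash
}

EXPECTED.each do |byte, codepoint|
  actual = cp1252_codepoints([byte].pack('C'))
  status = actual == [codepoint] ? 'ok' : 'MISMATCH'
  puts format('0x%02X -> %s (expected [%d]) %s', byte, actual.inspect, codepoint, status)
end
```

On an interpreter with a correct transcoder every line should report ok; given the output above, JRuby 1.7.0 would be expected to report a mismatch for at least 0x96.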

Let me know if you need any other information.

Environment: Mac OSX 10.8.2
Project: JRuby
Labels: jruby i18n web
Priority: Major
Reporter: Mikel Lindsaar