UTF-8 chacters don't pass through hpricot gracefully since Jruby 1.1.6
----------------------------------------------------------------------
Key: JRUBY-3732
URL: http://jira.codehaus.org/browse/JRUBY-3732
Project: JRuby
Issue Type: Bug
Affects Versions: JRuby 1.3, JRuby 1.2
Environment: Linux x64
jruby 1.3.0 (ruby 1.8.6p287) (2009-06-03 5dc2e22) (Java HotSpot(TM) 64-Bit
Server VM 1.6.0_02) [amd64-java]
jruby 1.2.0 (ruby 1.8.6 patchlevel 287) (2009-03-16 rev 9419) [amd64-java]
jruby 1.1.6 (ruby 1.8.6 patchlevel 114) (2008-12-17 rev 8388) [amd64-java]
*** LOCAL GEMS ***
hpricot (0.6.164)
Reporter: David Kellum
Assignee: Thomas E Enebo
UTF-8 characters no longer pass gracefully through hpricot (after jruby 1.1.6)
The following code sample, tested with UTF-8 encoding, has input string
containing unicode mdash:
{code:ruby}
require 'rubygems'
require 'hpricot'
input = "<p>TUCSON, Ariz. The driver</p>"
puts input
doc = Hpricot.parse( input )
puts doc.inner_html
{code:ruby}
Here is comparative output:
{code}
% ruby ./utf8_sample_2.rb
<p>TUCSON, Ariz. The driver</p>
<p>TUCSON, Ariz. The driver</p>
david) /opt/dist/jruby-1.1.6/bin/jruby ./utf8_sample_2.rb
<p>TUCSON, Ariz. The driver</p>
<p>TUCSON, Ariz. The driver</p>
% /opt/dist/jruby-1.2.0/bin/jruby ./utf8_sample_2.rb
<p>TUCSON, Ariz. The driver</p>
<p>TUCSON, Ariz. — The driver</p>
% /opt/dist/jruby-1.3.0/bin/jruby ./utf8_sample_2.rb
<p>TUCSON, Ariz. The driver</p>
<p>TUCSON, Ariz. — The driver</p>
{code}
Where jruby 1.2.0 and 1.3.0 show a mangled mdash (—).
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://jira.codehaus.org/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe from this list, please visit:
http://xircles.codehaus.org/manage_email