Charles Oliver Nutter created JRUBY-6403:
--------------------------------------------
Summary: Regexp + encoding errors in REXML
Key: JRUBY-6403
URL: https://jira.codehaus.org/browse/JRUBY-6403
Project: JRuby
Issue Type: Bug
Components: Core Classes/Modules
Reporter: Charles Oliver Nutter
The attached script triggers encoding mismatch errors during regexp matching.
It also fails later, when REXML tries to construct the exception, because the
contents of the message are incorrectly encoded. I had to add some logging to
REXML's parseexception.rb to get the actual errors to print out:
{noformat}
diff --git a/lib/ruby/1.9/rexml/parseexception.rb b/lib/ruby/1.9/rexml/parseexception.rb
index 0c4d55a..9a2d885 100644
--- a/lib/ruby/1.9/rexml/parseexception.rb
+++ b/lib/ruby/1.9/rexml/parseexception.rb
@@ -21,6 +21,11 @@ module REXML
end
# Get the stack trace and error message
+ puts err
+ p err.encoding
+ s = super
+ puts s
+ p s.encoding
err << super
# Add contextual information
{noformat}
It seems we're still having some encoding mismatch problems.
My full output (with extra logging) follows:
{noformat}
UTF-8
#<Encoding::CompatibilityError: incompatible encoding regexp match (UTF-8
regexp with ASCII-8BIT string)>
org/jruby/RubyRegexp.java:1504:in `match'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/source.rb:210:in `match'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:419:in
`pull_event'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:183:in
`pull'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/treeparser.rb:22:in
`parse'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/document.rb:231:in `build'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/document.rb:43:in `initialize'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:965:in
`parse'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:164:in
`xml_in'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:203:in
`xml_in'
test.rb:13:in `(root)'
...
#<Encoding:UTF-8>
Exception parsing
#<Encoding:US-ASCII>
#<Encoding::CompatibilityError: incompatible encoding regexp match (UTF-8
regexp with ASCII-8BIT string)>
org/jruby/RubyRegexp.java:1504:in `match'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/source.rb:210:in `match'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:419:in
`pull_event'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:183:in
`pull'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/treeparser.rb:22:in
`parse'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/document.rb:231:in `build'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/document.rb:43:in `initialize'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:965:in
`parse'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:164:in
`xml_in'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:203:in
`xml_in'
test.rb:13:in `(root)'
...
#<Encoding:UTF-8>
Exception parsing
#<Encoding:US-ASCII>
#<REXML::ParseException: #<Encoding::CompatibilityError: incompatible encoding
regexp match (UTF-8 regexp with ASCII-8BIT string)>
org/jruby/RubyRegexp.java:1504:in `match'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/source.rb:210:in `match'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:419:in
`pull_event'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:183:in
`pull'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/treeparser.rb:22:in
`parse'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/document.rb:231:in `build'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/document.rb:43:in `initialize'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:965:in
`parse'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:164:in
`xml_in'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:203:in
`xml_in'
test.rb:13:in `(root)'
...
Exception parsing
Line: 4
Position: 94
Last 80 unconsumed characters:
<!-- Savi žemės unitai -->>
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:427:in
`pull_event'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:183:in
`pull'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/treeparser.rb:22:in
`parse'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/document.rb:231:in `build'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/document.rb:43:in `initialize'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:965:in
`parse'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:164:in
`xml_in'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:203:in
`xml_in'
test.rb:13:in `(root)'
...
#<Encoding:UTF-8>
#<Encoding::CompatibilityError: incompatible encoding regexp match (UTF-8
regexp with ASCII-8BIT string)>
org/jruby/RubyRegexp.java:1504:in `match'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/source.rb:210:in `match'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:419:in
`pull_event'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:183:in
`pull'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/treeparser.rb:22:in
`parse'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/document.rb:231:in `build'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/document.rb:43:in `initialize'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:965:in
`parse'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:164:in
`xml_in'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:203:in
`xml_in'
test.rb:13:in `(root)'
...
Exception parsing
Line: 4
Position: 94
Last 80 unconsumed characters:
<!-- Savi emės unitai -->
#<Encoding:ASCII-8BIT>
Encoding::CompatibilityError: incompatible character encodings: UTF-8 and
ASCII-8BIT
concat at org/jruby/RubyString.java:2521
to_s at
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parseexception.rb:29
message at org/jruby/RubyException.java:266
{noformat}
Note that the last error listed is the one from attempting to append the
superclass exception's to_s result to the current error.
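
For reference, both error classes in the output above can be provoked in plain Ruby without REXML. This is only a minimal sketch, not the attached script: the sample text, the pattern, and the capture_error helper are hypothetical stand-ins, and it assumes the underlying cause is a string carrying non-ASCII bytes while tagged ASCII-8BIT.

```ruby
# Hypothetical stand-ins for the XML source text and the parser's regexp.
utf8_text   = "<!-- Savi \u017Eem\u0117s unitai -->"            # UTF-8 with non-ASCII characters
binary_text = utf8_text.dup.force_encoding(Encoding::ASCII_8BIT) # same bytes, tagged as binary
utf8_regexp = Regexp.new("\u017Eem\u0117s")                     # non-ASCII pattern, so the regexp is fixed to UTF-8

# Helper (hypothetical): run the block and return the encoding error, or nil.
def capture_error
  yield
  nil
rescue Encoding::CompatibilityError => e
  e
end

# 1. Matching a UTF-8 regexp against a binary string with high bytes.
match_error = capture_error { utf8_regexp.match(binary_text) }

# 2. Appending a binary string with high bytes to a UTF-8 string,
#    comparable to the `err << super` line in parseexception.rb.
concat_error = capture_error { utf8_text.dup << binary_text }

# Retagging the same bytes as UTF-8 lets the match succeed.
retagged          = binary_text.dup.force_encoding(Encoding::UTF_8)
match_after_retag = utf8_regexp.match(retagged)

puts match_error.class
puts concat_error.class
puts match_after_retag[0].encoding
```

Both failures depend on the strings containing non-ASCII bytes; with pure 7-bit content, Ruby treats the two encodings as compatible and neither call raises.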