ÿ breaks jruby lexer, EOF == -1 collides with signed bytes.
-----------------------------------------------------------
Key: JRUBY-5089
URL: http://jira.codehaus.org/browse/JRUBY-5089
Project: JRuby
Issue Type: Bug
Components: Interpreter
Affects Versions: JRuby 1.5
Reporter: Xore Ander
The Latin-1 character ÿ has the byte value 0xff.
When reading a .rb file that contains this character (e.g.,
encoding_test_value = "aeiou and sometimes ÿ"), the JRuby interpreter reads it
as an EOF and stops reading the .rb file. This occurs even if the file is
marked with # coding: latin 1
The problem is in the lexer: ByteArrayCursor::read() returns the int value -1
to signal EOF, but the byte stream is made of signed bytes, so the byte 0xff
is sign-extended to the int -1 and becomes indistinguishable from EOF.
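The collision can be reproduced outside JRuby. The following is a minimal sketch, not JRuby code: the class SignedByteEofDemo and its methods are hypothetical names used only to show how a Java byte holding 0xff widens to -1, and how masking avoids that.

```java
// Hypothetical demo class; not part of JRuby.
final class SignedByteEofDemo {
    static final int EOF = -1;

    // Widening a Java byte to int sign-extends it: the byte 0xff becomes -1.
    static int readSigned(byte[] region, int index) {
        return region[index];
    }

    // Masking with 0xff keeps only the low 8 bits: the byte 0xff becomes 255.
    static int readUnsigned(byte[] region, int index) {
        return region[index] & 0xff;
    }

    public static void main(String[] args) {
        byte[] region = { 'a', (byte) 0xff, 'b' };
        // The signed read is indistinguishable from the EOF sentinel.
        System.out.println(readSigned(region, 1) == EOF);
        // The masked read is not.
        System.out.println(readUnsigned(region, 1) == EOF);
    }
}
```

The first comparison is true and the second is false, which matches the behaviour described above: the lexer sees the sentinel value in the middle of the file.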
I fixed the problem locally as follows:
src\org\jruby\lexer\yacc\ByteArrayLexerSource.java:
ByteArrayLexerSource::ByteArrayCursor::public int read()
- return forward(region[index++]);
+ return forward(region[index++] & 0xff);
This masks the byte to its unsigned value (0 to 255) as it is widened to an
int, so it no longer collides with EOF.
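To show the effect of the mask on a whole read loop, here is a simplified sketch. SketchCursor is a hypothetical stand-in, not the real ByteArrayCursor: it omits forward() and any other state the JRuby class carries, and only assumes that read() returns -1 at the end of the region.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified cursor; the real ByteArrayCursor in JRuby has more state.
final class SketchCursor {
    static final int EOF = -1;
    private final byte[] region;
    private int index = 0;

    SketchCursor(byte[] region) {
        this.region = region;
    }

    // Returns the next byte as a value in 0..255, or EOF once the region is exhausted.
    int read() {
        if (index >= region.length) {
            return EOF;
        }
        return region[index++] & 0xff;
    }

    // Counts how many bytes are returned before EOF is reported.
    static int countBytes(byte[] data) {
        SketchCursor cursor = new SketchCursor(data);
        int count = 0;
        while (cursor.read() != EOF) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Latin-1 encodes ÿ as the single byte 0xff.
        byte[] source = "x = \"\u00ff\" # tail".getBytes(StandardCharsets.ISO_8859_1);
        // With the mask, the loop consumes every byte, including the one after ÿ.
        System.out.println(countBytes(source) == source.length);
    }
}
```

With the mask in place, EOF is only reported when the index runs past the end of the region, so a 0xff byte in the middle of the source no longer truncates the read.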
I haven't determined whether InputStreamLexerSource.java or the other cursors
and functions in this file (ByteArrayCursor::at(), PushbackCursor::read() and
at()) are affected as well in other contexts. It might be worth looking into.