ÿ breaks jruby lexer, EOF == -1 collides with signed bytes.
-----------------------------------------------------------

                 Key: JRUBY-5089
                 URL: http://jira.codehaus.org/browse/JRUBY-5089
             Project: JRuby
          Issue Type: Bug
          Components: Interpreter
    Affects Versions: JRuby 1.5
            Reporter: Xore Ander


The Latin-1 character ÿ has the byte value 0xff.

When reading a .rb file that contains this character (e.g. 
encoding_test_value = "aeiou and sometimes ÿ"), the JRuby interpreter treats 
it as EOF and stops reading the .rb file. This occurs even if the file is 
marked with # coding: latin 1

The problem is in the lexer: ByteArrayCursor::read() returns the int value -1 
to signal EOF, but the byte stream is made of signed bytes. The byte 0xff is 
therefore sign-extended to -1 when widened to an int, and the caller cannot 
tell it apart from EOF.
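
To illustrate the collision in isolation, here is a minimal sketch. It is not 
JRuby code: the class SignedByteEofDemo and the helper widen() are 
hypothetical names used only for this example.

```java
// Hypothetical demo class, not part of JRuby.
final class SignedByteEofDemo {
    static final int EOF = -1;

    // Returning a byte where an int is expected sign-extends it.
    static int widen(byte b) {
        return b;
    }

    public static void main(String[] args) {
        byte yUmlaut = (byte) 0xff;                // ÿ in Latin-1
        System.out.println(widen(yUmlaut));        // prints -1
        System.out.println(widen(yUmlaut) == EOF); // prints true
    }
}
```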

I fixed the problem locally as follows:

src\org\jruby\lexer\yacc\ByteArrayLexerSource.java:
ByteArrayLexerSource::ByteArrayCursor::public int read()
-             return forward(region[index++]);
+             return forward(region[index++] & 0xff);

This masks the byte down to its unsigned value (0 to 255) as it is widened to 
an int, so it no longer collides with EOF.
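
As a rough sketch of the same idea outside JRuby, the cursor below masks each 
byte in read(). The class MaskedCursorSketch, its fields, and countBytes() are 
hypothetical and only loosely modelled on the cursor described above; the real 
ByteArrayCursor also calls forward(), which is omitted here.

```java
// Hypothetical stand-in for a byte-array cursor, not the JRuby class.
final class MaskedCursorSketch {
    static final int EOF = -1;

    private final byte[] region;
    private int index = 0;

    MaskedCursorSketch(byte[] region) {
        this.region = region;
    }

    // Returns the next byte as 0..255, or EOF once the region is exhausted.
    int read() {
        if (index >= region.length) {
            return EOF;
        }
        return region[index++] & 0xff; // mask so 0xff becomes 255, not -1
    }

    // Counts how many bytes are read before EOF is reported.
    static int countBytes(byte[] data) {
        MaskedCursorSketch cursor = new MaskedCursorSketch(data);
        int count = 0;
        while (cursor.read() != EOF) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] source = { 'a', (byte) 0xff, 'b' };
        System.out.println(countBytes(source)); // prints 3
    }
}
```

Without the mask, the loop in countBytes() would stop at the second byte and 
report 1 instead of 3.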

I haven't determined whether InputStreamLexerSource.java, or the other cursors 
and functions in this file (ByteArrayCursor::at(), PushbackCursor::read() and 
at()), are affected in the same way in other contexts. It might be worth 
looking into.
