ÿ breaks jruby lexer, EOF == -1 collides with signed bytes.
-----------------------------------------------------------

                 Key: JRUBY-5089
                 URL: http://jira.codehaus.org/browse/JRUBY-5089
             Project: JRuby
          Issue Type: Bug
          Components: Interpreter
    Affects Versions: JRuby 1.5
            Reporter: Xore Ander

The Latin-1 character ÿ has the byte value 0xff. When JRuby reads a .rb file that contains this character (e.g. encoding_test_value = "aeiou and sometimes ÿ"), the interpreter treats it as EOF and stops reading the file.

This occurs even if the file is set with
# coding: latin 1

The problem is in the lexer: ByteArrayCursor::read() returns the int value -1 for EOF, but the byte stream is made of signed bytes. The byte 0xff is therefore sign-extended to -1 and cannot be distinguished from EOF.

I fixed the problem locally as follows:

src\org\jruby\lexer\yacc\ByteArrayLexerSource.java:
ByteArrayLexerSource::ByteArrayCursor::public int read()
-     return forward(region[index++]);
+     return forward(region[index++] & 0xff);

The mask discards the sign extension, so the byte is passed on as its unsigned value (0 to 255) and no longer collides with EOF.

I haven't determined whether InputStreamLexerSource.java, or the other cursors and functions in this file (ByteArrayCursor::at(), PushbackCursor::read() and at()), are affected as well in other contexts. It might be worth looking into.
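
To illustrate the collision outside of JRuby, here is a minimal standalone sketch. The class and method names (SignedByteEofDemo, readSigned, readUnsigned, countUntilEof) are hypothetical and are not part of the JRuby source. They only imitate a cursor that returns -1 for EOF, once without and once with the 0xff mask.

```java
// Hypothetical stand-in for a byte-array cursor; not the actual JRuby class.
class SignedByteEofDemo {
    static final int EOF = -1;

    // Imitates the reported behaviour: the signed byte is widened to int as-is,
    // so the byte 0xff becomes -1.
    static int readSigned(byte[] region, int index) {
        if (index >= region.length) return EOF;
        return region[index];
    }

    // Imitates the proposed fix: mask to 0..255 so no data byte can equal EOF.
    static int readUnsigned(byte[] region, int index) {
        if (index >= region.length) return EOF;
        return region[index] & 0xff;
    }

    // Counts how many values the chosen reader yields before it reports EOF.
    static int countUntilEof(byte[] region, boolean masked) {
        int index = 0;
        while (true) {
            int c = masked ? readUnsigned(region, index) : readSigned(region, index);
            if (c == EOF) return index;
            index++;
        }
    }

    public static void main(String[] args) {
        // 'a', then the Latin-1 byte for ÿ, then 'b'
        byte[] source = { (byte) 'a', (byte) 0xff, (byte) 'b' };
        System.out.println("unmasked: " + countUntilEof(source, false));
        System.out.println("masked:   " + countUntilEof(source, true));
    }
}
```

Without the mask, the loop stops at the 0xff byte and never sees the 'b' that follows it. With the mask, all three bytes are read before EOF is reported. This is the same convention java.io.InputStream.read() uses: data bytes come back as 0 to 255, and only -1 means end of stream.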