matthiasblaesing commented on PR #6091: URL: https://github.com/apache/netbeans/pull/6091#issuecomment-1605564798
Thank you for the test file. This is indeed a worst case. I think we are seeing a problem in the NetBeans lexer infrastructure when characters outside the Basic Multilingual Plane (BMP) are encountered. One problematic character in that file is the code point 66349 (U+1032D): https://www.compart.com/de/unicode/U+1032D

In Java we normally deal with `char`s or character arrays. The 16-bit size of `char` limits it to the BMP. For that reason Java encodes code points outside the BMP as surrogate pairs (https://en.wikipedia.org/wiki/UTF-16#U+D800_to_U+DFFF).

It seems that at least one problem is that the ANTLR lexer does not expect surrogate pairs, so we would need to recombine each pair first and feed the resulting `int` code point into the lexer. A quick test looks promising. However, the rendering is still off. I'll look further into this.
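To illustrate the recombination idea, here is a minimal sketch using only JDK calls. It is not the actual change in this PR; the class name `SurrogateRecombiner` and the method `toCodePoints` are hypothetical and chosen for illustration. It walks the UTF-16 `char`s and emits one `int` per code point, which is the kind of value a code-point-based lexer input would need.

```java
import java.util.Arrays;

// Hypothetical helper, not part of NetBeans or ANTLR.
final class SurrogateRecombiner {

    /** Converts UTF-16 chars into one int per Unicode code point. */
    static int[] toCodePoints(CharSequence text) {
        int[] out = new int[text.length()]; // upper bound: one entry per char
        int n = 0;
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isHighSurrogate(c)
                    && i + 1 < text.length()
                    && Character.isLowSurrogate(text.charAt(i + 1))) {
                // Valid surrogate pair: merge both chars into one code point.
                out[n++] = Character.toCodePoint(c, text.charAt(i + 1));
                i += 2;
            } else {
                // BMP char (or an unpaired surrogate, passed through as-is).
                out[n++] = c;
                i++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        // U+1032D is stored in a Java String as the pair D800 DF2D.
        int[] cps = toCodePoints("a\uD800\uDF2Db");
        System.out.println(cps.length);                  // 3 code points from 4 chars
        System.out.println(Integer.toHexString(cps[1])); // 1032d
    }
}
```

The JDK also offers `CharSequence.codePoints()`, which performs the same pairing and returns an `IntStream`; the explicit loop is shown only to make the surrogate handling visible.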
