matthiasblaesing commented on PR #6091:
URL: https://github.com/apache/netbeans/pull/6091#issuecomment-1605564798

   Thank you for the test file. This is indeed a worst case. I think we are 
seeing a problem in the NetBeans lexer infrastructure when characters outside 
the Basic Multilingual Plane (BMP) are encountered.
   
   One problematic character in that file is code point 66349 (U+1032D):
   
   https://www.compart.com/de/unicode/U+1032D
   
   In Java we normally deal with `char`s or character arrays. The 16-bit size 
of `char` limits it to the BMP. For that reason Java encodes code points 
outside the BMP as surrogate pairs 
(https://en.wikipedia.org/wiki/UTF-16#U+D800_to_U+DFFF).
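
   To make the surrogate-pair encoding concrete, here is a minimal sketch 
using only JDK methods. The class name `SurrogateDemo` and the helper 
`toUtf16` are made up for illustration and are not part of NetBeans. For 
U+1032D the two UTF-16 code units are 0xD800 and 0xDF2D.

```java
public class SurrogateDemo {

    // Hypothetical helper: returns the UTF-16 code units Java uses
    // to store the given code point (one char inside the BMP, two outside).
    static char[] toUtf16(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        int codePoint = 0x1032D; // 66349 in decimal
        char[] units = toUtf16(codePoint);

        System.out.printf("U+%X is stored in %d char(s)%n", codePoint, units.length);
        for (char c : units) {
            System.out.printf("  0x%04X high=%b low=%b%n",
                    (int) c, Character.isHighSurrogate(c), Character.isLowSurrogate(c));
        }

        String s = new String(units);
        // length() counts UTF-16 code units, codePointCount() counts code points.
        System.out.println("length()         = " + s.length());
        System.out.println("codePointCount() = " + s.codePointCount(0, s.length()));
    }
}
```

   The difference between `length()` (2) and `codePointCount()` (1) is the 
kind of mismatch that can throw off anything that assumes one `char` per 
character.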
   
   It seems that at least one problem is that the ANTLR lexer does not expect 
surrogate pairs, so we would need to recombine them first and feed the 
resulting `int` code point into the lexer. A quick test looks promising. 
However, the rendering is still off. I'll look further into this.
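
   The recombination step could look roughly like the sketch below. It is 
not the actual change in the PR: `SurrogateRecombiner` and `recombine` are 
hypothetical names, and how the resulting `int` values are handed to the 
lexer is left out. Unpaired surrogates are passed through unchanged, which is 
an assumption about the desired behavior.

```java
import java.util.Arrays;

public class SurrogateRecombiner {

    // Hypothetical helper: turns UTF-16 code units into code points by
    // merging each high/low surrogate pair into a single int.
    static int[] recombine(char[] units) {
        int[] out = new int[units.length];
        int n = 0;
        for (int i = 0; i < units.length; i++) {
            char c = units[i];
            boolean pair = Character.isHighSurrogate(c)
                    && i + 1 < units.length
                    && Character.isLowSurrogate(units[i + 1]);
            if (pair) {
                // Consume both code units and emit one code point.
                out[n++] = Character.toCodePoint(c, units[++i]);
            } else {
                // BMP character or unpaired surrogate: pass through as-is.
                out[n++] = c;
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        char[] input = "a\uD800\uDF2Db".toCharArray();
        System.out.println(Arrays.toString(recombine(input))); // prints [97, 66349, 98]
    }
}
```

   For well-formed input, `CharSequence.codePoints()` from the JDK yields the 
same sequence. The explicit loop is shown only because a lexer input is 
typically read one unit at a time.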


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



For further information about the NetBeans mailing lists, visit:
https://cwiki.apache.org/confluence/display/NETBEANS/Mailing+lists
