Hi Peter,

>
> The Question:
> In Java generally, Is there an easy way to get the unicode name of a
> character? (e.g. "LATIN SMALL LETTER A" from 'a')
> ...
>
> I'm considering taking the unicode name for each character I encounter
> and regexping it against something like:
> ^LATIN .* LETTER (.) WITH .*$
> ... to try and extract the single A-Z|a-z character.
>

There used to be a list (ASCII) on some ftp server at unicode.org.
I have a version 'UnicodeData.txt' here. It lists ~ 12000 characters
in the form

01A4;LATIN CAPITAL LETTER P WITH HOOK;Lu;0;L;;;;;N;LATIN CAPITAL LETTER P HOOK;;;01A5;
01A5;LATIN SMALL LETTER P WITH HOOK;Ll;0;L;;;;;N;LATIN SMALL LETTER P HOOK;;01A4;;01A4
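
For what it's worth, here is a rough sketch of how such a file could be read and combined with your regex. It assumes only what the two sample lines show: semicolon-separated fields, with the hex code point first and the character name second. The class and method names (UnicodeNameSketch, loadNames, baseLetter) are made up for illustration and are not part of Lucene or any existing library. The names in the file are all upper case, so the sketch lower-cases the captured letter for "LATIN SMALL ..." entries, which is my own assumption about what you want.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class UnicodeNameSketch {

    // The pattern from the question, unchanged.
    private static final Pattern LATIN_WITH =
            Pattern.compile("^LATIN .* LETTER (.) WITH .*$");

    // Builds a code point -> name map from UnicodeData.txt-style lines.
    // Assumes field 0 is the hex code point and field 1 is the name.
    static Map<Integer, String> loadNames(Iterable<String> lines) {
        Map<Integer, String> names = new HashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split(";", -1); // -1 keeps empty fields
            if (fields.length < 2) {
                continue; // not in the expected form
            }
            names.put(Integer.parseInt(fields[0].trim(), 16), fields[1]);
        }
        return names;
    }

    // Returns the plain base letter for names matching the pattern,
    // or -1 when the name does not match.
    static int baseLetter(String name) {
        Matcher m = LATIN_WITH.matcher(name);
        if (!m.matches()) {
            return -1;
        }
        char letter = m.group(1).charAt(0);
        // Names are upper case; restore lower case for SMALL letters.
        return name.startsWith("LATIN SMALL ")
                ? Character.toLowerCase(letter)
                : letter;
    }

    public static void main(String[] args) {
        // The two sample lines quoted above.
        List<String> sample = Arrays.asList(
            "01A4;LATIN CAPITAL LETTER P WITH HOOK;Lu;0;L;;;;;N;LATIN CAPITAL LETTER P HOOK;;;01A5;",
            "01A5;LATIN SMALL LETTER P WITH HOOK;Ll;0;L;;;;;N;LATIN SMALL LETTER P HOOK;;01A4;;01A4");
        Map<Integer, String> names = loadNames(sample);
        for (int cp : new int[] {0x01A4, 0x01A5}) {
            String name = names.get(cp);
            System.out.println(name + " -> " + (char) baseLetter(name));
        }
    }
}
```

For the real file you would pass in the lines read from your copy of UnicodeData.txt instead of the two sample strings. Note that the pattern only catches single-letter names followed by "WITH", so something like "LATIN SMALL LETTER AE WITH ACUTE" falls through and would need its own rule. Also, if you are on Java 7 or later, Character.getName(int) returns these names directly, so the file may not be needed at all.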

If you cannot find that list somewhere, I can mail you a copy.

It would be a nice contribution if you could add your filter to
Lucene's sandbox once it's finished.

Morus