Hi Peter,
> 
> The Question:
> In Java generally, Is there an easy way to get the unicode name of a 
> character?  (e.g. "LATIN SMALL LETTER A" from 'a')
> 
...
> 
> I'm considering taking the unicode name for each character I encounter 
> and regexping it against something like:
> ^LATIN .* LETTER (.) WITH .*$
> ... to try and extract the single A-Z|a-z character.
> 
There used to be a plain ASCII list on an FTP server at unicode.org.
I have a copy of it here, named 'UnicodeData.txt'.
It lists about 12000 characters, one per line, as semicolon-separated
fields in the form
01A4;LATIN CAPITAL LETTER P WITH HOOK;Lu;0;L;;;;;N;LATIN CAPITAL LETTER P HOOK;;;01A5;
01A5;LATIN SMALL LETTER P WITH HOOK;Ll;0;L;;;;;N;LATIN SMALL LETTER P HOOK;;01A4;;01A4
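
A rough sketch of how one such line could be split and run through the
regex from your mail. The class and method names (UnicodeNameSketch,
codePointOf, nameOf, baseLetter) are made up for illustration, and
lower-casing the result for SMALL letters is my assumption about what
the filter should return. Treat it as a starting point, not a finished
filter:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene or of any Unicode library.
class UnicodeNameSketch {

    // The regex from the question: captures the single base letter.
    private static final Pattern BASE_LETTER =
            Pattern.compile("^LATIN .* LETTER (.) WITH .*$");

    /** First field of a UnicodeData.txt line: the code point in hex. */
    static int codePointOf(String record) {
        return Integer.parseInt(record.substring(0, record.indexOf(';')), 16);
    }

    /** Second field of a UnicodeData.txt line: the character name. */
    static String nameOf(String record) {
        // limit -1 keeps trailing empty fields instead of dropping them
        return record.split(";", -1)[1];
    }

    /**
     * Base letter extracted from a character name, or 0 if the name
     * does not match the regex. Names are all upper case, so the result
     * is lower-cased when the name says SMALL (an assumption).
     */
    static char baseLetter(String name) {
        Matcher m = BASE_LETTER.matcher(name);
        if (!m.matches()) {
            return 0;
        }
        char letter = m.group(1).charAt(0);
        return name.contains(" SMALL ") ? Character.toLowerCase(letter) : letter;
    }

    public static void main(String[] args) {
        String record =
            "01A5;LATIN SMALL LETTER P WITH HOOK;Ll;0;L;;;;;N;LATIN SMALL LETTER P HOOK;;01A4;;01A4";
        String name = nameOf(record);
        System.out.println(String.format("U+%04X", codePointOf(record))
                + " " + name + " -> " + baseLetter(name));
    }
}
```

Reading the whole file once and keeping the results in a map from code
point to base letter would avoid running the regex for every character
at filter time.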

If you cannot find that list somewhere I can mail you a copy.

It would be a nice contribution if you could add your filter to Lucene's
sandbox once it's finished.

Morus
