+1
I took an all-of-the-above approach, including the Unicode character
description, for the ASCIIFoldingFilter-based stuff. E.g. from the mapping
file
<http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/conf/mapping-FoldToASCII.txt?view=markup>:
# Ä [LATIN CAPITAL LETTER A WITH DIAERESIS]
"\u00C4" => "A"
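A minimal sketch of generating those escape-plus-description lines automatically, rather than looking up each character name by hand. The class and method names (EscapeWithName, escape, describe) are hypothetical, and it assumes Java 8+ (Character.getName needs Java 7, String.codePoints needs Java 8). It only produces the left-hand side of each mapping; the ASCII fold target still has to be supplied separately.

```java
public class EscapeWithName {

    // Renders one code point as Java Unicode escape(s). Characters above
    // U+FFFF become a UTF-16 surrogate pair, because a single escape can
    // only name one UTF-16 code unit.
    static String escape(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char c : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04X", (int) c));
        }
        return sb.toString();
    }

    // Builds a comment in the style of mapping-FoldToASCII.txt:
    // "# <char> [<Unicode character name>]". Character.getName returns
    // null for unassigned code points; "UNASSIGNED" is a made-up fallback.
    static String describe(int codePoint) {
        String name = Character.getName(codePoint);
        return "# " + new String(Character.toChars(codePoint))
                + " [" + (name != null ? name : "UNASSIGNED") + "]";
    }

    public static void main(String[] args) {
        // The input is itself written with escapes so this file stays pure ASCII.
        String input = "\u00C4\u00E4\u00E0";
        input.codePoints().forEach(cp -> {
            System.out.println(describe(cp));
            System.out.println("\"" + escape(cp) + "\"");
        });
    }
}
```

Characters above U+FFFF come out as a pair of escapes, since that is how such characters have to be written in Java source.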
Steve
> -----Original Message-----
> From: Chris Hostetter
> Sent: Thursday, April 07, 2011 4:28 PM
> To: Lucene Dev
> Subject: character escapes in source? ... was: Re: Eclipse: Invalid
> character constant
>
>
> replying to dev...
>
> : in eclipse you need to set your project's character encoding to UTF-8.
> ...
> : > Some language-specific classes like GermanLightStemmer have invalid
> : > character compiler errors for code like:
> : > switch(s[i]) {
> : > case 'ä':
> : > case 'à':
> : > case 'á':
> : > in Eclipse with JDK 1.6
>
> ...I seem to remember something similar coming up in the past, and I
> thought we decided we should use Java Unicode character escapes instead of
> literal UTF-8 characters in the source to minimize the number of headaches
> (and to make it more self-documenting *exactly* which character we were using).
>
> should we revisit this?
>
>
> -Hoss
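As a rough sketch of what the escaped style would look like, here is a switch in the spirit of the GermanLightStemmer snippet quoted above, with every non-ASCII character written as a Java Unicode escape and named in a trailing comment. EscapedSwitchSketch and foldA are hypothetical names, not Lucene's actual code. Because the file is then pure ASCII, it should compile the same way under any ASCII-compatible source encoding, so the Eclipse project encoding setting stops mattering for it.

```java
public class EscapedSwitchSketch {

    // Folds a few accented forms of 'a' to plain 'a' in place. Only the
    // first len chars of s are examined. Non-ASCII characters appear solely
    // as Unicode escapes; the comments name each one in plain ASCII.
    static void foldA(char[] s, int len) {
        for (int i = 0; i < len; i++) {
            switch (s[i]) {
                case '\u00E4': // LATIN SMALL LETTER A WITH DIAERESIS
                case '\u00E0': // LATIN SMALL LETTER A WITH GRAVE
                case '\u00E1': // LATIN SMALL LETTER A WITH ACUTE
                    s[i] = 'a';
                    break;
                default:
                    break;
            }
        }
    }

    public static void main(String[] args) {
        char[] buf = "h\u00E4user".toCharArray();
        foldA(buf, buf.length);
        System.out.println(new String(buf)); // prints "hauser"
    }
}
```

One caveat: javac translates Unicode escapes before tokenizing, even inside comments, so a comment must never contain a backslash-u sequence that is not a valid escape, and the escapes for line feed and carriage return cannot be used even inside char literals.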