+1

I took an all-of-the-above approach, including the Unicode character 
description, for the ASCIIFoldingFilter-based stuff.  E.g. from the mapping 
file 
<http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/conf/mapping-FoldToASCII.txt?view=markup>:

        # Ä [LATIN CAPITAL LETTER A WITH DIAERESIS]
        "\u00C4" => "A"
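
For illustration, the same convention carried over to a Java switch might look roughly like the sketch below. This is not the actual GermanLightStemmer code: the class name FoldSketch and the method fold are hypothetical, and only a handful of characters are covered. Each case uses the escape, with the literal character and its Unicode name kept in a trailing comment.

```java
// Hypothetical sketch: FoldSketch and fold() are illustrative names,
// not part of Lucene/Solr.
final class FoldSketch {

    /** Folds a few accented Latin letters to ASCII; any other char passes through unchanged. */
    static char fold(char c) {
        switch (c) {
            case '\u00E4': // ä [LATIN SMALL LETTER A WITH DIAERESIS]
            case '\u00E0': // à [LATIN SMALL LETTER A WITH GRAVE]
            case '\u00E1': // á [LATIN SMALL LETTER A WITH ACUTE]
                return 'a';
            case '\u00C4': // Ä [LATIN CAPITAL LETTER A WITH DIAERESIS]
                return 'A';
            default:
                return c;
        }
    }

    public static void main(String[] args) {
        System.out.println(fold('\u00E4')); // prints "a"
    }
}
```

Because the case labels are pure ASCII, the code compiles the same way whatever encoding the IDE or javac assumes; only the comments depend on the file being read as UTF-8.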

Steve

> -----Original Message-----
> From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
> Sent: Thursday, April 07, 2011 4:28 PM
> To: Lucene Dev
> Subject: character escapes in source? ... was: Re: Eclipse: Invalid
> character constant
> 
> 
> replying to dev...
> 
> : in eclipse you need to set your project's character encoding to UTF-8.
>       ...
> : > Some language specific classes like GermanLightStemmer has invalid
> : > character
> : > compiler errors for code like:
> : >      switch(s[i]) {
> : >        case 'ä':
> : >        case 'à':
> : >        case 'á':
> : > in Eclipse with JDK 1.6
> 
> ...i seem to remember something similar coming up in the past, and I
> thought we decided we should use java unicode character escapes instead of
> literal UTF-8 characters in the source to minimize the number of headaches
> (and make it more self documenting *exactly* what character we were using).
> 
> should we revisit this?
> 
> 
> -Hoss
