+1

I took an all-of-the-above approach, including the Unicode character description, for the ASCIIFoldingFilter-based stuff. E.g. from the mapping file <http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/conf/mapping-FoldToASCII.txt?view=markup>:
# Ä [LATIN CAPITAL LETTER A WITH DIAERESIS] "\u00C4" => "A" Steve > -----Original Message----- > From: Chris Hostetter [mailto:hossman_luc...@fucit.org] > Sent: Thursday, April 07, 2011 4:28 PM > To: Lucene Dev > Subject: character escapes in source? ... was: Re: Eclipse: Invalid > character constant > > > replying to dev... > > : in eclipse you need to set your project's character encoding to UTF-8. > ... > : > Some language specific classes like GermanLightStemmer has invalid > : > character > : > compiler errors for code like: > : > switch(s[i]) { > : > case 'ä': > : > case 'à ': > : > case 'á': > : > in Eclipse with JDK 1.6 > > ...i seem to remember something similar coming up in the past, and I > thought we decided we should use java unicode character escapes instead of > literal UTF-8 characters in the source to minimize the number of headaches > (and make it more self documenting *exactly* what character we were using. > > should we revisit this? > > > -Hoss