As we have a very large index, I'd like to know what others do before I commit to the approach below.
If I do go down that route, I assume I would use a StandardAnalyzer once again?

In a test, I did the following:

    import java.io.IOException;

    import junit.framework.TestCase;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;

    public class TestLuceneIndexCreateAndIndex extends TestCase {

        // JUnit 3 only runs methods whose names start with "test".
        public void testIndex() throws IOException {
            String indexName = "c:\\lucene\\test";
            Analyzer analyzer = new StandardAnalyzer();

            // true = create a new index, overwriting any existing one
            IndexWriter writer = new IndexWriter(indexName, analyzer, true);
            Document d = new Document();
            // the three flags are: store, index, tokenize
            d.add(new Field("headline", "& > < ´ ¸ ˆ ¯ · ˜ ¨ Á á Â â Æ æ À à Å å à ã Ä ä Ç ç É é Ê ê È è Ð ð Ë ë Í í Î î Ì ì Ï ï Ñ ñ Ó ó Ô ô Œ œ Ò ò Ø ø Õ õ Ö ö Š š ß Þ þ Ú ú Û û Ù ù Ü ü Ý ý ÿ Ÿ", true, true, true));
            writer.addDocument(d);
            writer.close();

            IndexReader reader = IndexReader.open(indexName);
            assertTrue(reader.numDocs() > 0);
            reader.close();
        }
    }

Using Luke, I searched for headline:Ê, which correctly returned the article. However, when I searched for headline:&, it returned nothing, which I didn't expect.

Thanks

On 15/11/05, Daniel Noll <[EMAIL PROTECTED]> wrote:
> Mordo, Aviran (EXP N-NANNATEK) wrote:
>
> > You can use your own Analyzer to support special characters. Just
> > process the special characters in your analyzer
>
> That's one option. The "correct" solution would be, since this is
> presumably HTML or XML, replacing entities with their actual string
> values before analysing the text.
>
> Daniel
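
For reference, Daniel's suggestion (replace entities with their actual string values before analysing the text) might look roughly like the sketch below. It is only an illustration under some assumptions: EntityDecoder and its tiny entity table are hypothetical names made up here, not part of Lucene, and a real version would need the full HTML entity list or an existing HTML/XML parser. The decoded string would then be passed to the Field instead of the raw markup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: decodes a small subset of HTML entities before indexing. */
final class EntityDecoder {

    // Illustrative subset only; the real HTML entity table is much larger.
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("Ecirc", "\u00CA");  // Ê
        NAMED.put("eacute", "\u00E9"); // é
        NAMED.put("szlig", "\u00DF");  // ß
    }

    // Matches the &name;, &#123; and &#x7B; forms.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);");

    /** Returns text with known entities replaced; unknown ones are left untouched. */
    static String decode(String text) {
        Matcher m = ENTITY.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            if (body.startsWith("#x") || body.startsWith("#X")) {
                replacement = fromCodePoint(body.substring(2), 16, m.group());
            } else if (body.startsWith("#")) {
                replacement = fromCodePoint(body.substring(1), 10, m.group());
            } else {
                String named = NAMED.get(body);
                replacement = (named != null) ? named : m.group();
            }
            // quoteReplacement stops '$' and '\' being treated as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Converts a numeric character reference; falls back to the original text if invalid.
    private static String fromCodePoint(String digits, int radix, String fallback) {
        try {
            int codePoint = Integer.parseInt(digits, radix);
            return Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : fallback;
        } catch (NumberFormatException e) {
            return fallback; // too many digits to fit in an int
        }
    }
}
```

With this, something like EntityDecoder.decode("caf&eacute; &amp; bar") would be called on the headline before it is added to the Document. Note that decoding alone is unlikely to make headline:& match: as far as I know, StandardAnalyzer treats a standalone & as punctuation and drops it at index time, so no term exists for that query to find. Matching such characters would need a different or custom Analyzer, as Aviran suggested.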