As we have a very large index, I'm interested in knowing what others
do, before I commit to doing the below.

If I do go down that route, I assume I use a StandardAnalyzer once again?
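
For reference, here is a minimal sketch of what that route might look like: decode the entities first, then pass the decoded text to whichever analyzer is used. It is plain Java with no Lucene calls. The class name EntityDecoder and its tiny entity table are hypothetical, made up for illustration, and a real version would need the full HTML entity list.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: decodes a few HTML entities before the text is analysed. */
final class EntityDecoder {
    // Matches hex (&#xCA;), decimal (&#202;) and named (&amp;) references.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX][0-9A-Fa-f]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);");

    // Deliberately tiny table (an assumption for this sketch, not a complete list).
    private static final Map<String, String> NAMED = new HashMap<String, String>();
    static {
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("quot", "\"");
        NAMED.put("Ecirc", "\u00CA");  // Ê
        NAMED.put("eacute", "\u00E9"); // é
    }

    static String decode(String text) {
        Matcher m = ENTITY.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());
            String replacement = resolve(m.group(1));
            // Unknown entities are left untouched rather than guessed at.
            out.append(replacement != null ? replacement : m.group());
            last = m.end();
        }
        return out.append(text, last, text.length()).toString();
    }

    private static String resolve(String body) {
        if (body.charAt(0) != '#') {
            return NAMED.get(body);
        }
        boolean hex = body.charAt(1) == 'x' || body.charAt(1) == 'X';
        try {
            int codePoint = hex ? Integer.parseInt(body.substring(2), 16)
                                : Integer.parseInt(body.substring(1), 10);
            return Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint)) : null;
        } catch (NumberFormatException e) {
            return null; // value too large to be a code point
        }
    }
}
```

The headline would then be passed through EntityDecoder.decode(...) before being added to the Document, so the analyzer only ever sees real characters.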

In a test, I did the following:

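import java.io.IOException;
import junit.framework.TestCase;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;

public class TestLuceneIndexCreateAndIndex extends TestCase {
    // JUnit 3 only runs methods whose names start with "test".
    public void testIndex() throws IOException {
        String indexName = "c:\\lucene\\test";
        Analyzer analyzer = new StandardAnalyzer();
        // true = create a new index, overwriting any existing one
        IndexWriter writer = new IndexWriter(indexName, analyzer, true);
        Document d = new Document();
        // The three flags mean: stored, indexed, tokenized.
        d.add(new Field("headline", "& > < ´ ¸ ˆ ¯ · ˜ ¨ Á á Â â Æ æ À "
                + "à Å å à ã Ä ä Ç ç É é Ê ê È è Ð ð Ë ë Í í Î î Ì ì Ï ï Ñ ñ Ó ó Ô ô Œ œ "
                + "Ò ò Ø ø Õ õ Ö ö Š š ß Þ þ Ú ú Û û Ù ù Ü ü Ý ý ÿ Ÿ", true, true, true));
        writer.addDocument(d);
        writer.close();
        IndexReader reader = IndexReader.open(indexName);
        assertTrue(reader.numDocs() > 0);
        reader.close();
    }
}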

Using Luke, I searched for headline:Ê, which correctly returned the
article.  However, when I searched for headline:& it returned nothing,
which I didn't expect.
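
My best guess at the cause is that StandardAnalyzer treats a standalone & as punctuation and discards it, so no term for it ever reaches the index and the query has nothing to match. The sketch below is not Lucene code and does not reproduce StandardAnalyzer's real grammar. It is a simplified stand-in (the class name LetterDigitTokens is hypothetical) that keeps only lowercased runs of letters and digits, which is enough to show the effect.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an analyzer that keeps only letter/digit runs. */
final class LetterDigitTokens {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                // Lowercase as we go, as a lowercasing analyzer would.
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // Any other character ends the current token and is dropped.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

With this stand-in, tokenize("&") yields an empty list, while tokenize("Ê") yields the single token "ê". That would be consistent with what Luke showed, if the same analyzer is applied to the query.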

Thanks

On 15/11/05, Daniel Noll <[EMAIL PROTECTED]> wrote:
> Mordo, Aviran (EXP N-NANNATEK) wrote:
>
> >You can use your own Analyzer to support special characters. Just
> >process the special characters in your analyzer
> >
> >
> That's one option.  The "correct" solution would be, since this is
> presumably HTML or XML, replacing entities with their actual string
> values before analysing the text.
>
> Daniel
>
