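StandardAnalyzer indexes "mozarabic" as a single term, and a plain
term query only matches whole terms, so that entry never shows up.
There are a couple of approaches:
1> enable leading wildcards (QueryParser.setAllowLeadingWildcard(true))
and search for *arabic*. You probably don't want to do this: a
leading wildcard makes Lucene walk every term in the field, which
is really, really expensive on a large index.
2> use the ngram tokenizers at index time. For "contains" matching
you want plain ngrams; the edge ngram variant only emits grams
anchored at one edge of the token, so it won't cover arbitrary
substrings. This'll cost you some index space, but that may be
acceptable.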
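
To make the trade-off concrete, here's a rough sketch in plain Java.
It does not use Lucene at all; the class InfixSearchSketch, its method
names, the tiny term list, and the gram sizes 2..6 are all made up for
illustration. It mimics approach 1 as a scan over every term, and
approach 2 as a lookup in a gram -> terms map built at index time, and
it also shows why front edge ngrams alone would miss "mozarabic".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class InfixSearchSketch {

    // Approach 1 (leading wildcard, conceptually): test every indexed term.
    // Cost grows with the number of distinct terms in the field.
    static Set<String> scanAllTerms(List<String> terms, String infix) {
        Set<String> hits = new TreeSet<>();
        for (String term : terms) {
            if (term.contains(infix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    // All substrings of the term with length between minGram and maxGram.
    static List<String> ngrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < term.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= term.length(); len++) {
                grams.add(term.substring(start, start + len));
            }
        }
        return grams;
    }

    // Front edge n-grams: only the prefixes of the term.
    static List<String> edgeNgrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int len = minGram; len <= maxGram && len <= term.length(); len++) {
            grams.add(term.substring(0, len));
        }
        return grams;
    }

    // Approach 2 (n-grams, conceptually): pay at index time by storing every
    // gram, so that a "contains" search becomes a single map lookup.
    static Map<String, Set<String>> buildGramIndex(List<String> terms, int minGram, int maxGram) {
        Map<String, Set<String>> index = new HashMap<>();
        for (String term : terms) {
            for (String gram : ngrams(term, minGram, maxGram)) {
                index.computeIfAbsent(gram, g -> new TreeSet<>()).add(term);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("arabic", "mozarabic", "gum", "numerals");

        System.out.println(scanAllTerms(terms, "arabic"));  // [arabic, mozarabic]

        Map<String, Set<String>> index = buildGramIndex(terms, 2, 6);
        System.out.println(index.get("arabic"));            // [arabic, mozarabic]

        // Prefix-only grams of "mozarabic" never include "arabic".
        System.out.println(edgeNgrams("mozarabic", 2, 6).contains("arabic"));  // false
    }
}
```

Note that in this sketch the search string must be no longer than the
largest gram size (6 here), and that the gram map is much bigger than
the term list. That's the index space cost mentioned above.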
HTH
Erick
2010/1/28 Lutischán Ferenc
> Hi,
>
> I have a problem with Lucene:
> I indexed an English phrase list with Lucene:
> doc.add(new Field("r1", r1.toLowerCase(), Field.Store.NO,
> Field.Index.ANALYZED));
>
> I searched for the word 'arabic':
>
> Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
> QueryParser parser = new QueryParser(Version.LUCENE_CURRENT,
>         this.searchedField, analyzer);
> Query query = parser.parse(searchedStr);
> TopScoreDocCollector collector = TopScoreDocCollector.create(10, true);
> this.memDict.isearcher.search(query, collector);
> foundCnt = collector.getTotalHits();
> System.out.println(searchedStr + ":" + foundCnt);
>
> // Iterate through the results:
> ScoreDoc[] hits = collector.topDocs().scoreDocs;
> for (int i = 0; i < hits.length; i++) {
> Document hitDoc = this.memDict.isearcher.doc(hits[i].doc);
> System.out.println("\"r1\"=" + hitDoc.get("r1"));
> }
>
> The result list is:
> *arabic*
> *arabic* numerals
> gum *arabic*
>
> But this entry is not in the result list:
> moz*arabic*
>
> How can I use Lucene to find all the words that contain 'arabic'?
>
> Regards,
> Ferenc
>