AW: "Umlaute" getting lost

Clemens Wyss Mon, 25 Apr 2011 23:16:21 -0700

>Out of curiosity, what is the problem you are trying to solve?
I am trying to provide suggestions for search terms/word, such as google does. 
When the user starts typing the search term, I look up my TermIndex to provide 
possible search terms which fit the characters provided...


Thx
Clemens

> -----Ursprüngliche Nachricht-----
> Von: Grant Ingersoll [mailto:gsing...@apache.org]
> Gesendet: Sonntag, 24. April 2011 08:30
> An: java-user@lucene.apache.org
> Betreff: Re: "Umlaute" getting lost
> 
> 
> On Apr 21, 2011, at 5:02 PM, Clemens Wyss wrote:
> 
> > I keep my search terms in a dedicated RAMDirectory (the termIndex).
> > In there I palce all the term of my real index. When putting the terms
> > into the termIndex I can still see [using the debugger] the Umlaute
> > (äöü). Unfortunately when searching the termIndex the documents no
> more contain these Umlaute.
> >
> > Populating the termIndex:
> > termIndex = new RAMDirectory();
> > IndexWriterConfig config = new IndexWriterConfig( Version.LUCENE_31,
> > new TermAnalyzer( locale ) ); termIndexWriter = new IndexWriter(
> > termIndex, config ); TermEnum tEnum = realIndexReader.terms(); while (
> > tEnum.next() ) {
> >     Term t = tEnum.term();
> >     String termText = t.text();
> >     Document termDocument = new Document();
> >     Field field = new Field( FIELDNAME_TERM, termText, Field.Store.YES,
> Field.Index.ANALYZED );
> >     termDocument.add( field );
> >     // and add term into the index
> >     termIndexWriter.addDocument( termDocument ); }
> > termIndexWriter.commit(); termIndexWriter.optimize();
> > termIndexWriter.close();
> >
> > termIndexReader = IndexReader.open( termIndex, true );
> > ---------- searching terms
> > Query q = fuzzy ? new FuzzyQuery( new Term( FIELDNAME_TERM,
> termFilter.toLowerCase() ) ) :
> >                                     new WildcardQuery( new Term(
> FIELDNAME_TERM, "*" + termFilter.toLowerCase() + "*" ) );
> > TopDocs topDocs = new IndexSearcher( getTermIndexReader() ).search( q,
> 100 );
> > for ( ScoreDoc hit : topDocs.scoreDocs ) {
> >     Document doc = getTermIndexReader().document( hit.doc );
> >     String indexTerm = doc.get( FIELDNAME_TERM );
> >     if ( !returnValue.contains( indexTerm  ) )
> >     {
> >             returnValue.add( indexTerm );
> >     }
> > }
> > ----------
> > The TermAbnalyzer is the same analyzer as the main index analyzer with
> the exception that a LowerCaseFilter is applied.
> 
> What is the Analyzer for the Main Index?  What is the tokenizer and token
> filters used?
> 
> Out of curiosity, what is the problem you are trying to solve?
> 
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

AW: "Umlaute" getting lost

Reply via email to