>Out of curiosity, what is the problem you are trying to solve? I am trying to provide suggestions for search terms/word, such as google does. When the user starts typing the search term, I look up my TermIndex to provide possible search terms which fit the characters provided...
Thx Clemens > -----Ursprüngliche Nachricht----- > Von: Grant Ingersoll [mailto:gsing...@apache.org] > Gesendet: Sonntag, 24. April 2011 08:30 > An: java-user@lucene.apache.org > Betreff: Re: "Umlaute" getting lost > > > On Apr 21, 2011, at 5:02 PM, Clemens Wyss wrote: > > > I keep my search terms in a dedicated RAMDirectory (the termIndex). > > In there I palce all the term of my real index. When putting the terms > > into the termIndex I can still see [using the debugger] the Umlaute > > (äöü). Unfortunately when searching the termIndex the documents no > more contain these Umlaute. > > > > Populating the termIndex: > > termIndex = new RAMDirectory(); > > IndexWriterConfig config = new IndexWriterConfig( Version.LUCENE_31, > > new TermAnalyzer( locale ) ); termIndexWriter = new IndexWriter( > > termIndex, config ); TermEnum tEnum = realIndexReader.terms(); while ( > > tEnum.next() ) { > > Term t = tEnum.term(); > > String termText = t.text(); > > Document termDocument = new Document(); > > Field field = new Field( FIELDNAME_TERM, termText, Field.Store.YES, > Field.Index.ANALYZED ); > > termDocument.add( field ); > > // and add term into the index > > termIndexWriter.addDocument( termDocument ); } > > termIndexWriter.commit(); termIndexWriter.optimize(); > > termIndexWriter.close(); > > > > termIndexReader = IndexReader.open( termIndex, true ); > > ---------- searching terms > > Query q = fuzzy ? new FuzzyQuery( new Term( FIELDNAME_TERM, > termFilter.toLowerCase() ) ) : > > new WildcardQuery( new Term( > FIELDNAME_TERM, "*" + termFilter.toLowerCase() + "*" ) ); > > TopDocs topDocs = new IndexSearcher( getTermIndexReader() ).search( q, > 100 ); > > for ( ScoreDoc hit : topDocs.scoreDocs ) { > > Document doc = getTermIndexReader().document( hit.doc ); > > String indexTerm = doc.get( FIELDNAME_TERM ); > > if ( !returnValue.contains( indexTerm ) ) > > { > > returnValue.add( indexTerm ); > > } > > } > > ---------- > > The TermAbnalyzer is the same analyzer as the main index analyzer with > the exception that a LowerCaseFilter is applied. > > What is the Analyzer for the Main Index? What is the tokenizer and token > filters used? > > Out of curiosity, what is the problem you are trying to solve? > > -------------------------- > Grant Ingersoll > http://www.lucidimagination.com/ > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org