What does a simple Analyzer look like that just "n-grams" the docs/fields?
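For readers unfamiliar with the term: edge n-gramming a token means emitting its leading substrings, so that a short or slightly misspelled input can be compared against a prefix of the indexed word rather than the whole word. The plain-Java sketch below illustrates only that idea. It does not use Lucene, and the class and method names (EdgeNGramSketch, edgeNGrams, editDistance, bestDistance) and the gram lengths 2 to 6 are assumptions made for illustration. It also computes the edit distance asked about in the quoted thread: "mer" vs. "merlot" is 3, which is not below the 1.5 (length(term)*0.5) mentioned there.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: plain Java, no Lucene classes. All names are hypothetical.
class EdgeNGramSketch {

    // Leading substrings ("edge n-grams") of a token, lengths minGram..maxGram.
    public static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, token.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    // Classic Levenshtein distance: insert, delete, substitute each cost 1.
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Smallest edit distance between the typed input and any gram.
    public static int bestDistance(String typed, List<String> grams) {
        int best = Integer.MAX_VALUE;
        for (String gram : grams) {
            best = Math.min(best, editDistance(typed, gram));
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> grams = edgeNGrams("merlot", 2, 6);
        System.out.println(grams); // [me, mer, merl, merlo, merlot]
        for (String typed : new String[] {"mer", "menlo", "melo"}) {
            System.out.println(typed
                + ": whole term " + editDistance(typed, "merlot")
                + ", best gram " + bestDistance(typed, grams));
        }
        // mer: whole term 3, best gram 0
        // menlo: whole term 2, best gram 1
        // melo: whole term 2, best gram 1
    }
}
```

Against the whole term, every one of these inputs is at least two edits away, while against the grams each is at most one edit away. Whether that is good enough for suggestions, and which gram lengths make sense, would have to be tried on real data.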

class SimpleNGramAnalyzer extends Analyzer
{
  @Override
  public TokenStream tokenStream( String fieldName, Reader reader )
  {
    EdgeNGramTokenFilter... ???
  }
}

> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> Sent: Tuesday, May 3, 2011 13:36
> To: java-user@lucene.apache.org
> Subject: Re: AW: "fuzzy prefix" search
>
> Hi,
>
> I didn't read this thread closely, but just in case:
> * Is this something you can handle with synonyms?
> * If this is for English and you are trying to handle typos, there is a list
>   of common English misspellings out there that you could use for this perhaps.
> * Have you considered n-gramming your tokens? Not sure if this would help,
>   didn't read messages/examples closely enough, but you may want to look at
>   this if you haven't done so yet.
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
> ----- Original Message ----
> > From: Clemens Wyss <clemens...@mysign.ch>
> > To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
> > Sent: Tue, May 3, 2011 5:25:30 AM
> > Subject: AW: "fuzzy prefix" search
> >
> > >PrefixQuery
> > I'd like the combination of prefix and fuzzy ;-) because people could
> > also type "menlo" or "märl" and in any of these cases I'd like to get
> > a hit on Merlot (for suggesting Merlot)
> >
> > > -----Original Message-----
> > > From: Ian Lea [mailto:ian....@gmail.com]
> > > Sent: Tuesday, May 3, 2011 11:22
> > > To: java-user@lucene.apache.org
> > > Subject: Re: "fuzzy prefix" search
> > >
> > > I'd assumed that FuzzyQuery wouldn't ignore case but I could be wrong.
> > > What would be the edit distance between "mer" and "merlot"? Would
> > > it be less than 1.5 which I reckon would be the value of
> > > length(term)*0.5 as detailed in the javadocs?
> > > Seems unlikely, but I don't really know anything about the
> > > Levenshtein (edit distance) algorithm as used by FuzzyQuery.
> > > Wouldn't a PrefixQuery be more appropriate here?
> > >
> > > --
> > > Ian.
> > >
> > > On Tue, May 3, 2011 at 10:10 AM, Clemens Wyss <clemens...@mysign.ch> wrote:
> > > > Unfortunately lowercasing doesn't help.
> > > > Also, doesn't the FuzzyQuery ignore casing?
> > > >
> > > >> -----Original Message-----
> > > >> From: Ian Lea [mailto:ian....@gmail.com]
> > > >> Sent: Tuesday, May 3, 2011 11:06
> > > >> To: java-user@lucene.apache.org
> > > >> Subject: Re: "fuzzy prefix" search
> > > >>
> > > >> Mer != mer. The latter will be what is indexed because
> > > >> StandardAnalyzer calls LowerCaseFilter.
> > > >>
> > > >> --
> > > >> Ian.
> > > >>
> > > >> On Tue, May 3, 2011 at 9:56 AM, Clemens Wyss <clemens...@mysign.ch> wrote:
> > > >> > Sorry for coming back to my issue. Can anybody explain why my
> > > >> > "simple" unit test below fails? Any hint/help appreciated.
> > > >> >
> > > >> > Directory directory = new RAMDirectory();
> > > >> > IndexWriter indexWriter = new IndexWriter( directory,
> > > >> >     new StandardAnalyzer( Version.LUCENE_31 ),
> > > >> >     IndexWriter.MaxFieldLength.UNLIMITED );
> > > >> > Document document = new Document();
> > > >> > document.add( new Field( "test", "Merlot", Field.Store.YES, Field.Index.ANALYZED ) );
> > > >> > indexWriter.addDocument( document );
> > > >> > IndexReader indexReader = indexWriter.getReader();
> > > >> > IndexSearcher searcher = new IndexSearcher( indexReader );
> > > >> > Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.5f, 0, 10 );
> > > >> > // or Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.5f);
> > > >> > TopDocs result = searcher.search( q, 10 );
> > > >> > Assert.assertEquals( 1, result.totalHits );
> > > >> >
> > > >> > - Clemens
> > > >> >
> > > >> >> -----Original Message-----
> > > >> >> From: Clemens Wyss [mailto:clemens...@mysign.ch]
> > > >> >> Sent: Monday, May 2, 2011 23:01
> > > >> >> To: java-user@lucene.apache.org
> > > >> >> Subject: AW: "fuzzy prefix" search
> > > >> >>
> > > >> >> Is it the combination of FuzzyQuery and Term which makes the
> > > >> >> search go for "word boundaries"?
> > > >> >>
> > > >> >> > -----Original Message-----
> > > >> >> > From: Clemens Wyss [mailto:clemens...@mysign.ch]
> > > >> >> > Sent: Monday, May 2, 2011 14:13
> > > >> >> > To: java-user@lucene.apache.org
> > > >> >> > Subject: AW: "fuzzy prefix" search
> > > >> >> >
> > > >> >> > I tried this too, but unfortunately I only get hits when
> > > >> >> > the search term is at least as long as the word to be looked up.
> > > >> >> >
> > > >> >> > E.g.:
> > > >> >> > ...
> > > >> >> > Directory directory = new RAMDirectory();
> > > >> >> > IndexWriter indexWriter = new IndexWriter( directory,
> > > >> >> >     IndexManager.getIndexingAnalyzer( LOCALE_DE ),
> > > >> >> >     IndexWriter.MaxFieldLength.UNLIMITED );
> > > >> >> >
> > > >> >> > Document document = new Document();
> > > >> >> > document.add( new Field( "test", "Merlot", Field.Store.YES, Field.Index.ANALYZED ) );
> > > >> >> > indexWriter.addDocument( document );
> > > >> >> >
> > > >> >> > IndexReader indexReader = indexWriter.getReader();
> > > >> >> > IndexSearcher searcher = new IndexSearcher( indexReader );
> > > >> >> >
> > > >> >> > Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.6f, 1 );
> > > >> >> > TopDocs result = searcher.search( q, 10 );
> > > >> >> > Assert.assertEquals( 1, result.totalHits );
> > > >> >> > ...
> > > >> >> >
> > > >> >> > > -----Original Message-----
> > > >> >> > > From: Uwe Schindler [mailto:u...@thetaphi.de]
> > > >> >> > > Sent: Monday, May 2, 2011 13:50
> > > >> >> > > To: java-user@lucene.apache.org
> > > >> >> > > Subject: RE: "fuzzy prefix" search
> > > >> >> > >
> > > >> >> > > Hi,
> > > >> >> > >
> > > >> >> > > You can pass an integer to FuzzyQuery which defines the
> > > >> >> > > number of characters that are seen as prefix. So all
> > > >> >> > > terms must match this prefix and the rest of each term
> > > >> >> > > is matched using fuzzy.
> > > >> >> > >
> > > >> >> > > Uwe
> > > >> >> > >
> > > >> >> > > -----
> > > >> >> > > Uwe Schindler
> > > >> >> > > H.-H.-Meier-Allee 63, D-28213 Bremen
> > > >> >> > > http://www.thetaphi.de
> > > >> >> > > eMail: u...@thetaphi.de
> > > >> >> > >
> > > >> >> > > > -----Original Message-----
> > > >> >> > > > From: Clemens Wyss [mailto:clemens...@mysign.ch]
> > > >> >> > > > Sent: Monday, May 02, 2011 1:47 PM
> > > >> >> > > > To: java-user@lucene.apache.org
> > > >> >> > > > Subject: "fuzzy prefix" search
> > > >> >> > > >
> > > >> >> > > > I'd like to search fuzzily but not on a full term.
> > > >> >> > > > E.g.
> > > >> >> > > > I have a text "Merlot del Ticino"
> > > >> >> > > > I'd like
> > > >> >> > > > "mer", "merr", "melo", ... to match.
> > > >> >> > > >
> > > >> >> > > > If I use FuzzyQuery only "merlot", "merlott" hit. What
> > > >> >> > > > Query-combination should I use?
> > > >> >> > > >
> > > >> >> > > > Thx
> > > >> >> > > > Clemens
> > > >> >> > > >
> > > >> >> > > > ---------------------------------------------------------------------
> > > >> >> > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > >> >> > > > For additional commands, e-mail: java-user-h...@lucene.apache.org