What does a simple Analyzer look like that just "n-grams" the docs/fields?
class SimpleNGramAnalyzer extends Analyzer
{
    @Override
    public TokenStream tokenStream ( String fieldName, Reader reader )
    {
        EdgeNGramTokenFilter... ???
    }
}
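
To make clear what I mean by "n-gramming", here is a minimal plain-Java sketch. It is not Lucene code: the class and method names (EdgeNGramSketch, edgeNGrams, editDistance, bestDistance) are made up for illustration, and the gram sizes 1..6 are an assumption. It only shows the idea that, once the leading-edge n-grams of "Merlot" are available as terms, inputs like "mer", "menlo" or "märl" end up within a small edit distance of one of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; this does not use or mimic the Lucene API.
class EdgeNGramSketch {

    // Leading-edge n-grams of the lowercased term, from minGram up to maxGram characters.
    static List<String> edgeNGrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<String>();
        String lower = term.toLowerCase(Locale.ROOT);
        int upper = Math.min(maxGram, lower.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(lower.substring(0, len));
        }
        return grams;
    }

    // Plain dynamic-programming Levenshtein distance (insert, delete, substitute all cost 1).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Smallest edit distance between the lowercased user input and any of the grams.
    static int bestDistance(String input, List<String> grams) {
        String lower = input.toLowerCase(Locale.ROOT);
        int best = Integer.MAX_VALUE;
        for (String gram : grams) {
            best = Math.min(best, editDistance(lower, gram));
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> grams = edgeNGrams("Merlot", 1, 6);
        System.out.println(grams);
        // Distance against the full term versus against the best-matching gram.
        System.out.println(editDistance("mer", "merlot"));
        System.out.println(bestDistance("mer", grams));
        System.out.println(bestDistance("menlo", grams));
        System.out.println(bestDistance("m\u00e4rl", grams)); // "märl"
    }
}
```

Against the full term, "mer" is three edits away from "merlot", whereas against the grams it matches "mer" exactly, and "menlo" and "märl" are each one edit away from "merlo" and "merl". Whether an analyzer built on EdgeNGramTokenFilter behaves the same way inside Lucene is exactly what I am asking.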
> -----Original Message-----
> From: Otis Gospodnetic [mailto:[email protected]]
> Sent: Tuesday, May 3, 2011 13:36
> To: [email protected]
> Subject: Re: AW: "fuzzy prefix" search
>
> Hi,
>
> I didn't read this thread closely, but just in case:
> * Is this something you can handle with synonyms?
> * If this is for English and you are trying to handle typos, there is a list
> of common English misspellings out there that you could use for this perhaps.
> * Have you considered n-gramming your tokens? Not sure if this would help,
> didn't read messages/examples closely enough, but you may want to look at
> this if you haven't done so yet.
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem
> search :: http://search-lucene.com/
>
>
>
> ----- Original Message ----
> > From: Clemens Wyss <[email protected]>
> > To: "[email protected]" <[email protected]>
> > Sent: Tue, May 3, 2011 5:25:30 AM
> > Subject: AW: "fuzzy prefix" search
> >
> > > PrefixQuery
> > I'd like the combination of prefix and fuzzy ;-) because people could
> > also type "menlo" or "märl", and in any of these cases I'd like to get
> > a hit on Merlot (for suggesting Merlot).
> >
> > > -----Original Message-----
> > > From: Ian Lea [mailto:[email protected]]
> > > Sent: Tuesday, May 3, 2011 11:22
> > > To: [email protected]
> > > Subject: Re: "fuzzy prefix" search
> > >
> > > > I'd assumed that FuzzyQuery wouldn't ignore case but I could be wrong.
> > > > What would be the edit distance between "mer" and "merlot"? Would
> > > > it be less than 1.5, which I reckon would be the value of
> > > > length(term)*0.5 as detailed in the javadocs? Seems unlikely, but
> > > > I don't really know anything about the Levenshtein (edit distance)
> > > > algorithm as used by FuzzyQuery.
> > > > Wouldn't a PrefixQuery be more appropriate here?
> > >
> > >
> > > --
> > > Ian.
> > >
> > > > On Tue, May 3, 2011 at 10:10 AM, Clemens Wyss <[email protected]> wrote:
> > > > Unfortunately lowercasing doesn't help.
> > > > Also, doesn't the FuzzyQuery ignore casing?
> > > >
> > > > >> -----Original Message-----
> > > > >> From: Ian Lea [mailto:[email protected]]
> > > > >> Sent: Tuesday, May 3, 2011 11:06
> > > > >> To: [email protected]
> > > > >> Subject: Re: "fuzzy prefix" search
> > > >>
> > > >> Mer != mer. The latter will be what is indexed because
> > > >> StandardAnalyzer calls LowerCaseFilter.
> > > >>
> > > >> --
> > > >> Ian.
> > > >>
> > > >>
> > > > >> On Tue, May 3, 2011 at 9:56 AM, Clemens Wyss <[email protected]> wrote:
> > > > >> > Sorry for coming back to my issue. Can anybody explain why my
> > > > >> > "simple" unit test below fails? Any hint/help appreciated.
> > > >> >
> > > > >> > Directory directory = new RAMDirectory();
> > > > >> > IndexWriter indexWriter = new IndexWriter( directory,
> > > > >> >     new StandardAnalyzer( Version.LUCENE_31 ),
> > > > >> >     IndexWriter.MaxFieldLength.UNLIMITED );
> > > > >> > Document document = new Document();
> > > > >> > document.add( new Field( "test", "Merlot", Field.Store.YES, Field.Index.ANALYZED ) );
> > > > >> > indexWriter.addDocument( document );
> > > > >> > IndexReader indexReader = indexWriter.getReader();
> > > > >> > IndexSearcher searcher = new IndexSearcher( indexReader );
> > > > >> > Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.5f, 0, 10 );
> > > > >> > // or Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.5f);
> > > > >> > TopDocs result = searcher.search( q, 10 );
> > > > >> > Assert.assertEquals( 1, result.totalHits );
> > > >> >
> > > >> > - Clemens
> > > >> >
> > > > >> >> -----Original Message-----
> > > > >> >> From: Clemens Wyss [mailto:[email protected]]
> > > > >> >> Sent: Monday, May 2, 2011 23:01
> > > > >> >> To: [email protected]
> > > > >> >> Subject: AW: "fuzzy prefix" search
> > > >> >>
> > > > >> >> Is it the combination of FuzzyQuery and Term which makes the
> > > > >> >> search go for "word boundaries"?
> > > >> >>
> > > > >> >> > -----Original Message-----
> > > > >> >> > From: Clemens Wyss [mailto:[email protected]]
> > > > >> >> > Sent: Monday, May 2, 2011 14:13
> > > > >> >> > To: [email protected]
> > > > >> >> > Subject: AW: "fuzzy prefix" search
> > > >> >> >
> > > > >> >> > I tried this too, but unfortunately I only get hits when
> > > > >> >> > the search term is at least as long as the word to be looked up.
> > > >> >> >
> > > >> >> > E.g.:
> > > >> >> > ...
> > > > >> >> > Directory directory = new RAMDirectory();
> > > > >> >> > IndexWriter indexWriter = new IndexWriter( directory,
> > > > >> >> >     IndexManager.getIndexingAnalyzer( LOCALE_DE ),
> > > > >> >> >     IndexWriter.MaxFieldLength.UNLIMITED );
> > > > >> >> >
> > > > >> >> > Document document = new Document();
> > > > >> >> > document.add( new Field( "test", "Merlot", Field.Store.YES, Field.Index.ANALYZED ) );
> > > > >> >> > indexWriter.addDocument( document );
> > > > >> >> >
> > > > >> >> > IndexReader indexReader = indexWriter.getReader();
> > > > >> >> > IndexSearcher searcher = new IndexSearcher( indexReader );
> > > > >> >> >
> > > > >> >> > Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.6f, 1 );
> > > > >> >> > TopDocs result = searcher.search( q, 10 );
> > > > >> >> > Assert.assertEquals( 1, result.totalHits );
> > > > >> >> > ...
> > > >> >> >
> > > > >> >> > > -----Original Message-----
> > > > >> >> > > From: Uwe Schindler [mailto:[email protected]]
> > > > >> >> > > Sent: Monday, May 2, 2011 13:50
> > > > >> >> > > To: [email protected]
> > > > >> >> > > Subject: RE: "fuzzy prefix" search
> > > >> >> > >
> > > >> >> > > Hi,
> > > >> >> > >
> > > > >> >> > > You can pass an integer to FuzzyQuery which defines the
> > > > >> >> > > number of characters that are treated as the prefix. All
> > > > >> >> > > terms must match this prefix exactly, and the rest of each
> > > > >> >> > > term is matched fuzzily.
> > > >> >> > >
> > > >> >> > > Uwe
> > > >> >> > >
> > > >> >> > > -----
> > > >> >> > > Uwe Schindler
> > > >> >> > > H.-H.-Meier-Allee 63, D-28213 Bremen
> > > >> http://www.thetaphi.de
> > > >> >> > > eMail: [email protected]
> > > >> >> > >
> > > > >> >> > > > -----Original Message-----
> > > > >> >> > > > From: Clemens Wyss [mailto:[email protected]]
> > > > >> >> > > > Sent: Monday, May 02, 2011 1:47 PM
> > > > >> >> > > > To: [email protected]
> > > > >> >> > > > Subject: "fuzzy prefix" search
> > > > >> >> > > >
> > > > >> >> > > > I'd like to search fuzzily but not on a full term.
> > > > >> >> > > > E.g. I have a text "Merlot del Ticino" and I'd like
> > > > >> >> > > > "mer", "merr", "melo", ... to match.
> > > > >> >> > > >
> > > > >> >> > > > If I use FuzzyQuery, only "merlot" and "merlott" hit. What
> > > > >> >> > > > Query combination should I use?
> > > >> >> > > >
> > > >> >> > > > Thx
> > > >> >> > > > Clemens
> > > >> >> > > >
> > > >> >> > > >
> > > >> >> > > >
> > > >> >> > > > --------------------------------------------------------
> > > >> >> > > > ----
> > > >> >> > > > ---
> > > >> >> > > > ---
> > > >> >> > > > --
> > > >> >> > > > - To unsubscribe, e-mail:
> > > >> >> > > > [email protected]
> > > >> >> > > > For additional commands, e-mail:
> > > >> >> > > > [email protected] >> >> > >
> > > >> >> > >
> > > >> >> > >
> > > >> >> > >
> > > >> >> > > ----------------------------------------------------------
> > > >> >> > > ----
> > > >> >> > > ---
> > > >> >> > > ---
> > > >> >> > > - To unsubscribe, e-mail:
> > > >> >> > > [email protected]
> > > >> >> > > For additional commands, e-mail:
> > > >> >> > > [email protected]
> > > >> >> >
> > > >> >> >
> > > >> >> >
> > > >> >> --------------------------------------------------------------
> > > >> >> --
> > > >> >> > ---
> > > >> >> > -- To unsubscribe, e-mail:
> > > >> >> > [email protected]
> > > >> >> > For additional commands, e-mail:
> > > >> >> > [email protected]
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >> --------------------------------------------------------------
> > > >> >> ----
> > > >> >> --- To unsubscribe, e-mail:
> > > >> >> [email protected]
> > > >> >> For additional commands, e-mail:
> > > [email protected] >> >
> > > >> >
> > > >> >
> > > >> > ---------------------------------------------------------------
> > > >> > ----
> > > >> > -- To unsubscribe, e-mail:
> > > [email protected]
> > > >> > For additional commands, e-mail:
> > > [email protected] >> >
> > > >> >
> > > >>
> > > >>
> > > >> -----------------------------------------------------------------
> > > >> ----
> > > >> To unsubscribe, e-mail: [email protected]
> > > >> For additional commands, e-mail:
> > > [email protected] >
> > > >
> > > >
> > > > ------------------------------------------------------------------
> > > > ---
> > > > To unsubscribe, e-mail: [email protected]
> > > > For additional commands, e-mail: [email protected]
> > > >
> > > >
> > >
> > >
> > > --------------------------------------------------------------------
> > > - To unsubscribe, e-mail: [email protected]
> > > For additional commands, e-mail: [email protected]
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]