RE: Highlight Wildcard Queries: Scores

Uwe Schindler Wed, 26 Jan 2011 08:43:47 -0800

Hi again,

Sorry, the decompounding TokenFilter also keeps the original token in the
token stream, so for Donaudampfschifffahrtskapitän it will produce the
following tokens:


Donaudampfschifffahrtskapitän, donau, dampf, schiff, fahrts, kapitän

If you assume that, you would use the Decompounder only during indexing and
leave this TokenFilter out on the query side (use two different analyzers,
as Solr does with its "index" and "query" analyzers). No need for separate
fields. Since no decompounding is done on the query side, you can enter any
of the above terms and get a hit.
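
To make the index/query asymmetry concrete, here is a minimal sketch in
plain Java. It does not use Lucene's API: the class DecompoundSketch, its
methods, and the two-word dictionary are made up for illustration, and the
greedy longest-match split is far simpler than a real dictionary-based
decompounder. It borrows the compound "Kaffeemaschine" from earlier in the
thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy illustration only: NOT Lucene's API. All names here are hypothetical.
class DecompoundSketch {

    // Hypothetical mini dictionary of word parts (lowercase).
    static final Set<String> DICTIONARY = Set.of("kaffee", "maschine");

    // Index side: keep the original token, then append every dictionary
    // part found by a greedy longest-match scan from left to right.
    static List<String> analyzeForIndex(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        out.add(lower);
        int pos = 0;
        while (pos < lower.length()) {
            int end = lower.length();
            // Shrink the candidate until it is a dictionary part or empty.
            while (end > pos && !DICTIONARY.contains(lower.substring(pos, end))) {
                end--;
            }
            if (end > pos) {
                out.add(lower.substring(pos, end));
                pos = end;
            } else {
                pos++; // no part starts here; skip one character
            }
        }
        return out;
    }

    // Query side: lowercase only, no decompounding.
    static List<String> analyzeForQuery(String term) {
        return List.of(term.toLowerCase(Locale.ROOT));
    }

    // A query term hits if it equals one of the tokens produced at index time.
    static boolean matches(String indexedWord, String queryTerm) {
        return analyzeForIndex(indexedWord).containsAll(analyzeForQuery(queryTerm));
    }
}
```

With this sketch, DecompoundSketch.matches("Kaffeemaschine", "Maschine") is a
hit without any wildcards, and so is the full word, while a fragment such as
"Masch" is not.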

Stemming should be done after decompounding.

Sorry for the misinformation before; sometimes you have to read the
documentation!
Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]


> -----Original Message-----
> From: Uwe Schindler [mailto:[email protected]]
> Sent: Wednesday, January 26, 2011 5:34 PM
> To: [email protected]
> Subject: RE: Highlight Wildcard Queries: Scores
> 
> You can always decompose because QueryParser will also decompose and
> will do the right thing (internally using a PhraseQuery - don't hurt me,
> Robert).
> 
> Uwe
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
> 
> 
> > -----Original Message-----
> > From: Wulf Berschin [mailto:[email protected]]
> > Sent: Wednesday, January 26, 2011 5:07 PM
> > To: [email protected]
> > Subject: Re: Highlight Wildcard Queries: Scores
> >
> > Hello Uwe,
> >
> > yes, thanks for the hint, that sounds good, but it seems to me I would
> > then need more fields for all our search modes:
> >
> > Now we have the fields "contents" without stopwords and with stemming
> > and "contents-unstemmed" without stemming.
> >
> > The search options are:
> > - whole word (search "contents", no asterisks are being added before
> > search)
> > - exact match (search "contents-unstemmed", implies whole word)
> >
> > When decomposition comes into play I will need a third field
> > "contents-undecomposed" (sorry) to perform the whole word search.
> > Furthermore, "contents-unstemmed" should not be decomposed either.
> >
> > Would you still prefer this approach?
> >
> > Best regards from Heidelberg
> > Wulf
> >
> >
> >
> >
> >
> >
> > On 26.01.2011 16:00, Uwe Schindler wrote:
> > > Hi Wulf,
> > >
> > > You should consider decompounding! There are dictionary-based
> > > filters that support decompounding German words. It's a
> > > TokenFilter to be put into your analysis chain.
> > > There is a simple Lucene rule: whenever you need wildcards, think
> > > about your analysis, you probably did something wrong :-) Add
> > > stemming, decompounding, synonyms,...
> > >
> > > Uwe
> > >
> > > -----
> > > Uwe Schindler
> > > H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
> > > eMail: [email protected]
> > >
> > >
> > >> -----Original Message-----
> > >> From: Wulf Berschin [mailto:[email protected]]
> > >> Sent: Wednesday, January 26, 2011 3:56 PM
> > >> To: [email protected]
> > >> Subject: Re: ****SPAM(5.0)**** Re: Highlight Wildcard Queries: Scores
> > >>
> > >> Hi Erick,
> > >>
> > >> good points, but:
> > >>
> > >> our index is fed with German text. In German (in contrast to English)
> > >> nouns are just appended to create new words. E.g.
> > >>
> > >> Kaffee
> > >> Kaffeemaschine
> > >> Kaffeemaschinensatzbehälter
> > >>
> > >> In our scenario, a standard full-text search on "Maschine" shall
> > >> present all of these nouns. That's why we add * before and after
> > >> each term.
> > >>
> > >> Of course we provide an option "full words only" which finds none of
> > >> these.
> > >>
> > >> Since we do not wrap * around words shorter than 4 characters we
> > >> weren't yet faced with the too many clauses exception.
> > >>
> > >> Greetings
> > >> Wulf
> > >>
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
