Hi again,

Sorry, the TokenFilter for decomposing will also add the original token to
the filter, so for Donaudampfschifffahrtskapitän it will produce the
following tokens:

Donaudampfschifffahrtskapitän, donau, dampf, schiff, fahrts, kapitän

If you assume that, you would use the Decompounder only during Indexing and
on the query side you would leave this tokenfilter out (use 2 different
analyszers like solr does for its "index" and "query" analyzers). No need
for separate fields. On query side no decompounding is done so you can enter
any of the above terms and get a hit.

Stemming should be done after decompounding.

Sorry for misinformation before, sometimes you have to read the
documentation!
Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -----Original Message-----
> From: Uwe Schindler [mailto:u...@thetaphi.de]
> Sent: Wednesday, January 26, 2011 5:34 PM
> To: java-user@lucene.apache.org
> Subject: RE: Highlight Wildcard Queries: Scores
> 
> You can always decompose because QueryParser will also decompose and
> will do-the-right-thing (internal using a PhraseQuery - don't hurt me,
Robert).
> 
> Uwe
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
> 
> 
> > -----Original Message-----
> > From: Wulf Berschin [mailto:bersc...@dosco.de]
> > Sent: Wednesday, January 26, 2011 5:07 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Highlight Wildcard Queries: Scores
> >
> > Hallo Uwe,
> >
> > yes, thanks for the hint, that sounds good, but it seems to me I would
> then
> > need more fields for all our search modes:
> >
> > Now we have the fields "contents" without stoppwords and with
> stemming
> > and "contents-unstemmed" whithout stemming.
> >
> > The search options are:
> > - whole word (search "contents", no asterisks are being added before
> > search)
> > - exact match (search "contents-unstemmed", implies whole word)
> >
> > When decomposition comes into play I will need a third field
> > "contents- undecomposed" (sorry) to perform the whole word search.
> > Furthermore the contents-unstemmed should not be decomposed as well.
> >
> > Would you still prefer this approach?
> >
> > Viele Grüße aus Heidelberg
> > Wulf
> >
> >
> >
> >
> >
> >
> > Am 26.01.2011 16:00, schrieb Uwe Schindler:
> > > Hi Wulf,
> > >
> > > You should consider decompounding! There are filters based on
> > > dictionaries that support decompounding german words. It’s a
> > > TokenFilter to be put into your analysis chain.
> > > There is a simple Lucene-Rule: Whenever you need wildcards think
> > > about your analysis, you probably did something wrong :-) Add
> > > stemming, decompounding, synonyms,...
> > >
> > > Uwe
> > >
> > > -----
> > > Uwe Schindler
> > > H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
> > > eMail: u...@thetaphi.de
> > >
> > >
> > >> -----Original Message-----
> > >> From: Wulf Berschin [mailto:bersc...@dosco.de]
> > >> Sent: Wednesday, January 26, 2011 3:56 PM
> > >> To: java-user@lucene.apache.org
> > >> Subject: Re: ****SPAM(5.0)**** Re: Highlight Wildcard Queries: Scores
> > >>
> > >> Hi Erick,
> > >>
> > >> good points, but:
> > >>
> > >> our index is fed with german text. In german (in contrast to english)
> > > nouns
> > >> are just appended to create new words. E.g.
> > >>
> > >> Kaffee
> > >> Kaffeemaschine
> > >> Kaffeemaschinensatzbehälter
> > >>
> > >> In our scenario standard fulltext search on "Maschine" shall present
> > >> all
> > > of
> > >> these nouns. That's why we add * before and after on each term.
> > >>
> > >> Of course we provide an option "full words only" which finds none of
> > > these.
> > >>
> > >> Since we do not wrap * around words shorter than 4 characters we
> > >> weren't yet faced with the too many clauses exception.
> > >>
> > >> Greetings
> > >> Wulf
> > >>
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to