Hello Uwe,

Yes, thanks for the hint. That sounds good, but it seems to me I would then need more fields to cover all our search modes:

Currently we have two fields: "contents" (stop words removed, stemmed) and "contents-unstemmed" (not stemmed).

The search options are:
- whole word (search "contents", no asterisks are being added before search)
- exact match (search "contents-unstemmed", implies whole word)

When decompounding comes into play, I will need a third field, "contents-undecomposed" (sorry), to perform the whole word search. Furthermore, "contents-unstemmed" should not be decompounded either.
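
To make the resulting layout concrete, here is a minimal sketch of how the search modes could map to index fields once decompounding is added. The class name, the enum, and the DEFAULT mode (the standard full-text search on the decompounded "contents" field) are assumptions for illustration only; the field names are the ones mentioned above.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper: picks the index field to query for each search mode.
class SearchFields {
    // DEFAULT is an assumed name for the standard full-text search.
    enum Mode { DEFAULT, WHOLE_WORD, EXACT_MATCH }

    private static final Map<Mode, String> FIELD_FOR_MODE = new EnumMap<>(Mode.class);
    static {
        // stemmed, stop words removed, decompounded
        FIELD_FOR_MODE.put(Mode.DEFAULT, "contents");
        // stemmed, stop words removed, NOT decompounded
        FIELD_FOR_MODE.put(Mode.WHOLE_WORD, "contents-undecomposed");
        // neither stemmed nor decompounded
        FIELD_FOR_MODE.put(Mode.EXACT_MATCH, "contents-unstemmed");
    }

    static String fieldFor(Mode mode) {
        return FIELD_FOR_MODE.get(mode);
    }
}
```

With this mapping, SearchFields.fieldFor(SearchFields.Mode.WHOLE_WORD) yields "contents-undecomposed", so each mode costs one extra field in the index.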

Would you still prefer this approach?

Best regards from Heidelberg
Wulf

On 26.01.2011 16:00, Uwe Schindler wrote:
Hi Wulf,

You should consider decompounding! There are dictionary-based filters
that support decompounding German words. Such a filter is a TokenFilter
that you put into your analysis chain.
There is a simple Lucene rule: whenever you need wildcards, think about
your analysis; you probably did something wrong :-) Add stemming,
decompounding, synonyms, ...
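
To illustrate the idea only, here is a toy dictionary-based decompounder in plain Java. It is not the Lucene filter and does not use the TokenFilter API. The class name, the brute-force substring scan, and the tiny word list in the usage note are all assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy sketch: emits the lowercased token plus every dictionary word
// that occurs inside it. Real filters are considerably more careful.
class ToyDecompounder {
    private final Set<String> dictionary;
    private final int minPartLength;

    ToyDecompounder(Set<String> dictionary, int minPartLength) {
        if (minPartLength < 1) {
            throw new IllegalArgumentException("minPartLength must be >= 1");
        }
        this.dictionary = new HashSet<>(dictionary);
        this.minPartLength = minPartLength;
    }

    List<String> decompose(String token) {
        String lower = token.toLowerCase(Locale.GERMAN);
        List<String> out = new ArrayList<>();
        out.add(lower); // always keep the original (lowercased) token
        // Try every substring of at least minPartLength characters.
        for (int start = 0; start < lower.length(); start++) {
            for (int end = start + minPartLength; end <= lower.length(); end++) {
                String part = lower.substring(start, end);
                if (!part.equals(lower) && dictionary.contains(part)) {
                    out.add(part);
                }
            }
        }
        return out;
    }
}
```

With a dictionary containing "kaffee" and "maschine" and a minimum part length of 4, decompose("Kaffeemaschine") returns the token itself followed by "kaffee" and "maschine". A plain term query for "maschine" would then match the compound without any wildcards.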

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


-----Original Message-----
From: Wulf Berschin [mailto:bersc...@dosco.de]
Sent: Wednesday, January 26, 2011 3:56 PM
To: java-user@lucene.apache.org
Subject: Re: Highlight Wildcard Queries: Scores

Hi Erick,

Good points, but:

Our index is fed with German text. In German (in contrast to English),
nouns are simply concatenated to create new words, e.g.:

Kaffee
Kaffeemaschine
Kaffeemaschinensatzbehälter

In our scenario, a standard full-text search for "Maschine" should return
every compound that contains it (here Kaffeemaschine and
Kaffeemaschinensatzbehälter). That's why we add * before and after each term.

Of course we also provide a "full words only" option, which finds none of
these.

Since we do not wrap * around words shorter than 4 characters, we have not
yet run into the "too many clauses" exception.
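
The wrapping rule described above could be sketched roughly as follows. The class and method names are hypothetical, and the sketch only rewrites the query string; it does not touch the Lucene query parser.

```java
// Hypothetical helper: wraps each search term in '*' unless it is
// shorter than 4 characters, as described in the message above.
class WildcardWrapper {
    static final int MIN_LENGTH = 4;

    static String wrap(String term) {
        return term.length() < MIN_LENGTH ? term : "*" + term + "*";
    }

    // Applies wrap() to every whitespace-separated term of the query.
    static String wrapAll(String query) {
        StringBuilder sb = new StringBuilder();
        for (String term : query.trim().split("\\s+")) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(wrap(term));
        }
        return sb.toString();
    }
}
```

For example, wrapAll("Tee Maschine") leaves the three-letter "Tee" untouched and turns "Maschine" into "*Maschine*". Note that leading wildcards have to be enabled explicitly on the Lucene QueryParser (setAllowLeadingWildcard).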

Greetings
Wulf


