I had to moderate both Jonathan's and Jon's messages through to the list. Please subscribe to the list and post to it from the address you subscribed with. I can't always guarantee that I'll catch moderation messages and send them through in a timely fashion.

        Erik

On Mar 1, 2005, at 6:18 AM, Jonathan O'Connor wrote:

Jon,
I too found some problems with the German analyser recently. Here's what
may help:
1. You can try reading Joerg Caumanns' paper "A Fast and Simple Stemming
Algorithm for German Words". This paper describes the algorithm
implemented by GermanAnalyzer.
2. I guess it's because German nouns are all capitalized, so maybe that's
why the analyzer is case-sensitive. Although that assumes you are indexing
well-written German and not emails or text messages!
3. The German stemmer converts umlauts into some other form (the code is a
bit tricky, and I didn't spend any time looking at it), so maybe that's why
you can't find umlauts properly. I think the main reason for this umlaut
change is that many plurals are formed by umlauting, e.g. Haus, Haeuser
(that ae is an umlaut).
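
A minimal sketch of how point 3 could explain the failing wildcard
searches. It assumes that terms are folded (umlaut replaced by its base
vowel) at index time, while a wildcard prefix is compared against the index
unchanged. The class UmlautPrefixSketch and its methods fold and
prefixMatches are hypothetical and written only for illustration; this is
not the GermanStemmer code, and no Lucene classes are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UmlautPrefixSketch {

    // Hypothetical stand-in for index-time normalization: replaces each
    // umlaut with its base vowel and sharp s with "ss". The real stemmer
    // does more than this; only the folding step is illustrated here.
    public static String fold(String term) {
        StringBuilder sb = new StringBuilder();
        for (char c : term.toCharArray()) {
            switch (c) {
                case '\u00e4': sb.append('a'); break;   // a-umlaut
                case '\u00f6': sb.append('o'); break;   // o-umlaut
                case '\u00fc': sb.append('u'); break;   // u-umlaut
                case '\u00c4': sb.append('A'); break;   // A-umlaut
                case '\u00d6': sb.append('O'); break;   // O-umlaut
                case '\u00dc': sb.append('U'); break;   // U-umlaut
                case '\u00df': sb.append("ss"); break;  // sharp s
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Returns the indexed terms that start with the given prefix, compared
    // exactly as given (no normalization is applied to the prefix here).
    public static List<String> prefixMatches(List<String> indexedTerms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Build a toy "index" whose terms have been folded.
        List<String> indexed = new ArrayList<>();
        for (String word : Arrays.asList("Haus", "H\u00e4user", "\u00c4nderung")) {
            indexed.add(fold(word));
        }

        // Raw prefix still contains the umlaut, so nothing matches.
        System.out.println(prefixMatches(indexed, "H\u00e4"));        // prints []

        // Folding the prefix the same way as the indexed terms finds both.
        System.out.println(prefixMatches(indexed, fold("H\u00e4")));  // prints [Haus, Hauser]
    }
}
```

If this is what is happening, normalizing the wildcard prefix in the same
way as the indexed terms before searching would be one thing to try. Luke
will show what the indexed terms actually look like.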


Finally, to really understand what's happening, get your hands on Luke. I
just got it last week, and it's brilliant. It shows you everything about
your indexes. You can also feed text to an Analyzer and see what it makes
of it. This will show you the real reason why your umlaut search is
failing.
Ciao,
Jonathan O'Connor
XCOM Dublin




"Jon Humble" <[EMAIL PROTECTED]> wrote on 01/03/2005 09:35:
Please respond to: "Lucene Users List" <lucene-user@jakarta.apache.org>
To: <lucene-user@jakarta.apache.org>
Subject: Questions about GermanAnalyzer/Stemmer [checked for viruses]






Hello,

We're using the GermanAnalyzer/Stemmer to index/search our (German)
Website.
I have a few questions:

(1) Why is the GermanAnalyzer case-sensitive? None of the other
language indexers seem to be. What does this feature add?
(2) With the German Analyzer, wildcard searches containing extended
German characters do not seem to work. So, a* is fine but anä* or ö*
always find zero results.
(3) In a similar vein to (2), wildcard searches with escaped special
characters fail to find results. So a search for co\-operative works but
a search for co\-op* fails.


I will be grateful for any light that can be shed on these problems.

With Thanks,

Jon.

Jon Humble
BSc (hons,)
Software Engineer
eMail: [EMAIL PROTECTED]

TecSphere Ltd
Centre for Advanced Industry
Coble Dene, Royal Quays
Newcastle upon Tyne NE29 6DE
United Kingdom

Direct Dial: +44 (191) 270 31 06
Fax: +44 (191) 270 31 09
http://www.tecsphere.com








--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]




