Great, your stemmer does the job I expected for Umlaut. Thanks. Has someone an idea for composed words ("betreuung" is not found in a doc containing "Kundenbetreuung")?
Marc. ----- Original Message ----- From: "Clemens Marschner" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Wednesday, October 02, 2002 2:36 PM Subject: Re: Help for german queries > Hm sorry I don't have the time right now, but I think it took me 10 minutes > to discover the location where I had to do the changes. > I thought ä=ae would already be included. > I included my GermanStemmer version in this post. Sorry I can't do > CVSing/diff'ing at the moment. > The stemmer does ä->a and ae->a and doesn't distinguish between uppercase > and lowercase. I'm not a linguist, so I can't say if it does overstemming. I > commented out the expression below > > // "t" occurs only as suffix of verbs. > else if ( buffer.charAt( buffer.length() - 1 ) == 't' /*&& > !uppercase*/ ) { > buffer.deleteCharAt( buffer.length() - 1 ); > } > else { > doMore = false; > } > > Hope that helps > > Clemens > > ----- Original Message ----- > From: "Marc Guillemot" <[EMAIL PROTECTED]> > To: "Lucene Users List" <[EMAIL PROTECTED]> > Sent: Wednesday, October 02, 2002 12:47 PM > Subject: Re: Help for german queries > > > > The problem/question is not on the first letter case because but only on > the > > equivalence between "ä" and "ae" for instance. > > > > in my tests, searching for: > > - Geschäft -> 13 results > > - geschäft -> 0 result > > - Geschaeft -> 0 result > > - geschaeft -> 0 result > > > > Marc. > > > > > > ----- Original Message ----- > > From: "Clemens Marschner" <[EMAIL PROTECTED]> > > To: "Lucene Users List" <[EMAIL PROTECTED]> > > Sent: Tuesday, October 01, 2002 1:16 PM > > Subject: Re: Help for german queries > > > > > > > there's a "feature" in the German stemmer (I would call it a bug) that > > > treats words ending with "t" differently if they start with a capital or > > > non-capital letter. Are you sure you didn't type "geschäft" and > > "Geschaeft"? > > > Cause that's supposedly stemmed differently. > > > > > > --Clemens > > > > > > ----- Original Message ----- > > > From: "Marc Guillemot" <[EMAIL PROTECTED]> > > > To: <[EMAIL PROTECTED]> > > > Sent: Tuesday, October 01, 2002 9:40 AM > > > Subject: Help for german queries > > > > > > > > > > Hi, > > > > > > > > I've performed some tests with Lucene for german indexation/search but > I > > > > don't get the results I expected: > > > > > > > > - Umlaut: > > > > search for: > > > > - "Geschäft" -> x results > > > > - "Geschaeft" -> no result > > > > Is there an option in the standard german classes to make the 2 > searches > > > > above equivalent? > > > > > > > > - Composed words: > > > > "betreuung" is not found in a doc containing "Kundenbetreuung" > > > > > > > > Any suggestions? > > > > > > > > Marc. -- To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>