Re: Wildcard-Search Solr 3.5.0
Chiming in late here, just back from vacation. But off the top of my head, I don't see any reason SnowballPorterFilterFactory shouldn't be MultiTermAware. I've created https://issues.apache.org/jira/browse/SOLR-3503 as a placeholder.

Erick

On Fri, May 25, 2012 at 1:31 PM, spr...@gmx.eu wrote:
> > I don't know the specific rules in these specific stemmers, but generally a less aggressive stemming (e.g., plural-only) of paintings would be painting, while a more aggressive stemming would be paint. For some aggressive stemmers the stemmed word is not even a word.
>
> Sounds logical :)
>
> > It would be nice to have doc with some example words for each stemmer.
>
> Absolutely! Thx a lot!
Re: Wildcard-Search Solr 3.5.0
And I closed the JIRA, see the comments. The short form is that it's not worth the effort because of the edge cases. Jack writes up some of them; the main one is what stemming does with terms like organiz*. Sure, it would produce one token (which is the main restriction on a MultiTermAware filter), but the output might not be anything equivalent to the stem of organization, maybe not even organize. Better to avoid that rat-hole; it seems like one of those problems that could suck up enormous amounts of time and _still_ not do what's expected.

If you _really_ want to try this, you could always define your own multiterm analysis component that includes the stemmer, see:
http://www.lucidimagination.com/blog/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/

But don't say I didn't warn you <G>...

Best
Erick

On Sun, Jun 3, 2012 at 8:25 AM, Erick Erickson erickerick...@gmail.com wrote:
> Chiming in late here, just back from vacation. But off the top of my head, I don't see any reason SnowballPorterFilterFactory shouldn't be MultiTermAware. I've created https://issues.apache.org/jira/browse/SOLR-3503 as a placeholder.
>
> Erick
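For illustration, a custom multiterm chain that includes the stemmer, as Erick describes, might look roughly like the fragment below. This is only a sketch under assumptions: it reuses the text_de field type and the German2 language setting posted elsewhere in this thread, and the choice of KeywordTokenizerFactory is an assumption made so that the chain emits exactly one token.

```xml
<!-- Sketch only: goes inside the existing <fieldType name="text_de">,
     alongside the index and query analyzers. Requires Solr 3.6 or later. -->
<analyzer type="multiterm">
  <!-- Assumption: keep the whole wildcard term as a single token. -->
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- The stemmer sees only the wildcard prefix (e.g. organiz), not a full word. -->
  <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
</analyzer>
```

The caveat from the message above still applies: the stemmed prefix may not correspond to any stem in the index, so it is worth checking a few wildcard queries on the analysis page before relying on this.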
RE: Wildcard-Search Solr 3.5.0
Oh, thx for the update! I didn't notice that Solr 3.6 has a text_de field type.

These two options... less / more aggressive. Aggressive in terms of what?

Thank you!

-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Friday, 25 May 2012 03:25
To: solr-user@lucene.apache.org
Subject: Re: Wildcard-Search Solr 3.5.0

> I tried it and it does appear to be the SnowballPorterFilterFactory that normally does the accent folding but can't here because it is not multi-term aware.
>
> I did notice that the text_de field type that comes in the Solr 3.6 example schema handles your case fine. It uses the GermanNormalizationFilterFactory to fold accented characters and is multi-term aware. Any particular reason you're not using the stock text_de field type? It also has three stemming options which might be sufficient for your needs.
>
> In any case, try to make your text_de field type closer to the stock version, and try to use GermanNormalizationFilterFactory, and that may be good enough for your situation.
Re: Wildcard-Search Solr 3.5.0
I don't know the specific rules in these specific stemmers, but generally a less aggressive stemming (e.g., plural-only) of paintings would be painting, while a more aggressive stemming would be paint. For some aggressive stemmers the stemmed word is not even a word.

It would be nice to have doc with some example words for each stemmer.

-- Jack Krupansky

-----Original Message-----
From: spr...@gmx.eu
Sent: Friday, May 25, 2012 5:59 AM
To: solr-user@lucene.apache.org
Subject: RE: Wildcard-Search Solr 3.5.0

> Oh, thx for the update! I didn't notice that Solr 3.6 has a text_de field type.
>
> These two options... less / more aggressive. Aggressive in terms of what?
>
> Thank you!
RE: Wildcard-Search Solr 3.5.0
> I don't know the specific rules in these specific stemmers, but generally a less aggressive stemming (e.g., plural-only) of paintings would be painting, while a more aggressive stemming would be paint. For some aggressive stemmers the stemmed word is not even a word.

Sounds logical :)

> It would be nice to have doc with some example words for each stemmer.

Absolutely! Thx a lot!
Re: Wildcard-Search Solr 3.5.0
I tried it and it does appear to be the SnowballPorterFilterFactory that normally does the accent folding but can't here because it is not multi-term aware.

I did notice that the text_de field type that comes in the Solr 3.6 example schema handles your case fine. It uses the GermanNormalizationFilterFactory to fold accented characters and is multi-term aware. Any particular reason you're not using the stock text_de field type? It also has three stemming options which might be sufficient for your needs.

In any case, try to make your text_de field type closer to the stock version, and try to use GermanNormalizationFilterFactory, and that may be good enough for your situation.

-- Jack Krupansky

-----Original Message-----
From: spr...@gmx.eu
Sent: Wednesday, May 23, 2012 10:16 AM
To: solr-user@lucene.apache.org
Subject: RE: Wildcard-Search Solr 3.5.0

> > I'd guess that this is because SnowballPorterFilterFactory does not implement MultiTermAwareComponent. Not sure, though.
>
> Yes, I think this keeps the automagic multiterm awareness from doing its job.
>
> Could a custom analyzer chain with analyzer type=multiterm help? As described (very, very briefly, too briefly...) here:
> http://wiki.apache.org/solr/MultitermQueryAnalysis
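As a rough picture of what Jack is pointing at, a field type along the lines of the stock text_de might look like the sketch below. It is modeled on the Solr 3.6 example schema rather than copied from it, so treat the stopword file path and the exact attributes as assumptions and check them against the schema.xml that ships with your version.

```xml
<!-- Sketch modeled on the stock text_de field type from the Solr 3.6 example schema.
     Assumption: verify the stopword path and attributes against your own copy. -->
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_de.txt" format="snowball"
            enablePositionIncrements="true"/>
    <!-- Folds umlauts; multi-term aware, so it is also applied to wildcard terms. -->
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <!-- Pick one of the three stemming options. -->
    <filter class="solr.GermanLightStemFilterFactory"/>
    <!-- less aggressive: <filter class="solr.GermanMinimalStemFilterFactory"/> -->
    <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="German2"/> -->
  </analyzer>
</fieldType>
```

Because a single analyzer is used for both indexing and querying here, the documents have to be reindexed after switching to it.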
RE: Wildcard-Search Solr 3.5.0
Does no one have an idea? Thx.

> Well, it works in 3.6. With one exception: if I use German umlauts it does not work anymore.
>
> Text: Bär
> Bä* - no hits
> Bär - hits
>
> What can I do in this case?
>
> Thank you
Re: Wildcard-Search Solr 3.5.0
What about bä*? Does that get hits?

-- Dmitry

On Wed, May 23, 2012 at 2:19 PM, spr...@gmx.eu wrote:
> Does no one have an idea? Thx.
>
> > Well, it works in 3.6. With one exception: if I use German umlauts it does not work anymore.
> >
> > Text: Bär
> > Bä* - no hits
> > Bär - hits
> >
> > What can I do in this case?

--
Regards,
Dmitry Kan
RE: Wildcard-Search Solr 3.5.0
No. No hits for bä*.
It's something with the umlauts but I have no idea what...

-----Original Message-----
From: Dmitry Kan [mailto:dmitry@gmail.com]
Sent: Wednesday, 23 May 2012 13:36
To: solr-user@lucene.apache.org
Subject: Re: Wildcard-Search Solr 3.5.0

> What about bä*? Does that get hits?
>
> -- Dmitry
Re: Wildcard-Search Solr 3.5.0
Do umlauts arrive properly on the server side, with no encoding issues? Check the query params echoed in the response (XML/JSON/...). Set debugQuery to true as well to see if it produces any useful diagnostic info.

On Wed, May 23, 2012 at 2:58 PM, spr...@gmx.eu wrote:
> No. No hits for bä*.
> It's something with the umlauts but I have no idea what...

--
Regards,
Dmitry Kan
RE: Wildcard-Search Solr 3.5.0
-----Original Message-----
From: Dmitry Kan [mailto:dmitry@gmail.com]
Sent: Wednesday, 23 May 2012 14:02
To: solr-user@lucene.apache.org
Subject: Re: Wildcard-Search Solr 3.5.0

> Do umlauts arrive properly on the server side, with no encoding issues?

Yes, works fine. It must, since I have hits for Bär or bär. It's just the combination of umlauts and wildcards. Must be something with the automagic multiterm feature in Solr 3.6.
Re: Wildcard-Search Solr 3.5.0
Maybe a filter like ISOLatin1AccentFilter that doesn't get applied when using wildcards? How do the terms actually appear in the index?

Jens

On 05/23/2012 01:19 PM, spr...@gmx.eu wrote:
> Does no one have an idea? Thx.
>
> > Well, it works in 3.6. With one exception: if I use German umlauts it does not work anymore.
> >
> > Text: Bär
> > Bä* - no hits
> > Bär - hits
> >
> > What can I do in this case?
RE: Wildcard-Search Solr 3.5.0
> Maybe a filter like ISOLatin1AccentFilter that doesn't get applied when using wildcards? How do the terms actually appear in the index?

Bär gets indexed as bar.

I do not use ISOLatin1AccentFilter. My field def is this:

    <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
      </analyzer>
    </fieldType>
RE: Wildcard-Search Solr 3.5.0
I'd guess that this is because SnowballPorterFilterFactory does not implement MultiTermAwareComponent. Not sure, though. -Michael
RE: Wildcard-Search Solr 3.5.0
> I'd guess that this is because SnowballPorterFilterFactory does not implement MultiTermAwareComponent. Not sure, though.

Yes, I think this keeps the automagic multiterm awareness from doing its job.

Could a custom analyzer chain with analyzer type=multiterm help? As described (very, very briefly, too briefly...) here:
http://wiki.apache.org/solr/MultitermQueryAnalysis
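A minimal sketch of such a chain, assuming Solr 3.6 (where analyzer type="multiterm" and GermanNormalizationFilterFactory are available) and the text_de field type posted in this thread. It only lowercases and folds umlauts, with no stemming. Whether the folding matches what the German2 stemmer produced at index time for every term is an assumption; the thread only confirms that Bär is indexed as bar.

```xml
<!-- Sketch only: add inside the existing <fieldType name="text_de">,
     next to the index and query analyzers. -->
<analyzer type="multiterm">
  <!-- Assumption: keep the whole wildcard term as a single token. -->
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- Folds ä to a, so the prefix of Bä* becomes ba and can match the indexed term bar. -->
  <filter class="solr.GermanNormalizationFilterFactory"/>
</analyzer>
```

Only wildcard, prefix and similar multi-term queries go through this chain, so adding it should not require reindexing.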
RE: Wildcard-Search Solr 3.5.0
> > The text may contain FooBar.
> >
> > When I do a wildcard search like this: Foo* - no hits.
> > When I do a wildcard search like this: foo* - doc is found.
>
> Please see http://wiki.apache.org/solr/MultitermQueryAnalysis

Well, it works in 3.6. With one exception: if I use German umlauts it does not work anymore.

Text: Bär
Bä* - no hits
Bär - hits

What can I do in this case?

Thank you
Re: Wildcard-Search Solr 3.5.0
> The text may contain FooBar.
>
> When I do a wildcard search like this: Foo* - no hits.
> When I do a wildcard search like this: foo* - doc is found.

Please see http://wiki.apache.org/solr/MultitermQueryAnalysis
RE: Wildcard-Search Solr 3.5.0
Hi Ahmet,

> Please see http://wiki.apache.org/solr/MultitermQueryAnalysis

So your advice is to upgrade to 3.6?

Thank you
RE: Wildcard-Search Solr 3.5.0
> So your advice is to upgrade to 3.6?

Or, as a workaround, you can lowercase wildcard queries on the client side.