Re: wildcard search doesn't fetch results when field has whi
> Escape white space, do you mean to say using "\" ?

Yes. But you haven't told us what _type_ of field you're working with. If it's a "string" type, then ComplexPhraseQueryParser won't work. Looking again at your example, it looks as though you are using strings. In that case try abc\ d*

Adding debug=query to your URL will show you how the query gets parsed and may help considerably.

Best,
Erick
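A minimal sketch of the escaping advice in this reply, assuming a string-type field. The field name name_s and the helper names are made up for illustration, and the character list follows the standard query parser's special characters, with * and ? left unescaped so they can still act as wildcards.

```python
from urllib.parse import urlencode

# Characters the Lucene/Solr standard query parser treats as syntax.
# '*' and '?' are deliberately left out so they keep their wildcard meaning.
SPECIAL_CHARS = set('\\+-!():^[]"{}~|&/')


def escape_prefix(text: str) -> str:
    """Backslash-escape whitespace and query-syntax characters in a literal prefix."""
    return "".join(
        "\\" + ch if ch in SPECIAL_CHARS or ch.isspace() else ch for ch in text
    )


def prefix_query_params(field: str, prefix: str) -> str:
    """Build URL parameters for field:<escaped prefix>* with debug=query switched on."""
    q = f"{field}:{escape_prefix(prefix)}*"
    return urlencode({"q": q, "debug": "query"})


# name_s is a hypothetical string field, not one taken from the thread.
print(escape_prefix("abc d"))
print(prefix_query_params("name_s", "ali a"))
```

The first print shows the abc\ d form suggested in the reply. The second produces a parameter string that can be appended to a /select URL, so the parsed query shows up in the debug section of the response.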
Re: wildcard search doesn't fetch results when field has whi
Erick,

I tried complexqueryparser, still no result. By "escape white space", do you mean using "\"?

Thanks,
Ahemad
Re: wildcard search doesn't fetch results when field has white spaces and special characters
Try complexphrasequeryparser. If (and only if) you always want to search from the beginning of the content, you might be able to use string rather than text-based fields, but make sure to escape whitespace...

Best,
Erick
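For reference, a hedged sketch of what a complexphrase request could look like. The {!complexphrase inOrder=...} local-params form is the one documented for Solr's Complex Phrase Query Parser. The field name name_txt is an assumption, and the field has to be a tokenized text field for this parser to be useful.

```python
def complex_phrase_query(field: str, phrase: str, in_order: bool = True) -> str:
    """Wrap a phrase containing wildcards in complexphrase local params."""
    # Escape backslashes and double quotes so the phrase stays one quoted unit.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    flag = "true" if in_order else "false"
    return f'{{!complexphrase inOrder={flag}}}{field}:"{escaped}"'


# name_txt is a hypothetical tokenized field.
print(complex_phrase_query("name_txt", "ali a*"))
```

The resulting string is meant to be sent as the q parameter. Whether it matches still depends on how the field is tokenized at index time.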
wildcard search doesn't fetch results when field has white spaces and special characters
Hi,

I have a field with white spaces and special characters that needs to be indexed so that it can be queried with wildcards. Wildcard search works for most scenarios. E.g., if my data is "ali.abc", "abc_pqr", "ali abc" and "ahemad ali", then a search with ali* gives three results.

But I am not able to search with, say, ali a*

A search with q="ali abc" gives an exact match and the desired result.

I want to do wildcard searches where the criteria can include spaces, for example "ahemad a*" or ahemad a*

I.e., if a space is present then I am not able to do a wildcard search. Is there any way to make wildcard search work even if a space is present in the token?

The field type I have is below:

Any help would be great.

Thanks,
Ahemad Ali
Re: Edismax leading wildcard search
Well, the other option is to allow leading wildcards, but use ReversedWildcardFilterFactory. Admittedly that increases the size of your index, but apparently your users expect leading wildcards, so why not support them?

Best,
Erick
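To illustrate the idea behind reversing terms, here is a toy model in plain Python. It is not Solr's implementation: it only shows that once every term is also stored reversed, a leading-wildcard query such as *search becomes an ordinary prefix lookup on the reversed form. The extra reversed entries are also why the index grows.

```python
import bisect


def prefix_range(sorted_terms, prefix):
    """Return all terms starting with prefix using two binary searches."""
    lo = bisect.bisect_left(sorted_terms, prefix)
    hi = bisect.bisect_left(sorted_terms, prefix + "\uffff")
    return sorted_terms[lo:hi]


terms = ["alpha", "omega", "search", "research", "solr"]
# Toy stand-in for the reversed tokens that would be added at index time.
reversed_index = sorted(t[::-1] for t in terms)


def leading_wildcard(suffix):
    """Answer *suffix by looking up the reversed suffix as a prefix."""
    hits = prefix_range(reversed_index, suffix[::-1])
    return sorted(t[::-1] for t in hits)


print(leading_wildcard("search"))
```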
Re: Edismax leading wildcard search
Hi,

I am also wondering how to disable leading wildcards in Solr. Can you please suggest how to do that? I know that in Lucene it is a flag that is set to false by default.

> Do it on the client side. Just don't allow leading asterisks or question marks in your query term.

This does not look trivial to me. A search query can be very complicated. How do you suggest detecting leading wildcards in a complicated Lucene query?

Thank you
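One possible client-side check, offered only as a rough heuristic: it does not parse the full Lucene syntax. It drops quoted phrases and bracketed ranges, splits on whitespace and parentheses, removes an optional field: prefix, and flags any remaining term that starts with * or ?. The function name is made up for this sketch.

```python
import re

# Quoted phrases and [..] / {..} spans are blanked out first, so that
# "[* TO *]" ranges and local params are not mistaken for wildcard terms.
_IGNORED = re.compile(r'"(?:\\.|[^"\\])*"|[\[{][^\]}]*[\]}]')


def has_leading_wildcard(query: str) -> bool:
    """Heuristically detect terms that begin with '*' or '?'."""
    stripped = _IGNORED.sub(" ", query)
    for token in re.split(r"[\s()]+", stripped):
        term = token.lstrip("+-!")
        # Drop an optional "field:" prefix.
        if ":" in term:
            term = term.rsplit(":", 1)[-1]
        # A lone "*" (as in *:* or field:*) is not a leading wildcard.
        if len(term) > 1 and term[0] in "*?":
            return True
    return False


print(has_leading_wildcard("title:*ing AND body:solr"))
print(has_leading_wildcard("price:[* TO *] title:foo*"))
```

A client could reject or rewrite the query when the check returns True. Escaped colons and other unusual constructs are not handled here.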
Re: Enable default wildcard search
Right. Sticking to index-only processing here should stop a query for 3115 from falsely matching 3116. The log should contain an OutOfMemoryError mentioning the heap, or something similar. That's the cause.

--
Sincerely yours
Mikhail Khludnev
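The index-only point can be shown with a small simulation. This is a toy model, not Solr code, and the minimum gram size of 3 is an assumption (the original filter line did not survive in the archive). It is chosen because it reproduces the reported symptom, where 3115 also matches everything starting with 311.

```python
def edge_ngrams(token, min_gram=3, max_gram=25):
    """Prefixes of token from min_gram up to max_gram characters."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]


def build_index(values, min_gram=3):
    """Index every edge n-gram of every value (index-time analysis)."""
    index = {}
    for value in values:
        for gram in edge_ngrams(value, min_gram):
            index.setdefault(gram, set()).add(value)
    return index


def search(index, query, ngram_query, min_gram=3):
    """OR together the query terms; optionally n-gram the query as well."""
    terms = edge_ngrams(query, min_gram) if ngram_query else [query]
    hits = set()
    for term in terms:
        hits |= index.get(term, set())
    return sorted(hits)


index = build_index(["311570", "311600", "311999"])
print(search(index, "3115", ngram_query=True))   # query analyzed like the index
print(search(index, "3115", ngram_query=False))  # n-grams at index time only
```

When the query is n-grammed too, it expands to 311 OR 3115 and the gram 311 matches every value. With the filter applied only at index time, the query term 3115 stays whole and only 311570 matches.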
Re: Enable default wildcard search
Hi Rick!

Yes, as soon as I get the required result I'll definitely publish it on GitHub. The default Dovecot schema doesn't suit me: with it, search matches only on full words, but I want wildcard search. I don't have a solution yet. I hope this group will help me.
Re: Enable default wildcard search
Siarhei:

Will you be putting up your system at GitHub? I would like to Solr-ize my dovecot.

Maybe you saw this already:
https://github.com/dovecot/core/blob/master/doc/solr-schema.xml
https://github.com/dovecot/core/blob/master/src/plugins/fts-solr/solr-connection.c
https://github.com/dovecot/core/blob/master/src/plugins/fts-solr/fts-solr-plugin.h
https://github.com/bdraco/dovecot/blob/master/doc/wiki/Plugins.FTS.Solr.txt

Cheers -- Rick
Re: Enable default wildcard search
Thank you for your answer.

I tried to use EdgeNGram with the same settings (maxGramSize="25") and the same problem emerged: the search was not quite correct. For instance, I need to find the number 311570; I enter 3115 into the search bar, and the result contains all the numbers that start with 311 rather than 3115. Should I perhaps have enabled this option for indexing only?

But I'm still concerned that with this option Solr often crashed during indexing. How do I turn on debugging correctly so that I can show you more detailed errors?
Re: Enable default wildcard search
Obviously, Chris has nothing in common with Christmas, hence this classic search behavior is correct. What people are asking for here is autocomplete, and that is a separate UX with separate algorithms. You can start exploring different aspects of this field from https://lucidworks.com/2015/03/04/solr-suggester/

You see, NGramming just freaks the heap out. So you can band-aid it with EdgeNGram (which is probably what you want anyway) and add some heap to your poor server.

Another approach is to stop ngramming and really search by wildcard with http://yonik.com/solr-query-parameter-substitution/
It should be something like q=${text}* so that when the client passes text=foo it searches for foo*, but it doesn't work for multiple words and is expensive as well.

--
Sincerely yours
Mikhail Khludnev
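A back-of-the-envelope sketch of why full n-gramming is so much heavier than edge n-gramming. It only counts the grams produced per token and does not measure Solr's real memory use. The minimum gram size of 1 is an assumption, since the original setting is not preserved in the thread.

```python
def ngram_count(length, min_gram, max_gram):
    """Number of n-grams (all positions) produced for one token."""
    total = 0
    for n in range(min_gram, min(max_gram, length) + 1):
        total += length - n + 1
    return total


def edge_ngram_count(length, min_gram, max_gram):
    """Number of edge n-grams (prefixes only) produced for one token."""
    return max(0, min(max_gram, length) - min_gram + 1)


# Assumed settings: minGramSize=1 (a guess), maxGramSize=25 (from the thread).
for word in ("christmas", "x" * 25):
    n = len(word)
    print(n, ngram_count(n, 1, 25), edge_ngram_count(n, 1, 25))
```

Under these assumptions a 25-character token yields 325 n-grams but only 25 edge n-grams, which is consistent with the advice to prefer EdgeNGram and give the server more heap.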
Re: Enable default wildcard search
Hi

Does anyone have any idea how to fix this?
Enable default wildcard search
Hi everybody!

I am trying to integrate Solr 6.6.1 with my email server (dovecot 2.32). I have the following settings:

schema.xml - https://pastebin.com/1XXWTs8V
solrconfig.xml - https://pastebin.com/5HSswCcv

But with these settings, search works only on exact matches; for instance, if I search for Chris it doesn't find Christmas. The client does not support wildcard search. I would like to know how to turn on wildcard search for all queries.

I tried to do that by adding the following line to schema.xml

... maxGramSize="25"/>

but when I added it, Solr 6.6.1 very often showed errors during indexing, which led to a full crash; even the web interface didn't respond, and only a full Solr restart helped. This problem emerged on both Solr 6.6.1 and Solr 7.2.

Also, with this option the search result was not what I expected. For example, when I searched for the word domain, results containing domes as well as domain were included. I suppose that from the point of view of this operation the result is correct, but it is not what I need.

That is why I would like to know how to turn on standard wildcard search. As that is impossible on the client's side, I would like to manage it from the Solr side.

Thanks.
Re: Edismax leading wildcard search
On 22.12.2017 at 11:57, Selvam Raman wrote:
> 1) how can i disable leading wildcard search

Do it on the client side. Just don't allow leading asterisks or question marks in your query term.

> 2) why leading wildcard search takes so much of time to give the response.

Because Lucene can't just look in the index for all terms beginning with something; it needs to look at all terms instead. Basically, indexed terms are in alphabetical order, but that doesn't help with leading wildcards.

There's a ReversedWildcardFilterFactory in Solr to address this issue.

-Michael
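The cost difference can be sketched with a sorted list standing in for the term dictionary. This is a simplification, not how Lucene stores terms: a prefix query can jump to the first candidate with a binary search and stop at the first non-match, while a leading wildcard has to test every term.

```python
import bisect

terms = sorted(["apple", "application", "apply", "banana", "grape", "pineapple"])


def prefix_matches(prefix):
    """Seek to the first candidate, then stop at the first non-matching term."""
    start = bisect.bisect_left(terms, prefix)
    matches, examined = [], 0
    for term in terms[start:]:
        examined += 1
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches, examined


def suffix_matches(suffix):
    """A leading wildcard (*suffix) gets no help from the sort order."""
    matches = [t for t in terms if t.endswith(suffix)]
    return matches, len(terms)


print(prefix_matches("appl"))    # examines only the matching run plus one term
print(suffix_matches("apple"))   # examines every term in the dictionary
```

The examined counts leave out the handful of comparisons made by the binary search itself. With millions of terms, the full scan is what dominates the response time.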
Edismax leading wildcard search
Hi,

Solr version - 6.4
Parser - Edismax

Leading wildcard search is allowed in edismax.

1) How can I disable leading wildcard search?
2) Why does a leading wildcard search take so much time to return a response?

--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
Re: solr.TrieDoubleField deprecated with 7.1.0 but wildcard "*" search behaviour is different with solr.DoublePointField
AFAICT the behavior you're describing with Trie fields was never intentionally supported/documented.

It appears that it only worked as a fluke side effect of how the default implementation of FieldType.getPrefixQuery() was inherited by Trie fields *and* because "indexed=true" TrieFields use Terms (just like StrField) ... so a prefix of "" (the empty string) matched all of the Trie terms in a field. (Note that the syntax you're describing does *not* work for Trie fields that are "indexed=false docValues=true".)

In general, there seems to be a bit of a mess in terms of trying to specify "prefix queries" (which is what "foo_d:*" really is under the covers) or "wildcard" queries against numeric fields. I created a jira to try and come to a consensus about how this should behave moving forward...

https://issues.apache.org/jira/browse/SOLR-11746

...but I would suggest you move away from depending on that syntax and use the officially supported/documented range query syntax (foo_d:[* TO *]) instead.

-Hoss
http://www.lucidworks.com/
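If client code currently emits field:* clauses, one way to migrate is to rewrite them to the range form before sending the query. The sketch below is a regex heuristic under that assumption. It leaves *:* and real prefix queries such as title:foo* alone, but it does not understand quoting, so it should be checked against the actual query shapes in use.

```python
import re

# Matches "<field>:*" only when the star is the whole clause, i.e. it is
# followed by whitespace, a closing parenthesis, or the end of the string.
_FIELD_STAR = re.compile(r"(?<![\w.:*])([A-Za-z_][\w.]*):\*(?=$|[\s)])")


def to_range_query(query: str) -> str:
    """Rewrite field:* existence checks to the field:[* TO *] range syntax."""
    return _FIELD_STAR.sub(r"\1:[* TO *]", query)


print(to_range_query("test_d:*"))
print(to_range_query("title:foo* AND (test_d:*)"))
print(to_range_query("*:*"))
```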
solr.TrieDoubleField deprecated with 7.1.0 but wildcard "*" search behaviour is different with solr.DoublePointField
Hi,

a question about the new DoublePointField, which should be used instead of the TrieDoubleField in 7.1.

https://lucene.apache.org/solr/guide/7_1/field-types-included-with-solr.html

If I am using the deprecated one, it's possible to get a match for a double field like this:

test_d:*

even in 7.1.0.

But with the new DoublePointField, which you should use instead, you won't get that match - you have to use e.g. [* TO *].

A short recipe to try it yourself can be found here:
https://stackoverflow.com/questions/47473188/solr-7-1-querying-double-field-for-any-value-not-possible-with-anymore/47752445

Is this an intended change in runtime / query behaviour, or a bug, or is it possible to restore that behaviour with the new field too?

kind regards

Torsten
RE: Solr Wildcard Search
A slightly more refined answer...

In my experience with the systems I've worked with, Porter and other stemmers can be useful as a "fallback field" with a really low boost, but you should be really careful if you're only searching on one field.

I cannot recommend Doug Turnbull and John Berryman's "Relevant Search" enough on how to layer fields...among many other great insights:
https://www.manning.com/books/relevant-search
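A hedged sketch of the "fallback field with a really low boost" idea, expressed as edismax request parameters. The field names text_exact and text_stemmed and the boost values are invented for illustration, and using edismax at all is an assumption, since the thread does not say which parser is in play.

```python
from urllib.parse import urlencode


def layered_query_params(user_query, exact_field="text_exact",
                         stemmed_field="text_stemmed",
                         exact_boost=10.0, stemmed_boost=0.1):
    """Search an unstemmed field first and a stemmed field as a weak fallback."""
    qf = f"{exact_field}^{exact_boost:g} {stemmed_field}^{stemmed_boost:g}"
    return {"defType": "edismax", "q": user_query, "qf": qf}


params = layered_query_params("publication")
print(params["qf"])
print(urlencode(params))
```

With weights like these, documents matching the unstemmed field should outrank those that match only through the stemmer. The stemmed field still provides recall when nothing matches exactly.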
RE: Solr Wildcard Search
At the very least the English possessive filter, which you have. Great! Depending on what your query log analysis finds -- perhaps users are pretty much only searching on nouns? -- you might consider EnglishMinimalStemFilterFactory.

I wouldn't say that Porter was or wasn't chosen intentionally. It may be good for some use cases. However, for the use cases I've seen, it has been disastrous.

I have code that shows "equivalence sets" for analysis chain A vs analysis chain B...with some noise...assume same tokenization... I should probably share that code on github or fold it into Luke somehow? You can see this on a one-off basis in the Solr admin window via the Analysis tab, but to see this on your corpus/corpora across terms can be eye-opening, and then to cross-check it against query logs...quite powerful.

On one corpus, when I compared the same analysis chain A without Porter and B with Porter, the output is e.g.:

"stemmed\tunstemmed #docs|unstemmed #docs..."

public   public 9834 | publication 1429 | publications 960 | publicly 662 | public's 176 | publicize 118 | publicized 107 | publicity 91 | publically 66 | publicizing 63 | publication's 6 | publicizes 4 | public_ 1 | publication_ 1 | publiced 1
effect   effective 6329 | effect 3157 | effectively 1745 | effectiveness 1198 | effects 831 | effected 139 | effecting 85 | effectives 1
new      new 13279 | newness 6 | newed 3 | newe 2 | newing 1
order    order 7256 | orders 3125 | ordered 1840 | ordering 758 | orderly 241 | order's 17 | orderable 3 | orders_ 1

Imagine users searching for "publication" (~2500 docs) and getting back every document that mentions "public" (~10k). That's a huge problem in many circumstances. Good luck finding the name "newing".

-----Original Message-----
From: Georgy Nevsky [mailto:gnevsky.cn...@thomasnet.com]
Sent: Thursday, November 30, 2017 8:31 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr Wildcard Search

I understand stemming reason. Thank you.

What do you suggest to use for stemming instead of "Porter" ? I guess, it wasn't chosen intentionally.

In the best we trust
Georgy Nevsky
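The "equivalence sets" comparison described in the message above can be sketched roughly as follows. This is not the code mentioned in the message (which was not posted to the list); `toy_stem` is a made-up stand-in for a real stemmer such as Porter, and the document counts are a handful copied from the table above.

```python
from collections import defaultdict


def toy_stem(term):
    """Hypothetical, deliberately crude stemmer (NOT Porter): strips a few suffixes."""
    for suffix in ("ations", "ation", "ly", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def equivalence_sets(doc_counts, stem):
    """Group unstemmed terms by their stem, largest document count first."""
    groups = defaultdict(list)
    for term, n_docs in doc_counts.items():
        groups[stem(term)].append((term, n_docs))
    return {s: sorted(members, key=lambda m: -m[1]) for s, members in groups.items()}


# A few (term, #docs) pairs taken from the table in the message.
doc_counts = {
    "public": 9834,
    "publication": 1429,
    "publications": 960,
    "publicly": 662,
    "order": 7256,
    "orders": 3125,
}

sets = equivalence_sets(doc_counts, toy_stem)
for stemmed, members in sets.items():
    # Same layout as the message: stem, tab, then "term #docs" separated by " | ".
    print(stemmed + "\t" + " | ".join(f"{term} {n}" for term, n in members))
```

With a real analysis chain, `doc_counts` would come from the term dictionary of the unstemmed field and `stem` would be the stemmer under evaluation; both are assumptions here.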
RE: Solr Wildcard Search
I understand the reason for the stemming behavior. Thank you.

What would you suggest using for stemming instead of "Porter"? I guess it wasn't chosen intentionally.

In the best we trust
Georgy Nevsky

-----Original Message-----
From: Allison, Timothy B. [mailto:talli...@mitre.org]
Sent: Thursday, November 30, 2017 8:25 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr Wildcard Search

The initial question wasn't about a phrasal search, but I largely agree that diff q parsers handle the analysis chain differently for multiterms.

Yes, Porter is crazily aggressive. USE WITH CAUTION!

As has been pointed out, use the Solr admin window and the "debug" in the query option to see what's going on.

Use the Solr admin Analysis feature to see how your tokens are being modified by each step in the analysis chain.

If you use solr admin and debug the query for "shipping", you see that it is stemmed to "ship"...hence all of your matches work.

Porter doesn't have rules for words ending in "pp", so it doesn't stem "shipp" to "ship". So, your wildcard query is looking for words that start with "shipp", and given that "shipping" was stemmed to "ship", it won't find it. It would find "shippqrs" because porter wouldn't know what to do with that 😊

Again, Porter can be very dangerous if it doesn't align with user expectations.
RE: Solr Wildcard Search
The initial question wasn't about a phrasal search, but I largely agree that different query parsers handle the analysis chain differently for multiterms.

Yes, Porter is crazily aggressive. USE WITH CAUTION!

As has been pointed out, use the Solr admin window and the "debug" in the query option to see what's going on.

Use the Solr admin Analysis feature to see how your tokens are being modified by each step in the analysis chain.

If you use Solr admin and debug the query for "shipping", you see that it is stemmed to "ship"...hence all of your matches work.

Porter doesn't have rules for words ending in "pp", so it doesn't stem "shipp" to "ship". So, your wildcard query is looking for words that start with "shipp", and given that "shipping" was stemmed to "ship", it won't find it. It would find "shippqrs" because Porter wouldn't know what to do with that 😊

Again, Porter can be very dangerous if it doesn't align with user expectations.

-----Original Message-----
From: Atita Arora [mailto:atitaar...@gmail.com]
Sent: Thursday, November 30, 2017 8:16 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Wildcard Search

As Rick raised the most important aspect here, that the phrase is broken into multiple terms ORed together, I believe if the use case requires to perform wildcard search on phrases, we would need to store the entire phrase as a single term in the index which probably is not happening right now and hence are not found when sent across as phrases.

I tried this on my local Solr 7.1 without phrase this works as expected, however as soon as I do phrase search it fails for the reason as i mentioned above.

Let me know if I can clarify further.
Re: Solr Wildcard Search
As Rick pointed out, the most important aspect here is that the phrase is broken into multiple terms ORed together. I believe that if the use case requires wildcard search on phrases, we would need to store the entire phrase as a single term in the index, which is probably not happening right now, and hence the documents are not found when the query is sent as a phrase.

I tried this on my local Solr 7.1: without a phrase this works as expected; however, as soon as I do a phrase search it fails for the reason I mentioned above.

Let me know if I can clarify further.

On Thu, Nov 30, 2017 at 6:31 PM, Georgy Nevsky wrote:

> I wish to understand if I can do something to get in result term "shipping"
> when search for "shipp*"?
>
> Here field definition:
> multiValued="false"/>
>
> positionIncrementGap="100">
>
> ignoreCase="true"
> words="lang/stopwords_en.txt"
> />
>
> protected="protwords.txt"/>
>
> Anything else can be important? Most configuration parameters are default to
> Apache Solr 7.1.0.
>
> In the best we trust
> Georgy Nevsky
RE: Solr Wildcard Search
I wish to understand whether I can do something to get the term "shipping" in the results when searching for "shipp*".

Here is the field definition:

Could anything else be important? Most configuration parameters are the Apache Solr 7.1.0 defaults.

In the best we trust
Georgy Nevsky

-----Original Message-----
From: Rick Leir [mailto:rl...@leirtech.com]
Sent: Thursday, November 30, 2017 7:32 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Wildcard Search

George,
When you get those results it could be due to stemming.

Wildcard processing expands your term to multiple terms, OR'd together. It also takes you down a different analysis pathway, as many analysis components do not work with multiple terms. Look into the SolrAdmin console, and use the analysis tab to understand what is going on.

If you still have doubts, tell us more about your config.
Cheers --Rick

On November 30, 2017 7:06:42 AM EST, Georgy Nevsky wrote:
>Can somebody help me understand how Solr Wildcard Search is working?
>
>If I’m doing search for “ship*” term I’m getting in result many
>strings, like “Shipping Weight”, “Ship From”, “Shipping Calculator”,
>etc.
>
>But if I’m searching for “shipp*” I don’t get any result.
>
>In the best we trust
>
>Georgy Nevsky

--
Sorry for being brief. Alternate email is rickleir at yahoo dot com
Re: Solr Wildcard Search
George,
When you get those results it could be due to stemming.

Wildcard processing expands your term to multiple terms, OR'd together. It also takes you down a different analysis pathway, as many analysis components do not work with multiple terms. Look into the SolrAdmin console, and use the analysis tab to understand what is going on.

If you still have doubts, tell us more about your config.
Cheers --Rick

On November 30, 2017 7:06:42 AM EST, Georgy Nevsky wrote:
>Can somebody help me understand how Solr Wildcard Search is working?
>
>If I’m doing search for “ship*” term I’m getting in result many
>strings, like “Shipping Weight”, “Ship From”, “Shipping Calculator”, etc.
>
>But if I’m searching for “shipp*” I don’t get any result.
>
>In the best we trust
>
>Georgy Nevsky

--
Sorry for being brief. Alternate email is rickleir at yahoo dot com
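To make the stemming explanation in this thread concrete, here is a small Python sketch. It is only an illustration under stated assumptions: `toy_stem` is a hypothetical stand-in for the stemmer in the field's analysis chain (it is not Porter), and `fnmatchcase` stands in for wildcard matching against the indexed terms. The one point it models is that indexed tokens are stemmed while the wildcard term is not.

```python
from fnmatch import fnmatchcase


def toy_stem(token):
    """Hypothetical stemmer: strips "ing" and undoubles a trailing double letter."""
    if token.endswith("ing") and len(token) > 5:
        token = token[:-3]
        if len(token) >= 2 and token[-1] == token[-2]:
            token = token[:-1]
    return token


def index_terms(texts):
    """Index-time analysis: split on whitespace, lowercase, stem."""
    return {toy_stem(tok) for text in texts for tok in text.lower().split()}


def wildcard_matches(pattern, terms):
    """Query time: the wildcard term is lowercased but NOT stemmed."""
    return sorted(t for t in terms if fnmatchcase(t, pattern.lower()))


terms = index_terms(["Shipping Weight", "Ship From", "Shipping Calculator"])

print(wildcard_matches("ship*", terms))   # matches the stemmed term "ship"
print(wildcard_matches("shipp*", terms))  # no indexed term starts with "shipp"
```

In this toy index "shipping" only exists as "ship", so a pattern that requires the prefix "shipp" has nothing to match.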
Solr Wildcard Search
Can somebody help me understand how Solr wildcard search works?

If I search for the term “ship*”, I get many strings in the results, like “Shipping Weight”, “Ship From”, “Shipping Calculator”, etc.

But if I search for “shipp*”, I don’t get any results.

In the best we trust
Georgy Nevsky
Re: StrField with Wildcard Search
Hi,

I think AutomatonQuery is used.

http://opensourceconnections.com/blog/2013/02/21/lucene-4-finite-state-automaton-in-10-minutes-intro-tutorial/
https://lucene.apache.org/core/5_4_0/core/org/apache/lucene/search/AutomatonQuery.html

Ahmet

On Thursday, September 8, 2016 3:54 PM, Sandeep Khanzode wrote:

Hi,

Okay. So it seems that the wildcard searches will perform a (sort-of) dictionary search where they will inspect every (full keyword) token at search time, and do a match instead of a match on pre-created index-time tokens with TextField.

However, the wildcard/fuzzy functionality will still be provided no matter the approach...

SRK
Re: StrField with Wildcard Search
Hi,

Okay. So it seems that the wildcard searches will perform a (sort-of) dictionary search where they will inspect every (full keyword) token at search time, and do a match instead of a match on pre-created index-time tokens with TextField.

However, the wildcard/fuzzy functionality will still be provided no matter the approach...

SRK

On Thursday, September 8, 2016 5:05 PM, Ahmet Arslan wrote:

Hi,

EdgeNGram and Wildcard may be used to achieve the same goal: prefix search or starts with search.

Lets say, wildcard enumerates the whole inverted index, thus it may get slower for very large databases. With this one no index time manipulation is required.

EdgeNGram does its magic at index time, indexes a lot of tokens, all possible prefixes. Index size gets bigger, query time no wildcard operator required in this one.

Ahmet
Re: StrField with Wildcard Search
Hi,

EdgeNGram and wildcard queries may both be used to achieve the same goal: prefix ("starts with") search.

Roughly speaking, a wildcard query enumerates the terms of the inverted index at query time, so it may get slower for very large indexes. No index-time manipulation is required with this approach.

EdgeNGram does its magic at index time: it indexes a lot of tokens, namely all possible prefixes. The index gets bigger, but no wildcard operator is required at query time.

Ahmet

On Thursday, September 8, 2016 12:35 PM, Sandeep Khanzode wrote:

Hello,

There are quite a few links that detail the difference between StrField and TextField. Also links that explain that, even though the field is indexed, it is not tokenized and stored as a single keyword, as can be verified by the debug analysis on Solr admin and CURL debugQuery options.

What I am unable to understand is how a wildcard works on StrFields? For example, if the name is "John Doe" and I search for "John*", I get that match. Which means, that somewhere deep within, maybe a Trie or Dictionary representation exists that allows this search with a partial string.

I would have assumed that wildcard would match on TextFields which allow (Edge)NGramFilters, etc.

--
SRK
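The trade-off described above can be sketched in a few lines of Python. This is a simplification for illustration only: a real Lucene index does not scan a Python list (the term dictionary is sorted and wildcard matching goes through an automaton), and the function names here are invented for the example.

```python
def edge_ngrams(term, min_len=1):
    """All prefixes of a term, as an edge n-gram filter would emit at index time."""
    return [term[:i] for i in range(min_len, len(term) + 1)]


def prefix_scan(terms, prefix):
    """Query-time approach: walk the term dictionary and keep matching terms."""
    return sorted(t for t in terms if t.startswith(prefix))


def build_edge_index(terms, min_len=1):
    """Index-time approach: map every prefix to the terms it came from."""
    index = {}
    for term in terms:
        for gram in edge_ngrams(term, min_len):
            index.setdefault(gram, set()).add(term)
    return index


terms = ["john doe", "johnny", "jane"]
edge_index = build_edge_index(terms)

# Same answer either way; the cost is paid at different times.
print(prefix_scan(terms, "john"))           # work done at query time
print(sorted(edge_index.get("john", ())))   # plain lookup, no wildcard needed
print(len(terms), "terms vs", len(edge_index), "indexed prefixes")
```

The last line shows the index growth mentioned in the message: three keyword terms turn into many more prefix entries.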
StrField with Wildcard Search
Hello,

There are quite a few links that detail the difference between StrField and TextField. Also links that explain that, even though the field is indexed, it is not tokenized and stored as a single keyword, as can be verified by the debug analysis on Solr admin and CURL debugQuery options.

What I am unable to understand is how a wildcard works on StrFields? For example, if the name is "John Doe" and I search for "John*", I get that match. Which means, that somewhere deep within, maybe a Trie or Dictionary representation exists that allows this search with a partial string.

I would have assumed that wildcard would match on TextFields which allow (Edge)NGramFilters, etc.

--
SRK
RE: Wildcard search not working
Hi Ahmet, Hi Upayavira,

OK, it seems that I have to dive a bit deeper into the Solr filters and tokenizers. I've just realized that my command of them is too limited.

Thanks a lot for the help so far, guys. Cheers and have a nice day,

christian

-----Original Message-----
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Freitag, 12. August 2016 07:41
To: solr-user@lucene.apache.org; Ribeaud, Christian (Ext)
Subject: Re: Wildcard search not working

Hi Christian,

Please use the following filter before/above the stemmer.

Plus, you may want to add :

Ahmet
Re: Wildcard search not working
Hi Christian,

Please use the following filter before/above the stemmer.

Plus, you may want to add :

Ahmet

On Thursday, August 11, 2016 9:31 PM, "Ribeaud, Christian (Ext)" wrote:

Hi Ahmet,

Many thanks for your reply. I had a look at the URL you pointed out but, honestly, I have to admit that I did not fully understand you.
Let's be a bit more concrete. Following the schema snippet for the corresponding field:

...
...

What is wrong with this schema? Respectively, what should I change to be able to correctly do wildcard searches?

Many thanks for your time. Cheers,

christian
--
Christian Ribeaud
Software Engineer (External)
NIBR / WSJ-310.5.17
Novartis Campus
CH-4056 Basel
Re: Wildcard search not working
You have a stemming filter in your analysis chain. Go to the analysis tab, select the 'text' field, and put "Roche" into both boxes. Click analyse. I bet you will see Roch, not Roche, because of your stemming filter shown below.

That's what Ahmet shrewdly identified above.

Upayavira

On Thu, 11 Aug 2016, at 08:31 PM, Ribeaud, Christian (Ext) wrote:
> Hi Ahmet,
>
> Many thanks for your reply. I had a look at the URL you pointed out but,
> honestly, I have to admit that I did not fully understand you.
> Let's be a bit more concrete. Following the schema snippet for the
> corresponding field:
>
> ...
> required="false" multiValued="false" />
>
> positionIncrementGap="100">
>
> words="lang/stopwords_de.txt" format="snowball" />
>
> ...
>
> What is wrong with this schema? Respectively, what should I change to be
> able to correctly do wildcard searches?
>
> Many thanks for your time. Cheers,
>
> christian
> --
> Christian Ribeaud
> Software Engineer (External)
> NIBR / WSJ-310.5.17
> Novartis Campus
> CH-4056 Basel
RE: Wildcard search not working
Hi Ahmet,

Many thanks for your reply. I had a look at the URL you pointed out but, honestly, I have to admit that I did not fully understand you.
Let's be a bit more concrete. Following the schema snippet for the corresponding field:

...
...

What is wrong with this schema? Respectively, what should I change to be able to correctly do wildcard searches?

Many thanks for your time. Cheers,

christian
--
Christian Ribeaud
Software Engineer (External)
NIBR / WSJ-310.5.17
Novartis Campus
CH-4056 Basel

-----Original Message-----
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Donnerstag, 11. August 2016 16:00
To: solr-user@lucene.apache.org; Ribeaud, Christian (Ext)
Subject: Re: Wildcard search not working

Hi Chiristian,

The query r?che may not return at least the same number of matches as roche depending on your analysis chain.
The difference is roche is analyzed but r?che don't. Wildcard queries are executed on the indexed/analyzed terms.
For example, if roche is indexed/analyzed as roch, the query r?che won't match it.

Please see : https://wiki.apache.org/solr/MultitermQueryAnalysis

Ahmet
Re: Wildcard search not working
Hi Christian,

The query r?che may not return at least the same number of matches as roche, depending on your analysis chain.
The difference is that roche is analyzed but r?che is not. Wildcard queries are executed on the indexed/analyzed terms.
For example, if roche is indexed/analyzed as roch, the query r?che won't match it.

Please see : https://wiki.apache.org/solr/MultitermQueryAnalysis

Ahmet

On Thursday, August 11, 2016 4:42 PM, "Ribeaud, Christian (Ext)" wrote:

Hi,

What would be the reasons making the wildcard search for Lucene Query Parser NOT working?

We are using Solr 5.4.1 and, using the admin console, I am triggering for instance searches with term 'roche' in a specific core. Everything fine, I am getting for instance two matches. I would expect at least the same number of matches with term 'r?che'. However, this does NOT happen. I am getting zero matches. Same problem occurs with 'r*che'. 'roch?' does not work neither but 'roch*' works.

Switching debug mode brings following output:

"debug": {
  "rawquerystring": "roch?",
  "querystring": "roch?",
  "parsedquery": "text:roch?",
  "parsedquery_toString": "text:roch?",
  "explain": {},
  "QParser": "LuceneQParser",
  ...

Any idea? Thanks and cheers,

christian
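The four patterns from the original question can be checked against that example in a short Python sketch. It assumes, as in the message above, that "roche" ended up in the index as the single term "roch"; `fnmatchcase` is used only as a stand-in for Solr's wildcard matching, and `matches` is a name made up for this illustration.

```python
from fnmatch import fnmatchcase

# Assumption taken from the example above: the analysis chain indexed "roche" as "roch".
indexed_terms = {"roch"}


def matches(pattern):
    """Wildcard patterns are compared with the indexed terms as-is (no stemming)."""
    return sorted(t for t in indexed_terms if fnmatchcase(t, pattern))


for pattern in ("r?che", "r*che", "roch?", "roch*"):
    print(pattern, "->", matches(pattern))
```

Only `roch*` can match a four-letter term that ends in "ch", which lines up with the behaviour reported in the question.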
Wildcard search not working
Hi, What could cause wildcard search with the Lucene Query Parser to NOT work? We are using Solr 5.4.1 and, using the admin console, I am running searches with, for instance, the term 'roche' in a specific core. Everything is fine; I get, for instance, two matches. I would expect at least the same number of matches with the term 'r?che'. However, this does NOT happen: I get zero matches. The same problem occurs with 'r*che'. 'roch?' does not work either, but 'roch*' works. Switching on debug mode gives the following output: "debug": { "rawquerystring": "roch?", "querystring": "roch?", "parsedquery": "text:roch?", "parsedquery_toString": "text:roch?", "explain": {}, "QParser": "LuceneQParser", ... Any idea? Thanks and cheers, christian
RE: wildcard search for string having spaces
Great. The first option worked for me. I was trying q=abc\sp*, but it should be q=abc\ p* Thanks, Roshan The information in this email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this email by anyone else is unauthorised. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful.
Re: wildcard search for string having spaces
Hi Roshan, I think there are two options: 1) escape the space q=abc\ p* 2) use prefix query parser q={!prefix f=my_string}abc p Ahmet
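A small sketch of option 1 in Python, for anyone building the query string in client code. The field name my_string is taken from the example above; the helper name escape_whitespace is made up for illustration. It only backslash-escapes spaces and is not a full query-syntax escaper.

```python
from urllib.parse import urlencode

def escape_whitespace(term):
    """Backslash-escape spaces so the query parser keeps the term as a single token."""
    return term.replace(" ", "\\ ")

# Build the wildcard query: field name, escaped prefix, trailing asterisk.
q = "my_string:" + escape_whitespace("abc p") + "*"
print(q)  # my_string:abc\ p*

# URL-encode the parameters; debug=query shows how Solr parsed the query.
print(urlencode({"q": q, "debug": "query"}))
```

Note that the backslash is followed by a literal space, not by the letter s, which was the mistake in the first attempt.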
wildcard search for string having spaces
Hello, I have the custom field type below defined for Solr 6.0.0. I am using this field to ensure that the entire string is treated as a single token and that search is case insensitive. It works for most scenarios with wildcard search, e.g. if my data is "abc.pqr", "abc_pqr" and "abc pqr", then a search with abc* returns these three results. But I am not able to search with, say, abc p* A search with the query q="abc pqr" gives the exact match and the desired result. I want to do wildcard searches where the criteria can include spaces, as in the example above, i.e. if a space is present then I am not able to do a wildcard search. Is there any way to make wildcard search work even if a space is present in the token? Regards, Roshan
Re: Solr Wildcard Search for large amount of text
What do you want actual user queries to look like? I mean, having to explicitly write asterisks after every term is a real pain. Indexing ngrams has the advantage that phrase queries and edismax phrase boosting work automatically. Phrases don't work with explicit wildcard queries. The only real downside to ngrams is that they explode the size of the index. But memory is supposed to be cheap these days. I mean, compare the cost of the extra RAM (to keep the full index in memory) to the cost to users of their productivity constructing queries and having expensive staff to help them figure out why various queries don't work as expected. How big is your corpus - number of documents and average document size? -- Jack Krupansky
Re: Solr Wildcard Search for large amount of text
Try it and see ;). My experience is that wildcards work fine (although what "fine" means is up to you to decide) _if_ you restrict them to requiring at least two leading "real" characters, and I actually prefer three, i.e. ab* or abc*. Note that if you require leading wildcards, use the reverse wildcard filter. I will vociferously argue that single-letter wildcards are not useful anyway. I mean, every single document in your corpus will probably match every single-letter wildcard (a*, b*, whatever), providing no benefit to the user. And the need for wildcards can often be reduced or eliminated if you can use autosuggest or autocomplete. Of course, if you're trying to satisfy more complex use cases where the user is composing their own complex clauses, that may not apply. FWIW, Erick
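The "at least two leading real characters" restriction is easy to enforce in the application before a query ever reaches Solr. A rough sketch in Python; the function names and the min_prefix parameter are made up for illustration and are not part of any Solr API.

```python
def leading_literal_length(term):
    """Count the characters that come before the first wildcard (* or ?)."""
    for i, ch in enumerate(term):
        if ch in "*?":
            return i
    return len(term)

def is_acceptable_wildcard(term, min_prefix=2):
    """Accept a term only if it starts with at least min_prefix literal characters."""
    return leading_literal_length(term) >= min_prefix

for term in ["a*", "ab*", "abc*", "*abc"]:
    print(term, is_acceptable_wildcard(term), is_acceptable_wildcard(term, min_prefix=3))
```

Terms that fail the check could be rejected with a message or answered from an autosuggest component instead.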
Re: Solr Wildcard Search for large amount of text
Both edgengrams and wildcards are ways to do this. There are advantages and disadvantages to both ways. To do a wildcard search, Solr (Lucene really) must look up all the matching terms in the index and substitute them into the query so that it becomes a large number of simple string matches. If you have a large number of terms in your index, that can be slow. The expensive work (expanding the terms) is done for every single query. The edgengram filter does similar work, but it does it at *index* time, rather than query time. At query time, you are doing a simple string match with one term, although the index contains many more terms, because the very expensive work was done at index time. It's difficult to know which approach will be more efficient on *your* index without experimentation, but there is a general rule when it comes to Solr performance: As much as possible, do the expensive work at index time. Thanks, Shawn
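To illustrate the index-time side of that trade-off, here is a sketch in plain Python of what an edge n-gram filter produces for one token. It is not the Solr implementation; the min_gram and max_gram parameters are hypothetical stand-ins for the filter's size settings.

```python
def edge_ngrams(token, min_gram=2, max_gram=15):
    """Return the leading-edge n-grams of a lowercased token, shortest first."""
    token = token.lower()
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]

grams = edge_ngrams("Nigerian")
print(grams)

# At query time a prefix typed by the user becomes a plain term lookup.
index_terms = set(grams)
print("nige" in index_terms)
```

One eight-letter word turns into seven indexed terms, which is where the extra index size comes from; in exchange, the query-time work is a single exact term match.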
Re: Solr Wildcard Search for large amount of text
That is one way to implement wildcards, but it isn't the most efficient. Just index normally, tokenized, and search with an asterisk suffix, e.g. foo* This will build a finite state automaton that makes wildcard handling efficient. Upayavira
Solr Wildcard Search for large amount of text
Hi, I'm looking at Solr's features for wildcard search over a large amount of text. I read on the net that solr.EdgeNGramFilterFactory is used to generate tokens for wildcard searching. For Nigerian => "ni", "nig", "nige", "niger", "nigeri", "nigeria", "nigerian" However, I have a large amount of text that requires wildcard search, and it's not viable to use EdgeNGramFilterFactory as the amount of processing would be too large. Do you have any suggestions/advice please? Thank you so much for your time! -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Wildcard-Search-for-large-amount-of-text-tp4214392.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: rq breaks wildcard search?
Awesome, thanks! I was on 4.10.2. Ryan
Re: rq breaks wildcard search?
For your own implementation you'll need to implement the following methods: public Query rewrite(IndexReader reader) throws IOException and public void extractTerms(Set<Term> terms). You can review the 4.10.3 version of the ReRankQParserPlugin to see how it implements these methods. Joel Bernstein http://joelsolr.blogspot.com/
Re: rq breaks wildcard search?
Just confirmed that wildcard queries work with Re-Ranking following SOLR-6323. Joel Bernstein http://joelsolr.blogspot.com/
Re: rq breaks wildcard search?
This should be resolved in https://issues.apache.org/jira/browse/SOLR-6323 (Solr 4.10.3). Joel Bernstein http://joelsolr.blogspot.com/
rq breaks wildcard search?
Using edismax, supplying an rq= param, like {!rerank ...}, causes an UnsupportedOperationException because the Query doesn't implement createWeight. This is for WildcardQuery in particular. From some preliminary debugging it looks like, without rq, the qf Queries somehow turn into ConstantScore instead of WildcardQuery. I don't think this is related to the RankQuery implementation, as my own subclass has the same issue. Anyway, the effect is that all q's containing ? or * return HTTP 500, because I always have rq on. Can anyone confirm whether this is a bug? I will log it in Jira if so. Also, does anyone know how I can work around it? Specifically, can I stop edismax from making WildcardQueries? Ryan
Re: Weird Problem (possible bug?) with german stemming and wildcard search
Thank you very much, this information is worth its weight in gold. So far, we've used the asterisk method because it seemed logical and straightforward. We will slowly migrate to a version using EdgeNGramFilterFactory. Thanks a bunch.
Re: Weird Problem (possible bug?) with german stemming and wildcard search
Hi - you should not use wildcards for autocompletion; Lucene has far better tools for making very good autocompletion. Also, since a wildcard query is a multi-term query, it is not passed through your configured query-time analyzer. Some other comments: - you use a Porter stemmer, but you should use one of the German-specific stem filters. - you don't have an index-time tokenizer defined; this should not be possible, and the behaviour is undefined as far as I know.
Re: Weird Problem (possible bug?) with german stemming and wildcard search
On 7 October 2014 08:25, Thomas Michael Engelke wrote: > So the culprit is the asterisk at the end. As far as we can read from the > docs, an asterisk is just 0 or more characters, which means that the literal > word in front of the asterisk should match the query. Not quite: http://wiki.apache.org/solr/MultitermQueryAnalysis It's actually quite complicated and even depends on exact version of Solr you are using. In fact, out of all the analyzers you showed above, I think only LowerCase will be present on the chain. Look for (multi) marker at: http://www.solr-start.com/info/analyzers/ for more details. On a higher level, I would suggest getting away from *-based expansion and looking at EdgeNGrams instead. You can see an example of autocomplete at http://www.solr-start.com/javadoc/solr-lucene/index.html and the matching configuration at: https://github.com/arafalov/Solr-Javadoc/blob/master/JavadocIndex/JavadocCollection/conf/schema.xml#L24 Or a dedicated Suggester module, though information on that is a bit harder to find. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
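A rough Python sketch of why the trailing asterisk behaves this way. It takes the index-side result from the analysis output in the original post (radbremszylind for the singular) and assumes, purely for illustration, that the plural is indexed unchanged as radbremszylindern and that only lowercasing is applied to the wildcard term.

```python
# Indexed terms: the singular was stemmed at index time (per the analysis output);
# the plural form is an assumption made for this illustration.
indexed_terms = ["radbremszylind", "radbremszylindern"]

def prefix_query(user_input, terms):
    """Lowercase the wildcard term, skip stemming, and match it as a plain prefix."""
    prefix = user_input.rstrip("*").lower()
    return [t for t in terms if t.startswith(prefix)]

print(prefix_query("Radbremszylinder*", indexed_terms))  # the stemmed singular is shorter than the prefix
print(prefix_query("Radbremszylind*", indexed_terms))    # both terms start with this prefix
```

Under those assumptions the stemmed singular can never match the longer prefix, which lines up with the reported results.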
Weird Problem (possible bug?) with german stemming and wildcard search
I have a problem with a stemmed German field. The field definition: stored="true" required="false" multiValued="false"/> ... positionIncrementGap="100" autoGeneratePhraseQueries="true"> words="stopwords.txt"/> generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/> protected="protwords.txt"/> ignoreCase="true" expand="true"/> words="stopwords.txt"/> generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/> protected="protwords.txt"/> When we search for a word from an autosuggest kind of component, we always add an asterisk to the word, so when somebody enters something like "Radbremszylinder" and waits for some milliseconds, the autosuggest list is filled with the results of searching for "Radbremszylinder*". This seemed to work quite well. Today we got a bug report from a customer for that exact word. So I ran an analysis for the word as "Field value (index)" and "Field value (query)", and it looked like this: Index analysis: ST Radbremszylinder, SF Radbremszylinder, WDF Radbremszylinder, LCF radbremszylinder, SKMF radbremszylinder, PSF radbremszylind. Query analysis: WT Radbremszylinder*, SF Radbremszylinder*, SF Radbremszylinder*, WDF Radbremszylinder, LCF radbremszylinder, SKMF radbremszylinder. As you can see, the end results look very much alike. However, records containing that word in their "description" field aren't reported as results. Strangely enough, records containing "Radbremszylindern" (plural) are reported as results. Removing the asterisk from the end reports all records with "Radbremszylinder", just as we would expect. So the culprit is the asterisk at the end. As far as we can read from the docs, an asterisk is just 0 or more characters, which means that the literal word in front of the asterisk should match the query. Searching further we tried some variations, and it seems that searching for "Radbremszylind*" works. All records with any variation ("Radbremszylinder", "Radbremszylindern") are reported. So maybe there's a weird interaction with stemming? Any ideas?
Re: Wildcard search makes no sense!!
Ok, I think I understand your points there. Just to clarify: say the term was "Large increased" and my filters produced something like: Large|increased Large|increase|increased large|increase|increased would the final tokens indexed be large|increase|increased ? Once again, thanks for all the help. -- View this message in context: http://lucene.472066.n3.nabble.com/Wildcard-search-makes-no-sense-tp4162069p4162349.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Wildcard search makes no sense!!
Right. Prior to 3.6, the standard way to handle wildcards was, essentially, to pre-analyze the terms that had wildcards. This works fine for simple filters, things like lowercasing for instance, but doesn't work so well for things like stemming. So you're doing what can be done at this point, but moving to 4.x (or even 3.6) would solve it better. Best, Erick
Re: Wildcard search makes no sense!!
On 10/2/2014 4:33 AM, waynemailinglist wrote: > Something that is still not clear in my mind is how this tokenising works. > For example with the filters I have when I run the analyser I get: > Field: Hello You > > Hello|You > Hello|You > Hello|You > hello|you > hello|you > > > Does this mean that the index is stored as 'hello|you' (the final one) and > that when I run a query and it goes through the filters whatever the end > result of that is must match the 'hello|you' in order to return a result? The index has two terms for this field if this is the whole input -- hello and you -- which can be searched for individually. The tokenizer does the initial job of separating the input into tokens (terms) ... some filters can create additional terms, depending on exactly what's left when the tokenizer is done. Thanks, Shawn
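A toy model of that answer in Python may help. It is only a sketch: a whitespace tokenizer followed by a lowercase filter stands in for the real analysis chain, and a dictionary stands in for the index. The point is that the index holds separate terms, not one combined 'hello|you' string.

```python
from collections import defaultdict

def whitespace_tokenizer(text):
    """Split the field value into tokens on whitespace."""
    return text.split()

def lowercase_filter(tokens):
    """Lowercase every token, as a lowercase filter would."""
    return [t.lower() for t in tokens]

def analyze(text):
    return lowercase_filter(whitespace_tokenizer(text))

def build_index(docs):
    """Map each analyzed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

index = build_index({1: "Hello You"})
print(sorted(index))                          # the individual terms stored for the field
print(index.get("hello"), index.get("you"))   # each term can be looked up on its own
print(index.get("hello|you"))                 # no combined term exists
```

The pipe in the analysis screen output is just a separator between tokens at each stage, not part of what is stored.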
Re: Wildcard search makes no sense!!
Many many thanks for the replies - it was helpful for me to start understanding how this works. I'm using 3.5, so this explains a lot. What I have done is: if the query contains a *, I make the query lowercase before sending it to Solr. This seems to have solved the issue, given your explanation above. Many thanks. Something that is still not clear in my mind is how this tokenising works. For example, with the filters I have, when I run the analyser I get: Field: Hello You Hello|You Hello|You Hello|You hello|you hello|you Does this mean that the index is stored as 'hello|you' (the final one) and that when I run a query and it goes through the filters, whatever the end result of that is must match the 'hello|you' in order to return a result? -- View this message in context: http://lucene.472066.n3.nabble.com/Wildcard-search-makes-no-sense-tp4162069p4162284.html Sent from the Solr - User mailing list archive at Nabble.com.
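The client-side workaround described above could look roughly like this in Python. This is a sketch, not the poster's actual code; the function name prepare_query is made up. Lowercasing the whole string would also lowercase operators such as AND/OR and any field names in the query, so it only suits simple single-term input.

```python
def prepare_query(user_query):
    """Lowercase queries that contain a wildcard, since they may bypass query-time analysis."""
    if "*" in user_query or "?" in user_query:
        return user_query.lower()
    return user_query

print(prepare_query("Κώστα*"))   # wildcard query: lowercased on the client
print(prepare_query("Κώστας"))   # plain query: left for Solr's analyzer to handle
```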
Re: Wildcard search makes no sense!!
Two things:

1> What version of Solr are you using? If it's prior to 3.6, then the bits that handle applying lowercaseFilter to wildcards aren't in the code.

2> What do you see if you add &debug=query? I just tried it with your analysis chain and it seemed to work.

Did you completely blow your index away when trying this? I did get into a state where my terms didn't show up. When you change the schema, sometimes some information about the fields is written into the index and is incompatible with later changes. By "completely blow away" I mean:

stop Solr
rm -rf blah/collection/data
start Solr
reindex
test

Best,
Erick

On Wed, Oct 1, 2014 at 10:10 AM, waynemailinglist wrote:
> I'm still stuck on this actually. I would really appreciate any pointers.
> If I search for :
> query 1: Κώστας
> result: Κώστας
>
> query 2: Κώστα*
> result:
>
> I've looked at the analyser but I don't really understand what I'm looking
> at if I'm honest. It gives the output:
> Field (name): title
> Field value: Κώστας
> Field value (query): Κώστα*
>
> Index Analyzer
> Κώστας
> Κώστας
> Κώστας
> κώστας
> κώστας
> Query Analyzer
> Κώστα*
> Κώστα*
> Κώστα*
> Κώστα
> κώστα
> κώστα
>
> In my schema I have defined
>
> ignoreCase="true" expand="true"/> (only used in query)
> words="stopwords.txt"/>
> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0"/>
>
> I tried adding ASCIIFoldingFilterFactory but that didn't make any difference
> after reindexing.
>
> Any ideas?
>
> many thanks
Re: Wildcard search makes no sense!!
If you use "*" you use the Multiterm analysis path, which is semi-hidden and is a lot more limited than what is done with normal tokens:
https://wiki.apache.org/solr/MultitermQueryAnalysis

The Analyzer components that are NOT multiterm aware cannot be used that way. Looking at http://www.solr-start.com/info/analyzers/ , you can see that only the LowerCase analyzer is marked as multiterm aware (with (multi) in the brackets). So, the rest are not used.

You may switch to EdgeNGrams or similar instead.

Regards,
Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On 1 October 2014 13:10, waynemailinglist wrote:
> I'm still stuck on this actually. I would really appreciate any pointers.
> If I search for :
> query 1: Κώστας
> result: Κώστας
>
> query 2: Κώστα*
> result:
>
> I've looked at the analyser but I don't really understand what I'm looking
> at if I'm honest. It gives the output:
> Field (name): title
> Field value: Κώστας
> Field value (query): Κώστα*
>
> Index Analyzer
> Κώστας
> Κώστας
> Κώστας
> κώστας
> κώστας
> Query Analyzer
> Κώστα*
> Κώστα*
> Κώστα*
> Κώστα
> κώστα
> κώστα
>
> In my schema I have defined
>
> ignoreCase="true" expand="true"/> (only used in query)
> words="stopwords.txt"/>
> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0"/>
>
> I tried adding ASCIIFoldingFilterFactory but that didn't make any difference
> after reindexing.
>
> Any ideas?
>
> many thanks
Re: Wildcard search makes no sense!!
I'm still stuck on this actually. I would really appreciate any pointers.

If I search for:

query 1: Κώστας
result: Κώστας

query 2: Κώστα*
result:

I've looked at the analyser but I don't really understand what I'm looking at if I'm honest. It gives the output:

Field (name): title
Field value: Κώστας
Field value (query): Κώστα*

Index Analyzer
Κώστας
Κώστας
Κώστας
κώστας
κώστας

Query Analyzer
Κώστα*
Κώστα*
Κώστα*
Κώστα
κώστα
κώστα

In my schema I have defined

(only used in query)

I tried adding ASCIIFoldingFilterFactory but that didn't make any difference after reindexing.

Any ideas?

many thanks
Re: Wildcard search makes no sense!!
Ahmet - many thanks - I removed the EnglishPorterFilterFactory and reindexed and this seems to behave as expected now.

Jack - thanks as well - I'm very much a noob with this, and that's a great tip.
Re: Wildcard search makes no sense!!
The presence of a wildcard in a query term short circuits some portions of the analysis process. Some token filters like lower case can still be performed on the query terms, but others, like stemming, cannot.

So, either simplify the analysis (be more selective of what token filters you use), or you will have to modify your query terms so that you manually simulate the token transformations that your text analysis is performing.

Take one of your indexed terms that you think should match and send it through the Solr Admin UI analysis page for the query field and see what the source token gets analyzed into - that's what your wildcard prefix must match. Sometimes (usually!) you will be surprised.

-- Jack Krupansky

-----Original Message-----
From: Wayne W
Sent: Wednesday, October 1, 2014 7:16 AM
To: solr-user@lucene.apache.org
Subject: Wildcard search makes no sense!!

Hi,

I don't understand this at all. We are indexing some contact names. When we do a standard query:

query 1: capi*
result: Capital Health

query 2: capit*
result: Capital Health

query 3: capita*
result:

query 4: capital*
result:

I understand (as we are using Solr 3.5) that the wildcard search does not actually return the query without the wildcard, so I understand at least why query 4 is not working (I need to use: capital* OR capital). What I don't understand is why query 3 is not working.

Also, if we place in the text field the following 3 contacts:

j...@capitalhealth.com
f...@capitalhealth.com
Capital Heath

When searching for:

query A: capita*
result: j...@capitalhealth.com, f...@capitalhealth.com

query B: capit*
result: j...@capitalhealth.com, f...@capitalhealth.com, Capital Heath

What is going on and how can I solve this?

many thanks as I'm really stuck on this
Re: Wildcard search makes no sense!!
On Wed, 2014-10-01 at 13:16 +0200, Wayne W wrote:
> query 2: capit*
> result: Capital Health
>
> query 3: capita*
> result:

You are likely using a stemmer for the field: "Capital Health" gets indexed as "capit" and "health", so there are no tokens starting with "capita". Turn off the stemmer or add a non-stemmed copy-field for truncated searches.

(sanity-checked at http://9ol.es/porter_js_demo.html)

- Toke Eskildsen, State and University Library, Denmark
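Toke's explanation can be checked with a small Python sketch. The indexed terms are taken directly from his message ("capit" and "health"); no stemmer is run here, and the helper name `prefix_matches` is made up for illustration. A trailing-wildcard query is modelled simply as a prefix test against each indexed term.

```python
# Terms as Toke reports them for "Capital Health" after stemming.
indexed_terms = ["capit", "health"]


def prefix_matches(prefix, terms):
    """Return the indexed terms that start with the given prefix."""
    return [t for t in terms if t.startswith(prefix)]


for q in ["capi", "capit", "capita", "capital"]:
    print(q + "*", "->", prefix_matches(q, indexed_terms))
```

Under these assumptions capi* and capit* find the stemmed term, while capita* and capital* find nothing, which is the pattern reported in the original question.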
Re: Wildcard search makes no sense!!
Hi,

You probably have a stemmer in the chain and it is reducing Capital to capit. That's the reason. Either remove the stemmer from the analyser chain or add a keyword repeat filter.

Ahmet

On Wednesday, October 1, 2014 2:16 PM, Wayne W wrote:

Hi,

I don't understand this at all. We are indexing some contact names. When we do a standard query:

query 1: capi*
result: Capital Health

query 2: capit*
result: Capital Health

query 3: capita*
result:

query 4: capital*
result:

I understand (as we are using Solr 3.5) that the wildcard search does not actually return the query without the wildcard, so I understand at least why query 4 is not working (I need to use: capital* OR capital). What I don't understand is why query 3 is not working.

Also, if we place in the text field the following 3 contacts:

j...@capitalhealth.com
f...@capitalhealth.com
Capital Heath

When searching for:

query A: capita*
result: j...@capitalhealth.com, f...@capitalhealth.com

query B: capit*
result: j...@capitalhealth.com, f...@capitalhealth.com, Capital Heath

What is going on and how can I solve this?

many thanks as I'm really stuck on this
Wildcard search makes no sense!!
Hi,

I don't understand this at all. We are indexing some contact names. When we do a standard query:

query 1: capi*
result: Capital Health

query 2: capit*
result: Capital Health

query 3: capita*
result:

query 4: capital*
result:

I understand (as we are using Solr 3.5) that the wildcard search does not actually return the query without the wildcard, so I understand at least why query 4 is not working (I need to use: capital* OR capital). What I don't understand is why query 3 is not working.

Also, if we place in the text field the following 3 contacts:

j...@capitalhealth.com
f...@capitalhealth.com
Capital Heath

When searching for:

query A: capita*
result: j...@capitalhealth.com, f...@capitalhealth.com

query B: capit*
result: j...@capitalhealth.com, f...@capitalhealth.com, Capital Heath

What is going on and how can I solve this?

many thanks as I'm really stuck on this
Re: Stemming not working with wildcard search
Did you re-index? And what do you get when adding &debug=query? That should show you the parsed query.

Have you looked at the results of the admin/analysis page? That tool is invaluable for seeing what the actual transformations are.

Best,
Erick

On Mon, Apr 28, 2014 at 11:41 AM, Geepalem wrote:
> Hi Ahmet,
>
> Thanks for your prompt response!
>
> I have added filters which you have specified but still its not working.
> Below is field Query Analyzer
>
> http://localhost:8080/solr/master/select?q=page_title_t:*products*
> http://localhost:8080/solr/master/select?q=page_title_t:*product*
>
> Please let me know if I am doing anything wrong.
>
> Thanks,
> G. Naresh Kumar
Re: Wildcard search not working with search term having special characters and digits
Can someone help me out with this issue please?
Re: Stemming not working with wildcard search
Can someone help me out with this issue?
Re: Stemming not working with wildcard search
Hi Ahmet,

Thanks for your prompt response!

I have added the filters which you specified but it's still not working. Below is the field Query Analyzer

http://localhost:8080/solr/master/select?q=page_title_t:*products*
http://localhost:8080/solr/master/select?q=page_title_t:*product*

Please let me know if I am doing anything wrong.

Thanks,
G. Naresh Kumar
Re: Wildcard search not working with search term having special characters and digits
Thanks Jack for the prompt response!

So is there any solution to make this scenario work? Or does wildcard not work with special characters and numerics?

Thanks,
G. Naresh Kumar
Re: Stemming not working with wildcard search
Hi Naresh,

Quotes are only meaningful when there are two or more terms. Don't use quotes for products* and product*.

As regards stemming and wildcards, use the following chain, and your wildcard searches will be happier.

Ahmet

On Monday, April 28, 2014 5:41 PM, Jack Krupansky wrote:

Wildcards and stemming are incompatible at query time - you need to manually stem the term before applying your wildcard.

Wildcards are not supported in quoted phrases. They will be treated as punctuation, and ignored by the standard tokenizer or the word delimiter filter.

-- Jack Krupansky

-----Original Message-----
From: Geepalem
Sent: Sunday, April 27, 2014 3:13 PM
To: solr-user@lucene.apache.org
Subject: Stemming not working with wildcard search

Hi,

I have added SnowballPorterFilterFactory filter to field type to make singular and plural search terms return same results. So below queries (double quotes around search term) returning similar results which is fine.

http://localhost:8080/solr/master/select?q=page_title_t:"product*"
http://localhost:8080/solr/master/select?q=page_title_t:"products*"

But when I have analyzed results, in both result sets, documents which don't start with words "Product" or "products" didn't come though there are few documents available. So I have added * as prefix and suffix to search term without double quotes to do wildcard search.

http://localhost:8080/solr/master/select?q=page_title_t:*product*
http://localhost:8080/solr/master/select?q=page_title_t:*products*

Now, stemming is not working as above second query is not returning similar results as query 1. If double quotes are added around search term then its returning similar results but results are not as expected. With double quotes it won't return results like "Old products", "New products", "Cool Product". It will only return results with the values like "Product 1", "Product 2", "Products of USA".

Please suggest or guide how to make stemming work with wildcard search. Appreciate immediate response!!

Thanks,
G. Naresh Kumar
Re: Stemming not working with wildcard search
Wildcards and stemming are incompatible at query time - you need to manually stem the term before applying your wildcard.

Wildcards are not supported in quoted phrases. They will be treated as punctuation, and ignored by the standard tokenizer or the word delimiter filter.

-- Jack Krupansky

-----Original Message-----
From: Geepalem
Sent: Sunday, April 27, 2014 3:13 PM
To: solr-user@lucene.apache.org
Subject: Stemming not working with wildcard search

Hi,

I have added SnowballPorterFilterFactory filter to field type to make singular and plural search terms return same results. So below queries (double quotes around search term) returning similar results which is fine.

http://localhost:8080/solr/master/select?q=page_title_t:"product*"
http://localhost:8080/solr/master/select?q=page_title_t:"products*"

But when I have analyzed results, in both result sets, documents which don't start with words "Product" or "products" didn't come though there are few documents available. So I have added * as prefix and suffix to search term without double quotes to do wildcard search.

http://localhost:8080/solr/master/select?q=page_title_t:*product*
http://localhost:8080/solr/master/select?q=page_title_t:*products*

Now, stemming is not working as above second query is not returning similar results as query 1. If double quotes are added around search term then its returning similar results but results are not as expected. With double quotes it won't return results like "Old products", "New products", "Cool Product". It will only return results with the values like "Product 1", "Product 2", "Products of USA".

Please suggest or guide how to make stemming work with wildcard search. Appreciate immediate response!!

Thanks,
G. Naresh Kumar
Re: Wildcard search not working with search term having special characters and digits
Wildcard query only works for single terms. Any embedded special characters will cause a term to be split into multiple terms at index time. The use of a wildcard in a query term with embedded special characters will bypass normal analysis - you need to enter the term exactly as it would be analyzed at index time for wildcard to work.

Ditto if your field type uses the word delimiter filter with the split digits option enabled - the alpha and numeric portions will generate separate terms - and cause a wildcard to fail.

-- Jack Krupansky

-----Original Message-----
From: Geepalem
Sent: Sunday, April 27, 2014 3:30 PM
To: solr-user@lucene.apache.org
Subject: Wildcard search not working with search term having special characters and digits

Hi,

Below query without wildcard search is returning results.

http://localhost:8080/solr/master/select?q=page_title_t:"an-138"

But below query with wildcard is not returning results

http://localhost:8080/solr/master/select?q=page_title_t:"an-13*"

Below query with wildcard search and no digits is returning results.

http://localhost:8080/solr/master/select?q=page_title_t:"an-*"

I have tried by adding WordDelimiter Filter but there is no luck.

Please suggest or guide how to make wildcard search work with special characters and digits. Appreciate immediate response!!

Thanks,
G. Naresh Kumar
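Jack's point about terms being split at index time can be illustrated with a rough Python sketch. This is not the real tokenizer or word delimiter filter: `toy_index_terms` is a hypothetical stand-in that lowercases the text and keeps runs of letters and runs of digits as separate terms, and `wildcard_prefix_hits` models a trailing-wildcard query as a prefix test against single indexed terms.

```python
import re


def toy_index_terms(text):
    """Rough stand-in for index-time analysis: lowercase, then keep runs of
    letters and runs of digits as separate terms."""
    return re.findall(r"[a-z]+|[0-9]+", text.lower())


def wildcard_prefix_hits(prefix, terms):
    """A trailing-wildcard query is compared against single indexed terms."""
    return [t for t in terms if t.startswith(prefix)]


terms = toy_index_terms("AN-138")
print(terms)                                 # the title is stored as two terms
print(wildcard_prefix_hits("an-13", terms))  # no single term starts with "an-13"
print(wildcard_prefix_hits("13", terms))     # the numeric term does start with "13"
```

In this toy model the wildcard only has a chance of matching when the prefix is written the way a single term was actually indexed.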
Re: Wildcard search not working with search term having special characters and digits
Can someone please help me with this as I am stuck with this issue..
Re: Stemming not working with wildcard search
Can someone please help me with this as I am stuck with this issue..
Wildcard search not working with search term having special characters and digits
Hi,

Below query without wildcard search is returning results.

http://localhost:8080/solr/master/select?q=page_title_t:"an-138"

But below query with wildcard is not returning results

http://localhost:8080/solr/master/select?q=page_title_t:"an-13*"

Below query with wildcard search and no digits is returning results.

http://localhost:8080/solr/master/select?q=page_title_t:"an-*"

I have tried by adding WordDelimiter Filter but there is no luck.

Please suggest or guide how to make wildcard search work with special characters and digits. Appreciate immediate response!!

Thanks,
G. Naresh Kumar
Stemming not working with wildcard search
Hi,

I have added SnowballPorterFilterFactory filter to field type to make singular and plural search terms return same results. So below queries (double quotes around search term) returning similar results which is fine.

http://localhost:8080/solr/master/select?q=page_title_t:"product*"
http://localhost:8080/solr/master/select?q=page_title_t:"products*"

But when I have analyzed results, in both result sets, documents which don't start with words "Product" or "products" didn't come though there are few documents available. So I have added * as prefix and suffix to search term without double quotes to do wildcard search.

http://localhost:8080/solr/master/select?q=page_title_t:*product*
http://localhost:8080/solr/master/select?q=page_title_t:*products*

Now, stemming is not working as above second query is not returning similar results as query 1. If double quotes are added around search term then its returning similar results but results are not as expected. With double quotes it won't return results like "Old products", "New products", "Cool Product". It will only return results with the values like "Product 1", "Product 2", "Products of USA".

Please suggest or guide how to make stemming work with wildcard search. Appreciate immediate response!!

Thanks,
G. Naresh Kumar
Re: Wildcard search not working if the query contains numbers along with special characters.
Hi,

Forget about patternReplaceCharFilter for a moment. Your example is more clear this time.

q=titleName:1999/99* should return the following two docs:

d1) JULIUS CAESER (1999/99)
d2) ARABIAN NIGHTS - 1999/99

This is achievable with the following type.

1) MappingCharFilterFactory with mappings.txt
   "(" => ""
   ")" => ""
2) WhitespaceTokenizerFactory
3) LowerCaseFilterFactory

I don't understand your sentence: "i will never be able to specifically search the title i want as 1999/99." But please try / test the above.

I also suggest you use the prefix query parser.
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-PrefixQueryParser

Ahmet

On Wednesday, March 5, 2014 11:20 PM, Kashish wrote:

Hi Ahmet,

Let me explain with another scenario.

There is a title -> ARABIAN NIGHTS - 1999/99

Now in autocomplete, if i give 1999/99, in the backend i append an asterisk to it and form the solr url this way

q=titleName:1999/99*

I get the above mentioned title - so works perfect.

Now lets add another title to this -> JULIUS CAESER (1999/99)

If i pass the same query parameter, i would definitely expect both these titles to come up, but this new one doesn't come (because of the braces). I can add patternReplaceFilter but this way i will never be able to specifically search the title i want as 1999/99.

Hope you get what i am trying to achieve. Is my understanding wrong somewhere?

Thanks.
Re: Wildcard search not working if the query contains numbers along with special characters.
Hi Ahmet,

Let me explain with another scenario.

There is a title -> ARABIAN NIGHTS - 1999/99

Now in autocomplete, if i give 1999/99, in the backend i append an asterisk to it and form the solr url this way

q=titleName:1999/99*

I get the above mentioned title - so works perfect.

Now lets add another title to this -> JULIUS CAESER (1999/99)

If i pass the same query parameter, i would definitely expect both these titles to come up, but this new one doesn't come (because of the braces). I can add patternReplaceFilter but this way i will never be able to specifically search the title i want as 1999/99.

Hope you get what i am trying to achieve. Is my understanding wrong somewhere?

Thanks.
Re: Wildcard search not working if the query contains numbers along with special characters.
Hi Kashish,

This is confusing. You gave the following example: query 1999/99* should return ARABIAN NIGHTS #01 (1999/99)

However you said "I cannot ignore parenthesis or other special characters..."

The above two contradict each other.

Since you are after autocomplete you might be interested in this:
http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/

Ahmet

On Wednesday, March 5, 2014 8:36 PM, Kashish wrote:

Hi,

Pls help me with this.
Re: Wildcard search not working if the query contains numbers along with special characters.
Hi,

Pls help me with this.
Re: Wildcard search not working if the query contains numbers along with special characters.
Hi Erick,

I understand what you are pointing out but the thing is.. this is for the autocomplete feature. I cannot ignore parentheses or other special characters, as with certain titles like 'A Team of five', if the user gives 'a team' then titles containing a-team and the rest also come up and this one gets lost, as we show only the top 6 results (the user can drill down to get closer to the result he wants).

I modified my fieldtype so that at index time a word delimiter filter is added and at query time a pattern filter is added, but now still if i use an asterisk i get no records for 1999/99* but get them without the asterisk. This is not what i want, as by default, whatever the user enters, we append an asterisk to it for autocomplete search.

Thanks
Re: Wildcard search not working if the query contains numbers along with special characters.
The admin/analysis page is your friend. Taking some time to get acquainted with that page will save you lots and lots and lots of time. In this case, you'd have seen that your input is actually tokenized as (1999/99), parentheses and all as a _single_ token, so of course searching for 1999/99 wouldn't work.

Searching for *1999/99* is generally a bad idea. It'll work, but it's a kludge.

What you _do_ need to do is define your use-cases. Let's assume that you _never_ want parentheses to be relevant. You could use PatternReplaceCharFilterFactory or PatternReplaceFilterFactory in both index and query parts of your analysis chain to remove parens. Or really any kinds of extraneous characters you decided were unimportant.

But you need to decide what's "important" and enforce that.

Best,
Erick

On Tue, Feb 25, 2014 at 7:28 PM, Kashish wrote:
> Hi Ahmet/Erick,
>
> I tried escaping as well. See no luck.
>
> The title am looking for is - ARABIAN NIGHTS #01 (1999/99)
>
> I figured out that if i pass the query as *1999/99* (i.e asterisk not only
> at the end but at the beginning as well), It works.
>
> The problem is the braces. I can change my field type and add
>
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
>
> But this will show too many results in autocomplete.
>
> Is there any best way to handle this? Or should i pass asterisk before and
> after the query?
>
> Thanks.
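A small Python sketch of the effect Erick describes, under stated assumptions: the field is modelled as lowercased, whitespace-separated tokens, and a regular expression stands in for a pattern-replace char filter that drops parentheses. The function names are made up for illustration and this is not Solr code.

```python
import re


def tokens(text, strip_parens=False):
    """Lowercase and split on whitespace; optionally drop parentheses first,
    as a stand-in for a pattern-replace char filter."""
    if strip_parens:
        text = re.sub(r"[()]", "", text)
    return text.lower().split()


def prefix_hits(prefix, terms):
    """Model a trailing-wildcard query as a prefix test on each term."""
    return [t for t in terms if t.startswith(prefix)]


title = "ARABIAN NIGHTS #01 (1999/99)"
print(prefix_hits("1999/99", tokens(title)))                     # parens kept
print(prefix_hits("1999/99", tokens(title, strip_parens=True)))  # parens removed
```

With the parentheses kept, the only candidate token is "(1999/99)", which does not start with the prefix; once they are removed on the index side, the prefix lines up with the stored token.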
Re: Wildcard search not working if the query contains numbers along with special characters.
Hi Ahmet/Erick,

I tried escaping as well. See no luck.

The title I am looking for is - ARABIAN NIGHTS #01 (1999/99)

I figured out that if i pass the query as *1999/99* (i.e. asterisk not only at the end but at the beginning as well), it works.

The problem is the braces. I can change my field type and add a filter with generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1", but this will show too many results in autocomplete.

Is there any best way to handle this? Or should i pass an asterisk before and after the query?

Thanks.
Re: Wildcard search not working if the query contains numbers along with special characters.
Hi,

By escaping I mean this:

q=title_autocomplete:1999\/99*

It is different from URL encoding.
http://lucene.apache.org/core/4_6_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Escaping_Special_Characters

If the prefix query parser didn't return what you want then it must be something with the indexed terms. Can you give an example of a raw document text that you expect to retrieve with this query?

On Tuesday, February 25, 2014 10:15 PM, Kashish wrote:

Hi Ahmet,

Thanks for your reply. Yes. I pass my query this way -> q=title_autocomplete:1999%2f99

I tried your way too. But no luck. :(
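The distinction Ahmet draws between query-parser escaping and URL encoding can be sketched in Python. This is an illustrative helper, not part of Solr or any client library: `escape_term` and `prefix_query` are hypothetical names, and the character set follows the special characters listed on the classic query parser page linked above. The user input is escaped first and the wildcard is appended afterwards so that the asterisk itself stays unescaped.

```python
# Characters treated as special by the classic Lucene query parser
# (&& and || are covered by escaping each & and | individually).
SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape_term(term):
    """Put a backslash in front of every query-parser special character."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in term)


def prefix_query(field, user_input):
    """Build field:escaped_input* for an autocomplete-style prefix search."""
    return field + ":" + escape_term(user_input) + "*"


print(prefix_query("title_autocomplete", "1999/99"))
```

URL encoding (for example %2f for the slash) is a separate step applied to the whole q parameter when the request is sent; it does not replace the backslash escaping shown here.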
Re: Wildcard search not working if the query contains numbers along with special characters.
What does it say happens on your admin/analysis page for that field? And did you by any chance change your schema without reindexing everything?

Also, try the TermsComponent to see what tokens are actually _in_ your index. Schema-browser from the admin page can help here too.

Best,
Erick

On Tue, Feb 25, 2014 at 12:05 PM, Ahmet Arslan wrote:
> Hi Kashish,
>
> What happens when you use this q={!prefix f=title_autocomplete}1999/99
>
> I suspect '/' character is a special query parser character therefore it
> needs to be escaped.
>
> Ahmet
>
> On Tuesday, February 25, 2014 9:55 PM, Kashish wrote:
> Hi,
>
> I have a very weird problem. The wild card search works fine for all
> scenarios but one. It doesn't seem to give any result for query 1999/99*. I
> checked the debug query and its formed perfect.
>
> title_autocomplete:1999/99*
> title_autocomplete:1999/99*
> (+title_autocomplete:1999/99* ())/no_coord
> +title_autocomplete:1999/99* ()
>
> This is my fieldType
>
> positionIncrementGap="100">
> words="stopwords.txt" enablePositionIncrements="true" />
> words="stopwords.txt" enablePositionIncrements="true" />
>
> Please help me with this.
>
> Thanks.
Re: Wildcard search not working if the query contains numbers along with special characters.
Hi Ahmet,

Thanks for your reply. Yes. I pass my query this way -> q=title_autocomplete:1999%2f99

I tried your way too. But no luck. :(
Re: Wildcard search not working if the query contains numbers along with special characters.
Hi Kashish,

What happens when you use this: q={!prefix f=title_autocomplete}1999/99

I suspect the '/' character is a special query parser character and therefore needs to be escaped.

Ahmet

On Tuesday, February 25, 2014 9:55 PM, Kashish wrote:

Hi,

I have a very weird problem. The wild card search works fine for all scenarios but one. It doesn't seem to give any result for query 1999/99*. I checked the debug query and its formed perfect.

title_autocomplete:1999/99*
title_autocomplete:1999/99*
(+title_autocomplete:1999/99* ())/no_coord
+title_autocomplete:1999/99* ()

This is my fieldType

Please help me with this.

Thanks.
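When the {!prefix ...} form suggested above is sent over HTTP, the braces, space and slash need URL encoding. A minimal Python sketch using only the standard library; the helper name `prefix_parser_params` is made up for illustration, and the field name is the one from the thread.

```python
from urllib.parse import urlencode


def prefix_parser_params(field, prefix):
    """Return a URL-encoded q parameter that uses the prefix query parser."""
    return urlencode({"q": "{!prefix f=%s}%s" % (field, prefix)})


# The result can be appended after "select?" in the request URL.
print(prefix_parser_params("title_autocomplete", "1999/99"))
```

Note that no trailing asterisk is added here: with the prefix parser, the text after the closing brace is itself treated as the prefix.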
Wildcard search not working if the query contains numbers along with special characters.
Hi, I have a very weird problem. The wild card search works fine for all scenarios but one. It doesn't seem to give any result for query 1999/99*. I checked the debug query and it's formed perfectly. title_autocomplete:1999/99* title_autocomplete:1999/99* (+title_autocomplete:1999/99* ())/no_coord +title_autocomplete:1999/99* () This is my fieldType Please help me with this. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-if-the-query-conatins-numbers-along-with-special-characters-tp4119608.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr wildcard search
Also be aware that some analysis steps may not be performed on wildcards. The filter has to be MultiTermAware. See: https://wiki.apache.org/solr/MultitermQueryAnalysis and http://searchhub.org/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/ Best, Erick On Fri, Sep 13, 2013 at 12:12 PM, Jack Krupansky wrote: > Wildcard applies only to a single term. The escaped space suggests that > you are trying to match a wildcard on multiple terms. > > Try the contrib complex phrase query parser. > > -- Jack Krupansky > > -Original Message- From: Prasi S > Sent: Friday, September 13, 2013 6:37 AM > To: solr-user@lucene.apache.org > Subject: Solr wildcard search > > > Hi all, > I am working with wildcard queries and a few things are confusing. > > 1. Does a wildcard search omit the analysers on a particular field? > > 2. I have searched for > q=google\ technology -> gives results > q=google technology -> gives results > q=google tech* -> gives results > q=google\ tech* -> 0 results. The debug query for the last query is text:google tech* > > Why does this happen? > > > Thanks, > Prasi >
Solr wildcard search
Hi all, I am working with wildcard queries and a few things are confusing. 1. Does a wildcard search omit the analysers on a particular field? 2. I have searched for q=google\ technology -> gives results q=google technology -> gives results q=google tech* -> gives results q=google\ tech* -> 0 results. The debug query for the last query is text:google tech* Why does this happen? Thanks, Prasi
Re: Solr wildcard search
Wildcard applies only to a single term. The escaped space suggests that you are trying to match a wildcard on multiple terms. Try the contrib complex phrase query parser. -- Jack Krupansky -Original Message- From: Prasi S Sent: Friday, September 13, 2013 6:37 AM To: solr-user@lucene.apache.org Subject: Solr wildcard search Hi all, I am working with wildcard queries and a few things are confusing. 1. Does a wildcard search omit the analysers on a particular field? 2. I have searched for q=google\ technology -> gives results q=google technology -> gives results q=google tech* -> gives results q=google\ tech* -> 0 results. The debug query for the last query is text:google tech* Why does this happen? Thanks, Prasi
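A minimal sketch of what a complex phrase request might look like, assuming a Solr build where the `complexphrase` query parser is available (at the time of this thread it was a contrib module and needed extra setup). The helper name is invented for illustration; `debug=query` is added so the parsed query can be inspected.

```python
from urllib.parse import urlencode


def complex_phrase_params(field, phrase):
    """Build request parameters for a phrase that contains a wildcard term."""
    # Escape backslashes and double quotes so the phrase stays one quoted unit.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "q": f'{{!complexphrase inOrder=true}}{field}:"{escaped}"',
        "debug": "query",  # shows how the query was actually parsed
    }


params = complex_phrase_params("text", "google tech*")
print(params["q"])        # {!complexphrase inOrder=true}text:"google tech*"
print(urlencode(params))  # ready to append to a /select URL
```

The wildcard still applies to the single term tech*, but the parser combines it with google as a phrase rather than treating the escaped space as part of one term.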
Re: Synonyms with wildcard search
Sorry, but Solr synonym processing does not know about wildcards, so it is bypassed when a wildcard is present. Technically, it could probably be enhanced to support them, at least for some common special cases such as yours, but that prospect won't help you right now. Your best bet is to preprocess your queries in your application layer and perform the mapping there. -- Jack Krupansky -Original Message- From: Sandeep Gupta Sent: Tuesday, July 30, 2013 5:22 AM To: solr-user@lucene.apache.org Subject: Synonyms with wildcard search Hello All, I want to know whether it is possible to make a query of word which has synonym+wildcard. For example : I have one field which is type of text_en (default fieldType in 4.3.1) And synonym.txt file has this entry colour => color Now when I am using full text search as colour* (with wild card) then search result is not returning the keyword of type colorology... (as in case If I use color* then I am getting this word) So any suggestions as how I can achieve this Or its not possible. Thanks Sandeep
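One way the application-layer preprocessing could look, as a sketch only: the mapping table mirrors the single `colour => color` entry from the question, and the function name is hypothetical. It applies the synonym to the literal part of each term and leaves any trailing wildcard in place.

```python
# Mirrors the one-way mapping "colour => color" from synonym.txt in the question.
SYNONYMS = {"colour": "color"}


def map_wildcard_terms(query, synonyms=SYNONYMS):
    """Apply one-way synonyms to each term, preserving trailing wildcards."""
    out = []
    for token in query.split():
        stem = token.rstrip("*?")     # literal part of the term
        suffix = token[len(stem):]    # trailing wildcard characters, if any
        out.append(synonyms.get(stem.lower(), stem) + suffix)
    return " ".join(out)


print(map_wildcard_terms("colour*"))  # color*
```

This only handles whitespace-separated terms with trailing wildcards; fielded queries, quoted phrases and multi-word synonyms would need more careful parsing.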
Synonyms with wildcard search
Hello All, I want to know whether it is possible to query a word using a synonym plus a wildcard. For example: I have one field of type text_en (the default fieldType in 4.3.1), and the synonym.txt file has this entry: colour => color. Now when I run a full-text search for colour* (with a wildcard), the results do not include words such as colorology (whereas if I search for color*, I do get this word). Any suggestions on how I can achieve this, or is it not possible? Thanks Sandeep
Re: Is leading wildcard search turned on by default in Solr 3.6.1?
Just a quick comment from our experience: since we have quite a lot of data indexed in our Solr, we take some extra measures to ensure, no bogus wild-card queries are accepted by the system (for instance *, **, *** etc). And that is done in the QueryParser. Wanted to mention this approach as one way of handling simple query security checks. -- Dmitry On Tue, Nov 13, 2012 at 6:22 AM, Jack Krupansky wrote: > Be sure to realize that even with reverse wildcard support, the user can > add a trailing wildcard as well (double-ended wildcard) and then you are > back in the same boat. > > The overall idea is that: 1) Hardware is much faster than just 3 or 4 > years ago, and 2) even though document counts are getting much larger, the > number of unique terms (which is all that matters for wildcard performance) > does not tend to grow as fast as document count grows. And, some fields > have a much more limited vocabulary (unique terms), so a leading wildcard > is not necessarily a big performance hit. > > Technology advances. We should permit our mindsets to advance as well. > > -- Jack Krupansky > > > -Original Message- From: François Schiettecatte > Sent: Monday, November 12, 2012 2:38 PM > To: solr-user@lucene.apache.org > Subject: Re: Is leading wildcard search turned on by default in Solr 3.6.1? > > > John > > You can still use leading wildcards even if you dont have the > ReversedWildcardFilterFactory in your analysis but it means you will be > scanning the entire dictionary when the search is run which can be a > performance issue. If you do use ReversedWildcardFilterFactory you wont > have that performance issue but you will increase the overall size of your > index. Its a tradeoff. > > When I looked into it for a site I built I decided that the tradeoff was > not worth it (after benchmarking) given how few leading wildcards searches > it was getting. 
> > Best regards > > François > > > On Nov 12, 2012, at 5:33 PM, johnmu...@aol.com wrote: > > >> >> Hi, >> >> >> I'm migrating from Solr 1.2 to 3.6.1. I used the same analyzer as I was, >> and re-indexed my data. I did not add >> solr.ReversedWildcardFilterFactory to my index analyzer, but yet >> leading wild cards are working!! Does this mean it's turned on by default? >> If so, how do I turn it off, and what are the implication of leaving ON? >> Won't my searches be slower and consume more memory? >> >> >> Thanks, >> >> >> --MJ >> >> -- Regards, Dmitry Kan
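The kind of check Dmitry describes could be sketched as below. This is an application-side stand-in, not the QueryParser change he refers to, and the function name is an assumption. It rejects queries made up solely of wildcard characters while leaving the match-all query `*:*` alone.

```python
import re

# A query consisting only of wildcard characters and whitespace.
_ONLY_WILDCARDS = re.compile(r"^[*?\s]+$")


def is_bogus_wildcard(query):
    """Return True for queries such as '*', '**' or '* ?' that contain no literal text."""
    return bool(_ONLY_WILDCARDS.match(query))


for q in ["*", "***", "abc*", "*:*"]:
    print(q, is_bogus_wildcard(q))
```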
Re: Is leading wildcard search turned on by default in Solr 3.6.1?
Be sure to realize that even with reverse wildcard support, the user can add a trailing wildcard as well (double-ended wildcard) and then you are back in the same boat. The overall idea is that: 1) Hardware is much faster than just 3 or 4 years ago, and 2) even though document counts are getting much larger, the number of unique terms (which is all that matters for wildcard performance) does not tend to grow as fast as document count grows. And, some fields have a much more limited vocabulary (unique terms), so a leading wildcard is not necessarily a big performance hit. Technology advances. We should permit our mindsets to advance as well. -- Jack Krupansky -Original Message- From: François Schiettecatte Sent: Monday, November 12, 2012 2:38 PM To: solr-user@lucene.apache.org Subject: Re: Is leading wildcard search turned on by default in Solr 3.6.1? John You can still use leading wildcards even if you dont have the ReversedWildcardFilterFactory in your analysis but it means you will be scanning the entire dictionary when the search is run which can be a performance issue. If you do use ReversedWildcardFilterFactory you wont have that performance issue but you will increase the overall size of your index. Its a tradeoff. When I looked into it for a site I built I decided that the tradeoff was not worth it (after benchmarking) given how few leading wildcards searches it was getting. Best regards François On Nov 12, 2012, at 5:33 PM, johnmu...@aol.com wrote: Hi, I'm migrating from Solr 1.2 to 3.6.1. I used the same analyzer as I was, and re-indexed my data. I did not add solr.ReversedWildcardFilterFactory to my index analyzer, but yet leading wild cards are working!! Does this mean it's turned on by default? If so, how do I turn it off, and what are the implication of leaving ON? Won't my searches be slower and consume more memory? Thanks, --MJ
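The tradeoff François describes can be pictured with a toy model. This is a conceptual sketch only, not how ReversedWildcardFilterFactory is implemented: storing each term reversed turns a leading-wildcard search into a prefix search over a sorted list, at the cost of keeping a second copy of every term.

```python
from bisect import bisect_left


def prefix_scan(sorted_terms, prefix):
    """Return all terms starting with `prefix`, using binary search to find the start."""
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order means no later term can match
        matches.append(term)
    return matches


terms = ["indexing", "parsing", "query", "searching", "solr"]
# The extra "reversed" dictionary is what makes the index larger.
reversed_terms = sorted(t[::-1] for t in terms)


def leading_wildcard(suffix):
    """Answer '*<suffix>' by prefix-scanning the reversed terms."""
    hits = prefix_scan(reversed_terms, suffix[::-1])
    return sorted(t[::-1] for t in hits)


print(leading_wildcard("ing"))  # ['indexing', 'parsing', 'searching']
```

Without the reversed copy, answering `*ing` means checking every term in the dictionary, which is the full scan mentioned above. As Jack notes, a double-ended wildcard such as `*ing*` gets no help from this trick.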
Re: Is leading wildcard search turned on by default in Solr 3.6.1?
On Tue, Nov 13, 2012 at 2:27 AM, wrote: > I'm surprised that this has not been logged as a defect. The fact that this > is ON by default, means someone can bring down a server; this is bad enough to > categorize this as a security issue. It's all relative. There are tons of queries that can take a long time and disabling them all by default would just be frustrating for users (range queries, prefix queries, regex queries, etc). If a single wildcard query like *a is bad, then non leading wildcard a*a a*a a*a a*a a*a a*a a*a a*a will probably be just as bad (or [a TO z], or [* TO *], etc). It's no real protection from a security perspective. Individual control of different query types in edismax would probably be nice though (and perhaps a minimum wildcard prefix length rather than just an on/off switch). -Yonik http://lucidworks.com
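The "minimum wildcard prefix length" idea does not exist as a Solr setting in this thread, but an application in front of Solr could approximate it. The sketch below assumes plain, unfielded, whitespace-separated terms; the names and the default of 2 are arbitrary choices for illustration.

```python
def wildcard_prefix_len(term):
    """Number of literal characters before the first wildcard, or None if there is none."""
    for i, ch in enumerate(term):
        if ch in "*?":
            return i
    return None


def allowed(query, min_prefix=2):
    """Reject queries where any wildcard term has fewer than `min_prefix` leading literals."""
    for term in query.split():
        n = wildcard_prefix_len(term)
        if n is not None and n < min_prefix:
            return False
    return True


for q in ["*a", "a*a", "ab*", "solr search"]:
    print(q, allowed(q))
```

As the message points out, this is a usability and load guard rather than real protection, since range queries and similar constructs can be just as expensive.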
RE: Is leading wildcard search turned on by default in Solr 3.6.1?
I'm surprised that this has not been logged as a defect. The fact that this is ON by default, means someone can bring down a server; this is bad enough to categorize this as a security issue. --MJ -Original Message- From: Michael Ryan [mailto:mr...@moreover.com] Sent: Monday, November 12, 2012 8:10 PM To: solr-user@lucene.apache.org Subject: RE: Is leading wildcard search turned on by default in Solr 3.6.1? Yeah, the situation is kind of a pain right now. In https://issues.apache.org/jira/browse/SOLR-2438, it was enabled by default and there is no way to disable without patching SolrQueryParser. There's also the edismax parser which doesn't have a setting for this, which I've made a jira for at https://issues.apache.org/jira/browse/SOLR-3031. I'm surprised other people haven't requested this, as any instance of serious size can be brought to its knees by a wildcard query. -Michael -Original Message- From: johnmu...@aol.com [mailto:johnmu...@aol.com] Sent: Monday, November 12, 2012 7:58 PM To: solr-user@lucene.apache.org Subject: RE: Is leading wildcard search turned on by default in Solr 3.6.1? At one point, in some version of Solr, it was OFF by default, and you had to enable it via a setting (either in solrconfig.xml or schema.xml, I don't remember). It looks like this is no longer the case. Even worse, and if this is true, disabling it no longer seems to be possible to disable it via a Solr setting!! -- MJ
RE: Is leading wildcard search turned on by default in Solr 3.6.1?
Yeah, the situation is kind of a pain right now. In https://issues.apache.org/jira/browse/SOLR-2438, it was enabled by default and there is no way to disable without patching SolrQueryParser. There's also the edismax parser which doesn't have a setting for this, which I've made a jira for at https://issues.apache.org/jira/browse/SOLR-3031. I'm surprised other people haven't requested this, as any instance of serious size can be brought to its knees by a wildcard query. -Michael -Original Message- From: johnmu...@aol.com [mailto:johnmu...@aol.com] Sent: Monday, November 12, 2012 7:58 PM To: solr-user@lucene.apache.org Subject: RE: Is leading wildcard search turned on by default in Solr 3.6.1? At one point, in some version of Solr, it was OFF by default, and you had to enable it via a setting (either in solrconfig.xml or schema.xml, I don't remember). It looks like this is no longer the case. Even worse, and if this is true, disabling it no longer seems to be possible to disable it via a Solr setting!! -- MJ
RE: Is leading wildcard search turned on by default in Solr 3.6.1?
At one point, in some version of Solr, it was OFF by default, and you had to enable it via a setting (either in solrconfig.xml or schema.xml, I don't remember). It looks like this is no longer the case. Even worse, and if this is true, disabling it no longer seems to be possible to disable it via a Solr setting!! -- MJ -Original Message- From: François Schiettecatte [mailto:fschietteca...@gmail.com] Sent: Monday, November 12, 2012 7:48 PM To: solr-user@lucene.apache.org Subject: Re: Is leading wildcard search turned on by default in Solr 3.6.1? I suspect it is just part of the wildcard handling, maybe someone can chime in here, you may need to catch this before it gets to SOLR. François On Nov 12, 2012, at 5:44 PM, johnmu...@aol.com wrote: > Thanks for the quick response. > > > So, I do not want to use ReversedWildcardFilterFactory, but leading wildcard > is working and thus is ON by default. How do I disable it to prevent the use > of it and the issues that come with it? > > > -- MJ > > > > -Original Message- > From: François Schiettecat > te > To: solr-user > Sent: Mon, Nov 12, 2012 5:39 pm > Subject: Re: Is leading wildcard search turned on by default in Solr 3.6.1? > > > John > > You can still use leading wildcards even if you dont have the > ReversedWildcardFilterFactory in your analysis but it means you will > be scanning the entire dictionary when the search is run which can be a > performance issue. > If you do use ReversedWildcardFilterFactory you wont have that > performance issue but you will increase the overall size of your index. Its a > tradeoff. > > When I looked into it for a site I built I decided that the tradeoff > was not worth it (after benchmarking) given how few leading wildcards > searches it was getting. > > Best regards > > François > > > On Nov 12, 2012, at 5:33 PM, johnmu...@aol.com wrote: > >> >> >> Hi, >> >> >> I'm migrating from Solr 1.2 to 3.6.1. I used the same analyzer as I >> was, and > re-indexed my data. 
I did not add >> solr.ReversedWildcardFilterFactory to my index analyzer, but yet >> leading wild > cards are working!! Does this mean it's turned on by default? If so, > how do I turn it off, and what are the implication of leaving ON? > Won't my searches be slower and consume more memory? >> >> >> Thanks, >> >> >> --MJ >> > > > >
Re: Is leading wildcard search turned on by default in Solr 3.6.1?
I suspect it is just part of the wildcard handling, maybe someone can chime in here, you may need to catch this before it gets to SOLR. François On Nov 12, 2012, at 5:44 PM, johnmu...@aol.com wrote: > Thanks for the quick response. > > > So, I do not want to use ReversedWildcardFilterFactory, but leading wildcard > is working and thus is ON by default. How do I disable it to prevent the use > of it and the issues that come with it? > > > -- MJ > > > > -Original Message- > From: François Schiettecatte > To: solr-user > Sent: Mon, Nov 12, 2012 5:39 pm > Subject: Re: Is leading wildcard search turned on by default in Solr 3.6.1? > > > John > > You can still use leading wildcards even if you dont have the > ReversedWildcardFilterFactory in your analysis but it means you will be > scanning > the entire dictionary when the search is run which can be a performance > issue. > If you do use ReversedWildcardFilterFactory you wont have that performance > issue > but you will increase the overall size of your index. Its a tradeoff. > > When I looked into it for a site I built I decided that the tradeoff was not > worth it (after benchmarking) given how few leading wildcards searches it was > getting. > > Best regards > > François > > > On Nov 12, 2012, at 5:33 PM, johnmu...@aol.com wrote: > >> >> >> Hi, >> >> >> I'm migrating from Solr 1.2 to 3.6.1. I used the same analyzer as I was, >> and > re-indexed my data. I did not add >> solr.ReversedWildcardFilterFactory to my index analyzer, but yet leading >> wild > cards are working!! Does this mean it's turned on by default? If so, how do > I > turn it off, and what are the implication of leaving ON? Won't my searches > be > slower and consume more memory? >> >> >> Thanks, >> >> >> --MJ >> > > > >