Re: wildcard search doesn't fetch results when field has whi
Yes. But you haven’t told us what _type_ of field you’re working with. If it’s a “string” type, then ComplexPhraseQueryParser won’t work. Looking again at your example, it looks as though you are using strings. In that case, try

abc\ d*

Adding debug=query to your URL will show you how the query gets parsed and may help considerably.

Best,
Erick
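Erick's two suggestions (escape the space, then inspect the parse with debug=query) can be combined in a small request builder. This is only a minimal Python sketch: the core URL `http://localhost:8983/solr/mycore` and the string field `name_s` are hypothetical placeholders, not names from the thread.

```python
from urllib.parse import urlencode


def escape_whitespace(term):
    """Backslash-escape spaces so the query parser keeps the term in one piece."""
    return term.replace(" ", "\\ ")


def build_select_url(base_url, field, prefix):
    """Build a /select URL for a prefix wildcard query, with debug=query added."""
    q = f"{field}:{escape_whitespace(prefix)}*"
    params = {"q": q, "debug": "query"}
    # urlencode percent-encodes the backslash and turns the space into "+"
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core and field names, for illustration only.
url = build_select_url("http://localhost:8983/solr/mycore", "name_s", "abc d")
print(url)
```

With debug=query in the request, the "parsedquery" entry of the response should show whether the escaped term survived as a single wildcard term.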
Re: wildcard search doesn't fetch results when field has whi
Erick,

I tried complexqueryparser, still no result. Escape white space: do you mean using "\"?

Thanks,
Ahemad
Re: wildcard search doesn't fetch results when field has white spaces and special charecters
Try complexphrasequeryparser. If (and only if) you always want to search from the beginning of the content, you might be able to use string rather than text-based fields, but make sure to escape whitespace...

Best,
Erick

On Sat, Mar 30, 2019, 10:33 ahemad.sh...@yahoo.com.INVALID wrote:

> Hi,
> I have a field with white spaces and special characters on which indexing needs to be done to do wildcard querying.
> It works for most of the scenarios with wildcard search.
> e.g. if my data is "ali.abc" and "abc_pqr" and "ali abc" and "ahemad ali", then a search with ali* gives these three results.
>
> But I am not able to search with, say, ali a*
>
> A search with query q="ali abc" gives the exact match and the desired result.
>
> I want to do wildcard searches where the criteria can include spaces, like
> example - "ahemad a* or ahemad a*
>
> i.e. if a space is present then I am not able to do a wildcard search.
>
> Is there any way by which wildcard search can be achieved even if a space is present in the token?
>
> The field type I have is below:
>
> sortMissingLast="true">
> replacement=""replace="all" />
> replacement=""replace="all" />
>
> Any help would be great.
> Thanks,
> Ahemad Ali
RE: Wildcard search not working
Hi Ahmet, Hi Upayavira,

OK, it seems that I have to dive a bit deeper into the Solr filters and tokenizers. I've just realized that my command of them is too limited. Thanks a lot so far for the help, guys.

Cheers and have a nice day,

christian
Re: Wildcard search not working
Hi Christian,

Please use the following filter before/above the stemmer. Plus, you may want to add :

Ahmet
Re: Wildcard search not working
You have a stemming filter in your analysis chain. Go to the analysis tab, select the 'text' field, and put "Roche" into both boxes. Click analyse. I bet you will see Roch, not Roche, because of your stemming filter shown below. That's what Ahmet shrewdly identified above.

Upayavira

On Thu, 11 Aug 2016, at 08:31 PM, Ribeaud, Christian (Ext) wrote:
> Hi Ahmet,
>
> Many thanks for your reply. I had a look at the URL you pointed out but,
> honestly, I have to admit that I did not fully understand you.
> Let's be a bit more concrete. Following the schema snippet for the
> corresponding field:
>
> ...
> required="false" multiValued="false" />
> positionIncrementGap="100">
> words="lang/stopwords_de.txt" format="snowball" />
> ...
>
> What is wrong with this schema? Respectively, what should I change to be
> able to correctly do wildcard searches?
>
> Many thanks for your time. Cheers,
>
> christian
RE: Wildcard search not working
Hi Ahmet,

Many thanks for your reply. I had a look at the URL you pointed out but, honestly, I have to admit that I did not fully understand you.
Let's be a bit more concrete. Here is the schema snippet for the corresponding field:

...
...

What is wrong with this schema? Or rather, what should I change to be able to do wildcard searches correctly?

Many thanks for your time. Cheers,

christian
--
Christian Ribeaud
Software Engineer (External)
NIBR / WSJ-310.5.17
Novartis Campus
CH-4056 Basel
Re: Wildcard search not working
Hi Christian,

The query r?che may not return at least the same number of matches as roche, depending on your analysis chain. The difference is that roche is analyzed but r?che is not. Wildcard queries are executed on the indexed/analyzed terms. For example, if roche is indexed/analyzed as roch, the query r?che won't match it.

Please see: https://wiki.apache.org/solr/MultitermQueryAnalysis

Ahmet

On Thursday, August 11, 2016 4:42 PM, "Ribeaud, Christian (Ext)" wrote:

> Hi,
>
> What would be the reasons for wildcard search NOT working with the Lucene Query Parser?
>
> We are using Solr 5.4.1 and, using the admin console, I am triggering, for instance, searches with the term 'roche' in a specific core. Everything is fine; I am getting, for instance, two matches. I would expect at least the same number of matches with the term 'r?che'. However, this does NOT happen. I am getting zero matches. The same problem occurs with 'r*che'. 'roch?' does not work either, but 'roch*' works.
>
> Switching on debug mode brings the following output:
>
> "debug": {
>   "rawquerystring": "roch?",
>   "querystring": "roch?",
>   "parsedquery": "text:roch?",
>   "parsedquery_toString": "text:roch?",
>   "explain": {},
>   "QParser": "LuceneQParser",
>   ...
>
> Any idea? Thanks and cheers,
>
> christian
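Ahmet's point, that wildcard patterns are matched against the already-analyzed terms, can be reproduced with a toy model. The Python sketch below assumes, as in his example, that the stemmer stored "roch" for "roche"; it uses `fnmatch` as a stand-in for wildcard matching and is not how Lucene implements it.

```python
from fnmatch import fnmatchcase

# Assumption taken from the thread's example: "roche" was stemmed to "roch"
# at index time, so "roch" is the only term actually stored in the index.
indexed_terms = {"roch"}


def wildcard_hits(pattern, terms):
    """Return the indexed terms matching a ?/* pattern (the pattern is not stemmed)."""
    return sorted(t for t in terms if fnmatchcase(t, pattern))


for pattern in ["r?che", "r*che", "roch?", "roch*"]:
    print(pattern, wildcard_hits(pattern, indexed_terms))
```

Under that assumption only roch* finds the stored term, which is the same pattern of hits and misses Christian reported.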
RE: wildcard search for string having spaces
Great. The first option worked for me. I was trying with q=abc\sp*... it should be q=abc\ p*

Thanks
Re: wildcard search for string having spaces
Hi Roshan,

I think there are two options:

1) escape the space: q=abc\ p*
2) use the prefix query parser: q={!prefix f=my_string}abc p

Ahmet

On Wednesday, June 15, 2016 3:48 PM, Roshan Kamble wrote:

> Hello,
>
> I have the below custom field type defined for Solr 6.0.0.
>
> I am using the above field to ensure that the entire string is considered a single token and that search is case insensitive.
>
> It works for most of the scenarios with wildcard search.
> e.g. if my data is "abc.pqr" and "abc_pqr" and "abc pqr", then a search with abc* gives these three results.
>
> But I am not able to search with, say, abc p*
>
> A search with query q="abc pqr" gives the exact match and the desired result.
>
> I want to do wildcard searches where the criteria can include spaces, like the above example,
> i.e. if a space is present then I am not able to do a wildcard search.
>
> Is there any way by which wildcard search can be achieved even if a space is present in the token?
>
> Regards,
> Roshan
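Why the unescaped query fails, and what the two options look like as request parameters, can be sketched in Python. `split_clauses` below is a deliberately simplified stand-in for the query parser (it only splits on whitespace that is not backslash-escaped and ignores quotes and operators); the field name `my_string` is taken from Ahmet's example.

```python
import re


def split_clauses(q):
    """Simplified model: split a query on whitespace not preceded by a backslash."""
    return re.split(r"(?<!\\)\s+", q.strip())


def prefix_parser_params(field, prefix):
    """Option 2: hand the raw prefix, spaces included, to the prefix query parser."""
    return {"q": f"{{!prefix f={field}}}{prefix}"}


print(split_clauses("abc p*"))     # two separate clauses
print(split_clauses("abc\\ p*"))   # one clause, space kept inside the term
print(prefix_parser_params("my_string", "abc p"))
```

In this model the unescaped form falls apart into the clauses abc and p*, neither of which is a prefix of the single stored token "abc pqr", while the escaped form stays one term.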
Re: Wildcard search makes no sense!!
Many many thanks for the replies - it was helpful for me to start understanding how this works. I'm using 3.5, so this goes to explain a lot. What I have done is: if the query contains a *, I make the query lowercase before sending it to Solr. This seems to have solved the issue, given your explanation above. Many thanks.

Something that is still not clear in my mind is how this tokenising works. For example, with the filters I have, when I run the analyser I get:

Field: Hello You
Hello|You
Hello|You
Hello|You
hello|you
hello|you

Does this mean that the index is stored as 'hello|you' (the final one) and that when I run a query and it goes through the filters, whatever the end result of that is must match the 'hello|you' in order to return a result?

--
View this message in context: http://lucene.472066.n3.nabble.com/Wildcard-search-makes-no-sense-tp4162069p4162284.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Wildcard search makes no sense!!
On 10/2/2014 4:33 AM, waynemailinglist wrote:
> Something that is still not clear in my mind is how this tokenising works.
> For example with the filters I have when I run the analyser I get:
> Field: Hello You
> Hello|You
> Hello|You
> Hello|You
> hello|you
> hello|you
>
> Does this mean that the index is stored as 'hello|you' (the final one) and
> that when I run a query and it goes through the filters whatever the end
> result of that is must match the 'hello|you' in order to return a result?

The index has two terms for this field if this is the whole input -- hello and you -- which can be searched for individually. The tokenizer does the initial job of separating the input into tokens (terms) ... some filters can create additional terms, depending on exactly what's left when the tokenizer is done.

Thanks,
Shawn
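Shawn's answer, that the field ends up as two separate terms rather than one 'hello|you' string, can be pictured with a tiny inverted index. This Python sketch assumes a chain of just a whitespace tokenizer followed by lowercasing, which is simpler than the poster's real filter chain.

```python
from collections import defaultdict


def analyze(text):
    """Toy analysis chain: whitespace tokenizer followed by a lowercase filter."""
    return [token.lower() for token in text.split()]


def build_index(docs):
    """Map each analyzed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in analyze(text):
            index[term].add(doc_id)
    return index


index = build_index(["Hello You"])
print(sorted(index))  # ['hello', 'you']
```

Each key is looked up on its own, so a query for hello or for you finds document 0; the '|' in the analysis screen is only a display separator between tokens.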
Re: Wildcard search makes no sense!!
Right, prior to 3.6, the standard way to handle wildcards was to, essentially, pre-analyze the terms that had wildcards. This works fine for simple filters, things like lowercasing for instance, but doesn't work so well for things like stemming.

So you're doing what can be done at this point, but moving to 4.x (or even 3.6) would solve it better.

Best,
Erick
Re: Wildcard search makes no sense!!
OK, I think I understand your points there. Just to clarify: say the term was "Large increased" and my filters went something like:

Large|increased
Large|increase|increased
large|increase|increased

Would the final tokens indexed be large|increase|increased?

Once again, thanks for all the help.
Re: Wildcard search makes no sense!!
Hi,

You probably have a stemmer, and it is eating up Capital to capit. That's the reason. Either remove the stemmer from the analyser chain or add the keyword repeat filter.

Ahmet

On Wednesday, October 1, 2014 2:16 PM, Wayne W waynemailingli...@gmail.com wrote:

> Hi,
>
> I don't understand this at all. We are indexing some contact names. When we do a standard query:
>
> query 1: capi* result: Capital Health
> query 2: capit* result: Capital Health
> query 3: capita* result: no results
> query 4: capital* result: no results
>
> I understand (as we are using Solr 3.5) that the wildcard search does not actually return the query without the wildcard, so I understand at least why query 4 is not working (I need to use: capital* OR capital). What I don't understand is why query 3 is not working.
>
> Also, if we place in the text field the following 3 contacts:
>
> j...@capitalhealth.com
> f...@capitalhealth.com
> Capital Heath
>
> When searching for:
>
> query A: capita* result: j...@capitalhealth.com, f...@capitalhealth.com
> query B: capit* result: j...@capitalhealth.com, f...@capitalhealth.com, Capital Heath
>
> What is going on and how can I solve this?
>
> Many thanks, as I'm really stuck on this.
Re: Wildcard search makes no sense!!
On Wed, 2014-10-01 at 13:16 +0200, Wayne W wrote:
> query 2: capit* result: Capital Health
> query 3: capita* result: no results

You are likely using a stemmer for the field: Capital Health gets indexed as capit and health, so there are no tokens starting with capita. Turn off the stemmer or add a non-stemmed copy-field for truncated searches.

(sanity-checked at http://9ol.es/porter_js_demo.html)

- Toke Eskildsen, State and University Library, Denmark
Re: Wildcard search makes no sense!!
The presence of a wildcard in a query term short-circuits some portions of the analysis process. Some token filters, like lower case, can still be performed on the query terms, but others, like stemming, cannot. So, either simplify the analysis (be more selective about which token filters you use), or you will have to modify your query terms so that you manually simulate the token transformations that your text analysis is performing.

Take one of your indexed terms that you think should match and send it through the Solr Admin UI analysis page for the query field and see what the source token gets analyzed into - that's what your wildcard prefix must match. Sometimes (usually!) you will be surprised.

-- Jack Krupansky
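One way to "manually simulate" a simple transformation on the client, in the spirit of Jack's advice, is to lowercase only the wildcard terms before sending the query. The Python sketch below makes several assumptions: terms are separated by single spaces, lowercasing is the only filter that needs simulating, and an optional field: prefix must be left untouched. It cannot simulate a stemmer.

```python
def lowercase_wildcard_terms(query):
    """Lowercase only the terms that contain * or ?, leaving operators and field names alone."""
    parts = []
    for part in query.split(" "):
        if "*" in part or "?" in part:
            # Keep an optional "field:" prefix as-is; field names are case sensitive.
            field, sep, term = part.rpartition(":")
            part = field + sep + term.lower()
        parts.append(part)
    return " ".join(parts)


print(lowercase_wildcard_terms("Capit* OR Capital"))
print(lowercase_wildcard_terms("Name:Κώστα*"))
```

Unlike lowercasing the whole query string, this leaves uppercase operators such as OR intact.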
Re: Wildcard search makes no sense!!
Ahmet - many thanks - I removed the EnglishPorterFilterFactory and reindexed, and this seems to behave as expected now.

Jack - thanks as well - I'm very much a noob with this, and that's a great tip.
Re: Wildcard search makes no sense!!
I'm still stuck on this actually. I would really appreciate any pointers.

If I search for:

query 1: Κώστας result: Κώστας
query 2: Κώστα* result: no result

I've looked at the analyser but I don't really understand what I'm looking at, if I'm honest. It gives the output:

Field (name): title
Field value: Κώστας
Field value (query): Κώστα*

Index Analyzer
Κώστας
Κώστας
Κώστας
κώστας
κώστας

Query Analyzer
Κώστα*
Κώστα*
Κώστα*
Κώστα
κώστα
κώστα

In my schema I have defined:

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> (only used in query)
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>

I tried adding ASCIIFoldingFilterFactory but that didn't make any difference after reindexing.

Any ideas?

many thanks
Re: Wildcard search makes no sense!!
If you use *, you use the Multiterm analysis path, which is semi-hidden and a lot more limited than what is done with normal tokens: https://wiki.apache.org/solr/MultitermQueryAnalysis

The Analyzer components that are NOT multiterm aware cannot be used that way. Looking at http://www.solr-start.com/info/analyzers/ , you can see that only the LowerCase analyzer is multiterm aware (marked with (multi) in brackets). So, the rest are not used.

You may switch to EdgeNGrams or similar instead.

Regards,
Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
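The EdgeNGram idea Alex mentions trades wildcards for extra index terms: every leading substring of a token is stored, so a prefix search becomes an ordinary term lookup. The Python sketch below only illustrates that principle; the `min_gram`/`max_gram` defaults are arbitrary choices for the example, and the toy chain (lowercase, then whitespace split) is an assumption rather than the poster's schema.

```python
def edge_ngrams(token, min_gram=1, max_gram=15):
    """Return the leading substrings of a token, from min_gram up to max_gram characters."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(text, min_gram=2, max_gram=15):
    """Toy index-time chain: lowercase, split on whitespace, expand to edge n-grams."""
    terms = set()
    for token in text.lower().split():
        terms.update(edge_ngrams(token, min_gram, max_gram))
    return terms


terms = index_terms("Κώστας")
# The user's prefix is lowercased and looked up as a plain term, with no wildcard.
print("Κώστα".lower() in terms)
```

The cost is a larger index, and the query side must not be n-grammed itself, only lowercased.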
Re: Wildcard search makes no sense!!
Two things:

1. What version of Solr are you using? If it's prior to 3.6, then the bits that handle applying lowercaseFilter to wildcards aren't in the code.

2. What do you see if you add debug=query?

I just tried it with your analysis chain and it seemed to work. Did you completely blow your index away when trying this? I did get into a state where my terms didn't show up. When you change the schema, sometimes some information about the fields is written into the index and is incompatible with later changes. By "completely blow away" I mean:

stop Solr
rm -rf blah/collection/data
start Solr
reindex
test

Best,
Erick
Re: Wildcard search not working with search term having special characters and digits
Can someone help me out with this issue please? -- View this message in context: http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-with-search-term-having-special-characters-and-digits-tp4133385p4133770.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Wildcard search not working with search term having special characters and digits
Can someone please help me with this, as I am stuck on this issue. -- View this message in context: http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-with-search-term-having-special-characters-and-digits-tp4133385p4133478.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Wildcard search not working with search term having special characters and digits
Wildcard queries only work for single terms. Any embedded special characters will cause a term to be split into multiple terms at index time. A wildcard in a query term with embedded special characters bypasses normal analysis, so you need to enter the term exactly as it would be analyzed at index time for the wildcard to work. Ditto if your field type uses the word delimiter filter with the split-digits option enabled: the alpha and numeric portions will generate separate terms and cause the wildcard to fail.

-- Jack Krupansky

-----Original Message-----
From: Geepalem
Sent: Sunday, April 27, 2014 3:30 PM
To: solr-user@lucene.apache.org
Subject: Wildcard search not working with search term having special characters and digits

> Hi,
> The query below, without a wildcard, returns results:
> http://localhost:8080/solr/master/select?q=page_title_t:an-138;
> But the query below, with a wildcard, returns no results:
> http://localhost:8080/solr/master/select?q=page_title_t:an-13*;
> The query below, with a wildcard and no digits, returns results:
> http://localhost:8080/solr/master/select?q=page_title_t:an-*;
> I have tried adding the WordDelimiterFilter, but no luck:
> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> Please suggest how to make wildcard search work with special characters and digits. I would appreciate an immediate response!
> Thanks, G. Naresh Kumar
> -- View this message in context: http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-with-search-term-having-special-characters-and-digits-tp4133385.html Sent from the Solr - User mailing list archive at Nabble.com.
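[Editor's illustration] Jack's point, that index-time splitting plus an unanalyzed wildcard term leads to no match, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Solr code: `word_delimiter_terms` is a hypothetical stand-in that only splits on non-alphanumerics and letter/digit boundaries, and it does not reproduce the poster's real field type (which is not shown in full), so it will not mirror every result reported in the thread.

```python
import re

def word_delimiter_terms(token):
    """Toy stand-in for a word-delimiter style filter (assumption, not Solr's
    implementation): keep runs of letters and runs of digits as separate,
    lowercased terms and drop everything else."""
    parts = re.findall(r"[A-Za-z]+|[0-9]+", token)
    return [p.lower() for p in parts]

def wildcard_matches(pattern, terms):
    """Trailing-wildcard match against single indexed terms.
    The pattern is deliberately NOT analyzed, as described in the message."""
    prefix = pattern.rstrip("*")
    return [t for t in terms if t.startswith(prefix)]

indexed = word_delimiter_terms("an-138")
print(indexed)                              # the terms that end up in the toy index
print(wildcard_matches("an-13*", indexed))  # no single term starts with "an-13"
print(wildcard_matches("13*", indexed))     # matches the numeric part on its own
```

In this model the hyphenated wildcard can never match, because no single indexed term contains the hyphen; entering the term the way it was analyzed at index time (here `13*`) does match.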
Re: Wildcard search not working with search term having special characters and digits
Thanks jack for prompt response! So is there any solution to make this scenario works? Or wildcard doesn't work with special characters and numerics? Thanks, G. Naresh Kumar -- View this message in context: http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-with-search-term-having-special-characters-and-digits-tp4133385p4133554.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Wildcard search not working if the query contains numbers along with special characters.
Hi, Pls help me with this. -- View this message in context: http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-if-the-query-conatins-numbers-along-with-special-characters-tp4119608p4121457.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Wildcard search not working if the query contains numbers along with special characters.
Hi Kashish,

This is confusing. You gave the following example: the query 1999/99* should return ARABIAN NIGHTS #01 (1999/99). However, you also said "I cannot ignore parenthesis or other special characters...". These two statements contradict each other. Since you are after autocomplete, you might be interested in this: http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/

Ahmet

On Wednesday, March 5, 2014 8:36 PM, Kashish itzz.me.kash...@gmail.com wrote:
> Hi, Pls help me with this.
> -- View this message in context: http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-if-the-query-conatins-numbers-along-with-special-characters-tp4119608p4121457.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Wildcard search not working if the query contains numbers along with special characters.
Hi Ahmet,

Let me explain with another scenario. There is a title: ARABIAN NIGHTS - 1999/99. In autocomplete, if I give 1999/99, the backend appends an asterisk to it and forms the Solr URL this way: q=titleName:1999/99*. I get the above-mentioned title, so that works perfectly.

Now let's add another title: JULIUS CAESER (1999/99). If I pass the same query parameter, I would definitely expect both titles to come up, but the new one doesn't (because of the parentheses). I can add a patternReplaceFilter, but that way I will never be able to specifically search for the title I want as 1999/99. I hope you get what I am trying to achieve. Is my understanding wrong somewhere?

Thanks.

-- View this message in context: http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-if-the-query-conatins-numbers-along-with-special-characters-tp4119608p4121512.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Wildcard search not working if the query contains numbers along with special characters.
Hi,

Forget about patternReplaceCharFilter for a moment. Your example is clearer this time. q=titleName:1999/99* should return the following two docs:

d1) JULIUS CAESER (1999/99)
d2) ARABIAN NIGHTS - 1999/99

This is achievable with the following type:

1) MappingCharFilterFactory with a mappings.txt that maps the parentheses to nothing: "(" => "" and ")" => ""
2) WhitespaceTokenizerFactory
3) LowerCaseFilterFactory

I don't understand your sentence "i will never be able to specifically search the title i want as 1999/99", but please try / test the above. I also suggest you use the prefix query parser: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-PrefixQueryParser

Ahmet

On Wednesday, March 5, 2014 11:20 PM, Kashish itzz.me.kash...@gmail.com wrote:
> Hi Ahmet, Let me explain with another scenario. There is a title: ARABIAN NIGHTS - 1999/99. In autocomplete, if I give 1999/99, the backend appends an asterisk to it and forms the Solr URL this way: q=titleName:1999/99*. I get the above-mentioned title, so that works perfectly.
> Now let's add another title: JULIUS CAESER (1999/99). If I pass the same query parameter, I would definitely expect both titles to come up, but the new one doesn't (because of the parentheses). I can add a patternReplaceFilter, but that way I will never be able to specifically search for the title I want as 1999/99. I hope you get what I am trying to achieve. Is my understanding wrong somewhere? Thanks.
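[Editor's illustration] The three-step chain Ahmet proposes can be mimicked outside Solr to check the expected outcome. The sketch below is an assumption-laden model, not the Solr analysis code: `analyze` and `prefix_hits` are hypothetical helper names, and the third title is added only as a negative example.

```python
def analyze(text):
    """Mimic the proposed chain: 1) drop parentheses (char mapping),
    2) split on whitespace, 3) lowercase each token."""
    cleaned = text.replace("(", "").replace(")", "")
    return [tok.lower() for tok in cleaned.split()]

def prefix_hits(prefix, docs):
    """Return the docs having at least one analyzed token that starts with
    the prefix. The prefix is lowercased by hand because prefix and wildcard
    terms are not run through the analysis chain."""
    p = prefix.lower()
    return [d for d in docs if any(t.startswith(p) for t in analyze(d))]

docs = [
    "JULIUS CAESER (1999/99)",
    "ARABIAN NIGHTS - 1999/99",
    "A TEAM OF FIVE",  # hypothetical non-matching title
]

for title in prefix_hits("1999/99", docs):
    print(title)
```

With the parentheses mapped away before tokenization, both titles carry the token `1999/99`, so the same prefix finds d1 and d2.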
Re: Wildcard search not working if the query contains numbers along with special characters.
Hi Erick,

I understand what you are pointing out, but the thing is, this is for an autocomplete feature. I cannot ignore parentheses or other special characters: for certain titles like 'A Team of five', if the user types 'a team', then titles containing a-team and the rest also come up, and this one gets lost because we show only the top 6 results (the user can drill down to get closer to the result he wants). I modified my fieldtype so that the word delimiter filter is applied at index time and a pattern filter at query time, but if I use an asterisk I still get no records for 1999/99*, while I do get records without the asterisk. This is not what I want because, by default, we append an asterisk to whatever the user enters for autocomplete search.

Thanks

-- View this message in context: http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-if-the-query-conatins-numbers-along-with-special-characters-tp4119608p4121205.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Wildcard search not working if the query contains numbers along with special characters.
Hi Kashish,

What happens when you use this: q={!prefix f=title_autocomplete}1999/99

I suspect the '/' character is a special query parser character and therefore needs to be escaped.

Ahmet

On Tuesday, February 25, 2014 9:55 PM, Kashish itzz.me.kash...@gmail.com wrote:
> Hi, I have a very weird problem. The wildcard search works fine for all scenarios but one. It doesn't seem to give any result for the query 1999/99*. I checked the debug query and it's formed perfectly:
> <str name="rawquerystring">title_autocomplete:1999/99*</str>
> <str name="querystring">title_autocomplete:1999/99*</str>
> <str name="parsedquery">(+title_autocomplete:1999/99* ())/no_coord</str>
> <str name="parsedquery_toString">+title_autocomplete:1999/99* ()</str>
> This is my fieldType:
> <fieldType name="text_general_Title" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
> Please help me with this. Thanks.
> -- View this message in context: http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-if-the-query-conatins-numbers-along-with-special-characters-tp4119608.html Sent from the Solr - User mailing list archive at Nabble.com.
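[Editor's illustration] For anyone who wants to try Ahmet's `{!prefix}` suggestion from a script, here is a minimal sketch of building the request URL. The base URL and the helper name `prefix_query_url` are assumptions for illustration; the field name and the `{!prefix f=...}` syntax come from the message, and `debug=query` is the parameter mentioned elsewhere in these threads.

```python
from urllib.parse import urlencode, parse_qs

def prefix_query_url(base_url, field, prefix):
    """Build a select URL that uses the prefix query parser via local params.
    urlencode takes care of percent-encoding the braces, '!', '=', '/' and
    the space inside the q value."""
    q = "{!prefix f=%s}%s" % (field, prefix)
    return base_url + "?" + urlencode({"q": q, "debug": "query"})

# Hypothetical host and core name; adjust to your installation.
url = prefix_query_url("http://localhost:8080/solr/master/select",
                       "title_autocomplete", "1999/99")
print(url)

# Decoding the query string again shows what the server would receive as q.
print(parse_qs(url.split("?", 1)[1])["q"][0])
```

Note that URL encoding here only protects the value in transit; it is a separate concern from escaping characters for the query parser.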
Re: Wildcard search not working if the query contains numbers along with special characters.
Hi Ahmet, Thanks for your reply. Yes. I pass my query this way - q=title_autocomplete:1999%2f99 I tried your way too. But no luck. :( -- View this message in context: http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-if-the-query-conatins-numbers-along-with-special-characters-tp4119608p4119615.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Wildcard search not working if the query contains numbers along with special characters.
What does it say happens on your admin/analysis page for that field? And did you by any chance change your schema without reindexing everything? Also, try the TermsComponent to see what tokens are actually _in_ your index. The schema browser on the admin page can help here too.

Best, Erick

On Tue, Feb 25, 2014 at 12:05 PM, Ahmet Arslan iori...@yahoo.com wrote:
> Hi Kashish, What happens when you use this: q={!prefix f=title_autocomplete}1999/99
> I suspect the '/' character is a special query parser character and therefore needs to be escaped.
> Ahmet
>
> On Tuesday, February 25, 2014 9:55 PM, Kashish itzz.me.kash...@gmail.com wrote:
>> Hi, I have a very weird problem. The wildcard search works fine for all scenarios but one. It doesn't seem to give any result for the query 1999/99*. I checked the debug query and it's formed perfectly:
>> <str name="rawquerystring">title_autocomplete:1999/99*</str>
>> <str name="querystring">title_autocomplete:1999/99*</str>
>> <str name="parsedquery">(+title_autocomplete:1999/99* ())/no_coord</str>
>> <str name="parsedquery_toString">+title_autocomplete:1999/99* ()</str>
>> This is my fieldType:
>> <fieldType name="text_general_Title" class="solr.TextField" positionIncrementGap="100">
>>   <analyzer type="index">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>> Please help me with this. Thanks.
Re: Wildcard search not working if the query contains numbers along with special characters.
Hi, By saying escaping I mean this : q=title_autocomplete:1999\/99* It is different than URL encoding. http://lucene.apache.org/core/4_6_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Escaping_Special_Characters If prefix query parser didn't return what you want then it must be something with indexed terms. Can you give an example raw documents text that you expect to retrieve with this query? On Tuesday, February 25, 2014 10:15 PM, Kashish itzz.me.kash...@gmail.com wrote: Hi Ahmet, Thanks for your reply. Yes. I pass my query this way - q=title_autocomplete:1999%2f99 I tried your way too. But no luck. :( -- View this message in context: http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-if-the-query-conatins-numbers-along-with-special-characters-tp4119608p4119615.html Sent from the Solr - User mailing list archive at Nabble.com.
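[Editor's illustration] The distinction Ahmet draws, query-parser escaping with a backslash versus URL encoding, is easy to mix up, so here is a small sketch that does both in order. The helper names are hypothetical, and the character set is the one listed on the Lucene classic query parser page linked above, escaped one character at a time (an assumption that is slightly broader than strictly necessary for `&&` and `||`).

```python
from urllib.parse import quote, unquote

# Characters the classic Lucene query parser treats specially.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')

def escape_term(term):
    """Backslash-escape query parser special characters in user input."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)

def wildcard_param(field, user_input):
    """Step 1: escape the user input for the query parser, then append the
    (unescaped) trailing wildcard. Step 2: URL-encode the whole q value.
    Returns both forms so they can be compared."""
    query = "%s:%s*" % (field, escape_term(user_input))
    return query, quote(query, safe="")

query, encoded = wildcard_param("title_autocomplete", "1999/99")
print(query)             # what the query parser should see
print(encoded)           # what goes on the wire as the q value
print(unquote(encoded))  # decoding restores the escaped query unchanged
```

Sending `1999%2f99` alone only URL-encodes the slash; after decoding, the query parser still sees a bare `/`, which is why the backslash has to be added before encoding.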
Re: Wildcard search not working if the query contains numbers along with special characters.
Hi Ahmet/Erick, I tried escaping as well. See no luck. The title am looking for is - ARABIAN NIGHTS #01 (1999/99) I figured out that if i pass the query as *1999/99* (i.e asterisk not only at the end but at the beginning as well), It works. The problem is the braces. I can change my field type and add filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1 preserveOriginal=1/ But this will show too many results in autocomplete. Is there any best way to handle this? Or should i pass asterisk before and after the query? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-if-the-query-conatins-numbers-along-with-special-characters-tp4119608p4119678.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Wildcard search not working if the query contains numbers along with special characters.
The admin/analysis page is your friend. Taking some time to get acquainted with that page will save you lots and lots and lots of time. In this case, you'd have seen that your input is actually tokenized as (1999/99), parentheses and all as a _single_ token, so of course searching for 1999/99 wouldn't work. Searching for *1999/99* is generally a bad idea. It'll work, but it's a kludge. What you _do_ need to do is define your use-cases. Let's assume that you _never_ want parentheses to be relevant. You could use PatternReplaceCharFilterFactory or PatternReplaceFilterFactory in both index and query parts of your analysis chain to remove parens. Or really any kinds of extraneous characters you decided were unimportant. But you need to decide what's important and enforce that. Best, Erick On Tue, Feb 25, 2014 at 7:28 PM, Kashish itzz.me.kash...@gmail.com wrote: Hi Ahmet/Erick, I tried escaping as well. See no luck. The title am looking for is - ARABIAN NIGHTS #01 (1999/99) I figured out that if i pass the query as *1999/99* (i.e asterisk not only at the end but at the beginning as well), It works. The problem is the braces. I can change my field type and add filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1 preserveOriginal=1/ But this will show too many results in autocomplete. Is there any best way to handle this? Or should i pass asterisk before and after the query? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-if-the-query-conatins-numbers-along-with-special-characters-tp4119608p4119678.html Sent from the Solr - User mailing list archive at Nabble.com.
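[Editor's illustration] What Erick describes seeing on the analysis page can be approximated with a toy tokenizer. This is a sketch under assumptions, not Solr's analyzer: `index_tokens` is a hypothetical helper that mimics only whitespace tokenization followed by lowercasing (stop-word handling is omitted), and `fnmatchcase` from the Python standard library stands in for wildcard matching against single terms.

```python
from fnmatch import fnmatchcase

def index_tokens(title):
    """Whitespace tokenization followed by lowercasing, nothing else."""
    return [tok.lower() for tok in title.split()]

tokens = index_tokens("ARABIAN NIGHTS #01 (1999/99)")
print(tokens)  # the parenthesized year stays one token, parentheses included

# Trailing wildcard only: the token starts with "(", so nothing matches.
trailing = [t for t in tokens if fnmatchcase(t, "1999/99*")]
print(trailing)

# Leading and trailing wildcard: matches, which is why the kludge "works".
both = [t for t in tokens if fnmatchcase(t, "*1999/99*")]
print(both)
```

The leading-wildcard form succeeds only because it skips over the `(`; stripping the parentheses in both the index and query chains removes the need for it.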
Re: Wildcard-Search Solr 3.5.0
Chiming in late here, just back from vacation. But off the top of my head, I don't see any reason SnowballPorterFilterFactory shouldn't be MultiTermAware. I've created https://issues.apache.org/jira/browse/SOLR-3503 as a placeholder. Erick On Fri, May 25, 2012 at 1:31 PM, spr...@gmx.eu wrote: I don't know the specific rules in these specific stemmers, but generally a less aggressive stemming (e.g., plural-only) of paintings would be painting, while a more aggressive stemming would be paint. For some aggressive stemmers the stemmed word is not even a word. Sounds logically :) It would be nice to have doc with some example words for each stemmer. Absolutely! Thx alot!
Re: Wildcard-Search Solr 3.5.0
And I closed the JIRA, see the comments. But the short form is that it's not worth the effort because of the edge cases. Jack writes up some of them; the short form is what does stemming do with terms like organiz* . Sure, it would produce one token (which is the main restriction on a MultiTermAware filter), but the output might not be anything equivalent to the stem of organization, maybe not even organize. Better to avoid that rat-hole, it seems like one of those problems that could suck up enormous amounts of time and _still_ not do what's expected. If you _really_ want to try this, you could always define your own multiterm analysis component that included the stemmer, see: http://www.lucidimagination.com/blog/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/ But don't say I didn't warn you G... Best Erick On Sun, Jun 3, 2012 at 8:25 AM, Erick Erickson erickerick...@gmail.com wrote: Chiming in late here, just back from vacation. But off the top of my head, I don't see any reason SnowballPorterFilterFactory shouldn't be MultiTermAware. I've created https://issues.apache.org/jira/browse/SOLR-3503 as a placeholder. Erick On Fri, May 25, 2012 at 1:31 PM, spr...@gmx.eu wrote: I don't know the specific rules in these specific stemmers, but generally a less aggressive stemming (e.g., plural-only) of paintings would be painting, while a more aggressive stemming would be paint. For some aggressive stemmers the stemmed word is not even a word. Sounds logically :) It would be nice to have doc with some example words for each stemmer. Absolutely! Thx alot!
RE: Wildcard-Search Solr 3.5.0
Oh, thx for the update! I didn't noticed that solr 3.6 has a text_de field type. These two options... less / more aggressive. Aggressive in terms of what? Thank you! -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Freitag, 25. Mai 2012 03:25 To: solr-user@lucene.apache.org Subject: Re: Wildcard-Search Solr 3.5.0 I tried it and it does appear to be the SnowballPorterFilterFactory that normally does the accent folding but can't here because it is not multi-term aware. I did notice that the text_de field type that comes in the Solr 3.6 example schema handles your case fine. It uses the GermanNormalizationFilterFactory to fold accented characters and is multi-term aware. Any particular reason you're not using the stock text_de field type? It also has three stemming options which might be sufficient for your needs. In any case, try to make your text_de field type closer to the stock version, and try to use GermanNormalizationFilterFactory, and that may be good enough for your situation.
Re: Wildcard-Search Solr 3.5.0
I don't know the specific rules in these specific stemmers, but generally a less aggressive stemming (e.g., plural-only) of paintings would be painting, while a more aggressive stemming would be paint. For some aggressive stemmers the stemmed word is not even a word. It would be nice to have doc with some example words for each stemmer. -- Jack Krupansky -Original Message- From: spr...@gmx.eu Sent: Friday, May 25, 2012 5:59 AM To: solr-user@lucene.apache.org Subject: RE: Wildcard-Search Solr 3.5.0 Oh, thx for the update! I didn't noticed that solr 3.6 has a text_de field type. These two options... less / more aggressive. Aggressive in terms of what? Thank you! -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Freitag, 25. Mai 2012 03:25 To: solr-user@lucene.apache.org Subject: Re: Wildcard-Search Solr 3.5.0 I tried it and it does appear to be the SnowballPorterFilterFactory that normally does the accent folding but can't here because it is not multi-term aware. I did notice that the text_de field type that comes in the Solr 3.6 example schema handles your case fine. It uses the GermanNormalizationFilterFactory to fold accented characters and is multi-term aware. Any particular reason you're not using the stock text_de field type? It also has three stemming options which might be sufficient for your needs. In any case, try to make your text_de field type closer to the stock version, and try to use GermanNormalizationFilterFactory, and that may be good enough for your situation.
RE: Wildcard-Search Solr 3.5.0
I don't know the specific rules in these specific stemmers, but generally a less aggressive stemming (e.g., plural-only) of paintings would be painting, while a more aggressive stemming would be paint. For some aggressive stemmers the stemmed word is not even a word. Sounds logically :) It would be nice to have doc with some example words for each stemmer. Absolutely! Thx alot!
Re: Wildcard-Search Solr 3.5.0
I tried it and it does appear to be the SnowballPorterFilterFactory that normally does the accent folding but can't here because it is not multi-term aware. I did notice that the text_de field type that comes in the Solr 3.6 example schema handles your case fine. It uses the GermanNormalizationFilterFactory to fold accented characters and is multi-term aware. Any particular reason you're not using the stock text_de field type? It also has three stemming options which might be sufficient for your needs. In any case, try to make your text_de field type closer to the stock version, and try to use GermanNormalizationFilterFactory, and that may be good enough for your situation. -- Jack Krupansky -Original Message- From: spr...@gmx.eu Sent: Wednesday, May 23, 2012 10:16 AM To: solr-user@lucene.apache.org Subject: RE: Wildcard-Search Solr 3.5.0 I'd guess that this is because SnowballPorterFilterFactory does not implement MultiTermAwareComponent. Not sure, though. Yes, I think this hinders the automagically multiterm awarness to do it's job. Could an own analyzer chain with analyzer type=multiterm help? Like described (very, very short, too short...) here: http://wiki.apache.org/solr/MultitermQueryAnalysis
RE: Wildcard-Search Solr 3.5.0
No one an idea? Thx. The text may contain FooBar. When I do a wildcard search like this: Foo* - no hits. When I do a wildcard search like this: foo* - doc is found. Please see http://wiki.apache.org/solr/MultitermQueryAnalysis Well, it works in 3.6. With one exception: If I use german umlauts it does not work anymore. Text: Bär Bä* - no hits Bär - hits What can I do in this case? Thank you
Re: Wildcard-Search Solr 3.5.0
what about bä*-hits? -- Dmitry On Wed, May 23, 2012 at 2:19 PM, spr...@gmx.eu wrote: No one an idea? Thx. The text may contain FooBar. When I do a wildcard search like this: Foo* - no hits. When I do a wildcard search like this: foo* - doc is found. Please see http://wiki.apache.org/solr/MultitermQueryAnalysis Well, it works in 3.6. With one exception: If I use german umlauts it does not work anymore. Text: Bär Bä* - no hits Bär - hits What can I do in this case? Thank you -- Regards, Dmitry Kan
RE: Wildcard-Search Solr 3.5.0
No. No hits for bä*. It's something with the umlauts but I have no idea what... -Original Message- From: Dmitry Kan [mailto:dmitry@gmail.com] Sent: Mittwoch, 23. Mai 2012 13:36 To: solr-user@lucene.apache.org Subject: Re: Wildcard-Search Solr 3.5.0 what about bä*-hits? -- Dmitry On Wed, May 23, 2012 at 2:19 PM, spr...@gmx.eu wrote: No one an idea? Thx. The text may contain FooBar. When I do a wildcard search like this: Foo* - no hits. When I do a wildcard search like this: foo* - doc is found. Please see http://wiki.apache.org/solr/MultitermQueryAnalysis Well, it works in 3.6. With one exception: If I use german umlauts it does not work anymore. Text: Bär Bä* - no hits Bär - hits What can I do in this case? Thank you -- Regards, Dmitry Kan
Re: Wildcard-Search Solr 3.5.0
do umlauts arrive properly on the server side, no encoding issues? Check the query params of the response xml/json/.. set debugQuery to true as well to see if it produces any useful diagnostic info. On Wed, May 23, 2012 at 2:58 PM, spr...@gmx.eu wrote: No. No hits for bä*. It's something with the umlauts but I have no idea what... -Original Message- From: Dmitry Kan [mailto:dmitry@gmail.com] Sent: Mittwoch, 23. Mai 2012 13:36 To: solr-user@lucene.apache.org Subject: Re: Wildcard-Search Solr 3.5.0 what about bä*-hits? -- Dmitry On Wed, May 23, 2012 at 2:19 PM, spr...@gmx.eu wrote: No one an idea? Thx. The text may contain FooBar. When I do a wildcard search like this: Foo* - no hits. When I do a wildcard search like this: foo* - doc is found. Please see http://wiki.apache.org/solr/MultitermQueryAnalysis Well, it works in 3.6. With one exception: If I use german umlauts it does not work anymore. Text: Bär Bä* - no hits Bär - hits What can I do in this case? Thank you -- Regards, Dmitry Kan -- Regards, Dmitry Kan
RE: Wildcard-Search Solr 3.5.0
-Original Message- From: Dmitry Kan [mailto:dmitry@gmail.com] Sent: Mittwoch, 23. Mai 2012 14:02 To: solr-user@lucene.apache.org Subject: Re: Wildcard-Search Solr 3.5.0 do umlauts arrive properly on the server side, no encoding issues? Yes, works fine. It must, since I have hits for Bär or bär. It's just the combination between umlauts and wildcards. Must be something with the automagically Multiterm feature in Solr 3.6.
Re: Wildcard-Search Solr 3.5.0
Maybe a filter like ISOLatin1AccentFilter that doesn't get applied when using wildcards? How do the terms actually appear in the index? Jens On 05/23/2012 01:19 PM, spr...@gmx.eu wrote: No one an idea? Thx. The text may contain FooBar. When I do a wildcard search like this: Foo* - no hits. When I do a wildcard search like this: foo* - doc is found. Please see http://wiki.apache.org/solr/MultitermQueryAnalysis Well, it works in 3.6. With one exception: If I use german umlauts it does not work anymore. Text: Bär Bä* - no hits Bär - hits What can I do in this case? Thank you
RE: Wildcard-Search Solr 3.5.0
> Maybe a filter like ISOLatin1AccentFilter that doesn't get applied when using wildcards? How do the terms actually appear in the index?

Bär gets indexed as bar. I do not use ISOLatin1AccentFilter. My field definition is this:

<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
  </analyzer>
</fieldType>
</types>
RE: Wildcard-Search Solr 3.5.0
I'd guess that this is because SnowballPorterFilterFactory does not implement MultiTermAwareComponent. Not sure, though. -Michael
RE: Wildcard-Search Solr 3.5.0
> I'd guess that this is because SnowballPorterFilterFactory does not implement MultiTermAwareComponent. Not sure, though.

Yes, I think this prevents the automagic multiterm awareness from doing its job. Could a custom analyzer chain with analyzer type="multiterm" help? As described (very, very briefly, too briefly...) here: http://wiki.apache.org/solr/MultitermQueryAnalysis
RE: Wildcard-Search Solr 3.5.0
The text may contain FooBar. When I do a wildcard search like this: Foo* - no hits. When I do a wildcard search like this: foo* - doc is found. Please see http://wiki.apache.org/solr/MultitermQueryAnalysis Well, it works in 3.6. With one exception: If I use german umlauts it does not work anymore. Text: Bär Bä* - no hits Bär - hits What can I do in this case? Thank you
Re: Wildcard-Search Solr 3.5.0
The text may contain FooBar. When I do a wildcard search like this: Foo* - no hits. When I do a wildcard search like this: foo* - doc is found. Please see http://wiki.apache.org/solr/MultitermQueryAnalysis
RE: Wildcard-Search Solr 3.5.0
Hi Ahmet, Please see http://wiki.apache.org/solr/MultitermQueryAnalysis so your advice is to upgrade to 3.6? Thank you
RE: Wildcard-Search Solr 3.5.0
so your advice is to upgrade to 3.6? Or, as a workaround, you can lowercase wildcard queries on the client side.
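[Editor's illustration] The client-side workaround mentioned here could look roughly like the sketch below. The function name and the field/term split are assumptions for illustration; the idea is simply to lowercase the term only when it contains a wildcard, since that is the case where the server-side lowercase filter is skipped.

```python
def lowercase_wildcard_query(field, term):
    """Lowercase the term only if it contains a wildcard character.
    Non-wildcard terms are left alone because the server analyzes them.
    The field name is kept as-is, since field names are case-sensitive."""
    if "*" in term or "?" in term:
        term = term.lower()
    return "%s:%s" % (field, term)

print(lowercase_wildcard_query("title", "Foo*"))    # wildcard: lowercased
print(lowercase_wildcard_query("title", "FooBar"))  # no wildcard: unchanged
```

This only covers case. It does not reproduce other index-time changes such as accent folding or stemming, so a query like Bä* would still need separate handling.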
Re: Wildcard search not working if full word is queried
Hi François, it is indeed being stemmed, thanks a lot for the heads up. It appears that stemming is also configured for the query so it should work just the same, no? Thanks again. Regards, Celso 2011/6/30 François Schiettecatte fschietteca...@gmail.com: I would run that word through the analyzer, I suspect that the word 'teste' is being stemmed to 'test' in the index, at least that is the first place I would check. François On Jun 30, 2011, at 2:21 PM, Celso Pinto wrote: Hi everyone, I'm having some trouble figuring out why a query with an exact word followed by the * wildcard, eg. teste*, returns no results while a query for test* returns results that have the word teste in them. I've created a couple of pasties: Exact word with wildcard : http://pastebin.com/n9SMNsH0 Similar word: http://pastebin.com/jQ56Ww6b Parameters other than title, description and content have no effect other than filtering out unwanted results. In a two of the four results, the title has the complete word teste. On the other two, the word appears in the other fields. Does anyone have any insights about what I'm doing wrong? Thanks in advance. Regards, Celso
Re: Wildcard search not working if full word is queried
Hi again, read (past tense) TFM :-) and: On wildcard and fuzzy searches, no text analysis is performed on the search word. Thanks a lot François! Regards, Celso On Fri, Jul 1, 2011 at 10:02 AM, Celso Pinto cpi...@yimports.com wrote: Hi François, it is indeed being stemmed, thanks a lot for the heads up. It appears that stemming is also configured for the query so it should work just the same, no? Thanks again. Regards, Celso 2011/6/30 François Schiettecatte fschietteca...@gmail.com: I would run that word through the analyzer, I suspect that the word 'teste' is being stemmed to 'test' in the index, at least that is the first place I would check. François On Jun 30, 2011, at 2:21 PM, Celso Pinto wrote: Hi everyone, I'm having some trouble figuring out why a query with an exact word followed by the * wildcard, eg. teste*, returns no results while a query for test* returns results that have the word teste in them. I've created a couple of pasties: Exact word with wildcard : http://pastebin.com/n9SMNsH0 Similar word: http://pastebin.com/jQ56Ww6b Parameters other than title, description and content have no effect other than filtering out unwanted results. In a two of the four results, the title has the complete word teste. On the other two, the word appears in the other fields. Does anyone have any insights about what I'm doing wrong? Thanks in advance. Regards, Celso
Re: Wildcard search not working if full word is queried
Celso You are very welcome and yes I should have mentioned that wildcard searches are not analyzed (which is a recurring theme). This also means that they are not downcased, so the search TEST* will probably not find anything either in your set up. Cheers François On Jul 1, 2011, at 5:16 AM, Celso Pinto wrote: Hi again, read (past tense) TFM :-) and: On wildcard and fuzzy searches, no text analysis is performed on the search word. Thanks a lot François! Regards, Celso On Fri, Jul 1, 2011 at 10:02 AM, Celso Pinto cpi...@yimports.com wrote: Hi François, it is indeed being stemmed, thanks a lot for the heads up. It appears that stemming is also configured for the query so it should work just the same, no? Thanks again. Regards, Celso 2011/6/30 François Schiettecatte fschietteca...@gmail.com: I would run that word through the analyzer, I suspect that the word 'teste' is being stemmed to 'test' in the index, at least that is the first place I would check. François On Jun 30, 2011, at 2:21 PM, Celso Pinto wrote: Hi everyone, I'm having some trouble figuring out why a query with an exact word followed by the * wildcard, eg. teste*, returns no results while a query for test* returns results that have the word teste in them. I've created a couple of pasties: Exact word with wildcard : http://pastebin.com/n9SMNsH0 Similar word: http://pastebin.com/jQ56Ww6b Parameters other than title, description and content have no effect other than filtering out unwanted results. In a two of the four results, the title has the complete word teste. On the other two, the word appears in the other fields. Does anyone have any insights about what I'm doing wrong? Thanks in advance. Regards, Celso
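[Editor's illustration] The behavior discussed in this thread (teste* finds nothing, test* finds documents containing teste, TEST* finds nothing) follows directly from wildcard terms skipping analysis. The sketch below models it with a deliberately crude, hypothetical stemmer, `toy_stem`, which lowercases and strips one trailing "e"; a real stemmer has different rules, so treat this purely as a model of the mechanism.

```python
def toy_stem(word):
    """Hypothetical stemmer for illustration only: lowercase, then strip a
    single trailing 'e' from words longer than three letters."""
    w = word.lower()
    return w[:-1] if len(w) > 3 and w.endswith("e") else w

# Index time: the document word is analyzed (lowercased and stemmed).
indexed_terms = {toy_stem("teste")}

def analyzed_search(word):
    """Normal term query: the query word goes through the same analysis."""
    return toy_stem(word) in indexed_terms

def wildcard_search(pattern):
    """Trailing-wildcard query: the prefix is used exactly as typed."""
    prefix = pattern.rstrip("*")
    return any(term.startswith(prefix) for term in indexed_terms)

print(analyzed_search("teste"))   # analyzed on both sides, so it matches
print(wildcard_search("teste*"))  # raw prefix "teste" vs indexed "test"
print(wildcard_search("test*"))   # raw prefix happens to equal the stem
print(wildcard_search("TEST*"))   # not lowercased, so it cannot match
```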
Re: Wildcard search not working if full word is queried
I would run that word through the analyzer, I suspect that the word 'teste' is being stemmed to 'test' in the index, at least that is the first place I would check.

François
Re: wildcard search
Hi Ahmet,

>> so I created a fake license ComplexPhrase-LICENSE-MIT.txt for ComplexPhrase and tried again, which ran through successfully, I hope this is OK.
>
> I didn't use it with solr 3.2. I will check about it.
>
> So your GOK field already contains the list as multivalued. Then you can use prefix query parser plugin for this. Just make sure that field type of GOK is string not text.
>
> q={!prefix f=GOK}IA 3
>
> should be equivalent to {!complexphrase}GOK:IA 3*

I'll try that. But my search requests come from a pazpar2 system and are directed against different clients, which all get requests of the form GOK:IA 32*, so in some sense this is better for me.

I found two problems:
– In the solr 1.4.2 version I'm testing the request IA 32* works, but GOK:IA 32* will not. Is this somehow related to the indexing of that field?
– The other is that IA320 (on 1.4.2) and GOK:IA320 (on 3.2) will throw an exception:

description The server encountered an internal error (Unknown query type org.apache.lucene.search.PhraseQuery found in phrase query string IA620
java.lang.IllegalArgumentException: Unknown query type org.apache.lucene.search.PhraseQuery found in phrase query string IA620
	at org.apache.lucene.queryParser.ComplexPhraseQueryParser

Cheers
Thomas
Re: wildcard search
Hi Ahmet,

>>> I don't use it myself (but I will soon), so I may be wrong, but did you try to use the ComplexPhraseQueryParser:
>>>
>>> ComplexPhraseQueryParser
>>> QueryParser which permits complex phrase query syntax eg (john jon jonathan~) peters*.
>>>
>>> It seems that you could do such type of queries:
>>>
>>> GOK:IA 38*
>>
>> yes that sounds interesting. But I don't know how to get and install it into solr. Can you give me a hint?
>
> https://issues.apache.org/jira/browse/SOLR-1604

I tried to follow this recipe, adapting it to the solr 3.2 I am testing right now. The first try gave me a message

[java] !!! Couldn't get license file for /Installer/solr/apache-solr-3.2.0/solr/lib/ComplexPhrase-1.0.jar
[java] At least one file does not have a license, or it's license name is not in the proper format. See the logs.
BUILD FAILED

so I created a fake license ComplexPhrase-LICENSE-MIT.txt for ComplexPhrase and tried again, which ran through successfully, I hope this is OK.

I registered the queryparser not in solrhome/conf/solrconfig.xml (no such thing, I'm running multiple cores) but in solrhome/cores/lit/conf/solrconfig.xml and could search successfully for
{!complexphrase}GOK:IC 62*

> But it seems that you can achieve what you want with vanilla solr.
>
> I don't follow the multivalued part in your example but you can tokenize
> IA 300; IC 330; IA 317; IA 318
>
> into these 4 tokens
>
> IA 300
> IC 330
> IA 317
> IA 318

I didn't have to split them up, they are already separated as field with multiValued=true.
But I need to be able to search for IA 310 - IA 319 with one call,
{!complexphrase}GOK:IA 31?
will do this now, or even for
{!complexphrase}GOK:IA 3*
to catch all those in one go.

Thanks, this helped a lot
Thomas
Re: wildcard search
> I tried to follow this recipe, adapting it to the solr 3.2 I am testing right now. The first try gave me a message
>
> [java] !!! Couldn't get license file for /Installer/solr/apache-solr-3.2.0/solr/lib/ComplexPhrase-1.0.jar
> [java] At least one file does not have a license, or it's license name is not in the proper format. See the logs.
> BUILD FAILED
>
> so I created a fake license ComplexPhrase-LICENSE-MIT.txt for ComplexPhrase and tried again, which ran through successfully, I hope this is OK.

I didn't use it with solr 3.2. I will check about it.

>> IA 300
>> IC 330
>> IA 317
>> IA 318
>
> I didn't have to split them up, they are already separated as field with multiValued=true.
> But I need to be able to search for IA 310 - IA 319 with one call,
> {!complexphrase}GOK:IA 31?
> will do this now, or even for
> {!complexphrase}GOK:IA 3*
> to catch all those in one go.

So your GOK field already contains the list as multivalued. Then you can use prefix query parser plugin for this. Just make sure that field type of GOK is string not text.

q={!prefix f=GOK}IA 3

should be equivalent to {!complexphrase}GOK:IA 3*
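Why the field type matters for the prefix approach can be shown with a rough sketch. It assumes a string field stores each value as one term while a text field splits it on whitespace (real text analysis does more than that); the function names are made up for illustration and are not Solr APIs.

```python
def string_field_terms(values):
    """A 'string' field keeps every value verbatim as a single indexed term."""
    return list(values)


def text_field_terms(values):
    """Simplified 'text' field: each value is split on whitespace into terms."""
    return [token for value in values for token in value.split()]


def prefix_match(prefix, terms):
    """A prefix query matches indexed terms that start with the given prefix."""
    return [term for term in terms if term.startswith(prefix)]


gok = ["IA 300", "IC 330", "IA 317", "IA 318"]

print(prefix_match("IA 3", string_field_terms(gok)))   # three whole values
print(prefix_match("IA 31", string_field_terms(gok)))  # the IA 31x values
print(prefix_match("IA 3", text_field_terms(gok)))     # nothing: no term contains the space
```

With the tokenized variant the space never ends up inside a term, which is one way to picture why a prefix containing a space only works against the untokenized field.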
Re: wildcard search
Hi Erick,

I have a multivalued field GOK (local classification scheme) with separate entries of the sort
IA 300; IC 330; IA 317; IA 318, i.e. 1 to 3 capital characters, space, 3 digits.
I want to be able to perform a truncated search on that field: either just the string before the space, or a combination of that string with 1 or 2 digits, something like:
GOK:IA
or
GOK:IA 3*
or
GOK:IA 31?
My problem is the clash between the phrase (GOK:IA 317 works) and the wildcards.

As a start I tried as type
<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
from the solr 3.2 distribution schema (apache-solr-3.2.0/example/solr/conf/schema.xml), the field is just
<field name="GOK" type="text" multiValued="true"/>

BTW, I have another field DDC with entries of the form t1:086643 with analogous requirements which yields similar problems due to the colon, also indexed as text. Here also
DDC:T1\:086643
works, but not
DDC:T1\:08664?

Thanks in advance
Thomas

> Yes there is, but you haven't provided enough information to make a suggestion.
>
> What is the fieldType definition? What is the field definition?
>
> Two resources that'll help you greatly are:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
> and the admin/analysis page...
>
> Best
> Erick
>
> On Tue, Jun 7, 2011 at 6:23 PM, Thomas Fischer fischer...@aon.at wrote:
>> Hello,
>>
>> I am testing solr 3.2 and have problems with wildcards.
>> I am indexing values like IA 300; IC 330; IA 317; IA 318 in a field GOK, and can't find a way to search with wildcards.
>> I want to use a wild card search to match something like IA 31? but cannot find a way to do so.
>> GOK:IA\ 38* doesn't work with the contents of GOK indexed as text.
>> Is there a way to index and search that would meet my requirements?
>>
>> Thomas

Mit freundlichen Grüßen
Thomas Fischer
Re: wildcard search
Hmmm, have you tried EdgeNGrams? This works for me (at the expense of a somewhat larger index, of course)...

<fieldType name="edge" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="4" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

and a field of type edge named thomasfield

Now searches like thomasfield:"GOK IA 3" (include quotes!) should work.

The various parameters (min/max gram size) I chose arbitrarily, you'll want to tweak them.

I include a lowercasefilter for safety's sake if people are actually going to type things in...

It's probably instructive to look at the admin/analysis page to see how this all plays out

Best
Erick

On Wed, Jun 8, 2011 at 9:29 AM, Thomas Fischer fischer...@aon.at wrote:
> Hi Erick,
>
> I have a multivalued field GOK (local classification scheme) with separate entries of the sort
> IA 300; IC 330; IA 317; IA 318, i.e. 1 to 3 capital characters, space, 3 digits.
> I want to be able to perform a truncated search on that field: either just the string before the space, or a combination of that string with 1 or 2 digits, something like:
> GOK:IA
> or
> GOK:IA 3*
> or
> GOK:IA 31?
> My problem is the clash between the phrase (GOK:IA 317 works) and the wildcards.
>
> As a start I tried as type
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
> from the solr 3.2 distribution schema (apache-solr-3.2.0/example/solr/conf/schema.xml), the field is just
> <field name="GOK" type="text" multiValued="true"/>
>
> BTW, I have another field DDC with entries of the form t1:086643 with analogous requirements which yields similar problems due to the colon, also indexed as text. Here also
> DDC:T1\:086643
> works, but not
> DDC:T1\:08664?
>
> Thanks in advance
> Thomas
>
>> Yes there is, but you haven't provided enough information to make a suggestion.
>>
>> What is the fieldType definition? What is the field definition?
>>
>> Two resources that'll help you greatly are:
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>> and the admin/analysis page...
>>
>> Best
>> Erick
>>
>> On Tue, Jun 7, 2011 at 6:23 PM, Thomas Fischer fischer...@aon.at wrote:
>>> Hello,
>>>
>>> I am testing solr 3.2 and have problems with wildcards.
>>> I am indexing values like IA 300; IC 330; IA 317; IA 318 in a field GOK, and can't find a way to search with wildcards.
>>> I want to use a wild card search to match something like IA 31? but cannot find a way to do so.
>>> GOK:IA\ 38* doesn't work with the contents of GOK indexed as text.
>>> Is there a way to index and search that would meet my requirements?
>>>
>>> Thomas
>
> Mit freundlichen Grüßen
> Thomas Fischer
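To see what the edge n-gram approach does to the GOK values, here is a small sketch. It assumes the whole value is kept as one lowercased token at index time, front n-grams between the minimum and maximum size are stored, and the query is lowercased and looked up as an exact term. The helper names are hypothetical; this imitates the idea, not Solr's filter implementation.

```python
def edge_ngrams(value, min_gram=4, max_gram=15):
    """Front edge n-grams of the whole lowercased value (one token per value)."""
    token = value.lower()
    longest = min(max_gram, len(token))
    return [token[:size] for size in range(min_gram, longest + 1)]


def build_index(values, min_gram=4, max_gram=15):
    """Map every stored gram to the set of original values that produced it."""
    index = {}
    for value in values:
        for gram in edge_ngrams(value, min_gram, max_gram):
            index.setdefault(gram, set()).add(value)
    return index


def search(index, query):
    """Query side: lowercase only, then an exact lookup of the whole string."""
    return sorted(index.get(query.lower(), set()))


gok = ["IA 300", "IC 330", "IA 317", "IA 318"]
index = build_index(gok)

print(edge_ngrams("IA 317"))   # grams of length 4, 5 and 6
print(search(index, "IA 3"))   # all IA 3xx values
print(search(index, "IA 31"))  # only the IA 31x values
print(search(index, "IA"))     # empty: shorter than the minimum gram size
print(search(build_index(gok, min_gram=2), "IA"))  # found once min_gram is lowered
```

The last two calls show why the gram sizes need tuning: with a minimum of 4, a search for just the letters before the space finds nothing.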
Re: wildcard search
Hi Thomas,

I don't use it myself (but I will soon), so I may be wrong, but did you try to use the ComplexPhraseQueryParser:

ComplexPhraseQueryParser
QueryParser which permits complex phrase query syntax eg (john jon jonathan~) peters*.

It seems that you could do such type of queries:

GOK:IA 38*

Ludovic.

- Jouve France.
--
View this message in context: http://lucene.472066.n3.nabble.com/memory-leak-during-undeploying-tp2620093p3039561.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: wildcard search
Hi Ludovic,

> I don't use it myself (but I will soon), so I may be wrong, but did you try to use the ComplexPhraseQueryParser:
>
> ComplexPhraseQueryParser
> QueryParser which permits complex phrase query syntax eg (john jon jonathan~) peters*.
>
> It seems that you could do such type of queries:
>
> GOK:IA 38*

yes that sounds interesting.
But I don't know how to get and install it into solr. Can you give me a hint?

Thanks
Thomas
Re: wildcard search
>> I don't use it myself (but I will soon), so I may be wrong, but did you try to use the ComplexPhraseQueryParser:
>>
>> ComplexPhraseQueryParser
>> QueryParser which permits complex phrase query syntax eg (john jon jonathan~) peters*.
>>
>> It seems that you could do such type of queries:
>>
>> GOK:IA 38*
>
> yes that sounds interesting.
> But I don't know how to get and install it into solr. Can you give me a hint?

https://issues.apache.org/jira/browse/SOLR-1604

But it seems that you can achieve what you want with vanilla solr.

I don't follow the multivalued part in your example but you can tokenize
IA 300; IC 330; IA 317; IA 318

into these 4 tokens

IA 300
IC 330
IA 317
IA 318

using PatternTokenizerFactory.

And you can use PrefixQParserPlugin for searching.
http://lucene.apache.org/solr/api/org/apache/solr/search/PrefixQParserPlugin.html
Re: wildcard search
Yes there is, but you haven't provided enough information to make a suggestion.

What is the fieldType definition? What is the field definition?

Two resources that'll help you greatly are:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
and the admin/analysis page...

Best
Erick

On Tue, Jun 7, 2011 at 6:23 PM, Thomas Fischer fischer...@aon.at wrote:
> Hello,
>
> I am testing solr 3.2 and have problems with wildcards.
> I am indexing values like IA 300; IC 330; IA 317; IA 318 in a field GOK, and can't find a way to search with wildcards.
> I want to use a wild card search to match something like IA 31? but cannot find a way to do so.
> GOK:IA\ 38* doesn't work with the contents of GOK indexed as text.
> Is there a way to index and search that would meet my requirements?
>
> Thomas
Re: wildcard search inconsistencies
'conditional' seems to be stemmed into the word 'condit' in the index, so your results are normal.

As you said, mixing wildcard searching and stemmed fields is not recommended.

Ludovic.

2011/4/1 Melanie Drake [via Lucene] ml-node+2763787-65059921-383...@n3.nabble.com

> I noticed an inconsistency in results when performing wildcard searches. When searching on variations of conditional the following results occurred:
>
> conditional - hits
> conditional* - hits
> conditi* - hits
> condit* - hits
> con*al - no hits
> c?nditional - no hits
> c*ld - hits (on a different word: child)
>
> I don't see an obvious pattern to when the wildcard searches work. In a response to another post, I read that stemming will cause wildcard searches to behave strangely. I believe we may be using stemming, although the only configuration I see is the list of words protected against stemming defined in protwords.txt.
>
> Also, I'm not sure if it's helpful, but I see a vague Solr error in my server log (jboss) any time I perform a search (whether successful or not):
>
> ERROR [STDERR] <timestamp> org.apache.solr.core.SolrCore execute
> INFO: [core0] webapp=null path=/select params={q=condi*al&fq=url%3A%28%22%2F<long list of application-specific IDs used for filtering>&fl=score&hl=true&hl.fragsize=50&hl.snippets=3} hits=0 status=0 QTime=0
>
> The developer who implemented our search solution is no longer with our company, so I'm just looking for any information useful to investigate this issue. I apologize if I omitted any necessary information.
>
> Thanks!

- Jouve France.
Re: wildcard search inconsistencies
And to be more helpful, you can activate the debug mode (debugQuery=on in the query) to see the transformed query.

For instance, for 'field:conditional':
rawquerystring: field:conditional
querystring: field:conditional
parsedquery: field:condit
parsedquery_toString: field:condit

for 'field:conditional*':
rawquerystring: field:conditional*
querystring: field:conditional*
parsedquery: field:conditional*
parsedquery_toString: field:conditional*

and for 'field:con*al':
rawquerystring: field:con*al
querystring: field:con*al
parsedquery: field:con*al
parsedquery_toString: field:con*al

But in the field index the word 'conditional' is stored as 'condit' and is not matched by 'con*al'.

The words 'conceal' (stored as is) and 'congealable' (stored as 'congeal'), however, are matched and retrieved (and highlighted if properly configured).

Ludovic.

- Jouve France.
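The hit/no-hit pattern from this thread can be reproduced with a toy lookup. The stored forms for 'conditional', 'conceal' and 'congealable' are the ones given above; the entry for 'child' is an assumption, and `fnmatchcase` only stands in for wildcard matching against indexed terms.

```python
from fnmatch import fnmatchcase

# word in the document -> term as stored in the index after stemming
INDEXED = {
    "conditional": "condit",
    "conceal": "conceal",
    "congealable": "congeal",
    "child": "child",  # assumed to be stored unchanged
}


def words_hit(pattern):
    """Return the source words whose stored term matches the raw wildcard pattern."""
    return sorted(word for word, term in INDEXED.items() if fnmatchcase(term, pattern))


print(words_hit("condit*"))      # the stem itself matches
print(words_hit("con*al"))       # 'condit' does not end in 'al'
print(words_hit("c?nditional"))  # no stored term is that long
print(words_hit("c*ld"))         # hits a different word
```

Protecting 'conditional' from stemming, as done later in the thread, amounts to storing the full word so that patterns such as con*al can match it again.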
Re: wildcard search inconsistencies
Thanks, Ludovic. That was it. I added the word conditional to the protected words file and I no longer see the odd search results when using wildcards. I will try to disable stemming altogether. Thanks again!
Re: Wildcard search in phrase query using spanquery
I used eclipse-jee-galileo-SR2-win32 to build the ant and selected dist-war for execution in build. I got the following message.

Buildfile: D:\apache-solr-1.4.0\build.xml
init-forrest-entities:
compile-solrj:
compile:
    [javac] Compiling 1 source file to D:\apache-solr-1.4.0\build\solr
    [javac] Note: D:\apache-solr-1.4.0\src\java\org\apache\solr\search\DocSetHitCollector.java uses or overrides a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
make-manifest:
     [exec] Execute failed: java.io.IOException: Cannot run program svnversion: CreateProcess error=2, The system cannot find the file specified
dist-jar:
      [jar] Building jar: D:\apache-solr-1.4.0\dist\apache-solr-core-1.4.1-dev.jar
dist-solrj:
      [jar] Building jar: D:\apache-solr-1.4.0\dist\apache-solr-solrj-1.4.1-dev.jar
dist-war:
      [war] Building war: D:\apache-solr-1.4.0\dist\apache-solr-1.4.1-dev.war
BUILD SUCCESSFUL
Total time: 5 second

The solr performed as usual and when I tried adding defType=complexphrase to search url the previous error showed up again. I performed this task to fresh solr-1.4.0. There was error while executing make-manifest, did it have anything to do with the error or am I missing something else?
Re: Wildcard search in phrase query using spanquery
> I used eclipse-jee-galileo-SR2-win32 to build the ant and selected dist-war for execution in build. I got the following message.

I use command prompt to invoke ant so I am not sure about this.

> The solr performed as usual and when I tried adding defType=complexphrase to search url the previous error showed up again. I performed this task to fresh solr-1.4.0. There was error while executing make-manifest, did it have anything to do with the error or am I missing something else?

Are you running solr using java -jar start.jar?
If yes you need to re-name apache-solr-1.4.0\dist\apache-solr-1.4.1-dev.war to solr.war and put it under apache-solr-1.4.0\example\webapps

Also you may need to delete the folder under \apache-solr-1.4.0\example\work.
Re: Wildcard search in phrase query using spanquery
I used command line to build ant this time.

Ahmet Arslan wrote:
> Are you running solr using java -jar start.jar?
> If yes you need to re-name apache-solr-1.4.0\dist\apache-solr-1.4.1-dev.war to solr.war and put it under apache-solr-1.4.0\example\webapps
>
> Also you may need to delete the folder under \apache-solr-1.4.0\example\work.

Yes I ran solr using java -jar start.jar. I did the above mentioned tasks but the results were the same. I even tried by removing all the jar files of version 1.4.0 under apache-solr-1.4.0\dist\ but still I get the same result.
Re: Wildcard search in phrase query using spanquery
> I used command line to build ant this time.

Before calling 'ant dist' where did you copy the ComplexPhrase-1.0.jar? apache-solr-1.4.0\lib or apache-solr-1.4.0\example\lib?

> Yes I ran solr using java -jar start.jar. I did the above mentioned tasks but the results were the same.

Can you delete the \apache-solr-1.4.0\example\solr\lib\ComplexPhrase-1.0.jar and test again? If a ClassNotFoundException comes up, it means that your new solr.war does not contain it.
Re: Wildcard search in phrase query using spanquery
Ahmet Arslan wrote:
> Before calling 'ant dist' where did you copy the ComplexPhrase-1.0.jar? apache-solr-1.4.0\lib or apache-solr-1.4.0\example\lib?

I tried it by placing ComplexPhrase-1.0.jar in apache-solr-1.4.0\lib ; apache-solr-1.4.0\example\lib ; and apache-solr-1.4.0\example\solr\lib with the same error

HTTP ERROR: 500
tried to access field org.apache.lucene.queryParser.QueryParser.field from class org.apache.lucene.queryParser.ComplexPhraseQueryParser$ComplexPhraseQuery

Ahmet Arslan wrote:
> can you delete the \apache-solr-1.4.0\example\solr\lib\ComplexPhrase-1.0.jar and test again? If classnotfound exception comes then it means that your new solr.war does not contain it.

Yes I tried that as well, but no ClassNotFoundException was thrown.
Re: Wildcard search in phrase query using spanquery
> I tried it by placing ComplexPhrase-1.0.jar in apache-solr-1.4.0\lib ; apache-solr-1.4.0\example\lib ; and apache-solr-1.4.0\example\solr\lib with the same error

You need to copy it to only apache-solr-1.4.0\lib

Maybe it is better to get a fresh copy of apache-solr-1.4.0.zip and continue step by step.

copy ComplexPhrase-1.0.jar to only apache-solr-1.4.0\lib
run 'ant clean dist'
move \apache-solr-1.4.0\dist\apache-solr-1.4.1-dev.war to apache-solr-1.4.0\example\webapps\solr.war
add the line
<queryParser name="complexphrase" class="org.apache.solr.search.ComplexPhraseQParserPlugin"/>
to apache-solr-1.4.0\example\solr\conf\solrconfig.xml
java -jar start.jar
java -jar post.jar *.xml

http://localhost:8983/solr/select/?q=features%3A%22s*+c*%22&version=2.2&start=0&rows=10&indent=on&defType=complexphrase&hl=true&hl.fl=features

should return 4 documents with highlighting.
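Hand-encoding the quotes and spaces of such a test query is error-prone, so it may help to let a URL library build it. The sketch below uses only the parameters from the URL in this message and assumes the example Solr instance on localhost:8983; it constructs the URL and checks that the parameters survive a round trip, without contacting any server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Parameters taken from the test URL in the thread; the phrase keeps its quotes.
params = {
    "q": 'features:"s* c*"',
    "version": "2.2",
    "start": "0",
    "rows": "10",
    "indent": "on",
    "defType": "complexphrase",
    "hl": "true",
    "hl.fl": "features",
}

# urlencode percent-encodes the colon, quotes and asterisks and turns the space into '+'.
url = "http://localhost:8983/solr/select/?" + urlencode(params)
print(url)

# Decoding the query string again yields the original values.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0])
```

The encoded form may differ cosmetically from the hand-written URL (for example, the asterisk is percent-encoded here), but it decodes to the same query.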
Re: Wildcard search in phrase query using spanquery
Thanks, it worked now. I think running a clean ant build is what made the difference. I'll work on this a bit more and give you feedback.
Re: Wildcard search in phrase query using spanquery
I tried that and got the following result. Do I have to do anything other than the mentioned instructions to make it work?

HTTP ERROR: 500
tried to access field org.apache.lucene.queryParser.QueryParser.field from class org.apache.lucene.queryParser.ComplexPhraseQueryParser$ComplexPhraseQuery

java.lang.IllegalAccessError: tried to access field org.apache.lucene.queryParser.QueryParser.field from class org.apache.lucene.queryParser.ComplexPhraseQueryParser$ComplexPhraseQuery
	at org.apache.lucene.queryParser.ComplexPhraseQueryParser$ComplexPhraseQuery.parsePhraseElements(ComplexPhraseQueryParser.java:216)
	at org.apache.lucene.queryParser.ComplexPhraseQueryParser.parse(ComplexPhraseQueryParser.java:114)
	at org.apache.solr.search.ComplexPhraseQParser.parse(ComplexPhraseQParserPlugin.java:82)
	at org.apache.solr.search.QParser.getQuery(QParser.java:131)
	at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:89)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
	at org.mortbay.jetty.Server.handle(Server.java:285)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
	at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

RequestURI=/solr/select/
Powered by Jetty://
Re: Wildcard search in phrase query using spanquery
> I tried that and got the following result. Do I have to do anything other than the mentioned instructions to make it work?
>
> HTTP ERROR: 500
> tried to access field org.apache.lucene.queryParser.QueryParser.field from class org.apache.lucene.queryParser.ComplexPhraseQueryParser$ComplexPhraseQuery
>
> java.lang.IllegalAccessError: tried to access field org.apache.lucene.queryParser.QueryParser.field from class org.apache.lucene.queryParser.ComplexPhraseQueryParser$ComplexPhraseQuery

I have never noticed this since I am using this class with a custom super class. Thanks for reporting it. The reason seems to be:

"A class loaded by one classloader can't access package-visible members in a class loaded by a different classloader even if they are nominally in the same package." [1]

[1] http://www.pubbs.net/grails/200911/37981/

I think the easiest thing to do is to place ComplexPhrase-1.0.jar (created by mvn package) under the apache-solr-1.4.0\lib directory, and create a new apache-solr-1.4.0\dist\apache-solr-1.4.1-dev.war by invoking 'ant dist'. I tested this solution and it works as a workaround. It would be great if you could give us feedback after using it.
Re: Wildcard search in phrase query using spanquery
> I need to perform wildcard search in phrase query. I have 2 documents containing text how do impair and how to improve. I want to be able to search both documents by searching (how to im*). There is a provision in lucene which allows me to perform this operation using SpanWildcardQuery and keeping span length to 0.
> http://mail-archives.apache.org/mod_mbox//lucene-java-user/200707.mbox/%3c469df09f.9030...@gmail.com%3e
> I tried proximity search in solr but it didn't work with wildcard. Is there any other provision to perform wildcard search in phrase query?

With https://issues.apache.org/jira/browse/SOLR-1604 you can use * operator inside phrases, e.g. "how to im*"
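What a phrase query with a wildcard is expected to match can be pictured with a toy matcher: the terms must appear at consecutive positions, and each term may itself be a wildcard pattern. This is only a conceptual sketch with a made-up function name; it is not how the SOLR-1604 parser or Lucene span queries are implemented, and the sample texts are illustrative.

```python
from fnmatch import fnmatchcase


def phrase_matches(text, phrase):
    """True if consecutive tokens of text match the phrase terms, wildcards allowed."""
    tokens = text.lower().split()
    patterns = phrase.lower().split()
    width = len(patterns)
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        if all(fnmatchcase(tok, pat) for tok, pat in zip(window, patterns)):
            return True
    return False


print(phrase_matches("how to improve", "how to im*"))               # matches
print(phrase_matches("how to impair", "how to im*"))                # matches
print(phrase_matches("how do impair", "how to im*"))                # second word differs
print(phrase_matches("learn how to improve search", "how to im*"))  # matches mid-text
```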
Re: Wildcard Search and Filter in Solr
hey thanks ravi, ahmed and Erik for your reply.
though its tough to change my solr version, still let me try out at 1.4 and see.

Erik Hatcher-4 wrote:
> Note that the query analyzer output is NOT doing query _parsing_, but rather taking the string you passed and running it through the query analyzer only. When using the default query parser, Inte* will be a search for terms that begin with inte. It is odd that you're not finding it. But you're using a pretty old version of Solr and quite likely something here has been fixed since. Give Solr 1.4 a try.
>
> Erik
>
> On Jan 27, 2010, at 12:56 AM, ashokcz wrote:
>> Hi just looked at the analysis.jsp and found out what it does during index / query
>>
>> Index Analyzer
>> Intel intel intel intel intel intel
>> Query Analyzer
>> Inte* Inte* inte* inte inte inte int
>>
>> I think somewhere my configuration or my definition of the type text is wrong.
>> This is my configuration.
>>
>> <fieldType class="solr.TextField" name="text">
>>   <analyzer type="index">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter catenateAll="0" catenateNumbers="0" catenateWords="0" class="solr.WordDelimiterFilterFactory" generateNumberParts="1" generateWordParts="1"/>
>>     <filter class="solr.StopFilterFactory"/>
>>     <filter class="solr.TrimFilterFactory"/>
>>     <filter class="solr.PorterStemFilterFactory"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter catenateAll="0" catenateNumbers="0" catenateWords="0" class="solr.WordDelimiterFilterFactory" generateNumberParts="1" generateWordParts="1"/>
>>     <filter class="solr.StopFilterFactory"/>
>>     <filter class="solr.TrimFilterFactory"/>
>>     <filter class="solr.PorterStemFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> I think i am missing some basic configuration for doing wildcard searches, but could not figure it out.
>> can someone help please
>>
>> Ahmet Arslan wrote:
>>>> Hi,
>>>> I m trying to use wildcard keywords in my search term and filter term, but i didnt get any results.
>>>> Searched a lot but could not find any lead.
>>>> Can someone help me in this.
>>>> i m using solr 1.2.0 and have few records indexed with vendorName value as Intel
>>>>
>>>> In solr admin interface i m trying to do the search like this
>>>>
>>>> http://localhost:8983/solr/select?indent=on&version=2.2&q=intel&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=
>>>>
>>>> and i m getting the result properly
>>>>
>>>> but when i use q=inte* no records are returned.
>>>>
>>>> the same is the case for Filter Query: on using fq=VendorName:Intel i get my results,
>>>> but on using fq=VendorName:Inte* no results are returned.
>>>>
>>>> I can guess i doing mistake in few obvious things, but could not figure it out ..
>>>> Can someone pls help me out :) :)
>>>
>>> If q=intel returns documents while q=inte* does not, it means that fieldType of your defaultSearchField is reducing the token intel into something.
>>>
>>> Can you find out it by using /admin/analysis.jsp what happens to Intel intel at index and query time?
>>>
>>> What is your defaultSearchField? Is it VendorName?
>>>
>>> It is expected that fq=VendorName:Intel returns results while fq=VendorName:Inte* does not. Because prefix queries are not analyzed.
>>>
>>> But it is strange that q=inte* does not return anything. Maybe your index analyzer is reducing Intel into int or ıntel?
>>>
>>> I am not 100% sure but solr 1.2.0 may use default locale in lowercase operation. What is your default locale?
>>>
>>> It is better to see what happens word Intel using analysis.jsp page.
Re: Wildcard Search and Filter in Solr
Ashok: Maybe this will help: http://gravi2.blogspot.com/2009/05/solr-wildcards-and-omitnorms.html

~Ravi
Re: Wildcard Search and Filter in Solr
> Hi just looked at the analysis.jsp and found out what it does during index / query
> Index Analyzer: Intel intel intel intel intel intel

If the resulting token is intel, then q=inte* should return documents. What does the output say when you add debugQuery=on to your search URL? And why are you using an old version of Solr?
Re: Wildcard Search and Filter in Solr
Note that the query analyzer output is NOT doing query _parsing_, but rather taking the string you passed and running it through the query analyzer only. When using the default query parser, Inte* will be a search for terms that begin with inte. It is odd that you're not finding it. But you're using a pretty old version of Solr and quite likely something here has been fixed since. Give Solr 1.4 a try.

Erik

On Jan 27, 2010, at 12:56 AM, ashokcz wrote:
> Hi just looked at the analysis.jsp and found out what it does during index / query
> Index Analyzer: Intel intel intel intel intel intel
> Query Analyzer: Inte* Inte* inte* inte inte inte int
> [...]
Re: Wildcard Search and Filter in Solr
Hi, I just looked at analysis.jsp and found out what it does during index / query:

Index Analyzer: Intel intel intel intel intel intel
Query Analyzer: Inte* Inte* inte* inte inte inte int

I think somewhere my configuration or my definition of the type text is wrong. This is my configuration:

    <fieldType class="solr.TextField" name="text">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter catenateAll="0" catenateNumbers="0" catenateWords="0" class="solr.WordDelimiterFilterFactory" generateNumberParts="1" generateWordParts="1"/>
        <filter class="solr.StopFilterFactory"/>
        <filter class="solr.TrimFilterFactory"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter catenateAll="0" catenateNumbers="0" catenateWords="0" class="solr.WordDelimiterFilterFactory" generateNumberParts="1" generateWordParts="1"/>
        <filter class="solr.StopFilterFactory"/>
        <filter class="solr.TrimFilterFactory"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>

I think I am missing some basic configuration for doing wildcard searches, but could not figure it out. Can someone help please?

Ahmet Arslan wrote:
> [...]
Re: Wildcard Search and Filter in Solr
> Hi, I am trying to use wildcard keywords in my search term and filter term, but I didn't get any results. Searched a lot but could not find any lead. Can someone help me with this?
> I am using Solr 1.2.0 and have a few records indexed with vendorName value as Intel. In the Solr admin interface I am trying to do the search like this:
> http://localhost:8983/solr/select?indent=on&version=2.2&q=intel&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=
> and I am getting the result properly, but when I use q=inte* no records are returned. The same is the case for the filter query: using fq=VendorName:Intel I get my results, but using fq=VendorName:Inte* no results are returned.
> I guess I am making a mistake in a few obvious things, but could not figure it out. Can someone please help me out :) :)

If q=intel returns documents while q=inte* does not, it means that the fieldType of your defaultSearchField is reducing the token intel into something else. Can you find out, using /admin/analysis.jsp, what happens to Intel and intel at index and query time? What is your defaultSearchField? Is it VendorName?

It is expected that fq=VendorName:Intel returns results while fq=VendorName:Inte* does not, because prefix queries are not analyzed. But it is strange that q=inte* does not return anything. Maybe your index analyzer is reducing Intel into int or ıntel? I am not 100% sure, but Solr 1.2.0 may use the default locale in the lowercase operation. What is your default locale? It is better to see what happens to the word Intel using the analysis.jsp page.
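
The point that prefix queries are not analyzed can be illustrated with a toy model. This is only a sketch, not Solr code: the function names (analyze, build_index, term_query, prefix_query) and the two-step analysis chain are assumptions made for illustration, and the stemming step of the real field type is left out.

```python
# Toy model (not Solr code): index-time analysis lowercases terms,
# but a prefix query is compared against the index without analysis.

def analyze(text):
    """Hypothetical analysis chain: whitespace tokenizer + lowercase filter."""
    return [token.lower() for token in text.split()]

def build_index(docs):
    """Map each analyzed term to the set of ids of documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index

def term_query(index, text):
    """Term queries are analyzed, so 'Intel' is lowercased before lookup."""
    hits = set()
    for term in analyze(text):
        hits |= index.get(term, set())
    return hits

def prefix_query(index, raw):
    """Prefix queries skip analysis: the raw prefix is compared as typed."""
    prefix = raw.rstrip("*")
    hits = set()
    for term, doc_ids in index.items():
        if term.startswith(prefix):
            hits |= doc_ids
    return hits

index = build_index({1: "Intel", 2: "AMD"})
print(term_query(index, "Intel"))    # analyzed to 'intel' before lookup
print(prefix_query(index, "Inte*"))  # raw 'Inte' is compared as typed
print(prefix_query(index, "inte*"))  # raw 'inte' is a prefix of 'intel'
```

In this model the capitalized prefix finds nothing while the lowercase one does, which mirrors the fq=VendorName:Inte* behaviour described above. It does not explain why q=inte* also fails, which is why the analysis.jsp output is still worth checking.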
Re: wildcard search and hierarchical faceting
There are some approaches outlined here that might be of interest: http://wiki.apache.org/solr/HierarchicalFaceting

On Jan 24, 2010, at 2:54 AM, Andy wrote:
> I'd like to provide a hierarchical faceting functionality. An example would be location drill down such as USA - New York - New York City - SoHo
> The number of levels can be arbitrary.
> One way to handle this could be to use a special character as separator, store values such as USA|New York|New York City|SoHo and use wildcard search. So if USA has been selected, the fq would be USA*
> I read somewhere that when using wildcard search, no stemming or tokenization will be performed. So USA will not match usa.
> Is there any way to work around that? Or would you recommend a different way to handle hierarchical faceting?
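
One way to sidestep the wildcard and analysis problem Andy describes is to index every ancestor path as its own exact token in a non-analyzed field, so that a drill-down becomes an exact-match filter rather than a prefix query. The sketch below only generates such tokens; the function name path_tokens, the "/" separator and the depth prefix are assumptions chosen for illustration and are not taken from the thread.

```python
def path_tokens(levels, sep="/"):
    """Return one depth-prefixed token per ancestor of the given path."""
    tokens = []
    for depth in range(len(levels)):
        # Join the first depth+1 levels and prefix the token with its depth.
        tokens.append(f"{depth}{sep}" + sep.join(levels[: depth + 1]))
    return tokens

tokens = path_tokens(["USA", "New York", "New York City", "SoHo"])
for token in tokens:
    print(token)
```

With tokens like these, selecting USA would translate into a filter on the exact token for depth 0, and the facet values one level down are the tokens that start with the depth-1 prefix. Because the application builds the filter from values it indexed itself, case differences between USA and usa do not arise.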
Re: wildcard search is not working
Go through this thread first: http://markmail.org/message/bannl2fpblt5sqlw
If it still does not help, post back your field type definition in schema.xml.

Cheers
Avlesh

On Thu, Aug 6, 2009 at 3:46 PM, Radha C. cra...@ceiindia.com wrote:
> Hi,
> I have documents that contain the words healthcare articles. I need to match the healthcare articles documents for the query strings helath, articles... I tried q=health*, q=helath*, q=heath*articles but everything returns an empty result. When I try q=healthcare artilces, the search returns proper documents.
> Can anyone tell me what is wrong with my query string?
Re: Wildcard Search
Are you by any chance stemming the field when you index?

Erick

On Fri, May 8, 2009 at 2:29 AM, dabboo ag...@sapient.com wrote:
> Hi,
> I am facing a weird issue while searching. I am searching for the word *system*, and it displays all the records which contain system, systems etc. But when I tried to search *systems*, it only returns those records which have systems-, systems/ etc. It is treating the wildcard as one or more characters and not zero characters. So, it is not returning records which have systems as one word.
> Is there any way to resolve this? Please suggest.
> Thanks,
> Amit Garg
Re: Wildcard Search
Yes, that's correct. I have applied EnglishPorterFilterFactory at index time as well. Do you think I should remove it and do the indexing again?

Erick Erickson wrote:
> Are you by any chance stemming the field when you index?
> [...]
Re: Wildcard Search
My *guess* is that what you're seeing is that wildcard searches are not analyzed, in this case not run through the stemmer. So your index only contains system and the funky variants (e.g. systems/). I don't really understand why you'd get systems/ in your index, but I'm assuming that your filter chain doesn't remove things like slashes.

So, you have system and systems/ in your index, but not systems due to stemming, so searching for systems* translates into systems OR systems/ OR any other indexed term that starts with systems, and since no documents have systems, you don't get them as hits.

All that said, you need to revisit your indexing parameters to make what happens fit your expectations. I'd advise getting a copy of Luke and pointing it at your index in order to see what *really* gets put in it.

Best
Erick

You need to either introduce filters that remove odd stuff like slashes

On Fri, May 8, 2009 at 9:25 AM, dabboo ag...@sapient.com wrote:
> Yes, that's correct. I have applied EnglishPorterFilterFactory at index time as well. Do you think I should remove it and do the indexing again?
> [...]
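
Erick's guess can be reproduced with a toy model. This is a sketch under stated assumptions, not Solr behaviour verified against the poster's index: toy_stem is a crude stand-in for a Porter-style stemmer (it only strips one trailing "s" from purely alphabetic tokens), and all function names are hypothetical.

```python
def toy_stem(token):
    """Crude stand-in for a stemmer: strip one trailing 's' from alphabetic tokens."""
    if token.isalpha() and token.endswith("s") and len(token) > 3:
        return token[:-1]
    return token

def index_terms(text):
    """Hypothetical index-time chain: whitespace split, lowercase, then stem."""
    return {toy_stem(tok.lower()) for tok in text.split()}

def prefix_matches(terms, raw_query):
    """Wildcard terms are not stemmed; the raw prefix is compared to indexed terms."""
    prefix = raw_query.rstrip("*")
    return sorted(t for t in terms if t.startswith(prefix))

# 'systems' is stemmed to 'system'; 'systems/' and 'systems-' are left alone
# because the toy stemmer skips tokens containing punctuation.
terms = index_terms("system systems systems/ systems-")
print(sorted(terms))
print(prefix_matches(terms, "system*"))
print(prefix_matches(terms, "systems*"))
```

In this model the bare word systems never reaches the index, so a query that starts with systems can only match the punctuation variants, which is the symptom Amit reported.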
RE: Wildcard search with q query
Hi Amit,

Leading wildcard searches are not allowed in Solr out of the box. There are other techniques to overcome this handicap. Searching for "Leading Wildcards" in the user mailing list archives will return the relevant mail threads, which might be pretty useful to you.

-Kumar

-----Original Message-----
From: dabboo [mailto:ag...@sapient.com]
Sent: Tuesday, February 24, 2009 4:04 PM
To: solr-user@lucene.apache.org
Subject: Wildcard search with q query

Hi,

I am trying to perform wildcard search using q query. e.g. If I give tes* as the query, it works fine and returns the results as expected. But if I give *tes*, then it throws an exception saying that we can't have wildcards in front of any string.

Please suggest.

Thanks,
Amit Garg
RE: wildcard search issue
Hi Mahendra,

Wildcard searches are case-sensitive in Solr. I faced the same issue and handled it by converting the query string to lower case in my code itself. The filters and analyzers are not applied to wildcard queries.

FYI, you will also face problems when you use keywords/operators in upper case in your queries. Queries like Jack AND Jill, or ones containing words like OR, NOT, etc., will throw errors from the SolrQueryParser. You will have to convert such strings to lower case too. In this case, the exception is thrown during the parsing stage itself. Again, the filters and analyzers have no effect (since they never come into the picture here either).

-Kumar

-----Original Message-----
From: mahendra mahendra [mailto:mahendra_featu...@yahoo.com]
Sent: Friday, February 06, 2009 12:34 PM
To: solr-user@lucene.apache.org
Subject: wildcard search issue

Hi,

The case sensitive wild-card search is not working for TextField type. I have tried searching for UserName:cust*, it gave the results, but UserName:Cust* didn't give any results. How can I make it work? I have defined my TextField in the following way:

    <fieldtype name="textTight" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldtype>

Any help would be appreciated!!

Thanks & Regards,
Mahendra
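
The client-side workaround Kumar mentions (lowercasing the wildcard term before sending it) might look roughly like the sketch below. The helper name lowercase_wildcard is hypothetical, and the function only handles a single clause of the form field:term*; it is not meant for full queries with several clauses or operators.

```python
def lowercase_wildcard(query):
    """Lowercase the term part of one 'field:term*' clause, keeping the field name."""
    field, sep, term = query.partition(":")
    if not sep:
        # No field prefix: the whole string is the term.
        return query.lower()
    return f"{field}{sep}{term.lower()}"

print(lowercase_wildcard("UserName:Cust*"))
print(lowercase_wildcard("Cust*"))
```

The field name is left untouched because field names are matched as written in the schema; only the term has to agree with what LowerCaseFilterFactory put into the index.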
Re: Wildcard search question
Norberto Meijome wrote:
>> ok well let's say that i can live without john/jon in the short term. what i really need today is a case insensitive wildcard search with literal matching (no fancy stemming. bobby is bobby, not bobbi.) what are my options?
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
> define your own type (or modify text / string... but I find that it gets confusing to have variations of text / string ...) to perform the operations on the content as needed.
> There are also other tokenizer/analysers available that *may* help in the partial searches (ngram , edgengram ), but there isn't much documentation on them yet (that I could find) - I am only getting into them myself i'll see how it goes..

thanks, that got me on the right track. i came up with this:

    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

now searching for user_name:bobby* works as i wanted.

my next question: is there a way that i can score matches that are at the start of the string higher than matches in the middle? for example, if i search for steve, i get kelly stevenson before steve jobs. i'd like steve jobs to come first.

-jsd-
Re: Wildcard search question
Jon,

You provided a lot of nice details, thanks for helping us help you :)

The one missing piece is the definition of the text field type. In Solr's _example_ schema, bobby gets analyzed (stemmed) to bobbi[1]. When you query for bobby*, the query parser is not running an analyzer on the wildcard query, thus literally searching for terms that begin with bobby[2].

As for steve, same story, but it analyzes to steve, which is found with a steve* query.

Erik

[1] I used Solr's admin/analysis.jsp to double-check the text field type behavior.
[2] I requested http://localhost:8983/solr/select/?q=bobby*&debugQuery=true and saw the parsed query in the debug output as text:bobby*

On Jun 23, 2008, at 4:13 PM, Jon Drukman wrote:
> When I search with q=bobby I get the following record:
>
>     <doc>
>       <date name="date">2008-06-23T07:06:40Z</date>
>       <str name="description">http://farm1.static.flickr.com/117/...</str>
>       <int name="id">9</int>
>       <str name="name">Bobby Gaza</str>
>       <str name="user">[EMAIL PROTECTED]</str>
>     </doc>
>
> When I search with bobby* I get nothing.
> When I search with steve* I get Steve Ballmer and Steve Jobs...
> What's going on?
> The relevant part of my schema.xml is:
>
>     <fields>
>       <field name="id" type="integer" indexed="true" stored="true" required="true" />
>       <field name="user_id" type="integer" indexed="true" stored="true" />
>       <field name="name" type="text" indexed="true" stored="true"/>
>       <field name="description" type="text" indexed="true" stored="true"/>
>       <field name="tags" type="text" indexed="true" stored="true"/>
>       <field name="user" type="text" indexed="true" stored="true"/>
>       <field name="date" type="date" indexed="true" stored="true"/>
>       <field name="type" type="string" indexed="true" stored="false"/>
>       <field name="type_id" type="string" indexed="true" stored="false"/>
>       <field name="thumb_url" type="string" indexed="true" stored="true"/>
>     </fields>
>     <!-- Field to use to determine and enforce document uniqueness. Unless this field is marked with required="false", it will be a required field -->
>     <uniqueKey>type_id</uniqueKey>
>     <!-- field for the QueryParser to use when an explicit fieldname is absent -->
>     <defaultSearchField>name</defaultSearchField>
Re: Wildcard search question
Erik Hatcher wrote:
> Jon, You provided a lot of nice details, thanks for helping us help you :)
> The one missing piece is the definition of the text field type. In Solr's _example_ schema, bobby gets analyzed (stemmed) to bobbi[1]. When you query for bobby*, the query parser is not running an analyzer on the wildcard query, thus literally searching for terms that begin with bobby[2].
> As for steve, same story, but it analyzes to steve, which is found with a steve* query.

so, what's the solution? if i change the field to string, will it be able to find bobby* ?

eventually it would be nice to be able to use fuzzy matching, to find 'jon' from 'john', for example.

thanks
-jsd-
Re: Wildcard search question
On Jun 23, 2008, at 4:45 PM, Jon Drukman wrote:
> Erik Hatcher wrote:
>> [...]
> so, what's the solution?

it depends(tm) ;)

> if i change the field to string, will it be able to find bobby* ?

No, because the original data is <str name="name">Bobby Gaza</str>, so Bobby* would match, but not bobby*. string type (in the example schema, to be clear) does effectively no analysis, leaving the original string indexed as-is, case and all.

> eventually it would be nice to be able to use fuzzy matching, to find 'jon' from 'john', for example.

you could search for john~ to do that. or bobby~ would match bobbi.

stemming and wildcard term queries aren't quite compatible, as you've found, but it does depend on how much of the prefix is provided. bob* matches bobbi, for example.

Erik
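
Fuzzy queries such as john~ are based on edit distance between the query term and the indexed terms. The sketch below computes plain Levenshtein distance to show why jon is close to john and bobbi is close to bobby. It is an illustration only: the similarity threshold that Lucene actually applies depends on the version and is not modelled here.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character edits from a to b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]

print(edit_distance("john", "jon"))
print(edit_distance("bobby", "bobbi"))
```

Both pairs are a single edit apart, which is consistent with Erik's remark that john~ can find jon and bobby~ can find the stemmed term bobbi.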
Re: Wildcard search question
Erik Hatcher wrote:
> No, because the original data is <str name="name">Bobby Gaza</str>, so Bobby* would match, but not bobby*. string type (in the example schema, to be clear) does effectively no analysis, leaving the original string indexed as-is, case and all.
> [...]
> stemming and wildcard term queries aren't quite compatible, as you've found, but it does depend on how much of the prefix is provided. bob* matches bobbi, for example.

ok well let's say that i can live without john/jon in the short term. what i really need today is a case insensitive wildcard search with literal matching (no fancy stemming. bobby is bobby, not bobbi.) what are my options?

-jsd-
Re: Wildcard search question
On Mon, 23 Jun 2008 14:23:14 -0700 Jon Drukman [EMAIL PROTECTED] wrote:
> ok well let's say that i can live without john/jon in the short term. what i really need today is a case insensitive wildcard search with literal matching (no fancy stemming. bobby is bobby, not bobbi.) what are my options?

Jon, read http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

define your own type (or modify text / string... but I find that it gets confusing to have variations of text / string ...) to perform the operations on the content as needed.

There are also other tokenizers/analysers available that *may* help with partial searches (ngram, edgengram), but there isn't much documentation on them yet (that I could find). I am only getting into them myself.

i'll see how it goes..
B
_
{Beto|Norberto|Numard} Meijome