RE: search with wildcard
I know it's documented that Lucene/Solr doesn't apply filters to queries with wildcards, but this seems to trip up a lot of users. I can also see why wildcards break a number of filters, but some filters (e.g. mapping charsets) could mostly or entirely work.

The N-gram filter is another one that would be great to still run when there are wildcards. If you indexed 4-grams and the query is *testp*, you currently won't get any results; but the N-gram filter could have a wildcard mode that, in this case, would return just the first 4-gram as a token.

Is this something you've considered? It would have to be enabled in the core network, but disabled by default for existing filters; then it could be enabled one by one for existing filters.

Apologies if the dev list is a better place for this.

Scott

-----Original Message-----
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Thursday, November 21, 2013 8:40 AM
To: solr-user@lucene.apache.org
Subject: Re: search with wildcard

Hi Andreas,

If you don't want to use wildcards at query time, an alternative is to use NGrams at indexing time. This will produce a lot of tokens. For example, the 4-grams of your example:

Supertestplan = supe uper pert erte rtes *test* estp stpl tpla plan

Is that what you want? By the way, why do you want to search inside of words?

<filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="4"/>
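To make the proposed "wildcard mode" concrete, here is a minimal Python sketch. It is not Lucene code and no such mode exists in the N-gram filter as far as this thread says; the function names and the rule "strip the outer wildcards, keep the first n-gram of what remains" are assumptions taken from the description above. Wildcards in the middle of the query are not handled.

```python
from typing import List, Optional


def char_ngrams(term: str, n: int) -> List[str]:
    """Return every contiguous n-character substring of term, lowercased."""
    term = term.lower()
    return [term[i:i + n] for i in range(len(term) - n + 1)]


def wildcard_to_gram(query: str, n: int) -> Optional[str]:
    """Hypothetical wildcard mode: drop leading/trailing * and ?,
    then keep only the first n-gram of the literal part."""
    literal = query.strip("*?").lower()
    return literal[:n] if len(literal) >= n else None


# Index side: the 4-grams of the example word.
indexed = set(char_ngrams("Supertestplan", 4))

# Query side today: the 5-character literal is not an indexed 4-gram.
literal_hit = "testp" in indexed

# Query side with the hypothetical wildcard mode.
gram = wildcard_to_gram("*testp*", 4)
wildcard_mode_hit = gram in indexed

print(literal_hit, gram, wildcard_mode_hit)
```

Matching only the first gram is looser than the original wildcard (it would also match words containing "test" but not "testp"), so a real implementation would probably need to require all grams of the literal.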
Re: search with wildcard
Hi Andreas,

If you don't want to use wildcards at query time, an alternative is to use NGrams at indexing time. This will produce a lot of tokens. For example, the 4-grams of your example:

Supertestplan = supe uper pert erte rtes *test* estp stpl tpla plan

Is that what you want? By the way, why do you want to search inside of words?

<filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="4"/>

On Thursday, November 21, 2013 5:23 PM, Andreas Owen a...@conx.ch wrote:

I suppose I have to create another field with different tokenizers and set the boost very low so it doesn't really mess with my ranking, because the word would then be in two fields. What kind of tokenizer can do the job?
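To get a feel for "a lot of tokens", the following Python sketch counts the grams for the example word with the sizes from the filter line above (3 and 4). It only imitates character n-gram generation; the function name is made up for illustration, and the order in which Solr actually emits the grams may differ.

```python
def ngrams_between(term, min_size, max_size):
    """Collect all character n-grams of term for every size in
    [min_size, max_size], shortest size first."""
    term = term.lower()
    grams = []
    for n in range(min_size, max_size + 1):
        grams.extend(term[i:i + n] for i in range(len(term) - n + 1))
    return grams


tokens = ngrams_between("Supertestplan", 3, 4)

# One 13-character word yields 11 trigrams plus 10 four-grams.
print(len(tokens))
print("test" in tokens)
```

Every indexed word is multiplied in the same way, which is why the n-gram field tends to be much larger than the original field.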
RE: search with wildcard
I suppose I have to create another field with different tokenizers and set the boost very low so it doesn't really mess with my ranking, because the word would then be in two fields. What kind of tokenizer can do the job?

From: Andreas Owen [mailto:a...@conx.ch]
Sent: Thursday, 21 November 2013 16:13
To: solr-user@lucene.apache.org
Subject: search with wildcard

I am querying "test" in Solr 4.3.1 over the field below and it's not finding all occurrences. It seems that if it is a substring of a word like Supertestplan, it isn't found unless I use wildcards: *test*. This is expected because of my tokenizer, but does someone know a way around this? I don't want to add wildcards because that messes up queries with multiple words.

<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball" enablePositionIncrements="true"/> <!-- remove common words -->
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German"/> <!-- remove noun/adjective inflections like plural endings -->
  </analyzer>
</fieldType>
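One way the "second field with a very low boost" idea could look on the query side is sketched below in Python. It assumes the eDisMax query parser with its `qf` parameter, a second field named `text_ngram` (a hypothetical name, not from this thread), and boost values picked only for illustration.

```python
from urllib.parse import urlencode

# Request parameters for a Solr select handler.
# text_de is the field from the question; text_ngram is a hypothetical
# copy of it analyzed with an n-gram filter.
params = {
    "defType": "edismax",
    "q": "test",
    # Full weight for the stemmed field, a tenth for the n-gram field,
    # so substring hits rank below normal word hits.
    "qf": "text_de^1.0 text_ngram^0.1",
}

query_string = urlencode(params)
print(query_string)
```

The resulting string would be appended to the select URL of the core; the right boost ratio depends on the data and would need tuning.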
Re: search with wildcard
You might be able to make use of the dictionary compound word filter, but you will have to build up a dictionary of words to use:

http://lucene.apache.org/core/4_5_1/analyzers-common/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilterFactory.html

My e-book has some examples and a better description.

-- Jack Krupansky

-----Original Message-----
From: Ahmet Arslan
Sent: Thursday, November 21, 2013 11:40 AM
To: solr-user@lucene.apache.org
Subject: Re: search with wildcard

Hi Andreas,

If you don't want to use wildcards at query time, an alternative is to use NGrams at indexing time. This will produce a lot of tokens. For example, the 4-grams of your example:

Supertestplan = supe uper pert erte rtes *test* estp stpl tpla plan

Is that what you want? By the way, why do you want to search inside of words?

<filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="4"/>
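The idea behind dictionary-based decompounding can be sketched in a few lines of Python. This is a simplified illustration, not the Lucene filter itself: the function name, the parameter names, their defaults, and the three-word dictionary are all assumptions made for the example.

```python
def decompound(word, dictionary, min_subword=2, max_subword=15):
    """Return every dictionary word that occurs as a substring of word,
    scanning left to right. Sizes bound the substrings that are tried."""
    word = word.lower()
    found = []
    for start in range(len(word)):
        for length in range(min_subword, max_subword + 1):
            end = start + length
            if end > len(word):
                break  # substring would run past the end of the word
            part = word[start:end]
            if part in dictionary:
                found.append(part)
    return found


# Hypothetical dictionary; in practice it has to be built for the corpus.
parts = decompound("Supertestplan", {"super", "test", "plan"})
print(parts)
```

With the parts indexed as separate tokens, a plain query for test can match Supertestplan without wildcards, at the cost of maintaining the dictionary.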
Re: Search Phrase Wildcard?
Yes, you can search for phrases with wildcards. There is no direct support for it, but you can achieve it like the following.

User input: Solr we
Query should be: (name:Solr AND (name:we* OR name:we)) OR name:"Solr we"

The query builder parses the original input and builds a query that simulates a wildcard phrase query. It requires all the words the user entered and adds a wildcard (*) to the last word. It also searches for the whole phrase the user entered using a phrase query, in case the whole phrase is found in the index.

This should work! Let me know if you have any issues.

-- View this message in context: http://www.nabble.com/Search-Phrase-Wildcard--tp23978330p23996409.html
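The query builder described above could be sketched as follows in Python. The function name is made up, the double quotes around the phrase clause are an assumption about what the phrase query should look like, and the sketch does no escaping of Lucene special characters in the user input.

```python
def build_wildcard_phrase_query(field, user_input):
    """Build a query that requires every word, allows a prefix match on
    the last word, and also accepts the exact phrase."""
    words = user_input.split()
    if not words:
        return ""
    *head, last = words
    # Every word except the last must match exactly.
    clauses = ["{}:{}".format(field, w) for w in head]
    # The last word may match exactly or as a prefix.
    clauses.append("({0}:{1}* OR {0}:{1})".format(field, last))
    all_words = " AND ".join(clauses)
    phrase = " ".join(words)
    return '({}) OR {}:"{}"'.format(all_words, field, phrase)


query = build_wildcard_phrase_query("name", "Solr we")
print(query)
```

Note that the AND clauses do not enforce word order or adjacency, so this only approximates a phrase with a trailing wildcard.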
Re: Search Phrase Wildcard?
Solr does not support wildcards in phrase queries, yet.

Cheers,
Aleks

On Thu, 11 Jun 2009 11:48:13 +0200, Samnang Chhun samnang.ch...@gmail.com wrote:

Hi all,

I have my document like this:

<doc>
  <name>Solr web service</name>
</doc>

Is there any way that I can search like startswith:

So* We* : found
Sol* : found
We* : not found

Cheers,
Samnang

--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco
Re: Search Phrase Wildcard?
In fact, Lucene does not support that.

Lucene supports single and multiple character wildcard searches within single terms (*not within phrase queries*).

Taken from http://lucene.apache.org/java/2_3_2/queryparsersyntax.html#Wildcard%20Searches

Cheers
Avlesh

On Thu, Jun 11, 2009 at 4:32 PM, Aleksander M. Stensby aleksander.sten...@integrasco.no wrote:

Solr does not support wildcards in phrase queries, yet.

Cheers,
Aleks
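The difference between a wildcard applied to single terms and the "starts with" behaviour asked for in this thread can be illustrated with a small Python sketch. It uses `fnmatch` from the standard library as a stand-in for wildcard matching and assumes simple whitespace tokenization with lowercasing; it does not reproduce how Lucene evaluates queries.

```python
from fnmatch import fnmatchcase


def term_wildcard_match(pattern, terms):
    """Wildcard against individual terms: true if any one term matches."""
    return any(fnmatchcase(term, pattern.lower()) for term in terms)


def value_wildcard_match(pattern, value):
    """Wildcard against the whole, untokenized field value."""
    return fnmatchcase(value.lower(), pattern.lower())


value = "Solr web service"
terms = value.lower().split()  # assumed tokenization: ["solr", "web", "service"]

# Term level: We* matches the term "web", and a pattern containing
# a space can never match a single term.
print(term_wildcard_match("We*", terms), term_wildcard_match("So* We*", terms))

# Whole-value level: this gives the found / found / not found
# behaviour listed in the original question.
print(value_wildcard_match("So* We*", value),
      value_wildcard_match("Sol*", value),
      value_wildcard_match("We*", value))
```

This suggests the requested behaviour is closer to a prefix match on an untokenized copy of the field than to a phrase query with wildcards.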
Re: Search Phrase Wildcard?
Well, yes :) Since Solr does in fact support the entire Lucene query parser syntax :)

- Aleks

On Thu, 11 Jun 2009 13:57:23 +0200, Avlesh Singh avl...@gmail.com wrote:

In fact, Lucene does not support that.

Lucene supports single and multiple character wildcard searches within single terms (*not within phrase queries*).

Taken from http://lucene.apache.org/java/2_3_2/queryparsersyntax.html#Wildcard%20Searches

Cheers
Avlesh

--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco
Re: Search Phrase Wildcard?
You might be interested in this Lucene issue: https://issues.apache.org/jira/browse/LUCENE-1486

Aleksander M. Stensby wrote:

Well, yes :) Since Solr does in fact support the entire Lucene query parser syntax :)

- Aleks

--
- Mark
http://www.lucidimagination.com