RE: search with wildcard
I know it's documented that Lucene/Solr doesn't apply analysis filters to queries with wildcards, but this seems to trip up a lot of users. I can also see why wildcards break a number of filters, but some filters (e.g. charset mapping) could mostly or entirely still work. The N-gram filter is another one that would be great to still run when there are wildcards. If you indexed 4-grams and the query is "*testp*", you currently won't get any results; but the N-gram filter could have a wildcard mode that, in this case, would return just the first 4-gram as a token. Is this something you've considered? It would have to be supported in the core, but disabled by default for existing filters; then it could be enabled one by one for existing filters. Apologies if the dev list is a better place for this.

Scott

> -----Original Message-----
> From: Ahmet Arslan [mailto:iori...@yahoo.com]
> Sent: Thursday, November 21, 2013 8:40 AM
> To: solr-user@lucene.apache.org
> Subject: Re: search with wildcard
>
> Hi Andreas,
>
> If you don't want to use wildcards at query time, an alternative is to
> use NGrams at indexing time. This will produce a lot of tokens. For
> example, the 4-grams of your example: Supertestplan => supe uper pert
> erte rtes *test* estp stpl tpla plan
>
> Is that what you want? By the way, why do you want to search inside of
> words?
>
> <filter class="solr.NGramFilterFactory" minGramSize="4" maxGramSize="4"/>
>
> On Thursday, November 21, 2013 5:23 PM, Andreas Owen <a...@conx.ch>
> wrote:
>
> I suppose I have to create another field with different tokenizers and
> set the boost very low so it doesn't really mess with my ranking,
> because the word is now in two fields. What kind of tokenizer can do
> the job?
>
> From: Andreas Owen [mailto:a...@conx.ch]
> Sent: Thursday, November 21, 2013 16:13
> To: solr-user@lucene.apache.org
> Subject: search with wildcard
>
> I am querying "test" in Solr 4.3.1 over the field below and it's not
> finding all occurrences. It seems that if it is a substring of a word
> like "Supertestplan" it isn't found unless I use a wildcard:
> "*test*". This is right because of my tokenizer, but does someone know
> a way around this? I don't want to add wildcards because that messes
> up queries with multiple words.
>
> <fieldType ... positionIncrementGap="100">
>   [analyzer definition mangled in the archive; the surviving fragments
>   show a stop filter with words="lang/stopwords_de.txt"
>   format="snowball" enablePositionIncrements="true" and a filter with
>   class="solr.SnowballPorterFilterFactory" language="German"]
> </fieldType>
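[Editor's note: the schema quoted above was mangled in the archive. The sketch below shows a typical Solr 4.x German-language fieldType that is consistent with the surviving attributes; only the StopFilterFactory and SnowballPorterFilterFactory attributes come from the original message, and the field type name, tokenizer, and lower-case filter are assumptions.]

```xml
<!-- Sketch only: everything except the stop filter and Snowball stemmer
     attributes is an assumption about the original schema. -->
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_de.txt" format="snowball"
            enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German"/>
  </analyzer>
</fieldType>
```

A chain like this keeps "Supertestplan" as a single (stemmed) token, which is exactly why the plain query "test" cannot match it without a wildcard.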
Re: search with wildcard
Hi Andreas,

If you don't want to use wildcards at query time, an alternative is to use NGrams at indexing time. This will produce a lot of tokens. For example, the 4-grams of your example: Supertestplan => supe uper pert erte rtes *test* estp stpl tpla plan

Is that what you want? By the way, why do you want to search inside of words?

On Thursday, November 21, 2013 5:23 PM, Andreas Owen wrote:

I suppose I have to create another field with different tokenizers and set the boost very low so it doesn't really mess with my ranking, because the word is now in two fields. What kind of tokenizer can do the job?

From: Andreas Owen [mailto:a...@conx.ch]
Sent: Thursday, November 21, 2013 16:13
To: solr-user@lucene.apache.org
Subject: search with wildcard

I am querying "test" in Solr 4.3.1 over the field below and it's not finding all occurrences. It seems that if it is a substring of a word like "Supertestplan" it isn't found unless I use a wildcard: "*test*". This is right because of my tokenizer, but does someone know a way around this? I don't want to add wildcards because that messes up queries with multiple words.
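[Editor's note: Ahmet's NGram suggestion, written out as a schema sketch. The field type name and tokenizer are illustrative assumptions; only the 4-gram sizes come from his example.]

```xml
<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "Supertestplan" is indexed as: supe uper pert erte rtes test
         estp stpl tpla plan -->
    <filter class="solr.NGramFilterFactory" minGramSize="4" maxGramSize="4"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With grams fixed at 4 and no n-gramming on the query side, a 4-character query like "test" matches directly, but longer query terms would also need to be n-grammed at query time to match.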
RE: search with wildcard
I suppose I have to create another field with different tokenizers and set the boost very low so it doesn't really mess with my ranking, because the word is now in two fields. What kind of tokenizer can do the job?

From: Andreas Owen [mailto:a...@conx.ch]
Sent: Thursday, November 21, 2013 16:13
To: solr-user@lucene.apache.org
Subject: search with wildcard

I am querying "test" in Solr 4.3.1 over the field below and it's not finding all occurrences. It seems that if it is a substring of a word like "Supertestplan" it isn't found unless I use a wildcard: "*test*". This is right because of my tokenizer, but does someone know a way around this? I don't want to add wildcards because that messes up queries with multiple words.
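[Editor's note: the two-field, low-boost idea Andreas describes can be wired up with a copyField plus edismax field weights. All field names, the request handler settings, and the boost values below are illustrative assumptions; `text_ngram` stands for an n-gram field type like the one Ahmet suggests.]

```xml
<!-- schema.xml: copy the original field into an n-gram side field -->
<field name="body" type="text_de" indexed="true" stored="true"/>
<field name="body_ngram" type="text_ngram" indexed="true" stored="false"/>
<copyField source="body" dest="body_ngram"/>

<!-- solrconfig.xml: search both fields, weighting the n-gram field low
     so substring matches don't dominate the ranking -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">body^1.0 body_ngram^0.1</str>
  </lst>
</requestHandler>
```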
Re: search with wildcard
You might be able to make use of the dictionary compound word filter, but you will have to build up a dictionary of words to use:
http://lucene.apache.org/core/4_5_1/analyzers-common/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilterFactory.html

My e-book has some examples and a better description.

-- Jack Krupansky

-----Original Message-----
From: Ahmet Arslan
Sent: Thursday, November 21, 2013 11:40 AM
To: solr-user@lucene.apache.org
Subject: Re: search with wildcard

Hi Andreas,

If you don't want to use wildcards at query time, an alternative is to use NGrams at indexing time. This will produce a lot of tokens. For example, the 4-grams of your example: Supertestplan => supe uper pert erte rtes *test* estp stpl tpla plan

Is that what you want? By the way, why do you want to search inside of words?

On Thursday, November 21, 2013 5:23 PM, Andreas Owen wrote:

I suppose I have to create another field with different tokenizers and set the boost very low so it doesn't really mess with my ranking, because the word is now in two fields. What kind of tokenizer can do the job?

From: Andreas Owen [mailto:a...@conx.ch]
Sent: Thursday, November 21, 2013 16:13
To: solr-user@lucene.apache.org
Subject: search with wildcard

I am querying "test" in Solr 4.3.1 over the field below and it's not finding all occurrences. It seems that if it is a substring of a word like "Supertestplan" it isn't found unless I use a wildcard: "*test*". This is right because of my tokenizer, but does someone know a way around this? I don't want to add wildcards because that messes up queries with multiple words.
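[Editor's note: a sketch of the filter Jack points to. The dictionary file name, its contents, and the size parameters are assumptions to be tuned for your data; the filter's attribute names are from the linked Javadoc.]

```xml
<!-- The dictionary file lists the simple words that compounds decompose
     into (one lowercase word per line, e.g. "super", "test", "plan").
     With those entries, "Supertestplan" also emits "test" and "plan"
     as extra tokens, so a plain query for "test" can match it. -->
<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
          dictionary="compound-words.txt"
          minWordSize="5" minSubwordSize="2" maxSubwordSize="15"/>
</analyzer>
```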