RE: search with wildcard

2013-11-21 Thread Scott Schneider
I know it's documented that Lucene/Solr doesn't apply filters to queries with 
wildcards, but this seems to trip up a lot of users.  I can see why wildcards 
break a number of filters, but others (e.g. mapping charsets) could mostly or 
entirely work.  The N-gram filter is another one that would be great to still 
run when there are wildcards.  If you indexed 4-grams and the query is 
"*testp*", you currently won't get any results; but the N-gram filter could 
have a wildcard mode that, in this case, would return just the first 4-gram 
("test") as a token.
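
To make the idea concrete, here is a minimal Python sketch of what such a wildcard mode might do. It is only an illustration of the proposal, not anything Lucene/Solr provides; the function name `wildcard_mode_token` and its behavior for inner wildcards are assumptions.

```python
from typing import Optional


def wildcard_mode_token(query: str, n: int = 4) -> Optional[str]:
    """Hypothetical 'wildcard mode': reduce *literal* to its first n-gram.

    Returns None when the literal part is shorter than n or still
    contains a wildcard, since this sketch does not handle those cases.
    """
    literal = query.strip("*").lower()
    if "*" in literal or "?" in literal or len(literal) < n:
        return None
    return literal[:n]


# The token that would be looked up in a 4-gram index for "*testp*".
print(wildcard_mode_token("*testp*"))
```

Looking up only the first 4-gram matches a superset of the documents that contain "testp", so a real implementation would probably need to check further grams or verify the candidates afterwards.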

Is this something you've considered?  It would have to be enabled in the core 
network, but disabled by default for existing filters; then it could be enabled 
one by one for existing filters.  Apologies if the dev list is a better place 
for this.

Scott


> -----Original Message-----
> From: Ahmet Arslan [mailto:iori...@yahoo.com]
> Sent: Thursday, November 21, 2013 8:40 AM
> To: solr-user@lucene.apache.org
> Subject: Re: search with wildcard
> 
> Hi Andreas,
> 
> If you don't want to use wildcards at query time, an alternative is to
> use NGrams at indexing time. This will produce a lot of tokens. For
> example, the 4-grams of your example: Supertestplan => supe uper pert
> erte rtes *test* estp stpl tpla plan
> 
> 
> Is that what you want? By the way, why do you want to search inside of
> words?
> 
>  maxGramSize="4"/>
> 
> 
> 
> 
> On Thursday, November 21, 2013 5:23 PM, Andreas Owen 
> wrote:
> 
> I suppose I have to create another field with different tokenizers and
> set the boost very low so it doesn't really mess with my ranking, because
> the word is now in two fields. What kind of tokenizer can do the job?
> 
> 
> 
> From: Andreas Owen [mailto:a...@conx.ch]
> Sent: Thursday, 21 November 2013 16:13
> To: solr-user@lucene.apache.org
> Subject: search with wildcard
> 
> 
> 
> I am querying "test" in Solr 4.3.1 over the field below and it's not
> finding all occurrences. It seems that if it is a substring of a word like
> "Supertestplan" it isn't found unless I use wildcards: "*test*". This is
> correct behavior given my tokenizer, but does someone know a way around
> this? I don't want to add wildcards because that messes up queries with
> multiple words.
> 
> 
> 
>  positionIncrementGap="100">
>          words="lang/stopwords_de.txt" format="snowball" enablePositionIncrements="true"/>
>          class="solr.SnowballPorterFilterFactory" language="German"/>


Re: search with wildcard

2013-11-21 Thread Ahmet Arslan
Hi Andreas,

If you don't want to use wildcards at query time, an alternative is to use 
NGrams at indexing time. This will produce a lot of tokens. For example, the 
4-grams of your example: Supertestplan => supe uper pert erte rtes 
*test* estp stpl tpla plan
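
As a rough illustration of the 4-gram list above, the following Python sketch generates character n-grams for a single token. It is not Solr's implementation; the function name `char_ngrams` and the lowercasing step are assumptions made for the example.

```python
from typing import List


def char_ngrams(token: str, n: int = 4) -> List[str]:
    """Return all contiguous character n-grams of a lowercased token."""
    token = token.lower()
    # A token shorter than n yields an empty list.
    return [token[i:i + n] for i in range(len(token) - n + 1)]


grams = char_ngrams("Supertestplan")
print(grams)

# A plain query for "test" now matches, because "test" is an indexed token.
print("test" in grams)
```

A 13-character word yields 10 tokens at gram size 4, which is why the index grows quickly with this approach.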


Is that what you want? By the way, why do you want to search inside of words?






On Thursday, November 21, 2013 5:23 PM, Andreas Owen  wrote:
 
I suppose I have to create another field with different tokenizers and set
the boost very low so it doesn't really mess with my ranking, because
the word is now in two fields. What kind of tokenizer can do the job?



From: Andreas Owen [mailto:a...@conx.ch] 
Sent: Thursday, 21 November 2013 16:13
To: solr-user@lucene.apache.org
Subject: search with wildcard



I am querying "test" in Solr 4.3.1 over the field below and it's not finding
all occurrences. It seems that if it is a substring of a word like
"Supertestplan" it isn't found unless I use wildcards: "*test*". This is
correct behavior given my tokenizer, but does someone know a way around this?
I don't want to add wildcards because that messes up queries with multiple
words.




RE: search with wildcard

2013-11-21 Thread Andreas Owen
I suppose I have to create another field with different tokenizers and set
the boost very low so it doesn't really mess with my ranking, because
the word is now in two fields. What kind of tokenizer can do the job?
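
One way to read the two-field idea is as a query-time boost. The Python sketch below builds request parameters for Solr's edismax parser, whose `qf` parameter accepts `field^boost` pairs. The field names `text` and `text_ngram`, the boost values, and the `/select` path are assumptions for illustration, not taken from the schema in this thread.

```python
from urllib.parse import urlencode


def build_query(term: str) -> str:
    """Build a select URL path that searches two fields with different boosts."""
    params = {
        "q": term,
        "defType": "edismax",
        # Hypothetical fields: the normal field dominates the ranking,
        # the n-gram field only contributes a small score.
        "qf": "text^1.0 text_ngram^0.1",
    }
    return "/select?" + urlencode(params)


print(build_query("test"))
```

With a low boost on the n-gram field, substring-only matches should still be found but rank below whole-word matches.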

 

From: Andreas Owen [mailto:a...@conx.ch] 
Sent: Thursday, 21 November 2013 16:13
To: solr-user@lucene.apache.org
Subject: search with wildcard

 

I am querying "test" in Solr 4.3.1 over the field below and it's not finding
all occurrences. It seems that if it is a substring of a word like
"Supertestplan" it isn't found unless I use wildcards: "*test*". This is
correct behavior given my tokenizer, but does someone know a way around this?
I don't want to add wildcards because that messes up queries with multiple
words.


Re: search with wildcard

2013-11-21 Thread Jack Krupansky
You might be able to make use of the dictionary compound word filter, but 
you will have to build up a dictionary of words to use:


http://lucene.apache.org/core/4_5_1/analyzers-common/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilterFactory.html

My e-book has some examples and a better description.
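
For readers unfamiliar with dictionary-based decompounding, here is a simplified Python sketch of the general technique: keep the original token and also emit every dictionary word found inside it. It is not the Lucene filter's code; the function name `decompound`, the size limits, and the three-word dictionary are assumptions for the example.

```python
from typing import List, Set


def decompound(token: str, dictionary: Set[str],
               min_subword: int = 2, max_subword: int = 15) -> List[str]:
    """Return the lowercased token plus all dictionary words it contains."""
    lowered = token.lower()
    tokens = [lowered]  # the original token is kept
    for start in range(len(lowered)):
        for length in range(min_subword, max_subword + 1):
            if start + length > len(lowered):
                break
            part = lowered[start:start + length]
            if part in dictionary:
                tokens.append(part)
    return tokens


# Illustrative dictionary; a real one would have to be built for the corpus.
print(decompound("Supertestplan", {"super", "test", "plan"}))
```

Compared with n-grams this produces far fewer tokens, at the cost of only finding substrings that are in the dictionary.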

-- Jack Krupansky

-----Original Message-----
From: Ahmet Arslan

Sent: Thursday, November 21, 2013 11:40 AM
To: solr-user@lucene.apache.org
Subject: Re: search with wildcard

Hi Andreas,

If you don't want to use wildcards at query time, an alternative is to use 
NGrams at indexing time. This will produce a lot of tokens. For example, the 
4-grams of your example: Supertestplan => supe uper pert erte 
rtes *test* estp stpl tpla plan



Is that what you want? By the way, why do you want to search inside of words?






On Thursday, November 21, 2013 5:23 PM, Andreas Owen  wrote:

I suppose I have to create another field with different tokenizers and set
the boost very low so it doesn't really mess with my ranking, because
the word is now in two fields. What kind of tokenizer can do the job?



From: Andreas Owen [mailto:a...@conx.ch]
Sent: Thursday, 21 November 2013 16:13
To: solr-user@lucene.apache.org
Subject: search with wildcard



I am querying "test" in Solr 4.3.1 over the field below and it's not finding
all occurrences. It seems that if it is a substring of a word like
"Supertestplan" it isn't found unless I use wildcards: "*test*". This is
correct behavior given my tokenizer, but does someone know a way around this?
I don't want to add wildcards because that messes up queries with multiple
words.