RE: search with wildcard

2013-11-21 Thread Scott Schneider
I know it's documented that Lucene/Solr doesn't apply filters to queries with 
wildcards, but this seems to trip up a lot of users.  I can also see why 
wildcards break a number of filters, but some filters (e.g. mapping 
charsets) could mostly or entirely work.  The N-gram filter is another one that 
would be great to still run when there are wildcards.  If you indexed 4-grams 
and the query is *testp*, you currently won't get any results; but the N-gram 
filter could have a wildcard mode that, in this case, would return just the 
first 4-gram as a token.
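
A minimal sketch of what such a wildcard mode might do, written in Python 
purely for illustration. The function names and the behaviour are assumptions; 
nothing like this exists in Lucene/Solr as far as this thread establishes.

```python
def ngrams(text, n):
    """Return all contiguous character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def wildcard_mode_grams(query, n):
    """Hypothetical 'wildcard mode': strip leading/trailing '*' and keep
    only the first n-gram of the literal part of the query."""
    literal = query.strip("*")
    # If the literal part is shorter than n there is no n-gram to return.
    return ngrams(literal, n)[:1]


print(wildcard_mode_grams("*testp*", 4))  # ['test']
```

With 4-grams indexed, the single token "test" would match any document whose 
grams include it, at the cost of also matching words that contain "test" but 
not "testp".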

Is this something you've considered?  It would have to be enabled in the core 
network, but disabled by default for existing filters; then it could be enabled 
1-by-1 for existing filters.  Apologies if the dev list is a better place for 
this.

Scott




Re: search with wildcard

2013-11-21 Thread Ahmet Arslan
Hi Andreas,

If you don't want to use wildcards at query time, an alternative is to use 
NGrams at indexing time. This will produce a lot of tokens. For example, the 
4-grams of your example: Supertestplan = supe uper pert erte rtes *test* estp 
stpl tpla plan


Is that what you want? By the way, why do you want to search inside of words?

<filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="4"/>
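
To make the token count concrete, here is a small Python sketch that 
enumerates character n-grams the way the example above lists them. It is only 
an illustration: the function name is made up, and the order in which the real 
filter emits grams may differ.

```python
def ngrams(text, min_size, max_size):
    """Return all character n-grams of text with min_size <= n <= max_size."""
    grams = []
    for size in range(min_size, max_size + 1):
        for start in range(len(text) - size + 1):
            grams.append(text[start:start + size])
    return grams


four_grams = ngrams("supertestplan", 4, 4)
print(four_grams)
print("test" in four_grams)                 # True
print(len(ngrams("supertestplan", 3, 4)))   # 21 tokens for one word
```

A plain term query for "test" then matches the document without any wildcard, 
because "test" is one of the indexed grams.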





RE: search with wildcard

2013-11-21 Thread Andreas Owen
I suppose I have to create another field with different tokenizers and set
the boost very low so it doesn't really mess with my ranking, because the
word is then in two fields. What kind of tokenizer can do the job?
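
One way the two-field idea could look at query time, sketched in Python. This 
assumes the edismax query parser is used; the field name text_ngram and the 
boost values are hypothetical and would need tuning.

```python
from urllib.parse import urlencode


def build_params(user_query):
    """Build request parameters that search both fields.

    text_de is the existing stemmed field; text_ngram is a hypothetical
    second field analyzed with n-grams and given a much lower boost.
    """
    return {
        "q": user_query,
        "defType": "edismax",
        "qf": "text_de^1.0 text_ngram^0.1",
    }


# Query string to append to the select handler URL.
print(urlencode(build_params("test")))
```

The low boost on the second field means a substring match still returns the 
document, while whole-word matches in text_de keep dominating the ranking.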

 

From: Andreas Owen [mailto:a...@conx.ch] 
Sent: Donnerstag, 21. November 2013 16:13
To: solr-user@lucene.apache.org
Subject: search with wildcard

 

I am querying "test" in Solr 4.3.1 over the field below and it's not finding
all occurrences. It seems that if it is a substring of a word like
Supertestplan, it isn't found unless I use wildcards: *test*. This is
expected because of my tokenizer, but does someone know a way around it? I
don't want to add wildcards because that messes up queries with multiple
words.

 

<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- remove common words -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_de.txt" format="snowball"
            enablePositionIncrements="true"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <!-- remove noun/adjective inflections like plural endings -->
    <filter class="solr.SnowballPorterFilterFactory" language="German"/>
  </analyzer>
</fieldType>



Re: search with wildcard

2013-11-21 Thread Jack Krupansky
You might be able to make use of the dictionary compound word filter, but 
you will have to build up a dictionary of words to use:


http://lucene.apache.org/core/4_5_1/analyzers-common/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilterFactory.html
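
A rough Python sketch of the idea behind dictionary-based decompounding: emit 
every dictionary word found inside the token. This is a simplification for 
illustration, not the filter's actual code; the function name, the parameter 
names and the three-word dictionary are assumptions.

```python
def decompound(token, dictionary, min_subword=2, max_subword=15):
    """Return every dictionary word that occurs as a substring of token."""
    lowered = token.lower()
    parts = []
    for start in range(len(lowered)):
        for size in range(min_subword, max_subword + 1):
            end = start + size
            if end > len(lowered):
                break
            candidate = lowered[start:end]
            if candidate in dictionary:
                parts.append(candidate)
    return parts


words = {"super", "test", "plan"}  # hypothetical dictionary
print(decompound("Supertestplan", words))  # ['super', 'test', 'plan']
```

Compared with n-grams, this produces far fewer tokens, but only finds the 
subwords that are present in the dictionary.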

My e-book has some examples and a better description.

-- Jack Krupansky




Re: Search Phrase Wildcard?

2009-06-12 Thread Sandeep Tagore

Yes, you can search for phrases with wildcards.
There is no direct support for it, but you can achieve it like
this:

User input:  Solr we
Query should be: (name:Solr AND (name:we* OR name:we)) OR name:"Solr we"

The query builder parses the original input and builds one that simulates a
wildcard phrase query. It looks for all the words the user entered and adds
a wildcard (*) to the last word. It also searches for the whole phrase the
user entered using a phrase query in case the whole phrase is found in the
index. This should work!
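
The query builder described above could be sketched in Python as follows. The 
function name is made up for illustration, and the sketch does not escape 
query-syntax characters in the user input, which a real builder would need to 
do.

```python
def build_wildcard_phrase_query(field, user_input):
    """Simulate a wildcard phrase query for the given field.

    All words must match, the last word may match as a prefix, and the
    exact phrase is added as an alternative.
    """
    words = user_input.split()
    if not words:
        return ""
    *head, last = words
    clauses = [f"{field}:{word}" for word in head]
    # The last word matches either as a prefix or as a whole term.
    clauses.append(f"({field}:{last}* OR {field}:{last})")
    all_words = " AND ".join(clauses)
    phrase_text = " ".join(words)
    return f'({all_words}) OR {field}:"{phrase_text}"'


print(build_wildcard_phrase_query("name", "Solr we"))
# (name:Solr AND (name:we* OR name:we)) OR name:"Solr we"
```

Note that this matches the words in any order and at any distance, so it only 
approximates a phrase query with a trailing wildcard.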

Let me know if you have any issues.
-- 
View this message in context: 
http://www.nabble.com/Search-Phrase-Wildcard--tp23978330p23996409.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Search Phrase Wildcard?

2009-06-11 Thread Aleksander M. Stensby

Solr does not support wildcards in phrase queries, yet.

Cheers,
 Aleks

On Thu, 11 Jun 2009 11:48:13 +0200, Samnang Chhun  
samnang.ch...@gmail.com wrote:



Hi all,
I have my document like this:

<doc>
  <name>Solr web service</name>
</doc>

Is there any way that I can search like startswith:

So* We* : found
Sol*: found
We*: not found
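
The difference between the behaviour asked for here and ordinary term-level 
wildcards can be illustrated with a small Python sketch. It only models the 
matching logic: whitespace splitting and lowercasing stand in for a real 
analyzer, and both function names are invented for the example.

```python
from fnmatch import fnmatchcase


def matches_any_term(value, pattern):
    """Wildcard against each token, as on a tokenized text field."""
    terms = value.lower().split()
    return any(fnmatchcase(term, pattern.lower()) for term in terms)


def matches_whole_value(value, pattern):
    """Wildcard against the whole value, as on an untokenized field."""
    return fnmatchcase(value.lower(), pattern.lower())


name = "Solr web service"
print(matches_any_term(name, "We*"))         # True: the token "web" matches
print(matches_whole_value(name, "We*"))      # False: value starts with "Solr"
print(matches_whole_value(name, "Sol*"))     # True
print(matches_whole_value(name, "So* We*"))  # True
```

Matching against the whole value gives the three results listed above, which 
suggests an untokenized copy of the field is one way to get startswith 
semantics.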

Cheers,
Samnang




--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco

Please consider the environment before printing all or any of this e-mail


Re: Search Phrase Wildcard?

2009-06-11 Thread Avlesh Singh
In fact, Lucene does not support that.

Lucene supports single and multiple character wildcard searches within
 single terms (*not within phrase queries*).


Taken from
http://lucene.apache.org/java/2_3_2/queryparsersyntax.html#Wildcard%20Searches

Cheers
Avlesh




Re: Search Phrase Wildcard?

2009-06-11 Thread Aleksander M. Stensby
Well yes :) Since Solr does in fact support the entire Lucene query parser 
syntax :)


- Aleks



Re: Search Phrase Wildcard?

2009-06-11 Thread Mark Miller
You might be interested in this Lucene issue: 
https://issues.apache.org/jira/browse/LUCENE-1486




--
- Mark

http://www.lucidimagination.com