problem with wildcard

2010-11-11 Thread Jean-Sebastien Vachon

Hi All,

I'm having some trouble with a query that uses a wildcard, and I was wondering if
anyone could tell me why these two similar queries do not return the same number
of results. Basically, the query I'm making should return all docs whose title
starts with (or contains) the string lowe'. I suspect some analyzer is causing this
behaviour, and I'd like to know if there is a way to fix this problem.

1) select?q=*:*&fq=title:(+lowe')&debugQuery=on&rows=0

<result name="response" numFound="302" start="0"/>
<lst name="debug">
<str name="rawquerystring">*:*</str>
<str name="querystring">*:*</str>
<str name="parsedquery">MatchAllDocsQuery(*:*)</str>
<str name="parsedquery_toString">*:*</str>
<lst name="explain"/>
<str name="QParser">LuceneQParser</str>
<arr name="filter_queries">
<str>title:(  lowe')</str>
</arr>
<arr name="parsed_filter_queries">
<str>title:low</str>
</arr>

2) select?q=*:*&fq=title:(+lowe'*)&debugQuery=on&rows=0

<result name="response" numFound="0" start="0"/>
<lst name="debug">
<str name="rawquerystring">*:*</str>
<str name="querystring">*:*</str>
<str name="parsedquery">MatchAllDocsQuery(*:*)</str>
<str name="parsedquery_toString">*:*</str>
<lst name="explain"/>
<str name="QParser">LuceneQParser</str>
<arr name="filter_queries">
<str>title:(  lowe'*)</str>
</arr>
<arr name="parsed_filter_queries">
<str>title:lowe'*</str>
</arr>
...
</lst>


The title field is defined as:

<field name="title" type="text" indexed="true" stored="true" required="false"/>

where the text type is:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
            ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
      add enablePositionIncrements=true in both the index and query
      analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
</fieldType>






Re: problem with wildcard

2010-11-11 Thread Ahmet Arslan
> I'm having some trouble with a query using some wildcard
> and I was wondering if anyone could tell me why these two
> similar queries do not return the same number of results.
> Basically, the query I'm making should return all docs whose
> title starts
> (or contain) the string lowe'. I suspect some analyzer is
> causing this behaviour and I'd like to know if there is a
> way to fix this problem.
>
> 1)
> select?q=*:*&fq=title:(+lowe')&debugQuery=on&rows=0

wildcard queries are not analyzed http://search-lucene.com/m/pnmlH14o6eM1/


  


Re: problem with wildcard

2010-11-11 Thread Jean-Sebastien Vachon

On 2010-11-11, at 3:45 PM, Ahmet Arslan wrote:

>> I'm having some trouble with a query using some wildcard
>> and I was wondering if anyone could tell me why these two
>> similar queries do not return the same number of results.
>> Basically, the query I'm making should return all docs whose
>> title starts
>> (or contain) the string lowe'. I suspect some analyzer is
>> causing this behaviour and I'd like to know if there is a
>> way to fix this problem.
>>
>> 1)
>> select?q=*:*&fq=title:(+lowe')&debugQuery=on&rows=0
>
> wildcard queries are not analyzed http://search-lucene.com/m/pnmlH14o6eM1/
 

Yeah, I found out about this a couple of minutes after I posted my problem. If
no analyzer is applied to the wildcard term, then why is Solr not finding any
documents when a single quote precedes the wildcard?


Re: problem with wildcard

2010-11-11 Thread Ahmet Arslan
>>> select?q=*:*&fq=title:(+lowe')&debugQuery=on&rows=0
>>
>> wildcard queries are not analyzed http://search-lucene.com/m/pnmlH14o6eM1/
>
> Yeah I found out about this a couple of minutes after I
> posted my problem. If there is no analyzer then
> why is Solr not finding any documents when a single quote
> precedes the wildcard?


Probably your index analyzer (WordDelimiterFilterFactory) is eating that single
quote. You can verify this on the admin/analysis.jsp page. In other words, there
is no term in your index that begins with lowe'. You can try searching for just lowe*


  


RE: Problem with Wildcard searches in Solr

2010-07-13 Thread Bastian Spitzer
Hi,

to use leading wildcards you may have a look at this one: 
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200910.mbox/%3c4ac4b71c.6040...@gmail.com%3e

basically you just put a ReversedWildcardFilterFactory in your config and you
can use leading wildcards. 

good luck!

-Original Message-
From: imranak [mailto:imranak...@gmail.com] 
Sent: Monday, 12 July 2010 23:55
To: solr-user@lucene.apache.org
Subject: RE: Problem with Wildcard searches in Solr


Hi,

Thanks for your response. The dismax query parser doesn't support it, but I heard 
the edismax parser supports all kinds of wildcards. I've been trying it out but 
without any luck. Could someone please help me with that? I'm unable to make 
leading and in-the-middle wildcard searches work.

Thanks.

Imran.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-Wildcard-searches-in-Solr-tp961448p961617.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problem with Wildcard searches in Solr

2010-07-13 Thread Rebecca Watson
Hi,

earlier this week i started messing with getting wildcard queries to
be analysed

i've got some weird analysers doing stemming/lowercasing, and writing
the same rules into a custom queryparser didn't seem logical given
i just want the analysers to apply as they do at index time

i came up with the hack below, which is just a modified version of
the LuceneQParserPlugin, i.e. the solr default one, which creates
a SolrQueryParser query parser.

in the SolrQueryParser i override the getWildcardQuery function so
that i insert a call to my method - myWildcardQuery.

the myWildcardQuery method converts the wildcard term into an analysed
version which it returns (and at least lowercases the term if analysis fails
for some reason).

the myWildcardQuery method is just pulling in code from
lucene's QueryParser.getFieldQuery -- so all this code is a magical giant
cut and paste job right now (which you'll see when you look at the lucene/solr
classes involved!)

you use this custom queryparser in the usual way i.e.
by registering the queryparser in the solrconfig.xml file:
<queryParser name="ilexirQparser"
             class="com.ilexir.solr.search.ilexirQParserPlugin"/>
then call that queryparser in your request handler:
<requestHandler name="ilexir" class="solr.SearchHandler" default="true">
  <!-- default values for query parameters -->
  <lst name="defaults">
    <str name="defType">ilexirQparser</str>
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <int name="start">0</int>
    <str name="fl">*,score</str>
    <str name="version">2.2</str>
    <str name="wt">standard</str>
    <str name="indent">on</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
    <str>tvComponent</str>
  </arr>
</requestHandler>

i enable the leading wildcard queries using the reversedwildcard filter as per
previous email i.e. in the index-time analyser add in:
<filter class="solr.ReversedWildcardFilterFactory" />
(not at query time) -- then the lucene query parser picks up the use of this
filter and allows leading wildcard queries.
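
for illustration, a field type with the reversed-wildcard filter on the index
side only might look like the sketch below. the type name text_rev_sketch is
made up, and the attribute values are taken from the stock solr example schema
as an assumption, not from the poster's own config:

```xml
<!-- hypothetical field type; ReversedWildcardFilterFactory is index-time only -->
<fieldType name="text_rev_sketch" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- withOriginal keeps the normal token next to the reversed one -->
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

documents have to be re-indexed after a change like this, since the reversed
tokens only exist for content analysed with the new chain.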

of course, none of this is going to sort out trying to match against the query
co?mput?r because you've probably stemmed computer to comput or something
at index time -- but if you add in a copyfield to an extra field that
isn't stemmed
at query time, then query both the original + the non-stemmed field (boost
accordingly -- i.e. you might want to boost the original non-stemmed field
higher!) you'll get the right match then :)

i'd be interested to hear from lucene/solr contributors why wildcards aren't
analysed in general anyway?

anyway hope that helps :)

bec

--



import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CachingTokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.reverse.ReverseStringFilter;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.WildcardQuery;
import org.apache.solr.analysis.ReversedWildcardFilterFactory;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.LuceneQParserPlugin;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QueryParsing;
import org.apache.solr.search.SolrQueryParser;

/**
 * modifies the code from LuceneQParserPlugin i.e. the default query parser
 * plugin used by solr.
 * @author bec
 */
public class ilexirQParserPlugin extends LuceneQParserPlugin {
    public static String NAME = "lucene";

    public void init(NamedList args) {
    }

    public QParser createParser(String qstr, SolrParams localParams,
            SolrParams params, SolrQueryRequest req) {
        return new ilexirQParser(qstr, localParams, params, req);
    }
}

class ilexirQParser extends QParser {
    String sortStr;
    SolrQueryParser lparser;

    public ilexirQParser(String qstr, SolrParams localParams,
            SolrParams params, SolrQueryRequest req) {
        super(qstr, localParams, params, req);
    }

    public Query parse() throws ParseException {
        String qstr = getString();

        // fall back to the schema's default search field when no df is given
        String defaultField = getParam(CommonParams.DF);
        if (defaultField == null) {
            defaultField =
                getReq().getSchema().getDefaultSearchFieldName();
        }
        lparser = new SolrQueryParser(this, defaultField) {

            /**
             * adapted from lucene's QueryParser.getFieldQuery !!
  

Re: Problem with Wildcard searches in Solr

2010-07-13 Thread Rebecca Watson
hi,

sorry realised i had a typo:

> of course, none of this is going to sort out trying to match against the query
> co?mput?r because you've probably stemmed computer to comput or
> something
> at index time -- but if you add in a copyfield to an extra field that
> isn't stemmed
> at query time, then query both the original + the non-stemmed field (boost
> accordingly -- i.e. you might want to boost the original non-stemmed field
> higher!) you'll get the right match then :)


should read - but if you add in a copyfield to an extra field that
isn't stemmed at index time

bec :)


Re: Problem with Wildcard searches in Solr

2010-07-13 Thread imranak

Thank you so much guys. You solved my problem :) :)
The problem was that I was using stemming. I removed that and it works perfectly now.


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-Wildcard-searches-in-Solr-tp961448p963744.html
Sent from the Solr - User mailing list archive at Nabble.com.


Problem with Wildcard searches in Solr

2010-07-12 Thread imranak


Hi,

I am having a problem doing wildcard searches in Lucene syntax using the
edismax handler. I have a Solr 4.0 nightly build from the trunk.

A general search like 'computer' returns results but 'com*er' doesn't return
any results. Similarly, a search like 'co?mput?r' returns no results. The
only wildcard searches currently working are ones with trailing
wildcards (like compute? or comput*).

I want to be able to do searches with wildcards at the beginning (*puter)
and in between (com*er). Could someone please tell me what I am doing wrong
and how to fix it?

Thanks.

Regards,
Imran.

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-Wildcard-searches-in-Solr-tp961448p961448.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Problem with Wildcard searches in Solr

2010-07-12 Thread Markus Jelsma
Hi,

 

The DisMaxQParser does not support wildcards in its q parameter [1]. You must 
use the LuceneQParser instead. AFAIK, in DisMax, wildcards are part of the 
search query and may get filtered out in your query analyzer.

 

[1]: http://wiki.apache.org/solr/DisMaxRequestHandler#q

 

Cheers,
 


RE: Problem with Wildcard searches in Solr

2010-07-12 Thread imranak

Hi,

Thanks for your response. The dismax query parser doesn't support it, but I
heard the edismax parser supports all kinds of wildcards. I've been trying it out
but without any luck. Could someone please help me with that? I'm unable to
make leading and in-the-middle wildcard searches work.

Thanks.

Imran.


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-Wildcard-searches-in-Solr-tp961448p961617.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Problem with Wildcard searches in Solr

2010-07-12 Thread Markus Jelsma
Hi,

 

Check edismax' JIRA page and its unresolved related issues [1]. AFAIK, it 
hasn't been committed yet.

 

[1]: https://issues.apache.org/jira/browse/SOLR-1553

 

Cheers,
 


Re: Problem with Wildcard searches in Solr

2010-07-12 Thread Yonik Seeley
On Mon, Jul 12, 2010 at 4:39 PM, imranak <imranak...@gmail.com> wrote:
> A general search like 'computer' returns results but 'com*er' doesn't return
> any results.

This is due to issues with wildcards and stemming.
"computer" is indexed and searched as "comput"... but it's not
generally possible to stem wildcarded terms.

So comp*er won't match (the terms in the index are comput)
but comp*r should.

If wildcarding is important, use a field type without a stemmer.
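
A field type without a stemmer, as suggested above, could be sketched roughly
as follows. The names text_nostem, body and body_nostem are hypothetical, and
the filter chain is only an assumption modelled on the stock example schema;
the poster's real schema was not shown in the thread.

```xml
<!-- hypothetical field type: tokenize, drop stop words, lowercase, no stemmer -->
<fieldType name="text_nostem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- "body" stands in for whichever stemmed field is searched today -->
<field name="body_nostem" type="text_nostem" indexed="true" stored="false"/>
<copyField source="body" dest="body_nostem"/>
```

After re-indexing, the index holds the full term "computer" in body_nostem,
so a wildcard query against that field is compared with unstemmed terms.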

-Yonik
http://www.lucidimagination.com



Re: Problem with Wildcard...

2009-10-02 Thread Christian Zambrano
Another thing to remember about wildcard and fuzzy searches is that none 
of the token filters will be applied.


If you are using the LowerCaseFilterFactory at index time, then 
RI-MC500034-1 gets converted to ri-mc500034-1, which is never going to 
match RI-MC5000*.


Also, I would probably use the analysis page of your Solr admin site to 
see what tokens are produced from RI-MC500034-1 and 500034 based on 
your schema.


On 10/01/2009 02:42 AM, Shalin Shekhar Mangar wrote:
> On Tue, Sep 29, 2009 at 6:42 PM, Jörg Agatz <joerg.ag...@googlemail.com> wrote:
>
>> Hi Users...
>>
>> I have a problem.
>>
>> I have a lot of fields (type=text). To search in all fields, I copy all
>> fields into the default text field and use this for the default search.
>>
>> Now I want to search...
>>
>> This is in a field:
>>
>> RI-MC500034-1
>> When I search RI-MC500034-1 I find it...
>> If I search RI-MC5000* I don't.
>>
>> When I search 500034 I find it...
>> If I search 5000* I don't.
>>
>> What can I do to use the wildcards?
>
> I guess one thing you need to do is to add preserveOriginal=true in the
> WordDelimiterFilterFactory section in your field type. That would help match
> things like RI-MC5000*. Make sure you re-index all documents after this
> change.
>
> As for the others, add debugQuery=on as a request parameter and see how the
> query is being parsed. If you have a doubt, paste it on the list and we can
> help you.

   


Re: Problem with Wildcard...

2009-10-01 Thread Shalin Shekhar Mangar
On Tue, Sep 29, 2009 at 6:42 PM, Jörg Agatz <joerg.ag...@googlemail.com> wrote:

> Hi Users...
>
> I have a problem.
>
> I have a lot of fields (type=text). To search in all fields, I copy all
> fields into the default text field and use this for the default search.
>
> Now I want to search...
>
> This is in a field:
>
> RI-MC500034-1
> When I search RI-MC500034-1 I find it...
> If I search RI-MC5000* I don't.
>
> When I search 500034 I find it...
> If I search 5000* I don't.
>
> What can I do to use the wildcards?


I guess one thing you need to do is to add preserveOriginal=true in the
WordDelimiterFilterFactory section in your field type. That would help match
things like RI-MC5000*. Make sure you re-index all documents after this
change.

As for the others, add debugQuery=on as a request parameter and see how the
query is being parsed. If you have a doubt, paste it on the list and we can
help you.
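
As a rough sketch of that change: the filter line below is the index-time
WordDelimiterFilterFactory entry with preserveOriginal switched on. The other
attribute values are assumed from the stock example schema, since the actual
field type was not posted, and the flag is written as "1" here only because
the example schema writes the neighbouring flags as 0/1.

```xml
<!-- index-time analyzer of the text field type; only preserveOriginal is new -->
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="1" catenateNumbers="1"
        catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
```

If a LowerCaseFilterFactory follows this filter, the preserved token is
indexed in lower case, so the wildcard term itself would need to be sent in
lower case as well, for example ri-mc5000*.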

-- 
Regards,
Shalin Shekhar Mangar.


Problem with Wildcard...

2009-09-29 Thread Jörg Agatz
Hi Users...

I have a problem.

I have a lot of fields (type=text). To search in all fields, I copy all
fields into the default text field and use this for the default search.

Now I want to search...

This is in a field:

RI-MC500034-1
When I search RI-MC500034-1 I find it...
If I search RI-MC5000* I don't.

When I search 500034 I find it...
If I search 5000* I don't.

What can I do to use the wildcards?

KingArtus


Re: Problem using wildcard characters ? and *

2008-02-24 Thread Chris Hostetter

If you are using the text field from the example schema with the 
EnglishPorterFilterFactory, the word "create" gets stemmed to "creat", 
which would explain your problem.

In general, stemming and wildcard queries don't work well together.

:  When i give a search for au?it (audit is the name of the block) it
: shows correct results.
:  But when i try same thing with crea?e (create is the name of the
: block) no results are displayed..
:  Both audit and create are stored in the same place.

-Hoss



Problem using wildcard characters ? and *

2008-02-18 Thread ruchalale

Hi,
I am using Solr in my application to search some blocks. 
These blocks have a unique key = block name + block id.

 When I try to search for a block using '?' it works only partially.
 When I search for au?it (audit is the name of the block) it
shows correct results.
 But when I try the same thing with crea?e (create is the name of the
block) no results are displayed.
 Both audit and create are stored in the same place.

 Also, the wildcard character '*' works only partially.
If I search for del* (I want to search for the delete block), it
shows two results:
  delete and
  softdelete (this should not have been returned)

 I tried changing the case as well, but it does not help. Can somebody
please explain the reason?

-- 
View this message in context: 
http://www.nabble.com/Problem-using-wildcard-characters---and-*-tp15554272p15554272.html
Sent from the Solr - User mailing list archive at Nabble.com.