Multi tokenizer

2008-12-10 Thread Antonio Zippo
Hi all,

I need to tokenize my field on whitespace, HTML, punctuation, and apostrophes,

but if I use HTMLStripStandardTokenizerFactory it strips only HTML, not
apostrophes.

If I use PatternTokenizerFactory, I don't know if I can create a pattern that
tokenizes on all of these characters (HTML, apostrophes, ...).
I can filter these chars with the pattern [^0-9A-Za-z], but if I use that in a
filter with a replacement it breaks my text.
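For illustration, an analyzer chain along these lines might cover all three
cases. This is only a sketch; the WordDelimiterFilterFactory settings are
assumptions, not tested values:

  <fieldType name="text_stripped" class="solr.TextField">
    <analyzer>
      <!-- strips HTML, then tokenizes roughly like StandardTokenizer -->
      <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
      <!-- splits the remaining tokens on punctuation and apostrophes -->
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
              generateNumberParts="1" catenateWords="0" catenateAll="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>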

could you help me to solve this problem?

Bye



  

dismax difference between q=text:+toto AND q=toto

2008-12-10 Thread sunnyfr

Hi,

I would like to understand the difference between q=text:+toto and q=toto.

/select?fl=*&qt=dismax&q=text:+toto : 4 docs found.
<lst name="params">
  <str name="fl">*</str>
  <str name="q">text: toto</str>
  <str name="qt">dismax</str>

/select?fl=*&qt=dismax&q=toto : 5682 docs found.
<lst name="params">
  <str name="fl">*</str>
  <str name="q">toto</str>
  <str name="qt">dismax</str>


My schema just has a stored text field; I don't understand this big difference.
Thanks a lot for your time, 

-- 
View this message in context: 
http://www.nabble.com/dismax-difference-between--q%3Dtext%3A%2Btoto-AND-q%3Dtoto-tp20932303p20932303.html
Sent from the Solr - User mailing list archive at Nabble.com.



Value based boosting - Design Help

2008-12-10 Thread ayyanar

We have a requirement for a keyword search in one of our projects and we are
using Solr/Lucene for the same.   

We have the data: link_id, title, url, and a collection of keywords
associated with a link_id. Right now we have indexed link_id, title, url and
keywords (multivalued field) in a single index.
 

Also, in our requirement each keyword value has a weight associated with it,
and this weight is calculated based on certain factors (e.g. if the keyword
exists in the title then it takes a specific weight, etc.). This weight should
drive the relevancy of the search results. For example, when a user enters a
keyword such as "Biology" and clicks search, we search the keywords field in
the index. The document that contains the searched keyword with the higher
weight should come first.

 

Eg:

 

Document 1:

LinkID = 100

Title = Biology

Keywords = Biology, BioNews, Bio, Bio chemistry

 

Document 2:

LinkID = 102

Title = Nutrition

Keywords = Biology, Nutrition, Dietetics

 

In the above example, document 1 should come first because we will associate
more weight with the keyword biology for link id 100 in document 1.

 

We understand that this weight can be applied as a boost to a field. The
problem is that in Solr/Lucene we cannot associate a different boost with
different values of the same field.

 

It would be very helpful for us if you can provide your thoughts/inputs on
how to achieve this requirement in Lucene:

 

Do we have a way to associate a different boost to different values of the
same field?
Can we maintain the list of keywords associated to each link_id in a
separate index, so that we can associate weight to each keyword value? If
so, how do we relate the main index and the keyword index? 
 


-- 
View this message in context: 
http://www.nabble.com/Value-based--boosting---Design-Help-tp20934304p20934304.html
Sent from the Solr - User mailing list archive at Nabble.com.



Setting Request Handler

2008-12-10 Thread Deshpande, Mukta
Hi,
 
I have a request handler in my solrconfig.xml : /spellCheckCompRH 
It utilizes the search component spellcheck.
 
When I specify following query in browser, I get correct spelling
suggestions from the file dictionary.
 
http://localhost:8080/solr/spellCheckCompRH/?q=SolrDocs&spellcheck.q=relevancy&spellcheck=true&fl=title,score&spellcheck.dictionary=file
 
Now I write a java program to achieve the same result:
 
Code snippet

 .
 .
 
server = new CommonsHttpSolrServer("http://localhost:8080/solr");
 .
 .
SolrQuery query = new SolrQuery();
query.setQuery("solr");
query.setFields("*,score");
query.set("qt", "spellCheckCompRH");
query.set("spellcheck", true);
query.set(SpellingParams.SPELLCHECK_DICT, "file");
query.set(SpellingParams.SPELLCHECK_Q, "solt");
 .
 .
QueryResponse rsp = server.query( query );
SolrDocumentList docs = rsp.getResults();
SpellCheckResponse srsp = rsp.getSpellCheckResponse();
 
I get documents for my query but I do not get any spelling suggestions.
I think that the request handler is not getting set for the query
correctly.
 
Can someone please help. 
 
Best Regards,
Mukta


Re: dismax difference between q=text:+toto AND q=toto

2008-12-10 Thread Erik Hatcher
dismax doesn't support field selection in its query syntax, only via
the qf parameter.


Add debugQuery=true to see how the queries are being parsed; that'll
reveal what is going on.
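For example (assuming "text" is one of the qf fields):

  /select?qt=dismax&q=toto&qf=text&debugQuery=true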


Erik


On Dec 10, 2008, at 5:07 AM, sunnyfr wrote:



Hi,

I would like to understand the difference between q=text:+toto and q=toto.

/select?fl=*&qt=dismax&q=text:+toto : 4 docs found.
<lst name="params">
  <str name="fl">*</str>
  <str name="q">text: toto</str>
  <str name="qt">dismax</str>

/select?fl=*&qt=dismax&q=toto : 5682 docs found.
<lst name="params">
  <str name="fl">*</str>
  <str name="q">toto</str>
  <str name="qt">dismax</str>


My schema just has a stored text field; I don't understand this big difference.
Thanks a lot for your time,

--
View this message in context: 
http://www.nabble.com/dismax-difference-between--q%3Dtext%3A%2Btoto-AND-q%3Dtoto-tp20932303p20932303.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: full-import and empty ./core/data/index

2008-12-10 Thread Shalin Shekhar Mangar
On Wed, Dec 10, 2008 at 4:23 PM, Marc Sturlese [EMAIL PROTECTED]wrote:


 Is there any way to start Solr with the index folder empty without getting
 an error? What I would like to do is start with the empty folder, do a
 full-import (which would create the index from scratch) and from there keep
 updating it with delta-import.
 At the moment I must have something in the index folder at the beginning.
 Otherwise I get an error.


You can delete the index folder (but keep the data folder) and Solr will
create it at the start. There should be no errors.
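In other words, something like this before starting Solr (the path here is
hypothetical):

  rm -rf ./core/data/index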

-- 
Regards,
Shalin Shekhar Mangar.


Re: Value based boosting - Design Help

2008-12-10 Thread Shalin Shekhar Mangar
On Wed, Dec 10, 2008 at 5:54 PM, ayyanar
[EMAIL PROTECTED]wrote:


 Also, in our requirement each keyword value has a weight associated with it,
 and this weight is calculated based on certain factors (e.g. if the keyword
 exists in the title then it takes a specific weight, etc.). This weight
 should drive the relevancy of the search results. For example, when a user
 enters a keyword such as "Biology" and clicks search, we search the keywords
 field in the index. The document that contains the searched keyword with the
 higher weight should come first.

 It would be very helpful for us if you can provide your thoughts/inputs on
 how to achieve this requirement in Lucene:

 Do we have a way to associate a different boost to different values of the
 same field?


So you are searching only on the keywords field and not the title field? You
can search on both the title and the keywords field and provide different
boosts to the title field.

Why do you want to assign weights to keywords? If all keywords which are in
the title are supposed to be more relevant than keywords that appear only in
the keywords field, then assigning a boost value to the title field is
enough. Is there any other use-case?



 Can we maintain the list of keywords associated to each link_id in a
 separate index, so that we can associate weight to each keyword value? If
 so, how do we relate the main index and the keyword index?


No, joins like these are not possible in Lucene/Solr. Lucene has payloads,
which can be used for boosting a particular term, but that functionality is
not available in Solr. Look at BoostingTermQuery in Lucene for how to use it.

http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/payloads/BoostingTermQuery.html
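For illustration, a rough sketch of the query side against the Lucene 2.4
payload API. This is only a sketch: the index-time side (a TokenFilter that
stores each keyword's weight as a payload via Token.setPayload) is assumed,
and the single-byte weight encoding is just an illustration.

import org.apache.lucene.index.Term;
import org.apache.lucene.search.DefaultSimilarity;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.payloads.BoostingTermQuery;

public class PayloadSearchSketch {
  public static void main(String[] args) throws Exception {
    IndexSearcher searcher = new IndexSearcher("/path/to/index"); // hypothetical
    // Make the indexed payload contribute to the score of each matching term.
    searcher.setSimilarity(new DefaultSimilarity() {
      public float scorePayload(String field, byte[] payload, int offset, int length) {
        return payload[offset]; // assumes the weight was stored as a single byte
      }
    });
    BoostingTermQuery q = new BoostingTermQuery(new Term("keywords", "biology"));
    TopDocs hits = searcher.search(q, null, 10);
    System.out.println("total hits: " + hits.totalHits);
  }
}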

-- 
Regards,
Shalin Shekhar Mangar.


Re: Problems with SOLR-236 (field collapsing)

2008-12-10 Thread Doug Steigerwald
The first output is from the query component.  You might just need to  
make the collapse component first and remove the query component  
completely.


We perform geographic searching with localsolr first (if we need to),  
and then try to collapse those results (if collapse=true).  If we  
don't have any results yet, that's the only time we use the standard  
query component.  I'm making sure we set the  
builder.setNeedDocSet=false and then I modified the query component to  
only execute when builder.isNeedDocSet=true.


In the field collapsing patch that I'm using, I've got code to remove  
a previous 'response' from the builder.rsp so we don't have duplicates.


Now, if I could get field collapsing to work properly with a docSet/ 
docList from localsolr and also have faceting work, I'd be golden.


Doug

On Dec 9, 2008, at 9:37 PM, Stephen Weiss wrote:


Hi Tracy,

Well, I managed to get it working (I think) but the weird thing is,  
in the XML output it gives both recordsets (the filtered and  
unfiltered - filtered second).  In the JSON (the one I actually use  
anyway, at least) I only get the filtered results (as expected).


In my core's solrconfig.xml, I added:

  <searchComponent name="collapse"
class="org.apache.solr.handler.component.CollapseComponent" />


(I'm not sure if it's supposed to go anywhere in particular but for  
me it's right before StandardRequestHandler)


and then within StandardRequestHandler:

 <requestHandler name="standard" class="solr.StandardRequestHandler">
   <!-- default values for query parameters -->
   <lst name="defaults">
     <str name="echoParams">explicit</str>
     <!--
     <int name="rows">10</int>
     <str name="fl">*</str>
     <str name="version">2.1</str>
     -->
   </lst>
   <arr name="components">
     <str>query</str>
     <str>facet</str>
     <str>mlt</str>
     <str>highlight</str>
     <str>debug</str>
     <str>collapse</str>
   </arr>
 </requestHandler>


Which is basically all the default values plus collapse.  Not sure  
if this was needed for prior versions, I don't see it in any patch  
files (I just got a vague idea from looking at a comment from  
someone else who said it wasn't working for them).  It would kinda  
be nice if someone working on the code might throw us a bone and say  
explicitly what the right options to put in the config file are (if  
there are even supposed to be any - for all I know, this is just a  
bandaid over a larger problem).  I know it's not done yet though...  
just a pointer for this patch might be handy, it's really a useful  
feature if it works (I was kinda shocked this wasn't part of the  
standard distribution since it's something I had to do so often with  
mysql, kinda lucky I guess that it only came up now).


Another issue I'm having now is the faceting doesn't seem to change
- even if I set the collapse.facet option to "after"...  I should
really try "before" and see what happens.


Of course, I just realized the integrity of my collapse field is not  
so great so I have to go back and redo the data :-)


Best of luck.

--
Steve

On Dec 9, 2008, at 7:49 PM, Tracy Flynn (SOLR) wrote:


Steve,

I need this too. As my previous posting said, I adapted the 1.2  
field collapsing back at the beginning of the year, so I'm somewhat  
familiar.


I'll try and get a look this weekend. It's the earliest I'm likely
to get spare cycles. I'll post any results.


Tracy

On Dec 9, 2008, at 4:18 PM, Stephen Weiss wrote:


Hi,

I'm trying to use field collapsing with our SOLR but I just can't  
seem to get it to do anything.


I've downloaded a dist copy of solr 1.3 and applied Ivan de  
Prado's patch - reading through the source code, the patch  
definitely was applied successfully (all the changes are in the  
right places, I've checked every single one).


I've run ant clean, ant compile, and ant dist to produce the war  
file in the dist/ folder, and then put the war file in place and  
restarted jetty.  According to the logs, jetty is definitely  
loading the right war file.  If I expand the war file and grep  
through the files, it would appear the collapsing code is there.


However, when I add any sort of collapse parameters (I've tried
every combination of collapse=true, collapse.field=link_id,
collapse.threshold=1, collapse.type=normal, and collapse.info.doc=true),
the result set is no different from a normal query, and there is no
collapse data returned in the XML.


I'm not a java developer, this is my first time using ant period,  
and I'm just following basic directions I found on google.



Here is the output of the compilation process:



I really need this patch to work for a project...  Can someone  
please tell me what I'm missing to get this to work?  I can't  
really find any documentation beyond adding the collapse options  
to the query string, so it's hard to tell - is there an option in  
solrconfig.xml or in the core configuration that needs to be set?   
Am I going about this entirely the wrong way?


Thanks for any advice, I 

Re: How can I look for tom & jerry

2008-12-10 Thread Shalin Shekhar Mangar
On Wed, Dec 10, 2008 at 5:12 PM, sunnyfr [EMAIL PROTECTED] wrote:


 When I look for this expression it stops the search at the &, taking
 that for a parameter I guess.


You will need to URL encode the query parameter before you make the request.

URLEncoder.encode("tom & jerry", "UTF-8");

If you are using SolrJ, it will automatically take care of this.
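For example, building the request URL by hand (host and handler hypothetical):

  String q = URLEncoder.encode("tom & jerry", "UTF-8"); // yields tom+%26+jerry
  String url = "http://localhost:8983/solr/select?q=" + q;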

-- 
Regards,
Shalin Shekhar Mangar.


Re: full-import and empty ./core/data/index

2008-12-10 Thread Marc Sturlese

Thanks, it did work.


Shalin Shekhar Mangar wrote:
 
 On Wed, Dec 10, 2008 at 4:23 PM, Marc Sturlese
 [EMAIL PROTECTED]wrote:
 

 Is there any way to start Solr with the index folder empty without getting
 an error? What I would like to do is start with the empty folder, do a
 full-import (which would create the index from scratch) and from there keep
 updating it with delta-import.
 At the moment I must have something in the index folder at the beginning.
 Otherwise I get an error.
 Otherwise I get an error.

 
 You can delete the index folder (but keep the data folder) and Solr will
 create it at the start. There should be no errors.
 
 -- 
 Regards,
 Shalin Shekhar Mangar.
 
 

-- 
View this message in context: 
http://www.nabble.com/full-import-and-empty-.-core-data-index-tp20932981p20933620.html
Sent from the Solr - User mailing list archive at Nabble.com.



Can we extract contents from two Core folders

2008-12-10 Thread payalsharma

Hi All,

Issue: Need to fetch the data available in different core folders.
Scenario: 
We are storing the information in different core folders specific to website
ids (such as CoreUSA, CoreUK, CoreIndia, ...). Thus information specific to a
region gets stored in a specific core folder; e.g. for India-specific
information, the CoreIndia folder is used.

Now the requirement is that we have to access the information stored in
multiple cores, that is, the CoreUSA and CoreUK folders, simultaneously.
Is it possible to do so, and if so, what is the mechanism?

Thanks in advance
Payal
-- 
View this message in context: 
http://www.nabble.com/Can-we-extract-contents-from-two-Core-folders-tp20933745p20933745.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Can we extract contents from two Core folders

2008-12-10 Thread Shalin Shekhar Mangar
On Wed, Dec 10, 2008 at 5:19 PM, payalsharma [EMAIL PROTECTED] wrote:


 We are storing the information in different core folders specific to website
 ids (such as CoreUSA, CoreUK, CoreIndia, ...). Thus information specific to a
 region gets stored in a specific core folder; e.g. for India-specific
 information, the CoreIndia folder is used.

 Now the requirement is that we have to access the information stored in
 multiple cores, that is, the CoreUSA and CoreUK folders, simultaneously.
 Is it possible to do so, and if so, what is the mechanism?


Make two queries :-)

-- 
Regards,
Shalin Shekhar Mangar.


Re: dismax difference between q=text:+toto AND q=toto

2008-12-10 Thread sunnyfr

Thanks Erik,
Have a good day,


Erik Hatcher wrote:
 
 dismax doesn't support field selection in its query syntax, only via
 the qf parameter.
 
 Add debugQuery=true to see how the queries are being parsed; that'll
 reveal what is going on.
 
   Erik
 
 
 On Dec 10, 2008, at 5:07 AM, sunnyfr wrote:
 

 Hi,

 I would like to understand the difference between q=text:+toto and q=toto.

 /select?fl=*&qt=dismax&q=text:+toto : 4 docs found.
 <lst name="params">
   <str name="fl">*</str>
   <str name="q">text: toto</str>
   <str name="qt">dismax</str>

 /select?fl=*&qt=dismax&q=toto : 5682 docs found.
 <lst name="params">
   <str name="fl">*</str>
   <str name="q">toto</str>
   <str name="qt">dismax</str>


 My schema just has a stored text field; I don't understand this big difference.
 Thanks a lot for your time,

 -- 
 View this message in context:
 http://www.nabble.com/dismax-difference-between--q%3Dtext%3A%2Btoto-AND-q%3Dtoto-tp20932303p20932303.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 

-- 
View this message in context: 
http://www.nabble.com/dismax-difference-between--q%3Dtext%3A%2Btoto-AND-q%3Dtoto-tp20932303p20932585.html
Sent from the Solr - User mailing list archive at Nabble.com.



Error, when i update the rich text documents such as .doc, .ppt files.

2008-12-10 Thread RaghavPrabhu

Hi all,

 I want to index rich text documents like .doc, .xls, .ppt files. I had
applied the patch for updating rich documents by following the instructions
at this URL: http://wiki.apache.org/solr/UpdateRichDocuments

When I index a doc file, I'm getting the following error in the
browser.


HTTP ERROR: 500
lazy loading error

org.apache.solr.common.SolrException: lazy loading error
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:247)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:228)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: org.apache.solr.common.SolrException: Error loading class
'solr.RichDocumentRequestHandler'
at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:273)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:319)
at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:340)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:235)
... 21 more
Caused by: java.lang.ClassNotFoundException: solr.RichDocumentRequestHandler
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at
org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:375)
at
org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:337)
at java.lang.ClassLoader.loadClassInternal(Unknown Source)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Unknown Source)
at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:257)
... 24 more

RequestURI=/solr/update/rich



Better solutions will be appreciated even more.

Thanks a lot
Prabhu.K
-- 
View this message in context: 
http://www.nabble.com/Error%2C-when-i-update-the-rich-text-documents-such-as-.doc%2C-.ppt-files.-tp20934026p20934026.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Can we extract contents from two Core folders

2008-12-10 Thread Mark Miller

payalsharma wrote:

Hi All,

Issue: Need to fetch the data available in different core folders.
Scenario: 
We are storing the information in different core folders specific to website
ids (such as CoreUSA, CoreUK, CoreIndia, ...). Thus information specific to a
region gets stored in a specific core folder; e.g. for India-specific
information, the CoreIndia folder is used.

Now the requirement is that we have to access the information stored in
multiple cores, that is, the CoreUSA and CoreUK folders, simultaneously.
Is it possible to do so, and if so, what is the mechanism?

Thanks in advance
Payal
  

Try distributed search over the cores.


Re: Can we extract contents from two Core folders

2008-12-10 Thread payalsharma

Hi,

Will you please explain what exactly you mean by :
Distributed search over the cores.

Please provide some context around this.

Thanks


markrmiller wrote:
 
 payalsharma wrote:
 Hi All,

 Issue: Need to fetch the data available in different core folders.
 Scenario: 
 We are storing the information in different core folders specific to website
 ids (such as CoreUSA, CoreUK, CoreIndia, ...). Thus information specific to a
 region gets stored in a specific core folder; e.g. for India-specific
 information, the CoreIndia folder is used.

 Now the requirement is that we have to access the information stored in
 multiple cores, that is, the CoreUSA and CoreUK folders, simultaneously.
 Is it possible to do so, and if so, what is the mechanism?

 Thanks in advance
 Payal
   
 Try distributed search over the cores.
 
 

-- 
View this message in context: 
http://www.nabble.com/Can-we-extract-contents-from-two-Core-folders-tp20933745p20937150.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: snappuller issue with multicore

2008-12-10 Thread Bill Au
I noticed that you are using the same rsyncd port for both cores.  Do you
have a scripts.conf for each core?

Bill

On Tue, Dec 9, 2008 at 11:40 PM, Kashyap, Raghu [EMAIL PROTECTED]wrote:

 Hi,



  We are seeing a strange behavior with snappuller



 We have 2 cores: Hotel & Location



 Here are the steps we perform



 1.  index hotel on master server
 2.  index location on master server
 3.  execute snapshooter for hotel core on master server
 4.  execute snapshooter for location core on master server
 5.  execute snappuller from slave machines (once for hotel core & once
 for location core)



 However, the hotel core snapshot is pulled into the location data dir.



 Here are the commands that we execute in our ruby scripts



 system('solr/multicore/hotel/bin/snappuller -P 18983 -S /solr/data -M
 masterServer -D /solr/data/hotel')

 system("solr/multicore/location/bin/snappuller -P 18983 -M masterServer
 -S /solr/data -D /solr/data/location")



 Thanks,

 Raghu




RE: snappuller issue with multicore

2008-12-10 Thread Kashyap, Raghu
Bill,

   Yes, I do have a scripts.conf for each core. However, all the options
needed for snappuller are specified on the command line itself (-D, -S,
etc...)

-Raghu

-Original Message-
From: Bill Au [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 10, 2008 9:17 AM
To: solr-user@lucene.apache.org
Subject: Re: snappuller issue with multicore

I noticed that you are using the same rsyncd port for both cores.  Do you
have a scripts.conf for each core?

Bill

On Tue, Dec 9, 2008 at 11:40 PM, Kashyap, Raghu
[EMAIL PROTECTED]wrote:

 Hi,



  We are seeing a strange behavior with snappuller



 We have 2 cores: Hotel & Location



 Here are the steps we perform



 1.  index hotel on master server
 2.  index location on master server
 3.  execute snapshooter for hotel core on master server
 4.  execute snapshooter for location core on master server
 5.  execute snappuller from slave machines (once for hotel core & once
 for location core)



 However, the hotel core snapshot is pulled into the location data dir.



 Here are the commands that we execute in our ruby scripts



 system('solr/multicore/hotel/bin/snappuller -P 18983 -S /solr/data -M
 masterServer -D /solr/data/hotel')

 system("solr/multicore/location/bin/snappuller -P 18983 -M masterServer
 -S /solr/data -D /solr/data/location")



 Thanks,

 Raghu




RE: Can we extract contents from two Core folders

2008-12-10 Thread Kashyap, Raghu
-Original Message-
From: payalsharma [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 10, 2008 9:11 AM
To: solr-user@lucene.apache.org
Subject: Re: Can we extract contents from two Core folders


Hi,

Will you please explain what exactly you mean by :
Distributed search over the cores.

Please provide some context around this.

Thanks


http://wiki.apache.org/solr/DistributedSearch 
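For example, a single distributed request across two cores might look like
this (hosts and ports hypothetical):

  http://localhost:8983/solr/CoreUSA/select?q=india&shards=localhost:8983/solr/CoreUSA,localhost:8983/solr/CoreUK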





Re: Look for three words, just two are weighted ?

2008-12-10 Thread Erik Hatcher


On Dec 10, 2008, at 9:58 AM, sunnyfr wrote:
Second question: if I want to weight status_official:true^2, should I do it
this way for weighting the "true" one? Thanks.

/select?fl=*&qt=dismax&q=+tom+jerry+cartoontv&qf=status_official^2.5+owner_login^10+title^3&debugQuery=true


Use bq (boosting query) for boosting by status:
bq=status_official:true^2, and remove it from the qf parameter.  That
should do the trick.


Erik



Re: Look for three words, just two are weighted ?

2008-12-10 Thread sunnyfr

Yes, but when I check the debug output there is no weight for it:
/select?fl=*&qt=dismax&q=+tom+jerry+cartoontv&bq=status_official:true^12&qf=owner_login^10+title^3&debugQuery=true

And it's as if it doesn't weight my word cartoontv either? OK, maybe the
doc which contains these three words is just not weighted enough?

<str name="rawquerystring"> tom jerry cartoontv</str>
<str name="querystring"> tom jerry cartoontv</str>
<str name="parsedquery">
+((DisjunctionMaxQuery((owner_login:tom^10.0 | title:tom^3.0)~0.01)
DisjunctionMaxQuery((owner_login:jerry^10.0 | title:jerri^3.0)~0.01)
DisjunctionMaxQuery((owner_login:cartoontv^10.0 |
title:cartoontv^3.0)~0.01))~2) DisjunctionMaxQuery((text:"tom jerri
cartoontv"~100^0.2)~0.01) status_official:true^12.0
</str>
<str name="parsedquery_toString">
+(((owner_login:tom^10.0 | title:tom^3.0)~0.01 (owner_login:jerry^10.0 |
title:jerri^3.0)~0.01 (owner_login:cartoontv^10.0 |
title:cartoontv^3.0)~0.01)~2) (text:"tom jerri cartoontv"~100^0.2)~0.01
status_official:T^12.0
</str>
<lst name="explain">
<str name="559170">

0.59949005 = (MATCH) sum of:
  0.59949005 = (MATCH) product of:
0.899235 = (MATCH) sum of:
  0.37824848 = (MATCH) max plus 0.01 times others of:
0.37824848 = (MATCH) weight(title:tom^3.0 in 918085), product of:
  0.077876315 = queryWeight(title:tom^3.0), product of:
3.0 = boost
7.771266 = idf(docFreq=8887, numDocs=7753783)
0.003340353 = queryNorm
  4.8570414 = (MATCH) fieldWeight(title:tom in 918085), product of:
1.0 = tf(termFreq(title:tom)=1)
7.771266 = idf(docFreq=8887, numDocs=7753783)
0.625 = fieldNorm(field=title, doc=918085)
  0.52098656 = (MATCH) max plus 0.01 times others of:
0.52098656 = (MATCH) weight(title:jerri^3.0 in 918085), product of:
  0.09139661 = queryWeight(title:jerri^3.0), product of:
3.0 = boost
9.120454 = idf(docFreq=2305, numDocs=7753783)
0.003340353 = queryNorm
  5.7002835 = (MATCH) fieldWeight(title:jerri in 918085), product
of:
1.0 = tf(termFreq(title:jerri)=1)
9.120454 = idf(docFreq=2305, numDocs=7753783)
0.625 = fieldNorm(field=title, doc=918085)
0.667 = coord(2/3)
</str>



Erik Hatcher wrote:
 
 
 On Dec 10, 2008, at 9:58 AM, sunnyfr wrote:
 Second question: if I want to weight status_official:true^2, should I do it
 this way for weighting the "true" one? Thanks.

 /select?fl=*&qt=dismax&q=+tom+jerry+cartoontv&qf=status_official^2.5+owner_login^10+title^3&debugQuery=true
 
 Use bq (boosting query) for boosting by status:
 bq=status_official:true^2, and remove it from the qf parameter.  That
 should do the trick.
 
   Erik
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Look-for-three-words%2C-just-two-are-weighted---tp20936945p20937676.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: snappuller issue with multicore

2008-12-10 Thread Doug Steigerwald
Try using the -d option with the snappuller so you can specify the
path to the directory holding the index data on the local machine.


Doug

On Dec 10, 2008, at 10:20 AM, Kashyap, Raghu wrote:


Bill,

  Yes I do have scripts.conf for each core. However, all the options
needed for snappuller is specified in the command line itself (-D -S
etc...)

-Raghu

-Original Message-
From: Bill Au [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 10, 2008 9:17 AM
To: solr-user@lucene.apache.org
Subject: Re: snappuller issue with multicore

I noticed that you are using the same rsyncd port for both cores.  Do you
have a scripts.conf for each core?

Bill

On Tue, Dec 9, 2008 at 11:40 PM, Kashyap, Raghu
[EMAIL PROTECTED]wrote:


Hi,



We are seeing a strange behavior with snappuller



We have 2 cores: Hotel & Location



Here are the steps we perform



1.  index hotel on master server
2.  index location on master server
3.  execute snapshooter for hotel core on master server
4.  execute snapshooter for location core on master server
5.  execute snappuller from slave machines (once for hotel core & once
for location core)



However, the hotel core snapshot is pulled into the location data  
dir.




Here are the commands that we execute in our ruby scripts



system('solr/multicore/hotel/bin/snappuller -P 18983 -S /solr/data -M
masterServer -D /solr/data/hotel')

system("solr/multicore/location/bin/snappuller -P 18983 -M masterServer
-S /solr/data -D /solr/data/location")



Thanks,

Raghu






Re: Setting Request Handler

2008-12-10 Thread Grant Ingersoll

Inline below...

Also, though, you should note that the /spellCheckCompRH that is
packaged with the example is not necessarily the best way to actually
use the SpellCheckComponent.  It is intended to be used as a
component in whatever your MAIN request handler is; the example merely
shows how to hook it in.



On Dec 10, 2008, at 7:51 AM, Deshpande, Mukta wrote:


Hi,

I have a request handler in my solrconfig.xml : /spellCheckCompRH
It utilizes the search component spellcheck.

When I specify following query in browser, I get correct spelling
suggestions from the file dictionary.

http://localhost:8080/solr/spellCheckCompRH/?q=SolrDocs&spellcheck.q=relevancy&spellcheck=true&fl=title,score&spellcheck.dictionary=file

Now I write a java program to achieve the same result:

Code snippet

.
.

server = new CommonsHttpSolrServer("http://localhost:8080/solr");
.
.
SolrQuery query = new SolrQuery();
query.setQuery("solr");
query.setFields("*,score");
query.set("qt", "spellCheckCompRH");


Is "spellCheckCompRH" a variable?  Does it equal "/spellCheckCompRH"?
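If the handler is registered as /spellCheckCompRH in solrconfig.xml, my guess
is that the qt value needs the leading slash as well, i.e. something like:

  query.set("qt", "/spellCheckCompRH");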



query.set("spellcheck", true);
query.set(SpellingParams.SPELLCHECK_DICT, "file");
query.set(SpellingParams.SPELLCHECK_Q, "solt");
.
.
QueryResponse rsp = server.query( query );
SolrDocumentList docs = rsp.getResults();
SpellCheckResponse srsp = rsp.getSpellCheckResponse();

I get documents for my query but I do not get any spelling  
suggestions.

I think that the request handler is not getting set for the query
correctly.

Can someone please help.

Best Regards,
Mukta


--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ












RE: snappuller issue with multicore

2008-12-10 Thread Kashyap, Raghu
OK, I think the problem is what Bill mentioned earlier. The rsync port
was the same for both cores, due to which it was copying the same
snapshot for both cores.

Thanks for all the help

-Raghu
-Original Message-
From: Kashyap, Raghu [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 10, 2008 10:27 AM
To: solr-user@lucene.apache.org
Subject: RE: snappuller issue with multicore

Doug,

  That doesn't help

-Raghu

-Original Message-
From: Doug Steigerwald [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 10, 2008 9:35 AM
To: solr-user@lucene.apache.org
Subject: Re: snappuller issue with multicore

Try using the -d option with the snappuller so you can specify the
path to the directory holding the index data on the local machine.

Doug

On Dec 10, 2008, at 10:20 AM, Kashyap, Raghu wrote:

 Bill,

   Yes I do have scripts.conf for each core. However, all the options
 needed for snappuller is specified in the command line itself (-D -S
 etc...)

 -Raghu

 -Original Message-
 From: Bill Au [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, December 10, 2008 9:17 AM
 To: solr-user@lucene.apache.org
 Subject: Re: snappuller issue with multicore

 I noticed that you are using the same rsyncd port for both cores.  Do you
 have a scripts.conf for each core?

 Bill

 On Tue, Dec 9, 2008 at 11:40 PM, Kashyap, Raghu
 [EMAIL PROTECTED]wrote:

 Hi,



 We are seeing a strange behavior with snappuller



  We have 2 cores: Hotel & Location



 Here are the steps we perform



 1.  index hotel on master server
 2.  index location on master server
 3.  execute snapshooter for hotel core on master server
 4.  execute snapshooter for location core on master server
  5.  execute snappuller from slave machines (once for hotel core & once
  for location core)



 However, the hotel core snapshot is pulled into the location data  
 dir.



 Here are the commands that we execute in our ruby scripts



 system('solr/multicore/hotel/bin/snappuller -P 18983 -S /solr/data -M
 masterServer -D /solr/data/hotel')

 system("solr/multicore/location/bin/snappuller -P 18983 -M masterServer
 -S /solr/data -D /solr/data/location")



 Thanks,

 Raghu





Re: Error, when i update the rich text documents such as .doc, .ppt files.

2008-12-10 Thread Otis Gospodnetic
Hi,

There is a ClassNotFoundException in there.  Make sure you rebuild the war,
completely remove the old one, and properly deploy the new one.  Peek into the
war and look for the class that the error below says is missing, to make sure
the class is really there.  Also get the latest code from
http://wiki.apache.org/solr/ExtractingRequestHandler and try that.
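For example, a quick-and-dirty way to peek into the war (file names here are
hypothetical; grep matches the entry names stored in the jar headers):

  unzip apache-solr-1.3.0.war -d war-contents
  grep -l RichDocumentRequestHandler war-contents/WEB-INF/lib/*.jar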


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: RaghavPrabhu [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Wednesday, December 10, 2008 7:09:17 AM
 Subject: Error, when i update the rich text documents such as .doc, .ppt 
 files.
 
 
 Hi all,
 
 I want to index rich text documents like .doc, .xls, .ppt files. I had
 applied the patch for updating rich documents by following the instructions
 at this URL: http://wiki.apache.org/solr/UpdateRichDocuments
 
 When I index a doc file, I'm getting the following error in the
 browser.
 
 
 HTTP ERROR: 500
 lazy loading error
 
 org.apache.solr.common.SolrException: lazy loading error
 at
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:247)
 at
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:228)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
 at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
 at
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
 at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
 at
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
 at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
 at
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
 at
 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
 at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
 at org.mortbay.jetty.Server.handle(Server.java:285)
 at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
 at
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
 at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
 at
 org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
 at
 org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
 Caused by: org.apache.solr.common.SolrException: Error loading class
 'solr.RichDocumentRequestHandler'
 at
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:273)
 at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:319)
 at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:340)
 at
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:235)
 ... 21 more
 Caused by: java.lang.ClassNotFoundException: solr.RichDocumentRequestHandler
 at java.net.URLClassLoader$1.run(Unknown Source)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(Unknown Source)
 at java.lang.ClassLoader.loadClass(Unknown Source)
 at java.lang.ClassLoader.loadClass(Unknown Source)
 at
 org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:375)
 at
 org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:337)
 at java.lang.ClassLoader.loadClassInternal(Unknown Source)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Unknown Source)
 at
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:257)
 ... 24 more
 
 RequestURI=/solr/update/rich
 
 
 
 Better solutions will be appreciated even more.
 
 Thanks a lot
 Prabhu.K
 -- 
 View this message in context: 
 http://www.nabble.com/Error%2C-when-i-update-the-rich-text-documents-such-as-.doc%2C-.ppt-files.-tp20934026p20934026.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Error, when i update the rich text documents such as .doc, .ppt files.

2008-12-10 Thread Grant Ingersoll

Hi Raghav,

Recently, integration with Tika was completed for SOLR-284 and it is
now committed on the trunk (but it does not use the old
RichDocumentHandler approach).  See
http://wiki.apache.org/solr/ExtractingRequestHandler for how to use and
configure it.


Otherwise, it looks to me like the jar file for the RichDocHandler is  
not in your WAR or in the Solr Home lib directory.


HTH,
Grant

On Dec 10, 2008, at 7:09 AM, RaghavPrabhu wrote:



Hi all,

I want to index rich text documents like .doc, .xls, .ppt files. I had
applied the patch for updating rich documents by following the instructions
at this URL: http://wiki.apache.org/solr/UpdateRichDocuments

When I index a doc file, I'm getting the following error in the
browser.


HTTP ERROR: 500
lazy loading error

org.apache.solr.common.SolrException: lazy loading error
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:247)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:228)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: org.apache.solr.common.SolrException: Error loading class 'solr.RichDocumentRequestHandler'
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:273)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:319)
at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:340)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:235)
... 21 more
Caused by: java.lang.ClassNotFoundException: solr.RichDocumentRequestHandler
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:375)
at org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:337)
at java.lang.ClassLoader.loadClassInternal(Unknown Source)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Unknown Source)
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:257)
... 24 more

RequestURI=/solr/update/rich



Better solutions will be appreciated even more.

Thanks a lot
Prabhu.K
--
View this message in context: 
http://www.nabble.com/Error%2C-when-i-update-the-rich-text-documents-such-as-.doc%2C-.ppt-files.-tp20934026p20934026.html
Sent from the Solr - User mailing list archive at Nabble.com.



--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ












Dates in Solr

2008-12-10 Thread Tricia Williams

Hi All,

  I'm curious about what people have done with dates.

We Require:

 1. multiple granularities to query and facet on: by year, by
year/month, by year/month/day
 2. sortability: sort/order by date
 3. time typically isn't important to us
 4. some of these items don't have a day or month associated with them
 5. possibly consider seasonal publications, with e.g. "FALL" as a date

This is the bulk of what I found documented in the mailing list and wiki:

  * http://www.nabble.com/dates---times-td10417533.html#a10421952
  * http://wiki.apache.org/jakarta-lucene/LargeScaleDateRangeProcessing
  * 
http://wiki.apache.org/solr/SimpleFacetParameters#head-068dc96b0dac1cfc7264fe85528d7df5bf391acd 



o 
http://lucene.apache.org/solr/api/org/apache/solr/util/DateMathParser.html


o 
http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html


  * any queries on those fields (typically range queries) should use
either the complete ISO 8601 date syntax that field supports, or
the DateMath syntax
(http://trac.library.utoronto.ca/projectKO/wiki/DateMath)
to get relative dates

This is great and valuable.  I would like to be able to use the existing 
functionality but I'm not sure how I can use the DateField to specify a 
year without a time (what I guess would actually be a range of time) for 
a document.  Any ideas?


Tricia


Solr Newbie question

2008-12-10 Thread Rakesh Sinha
Hi -
  I am a new user of the Solr tool and came across the introductory
tutorial here: http://lucene.apache.org/solr/tutorial.html
I am planning to use Solr in one of my projects. I see that the
tutorial mentions a REST API / interface to add documents and to
query the same.

I would like to create  the indices locally , where the web server (or
pool of servers ) will have access to the database directly , but use
the query REST api to query for the results.

 I am curious how this could be possible without going through the HTTP REST
API submission to add to the indices. (For the sake of simplicity - we can
assume it would be just one node to store the index but multiple
readers / query machines that could potentially connect to the solr
web service and retrieve the query results. Also the index might be
locally present in the same machine as that of the Solr host or at
least accessible through NFS etc. )

Thanks for helping out to some starting pointers regarding the same.


Re: Dates in Solr

2008-12-10 Thread Otis Gospodnetic
Tricia,

I think you might have missed the key nugget at the bottom of 
http://wiki.apache.org/jakarta-lucene/DateRangeQueries

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Tricia Williams [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Wednesday, December 10, 2008 12:12:11 PM
 Subject: Dates in Solr
 
 Hi All,
 
   I'm curious about what people have done with dates.
 
 We Require:
 
 1. multiple granularities to query and facet on: by year, by
 year/month, by year/month/day
 2. sortability: sort/order by date
 3. time typically isn't important to us
 4. some of these items don't have a day or month associated with them
 5. possibly consider seasonal publications, with e.g. "FALL" as a date
 
 This is the bulk of what I found documented in the mailing list and wiki:
 
   * http://www.nabble.com/dates---times-td10417533.html#a10421952
   * http://wiki.apache.org/jakarta-lucene/LargeScaleDateRangeProcessing
   * 
 http://wiki.apache.org/solr/SimpleFacetParameters#head-068dc96b0dac1cfc7264fe85528d7df5bf391acd
  
 
 
 o 
 http://lucene.apache.org/solr/api/org/apache/solr/util/DateMathParser.html
 
 o 
 http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html
 
   * any queries on those fields (typically range queries) should use
 either the complete ISO 8601 date syntax that field supports, or
 the DateMath syntax
 to get relative dates
 
 This is great and valuable.  I would like to be able to use the existing 
 functionality but I'm not sure how I can use the DateField to specify a year 
 without a time (what I guess would actually be a range of time) for a 
 document.  
 Any ideas?
 
 Tricia



Re: Limitations of Distributed Search ....

2008-12-10 Thread Otis Gospodnetic
Hi,

I have not worked with a 50-node Solr cluster, but I've worked with pure Lucene
clusters of that size, with very high query and data volumes.  I don't imagine
a dist search involving 50 nodes will be a problem for Solr.  As for handling
query slave failures, I'm sure you'll want to involve a LB that can detect
those, and have multiple replicas of each query node behind it for fail-over.

As for the manageability, I think you'll find that management is really mostly 
on you - Solr doesn't provide tools for cluster / shard management.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: souravm [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Sunday, December 7, 2008 12:40:34 AM
 Subject: Limitations of Distributed Search 
 
 Hi,
 
 We are planning to use Solr for processing a large volume of application log
 files (around 10 billion documents, 5-6 TB in size).
 
 One of the approach we are considering for the same is to use Distributed 
 Search 
 extensively. 
 
 What we have in mind is distributing the log files across multiple boxes on
 a monthly or weekly basis - where at the weekly level alone the volume can
 reach 200 M documents. And a search query can spread across all weeks (e.g.
 the number of a given txn for the first 6 months of a year).

 However, what we are not sure of is how well distributed search would scale
 when we use around 50-60 boxes to distribute indexed documents on a weekly
 basis.
 
 a) How would be the impact on the performance when a query spreads over 50 
 boxes
 b) Is there any hard limit on the number of slaves which can be contacted 
 from 
 the master server?
 c) How much load will this type of approach create on master server for 
 merging 
 data, keeping the track whether a slave is down or not
 d) Any other manageability issues with so many slaves
 
 If anyone of you have deployed Solr in such a environment it would be great 
 if 
 you can share your experience on the same.
 
 Thanks in advance.
 
 Regards,
 Sourav
 
 
 



Re: solr performance

2008-12-10 Thread Ryan McKinley

For a similar idea, check:
https://issues.apache.org/jira/browse/SOLR-906

This opens a single stream and writes all documents to it.  It could
easily be extended to have multiple threads draining the same queue.



On Dec 9, 2008, at 4:02 AM, Noble Paul നോബിള്‍  
नोब्ळ् wrote:



I guess this is the best idea . Let us have a new BatchHttpSolrServer
which can help achieve this
--Noble

On Thu, Dec 4, 2008 at 7:14 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
On Thu, Dec 4, 2008 at 8:39 AM, Mark Miller [EMAIL PROTECTED]  
wrote:
Kick off some indexing more than once - eg, post a folder of docs,  
and while

thats working, post another.

I've been thinking about a multi threaded UpdateProcessor as well  
- that

could be interesting.


Not sure how that would work (unless you didn't want responses), but
I've thought about it from the SolrJ side - something you could
quickly add documents to, and it would manage a number of threads under
the covers to maximize throughput.  Not sure what would be best
for error handling though - perhaps just polling (allow the user to ask
for failed or successful operations).

-Yonik





--
--Noble Paul
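For illustration, a minimal multi-threaded SolrJ feeder along the lines
discussed above might look like the sketch below. It is only a sketch - the
thread count, queue handling, and error handling are assumptions, not the
SOLR-906 design - and it relies on CommonsHttpSolrServer being safe to share
across threads.

import java.util.concurrent.*;

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Sketch: N worker threads drain a shared queue and post docs to one server.
public class ThreadedFeeder {
  public static void main(String[] args) throws Exception {
    final CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://localhost:8983/solr"); // hypothetical
    final BlockingQueue<SolrInputDocument> queue =
        new LinkedBlockingQueue<SolrInputDocument>();

    ExecutorService pool = Executors.newFixedThreadPool(4); // thread count is a guess
    for (int i = 0; i < 4; i++) {
      pool.submit(new Runnable() {
        public void run() {
          try {
            SolrInputDocument doc;
            // exit once the queue has been idle for 5 seconds
            while ((doc = queue.poll(5, TimeUnit.SECONDS)) != null) {
              server.add(doc); // one HTTP request per doc in this naive sketch
            }
          } catch (Exception e) {
            e.printStackTrace(); // real code would collect failures for polling
          }
        }
      });
    }

    // producer: queue up some documents
    for (int i = 0; i < 1000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", Integer.toString(i));
      queue.put(doc);
    }

    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.MINUTES);
    server.commit();
  }
}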




Re: Dates in Solr

2008-12-10 Thread Tricia Williams

Hi Otis,

   Absolutely, I missed that nugget.  I didn't think of using prefix
filters/queries.  This works really well with how we had already stored
dates as YYYYMMDD strings.  Thanks for pointing me in the right direction.
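For example (field name hypothetical), with dates stored as YYYYMMDD strings:

  q=date:2008*                          (everything in 2008, as a prefix query)
  fq=date:[20080101 TO 20081231]        (the same year, as a range)
  facet.field=date&facet.prefix=200812  (facet only within December 2008)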


Tricia

Otis Gospodnetic wrote:

Tricia,

I think you might have missed the key nugget at the bottom of 
http://wiki.apache.org/jakarta-lucene/DateRangeQueries

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
  

From: Tricia Williams [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, December 10, 2008 12:12:11 PM
Subject: Dates in Solr

Hi All,

  I'm curious about what people have done with dates.

We Require:

1. multiple granularities to query and facet on: by year, by
year/month, by year/month/day
2. sortability: sort/order by date
3. time typically isn't important to us
4. some of these items don't have a day or month associated with them
5. possibly consider seasonal publications, with e.g. "FALL" as a date

This is the bulk of what I found documented in the mailing list and wiki:

  * http://www.nabble.com/dates---times-td10417533.html#a10421952
  * http://wiki.apache.org/jakarta-lucene/LargeScaleDateRangeProcessing
  * 
http://wiki.apache.org/solr/SimpleFacetParameters#head-068dc96b0dac1cfc7264fe85528d7df5bf391acd 



o 
http://lucene.apache.org/solr/api/org/apache/solr/util/DateMathParser.html


o 
http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html


  * any queries on those fields (typically range queries) should use
either the complete ISO 8601 date syntax that field supports, or
the DateMath syntax
to get relative dates

This is great and valuable.  I would like to be able to use the existing 
functionality but I'm not sure how I can use the DateField to specify a year 
without a time (what I guess would actually be a range of time) for a document.  
Any ideas?


Tricia




  




Multi Core - Max Core Count Recommendation

2008-12-10 Thread Ryan Peterson
I'm trying to see if anyone has any recommendations on the maximum number of
cores that should be used within Solr. Is there significant overhead for each
core? Should it be 10 or fewer, or are 100 or 1,000 cores acceptable?

Thanks,

Ryan


Re: Multi Core - Max Core Count Recommendation

2008-12-10 Thread Ryan McKinley

it depends!

yes there is overhead to each core -- how much it matters will depend  
entirely on your setup and typical usage pattern.


sorry this is not a particularly useful answer.

I think the choice of how many cores will come down to your domain
logic needs more than hardware.  If you are able to put things into a
single index and get the performance you need, it will just be easier
to deal with.


ryan



On Dec 10, 2008, at 3:35 PM, Ryan Peterson wrote:

I'm trying to see if anyone has any recommendations on the maximum  
number of cores that should be used within Solr. Is there  
significant overhead to each core? Should it be 10 or less, or is  
100 or 1,000 cores acceptable.


Thanks,

Ryan




Re: multiValued multiValued fields

2008-12-10 Thread Chris Hostetter

: I want to index a field with an array of arrays, is that possible in Solr?

Not out of the box ... you can implement custom FieldTypes that store any
data you want using a byte[], but you'd still need to do some tricks
with your FieldType to get the ResponseWriter to write it out in a
meaningful way.

the simplest approach would be to just encode the multiple values into a
String in some way (comma separated, or something)
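For instance (the comma delimiter is an arbitrary assumption):

  // index time: flatten the inner list into one value of the multiValued field
  String encoded = "red,green,blue";
  // query/display time: split the stored value back into the inner list
  String[] inner = encoded.split(",");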

-Hoss



Re: Multi Core - Max Core Count Recommendation

2008-12-10 Thread Ryan Peterson
We are considering a migration to SOLR from a home-grown Lucene solution.
Currently we have 27,000 separate Lucene indexes that are separated based on
business logic. Collectively the indexes are about 1.5 terabytes in size. We
have some very small indexes and some that are quite large (up to 15GB). My
hesitation about grouping all this data across, say, 4 SOLR instances is that
each individual index will still be about 400GB in size. How big is too big
for a single Lucene index? Each SOLR instance will be on a dual/dual-core Xeon
box with 6 SAS 15k drives in a RAID 5 config and 16GB of RAM.

If a 400GB instance is too much, I figured I could reduce the size of each
individual index further by using multiple cores, but again how many would
depend on what size index is too big.

Any suggestions would be greatly appreciated, thank you for your time.

-Ryan





From: Ryan McKinley [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, December 10, 2008 1:26:07 PM
Subject: Re: Multi Core - Max Core Count Recommendation

it depends!

yes there is overhead to each core -- how much it matters will depend  
entirely on your setup and typical usage pattern.

sorry this is not a particularly useful answer.

I think the choice of how many cores will come down to your domain  
logic needs more than hardware.  If you are able to put things into a  
single index and get the performance you need, it will just be easier  
to deal with.

ryan



On Dec 10, 2008, at 3:35 PM, Ryan Peterson wrote:

 I'm trying to see if anyone has any recommendations on the maximum  
 number of cores that should be used within Solr. Is there  
 significant overhead to each core? Should it be 10 or fewer, or are  
 100 or 1,000 cores acceptable?

 Thanks,

 Ryan

RE: Dealing with field values as key/value pairs

2008-12-10 Thread Chris Hostetter

: This is really cool. U... How does it integrate with the Data Import
: Handler?

my DIH knowledge is extremely limited, but i'm guessing approach #1 is 
trivial (there is an easy way to concat DB values to build up solr field 
values, right?); approach #2 would probably be possible using multiple root 
entities (assuming multiple root entities means what i think it means)

: I've taken two approaches in the past...
: 
: 1) encode the id and the label in the field value; facet on it; require
: clients to know how to decode.  This works really well for simple things
: where the id=label mappings don't ever change, and are easy to encode
: (ie 01234:Chris Hostetter).  This is a horrible approach when id=label
: mappings do change with any frequency.
: 
: 2) have a separate type of metadata document, one per thing that you are
: faceting on containing fields for id and the label (and probably a doc_type
: field so you can tell it apart from your main docs) then once you've done
: your main query and gotten the results back faceted on id, you can query
: for those ids to get the corresponding labels.  this works really well if the
: labels ever change (just reindex the corresponding metadata document) and
: has the added bonus that you can store additional metadata in each of those
: docs, and in many use cases for presenting an initial browse interface,
: you can sometimes get away with a cheap search for all metadata docs (or all
: metadata docs meeting a certain
: criteria) instead of an expensive facet query across all of your main
: documents.



-Hoss
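
Hedged sketches of both approaches (all field names here are illustrative
assumptions, not an existing schema):

  Approach #1 -- id and label encoded in one indexed value:

    <field name="author_facet">01234:Chris Hostetter</field>

  Facet on author_facet; clients split each constraint on the first ':'.

  Approach #2 -- one metadata document per facet value:

    <doc>
      <field name="doc_type">author_meta</field>
      <field name="author_id">01234</field>
      <field name="author_label">Chris Hostetter</field>
    </doc>

  Facet the main documents on author_id, then run a second query such as
  doc_type:author_meta AND author_id:(01234 02345) to resolve the ids
  returned by faceting into display labels.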



Sum of Fields and Record Count

2008-12-10 Thread John Martyniak

Hi,

I am a new solr user.

I have an application in which I would like to show the results, but one  
result may be part of a larger set of results.  So for example  
result #1 might also have 10 other results that are part of the same  
data set.


Hopefully this makes sense.

What I would like to find out is if there is a way within Solr to show  
the result that matched with the query, and then to also show that  
this result is part of a collection of 10 items.


I have thought about doing it using some sort of external process that  
runs and does multiple queries: get the list of items and  
then query against each item.  But those don't seem elegant.


So I would like to find out if there is a way to do it within Solr  
that is a little more elegant, and hopefully without having to write  
additional code.


Thank you in advance for the help.

-John




SolrConfig.xml Replication

2008-12-10 Thread Jeff Newburn
I am curious as to whether there is a solution to be able to replicate
solrconfig.xml with the 1.4 replication.  The obvious problem is that the
master would replicate the solrconfig, turning all slaves into masters with
its config.  I have also tried on a whim to configure the master and slave
on the master so that the slave points to the same server, but that seems to
break the replication completely.  Please let me know if anybody has any
ideas.

-Jeff


Re: Sum of Fields and Record Count

2008-12-10 Thread Grant Ingersoll

Hi John,

What is your process for determining that #1 is part of the other  
result set?  My gut says this is a faceting problem, i.e. #1 has a  
field containing its category that is also shared by the 10 other  
results, and that all you need to do is facet on the category field.


The other thing that comes to mind is More Like This: 
http://wiki.apache.org/solr/MoreLikeThis

-Grant
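
A hedged example of what that facet request could look like, assuming the
shared field is called category (an illustrative name):

  http://localhost:8983/solr/select?q=biology&facet=true&facet.field=category&facet.mincount=1

Each facet count then tells you how many documents share that category with
the matching result.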

On Dec 10, 2008, at 6:16 PM, John Martyniak wrote:


Hi,

I am a new solr user.

I have an application in which I would like to show the results, but one  
result may be part of a larger set of results.  So for example  
result #1 might also have 10 other results that are part of the same  
data set.


Hopefully this makes sense.

What I would like to find out is if there is a way within Solr to  
show the result that matched with the query, and then to also show  
that this result is part of a collection of 10 items.


I have thought about doing it using some sort of external process  
that runs, and with doing multiple queries, so get the list of items  
and then query against each item.  But those don't seem elegant.


So I would like to find out if there is a way to do it within Solr  
that is a little more elegant, and hopefully without having to write  
additional code.


Thank you in advance for the help.

-John




--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ












Re: Sum of Fields and Record Count

2008-12-10 Thread John Martyniak

Grant,

Basically I have created a text field that has the grouping value.   
All of the records would have the same value in this text field.  This  
is accomplished with some pre-processing when I capture the data, but  
before it is submitted into the index.



-John

On Dec 10, 2008, at 8:46 PM, Grant Ingersoll [EMAIL PROTECTED]  
wrote:



Hi John,

What is your process for determining that #1 is part of the other  
result set?  My gut says this is a faceting problem, i.e. #1 has a  
field containing its category that is also shared by the 10 other  
results, and that all you need to do is facet on the category field.


The other thing that comes to mind is More Like This: 
http://wiki.apache.org/solr/MoreLikeThis

-Grant

On Dec 10, 2008, at 6:16 PM, John Martyniak wrote:


Hi,

I am a new solr user.

I have an application in which I would like to show the results, but one  
result may be part of a larger set of results.  So for example  
result #1 might also have 10 other results that are part of the  
same data set.


Hopefully this makes sense.

What I would like to find out is if there is a way within Solr to  
show the result that matched with the query, and then to also show  
that this result is part of a collection of 10 items.


I have thought about doing it using some sort of external process  
that runs, and with doing multiple queries, so get the list of  
items and then query against each item.  But those don't seem  
elegant.


So I would like to find out if there is a way to do it within Solr  
that is a little more elegant, and hopefully without having to  
write additional code.


Thank you in advance for the help.

-John




--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ












Re: Sum of Fields and Record Count

2008-12-10 Thread John Martyniak

Grant,

For the More Like This approach that would show the grouped results, once you  
have clicked on the item (so basically making another query), would it  
show a count of the More Like This results?


Something like cxxc and a collection of 10 other items.

-John

On Dec 10, 2008, at 8:46 PM, Grant Ingersoll [EMAIL PROTECTED]  
wrote:



Hi John,

What is your process for determining that #1 is part of the other  
result set? My gut says this is a faceting problem, i.e. #1 has a  
field containing its category that is also shared by the 10 other  
results, and that all you need to do is facet on the category field.


The other thing that comes to mind is More Like This: 
http://wiki.apache.org/solr/MoreLikeThis

-Grant

On Dec 10, 2008, at 6:16 PM, John Martyniak wrote:


Hi,

I am a new solr user.

I have an application in which I would like to show the results, but one  
result may be part of a larger set of results.  So for example  
result #1 might also have 10 other results that are part of the  
same data set.


Hopefully this makes sense.

What I would like to find out is if there is a way within Solr to  
show the result that matched with the query, and then to also show  
that this result is part of a collection of 10 items.


I have thought about doing it using some sort of external process  
that runs, and with doing multiple queries, so get the list of  
items and then query against each item.  But those don't seem  
elegant.


So I would like to find out if there is a way to do it within Solr  
that is a little more elegant, and hopefully without having to  
write additional code.


Thank you in advance for the help.

-John




--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ












Ordinal Field value and exact value for date.

2008-12-10 Thread amit rohatgi
Hi All,

I am trying to use the ord() function query on created_date.  I am
concerned by the warning about ord() behaviour, as it uses the actual entry
position in the index instead of the created_date value.

Will all entries created initially with different created_date values have
the same or nearly the same ordinal value? If yes, then how does the age
calculation for the document work?
Does index creation need to know the creation date of a document
while adding it to the index?

Thanks
amit
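
For context: ord()/rord() rank documents by the position of their field value
in the sorted term index, so they do follow created_date order, but the
absolute ordinal values shift as documents are added or deleted. A hedged
sketch of the usual age-boosting recipe, wrapping rord() in recip() as a
dismax boost function (created_date assumed to be the field from the
question):

  bf=recip(rord(created_date),1,1000,1000)

This yields values near 1 for the newest documents and decays toward 0 for
older ones.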


RE: Multi Core - Max Core Count Recommendation

2008-12-10 Thread Lance Norskog
1) Our limit is: how big a file do we want to copy around? 
We switched to multiple indexes because of the logistics of
replicating/backing up giant Lucene index files.

2) Searching takes a little memory, sorting takes a lot of memory, and
faceting eats like a black hole.

There is an unwritten wiki page of practical experiences.

-Original Message-
From: Ryan Peterson [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 10, 2008 2:20 PM
To: solr-user@lucene.apache.org
Subject: Re: Multi Core - Max Core Count Recommendation

We are considering a migration to SOLR from a home grown Lucene solution.
Currently we have 27,000 separate Lucene indexes that are separated based on
business logic. Collectively the indexes are about 1.5 terabytes in size.
We have some very small indexes and some that are quite large (up to 15GB).
My hesitation with grouping all this data across say 4 SOLR instances is that
each individual index will still be about 400GB in size. How big is too big
for a single Lucene index? Each SOLR instance will be on a dual/dual core
xeon box with 6 SAS 15k drives in Raid 5 config and 16GB of RAM.

If a 400GB instance is too much, I figured I could reduce the size of each
individual index further by using multiple CORES, but again how many would
depend on what size index is too big.

Any suggestions would be greatly appreciated, thank you for your time.

-Ryan





From: Ryan McKinley [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, December 10, 2008 1:26:07 PM
Subject: Re: Multi Core - Max Core Count Recommendation

it depends!

yes there is overhead to each core -- how much it matters will depend
entirely on your setup and typical usage pattern.

sorry this is not a particularly useful answer.

I think the choice of how many cores will come down to your domain logic
needs more than hardware.  If you are able to put things into a single index
and get the performance you need, it will just be easier to deal with.

ryan



On Dec 10, 2008, at 3:35 PM, Ryan Peterson wrote:

 I'm trying to see if anyone has any recommendations on the maximum 
 number of cores that should be used within Solr. Is there significant 
 overhead to each core? Should it be 10 or fewer, or are 100 or 1,000 
 cores acceptable?

 Thanks,

 Ryan



Re: Sum of Fields and Record Count

2008-12-10 Thread Otis Gospodnetic
Hi John,

This sounds a lot like field collapsing functionality that a few people are 
working on in SOLR-236:

https://issues.apache.org/jira/browse/SOLR-236

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: John Martyniak [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Wednesday, December 10, 2008 6:16:21 PM
 Subject: Sum of Fields and Record Count
 
 Hi,
 
 I am a new solr user.
 
 I have an application in which I would like to show the results, but one result 
 may 
 be part of a larger set of results.  So for example result #1 might also 
 have 
 10 other results that are part of the same data set.
 
 Hopefully this makes sense.
 
 What I would like to find out is if there is a way within Solr to show the 
 result that matched with the query, and then to also show that this result is 
 part of a collection of 10 items.
 
 I have thought about doing it using some sort of external process that runs, 
 and 
 with doing multiple queries, so get the list of items and then query against 
 each item.  But those don't seem elegant.
 
 So I would like to find out if there is a way to do it within Solr that is a 
 little more elegant, and hopefully without having to write additional code.
 
 Thank you in advance for the help.
 
 -John



Re: SolrConfig.xml Replication

2008-12-10 Thread Otis Gospodnetic
Jeff,

Are you using Solr 1.3 replication scripts?  If so, I think it would be pretty 
simple to:

1) put all additional files to replicate to slaves in a specific location (or 
use a special naming scheme) on the master
2) write another script that uses scp or rsync to look for those additional 
files and copy them
3) run this new script whenever snappuller + snapinstaller run: snappuller && 
snapinstaller && my-file-copying-script


It's not a part of Solr, but it's trivial to add.
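
For step 2, a hedged sketch: the extra script could be as small as a single
rsync call (both paths here are hypothetical),

  rsync -av master:/var/solr/conf-extras/ /var/solr/solr/conf/

chained as in step 3: snappuller && snapinstaller && my-file-copying-script.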

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Jeff Newburn [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Wednesday, December 10, 2008 7:00:30 PM
 Subject: SolrConfig.xml Replication
 
 I am curious as to whether there is a solution to be able to replicate
 solrconfig.xml with the 1.4 replication.  The obvious problem is that the
 master would replicate the solrconfig turning all slaves into masters with
 its config.  I have also tried on a whim to configure the master and slave
 on the master so that the slave points to the same server but that seems to
 break the replication completely.  Please let me know if anybody has any
 ideas
 
 -Jeff



ExtractingRequestHandler and XmlUpdateHandler

2008-12-10 Thread Jacob Singh
Hey folks,

I'm looking at implementing ExtractingRequestHandler in the Apache_Solr_PHP
library, and I'm wondering what we can do about adding meta-data.

I saw the docs, which suggest you use different post headers to pass field
values along with ext.literal.  Is there any way to use the XmlUpdateHandler
instead along with a document?  I'm not sure how this would work; perhaps it
would require 2 trips, or perhaps the XML would be in the post content and
the file in something else?  The thing is, we would need to refactor the
class pretty heavily in this case when indexing RichDocs, and we were hoping
to avoid it.

Thanks,
Jacob
-- 

+1 510 277-0891 (o)
+91  33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: [EMAIL PROTECTED]
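
For reference, a hedged sketch of the post-header style the docs describe,
assuming the handler is registered at /update/extract and that id and
category exist in the schema (both assumptions):

  curl "http://localhost:8983/solr/update/extract?ext.literal.id=doc1&ext.literal.category=reports" \
       -F "myfile=@document.pdf"

The file travels as multipart form data while the ext.literal.* parameters
supply the field values, so there is no XML document in the request.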


Re: Sum of Fields and Record Count

2008-12-10 Thread John Martyniak

Otis,

Thanks for the information.  It looks like the field collapsing is  
similar to what I am looking for.  But is that in the current release?  Is  
it stable?


Is there any way to do it in Solr 1.3?

-John

On Dec 10, 2008, at 9:59 PM, Otis Gospodnetic wrote:


Hi John,

This sounds a lot like field collapsing functionality that a few  
people are working on in SOLR-236:


https://issues.apache.org/jira/browse/SOLR-236

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: John Martyniak [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, December 10, 2008 6:16:21 PM
Subject: Sum of Fields and Record Count

Hi,

I am a new solr user.

I have an application in which I would like to show the results, but one  
result may
be part of a larger set of results.  So for example result #1  
might also have

10 other results that are part of the same data set.

Hopefully this makes sense.

What I would like to find out is if there is a way within Solr to  
show the
result that matched with the query, and then to also show that this  
result is

part of a collection of 10 items.

I have thought about doing it using some sort of external process  
that runs, and
with doing multiple queries, so get the list of items and then  
query against

each item.  But those don't seem elegant.

So I would like to find out if there is a way to do it within Solr  
that is a
little more elegant, and hopefully without having to write  
additional code.


Thank you in advance for the help.

-John






RE: Setting Request Handler

2008-12-10 Thread Deshpande, Mukta
Hi Grant,

Thanks for the help.
So now I can have multiple components configured as last-components
of the standard request handler.

Best Regards,
Mukta

-Original Message-
From: Grant Ingersoll [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 10, 2008 9:25 PM
To: solr-user@lucene.apache.org
Subject: Re: Setting Request Handler

Inline below...

Also, though, you should note that the /spellCheckCompRH that is
packaged with the example is not necessarily the best way to actually  
use the SpellCheckComponent.   It is intended to be used as a  
component in whatever your MAIN request handler is; it merely shows
how to hook it in.


On Dec 10, 2008, at 7:51 AM, Deshpande, Mukta wrote:

 Hi,

 I have a request handler in my solrconfig.xml : /spellCheckCompRH It 
 utilizes the search component spellcheck.

 When I specify following query in browser, I get correct spelling 
 suggestions from the file dictionary.

 http://localhost:8080/solr/spellCheckCompRH/?q=SolrDocs&spellcheck.q=relevancy&spellcheck=true&fl=title,score&spellcheck.dictionary=file

 Now I write a java program to achieve the same result:

 Code snippet
 
 .
 .

 server = new CommonsHttpSolrServer("http://localhost:8080/solr");
 .
 .
 SolrQuery query = new SolrQuery();
 query.setQuery("solr");
 query.setFields("*,score");
 query.set("qt", "spellCheckCompRH");

Is spellCheckCompRH a variable?  Does it equal /spellCheckCompRH?


 query.set("spellcheck", true);
 query.set(SpellingParams.SPELLCHECK_DICT, "file");
 query.set(SpellingParams.SPELLCHECK_Q, "solt");
 .
 QueryResponse rsp = server.query(query);
 SolrDocumentList docs = rsp.getResults();
 SpellCheckResponse srsp = rsp.getSpellCheckResponse();

 I get documents for my query but I do not get any spelling 
 suggestions.
 I think that the request handler is not getting set for the query 
 correctly.

 Can someone please help.

 Best Regards,
 Mukta

--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
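
A hedged SolrJ sketch of the likely fix Grant is hinting at: pass the handler
name exactly as registered in solrconfig.xml, including the leading slash
(handler name and test terms taken from the thread):

  SolrQuery query = new SolrQuery();
  query.setQuery("solr");
  query.setFields("*,score");
  query.set("qt", "/spellCheckCompRH");  // note the leading slash
  query.set("spellcheck", true);
  query.set(SpellingParams.SPELLCHECK_DICT, "file");
  query.set(SpellingParams.SPELLCHECK_Q, "solt");
  QueryResponse rsp = server.query(query);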












Re: Dealing with field values as key/value pairs

2008-12-10 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Thu, Dec 11, 2008 at 4:41 AM, Chris Hostetter
[EMAIL PROTECTED] wrote:

 : This is really cool. U... How does it integrate with the Data Import
 : Handler?

 my DIH knowledge is extremely limited, but i'm guessing approach #1 is
 trivial (there is an easy way to concat DB values to build up solr field
 values, right?);
Yes, TemplateTransformer can help you here.

 approach #2 would probably be possible using multiple root
 entities (assuming multiple root entities means what i think it means)

Yes, multiple root entities can do the trick (with a separate doc_type).



 : I've taken two approaches in the past...
 :
 : 1) encode the id and the label in the field value; facet on it; require
 : clients to know how to decode.  This works really well for simple things
  : where the id=label mappings don't ever change, and are easy to encode
  : (ie 01234:Chris Hostetter).  This is a horrible approach when id=label
  : mappings do change with any frequency.
  :
  : 2) have a separate type of metadata document, one per thing that you are
  : faceting on containing fields for id and the label (and probably a doc_type
  : field so you can tell it apart from your main docs) then once you've done
  : your main query and gotten the results back faceted on id, you can query
  : for those ids to get the corresponding labels.  this works really well if the
  : labels ever change (just reindex the corresponding metadata document) and
 : has the added bonus that you can store additional metadata in each of those
 : docs, and in many use cases for presenting an initial browse interface,
 : you can sometimes get away with a cheap search for all metadata docs (or all
 : metadata docs meeting a certain
 : criteria) instead of an expensive facet query across all of your main
 : documents.



 -Hoss





-- 
--Noble Paul
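
A hedged DIH sketch of approach #1 with TemplateTransformer (entity, table,
and field names are made up for illustration):

  <entity name="author" transformer="TemplateTransformer"
          query="select id, name from author">
    <!-- concatenate id and label into one facetable value -->
    <field column="author_facet" template="${author.id}:${author.name}"/>
  </entity>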


Re: SolrConfig.xml Replication

2008-12-10 Thread Noble Paul നോബിള്‍ नोब्ळ्
This is a known issue and I was planning to take it up soon.
https://issues.apache.org/jira/browse/SOLR-821


On Thu, Dec 11, 2008 at 5:30 AM, Jeff Newburn [EMAIL PROTECTED] wrote:
 I am curious as to whether there is a solution to be able to replicate
 solrconfig.xml with the 1.4 replication.  The obvious problem is that the
 master would replicate the solrconfig turning all slaves into masters with
 its config.  I have also tried on a whim to configure the master and slave
 on the master so that the slave points to the same server but that seems to
 break the replication completely.  Please let me know if anybody has any
 ideas

 -Jeff




-- 
--Noble Paul


Re: Solr Newbie question

2008-12-10 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, Dec 10, 2008 at 11:00 PM, Rakesh Sinha [EMAIL PROTECTED] wrote:
 Hi -
  I am a new user of the Solr tool and came across the introductory
 tutorial here - http://lucene.apache.org/solr/tutorial.html .
 I am planning to use Solr in one of my projects. I see that the
 tutorial mentions a REST API / interface to add documents and to
 query the same.

 I would like to create the indices locally, where the web server (or
 pool of servers) will have access to the database directly, but use
 the query REST API to query for the results.

If your data resides in DB consider using DIH.
http://wiki.apache.org/solr/DataImportHandler


  I am curious how this could be possible without using the HTTP REST
 API submission to add to the indices. (For the sake of simplicity we can
 assume it would be just one node to store the index but multiple
 readers / query machines that could potentially connect to the Solr
 web service and retrieve the query results. Also the index might be
 locally present on the same machine as the Solr host or at
 least accessible through NFS etc.)
I guess you are thinking of using a master/slave setup.
see this http://wiki.apache.org/solr/CollectionDistribution
or http://wiki.apache.org/solr/SolrReplication



 Thanks for helping out with some starting pointers regarding the same.




-- 
--Noble Paul
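
A minimal data-config.xml sketch for the DIH suggestion (driver, URL, and
table are placeholders):

  <dataConfig>
    <dataSource driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://localhost/mydb"
                user="user" password="pass"/>
    <document>
      <entity name="item" query="select id, title from item">
        <field column="id" name="id"/>
        <field column="title" name="title"/>
      </entity>
    </document>
  </dataConfig>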


jboss and solr

2008-12-10 Thread Neha Bhardwaj
I am trying to configure JBoss with Solr.

As stated in the wiki docs I copied the solr.war, but there is no web-apps
folder currently present in JBoss.

So should I create web-apps manually and paste the war file there?

I tried configuring Solr with Tomcat as well. I pasted the war file in
Tomcat's webapps folder. Now when I set the system property solr.solr.home,
it raises a class not found exception.

Can anyone help me with that?




Re: Sum of Fields and Record Count

2008-12-10 Thread Otis Gospodnetic
Hi John,

It's not in the current release, but the chances are it will make it into 1.4.  
You can try one of the recent patches and apply it to your Solr 1.3 sources.  
Check the list archives for more discussion; this field collapsing was just 
discussed again today/yesterday.  markmail.org is a good one.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: John Martyniak [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Wednesday, December 10, 2008 10:51:57 PM
 Subject: Re: Sum of Fields and Record Count
 
 Otis,
 
 Thanks for the information.  It looks like the field collapsing is similar to 
 what I am looking for.  But is that in the current release?  Is it stable?
 
 Is there any way to do it in Solr 1.3?
 
 -John
 
 On Dec 10, 2008, at 9:59 PM, Otis Gospodnetic wrote:
 
  Hi John,
  
  This sounds a lot like field collapsing functionality that a few people are 
 working on in SOLR-236:
  
  https://issues.apache.org/jira/browse/SOLR-236
  
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
  
  
  
  - Original Message 
  From: John Martyniak 
  To: solr-user@lucene.apache.org
  Sent: Wednesday, December 10, 2008 6:16:21 PM
  Subject: Sum of Fields and Record Count
  
  Hi,
  
  I am a new solr user.
  
  I have an application in which I would like to show the results, but one result 
 may
  be part of a larger set of results.  So for example result #1 might also 
 have
  10 other results that are part of the same data set.
  
  Hopefully this makes sense.
  
  What I would like to find out is if there is a way within Solr to show the
  result that matched with the query, and then to also show that this result 
  is
  part of a collection of 10 items.
  
  I have thought about doing it using some sort of external process that 
  runs, 
 and
  with doing multiple queries, so get the list of items and then query 
  against
  each item.  But those don't seem elegant.
  
  So I would like to find out if there is a way to do it within Solr that is 
  a
  little more elegant, and hopefully without having to write additional code.
  
  Thank you in advance for the help.
  
  -John
  



Re: jboss and solr

2008-12-10 Thread Akshay
On Thu, Dec 11, 2008 at 11:21 AM, Neha Bhardwaj 
[EMAIL PROTECTED] wrote:

 I am trying to configure JBoss with Solr.



 As stated in the wiki docs I copied the solr.war, but there is no web-apps
 folder currently present in JBoss.

 So should I create web-apps manually and paste the war file there?

For JBoss, war files are deployed to this location:
$JBOSS_HOME/server/default/deploy
Please look up resources on the net for more information on running
applications in JBoss.





 I tried configuring Solr with Tomcat as well. I pasted the war file in
 Tomcat's webapps folder. Now when I set the system property solr.solr.home,
 it raises a class not found exception.

Probably something is missing in the environment settings.
One way to get Solr running in Tomcat is to start the Tomcat server from the
directory where solr home is present. E.g. if solr home is at
/home/users/test-solr/solr, then start the Tomcat server from the
/home/users/test-solr directory. This assumes that you have $TOMCAT_HOME/bin
in your PATH env variable.





 Can anyone help me with that?






-- 
Regards,
Akshay Ukey.
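
Alternatively, a hedged sketch of pointing Tomcat at solr home explicitly
instead of relying on the working directory (paths match the example above):

  export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/home/users/test-solr/solr"
  $TOMCAT_HOME/bin/startup.sh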


minimum match issue with dismax

2008-12-10 Thread vinay kumar kaku

Hi,
  does anyone know how to make sure minimum match in dismax is working? I change 
the values and try doing solrCtl restart indexname, but I don't see it taking 
effect. Does anybody have an idea on this?


thank you
vinay
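
For reference, mm normally lives in the dismax handler's defaults in
solrconfig.xml, and a quick way to check whether an edit took effect is to
override it per request. A hedged sketch, with the handler name as in the
stock example config:

  <requestHandler name="dismax" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
    </lst>
  </requestHandler>

  http://localhost:8983/solr/select?qt=dismax&q=foo+bar+baz&mm=100%25&debugQuery=true

Note that a value set in an invariants block (rather than defaults) cannot be
overridden at request time.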

_
You live life online. So we put Windows on the web. 
http://clk.atdmt.com/MRT/go/127032869/direct/01/

Newbie Question boosting

2008-12-10 Thread ayyanar

I have read many articles on boosting but am still not so clear on it. Can
anyone explain the following questions with examples?

1) Can you give an example of field level boosting and document level
boosting, and the difference between the two?

2) If we set the boost at field level (index time), should the query
contain that particular field?
For example, if we set the boost for the title field, should we create the
term query for the title field?

Also, based on your experience, can you explain why you need boosting.

Thanks,
Ayyanar. A
-- 
View this message in context: 
http://www.nabble.com/Newbie-Question-boosting-tp20950268p20950268.html
Sent from the Solr - User mailing list archive at Nabble.com.



Nwebie Question on boosting

2008-12-10 Thread ayyanar

I have read many articles on boosting but am still not so clear on it. Can
anyone explain the following questions with examples? 

1) Can you give an example of field level boosting and document level
boosting, and the difference between the two? 

2) If we set the boost at field level (index time), should the query
contain that particular field? 
For example, if we set the boost for the title field, should we create the
term query for the title field? 

Also, based on your experience, can you explain why you need boosting. 

Thanks, 
Ayyanar. A
-- 
View this message in context: 
http://www.nabble.com/Nwebie-Question-on-boosting-tp20950286p20950286.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Nwebie Question on boosting

2008-12-10 Thread Robert Young
On Thu, Dec 11, 2008 at 6:49 AM, ayyanar
[EMAIL PROTECTED]wrote:

 1) Can you given an example for field level boosting and document level
 boosting and the difference between two?

Field level boosting is used when one field is considered more or less
important than another. For example, you may want the title field of a
document to be considered more important, so that if a term appears in the
title this is considered more important than if it appears in the body. On the
other hand, document level boosting applies when a document is more or less
important than another. For example, an FAQ is often considered a very
important page and as such may be required to appear higher in results than
it otherwise would have.



 2) If we set the boost at field level (index time), should the query
 contain that particular field?
 For example, if we set the boost for the title field, should we create the
 term query for the title field?

 Yes, if you want it to make any difference.


Rob
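
A hedged sketch of both boost types in Solr's XML update format (field names
are illustrative):

  <add>
    <!-- document-level boost: raises this whole document's score -->
    <doc boost="2.0">
      <!-- field-level boost: only affects matches against this field -->
      <field name="title" boost="3.0">Biology</field>
      <field name="body">Introductory notes on biology.</field>
    </doc>
  </add>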