Re: Using custom scoring formula

2019-08-08 Thread Chee Yee Lim
Hi Arnold,

One way to approach this is to store the topic vector you calculated for
each Solr document in a pseudo-vector field (i.e. a formatted string
field). Then parse the string field back into an actual vector for
calculation when you need it. Something similar to this:
https://github.com/saaay71/solr-vector-scoring. But note that the plugin
will not work out of the box with the latest Solr version.
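To illustrate the idea, here is a minimal, self-contained sketch of the parse-and-compare step. The comma separator and the helper names are assumptions for illustration, not part of the linked plugin:

```java
import java.util.Arrays;

// Sketch: parse a pseudo-vector field (a comma-separated string, by assumption)
// back into a double[] and compute the normalized dot product (cosine
// similarity). Values near 1.0 indicate very similar topic distributions.
public class TopicVectorSimilarity {
    static double[] parse(String field) {
        return Arrays.stream(field.split(","))
                     .mapToDouble(Double::parseDouble)
                     .toArray();
    }

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];   // dot product
            na  += a[i] * a[i];   // squared norm of a
            nb  += b[i] * b[i];   // squared norm of b
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        double[] q = parse("0.1,0.7,0.2");
        double[] d = parse("0.2,0.6,0.2");
        System.out.println(cosine(q, d)); // close to, but below, 1.0
    }
}
```

In a custom scorer or post-processing step, you would run this against the stored string field of each candidate document.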

Best wishes,
Chee Yee

On Thu, 8 Aug 2019 at 01:07, Arnold Bronley  wrote:

> Hi,
>
> I have a topic vector calculated for each Solr document in a
> collection. The topic vector is calculated using LDA (
> https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation).  Now I want to
> return documents similar to a given document from this collection. I can
> simply use the normalized dot product between the given vector and all other
> vectors to see which ones have a product of ~1. That will tell me that those
> are very similar documents. Is there a way to achieve this using Solr?
>


Using custom scoring formula

2019-08-07 Thread Arnold Bronley
Hi,

I have a topic vector calculated for each Solr document in a
collection. The topic vector is calculated using LDA (
https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation).  Now I want to
return documents similar to a given document from this collection. I can
simply use the normalized dot product between the given vector and all other
vectors to see which ones have a product of ~1. That will tell me that those
are very similar documents. Is there a way to achieve this using Solr?


Re: Solr cache when using custom scoring

2015-07-09 Thread amid
Mikhail,

We've now overridden equals() & hashCode() of the custom query to use this new
param as well, and it works like a charm.

Thanks a lot,
Ami



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-cache-when-using-custom-scoring-tp4216419p4216496.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr cache when using custom scoring

2015-07-08 Thread amid
Not sure I get you; the parameter is passed to Solr as a string.
It seems like Solr uses only the query, sort, and range of documents as the
caching key
(from the doc - "This cache holds the results of previous searches: ordered
lists of document IDs (DocList) based on a query, a sort, and the range of
documents requested")

I'm searching for a good way to make sure this parameter is used as well, so
that different parameter values with the same query create different cache
keys.






Re: Solr cache when using custom scoring

2015-07-08 Thread Mikhail Khludnev
On Wed, Jul 8, 2015 at 11:30 PM, amid  wrote:

> The custom scoring code use a parameter which passed to the solr query,


this param should be evaluated in equals() and hashCode(), shouldn't it?
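To make the point concrete, here is a simplified, self-contained sketch (class and field names are illustrative; a real implementation would extend Lucene's Query class):

```java
import java.util.Objects;

// Sketch: any parameter that influences scoring must participate in
// equals()/hashCode(). Otherwise Solr's queryResultCache considers two
// differently-parameterized queries equal and serves the first cached result.
public class MyCustomQuery /* extends org.apache.lucene.search.Query in real code */ {
    private final String userQuery;
    private final String scoringParam; // the extra parameter passed with the request

    public MyCustomQuery(String userQuery, String scoringParam) {
        this.userQuery = userQuery;
        this.scoringParam = scoringParam;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MyCustomQuery)) return false;
        MyCustomQuery other = (MyCustomQuery) o;
        return userQuery.equals(other.userQuery)
            && scoringParam.equals(other.scoringParam); // <-- the crucial part
    }

    @Override
    public int hashCode() {
        return Objects.hash(userQuery, scoringParam);
    }
}
```

With the parameter folded in, two requests that differ only in the parameter value produce distinct cache keys.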


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>



Solr cache when using custom scoring

2015-07-08 Thread amid
Hi,

We are using Solr and have implemented our own custom scoring.
The custom scoring code uses a parameter which is passed to the Solr query;
different parameter values will change the score of the same query.

The problem we have is that this parameter is not part of the query cache
key, so running the same query with different parameter values returns the
first cached result.

What is the best way to work around this (without removing the cache)? Is there
a way to tell Solr to cache the query with the parameter value as well? Or maybe
add a dummy clause to the query (the parameter is a pretty long JSON)?

Thanks,
Ami





Re: Antwort: Custom Scoring Question

2015-04-29 Thread Johannes Ruscheinski
Hi Stephan,

On 29/04/15 14:37, Stephan Schubert wrote:
> Hi Johannes,
>
> did you have a look at Solr edismax and function queries?
> https://cwiki.apache.org/confluence/display/solr/Function+Queries
Just read it.
>
> If I got you right, in the case where you just want to ignore docs which
> do not have a value set on a specific field, you can filter them out with a filter

Yes, that is a part of our problem.
>  
> query.
>
> Example: 
>
> fieldname: mycustomfield
>
> filterquery to ignore docs with mycustomfield not set: +mycustomfield:*

That seems really useful to us and solves one part of our problem,
thanks.  We still need to figure out how to invoke the custom scorer
that we wrote in Java.  Also, we would like the search to invoke another
custom function that filters out results that are not relevant to a
given query.

--Johannes
>
> Regards
>
> Stephan
>
>
>
> From:    Johannes Ruscheinski 
> To:      solr-user@lucene.apache.org, 
> Cc:      Oliver Obenland 
> Date:    29.04.2015 14:10
> Subject: Custom Scoring Question
>
>
>
> Hi,
>
> I am entirely new to the world of SOLR programming and I have the 
> following questions:
>
> In addition to our regular searches we need to implement a specialised 
> form of range search and ranking. We have implemented a CustomScoreQuery 
> and a CustomScoreProvider.  I now have a few questions:
>
> 1) Where and how do we let SOLR know that it should use this? (I presume 
> that will be some XML config file.)
> 2) How do we "tag" our special queries to switch to the custom 
> implementation?
>
> Furthermore, only a small subset of our data will have the database field 
> relevant to this type of query set.  A problem that I can see is that we 
> want SOLR to prefilter, or suppress, any records that have no data in this 
> field and, if the field is non-empty, to call a function provided by us to 
> let it know whether to include said record in the result set or not.
>
> Also, any tips on how to develop and debug this?  I am using the Linux 
> command-line and Emacs.  I am linking against SOLR by using "javac -cp 
> solr-core-4.2.1.jar:. my_code.java".  It is probably not relevant but, I 
> might mention it anyway: We are using SOLR as a part of VuFind.
>
> I'd be grateful for any suggestions.
>
> --Johannes
>

-- 
Dr. Johannes Ruscheinski
Universitätsbibliothek Tübingen - IT-Abteilung -
Wilhelmstr. 32, 72074 Tübingen

Tel: +49 7071 29-72820
FAX: +49 7071 29-5069
Email: johannes.ruschein...@uni-tuebingen.de




Antwort: Custom Scoring Question

2015-04-29 Thread Stephan Schubert
Hi Johannes,

did you have a look at Solr edismax and function queries?
https://cwiki.apache.org/confluence/display/solr/Function+Queries

If I got you right, in the case where you just want to ignore docs which
do not have a value set on a specific field, you can filter them out with a
filter query.

Example: 

fieldname: mycustomfield

filterquery to ignore docs with mycustomfield not set: +mycustomfield:*

Regards

Stephan



From:    Johannes Ruscheinski 
To:      solr-user@lucene.apache.org, 
Cc:      Oliver Obenland 
Date:    29.04.2015 14:10
Subject: Custom Scoring Question



Hi,

I am entirely new to the world of SOLR programming and I have the 
following questions:

In addition to our regular searches we need to implement a specialised 
form of range search and ranking. We have implemented a CustomScoreQuery 
and a CustomScoreProvider.  I now have a few questions:

1) Where and how do we let SOLR know that it should use this? (I presume 
that will be some XML config file.)
2) How do we "tag" our special queries to switch to the custom 
implementation?

Furthermore, only a small subset of our data will have the database field 
relevant to this type of query set.  A problem that I can see is that we 
want SOLR to prefilter, or suppress, any records that have no data in this 
field and, if the field is non-empty, to call a function provided by us to 
let it know whether to include said record in the result set or not.

Also, any tips on how to develop and debug this?  I am using the Linux 
command-line and Emacs.  I am linking against SOLR by using "javac -cp 
solr-core-4.2.1.jar:. my_code.java".  It is probably not relevant but, I 
might mention it anyway: We are using SOLR as a part of VuFind.

I'd be grateful for any suggestions.

--Johannes

-- 
Dr. Johannes Ruscheinski
Universitätsbibliothek Tübingen - IT-Abteilung -
Wilhelmstr. 32, 72074 Tübingen

Tel: +49 7071 29-72820
FAX: +49 7071 29-5069
Email: johannes.ruschein...@uni-tuebingen.de



 
 


Custom Scoring Question

2015-04-29 Thread Johannes Ruscheinski
Hi,

I am entirely new to the world of SOLR programming and I have the following 
questions:

In addition to our regular searches we need to implement a specialised form of 
range search and ranking. We have implemented a CustomScoreQuery and a 
CustomScoreProvider.  I now have a few questions:

1) Where and how do we let SOLR know that it should use this? (I presume that 
will be some XML config file.)
2) How do we "tag" our special queries to switch to the custom implementation?

Furthermore, only a small subset of our data will have the database field 
relevant to this type of query set.  A problem that I can see is that we want 
SOLR to prefilter, or suppress, any records that have no data in this field 
and, if the field is non-empty, to call a function provided by us to let it 
know whether to include said record in the result set or not.

Also, any tips on how to develop and debug this?  I am using the Linux 
command-line and Emacs.  I am linking against SOLR by using "javac -cp 
solr-core-4.2.1.jar:. my_code.java".  It is probably not relevant but, I might 
mention it anyway: We are using SOLR as a part of VuFind.

I'd be grateful for any suggestions.

--Johannes

-- 
Dr. Johannes Ruscheinski
Universitätsbibliothek Tübingen - IT-Abteilung -
Wilhelmstr. 32, 72074 Tübingen

Tel: +49 7071 29-72820
FAX: +49 7071 29-5069
Email: johannes.ruschein...@uni-tuebingen.de
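Regarding question 1 above (how to tell Solr to use the CustomScoreQuery): the usual wiring in Solr is a custom QParserPlugin that wraps the parsed user query in your CustomScoreQuery, registered in solrconfig.xml. A sketch of the registration (the parser name and class name are assumptions for illustration; you write and deploy the class on Solr's classpath):

```xml
<!-- solrconfig.xml: register a (hypothetical) query parser plugin that wraps
     the user's query in your CustomScoreQuery/CustomScoreProvider. -->
<queryParser name="myscore" class="de.unituebingen.ub.MyScoreQParserPlugin"/>
```

This also answers question 2: queries are "tagged" per request with local-parameter syntax, e.g. `q={!myscore}some query`, or the parser is made the default via `defType=myscore` on a request handler.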




Filter results based on custom scoring and _val_

2012-10-10 Thread jimtronic
I'm using solr function queries to generate my own custom score. I achieve
this using something along these lines:

q=_val_:"my_custom_function()"
This populates the score field as expected, but it also includes documents
that score 0. I need a way to filter the results so that scores below zero
are not included.

I realize that I'm using score in a non-standard way and that normally the
score that lucene/solr produce is not absolute. However, producing my own
score works really well for my needs.

I've tried using {!frange l=0} but this causes the score for all documents
to be "1.0".

I've found that I can do the following:

q=*:*&fl=foo:my_custom_function()&fq={!frange l=1}my_custom_function() 

This puts my custom score into foo, but it requires me to list all the logic
twice. Sometimes my logic is very long.
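One way that may avoid listing the function twice is Solr's local-parameter dereferencing (`v=$param`), which lets the function text live in a single request parameter. A sketch (untested against this setup; `cf` is an arbitrary parameter name):

```
q={!func v=$cf}
fq={!frange l=1 v=$cf}
cf=my_custom_function()
```

Since `q` is the function query itself, the score field already holds the custom value, so the `fl=foo:...` alias becomes unnecessary and the logic is written exactly once.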











Re: Custom scoring question

2012-03-30 Thread Tomás Fernández Löbbe
But if you have that "score" in a field, you could use that field as part
of a function query instead of sorting directly on it; that would mix this
"score" with the score calculated from other fields.

On Thu, Mar 29, 2012 at 5:49 PM, Darren Govoni  wrote:

> Yeah, I guess that would work. I wasn't sure if it would change relative
> to other documents. But if it were to be combined with other fields,
> that approach may not work because the calculation wouldn't include the
> scoring for other parts of the query. So then you have the dynamic score
> and what to do with it.
>
> On Thu, 2012-03-29 at 16:29 -0300, Tomás Fernández Löbbe wrote:
> > Can't you simply calculate that at index time and assign the result to a
> > field, then sort by that field.
> >
> > On Thu, Mar 29, 2012 at 12:07 PM, Darren Govoni 
> wrote:
> >
> > > I'm going to try index time per-field boosting and do the boost
> > > computation at index time and see if that helps.
> > >
> > > On Thu, 2012-03-29 at 10:08 -0400, Darren Govoni wrote:
> > > > Hi,
> > > >  I have a situation I want to re-score document relevance.
> > > >
> > > > Let's say I have two fields:
> > > >
> > > > text: The quick brown fox jumped over the white fence.
> > > > terms: fox fence
> > > >
> > > > Now my queries come in as:
> > > >
> > > > terms:[* TO *]
> > > >
> > > > and Solr scores them on that field.
> > > >
> > > > What I want is to rank them according to the distribution of field
> > > > "terms" within field "text". Which is a per document calculation.
> > > >
> > > > Can this be done with any kind of dismax? I'm not searching for known
> > > > terms at query time.
> > > >
> > > > If not, what is the best way to implement a custom scoring handler to
> > > > perform this calculation and re-score/sort the results?
> > > >
> > > > thanks for any tips!!!
> > > >
> > >
> > >
> > >
>
>
>


Re: Custom scoring question

2012-03-29 Thread Darren Govoni
Yeah, I guess that would work. I wasn't sure if it would change relative
to other documents. But if it were to be combined with other fields,
that approach may not work because the calculation wouldn't include the
scoring for other parts of the query. So then you have the dynamic score
and what to do with it.

On Thu, 2012-03-29 at 16:29 -0300, Tomás Fernández Löbbe wrote:
> Can't you simply calculate that at index time and assign the result to a
> field, then sort by that field.
> 
> On Thu, Mar 29, 2012 at 12:07 PM, Darren Govoni  wrote:
> 
> > I'm going to try index time per-field boosting and do the boost
> > computation at index time and see if that helps.
> >
> > On Thu, 2012-03-29 at 10:08 -0400, Darren Govoni wrote:
> > > Hi,
> > >  I have a situation I want to re-score document relevance.
> > >
> > > Let's say I have two fields:
> > >
> > > text: The quick brown fox jumped over the white fence.
> > > terms: fox fence
> > >
> > > Now my queries come in as:
> > >
> > > terms:[* TO *]
> > >
> > > and Solr scores them on that field.
> > >
> > > What I want is to rank them according to the distribution of field
> > > "terms" within field "text". Which is a per document calculation.
> > >
> > > Can this be done with any kind of dismax? I'm not searching for known
> > > terms at query time.
> > >
> > > If not, what is the best way to implement a custom scoring handler to
> > > perform this calculation and re-score/sort the results?
> > >
> > > thanks for any tips!!!
> > >
> >
> >
> >




Re: Custom scoring question

2012-03-29 Thread Tomás Fernández Löbbe
Can't you simply calculate that at index time and assign the result to a
field, then sort by that field?
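A minimal sketch of that index-time idea, using the example document from the original post (the method and field naming are assumptions; the tokenization here is deliberately crude):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch: before indexing, compute what fraction of the "text" tokens appear
// in "terms", store the result in a numeric field (e.g. "terms_density"),
// and sort or boost by that field at query time.
public class TermsDensity {
    static double density(String text, String terms) {
        Set<String> termSet =
            new HashSet<>(Arrays.asList(terms.toLowerCase().split("\\s+")));
        // strip punctuation, lowercase, split on whitespace
        String[] tokens =
            text.toLowerCase().replaceAll("[^a-z\\s]", "").split("\\s+");
        long hits = Arrays.stream(tokens).filter(termSet::contains).count();
        return tokens.length == 0 ? 0.0 : (double) hits / tokens.length;
    }

    public static void main(String[] args) {
        // "fox" and "fence" are 2 of the 9 tokens -> ~0.222
        System.out.println(density(
            "The quick brown fox jumped over the white fence.", "fox fence"));
    }
}
```

The per-document value is then fixed at index time, which sidesteps the custom-scoring machinery entirely, at the cost of reindexing whenever the formula changes.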

On Thu, Mar 29, 2012 at 12:07 PM, Darren Govoni  wrote:

> I'm going to try index time per-field boosting and do the boost
> computation at index time and see if that helps.
>
> On Thu, 2012-03-29 at 10:08 -0400, Darren Govoni wrote:
> > Hi,
> >  I have a situation I want to re-score document relevance.
> >
> > Let's say I have two fields:
> >
> > text: The quick brown fox jumped over the white fence.
> > terms: fox fence
> >
> > Now my queries come in as:
> >
> > terms:[* TO *]
> >
> > and Solr scores them on that field.
> >
> > What I want is to rank them according to the distribution of field
> > "terms" within field "text". Which is a per document calculation.
> >
> > Can this be done with any kind of dismax? I'm not searching for known
> > terms at query time.
> >
> > If not, what is the best way to implement a custom scoring handler to
> > perform this calculation and re-score/sort the results?
> >
> > thanks for any tips!!!
> >
>
>
>


Re: Custom scoring question

2012-03-29 Thread Darren Govoni
I'm going to try index time per-field boosting and do the boost
computation at index time and see if that helps.

On Thu, 2012-03-29 at 10:08 -0400, Darren Govoni wrote:
> Hi,
>  I have a situation I want to re-score document relevance.
> 
> Let's say I have two fields:
> 
> text: The quick brown fox jumped over the white fence.
> terms: fox fence
> 
> Now my queries come in as:
> 
> terms:[* TO *]
> 
> and Solr scores them on that field. 
> 
> What I want is to rank them according to the distribution of field
> "terms" within field "text". Which is a per document calculation.
> 
> Can this be done with any kind of dismax? I'm not searching for known
> terms at query time.
> 
> If not, what is the best way to implement a custom scoring handler to
> perform this calculation and re-score/sort the results?
> 
> thanks for any tips!!!
> 




Custom scoring question

2012-03-29 Thread Darren Govoni
Hi,
 I have a situation I want to re-score document relevance.

Let's say I have two fields:

text: The quick brown fox jumped over the white fence.
terms: fox fence

Now my queries come in as:

terms:[* TO *]

and Solr scores them on that field. 

What I want is to rank them according to the distribution of field
"terms" within field "text". Which is a per document calculation.

Can this be done with any kind of dismax? I'm not searching for known
terms at query time.

If not, what is the best way to implement a custom scoring handler to
perform this calculation and re-score/sort the results?

thanks for any tips!!!



Re: custom scoring

2012-02-20 Thread Em
Hi Carlos,

> "query_score" is a field that is indexed and stored
> with every document.
Thanks for clarifying that, now the whole query-string makes more sense
to me.

Did you check whether query() - without product() and pow() - is also
much slower than a normal query?

I guess that if the performance decrease without product() and pow() is not
that large, you are hitting the small overhead that comes with every
function query.
It would be nice if you could check that.

However, let's take a step back and look what you really want to achieve
instead of how you are trying to achieve it right now.

You want to influence the score of your actual query by a value that
represents a combination of some static values and the likelyness of how
good a query matches a document.

From your query, I can see that you are using the same fields in your
FunctionQuery and within your MainQuery (let's call the q-param
"MainQuery").
This means that the scores of your query()-method and your MainQuery
should be identical.
Let's call this value just "score" and rename your field "query_score"
"popularity".

I don't know how you are implementing the FunctionQuery (boost by
multiplication, boost by addition), but it seems clear to me that your
formula looks this way:

score x (score^0.5*popularity) where x is kind of an operator (+,*,...)

Why don't you reduce it to

score * boost(log(popularity)).

This is a trade-off between precision and performance.

You could even improve the above by setting the doc's boost equal to
log(popularity) at indexing time.

What do you think about that?
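A small numeric sketch of the trade-off (the sample values are made up, and the current formula assumes the "x" operator is multiplication):

```java
// Contrast the two combined-score formulas discussed above: the pow()+product()
// form versus the cheaper log() boost. Both rank a more popular document above
// a less popular one at equal base score; the second skips the per-document
// pow() and the duplicate query() evaluation.
public class BoostComparison {
    static double current(double score, double popularity) {
        return score * Math.pow(score, 0.5) * popularity; // score x (score^0.5 * popularity)
    }

    static double proposed(double score, double popularity) {
        return score * Math.log(popularity); // score * log(popularity)
    }

    public static void main(String[] args) {
        System.out.println(current(0.8, 100.0));
        System.out.println(proposed(0.8, 100.0));
    }
}
```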

Regards,
Em



Am 20.02.2012 15:37, schrieb Carlos Gonzalez-Cadenas:
> Hi Em:
> 
> The HTTP request is not gonna help you a lot because we use a custom
> QParser (that builds the query that I've pasted before). In any case, here
> it is:
> 
> http://localhost:8080/solr/core0/select?shards=…(shards
> here)…&indent=on&wt=exon&timeAllowed=50&fl=resulting_phrase%2Cquery_id%2Ctype%2Chighlighting&start=0&rows=16&limit=20&q=%7B!exonautocomplete%7Dhoteles
> 
> We're implementing a query autocomplete system, therefore our Lucene
> documents are queries. "query_score" is a field that is indexed and stored
> with every document. It expresses how popular a given query is (i.e. common
> queries like "hotels in barcelona" have a bigger query_score than less
> common queries like "hotels in barcelona near the beach").
> 
> Let me know if you need something else.
> 
> Thanks,
> Carlos
> 
> 
> 
> 
> 
> Carlos Gonzalez-Cadenas
> CEO, ExperienceOn - New generation search
> http://www.experienceon.com
> 
> Mobile: +34 652 911 201
> Skype: carlosgonzalezcadenas
> LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas
> 
> 
> On Mon, Feb 20, 2012 at 3:12 PM, Em  wrote:
> 
>> Could you please provide me the original request (the HTTP-request)?
>> I am a little bit confused to what "query_score" refers.
>> As far as I can see it isn't a magic-value.
>>
>> Kind regards,
>> Em
>>
>> Am 20.02.2012 14:05, schrieb Carlos Gonzalez-Cadenas:
>>> Yeah Em, it helped a lot :)
>>>
>>> Here it is (for the user query "hoteles"):
>>>
>>> *+(stopword_shortened_phrase:hoteles | stopword_phrase:hoteles |
>>> wildcard_stopword_shortened_phrase:hoteles |
>>> wildcard_stopword_phrase:hoteles) *
>>>
>>> *product(pow(query((stopword_shortened_phrase:hoteles |
>>> stopword_phrase:hoteles | wildcard_stopword_shortened_phrase:hoteles |
>>>
>> wildcard_stopword_phrase:hoteles),def=0.0),const(0.5)),float(query_score))*
>>>
>>> Thanks a lot for your help.
>>>
>>> Carlos
>>> Carlos Gonzalez-Cadenas
>>> CEO, ExperienceOn - New generation search
>>> http://www.experienceon.com
>>>
>>> Mobile: +34 652 911 201
>>> Skype: carlosgonzalezcadenas
>>> LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas
>>>
>>>
>>> On Mon, Feb 20, 2012 at 1:50 PM, Em 
>> wrote:
>>>
 Carlos,

 nice to hear that the approach helped you!

 Could you show us how your query-request looks like after reworking?

 Regards,
 Em

 Am 20.02.2012 13:30, schrieb Carlos Gonzalez-Cadenas:
> Hello all:
>
> We've done some tests with Em's approach of putting a BooleanQuery in
 front
> of our user query, that means:
>
> BooleanQuery
> must (DismaxQuery)
> should (FunctionQuery)
>
> The FunctionQuery obtains the SOLR IR score by mea

Re: custom scoring

2012-02-20 Thread Carlos Gonzalez-Cadenas
Hi Em:

The HTTP request is not gonna help you a lot because we use a custom
QParser (that builds the query that I've pasted before). In any case, here
it is:

http://localhost:8080/solr/core0/select?shards=…(shards
here)…&indent=on&wt=exon&timeAllowed=50&fl=resulting_phrase%2Cquery_id%2Ctype%2Chighlighting&start=0&rows=16&limit=20&q=%7B!exonautocomplete%7Dhoteles

We're implementing a query autocomplete system, therefore our Lucene
documents are queries. "query_score" is a field that is indexed and stored
with every document. It expresses how popular a given query is (i.e. common
queries like "hotels in barcelona" have a bigger query_score than less
common queries like "hotels in barcelona near the beach").

Let me know if you need something else.

Thanks,
Carlos





Carlos Gonzalez-Cadenas
CEO, ExperienceOn - New generation search
http://www.experienceon.com

Mobile: +34 652 911 201
Skype: carlosgonzalezcadenas
LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas


On Mon, Feb 20, 2012 at 3:12 PM, Em  wrote:

> Could you please provide me the original request (the HTTP-request)?
> I am a little bit confused to what "query_score" refers.
> As far as I can see it isn't a magic-value.
>
> Kind regards,
> Em
>
> Am 20.02.2012 14:05, schrieb Carlos Gonzalez-Cadenas:
> > Yeah Em, it helped a lot :)
> >
> > Here it is (for the user query "hoteles"):
> >
> > *+(stopword_shortened_phrase:hoteles | stopword_phrase:hoteles |
> > wildcard_stopword_shortened_phrase:hoteles |
> > wildcard_stopword_phrase:hoteles) *
> >
> > *product(pow(query((stopword_shortened_phrase:hoteles |
> > stopword_phrase:hoteles | wildcard_stopword_shortened_phrase:hoteles |
> >
> wildcard_stopword_phrase:hoteles),def=0.0),const(0.5)),float(query_score))*
> >
> > Thanks a lot for your help.
> >
> > Carlos
> > Carlos Gonzalez-Cadenas
> > CEO, ExperienceOn - New generation search
> > http://www.experienceon.com
> >
> > Mobile: +34 652 911 201
> > Skype: carlosgonzalezcadenas
> > LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas
> >
> >
> > On Mon, Feb 20, 2012 at 1:50 PM, Em 
> wrote:
> >
> >> Carlos,
> >>
> >> nice to hear that the approach helped you!
> >>
> >> Could you show us how your query-request looks like after reworking?
> >>
> >> Regards,
> >> Em
> >>
> >> Am 20.02.2012 13:30, schrieb Carlos Gonzalez-Cadenas:
> >>> Hello all:
> >>>
> >>> We've done some tests with Em's approach of putting a BooleanQuery in
> >> front
> >>> of our user query, that means:
> >>>
> >>> BooleanQuery
> >>> must (DismaxQuery)
> >>> should (FunctionQuery)
> >>>
> >>> The FunctionQuery obtains the SOLR IR score by means of a
> >> QueryValueSource,
> >>> then does the SQRT of this value, and then multiplies it by our custom
> >>> "query_score" float, pulling it by means of a FieldCacheSource.
> >>>
> >>> In particular, we've proceeded in the following way:
> >>>
> >>>- we've loaded the whole index in the page cache of the OS to make
> >> sure
> >>>we don't have disk IO problems that might affect the benchmarks (our
> >>>machine has enough memory to load all the index in RAM)
> >>>- we've executed an out-of-benchmark query 10-20 times to make sure
> >> that
> >>>everything is jitted and that Lucene's FieldCache is properly
> >> populated.
> >>>- we've disabled all the caches (filter query cache, document cache,
> >>>query cache)
> >>>- we've executed 8 different user queries with and without
> >>>FunctionQueries, with early termination in both cases (our collector
> >> stops
> >>>after collecting 50 documents per shard)
> >>>
> >>> Em was correct, the query is much faster with the BooleanQuery in
> front,
> >>> but it's still 30-40% slower than the query without FunctionQueries.
> >>>
> >>> Although one may think that it's reasonable that the query response
> time
> >>> increases because of the extra computations, we believe that the
> increase
> >>> is too big, given that we're collecting just 500-600 documents due to
> the
> >>> early query termination techniques we currently use.
> >>>
> >>> Any ideas on how to make it faster?.
> >>>
> >>> Thanks a lot,
> >>> Carlos
> >>>
> >>> Carlos Gonzalez-Cadenas
> >>> CEO, ExperienceOn - New generation search
> >>> http://www.experienceon.com
> >>>
> >>> Mobile: +34 652 911 201
> >>> Skype: carlosgonzalezcadenas
> >>> LinkedIn: http://www.lin

Re: custom scoring

2012-02-20 Thread Em
Could you please provide me the original request (the HTTP-request)?
I am a little bit confused to what "query_score" refers.
As far as I can see it isn't a magic-value.

Kind regards,
Em

Am 20.02.2012 14:05, schrieb Carlos Gonzalez-Cadenas:
> Yeah Em, it helped a lot :)
> 
> Here it is (for the user query "hoteles"):
> 
> *+(stopword_shortened_phrase:hoteles | stopword_phrase:hoteles |
> wildcard_stopword_shortened_phrase:hoteles |
> wildcard_stopword_phrase:hoteles) *
> 
> *product(pow(query((stopword_shortened_phrase:hoteles |
> stopword_phrase:hoteles | wildcard_stopword_shortened_phrase:hoteles |
> wildcard_stopword_phrase:hoteles),def=0.0),const(0.5)),float(query_score))*
> 
> Thanks a lot for your help.
> 
> Carlos
> Carlos Gonzalez-Cadenas
> CEO, ExperienceOn - New generation search
> http://www.experienceon.com
> 
> Mobile: +34 652 911 201
> Skype: carlosgonzalezcadenas
> LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas
> 
> 
> On Mon, Feb 20, 2012 at 1:50 PM, Em  wrote:
> 
>> Carlos,
>>
>> nice to hear that the approach helped you!
>>
>> Could you show us how your query-request looks like after reworking?
>>
>> Regards,
>> Em
>>
>> Am 20.02.2012 13:30, schrieb Carlos Gonzalez-Cadenas:
>>> Hello all:
>>>
>>> We've done some tests with Em's approach of putting a BooleanQuery in
>> front
>>> of our user query, that means:
>>>
>>> BooleanQuery
>>> must (DismaxQuery)
>>> should (FunctionQuery)
>>>
>>> The FunctionQuery obtains the SOLR IR score by means of a
>> QueryValueSource,
>>> then does the SQRT of this value, and then multiplies it by our custom
>>> "query_score" float, pulling it by means of a FieldCacheSource.
>>>
>>> In particular, we've proceeded in the following way:
>>>
>>>- we've loaded the whole index in the page cache of the OS to make
>> sure
>>>we don't have disk IO problems that might affect the benchmarks (our
>>>machine has enough memory to load all the index in RAM)
>>>- we've executed an out-of-benchmark query 10-20 times to make sure
>> that
>>>everything is jitted and that Lucene's FieldCache is properly
>> populated.
>>>- we've disabled all the caches (filter query cache, document cache,
>>>query cache)
>>>- we've executed 8 different user queries with and without
>>>FunctionQueries, with early termination in both cases (our collector
>> stops
>>>after collecting 50 documents per shard)
>>>
>>> Em was correct, the query is much faster with the BooleanQuery in front,
>>> but it's still 30-40% slower than the query without FunctionQueries.
>>>
>>> Although one may think that it's reasonable that the query response time
>>> increases because of the extra computations, we believe that the increase
>>> is too big, given that we're collecting just 500-600 documents due to the
>>> early query termination techniques we currently use.
>>>
>>> Any ideas on how to make it faster?.
>>>
>>> Thanks a lot,
>>> Carlos
>>>
>>> Carlos Gonzalez-Cadenas
>>> CEO, ExperienceOn - New generation search
>>> http://www.experienceon.com
>>>
>>> Mobile: +34 652 911 201
>>> Skype: carlosgonzalezcadenas
>>> LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas
>>>
>>>
>>> On Fri, Feb 17, 2012 at 11:07 AM, Carlos Gonzalez-Cadenas <
>>> c...@experienceon.com> wrote:
>>>
 Thanks Em, Robert, Chris for your time and valuable advice. We'll make
 some tests and will let you know soon.



 On Thu, Feb 16, 2012 at 11:43 PM, Em 
>> wrote:

> Hello Carlos,
>
> I think we missunderstood eachother.
>
> As an example:
> BooleanQuery (
>  clauses: (
> MustMatch(
>   DisjunctionMaxQuery(
>   TermQuery("stopword_field", "barcelona"),
>   TermQuery("stopword_field", "hoteles")
>   )
> ),
> ShouldMatch(
>  FunctionQuery(
>*please insert your function here*
> )
> )
>  )
> )
>
> Explanation:
> You construct an artificial BooleanQuery which wraps your user's query
> as well as your function query.
> Your user's query - in that case - is just a DisjunctionMaxQuery
> consisting of two TermQueries.
> In the real world you might construct another BooleanQuery around your
> DisjunctionMaxQuery in order to have more flexibility.
> However the interesting part of the given example is, that we specify
> the user's query as a MustMatch-condition of the BooleanQuery and the
> FunctionQuery just as a ShouldMatch.
> Constructed that way, I am expecting the FunctionQuery only scores
>> those
> documents which fit the MustMatch-Condition.
>
> I conclude that from the fact that the FunctionQuery-class also has a
> skipTo-method and I would expect that the scorer will use it to score
> only matching documents (however I did not search where and how it
>> might
> get called).
>

Re: custom scoring

2012-02-20 Thread Carlos Gonzalez-Cadenas
Yeah Em, it helped a lot :)

Here it is (for the user query "hoteles"):

*+(stopword_shortened_phrase:hoteles | stopword_phrase:hoteles |
wildcard_stopword_shortened_phrase:hoteles |
wildcard_stopword_phrase:hoteles) *

*product(pow(query((stopword_shortened_phrase:hoteles |
stopword_phrase:hoteles | wildcard_stopword_shortened_phrase:hoteles |
wildcard_stopword_phrase:hoteles),def=0.0),const(0.5)),float(query_score))*

Thanks a lot for your help.

Carlos
Carlos Gonzalez-Cadenas
CEO, ExperienceOn - New generation search
http://www.experienceon.com

Mobile: +34 652 911 201
Skype: carlosgonzalezcadenas
LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas


On Mon, Feb 20, 2012 at 1:50 PM, Em  wrote:

> Carlos,
>
> nice to hear that the approach helped you!
>
> Could you show us how your query-request looks like after reworking?
>
> Regards,
> Em
>
> Am 20.02.2012 13:30, schrieb Carlos Gonzalez-Cadenas:
> > Hello all:
> >
> > We've done some tests with Em's approach of putting a BooleanQuery in
> front
> > of our user query, that means:
> >
> > BooleanQuery
> > must (DismaxQuery)
> > should (FunctionQuery)
> >
> > The FunctionQuery obtains the SOLR IR score by means of a
> QueryValueSource,
> > then does the SQRT of this value, and then multiplies it by our custom
> > "query_score" float, pulling it by means of a FieldCacheSource.
> >
> > In particular, we've proceeded in the following way:
> >
> >- we've loaded the whole index in the page cache of the OS to make
> sure
> >we don't have disk IO problems that might affect the benchmarks (our
> >machine has enough memory to load all the index in RAM)
> >- we've executed an out-of-benchmark query 10-20 times to make sure
> that
> >everything is jitted and that Lucene's FieldCache is properly
> populated.
> >- we've disabled all the caches (filter query cache, document cache,
> >query cache)
> >- we've executed 8 different user queries with and without
> >FunctionQueries, with early termination in both cases (our collector
> stops
> >after collecting 50 documents per shard)
> >
> > Em was correct, the query is much faster with the BooleanQuery in front,
> > but it's still 30-40% slower than the query without FunctionQueries.
> >
> > Although one may think that it's reasonable that the query response time
> > increases because of the extra computations, we believe that the increase
> > is too big, given that we're collecting just 500-600 documents due to the
> > early query termination techniques we currently use.
> >
> > Any ideas on how to make it faster?
> >
> > Thanks a lot,
> > Carlos
> >
> > Carlos Gonzalez-Cadenas
> > CEO, ExperienceOn - New generation search
> > http://www.experienceon.com
> >
> > Mobile: +34 652 911 201
> > Skype: carlosgonzalezcadenas
> > LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas
> >
> >
> > On Fri, Feb 17, 2012 at 11:07 AM, Carlos Gonzalez-Cadenas <
> > c...@experienceon.com> wrote:
> >
> >> Thanks Em, Robert, Chris for your time and valuable advice. We'll make
> >> some tests and will let you know soon.
> >>
> >>
> >>
> >> On Thu, Feb 16, 2012 at 11:43 PM, Em 
> wrote:
> >>
> >>> Hello Carlos,
> >>>
> >>> I think we misunderstood each other.
> >>>
> >>> As an example:
> >>> BooleanQuery (
> >>>  clauses: (
> >>> MustMatch(
> >>>   DisjunctionMaxQuery(
> >>>   TermQuery("stopword_field", "barcelona"),
> >>>   TermQuery("stopword_field", "hoteles")
> >>>   )
> >>> ),
> >>> ShouldMatch(
> >>>  FunctionQuery(
> >>>*please insert your function here*
> >>> )
> >>> )
> >>>  )
> >>> )
> >>>
> >>> Explanation:
> >>> You construct an artificial BooleanQuery which wraps your user's query
> >>> as well as your function query.
> >>> Your user's query - in that case - is just a DisjunctionMaxQuery
> >>> consisting of two TermQueries.
> >>> In the real world you might construct another BooleanQuery around your
> >>> DisjunctionMaxQuery in order to have more flexibility.
> >>> However the interesting part of the given example is, that we specify
> >>> the user's query as a MustMatch-condition of the BooleanQuery and the
> >>> FunctionQuery just as a ShouldMatch.
> >>> Constructed that way, I am expecting the FunctionQuery only scores
> those
> >>> documents which fit the MustMatch-Condition.
> >>>
> >>> I conclude that from the fact that the FunctionQuery-class also has a
> >>> skipTo-method and I would expect that the scorer will use it to score
> >>> only matching documents (however I did not search where and how it
> might
> >>> get called).
> >>>
> >>> If my conclusion is wrong then hopefully Robert Muir (as far as I can
> >>> see the author of that class) can tell us what was the intention by
> >>> constructing an every-time-match-all-function-query.
> >>>
> >>> Can you validate whether your QueryParser constructs a query in the
> form
> >>> I drew above

Re: custom scoring

2012-02-20 Thread Em
Carlos,

nice to hear that the approach helped you!

Could you show us how your query-request looks like after reworking?

Regards,
Em

Am 20.02.2012 13:30, schrieb Carlos Gonzalez-Cadenas:
> Hello all:
> 
> We've done some tests with Em's approach of putting a BooleanQuery in front
> of our user query, that means:
> 
> BooleanQuery
> must (DismaxQuery)
> should (FunctionQuery)
> 
> The FunctionQuery obtains the SOLR IR score by means of a QueryValueSource,
> then does the SQRT of this value, and then multiplies it by our custom
> "query_score" float, pulling it by means of a FieldCacheSource.
> 
> In particular, we've proceeded in the following way:
> 
>- we've loaded the whole index in the page cache of the OS to make sure
>we don't have disk IO problems that might affect the benchmarks (our
>machine has enough memory to load all the index in RAM)
>- we've executed an out-of-benchmark query 10-20 times to make sure that
>everything is jitted and that Lucene's FieldCache is properly populated.
>- we've disabled all the caches (filter query cache, document cache,
>query cache)
>- we've executed 8 different user queries with and without
>FunctionQueries, with early termination in both cases (our collector stops
>after collecting 50 documents per shard)
> 
> Em was correct, the query is much faster with the BooleanQuery in front,
> but it's still 30-40% slower than the query without FunctionQueries.
> 
> Although one may think that it's reasonable that the query response time
> increases because of the extra computations, we believe that the increase
> is too big, given that we're collecting just 500-600 documents due to the
> early query termination techniques we currently use.
> 
> Any ideas on how to make it faster?
> 
> Thanks a lot,
> Carlos
> 
> Carlos Gonzalez-Cadenas
> CEO, ExperienceOn - New generation search
> http://www.experienceon.com
> 
> Mobile: +34 652 911 201
> Skype: carlosgonzalezcadenas
> LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas
> 
> 
> On Fri, Feb 17, 2012 at 11:07 AM, Carlos Gonzalez-Cadenas <
> c...@experienceon.com> wrote:
> 
>> Thanks Em, Robert, Chris for your time and valuable advice. We'll make
>> some tests and will let you know soon.
>>
>>
>>
>> On Thu, Feb 16, 2012 at 11:43 PM, Em  wrote:
>>
>>> Hello Carlos,
>>>
>>> I think we misunderstood each other.
>>>
>>> As an example:
>>> BooleanQuery (
>>>  clauses: (
>>> MustMatch(
>>>   DisjunctionMaxQuery(
>>>   TermQuery("stopword_field", "barcelona"),
>>>   TermQuery("stopword_field", "hoteles")
>>>   )
>>> ),
>>> ShouldMatch(
>>>  FunctionQuery(
>>>*please insert your function here*
>>> )
>>> )
>>>  )
>>> )
>>>
>>> Explanation:
>>> You construct an artificial BooleanQuery which wraps your user's query
>>> as well as your function query.
>>> Your user's query - in that case - is just a DisjunctionMaxQuery
>>> consisting of two TermQueries.
>>> In the real world you might construct another BooleanQuery around your
>>> DisjunctionMaxQuery in order to have more flexibility.
>>> However the interesting part of the given example is, that we specify
>>> the user's query as a MustMatch-condition of the BooleanQuery and the
>>> FunctionQuery just as a ShouldMatch.
>>> Constructed that way, I am expecting the FunctionQuery only scores those
>>> documents which fit the MustMatch-Condition.
>>>
>>> I conclude that from the fact that the FunctionQuery-class also has a
>>> skipTo-method and I would expect that the scorer will use it to score
>>> only matching documents (however I did not search where and how it might
>>> get called).
>>>
>>> If my conclusion is wrong then hopefully Robert Muir (as far as I can
>>> see the author of that class) can tell us what was the intention by
>>> constructing an every-time-match-all-function-query.
>>>
>>> Can you validate whether your QueryParser constructs a query in the form
>>> I drew above?
>>>
>>> Regards,
>>> Em
>>>
>>> Am 16.02.2012 20:29, schrieb Carlos Gonzalez-Cadenas:
 Hello Em:

 1) Here's a printout of an example DisMax query (as you can see mostly
>>> MUST
 terms except for some SHOULD terms used for boosting scores for
>>> stopwords)
 *
 *
 *((+stopword_shortened_phrase:hoteles
>>> +stopword_shortened_phrase:barcelona
 stopword_shortened_phrase:en) | (+stopword_phrase:hoteles
 +stopword_phrase:barcelona
 stopword_phrase:en) | (+stopword_shortened_phrase:hoteles
>>> +stopword_short
 ened_phrase:barcelona stopword_shortened_phrase:en) |
>>> (+stopword_phrase:hoteles
 +stopword_phrase:barcelona stopword_phrase:en) | (+stopword_shor
 tened_phrase:hoteles +wildcard_stopword_shortened_phrase:barcelona
 stopword_shortened_phrase:en) | (+stopword_phrase:hoteles
>>> +wildcard_stopw
 ord_phrase:barcelona stopword_phrase:en) |
>>> (+stopword_shortened_ph

Re: custom scoring

2012-02-20 Thread Carlos Gonzalez-Cadenas
Hello all:

We've done some tests with Em's approach of putting a BooleanQuery in front
of our user query, that means:

BooleanQuery
must (DismaxQuery)
should (FunctionQuery)

The FunctionQuery obtains the SOLR IR score by means of a QueryValueSource,
then does the SQRT of this value, and then multiplies it by our custom
"query_score" float, pulling it by means of a FieldCacheSource.

In particular, we've proceeded in the following way:

   - we've loaded the whole index in the page cache of the OS to make sure
   we don't have disk IO problems that might affect the benchmarks (our
   machine has enough memory to load all the index in RAM)
   - we've executed an out-of-benchmark query 10-20 times to make sure that
   everything is jitted and that Lucene's FieldCache is properly populated.
   - we've disabled all the caches (filter query cache, document cache,
   query cache)
   - we've executed 8 different user queries with and without
   FunctionQueries, with early termination in both cases (our collector stops
   after collecting 50 documents per shard)
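The per-shard early termination described in the last bullet can be sketched in
plain Java (an illustration of the idea, not the actual collector code):

```java
import java.util.ArrayList;
import java.util.List;

public class EarlyTerminatingCollector {
    // Collects matching doc ids in index order and stops once 'limit'
    // documents have been gathered, so later matches are never scored.
    static List<Integer> collect(int[] matchingDocs, int limit) {
        List<Integer> collected = new ArrayList<>();
        for (int doc : matchingDocs) {
            collected.add(doc);
            if (collected.size() >= limit) {
                break; // early termination: remaining matches are skipped
            }
        }
        return collected;
    }
}
```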

Em was correct, the query is much faster with the BooleanQuery in front,
but it's still 30-40% slower than the query without FunctionQueries.

Although one may think that it's reasonable that the query response time
increases because of the extra computations, we believe that the increase
is too big, given that we're collecting just 500-600 documents due to the
early query termination techniques we currently use.

Any ideas on how to make it faster?

Thanks a lot,
Carlos

Carlos Gonzalez-Cadenas
CEO, ExperienceOn - New generation search
http://www.experienceon.com

Mobile: +34 652 911 201
Skype: carlosgonzalezcadenas
LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas


On Fri, Feb 17, 2012 at 11:07 AM, Carlos Gonzalez-Cadenas <
c...@experienceon.com> wrote:

> Thanks Em, Robert, Chris for your time and valuable advice. We'll make
> some tests and will let you know soon.
>
>
>
> On Thu, Feb 16, 2012 at 11:43 PM, Em  wrote:
>
>> Hello Carlos,
>>
>> I think we misunderstood each other.
>>
>> As an example:
>> BooleanQuery (
>>  clauses: (
>> MustMatch(
>>   DisjunctionMaxQuery(
>>   TermQuery("stopword_field", "barcelona"),
>>   TermQuery("stopword_field", "hoteles")
>>   )
>> ),
>> ShouldMatch(
>>  FunctionQuery(
>>*please insert your function here*
>> )
>> )
>>  )
>> )
>>
>> Explanation:
>> You construct an artificial BooleanQuery which wraps your user's query
>> as well as your function query.
>> Your user's query - in that case - is just a DisjunctionMaxQuery
>> consisting of two TermQueries.
>> In the real world you might construct another BooleanQuery around your
>> DisjunctionMaxQuery in order to have more flexibility.
>> However the interesting part of the given example is, that we specify
>> the user's query as a MustMatch-condition of the BooleanQuery and the
>> FunctionQuery just as a ShouldMatch.
>> Constructed that way, I am expecting the FunctionQuery only scores those
>> documents which fit the MustMatch-Condition.
>>
>> I conclude that from the fact that the FunctionQuery-class also has a
>> skipTo-method and I would expect that the scorer will use it to score
>> only matching documents (however I did not search where and how it might
>> get called).
>>
>> If my conclusion is wrong then hopefully Robert Muir (as far as I can
>> see the author of that class) can tell us what was the intention by
>> constructing an every-time-match-all-function-query.
>>
>> Can you validate whether your QueryParser constructs a query in the form
>> I drew above?
>>
>> Regards,
>> Em
>>
>> Am 16.02.2012 20:29, schrieb Carlos Gonzalez-Cadenas:
>> > Hello Em:
>> >
>> > 1) Here's a printout of an example DisMax query (as you can see mostly
>> MUST
>> > terms except for some SHOULD terms used for boosting scores for
>> stopwords)
>> > *
>> > *
>> > *((+stopword_shortened_phrase:hoteles
>> +stopword_shortened_phrase:barcelona
>> > stopword_shortened_phrase:en) | (+stopword_phrase:hoteles
>> > +stopword_phrase:barcelona
>> > stopword_phrase:en) | (+stopword_shortened_phrase:hoteles
>> +stopword_short
>> > ened_phrase:barcelona stopword_shortened_phrase:en) |
>> (+stopword_phrase:hoteles
>> > +stopword_phrase:barcelona stopword_phrase:en) | (+stopword_shor
>> > tened_phrase:hoteles +wildcard_stopword_shortened_phrase:barcelona
>> > stopword_shortened_phrase:en) | (+stopword_phrase:hoteles
>> +wildcard_stopw
>> > ord_phrase:barcelona stopword_phrase:en) |
>> (+stopword_shortened_phrase:hoteles
>> > +wildcard_stopword_shortened_phrase:barcelona
>> stopword_shortened_phrase:en)
>> > | (+stopword_phrase:hoteles +wildcard_stopword_phrase:barcelona
>> > stopword_phrase:en))*
>> > *
>> > *
>> > 2)* *The collector is inserted in the SolrIndexSearcher (replacing the
>> > TimeLimitingCollector). We trigger it through the SOLR interface by

Re: custom scoring

2012-02-17 Thread Carlos Gonzalez-Cadenas
Thanks Em, Robert, Chris for your time and valuable advice. We'll make some
tests and will let you know soon.



On Thu, Feb 16, 2012 at 11:43 PM, Em  wrote:

> Hello Carlos,
>
> I think we misunderstood each other.
>
> As an example:
> BooleanQuery (
>  clauses: (
> MustMatch(
>   DisjunctionMaxQuery(
>   TermQuery("stopword_field", "barcelona"),
>   TermQuery("stopword_field", "hoteles")
>   )
> ),
> ShouldMatch(
>  FunctionQuery(
>*please insert your function here*
> )
> )
>  )
> )
>
> Explanation:
> You construct an artificial BooleanQuery which wraps your user's query
> as well as your function query.
> Your user's query - in that case - is just a DisjunctionMaxQuery
> consisting of two TermQueries.
> In the real world you might construct another BooleanQuery around your
> DisjunctionMaxQuery in order to have more flexibility.
> However the interesting part of the given example is, that we specify
> the user's query as a MustMatch-condition of the BooleanQuery and the
> FunctionQuery just as a ShouldMatch.
> Constructed that way, I am expecting the FunctionQuery only scores those
> documents which fit the MustMatch-Condition.
>
> I conclude that from the fact that the FunctionQuery-class also has a
> skipTo-method and I would expect that the scorer will use it to score
> only matching documents (however I did not search where and how it might
> get called).
>
> If my conclusion is wrong then hopefully Robert Muir (as far as I can
> see the author of that class) can tell us what was the intention by
> constructing an every-time-match-all-function-query.
>
> Can you validate whether your QueryParser constructs a query in the form
> I drew above?
>
> Regards,
> Em
>
> Am 16.02.2012 20:29, schrieb Carlos Gonzalez-Cadenas:
> > Hello Em:
> >
> > 1) Here's a printout of an example DisMax query (as you can see mostly
> MUST
> > terms except for some SHOULD terms used for boosting scores for
> stopwords)
> > *
> > *
> > *((+stopword_shortened_phrase:hoteles
> +stopword_shortened_phrase:barcelona
> > stopword_shortened_phrase:en) | (+stopword_phrase:hoteles
> > +stopword_phrase:barcelona
> > stopword_phrase:en) | (+stopword_shortened_phrase:hoteles +stopword_short
> > ened_phrase:barcelona stopword_shortened_phrase:en) |
> (+stopword_phrase:hoteles
> > +stopword_phrase:barcelona stopword_phrase:en) | (+stopword_shor
> > tened_phrase:hoteles +wildcard_stopword_shortened_phrase:barcelona
> > stopword_shortened_phrase:en) | (+stopword_phrase:hoteles +wildcard_stopw
> > ord_phrase:barcelona stopword_phrase:en) |
> (+stopword_shortened_phrase:hoteles
> > +wildcard_stopword_shortened_phrase:barcelona
> stopword_shortened_phrase:en)
> > | (+stopword_phrase:hoteles +wildcard_stopword_phrase:barcelona
> > stopword_phrase:en))*
> > *
> > *
> > 2)* *The collector is inserted in the SolrIndexSearcher (replacing the
> > TimeLimitingCollector). We trigger it through the SOLR interface by
> passing
> > the timeAllowed parameter. We know this is a hack but AFAIK there's no
> > out-of-the-box way to specify custom collectors by now (
> > https://issues.apache.org/jira/browse/SOLR-1680). In any case the
> collector
> > part works perfectly as of now, so clearly this is not the problem.
> >
> > 3) Re: your sentence:
> > *
> > *
> > **I* would expect that with a shrinking set of matching documents to
> > the overall-query, the function query only checks those documents that
> are
> > guaranteed to be within the result set.*
> > *
> > *
> > Yes, I agree with this, but this snippet of code in FunctionQuery.java
> > seems to say otherwise:
> >
> > // instead of matching all docs, we could also embed a query.
> > // the score could either ignore the subscore, or boost it.
> > // Containment:  floatline(foo:myTerm, "myFloatField", 1.0, 0.0f)
> > // Boost:foo:myTerm^floatline("myFloatField",1.0,0.0f)
> > @Override
> > public int nextDoc() throws IOException {
> >   for(;;) {
> > ++doc;
> > if (doc>=maxDoc) {
> >   return doc=NO_MORE_DOCS;
> > }
> > if (acceptDocs != null && !acceptDocs.get(doc)) continue;
> > return doc;
> >   }
> > }
> >
> > It seems that the author also thought of maybe embedding a query in order
> > to restrict matches, but this doesn't seem to be in place as of now (or
> > maybe I'm not understanding how the whole thing works :) ).
> >
> > Thanks
> > Carlos
> > *
> > *
> >
> > Carlos Gonzalez-Cadenas
> > CEO, ExperienceOn - New generation search
> > http://www.experienceon.com
> >
> > Mobile: +34 652 911 201
> > Skype: carlosgonzalezcadenas
> > LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas
> >
> >
> > On Thu, Feb 16, 2012 at 8:09 PM, Em 
> wrote:
> >
> >> Hello Carlos,
> >>
> >>> We have some more tests on that matter: now we're moving from issuing
> >> this
> >>> large query through the SOL

Re: custom scoring

2012-02-16 Thread Em
I just modified some TestCases a little bit to see how the FunctionQuery
behaves.

Given that you got an index containing 14 docs, where 13 of them
containing the term "batman" and two contain the term "superman", a
search for

q=+text:superman _val_:"query($qq)"&qq=text:superman

Leads to two hits and the FunctionQuery has two iterations.

If you remove that little plus-symbol before "text:superman", it
wouldn't be a mustMatch-condition anymore and the whole query results
in 14 hits (the default operator is OR):

q=text:superman _val_:"query($qq)"&qq=text:superman

If both queries, the TermQuery and the FunctionQuery, must match, it
would also result in two hits:

q=text:superman AND _val_:"query($qq)"&qq=text:superman
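The three hit counts (2, 14, 2) fall straight out of boolean clause semantics,
since the function query matches all 14 documents. A set-algebra sketch of the
same outcomes (plain Java, no Lucene types):

```java
import java.util.HashSet;
import java.util.Set;

public class ClauseSemantics {
    // MUST term clause plus a match-all SHOULD function clause:
    // the match-all clause cannot widen the result set.
    static int mustPlusMatchAllShould(Set<Integer> mustMatches) {
        return mustMatches.size();
    }

    // Both clauses SHOULD (OR): the union of the two doc sets.
    static int shouldOr(Set<Integer> a, Set<Integer> b) {
        Set<Integer> union = new HashSet<>(a);
        union.addAll(b);
        return union.size();
    }

    // Both clauses required (AND): the intersection.
    static int and(Set<Integer> a, Set<Integer> b) {
        Set<Integer> inter = new HashSet<>(a);
        inter.retainAll(b);
        return inter.size();
    }
}
```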

There is some behaviour that I currently don't understand (if 14 docs
match, the FunctionQuery's AllScorer iterates twice over the
0th and the 1st doc, and the reason for that seems to be the construction
of two AllScorers), but as far as I can see the performance of your
queries *should* increase if you construct your query as I explained in
my last eMail.

Kind regards,
Em

Am 16.02.2012 23:43, schrieb Em:
> Hello Carlos,
> 
> I think we misunderstood each other.
> 
> As an example:
> BooleanQuery (
>   clauses: (
>  MustMatch(
>DisjunctionMaxQuery(
>TermQuery("stopword_field", "barcelona"),
>TermQuery("stopword_field", "hoteles")
>)
>  ),
>  ShouldMatch(
>   FunctionQuery(
> *please insert your function here*
>  )
>  )
>   )
> )
> 
> Explanation:
> You construct an artificial BooleanQuery which wraps your user's query
> as well as your function query.
> Your user's query - in that case - is just a DisjunctionMaxQuery
> consisting of two TermQueries.
> In the real world you might construct another BooleanQuery around your
> DisjunctionMaxQuery in order to have more flexibility.
> However the interesting part of the given example is, that we specify
> the user's query as a MustMatch-condition of the BooleanQuery and the
> FunctionQuery just as a ShouldMatch.
> Constructed that way, I am expecting the FunctionQuery only scores those
> documents which fit the MustMatch-Condition.
> 
> I conclude that from the fact that the FunctionQuery-class also has a
> skipTo-method and I would expect that the scorer will use it to score
> only matching documents (however I did not search where and how it might
> get called).
> 
> If my conclusion is wrong then hopefully Robert Muir (as far as I can
> see the author of that class) can tell us what was the intention by
> constructing an every-time-match-all-function-query.
> 
> Can you validate whether your QueryParser constructs a query in the form
> I drew above?
> 
> Regards,
> Em
> 
> Am 16.02.2012 20:29, schrieb Carlos Gonzalez-Cadenas:
>> Hello Em:
>>
>> 1) Here's a printout of an example DisMax query (as you can see mostly MUST
>> terms except for some SHOULD terms used for boosting scores for stopwords)
>> *
>> *
>> *((+stopword_shortened_phrase:hoteles +stopword_shortened_phrase:barcelona
>> stopword_shortened_phrase:en) | (+stopword_phrase:hoteles
>> +stopword_phrase:barcelona
>> stopword_phrase:en) | (+stopword_shortened_phrase:hoteles +stopword_short
>> ened_phrase:barcelona stopword_shortened_phrase:en) | 
>> (+stopword_phrase:hoteles
>> +stopword_phrase:barcelona stopword_phrase:en) | (+stopword_shor
>> tened_phrase:hoteles +wildcard_stopword_shortened_phrase:barcelona
>> stopword_shortened_phrase:en) | (+stopword_phrase:hoteles +wildcard_stopw
>> ord_phrase:barcelona stopword_phrase:en) | 
>> (+stopword_shortened_phrase:hoteles
>> +wildcard_stopword_shortened_phrase:barcelona stopword_shortened_phrase:en)
>> | (+stopword_phrase:hoteles +wildcard_stopword_phrase:barcelona
>> stopword_phrase:en))*
>> *
>> *
>> 2)* *The collector is inserted in the SolrIndexSearcher (replacing the
>> TimeLimitingCollector). We trigger it through the SOLR interface by passing
>> the timeAllowed parameter. We know this is a hack but AFAIK there's no
>> out-of-the-box way to specify custom collectors by now (
>> https://issues.apache.org/jira/browse/SOLR-1680). In any case the collector
>> part works perfectly as of now, so clearly this is not the problem.
>>
>> 3) Re: your sentence:
>> *
>> *
>> **I* would expect that with a shrinking set of matching documents to
>> the overall-query, the function query only checks those documents that are
>> guaranteed to be within the result set.*
>> *
>> *
>> Yes, I agree with this, but this snippet of code in FunctionQuery.java
>> seems to say otherwise:
>>
>> // instead of matching all docs, we could also embed a query.
>> // the score could either ignore the subscore, or boost it.
>> // Containment:  floatline(foo:myTerm, "myFloatField", 1.0, 0.0f)
>> // Boost:foo:myTerm^floatline("myFloatField",1.0,0.0f)
>> @Override
>> public int nextDoc() thr

Re: custom scoring

2012-02-16 Thread Em
Hello Carlos,

I think we misunderstood each other.

As an example:
BooleanQuery (
  clauses: (
 MustMatch(
   DisjunctionMaxQuery(
   TermQuery("stopword_field", "barcelona"),
   TermQuery("stopword_field", "hoteles")
   )
 ),
 ShouldMatch(
  FunctionQuery(
*please insert your function here*
 )
 )
  )
)

Explanation:
You construct an artificial BooleanQuery which wraps your user's query
as well as your function query.
Your user's query - in that case - is just a DisjunctionMaxQuery
consisting of two TermQueries.
In the real world you might construct another BooleanQuery around your
DisjunctionMaxQuery in order to have more flexibility.
However the interesting part of the given example is, that we specify
the user's query as a MustMatch-condition of the BooleanQuery and the
FunctionQuery just as a ShouldMatch.
Constructed that way, I am expecting the FunctionQuery only scores those
documents which fit the MustMatch-Condition.

I conclude that from the fact that the FunctionQuery-class also has a
skipTo-method and I would expect that the scorer will use it to score
only matching documents (however I did not search where and how it might
get called).
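Under that expectation, the function's cost is proportional to the number of
must-matches, not to maxDoc. A toy plain-Java illustration (no Lucene types;
names are made up for the example):

```java
import java.util.List;
import java.util.function.IntToDoubleFunction;

public class RestrictedFunctionScorer {
    // Advances the function-query "scorer" only to the docs the
    // must-clause matched; returns how often the function was evaluated.
    static int scoreMatches(List<Integer> mustMatches, IntToDoubleFunction function) {
        int evaluations = 0;
        for (int doc : mustMatches) {
            function.applyAsDouble(doc); // function cost paid only per matching doc
            evaluations++;
        }
        return evaluations;
    }
}
```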

If my conclusion is wrong then hopefully Robert Muir (as far as I can
see the author of that class) can tell us what was the intention by
constructing an every-time-match-all-function-query.

Can you validate whether your QueryParser constructs a query in the form
I drew above?

Regards,
Em

Am 16.02.2012 20:29, schrieb Carlos Gonzalez-Cadenas:
> Hello Em:
> 
> 1) Here's a printout of an example DisMax query (as you can see mostly MUST
> terms except for some SHOULD terms used for boosting scores for stopwords)
> *
> *
> *((+stopword_shortened_phrase:hoteles +stopword_shortened_phrase:barcelona
> stopword_shortened_phrase:en) | (+stopword_phrase:hoteles
> +stopword_phrase:barcelona
> stopword_phrase:en) | (+stopword_shortened_phrase:hoteles +stopword_short
> ened_phrase:barcelona stopword_shortened_phrase:en) | 
> (+stopword_phrase:hoteles
> +stopword_phrase:barcelona stopword_phrase:en) | (+stopword_shor
> tened_phrase:hoteles +wildcard_stopword_shortened_phrase:barcelona
> stopword_shortened_phrase:en) | (+stopword_phrase:hoteles +wildcard_stopw
> ord_phrase:barcelona stopword_phrase:en) | (+stopword_shortened_phrase:hoteles
> +wildcard_stopword_shortened_phrase:barcelona stopword_shortened_phrase:en)
> | (+stopword_phrase:hoteles +wildcard_stopword_phrase:barcelona
> stopword_phrase:en))*
> *
> *
> 2)* *The collector is inserted in the SolrIndexSearcher (replacing the
> TimeLimitingCollector). We trigger it through the SOLR interface by passing
> the timeAllowed parameter. We know this is a hack but AFAIK there's no
> out-of-the-box way to specify custom collectors by now (
> https://issues.apache.org/jira/browse/SOLR-1680). In any case the collector
> part works perfectly as of now, so clearly this is not the problem.
> 
> 3) Re: your sentence:
> *
> *
> **I* would expect that with a shrinking set of matching documents to
> the overall-query, the function query only checks those documents that are
> guaranteed to be within the result set.*
> *
> *
> Yes, I agree with this, but this snippet of code in FunctionQuery.java
> seems to say otherwise:
> 
> // instead of matching all docs, we could also embed a query.
> // the score could either ignore the subscore, or boost it.
> // Containment:  floatline(foo:myTerm, "myFloatField", 1.0, 0.0f)
> // Boost:foo:myTerm^floatline("myFloatField",1.0,0.0f)
> @Override
> public int nextDoc() throws IOException {
>   for(;;) {
> ++doc;
> if (doc>=maxDoc) {
>   return doc=NO_MORE_DOCS;
> }
> if (acceptDocs != null && !acceptDocs.get(doc)) continue;
> return doc;
>   }
> }
> 
> It seems that the author also thought of maybe embedding a query in order
> to restrict matches, but this doesn't seem to be in place as of now (or
> maybe I'm not understanding how the whole thing works :) ).
> 
> Thanks
> Carlos
> *
> *
> 
> Carlos Gonzalez-Cadenas
> CEO, ExperienceOn - New generation search
> http://www.experienceon.com
> 
> Mobile: +34 652 911 201
> Skype: carlosgonzalezcadenas
> LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas
> 
> 
> On Thu, Feb 16, 2012 at 8:09 PM, Em  wrote:
> 
>> Hello Carlos,
>>
>>> We have some more tests on that matter: now we're moving from issuing
>> this
>>> large query through the SOLR interface to creating our own
>> QueryParser. The
>>> initial tests we've done in our QParser (that internally creates multiple
>>> queries and inserts them inside a DisjunctionMaxQuery) are very good,
>> we're
>>> getting very good response times and high quality answers. But when we've
>>> tried to wrap the DisjunctionMaxQuery within a FunctionQuery (i.e. with a
>>> QueryValueSource that wraps th

Re: custom scoring

2012-02-16 Thread Robert Muir
On Thu, Feb 16, 2012 at 8:34 AM, Carlos Gonzalez-Cadenas
 wrote:
> Hello all:
>
> We'd like to score the matching documents using a combination of SOLR's IR
> score with another application-specific score that we store within the
> documents themselves (i.e. a float field containing the app-specific
> score). In particular, we'd like to calculate the final score doing some
> operations with both numbers (i.e product, sqrt, ...)
...
>
> 1) Apart from the two options I mentioned, is there any other (simple) way
> to achieve this that we're not aware of?
>

In general there is always a third option, that may or may not fit,
depending really upon how you are trying to model relevance and how
you want to integrate with scoring, and thats to tie in your factors
directly into Similarity (lucene's term weighting api). For example,
some people use index-time boosting, but in lucene index-time boost
really just means 'make the document appear shorter'. You might for
example, have other boosts that modify term-frequency before
normalization, or however you want to do it. Similarity is pluggable
into Solr via schema.xml.
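For reference, wiring a custom Similarity into Solr is a one-line schema.xml
entry (the class name below is a placeholder for your own implementation):

```xml
<!-- schema.xml: install a custom Similarity index-wide -->
<similarity class="com.example.MyCustomSimilarity"/>
```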

Since you are using trunk, this is a lot more flexible than previous
releases, e.g. you can access things from FieldCache, DocValues, or
even your own rapidly-changing float[] or whatever you want :) There
are also a lot more predefined models than just the vector space model
to work with if you find you can easily imagine your notion of
relevance in terms of an existing model.

-- 
lucidimagination.com


Re: custom scoring

2012-02-16 Thread Chris Hostetter

: We'd like to score the matching documents using a combination of SOLR's IR
: score with another application-specific score that we store within the
: documents themselves (i.e. a float field containing the app-specific
: score). In particular, we'd like to calculate the final score doing some
: operations with both numbers (i.e product, sqrt, ...)

let's back up a minute.

if your ultimate goal is to have the final score of all documents be a 
simple multiplication of an indexed field ("query_score") against the 
score of your "base" query, that's fairely trivial use of the 
BoostQParser...

q={!boost f=query_score}your base query

...or to split it out using param dereferencing...

q={!boost f=query_score v=$qq}
qq=your base query

: A) Sort by function [1]: We've tested an expression like
: "sort=product(score, query_score)" in the SOLR query, where score is the
: common SOLR IR score and query_score is our own precalculated score, but it
: seems that SOLR can only do this with stored/indexed fields (and obviously
: "score" is not stored/indexed).

you could do this by replacing "score" with the query whose score you 
want, which could be a ref back to "$q" -- but that's really only needed 
if you want the "scores" returned for each document to be different than the 
value used for sorting (ie: score comes from solr, sort value includes your 
query_score and the score from the main query -- or some completely different 
query)

based on what you've said, you don't need that and it would be 
unnecessary overhead.

: B) Function queries: We've used _val_ and function queries like max, sqrt
: and query, and we've obtained the desired results from a functional point
: of view. However, our index is quite large (400M documents) and the
: performance degrades heavily, given that function queries are AFAIK
: matching all the documents.

based on the examples you've given in your subsequent queries, it's not 
hard to see why...

> "q":"_val_:\"product(query_score,max(query($q8),max(query($q7),

wrapping queries in functions in queries can have that effect, because 
functions ultimately match all documents -- even when that function wraps 
a query -- so your outermost query is still scoring every document in the 
index.

you want to do as much "pruning" with the query as possible, and only 
multiply by your boost function on matching docs, hence the 
purpose of the BoostQParser.

-Hoss


Re: custom scoring

2012-02-16 Thread Carlos Gonzalez-Cadenas
Hello Em:

1) Here's a printout of an example DisMax query (as you can see, mostly MUST
terms except for some SHOULD terms used for boosting scores for stopwords):

((+stopword_shortened_phrase:hoteles +stopword_shortened_phrase:barcelona stopword_shortened_phrase:en) |
 (+stopword_phrase:hoteles +stopword_phrase:barcelona stopword_phrase:en) |
 (+stopword_shortened_phrase:hoteles +stopword_shortened_phrase:barcelona stopword_shortened_phrase:en) |
 (+stopword_phrase:hoteles +stopword_phrase:barcelona stopword_phrase:en) |
 (+stopword_shortened_phrase:hoteles +wildcard_stopword_shortened_phrase:barcelona stopword_shortened_phrase:en) |
 (+stopword_phrase:hoteles +wildcard_stopword_phrase:barcelona stopword_phrase:en) |
 (+stopword_shortened_phrase:hoteles +wildcard_stopword_shortened_phrase:barcelona stopword_shortened_phrase:en) |
 (+stopword_phrase:hoteles +wildcard_stopword_phrase:barcelona stopword_phrase:en))

2) The collector is inserted in the SolrIndexSearcher (replacing the
TimeLimitingCollector). We trigger it through the SOLR interface by passing
the timeAllowed parameter. We know this is a hack but AFAIK there's no
out-of-the-box way to specify custom collectors by now (
https://issues.apache.org/jira/browse/SOLR-1680). In any case the collector
part works perfectly as of now, so clearly this is not the problem.
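The custom collector described above follows the same pattern as the TimeLimitingCollector it replaces: abort collection by throwing once a budget is exhausted. Here is a standalone sketch of the idea in plain Java -- this is not the actual Lucene Collector API, and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

public class EarlyTerminatingCollector {
    // Thrown to abort collection once enough hits have been gathered,
    // mirroring how Lucene's TimeLimitingCollector aborts on timeout.
    static class EarlyTerminationException extends RuntimeException {}

    private final int maxHits;
    private final List<Integer> hits = new ArrayList<>();

    EarlyTerminatingCollector(int maxHits) { this.maxHits = maxHits; }

    // Called once per matching docId, in index order.
    void collect(int docId) {
        hits.add(docId);
        if (hits.size() >= maxHits) throw new EarlyTerminationException();
    }

    List<Integer> hits() { return hits; }

    public static void main(String[] args) {
        EarlyTerminatingCollector c = new EarlyTerminatingCollector(3);
        int[] matches = {2, 5, 7, 11, 42};   // doc ids a query would produce
        try {
            for (int doc : matches) c.collect(doc);
        } catch (EarlyTerminationException expected) {
            // search aborted after 3 hits
        }
        System.out.println(c.hits());   // [2, 5, 7]
    }
}
```

In the real collector the exception is thrown from collect() and caught by the searcher, which then returns the hits gathered so far.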

3) Re: your sentence:

"*I* would expect that with a shrinking set of matching documents to
the overall-query, the function query only checks those documents that are
guaranteed to be within the result set."

Yes, I agree with this, but this snippet of code in FunctionQuery.java
seems to say otherwise:

// instead of matching all docs, we could also embed a query.
// the score could either ignore the subscore, or boost it.
// Containment:  floatline(foo:myTerm, "myFloatField", 1.0, 0.0f)
// Boost:foo:myTerm^floatline("myFloatField",1.0,0.0f)
@Override
public int nextDoc() throws IOException {
  for(;;) {
++doc;
if (doc>=maxDoc) {
  return doc=NO_MORE_DOCS;
}
if (acceptDocs != null && !acceptDocs.get(doc)) continue;
return doc;
  }
}

It seems that the author also thought of maybe embedding a query in order
to restrict matches, but this doesn't seem to be in place as of now (or
maybe I'm not understanding how the whole thing works :) ).
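A toy model of why that nextDoc() loop hurts: a function-query-style iterator visits every docId up to maxDoc, while a plain term query only walks its posting list. The numbers and doc ids below are invented for illustration:

```java
import java.util.List;

public class IterationCost {
    // Function-query style: the nextDoc() loop quoted above advances through
    // every docId in [0, maxDoc), whether or not the doc matches anything.
    static int docsVisitedByFunctionQuery(int maxDoc) {
        int visited = 0;
        for (int doc = 0; doc < maxDoc; doc++) visited++;
        return visited;
    }

    // Plain query style: only the posting list of matching docs is walked.
    static int docsVisitedByTermQuery(List<Integer> postings) {
        return postings.size();
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;                               // toy index size
        List<Integer> postings = List.of(3, 17, 200, 90_000); // hypothetical matches
        System.out.println(docsVisitedByFunctionQuery(maxDoc)); // 1000000
        System.out.println(docsVisitedByTermQuery(postings));   // 4
    }
}
```

At the 400M-document scale mentioned in this thread, that difference is exactly the 10x slowdown being reported.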

Thanks
Carlos

Carlos Gonzalez-Cadenas
CEO, ExperienceOn - New generation search
http://www.experienceon.com

Mobile: +34 652 911 201
Skype: carlosgonzalezcadenas
LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas


On Thu, Feb 16, 2012 at 8:09 PM, Em  wrote:

> Hello Carlos,
>
> > We have some more tests on that matter: now we're moving from issuing
> this
> > large query through the SOLR interface to creating our own
> QueryParser. The
> > initial tests we've done in our QParser (that internally creates multiple
> > queries and inserts them inside a DisjunctionMaxQuery) are very good,
> we're
> > getting very good response times and high quality answers. But when we've
> > tried to wrap the DisjunctionMaxQuery within a FunctionQuery (i.e. with a
> > QueryValueSource that wraps the DisMaxQuery), then the times move from
> > 10-20 msec to 200-300msec.
> I reviewed the sourcecode and yes, the FunctionQuery iterates over the
> whole index, however... let's see!
>
> In relation to the DisMaxQuery you create within your parser: What kind
> of clause is the FunctionQuery and what kind of clause are your other
> queries (MUST, SHOULD, MUST_NOT...)?
>
> *I* would expect that with a shrinking set of matching documents to the
> overall-query, the function query only checks those documents that are
> guaranteed to be within the result set.
>
> > Note that we're using early termination of queries (via a custom
> > collector), and therefore (as shown by the numbers I included above) even
> > if the query is very complex, we're getting very fast answers. The only
> > situation where the response time explodes is when we include a
> > FunctionQuery.
> Could you give us some details about how/where did you plugin the
> Collector, please?
>
> Kind regards,
> Em
>
> Am 16.02.2012 19:41, schrieb Carlos Gonzalez-Cadenas:
> > Hello Em:
> >
> > Thanks for your answer.
> >
> > Yes, we initially also thought that the excessive increase in response
> time
> > was caused by the several queries being executed, and we did another
> test.
> > We executed one of the subqueries that I've shown to you directly in the
> > "q" parameter and then we tested this same subquery (only this one,
> without
> > the others) with the function query "query($q1)" in the "q" parameter.
> >
> > Theoretically the times for these two queries should be more or less the
> > same, but the second one is several times slower than the first one.
> After
> > this observation we learned more about function queries and we learned
> from
> > the code and from some comments in the forums [1] that the

Re: custom scoring

2012-02-16 Thread Em
Hello Carlos,

> We have some more tests on that matter: now we're moving from issuing this
> large query through the SOLR interface to creating our own
QueryParser. The
> initial tests we've done in our QParser (that internally creates multiple
> queries and inserts them inside a DisjunctionMaxQuery) are very good,
we're
> getting very good response times and high quality answers. But when we've
> tried to wrap the DisjunctionMaxQuery within a FunctionQuery (i.e. with a
> QueryValueSource that wraps the DisMaxQuery), then the times move from
> 10-20 msec to 200-300msec.
I reviewed the sourcecode and yes, the FunctionQuery iterates over the
whole index, however... let's see!

In relation to the DisMaxQuery you create within your parser: What kind
of clause is the FunctionQuery and what kind of clause are your other
queries (MUST, SHOULD, MUST_NOT...)?

*I* would expect that with a shrinking set of matching documents to the
overall-query, the function query only checks those documents that are
guaranteed to be within the result set.

> Note that we're using early termination of queries (via a custom
> collector), and therefore (as shown by the numbers I included above) even
> if the query is very complex, we're getting very fast answers. The only
> situation where the response time explodes is when we include a
> FunctionQuery.
Could you give us some details about how/where did you plugin the
Collector, please?

Kind regards,
Em

Am 16.02.2012 19:41, schrieb Carlos Gonzalez-Cadenas:
> Hello Em:
> 
> Thanks for your answer.
> 
> Yes, we initially also thought that the excessive increase in response time
> was caused by the several queries being executed, and we did another test.
> We executed one of the subqueries that I've shown to you directly in the
> "q" parameter and then we tested this same subquery (only this one, without
> the others) with the function query "query($q1)" in the "q" parameter.
> 
> Theoretically the times for these two queries should be more or less the
> same, but the second one is several times slower than the first one. After
> this observation we learned more about function queries and we learned from
> the code and from some comments in the forums [1] that the FunctionQueries
> are expected to match all documents.
> 
> We have some more tests on that matter: now we're moving from issuing this
> large query through the SOLR interface to creating our own QueryParser. The
> initial tests we've done in our QParser (that internally creates multiple
> queries and inserts them inside a DisjunctionMaxQuery) are very good, we're
> getting very good response times and high quality answers. But when we've
> tried to wrap the DisjunctionMaxQuery within a FunctionQuery (i.e. with a
> QueryValueSource that wraps the DisMaxQuery), then the times move from
> 10-20 msec to 200-300msec.
> 
> Note that we're using early termination of queries (via a custom
> collector), and therefore (as shown by the numbers I included above) even
> if the query is very complex, we're getting very fast answers. The only
> situation where the response time explodes is when we include a
> FunctionQuery.
> 
> Re: your question of what we're trying to achieve ... We're implementing a
> powerful query autocomplete system, and we use several fields to a) improve
> performance on wildcard queries and b) have a very precise control over the
> score.
> 
> Thanks a lot for your help,
> Carlos
> 
> [1]: http://grokbase.com/p/lucene/solr-user/11bjw87bt5/functionquery-score-0
> 
> Carlos Gonzalez-Cadenas
> CEO, ExperienceOn - New generation search
> http://www.experienceon.com
> 
> Mobile: +34 652 911 201
> Skype: carlosgonzalezcadenas
> LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas
> 
> 
> On Thu, Feb 16, 2012 at 7:09 PM, Em  wrote:
> 
>> Hello Carlos,
>>
>> well, you must take into account that you are executing up to 8 queries
>> per request instead of one query per request.
>>
>> I am not totally sure about the details of the implementation of the
>> max-function-query, but I guess it first iterates over the results of
>> the first max-query, afterwards over the results of the second max-query
>> and so on. This is a much higher complexity than in the case of a normal
>> query.
>>
>> I would suggest you to optimize your request. I don't think that this
>> particular function query is matching *all* docs. Instead I think it
>> just matches those docs specified by your inner-query (although I might
>> be wrong about that).
>>
>> What are you trying to achieve by your request?
>>
>> Regards,
>> Em
>>
>> Am 16.02.2012 16:24, schrieb Carlos Gonzalez-Cadenas:
>>> Hello Em:
>>>
>>> The URL is quite large (w/ shards, ...), maybe it's best if I paste the
>>> relevant parts.
>>>
>>> Our "q" parameter is:
>>>
>>>
>> "q":"_val_:\"product(query_score,max(query($q8),max(query($q7),max(query($q4),query($q3)\"",
>>>
>>> The subqueries q8, q7, q4 and q3 are regular queries, for example:
>>>
>>> "q7":"stopword_phrase:colomba

Re: custom scoring

2012-02-16 Thread Carlos Gonzalez-Cadenas
Hello Em:

Thanks for your answer.

Yes, we initially also thought that the excessive increase in response time
was caused by the several queries being executed, and we did another test.
We executed one of the subqueries that I've shown to you directly in the
"q" parameter and then we tested this same subquery (only this one, without
the others) with the function query "query($q1)" in the "q" parameter.

Theoretically the times for these two queries should be more or less the
same, but the second one is several times slower than the first one. After
this observation we learned more about function queries and we learned from
the code and from some comments in the forums [1] that the FunctionQueries
are expected to match all documents.

We have some more tests on that matter: now we're moving from issuing this
large query through the SOLR interface to creating our own QueryParser. The
initial tests we've done in our QParser (that internally creates multiple
queries and inserts them inside a DisjunctionMaxQuery) are very good, we're
getting very good response times and high quality answers. But when we've
tried to wrap the DisjunctionMaxQuery within a FunctionQuery (i.e. with a
QueryValueSource that wraps the DisMaxQuery), then the times move from
10-20 msec to 200-300msec.

Note that we're using early termination of queries (via a custom
collector), and therefore (as shown by the numbers I included above) even
if the query is very complex, we're getting very fast answers. The only
situation where the response time explodes is when we include a
FunctionQuery.

Re: your question of what we're trying to achieve ... We're implementing a
powerful query autocomplete system, and we use several fields to a) improve
performance on wildcard queries and b) have a very precise control over the
score.

Thanks a lot for your help,
Carlos

[1]: http://grokbase.com/p/lucene/solr-user/11bjw87bt5/functionquery-score-0

Carlos Gonzalez-Cadenas
CEO, ExperienceOn - New generation search
http://www.experienceon.com

Mobile: +34 652 911 201
Skype: carlosgonzalezcadenas
LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas


On Thu, Feb 16, 2012 at 7:09 PM, Em  wrote:

> Hello Carlos,
>
> well, you must take into account that you are executing up to 8 queries
> per request instead of one query per request.
>
> I am not totally sure about the details of the implementation of the
> max-function-query, but I guess it first iterates over the results of
> the first max-query, afterwards over the results of the second max-query
> and so on. This is a much higher complexity than in the case of a normal
> query.
>
> I would suggest you to optimize your request. I don't think that this
> particular function query is matching *all* docs. Instead I think it
> just matches those docs specified by your inner-query (although I might
> be wrong about that).
>
> What are you trying to achieve by your request?
>
> Regards,
> Em
>
> Am 16.02.2012 16:24, schrieb Carlos Gonzalez-Cadenas:
> > Hello Em:
> >
> > The URL is quite large (w/ shards, ...), maybe it's best if I paste the
> > relevant parts.
> >
> > Our "q" parameter is:
> >
> >
> "q":"_val_:\"product(query_score,max(query($q8),max(query($q7),max(query($q4),query($q3)\"",
> >
> > The subqueries q8, q7, q4 and q3 are regular queries, for example:
> >
> > "q7":"stopword_phrase:colomba~1 AND stopword_phrase:santa AND
> > wildcard_stopword_phrase:car^0.7 AND stopword_phrase:hoteles OR
> > (stopword_phrase:las AND stopword_phrase:de)"
> >
> > We've executed the subqueries q3-q8 independently and they're very fast,
> > but when we introduce the function queries as described below, it all
> goes
> > 10X slower.
> >
> > Let me know if you need anything else.
> >
> > Thanks
> > Carlos
> >
> >
> > Carlos Gonzalez-Cadenas
> > CEO, ExperienceOn - New generation search
> > http://www.experienceon.com
> >
> > Mobile: +34 652 911 201
> > Skype: carlosgonzalezcadenas
> > LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas
> >
> >
> > On Thu, Feb 16, 2012 at 4:02 PM, Em 
> wrote:
> >
> >> Hello carlos,
> >>
> >> could you show us how your Solr-call looks like?
> >>
> >> Regards,
> >> Em
> >>
> >> Am 16.02.2012 14:34, schrieb Carlos Gonzalez-Cadenas:
> >>> Hello all:
> >>>
> >>> We'd like to score the matching documents using a combination of SOLR's
> >> IR
> >>> score with another application-specific score that we store within the
> >>> documents themselves (i.e. a float field containing the app-specific
> >>> score). In particular, we'd like to calculate the final score doing
> some
> >>> operations with both numbers (i.e product, sqrt, ...)
> >>>
> >>> According to what we know, there are two ways to do this in SOLR:
> >>>
> >>> A) Sort by function [1]: We've tested an expression like
> >>> "sort=product(score, query_score)" in the SOLR query, where score is
> the
> >>> common SOLR IR score and query_score is our own precalculated score,
> but
> >> it
> >>> seems that SOLR can only do this w

Re: custom scoring

2012-02-16 Thread Em
Hello Carlos,

well, you must take into account that you are executing up to 8 queries
per request instead of one query per request.

I am not totally sure about the details of the implementation of the
max-function-query, but I guess it first iterates over the results of
the first max-query, afterwards over the results of the second max-query
and so on. This is a much higher complexity than in the case of a normal
query.

I would suggest you to optimize your request. I don't think that this
particular function query is matching *all* docs. Instead I think it
just matches those docs specified by your inner-query (although I might
be wrong about that).

What are you trying to achieve by your request?

Regards,
Em

Am 16.02.2012 16:24, schrieb Carlos Gonzalez-Cadenas:
> Hello Em:
> 
> The URL is quite large (w/ shards, ...), maybe it's best if I paste the
> relevant parts.
> 
> Our "q" parameter is:
> 
>   
> "q":"_val_:\"product(query_score,max(query($q8),max(query($q7),max(query($q4),query($q3)\"",
> 
> The subqueries q8, q7, q4 and q3 are regular queries, for example:
> 
> "q7":"stopword_phrase:colomba~1 AND stopword_phrase:santa AND
> wildcard_stopword_phrase:car^0.7 AND stopword_phrase:hoteles OR
> (stopword_phrase:las AND stopword_phrase:de)"
> 
> We've executed the subqueries q3-q8 independently and they're very fast,
> but when we introduce the function queries as described below, it all goes
> 10X slower.
> 
> Let me know if you need anything else.
> 
> Thanks
> Carlos
> 
> 
> Carlos Gonzalez-Cadenas
> CEO, ExperienceOn - New generation search
> http://www.experienceon.com
> 
> Mobile: +34 652 911 201
> Skype: carlosgonzalezcadenas
> LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas
> 
> 
> On Thu, Feb 16, 2012 at 4:02 PM, Em  wrote:
> 
>> Hello carlos,
>>
>> could you show us how your Solr-call looks like?
>>
>> Regards,
>> Em
>>
>> Am 16.02.2012 14:34, schrieb Carlos Gonzalez-Cadenas:
>>> Hello all:
>>>
>>> We'd like to score the matching documents using a combination of SOLR's
>> IR
>>> score with another application-specific score that we store within the
>>> documents themselves (i.e. a float field containing the app-specific
>>> score). In particular, we'd like to calculate the final score doing some
>>> operations with both numbers (i.e product, sqrt, ...)
>>>
>>> According to what we know, there are two ways to do this in SOLR:
>>>
>>> A) Sort by function [1]: We've tested an expression like
>>> "sort=product(score, query_score)" in the SOLR query, where score is the
>>> common SOLR IR score and query_score is our own precalculated score, but
>> it
>>> seems that SOLR can only do this with stored/indexed fields (and
>> obviously
>>> "score" is not stored/indexed).
>>>
>>> B) Function queries: We've used _val_ and function queries like max, sqrt
>>> and query, and we've obtained the desired results from a functional point
>>> of view. However, our index is quite large (400M documents) and the
>>> performance degrades heavily, given that function queries are AFAIK
>>> matching all the documents.
>>>
>>> I have two questions:
>>>
>>> 1) Apart from the two options I mentioned, is there any other (simple)
>> way
>>> to achieve this that we're not aware of?
>>>
>>> 2) If we have to choose the function queries path, would it be very
>>> difficult to modify the actual implementation so that it doesn't match
>> all
>>> the documents, that is, to pass a query so that it only operates over the
>>> documents matching the query?. Looking at the FunctionQuery.java source
>>> code, there's a comment that says "// instead of matching all docs, we
>>> could also embed a query. the score could either ignore the subscore, or
>>> boost it", which is giving us some hope that maybe it's possible and even
>>> desirable to go in this direction. If you can give us some directions
>> about
>>> how to go about this, we may be able to do the actual implementation.
>>>
>>> BTW, we're using Lucene/SOLR trunk.
>>>
>>> Thanks a lot for your help.
>>> Carlos
>>>
>>> [1]: http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function
>>>
>>
> 


Re: custom scoring

2012-02-16 Thread Carlos Gonzalez-Cadenas
Hello Em:

The URL is quite large (w/ shards, ...), maybe it's best if I paste the
relevant parts.

Our "q" parameter is:

  
"q":"_val_:\"product(query_score,max(query($q8),max(query($q7),max(query($q4),query($q3)\"",

The subqueries q8, q7, q4 and q3 are regular queries, for example:

"q7":"stopword_phrase:colomba~1 AND stopword_phrase:santa AND
wildcard_stopword_phrase:car^0.7 AND stopword_phrase:hoteles OR
(stopword_phrase:las AND stopword_phrase:de)"

We've executed the subqueries q3-q8 independently and they're very fast,
but when we introduce the function queries as described below, it all goes
10X slower.

Let me know if you need anything else.

Thanks
Carlos


Carlos Gonzalez-Cadenas
CEO, ExperienceOn - New generation search
http://www.experienceon.com

Mobile: +34 652 911 201
Skype: carlosgonzalezcadenas
LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas


On Thu, Feb 16, 2012 at 4:02 PM, Em  wrote:

> Hello carlos,
>
> could you show us how your Solr-call looks like?
>
> Regards,
> Em
>
> Am 16.02.2012 14:34, schrieb Carlos Gonzalez-Cadenas:
> > Hello all:
> >
> > We'd like to score the matching documents using a combination of SOLR's
> IR
> > score with another application-specific score that we store within the
> > documents themselves (i.e. a float field containing the app-specific
> > score). In particular, we'd like to calculate the final score doing some
> > operations with both numbers (i.e product, sqrt, ...)
> >
> > According to what we know, there are two ways to do this in SOLR:
> >
> > A) Sort by function [1]: We've tested an expression like
> > "sort=product(score, query_score)" in the SOLR query, where score is the
> > common SOLR IR score and query_score is our own precalculated score, but
> it
> > seems that SOLR can only do this with stored/indexed fields (and
> obviously
> > "score" is not stored/indexed).
> >
> > B) Function queries: We've used _val_ and function queries like max, sqrt
> > and query, and we've obtained the desired results from a functional point
> > of view. However, our index is quite large (400M documents) and the
> > performance degrades heavily, given that function queries are AFAIK
> > matching all the documents.
> >
> > I have two questions:
> >
> > 1) Apart from the two options I mentioned, is there any other (simple)
> way
> > to achieve this that we're not aware of?
> >
> > 2) If we have to choose the function queries path, would it be very
> > difficult to modify the actual implementation so that it doesn't match
> all
> > the documents, that is, to pass a query so that it only operates over the
> > documents matching the query?. Looking at the FunctionQuery.java source
> > code, there's a comment that says "// instead of matching all docs, we
> > could also embed a query. the score could either ignore the subscore, or
> > boost it", which is giving us some hope that maybe it's possible and even
> > desirable to go in this direction. If you can give us some directions
> about
> > how to go about this, we may be able to do the actual implementation.
> >
> > BTW, we're using Lucene/SOLR trunk.
> >
> > Thanks a lot for your help.
> > Carlos
> >
> > [1]: http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function
> >
>


Re: custom scoring

2012-02-16 Thread Em
Hello carlos,

could you show us how your Solr-call looks like?

Regards,
Em

Am 16.02.2012 14:34, schrieb Carlos Gonzalez-Cadenas:
> Hello all:
> 
> We'd like to score the matching documents using a combination of SOLR's IR
> score with another application-specific score that we store within the
> documents themselves (i.e. a float field containing the app-specific
> score). In particular, we'd like to calculate the final score doing some
> operations with both numbers (i.e product, sqrt, ...)
> 
> According to what we know, there are two ways to do this in SOLR:
> 
> A) Sort by function [1]: We've tested an expression like
> "sort=product(score, query_score)" in the SOLR query, where score is the
> common SOLR IR score and query_score is our own precalculated score, but it
> seems that SOLR can only do this with stored/indexed fields (and obviously
> "score" is not stored/indexed).
> 
> B) Function queries: We've used _val_ and function queries like max, sqrt
> and query, and we've obtained the desired results from a functional point
> of view. However, our index is quite large (400M documents) and the
> performance degrades heavily, given that function queries are AFAIK
> matching all the documents.
> 
> I have two questions:
> 
> 1) Apart from the two options I mentioned, is there any other (simple) way
> to achieve this that we're not aware of?
> 
> 2) If we have to choose the function queries path, would it be very
> difficult to modify the actual implementation so that it doesn't match all
> the documents, that is, to pass a query so that it only operates over the
> documents matching the query?. Looking at the FunctionQuery.java source
> code, there's a comment that says "// instead of matching all docs, we
> could also embed a query. the score could either ignore the subscore, or
> boost it", which is giving us some hope that maybe it's possible and even
> desirable to go in this direction. If you can give us some directions about
> how to go about this, we may be able to do the actual implementation.
> 
> BTW, we're using Lucene/SOLR trunk.
> 
> Thanks a lot for your help.
> Carlos
> 
> [1]: http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function
> 


custom scoring

2012-02-16 Thread Carlos Gonzalez-Cadenas
Hello all:

We'd like to score the matching documents using a combination of SOLR's IR
score with another application-specific score that we store within the
documents themselves (i.e. a float field containing the app-specific
score). In particular, we'd like to calculate the final score doing some
operations with both numbers (i.e product, sqrt, ...)

According to what we know, there are two ways to do this in SOLR:

A) Sort by function [1]: We've tested an expression like
"sort=product(score, query_score)" in the SOLR query, where score is the
common SOLR IR score and query_score is our own precalculated score, but it
seems that SOLR can only do this with stored/indexed fields (and obviously
"score" is not stored/indexed).

B) Function queries: We've used _val_ and function queries like max, sqrt
and query, and we've obtained the desired results from a functional point
of view. However, our index is quite large (400M documents) and the
performance degrades heavily, given that function queries are AFAIK
matching all the documents.

I have two questions:

1) Apart from the two options I mentioned, is there any other (simple) way
to achieve this that we're not aware of?

2) If we have to choose the function queries path, would it be very
difficult to modify the actual implementation so that it doesn't match all
the documents, that is, to pass a query so that it only operates over the
documents matching the query?. Looking at the FunctionQuery.java source
code, there's a comment that says "// instead of matching all docs, we
could also embed a query. the score could either ignore the subscore, or
boost it", which is giving us some hope that maybe it's possible and even
desirable to go in this direction. If you can give us some directions about
how to go about this, we may be able to do the actual implementation.

BTW, we're using Lucene/SOLR trunk.

Thanks a lot for your help.
Carlos

[1]: http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function


custom scoring

2012-02-13 Thread Carlos Gonzalez-Cadenas
Hello all:

We'd like to score the matching documents using a combination of SOLR's IR
score with another application-specific score that we store within the
documents themselves (i.e. a float field containing the app-specific
score). In particular, we'd like to calculate the final score doing some
operations with both numbers (i.e product, sqrt, ...)

According to what we know, there are two ways to do this in SOLR:

A) Sort by function [1]: We've tested an expression like
"sort=product(score, query_score)" in the SOLR query, where score is the
common SOLR IR score and query_score is our own precalculated score, but it
seems that SOLR can only do this with stored/indexed fields (and obviously
"score" is not stored/indexed).

B) Function queries: We've used _val_ and function queries like max, sqrt
and query, and we've obtained the desired results from a functional point
of view. However, our index is quite large (400M documents) and the
performance degrades heavily, given that function queries are AFAIK
matching all the documents.

I have two questions:

1) Apart from the two options I mentioned, is there any other (simple) way
to achieve this that we're not aware of?

2) If we have to choose the function queries path, would it be very
difficult to modify the actual implementation so that it doesn't match all
the documents, that is, to pass a query so that it only operates over the
documents matching the query?. Looking at the FunctionQuery.java source
code, there's a comment that says "// instead of matching all docs, we
could also embed a query. the score could either ignore the subscore, or
boost it", which is giving us some hope that maybe it's possible and even
desirable to go in this direction. If you can give us some directions about
how to go about this, we may be able to do the actual implementation.

BTW, we're using Lucene/SOLR trunk.

Thanks a lot for your help.
Carlos

[1]: http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function


How to do custom scoring using query parameters?

2011-06-01 Thread ngaurav2005
Hi All,

We need to score documents based on some parameters received in the query
string. This is not possible via a function query, as we need an "if"
condition; it can be emulated through the map function, but one of the output
values of the "if" condition has to be a function, whereas map only accepts
constants. So if I rephrase my requirements, they would be:

1. Calculate score for each document using query parameters(search
parameters)
2. Sort these documents based on score

So I know that I can change default scoring by overriding DefaultSimilarity
class, but how can this class receive query parameters, which are
required for score calculation. Also, once score is calculated, how can I
sort those results based on scores?

Regards,
Gaurav

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-do-custom-scoring-using-query-parameters-tp3013788p3013788.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Custom Scoring relying on another server.

2011-05-31 Thread arian487
bump

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Custom-Scoring-relying-on-another-server-tp2994546p3006873.html
Sent from the Solr - User mailing list archive at Nabble.com.


Custom Scoring relying on another server.

2011-05-27 Thread arian487
I know this question has been asked before but I think my situation is a
little different.  Basically I need to do custom scores that the traditional
function queries simply won't allow me to do.  I actually need to hit
another server from Java (passing in a bunch of things, mostly related to how
to score the result).  So I want to extend the current scorer and add in the
things I need it to do for the scoring (make a trip to the scoring server
with a bunch of parameters, and come back with the scores).  

Can someone point me in the right direction for doing this?  Exactly where
does the document scoring happen in Solr?  Thanks!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Custom-Scoring-relying-on-another-server-tp2994546p2994546.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Custom scoring for searhing geographic objects

2010-12-19 Thread Alexey Serba
Hi Pavel,

I had a similar problem several years ago - I had to find
geographical locations in textual descriptions, geocode these objects
to lat/long during indexing process and allow users to filter/sort
search results to specific geographical areas. The important issue was
that there were several types of geographical objects - street < town
< region < country. The idea was to geocode to most narrow
geographical area as possible. Relevance logic in this case could be
specified as "find the most narrow result that is uniquely identified by
your text or search query". So I came up with a custom algorithm that
was quite good in terms of performance and precision/recall. Here's
the simple description:
* You can intersect all text/searchquery terms with locations
dictionary to find only geo terms
* Search in your locations Lucene index and filter only street objects
(the most narrow areas). Due to tf*idf formula you'll get the most
relevant results. Then you need to post process N (3/5/10) results and
verify that they are matches indeed. I did intersect search terms with
result's terms and make another lucene search to verify if these terms
are unique identifying the match. If it's then return matching street.
If there's no any match proceed using the same algorithm with towns,
regions, countries.

HTH,
Alexey
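As a toy sketch of the tiered lookup described above (the class and method names here are invented, and the real approach used Lucene searches plus a verification step rather than a plain hash map):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TieredGeoMatcher {

    // narrowest type first: a street match beats a town match, and so on
    static final String[] TIERS = {"street", "town", "region", "country"};

    // tier -> lowercased term -> geo object that term identifies
    private final Map<String, Map<String, String>> index;

    TieredGeoMatcher(Map<String, Map<String, String>> index) {
        this.index = index;
    }

    /** Returns the narrowest location named by a query term, or null. */
    String match(String query) {
        Set<String> terms =
                new HashSet<>(Arrays.asList(query.toLowerCase().split("\\s+")));
        for (String tier : TIERS) {
            Map<String, String> objects = index.get(tier);
            if (objects == null) continue;
            for (String term : terms) {
                String hit = objects.get(term);
                if (hit != null) return hit;   // narrowest tier wins
            }
        }
        return null; // no geo term recognized
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> idx = new HashMap<>();
        idx.put("town", Collections.singletonMap("ivanovo", "Ivanovo (town)"));
        idx.put("country", Collections.singletonMap("russia", "Russia (country)"));
        TieredGeoMatcher m = new TieredGeoMatcher(idx);
        System.out.println(m.match("Russia Ivanovo")); // prints "Ivanovo (town)"
    }
}
```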

On Wed, Dec 15, 2010 at 6:28 PM, Pavel Minchenkov  wrote:
> Hi,
> Please give me advice on how to create custom scoring. I need the result
> documents to be ordered depending on how popular each term in the document is
> (popular = how many times it appears in the index) and on the length of the
> document (fewer terms - higher in search results).
>
> For example, index contains following data:
>
> ID    | SEARCH_FIELD
> --
> 1     | Russia
> 2     | Russia, Moscow
> 3     | Russia, Volgograd
> 4     | Russia, Ivanovo
> 5     | Russia, Ivanovo, Altayskaya street 45
> 6     | Russia, Moscow, Kremlin
> 7     | Russia, Moscow, Altayskaya street
> 8     | Russia, Moscow, Altayskaya street 15
> 9     | Russia, Moscow, Altayskaya street 15/26
>
>
> And I should get next results:
>
>
> Query                     | Document result set
> --
> Russia                    | 1,2,4,3,6,7,8,9,5
> Moscow                  | 2,6,7,8,9
> Ivanovo                    | 4,5
> Altayskaya              | 7,8,9,5
>
> In fact, it is a search for geographic objects (cities, streets, houses).
> Only part of an address may be given, and the most relevant results should
> appear first.
>
> Thanks.
> --
> Pavel Minchenkov
>


Re: Custom scoring for searching geographic objects

2010-12-15 Thread Grant Ingersoll
Have a look at http://lucene.apache.org/java/3_0_2/scoring.html on how Lucene's 
scoring works.  You can override the Similarity class in Solr as well via the 
schema.xml file.  

On Dec 15, 2010, at 10:28 AM, Pavel Minchenkov wrote:

> Hi,
> Please give me advice on how to create custom scoring. I need the result
> documents to be ordered depending on how popular each term in the document is
> (popular = how many times it appears in the index) and on the length of the
> document (fewer terms - higher in search results).
> 
> For example, the index contains the following data:
> 
> ID | SEARCH_FIELD
> --
> 1  | Russia
> 2  | Russia, Moscow
> 3  | Russia, Volgograd
> 4  | Russia, Ivanovo
> 5  | Russia, Ivanovo, Altayskaya street 45
> 6  | Russia, Moscow, Kremlin
> 7  | Russia, Moscow, Altayskaya street
> 8  | Russia, Moscow, Altayskaya street 15
> 9  | Russia, Moscow, Altayskaya street 15/26
> 
> 
> And I should get the following results:
> 
> 
> Query      | Document result set
> --
> Russia     | 1,2,4,3,6,7,8,9,5
> Moscow     | 2,6,7,8,9
> Ivanovo    | 4,5
> Altayskaya | 7,8,9,5
> 
> In fact, it is a search for geographic objects (cities, streets, houses).
> Only part of an address may be given, and the most relevant results should
> appear first.
> 
> Thanks.
> -- 
> Pavel Minchenkov

--
Grant Ingersoll
http://www.lucidimagination.com
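To make the Similarity suggestion concrete, an override for this kind of ordering might look roughly like the sketch below (Lucene 2.x/3.x-era API; the formulas are illustrative guesses, not tuned, and the class is wired in via a <similarity class="..."/> element in schema.xml):

```java
import org.apache.lucene.search.DefaultSimilarity;

public class PopularitySimilarity extends DefaultSimilarity {

    @Override
    public float idf(int docFreq, int numDocs) {
        // Standard idf rewards *rare* terms; invert the idea so frequent
        // terms (e.g. "Russia") contribute more, as the question asks.
        return (float) Math.log(1 + docFreq);
    }

    @Override
    public float lengthNorm(String fieldName, int numTerms) {
        // Shorter fields score higher -- this is already Lucene's default
        // shape, kept explicit here for clarity.
        return (float) (1.0 / Math.sqrt(numTerms));
    }
}
```

Note this inverts a core relevance assumption, so it is worth testing against the example tables above before relying on it.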



Custom scoring for searching geographic objects

2010-12-15 Thread Pavel Minchenkov
Hi,
Please give me advice on how to create custom scoring. I need the result
documents to be ordered depending on how popular each term in the document is
(popular = how many times it appears in the index) and on the length of the
document (fewer terms - higher in search results).

For example, the index contains the following data:

ID | SEARCH_FIELD
--
1  | Russia
2  | Russia, Moscow
3  | Russia, Volgograd
4  | Russia, Ivanovo
5  | Russia, Ivanovo, Altayskaya street 45
6  | Russia, Moscow, Kremlin
7  | Russia, Moscow, Altayskaya street
8  | Russia, Moscow, Altayskaya street 15
9  | Russia, Moscow, Altayskaya street 15/26


And I should get the following results:


Query      | Document result set
--
Russia     | 1,2,4,3,6,7,8,9,5
Moscow     | 2,6,7,8,9
Ivanovo    | 4,5
Altayskaya | 7,8,9,5

In fact, it is a search for geographic objects (cities, streets, houses).
Only part of an address may be given, and the most relevant results should
appear first.

Thanks.
-- 
Pavel Minchenkov


Re: Custom scoring

2010-09-01 Thread Lance Norskog
Check out the function query feature, and the bf= parameter. It may be
that the existing functions meet your needs, or that you can add a few
new functions.

It can take a while to understand what you really want to do, so
writing a large piece of code now can be wasteful.

On Mon, Aug 30, 2010 at 2:04 PM, Brad Kellett  wrote:
> Hi all,
>
> I'm looking for examples or pointers to some info on implementing custom 
> scoring in solr/lucene. Basically, what we're looking at doing is to augment 
> the score from a dismax query with some custom signals based on data in 
> fields from the row initially matched. There will be several of these 
> features dynamically scored at query-time (due to the nature of the data, 
> pre-computed stuff isn't really what we're looking for).
>
> I do apologize for the vagueness of this, but a lot of this data is stuff we 
> want to keep under wraps. Essentially, I'm just looking for a place to use 
> some custom java code to be able to manipulate the score for a row matched in 
> a dismax query.
>
> I've been Googling like a mad man, but haven't really hit on something that 
> seems ideal yet. Custom similarity appears to just allow changing the 
> components of the TF-IDF score, for example. Can someone point me to an 
> example of doing something like this?
>
> ~Brad



-- 
Lance Norskog
goks...@gmail.com
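As a concrete (hypothetical) shape for the bf= suggestion above, a dismax request with a boost function attached might be built like this. The popularity field and the log() function are placeholders, and this sketch does no URL encoding -- real code must encode the parameter values:

```java
public class BoostFunctionQuery {

    /** Builds a dismax request URL with a boost function (bf) attached.
     *  NOTE: values are concatenated as-is; real code must URL-encode them. */
    static String buildUrl(String baseUrl, String userQuery, String boostFunction) {
        return baseUrl + "/select"
                + "?q=" + userQuery
                + "&defType=dismax"
                + "&qf=name+description"
                + "&bf=" + boostFunction; // bf's value is added to each doc's score
    }

    public static void main(String[] args) {
        System.out.println(
                buildUrl("http://localhost:8983/solr", "ipod", "log(popularity)"));
    }
}
```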


Custom scoring

2010-08-30 Thread Brad Kellett
Hi all,

I'm looking for examples or pointers to some info on implementing custom 
scoring in solr/lucene. Basically, what we're looking at doing is to augment 
the score from a dismax query with some custom signals based on data in fields 
from the row initially matched. There will be several of these features 
dynamically scored at query-time (due to the nature of the data, pre-computed 
stuff isn't really what we're looking for).

I do apologize for the vagueness of this, but a lot of this data is stuff we 
want to keep under wraps. Essentially, I'm just looking for a place to use some 
custom java code to be able to manipulate the score for a row matched in a 
dismax query.

I've been Googling like a mad man, but haven't really hit on something that 
seems ideal yet. Custom similarity appears to just allow changing the 
components of the TF-IDF score, for example. Can someone point me to an example 
of doing something like this?

~Brad

Re: custom scoring phrase queries

2010-06-18 Thread Marco Martinez
Hi Otis,

In the end I constructed my own function query that gives a higher score when
the value is at the start of the field. But is it possible to tell Solr to use
SpanFirstQuery without coding? I think I have read that it is not possible.

Thanks,


Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


2010/6/18 Otis Gospodnetic 

> Marco,
>
> I don't think there is anything in Solr to do that (is there?), but you
> could do it with some coding if you combined the "regular query" with
> SpanFirstQuery with bigger boost:
>
>
> http://search-lucene.com/jd/lucene/org/apache/lucene/search/spans/SpanFirstQuery.html
>
> Oh, here are some examples and at the bottom you will see exactly what I
> suggested above:
>
>
> http://search-lucene.com/c/Lucene:/src/java/org/apache/lucene/search/spans/package.html||SpanFirstQuery
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> - Original Message 
> > From: Marco Martinez 
> > To: solr-user@lucene.apache.org
> > Sent: Fri, June 18, 2010 4:34:45 AM
> > Subject: custom scoring phrase queries
> >
> > Hi,
> >
> > I want to know if it's possible to get a higher score in a phrase query
> > when the matching is on the left side of the field. For example:
> >
> > doc1=name:stores peter john
> > doc2=name:peter john stores
> > doc3=name:peter john something
> >
> > if you do a search with name="peter john" the result set I want to get is:
> >
> > doc2
> > doc3
> > doc1
> >
> > because the terms peter john are on the left side of the field and they
> > get a higher score.
> >
> > Thanks in advance,
> >
> > Marco Martínez Bautista
> > http://www.paradigmatecnologico.com
> > Avenida de Europa, 26. Ática 5. 3ª Planta
> > 28224 Pozuelo de Alarcón
> > Tel.: 91 352 59 42
>


Re: custom scoring phrase queries

2010-06-18 Thread Otis Gospodnetic
Marco,

I don't think there is anything in Solr to do that (is there?), but you could 
do it with some coding if you combined the "regular query" with a 
SpanFirstQuery carrying a bigger boost:

http://search-lucene.com/jd/lucene/org/apache/lucene/search/spans/SpanFirstQuery.html
 
Oh, here are some examples and at the bottom you will see exactly what I 
suggested above:

http://search-lucene.com/c/Lucene:/src/java/org/apache/lucene/search/spans/package.html||SpanFirstQuery

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Marco Martinez 
> To: solr-user@lucene.apache.org
> Sent: Fri, June 18, 2010 4:34:45 AM
> Subject: custom scoring phrase queries
> 
> Hi,
> 
> I want to know if it's possible to get a higher score in a phrase query
> when the matching is on the left side of the field. For example:
> 
> doc1=name:stores peter john
> doc2=name:peter john stores
> doc3=name:peter john something
> 
> if you do a search with name="peter john" the result set I want to get is:
> 
> doc2
> doc3
> doc1
> 
> because the terms peter john are on the left side of the field and they
> get a higher score.
> 
> Thanks in advance,
> 
> Marco Martínez Bautista
> http://www.paradigmatecnologico.com
> Avenida de Europa, 26. Ática 5. 3ª Planta
> 28224 Pozuelo de Alarcón
> Tel.: 91 352 59 42
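The SpanFirstQuery suggestion in code, roughly (Lucene 2.9/3.x-era API; the end position 3 and the boost 5.0f are illustrative values to tune, not prescribed anywhere in the thread):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.spans.SpanFirstQuery;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class FirstPositionBoost {

    /** "peter john" must match; extra score when it sits near the field start. */
    static Query build() {
        PhraseQuery phrase = new PhraseQuery();
        phrase.add(new Term("name", "peter"));
        phrase.add(new Term("name", "john"));

        SpanQuery near = new SpanNearQuery(new SpanQuery[] {
                new SpanTermQuery(new Term("name", "peter")),
                new SpanTermQuery(new Term("name", "john"))
            }, 0, true);                        // adjacent terms, in order
        SpanFirstQuery first = new SpanFirstQuery(near, 3); // must end by position 3
        first.setBoost(5.0f);                   // illustrative boost, tune to taste

        BooleanQuery combined = new BooleanQuery();
        combined.add(phrase, Occur.MUST);       // the "regular query"
        combined.add(first, Occur.SHOULD);      // only contributes extra score
        return combined;
    }
}
```

With this shape, doc2 and doc3 from the example get the SpanFirstQuery bonus while doc1 ("stores peter john") matches only the phrase clause.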


custom scoring phrase queries

2010-06-18 Thread Marco Martinez
Hi,

I want to know if it's possible to get a higher score in a phrase query when
the matching is on the left side of the field. For example:


doc1=name:stores peter john
doc2=name:peter john stores
doc3=name:peter john something

if you do a search with name="peter john" the result set I want to get is:

doc2
doc3
doc1

because the terms peter john are on the left side of the field and they get
a higher score.

Thanks in advance,


Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


Re: Custom scoring example

2008-09-10 Thread Mike Klaas


On 5-Sep-08, at 5:01 PM, Ravindra Sharma wrote:


I am looking for an example if anyone has done any custom scoring with
Solr/Lucene.

I need to implement a Query similar to DisjunctionMaxQuery, the only
difference being that it should score based on the sum of the sub-queries'
scores instead of the max.

Any custom scoring example will help.

(On one hand, DisjunctionMaxQuery itself is an example :-). It is too
professional :-)


DisjunctionMaxQuery takes the max plus (tiebreak)*sum(others).  So, if  
you set tie=1.0, dismax becomes exactly what you are seeking.


-Mike
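Mike's observation can be checked with a few lines of plain Java (no Lucene needed): dismax combines the sub-query scores as max + tieBreaker * (sum of the others), so tieBreaker = 1.0 collapses to a plain sum.

```java
public class DismaxTie {

    // dismax combines sub-query scores as: max + tieBreaker * (sum of the others)
    static float dismax(float tieBreaker, float... subScores) {
        float max = Float.NEGATIVE_INFINITY, sum = 0f;
        for (float s : subScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        float[] scores = {0.4f, 0.3f, 0.2f};
        System.out.println("tie=0.0 -> " + dismax(0.0f, scores)); // pure max
        System.out.println("tie=1.0 -> " + dismax(1.0f, scores)); // plain sum
    }
}
```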


Re: Custom scoring example

2008-09-10 Thread Chris Hostetter

: I need to implement a Query similar to DisjunctionMaxQuery, the only
: difference being that it should score based on the sum of the sub-queries'
: scores instead of the max.

BooleanQuery computes scores that are the sum of the subscores -- you just 
need to disable the coord factor (there is a constructor arg for this).



-Hoss
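In code, that suggestion looks roughly like this (pre-4.0 Lucene API, where BooleanQuery was mutable and took a disableCoord constructor flag; field and term values are illustrative):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class SumOfScoresQuery {

    /** SHOULD clauses sum their sub-scores; disableCoord=true stops Lucene
     *  from additionally scaling by the fraction of clauses that matched. */
    static BooleanQuery build() {
        BooleanQuery q = new BooleanQuery(true); // true = disable the coord factor
        q.add(new TermQuery(new Term("body", "custom")), Occur.SHOULD);
        q.add(new TermQuery(new Term("body", "scoring")), Occur.SHOULD);
        return q;
    }
}
```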



Re: Custom scoring example

2008-09-10 Thread Grant Ingersoll
The only thing I can suggest is that each and every Query in Solr/Lucene is 
an example of custom scoring.  You might be better off starting w/ TermQuery 
and working through PhraseQuery, BooleanQuery, on up.  At the point you get 
to DisjunctionMaxQuery, then ask questions about that specific one.


On Sep 5, 2008, at 8:01 PM, Ravindra Sharma wrote:


I am looking for an example if anyone has done any custom scoring with
Solr/Lucene.

I need to implement a Query similar to DisjunctionMaxQuery, the only
difference being that it should score based on the sum of the sub-queries'
scores instead of the max.

Any custom scoring example will help.

(On one hand, DisjunctionMaxQuery itself is an example :-). It is too
professional :-)

Thanks,
Ravi





Custom scoring example

2008-09-05 Thread Ravindra Sharma
I am looking for an example if anyone has done any custom scoring with
Solr/Lucene.

I need to implement a Query similar to DisjunctionMaxQuery, the only
difference being that it should score based on the sum of the sub-queries'
scores instead of the max.

Any custom scoring example will help.

(On one hand, DisjunctionMaxQuery itself is an example :-). It is too
professional :-)

Thanks,
Ravi