Re: Custom Solr caches in a FunctionQuery that emulates the ExternalFileField

2015-08-23 Thread Mikhail Khludnev
Hello Upayavira,
It's been a long month! I have just described this approach in
http://blog.griddynamics.com/2015/08/scoring-join-party-in-solr-53.html
Coming back to our discussion, I think I missed {!func}, which turns a field
name into a function query.
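
A minimal sketch of how the corrected query string could be assembled, assuming the core and field names used earlier in this thread ("other", "other_key", "key", "my_boost_value_field"). The helper name scoring_join_query is hypothetical, and the exact syntax should be checked against the Solr version in use:

```python
from urllib.parse import urlencode


def scoring_join_query(from_index, from_field, to_field, value_field, score_mode="max"):
    """Build a scoring-join query whose score comes from a field in the other core."""
    join = (
        f"{{!join fromIndex={from_index} from={from_field} "
        f"to={to_field} score={score_mode}}}"
    )
    # {!func} asks for the remainder to be parsed as a function query,
    # so the bare field name yields the field value as the score.
    return join + "{!func}" + value_field


q = scoring_join_query("other", "other_key", "key", "my_boost_value_field")
# URL-encoded request parameters; debugQuery=true helps inspect the scores.
params = urlencode({"q": q, "debugQuery": "true"})
print(q)
print(params)
```

The encoded parameters can be appended to a select request on the main core.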

On Fri, Jul 24, 2015 at 3:41 PM, Upayavira  wrote:

> Mikhail,
>
> I've tried this out, but to be honest I can't work out what the score=
> parameter is supposed to add.
>
> I assume that if I do {!join fromIndex=other from=other_key to=key
> score=max}somefield:(abc dev)
>
> It will calculate the score for each document that has the same "key"
> value, and include that in the score for the main document?
>
> If this is the case, then I should be able to do:
>
> {!join fromIndex=other from=other_key to=key score=max}{!boost
> b=my_boost_value_field}*:*
>
> In which case, it'll take the value of "my_boost_field" in the other
> core, and include it in the score for my document that has the value of
> "key"?
>
> Upayavira
>
> On Fri, Jul 10, 2015, at 04:15 PM, Mikhail Khludnev wrote:
> > I've heard that people use
> > https://issues.apache.org/jira/browse/SOLR-6234
> > for such purpose - adding scores from fast moving core to the bigger slow
> > moving one
> >
> > On Fri, Jul 10, 2015 at 4:54 PM, Upayavira  wrote:
> >
> > > All,
> > >
> > > I have knocked up what I think could be a really cool function query -
> > > it allows you to retrieve a value from another core (much like a pseudo
> > > join) and use that value during scoring (much like an
> > > ExternalFileField).
> > >
> > > Examples:
> > >  * Selective boosting of documents based upon a category based value
> > >  * boost on aggregated popularity values
> > >  * boost on fast moving data on your slow moving index
> > >
> > > It *works* but it does so very slowly (on 3m docs, milliseconds without,
> > > and 24s with it). There are two things that happen a lot:
> > >
> > >  * locate a document with unique ID value of X
> > >  * retrieve the value of field Y for that doc
> > >
> > > What it seems to me now is that I need to implement a cache that will
> > > have a string value as the key and the (float) field value as the
> > > object, that is warmed alongside existing caches.
> > >
> > > Any pointers to examples of how I could do this, or other ways to do the
> > > conversion from a key value to a float value faster?
> > >
> > > NB. I hope to contribute this if I can make it perform.
> > >
> > > Thanks!
> > >
> > > Upayavira
> > >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> > 
> > 
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Custom Solr caches in a FunctionQuery that emulates the ExternalFileField

2015-07-24 Thread Mikhail Khludnev
I think it's intended for

{!join fromIndex=other from=other_key to=key score=max}my_boost_value_field

This runs a function query which matches all docs in the "other" core, with
the value of the field 'my_boost_value_field' as the score. This score is
then passed through the join query for other.other_key=key. Do you see
anything with debugQuery=true?
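
The flow described above can be modelled with a toy sketch. This is not Solr code: it only imitates the idea with plain dictionaries, the sample documents are invented, and the three modes shown are an assumption about how the score= parameter combines several matching scores:

```python
from collections import defaultdict


def scoring_join(from_docs, to_docs, from_key, to_key, value_field, mode="max"):
    """Toy model of a scoring join: returns {to-side doc id: joined score}."""
    # Step 1: the function query scores every from-side doc by its field value.
    by_key = defaultdict(list)
    for doc in from_docs:
        by_key[doc[from_key]].append(float(doc[value_field]))
    # Step 2: scores sharing a join key are combined according to the mode.
    combine = {
        "max": max,
        "total": sum,
        "avg": lambda xs: sum(xs) / len(xs),
    }[mode]
    # Step 3: each to-side doc with a matching key receives the combined score.
    return {
        doc["id"]: combine(by_key[doc[to_key]])
        for doc in to_docs
        if doc[to_key] in by_key
    }


other_core = [
    {"other_key": "a", "my_boost_value_field": 2.0},
    {"other_key": "a", "my_boost_value_field": 5.0},
    {"other_key": "b", "my_boost_value_field": 1.5},
]
main_core = [{"id": "1", "key": "a"}, {"id": "2", "key": "b"}, {"id": "3", "key": "c"}]

max_scores = scoring_join(other_core, main_core, "other_key", "key", "my_boost_value_field")
total_scores = scoring_join(
    other_core, main_core, "other_key", "key", "my_boost_value_field", mode="total"
)
print(max_scores)
print(total_scores)
```

Document "3" has no counterpart in the other core, so it does not match the join at all in this model.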

On Fri, Jul 24, 2015 at 3:41 PM, Upayavira  wrote:

> Mikhail,
>
> I've tried this out, but to be honest I can't work out what the score=
> parameter is supposed to add.
>
> I assume that if I do {!join fromIndex=other from=other_key to=key
> score=max}somefield:(abc dev)
>
> It will calculate the score for each document that has the same "key"
> value, and include that in the score for the main document?
>
> If this is the case, then I should be able to do:
>
> {!join fromIndex=other from=other_key to=key score=max}{!boost
> b=my_boost_value_field}*:*
>
> In which case, it'll take the value of "my_boost_field" in the other
> core, and include it in the score for my document that has the value of
> "key"?
>
> Upayavira
>
> On Fri, Jul 10, 2015, at 04:15 PM, Mikhail Khludnev wrote:
> > I've heard that people use
> > https://issues.apache.org/jira/browse/SOLR-6234
> > for such purpose - adding scores from fast moving core to the bigger slow
> > moving one
> >
> > On Fri, Jul 10, 2015 at 4:54 PM, Upayavira  wrote:
> >
> > > All,
> > >
> > > I have knocked up what I think could be a really cool function query -
> > > it allows you to retrieve a value from another core (much like a pseudo
> > > join) and use that value during scoring (much like an
> > > ExternalFileField).
> > >
> > > Examples:
> > >  * Selective boosting of documents based upon a category based value
> > >  * boost on aggregated popularity values
> > >  * boost on fast moving data on your slow moving index
> > >
> > > It *works* but it does so very slowly (on 3m docs, milliseconds without,
> > > and 24s with it). There are two things that happen a lot:
> > >
> > >  * locate a document with unique ID value of X
> > >  * retrieve the value of field Y for that doc
> > >
> > > What it seems to me now is that I need to implement a cache that will
> > > have a string value as the key and the (float) field value as the
> > > object, that is warmed alongside existing caches.
> > >
> > > Any pointers to examples of how I could do this, or other ways to do the
> > > conversion from a key value to a float value faster?
> > >
> > > NB. I hope to contribute this if I can make it perform.
> > >
> > > Thanks!
> > >
> > > Upayavira
> > >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> > 
> > 
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Custom Solr caches in a FunctionQuery that emulates the ExternalFileField

2015-07-24 Thread Upayavira
Mikhail,

I've tried this out, but to be honest I can't work out what the score=
parameter is supposed to add.

I assume that if I do {!join fromIndex=other from=other_key to=key
score=max}somefield:(abc dev)

It will calculate the score for each document that has the same "key"
value, and include that in the score for the main document?

If this is the case, then I should be able to do:

{!join fromIndex=other from=other_key to=key score=max}{!boost
b=my_boost_value_field}*:*

In which case, it'll take the value of "my_boost_value_field" in the other
core, and include it in the score for my document that has the value of
"key"?

Upayavira

On Fri, Jul 10, 2015, at 04:15 PM, Mikhail Khludnev wrote:
> I've heard that people use
> https://issues.apache.org/jira/browse/SOLR-6234
> for such purpose - adding scores from fast moving core to the bigger slow
> moving one
> 
> On Fri, Jul 10, 2015 at 4:54 PM, Upayavira  wrote:
> 
> > All,
> >
> > I have knocked up what I think could be a really cool function query -
> > it allows you to retrieve a value from another core (much like a pseudo
> > join) and use that value during scoring (much like an
> > ExternalFileField).
> >
> > Examples:
> >  * Selective boosting of documents based upon a category based value
> >  * boost on aggregated popularity values
> >  * boost on fast moving data on your slow moving index
> >
> > It *works* but it does so very slowly (on 3m docs, milliseconds without,
> > and 24s with it). There are two things that happen a lot:
> >
> >  * locate a document with unique ID value of X
> >  * retrieve the value of field Y for that doc
> >
> > What it seems to me now is that I need to implement a cache that will
> > have a string value as the key and the (float) field value as the
> > object, that is warmed alongside existing caches.
> >
> > Any pointers to examples of how I could do this, or other ways to do the
> > conversion from a key value to a float value faster?
> >
> > NB. I hope to contribute this if I can make it perform.
> >
> > Thanks!
> >
> > Upayavira
> >
> 
> 
> 
> -- 
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
> 
> 
> 


Re: Custom Solr caches in a FunctionQuery that emulates the ExternalFileField

2015-07-10 Thread Upayavira
Hi Erick,

You are right that I could actually be asking for a stored field. That's
an exceptionally good point, and yes, that would suck. It would be better
to retrieve a docValue from the document. I'll look into that.

Upayavira

On Fri, Jul 10, 2015, at 06:28 PM, Erick Erickson wrote:
> Upayavira:
> 
> bq: retrieve the value of field Y for that doc
> 
> If this is fetching the stored field it's going to be horrible as
> it'll probably read/decompress a 16K block each time. Yccck. If
> you can read the value from a DocValues field (or, indeed, any indexed
> field which would only really work for non-text types).
> 
> There's also Solr's User cache. This is just a cache like filterCache
> etc. that you code up yourself. The kicker is that it gets a "refresh
> yourself" message whenever a new searcher is opened.
> 
> I vaguely remember some work about efficiently finding the 
> bit can't lay my hands on it.
> 
> I also wonder which of these would be handled by updateable doc values
> and whether that effort is more general?
> 
> Best,
> Erick
> 
> On Fri, Jul 10, 2015 at 8:30 AM, Upayavira  wrote:
> > Mikhail,
> >
> > Thanks for pointing this out.
> >
> > I'd say that ticket is in distinct need of some examples or use-cases.
> > It is extremely hard to work out what "scoring" actually means. What is
> > used to score what?
> >
> > It'd be great to see some examples and some explanations as to what
> > effect those examples have on scoring.
> >
> > I'll dig into that patch to see if I can work it out.
> >
> > Upayavira
> >
> >
> > On Fri, Jul 10, 2015, at 04:15 PM, Mikhail Khludnev wrote:
> >> I've heard that people use
> >> https://issues.apache.org/jira/browse/SOLR-6234
> >> for such purpose - adding scores from fast moving core to the bigger slow
> >> moving one
> >>
> >> On Fri, Jul 10, 2015 at 4:54 PM, Upayavira  wrote:
> >>
> >> > All,
> >> >
> >> > I have knocked up what I think could be a really cool function query -
> >> > it allows you to retrieve a value from another core (much like a pseudo
> >> > join) and use that value during scoring (much like an
> >> > ExternalFileField).
> >> >
> >> > Examples:
> >> >  * Selective boosting of documents based upon a category based value
> >> >  * boost on aggregated popularity values
> >> >  * boost on fast moving data on your slow moving index
> >> >
> >> > It *works* but it does so very slowly (on 3m docs, milliseconds without,
> >> > and 24s with it). There are two things that happen a lot:
> >> >
> >> >  * locate a document with unique ID value of X
> >> >  * retrieve the value of field Y for that doc
> >> >
> >> > What it seems to me now is that I need to implement a cache that will
> >> > have a string value as the key and the (float) field value as the
> >> > object, that is warmed alongside existing caches.
> >> >
> >> > Any pointers to examples of how I could do this, or other ways to do the
> >> > conversion from a key value to a float value faster?
> >> >
> >> > NB. I hope to contribute this if I can make it perform.
> >> >
> >> > Thanks!
> >> >
> >> > Upayavira
> >> >
> >>
> >>
> >>
> >> --
> >> Sincerely yours
> >> Mikhail Khludnev
> >> Principal Engineer,
> >> Grid Dynamics
> >>
> >> 
> >> 


Re: Custom Solr caches in a FunctionQuery that emulates the ExternalFileField

2015-07-10 Thread Erick Erickson
Upayavira:

bq: retrieve the value of field Y for that doc

If this is fetching the stored field it's going to be horrible, as
it'll probably read/decompress a 16K block each time. Yccck. If
you can, read the value from a DocValues field (or, indeed, any indexed
field, though that would only really work for non-text types).

There's also Solr's User cache. This is just a cache like filterCache
etc. that you code up yourself. The kicker is that it gets a "refresh
yourself" message whenever a new searcher is opened.
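
The "refresh yourself" behaviour can be pictured with a language-neutral sketch. This is not the Solr cache API: the class KeyToFloatCache, its methods, and the loader callbacks are all hypothetical stand-ins that only show the shape of a string-to-float cache that is rebuilt when a new searcher opens:

```python
class KeyToFloatCache:
    """Toy stand-in for a user cache mapping a string key to a float value."""

    def __init__(self, loader):
        # loader is a hypothetical callback that fetches the value for a key
        # from the other core as seen by the current searcher.
        self._loader = loader
        self._values = {}

    def get(self, key):
        # Only the first request for a key pays the lookup cost.
        if key not in self._values:
            self._values[key] = self._loader(key)
        return self._values[key]

    def on_new_searcher(self, loader):
        """Warm the cache for a new searcher by re-fetching the known keys."""
        old_keys = list(self._values)
        self._loader = loader
        self._values = {key: loader(key) for key in old_keys}


old_data = {"a": 1.0, "b": 2.0}
cache = KeyToFloatCache(lambda key: old_data.get(key, 0.0))
before = cache.get("a")

# A commit changes the data; the new searcher triggers a refresh.
new_data = {"a": 3.0, "b": 2.0}
cache.on_new_searcher(lambda key: new_data.get(key, 0.0))
after = cache.get("a")
print(before, after)
```

Re-fetching only the keys that were already cached keeps warming proportional to what queries actually used.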

I vaguely remember some work about efficiently finding the
but can't lay my hands on it.

I also wonder which of these would be handled by updateable doc values
and whether that effort is more general?

Best,
Erick

On Fri, Jul 10, 2015 at 8:30 AM, Upayavira  wrote:
> Mikhail,
>
> Thanks for pointing this out.
>
> I'd say that ticket is in distinct need of some examples or use-cases.
> It is extremely hard to work out what "scoring" actually means. What is
> used to score what?
>
> It'd be great to see some examples and some explanations as to what
> effect those examples have on scoring.
>
> I'll dig into that patch to see if I can work it out.
>
> Upayavira
>
>
> On Fri, Jul 10, 2015, at 04:15 PM, Mikhail Khludnev wrote:
>> I've heard that people use
>> https://issues.apache.org/jira/browse/SOLR-6234
>> for such purpose - adding scores from fast moving core to the bigger slow
>> moving one
>>
>> On Fri, Jul 10, 2015 at 4:54 PM, Upayavira  wrote:
>>
>> > All,
>> >
>> > I have knocked up what I think could be a really cool function query -
>> > it allows you to retrieve a value from another core (much like a pseudo
>> > join) and use that value during scoring (much like an
>> > ExternalFileField).
>> >
>> > Examples:
>> >  * Selective boosting of documents based upon a category based value
>> >  * boost on aggregated popularity values
>> >  * boost on fast moving data on your slow moving index
>> >
>> > It *works* but it does so very slowly (on 3m docs, milliseconds without,
>> > and 24s with it). There are two things that happen a lot:
>> >
>> >  * locate a document with unique ID value of X
>> >  * retrieve the value of field Y for that doc
>> >
>> > What it seems to me now is that I need to implement a cache that will
>> > have a string value as the key and the (float) field value as the
>> > object, that is warmed alongside existing caches.
>> >
>> > Any pointers to examples of how I could do this, or other ways to do the
>> > conversion from a key value to a float value faster?
>> >
>> > NB. I hope to contribute this if I can make it perform.
>> >
>> > Thanks!
>> >
>> > Upayavira
>> >
>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>> Principal Engineer,
>> Grid Dynamics
>>
>> 
>> 


Re: Custom Solr caches in a FunctionQuery that emulates the ExternalFileField

2015-07-10 Thread Upayavira
Mikhail,

Thanks for pointing this out.

I'd say that ticket is in distinct need of some examples or use-cases.
It is extremely hard to work out what "scoring" actually means. What is
used to score what?

It'd be great to see some examples and some explanations as to what
effect those examples have on scoring.

I'll dig into that patch to see if I can work it out.

Upayavira


On Fri, Jul 10, 2015, at 04:15 PM, Mikhail Khludnev wrote:
> I've heard that people use
> https://issues.apache.org/jira/browse/SOLR-6234
> for such purpose - adding scores from fast moving core to the bigger slow
> moving one
> 
> On Fri, Jul 10, 2015 at 4:54 PM, Upayavira  wrote:
> 
> > All,
> >
> > I have knocked up what I think could be a really cool function query -
> > it allows you to retrieve a value from another core (much like a pseudo
> > join) and use that value during scoring (much like an
> > ExternalFileField).
> >
> > Examples:
> >  * Selective boosting of documents based upon a category based value
> >  * boost on aggregated popularity values
> >  * boost on fast moving data on your slow moving index
> >
> > It *works* but it does so very slowly (on 3m docs, milliseconds without,
> > and 24s with it). There are two things that happen a lot:
> >
> >  * locate a document with unique ID value of X
> >  * retrieve the value of field Y for that doc
> >
> > What it seems to me now is that I need to implement a cache that will
> > have a string value as the key and the (float) field value as the
> > object, that is warmed alongside existing caches.
> >
> > Any pointers to examples of how I could do this, or other ways to do the
> > conversion from a key value to a float value faster?
> >
> > NB. I hope to contribute this if I can make it perform.
> >
> > Thanks!
> >
> > Upayavira
> >
> 
> 
> 
> -- 
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
> 
> 
> 


Re: Custom Solr caches in a FunctionQuery that emulates the ExternalFileField

2015-07-10 Thread Mikhail Khludnev
I've heard that people use https://issues.apache.org/jira/browse/SOLR-6234
for this purpose: adding scores from a fast-moving core to the bigger,
slow-moving one.

On Fri, Jul 10, 2015 at 4:54 PM, Upayavira  wrote:

> All,
>
> I have knocked up what I think could be a really cool function query -
> it allows you to retrieve a value from another core (much like a pseudo
> join) and use that value during scoring (much like an
> ExternalFileField).
>
> Examples:
>  * Selective boosting of documents based upon a category based value
>  * boost on aggregated popularity values
>  * boost on fast moving data on your slow moving index
>
> It *works* but it does so very slowly (on 3m docs, milliseconds without,
> and 24s with it). There are two things that happen a lot:
>
>  * locate a document with unique ID value of X
>  * retrieve the value of field Y for that doc
>
> What it seems to me now is that I need to implement a cache that will
> have a string value as the key and the (float) field value as the
> object, that is warmed alongside existing caches.
>
> Any pointers to examples of how I could do this, or other ways to do the
> conversion from a key value to a float value faster?
>
> NB. I hope to contribute this if I can make it perform.
>
> Thanks!
>
> Upayavira
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Custom Solr caches in a FunctionQuery that emulates the ExternalFileField

2015-07-10 Thread Upayavira
All,

I have knocked up what I think could be a really cool function query -
it allows you to retrieve a value from another core (much like a pseudo
join) and use that value during scoring (much like an
ExternalFileField).

Examples:
 * Selective boosting of documents based upon a category based value
 * boost on aggregated popularity values
 * boost on fast moving data on your slow moving index

It *works* but it does so very slowly (on 3m docs, milliseconds without,
and 24s with it). There are two things that happen a lot:

 * locate a document with unique ID value of X
 * retrieve the value of field Y for that doc

It now seems to me that I need to implement a cache that has a string
value as the key and the (float) field value as the object, and that is
warmed alongside the existing caches.

Any pointers to examples of how I could do this, or other ways to do the
conversion from a key value to a float value faster?
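
One possible alternative to a per-key cache, sketched under assumptions: resolve every document's key to its float once per searcher and keep the results in an array indexed by internal doc id, so scoring becomes a plain array lookup. The names build_doc_values and boost, and the sample data, are hypothetical:

```python
from array import array


def build_doc_values(doc_keys, key_to_value, default=0.0):
    """Resolve each doc's key once; doc_keys[i] is the key of internal doc id i."""
    return array("f", (key_to_value.get(key, default) for key in doc_keys))


# Keys of the docs in the slow-moving index, in internal doc id order.
doc_keys = ["a", "b", "a", "zzz"]
# Values taken from the fast-moving core, keyed by unique ID.
key_to_value = {"a": 1.5, "b": 0.25}

# Built once per searcher, then reused by every query.
values = build_doc_values(doc_keys, key_to_value)


def boost(doc_id):
    """Per-document lookup at scoring time: no key search, no field fetch."""
    return values[doc_id]


print([boost(doc_id) for doc_id in range(len(doc_keys))])
```

The trade-off is memory proportional to the number of documents and a rebuild whenever either index changes.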

NB. I hope to contribute this if I can make it perform.

Thanks!

Upayavira