Hi,

I would like to ask the following to make it clearer (for me at least):

Document doc = new Document();

Field f1 = new TextField("field1", "string1", Field.Store.YES);
doc.add(f1);
f1.setBoost(2.0f);

Field f2 = new TextField("field2", "string2", Field.Store.YES);
doc.add(f2);
f2.setBoost(1.0f);




With doc values, this turns into the following, where the _boost1 field is associated with field1 and the _boost2 field is associated with field2:


In the indexing code:

Field f1 = new TextField("field1", "string1", Field.Store.YES);
doc.add(f1);

Field _boost1 = new NumericDocValuesField("_boost1", 2L);
doc.add(_boost1);

// If this boost value needs to be stored, a separate StoredField instance needs to be added as well
… (I will post this soon)

Field f2 = new TextField("field2", "string2", Field.Store.YES);
doc.add(f2);

Field _boost2 = new NumericDocValuesField("_boost2", 1L);
doc.add(_boost2);

// If this boost value needs to be stored, a separate StoredField instance needs to be added as well
… (I will post this soon)


Now, in the searching code (i.e., at query time), do I need the FunctionScoreQuery, given that in this case the boost is just a constant value rather than a function? Then again, a constant value can be argued to be a function that returns the same value every time.


Expression expr = JavascriptCompiler.compile("_score * _boost1");

// SimpleBindings just maps variables to SortField instances:
// "_score" is the relevance score, "_boost1" is read from the NumericDocValuesField
SimpleBindings bindings = new SimpleBindings();
bindings.add(new SortField("_score", SortField.Type.SCORE));
bindings.add(new SortField("_boost1", SortField.Type.LONG));

// create a query that matches based on field1:term_to_look_for but
// scores using expr (relevance score multiplied by the per-document _boost1 value)
Query query = new FunctionScoreQuery(
    new TermQuery(new Term("field1", "term_to_look_for")),
    expr.getDoubleValuesSource(bindings));

searcher.search(query, 10);


So, if boost is a single constant value, do we need the Javascript part above?
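A possibly simpler variant, sketched under the assumption of Lucene 7.3 or later (where, as far as I know, `FunctionScoreQuery.boostByValue` and `DoubleValuesSource.fromLongField` are available): it still wraps the query in a FunctionScoreQuery, but multiplies the score by the per-document `_boost1` doc value directly, without compiling a JavaScript expression. The class name is hypothetical.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.search.DoubleValuesSource;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

class ConstantBoostSketch {

  // Multiplies the field1 relevance score by the per-document value
  // stored in the "_boost1" NumericDocValuesField at index time.
  static Query boosted(String termText) {
    Query inner = new TermQuery(new Term("field1", termText));
    // fromLongField reads the long doc value and exposes it as a double.
    DoubleValuesSource boost = DoubleValuesSource.fromLongField("_boost1");
    return FunctionScoreQuery.boostByValue(inner, boost);
  }
}
```

Usage would be the same as above, e.g. `searcher.search(ConstantBoostSketch.boosted("term_to_look_for"), 10)`.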

Best regards


On 10/18/19 4:07 PM, baris.ka...@oracle.com wrote:
Uwe,

Can the example at https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html that you also gave be extended with the NumericDocValuesField part that needs to be done at indexing time, too?

I now see why you said this is a mixed type of boosting (i.e., both index time and search time).

I then need to include the query from this example, built on the _score field (I would call it the _boost field in my case), in my overall BooleanQuery.

I will now try to combine these and post the result here for future reference.
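A minimal sketch of one way to combine them, assuming Lucene 7.x with the lucene-expressions and lucene-queries modules: the FunctionScoreQuery becomes one MUST clause of a larger BooleanQuery. The `category` field, its FILTER clause, and the class name are hypothetical, there only to show a non-scoring clause next to the boosted one.

```java
import java.text.ParseException;

import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.SimpleBindings;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;

class BooleanWithBoostSketch {

  static Query build(String text, String category) throws ParseException {
    // Relevance score times the per-document factor indexed as "_boost1".
    Expression expr = JavascriptCompiler.compile("_score * _boost1");
    SimpleBindings bindings = new SimpleBindings();
    bindings.add(new SortField("_score", SortField.Type.SCORE));
    bindings.add(new SortField("_boost1", SortField.Type.LONG));

    // Scored clause: matches on field1, rescored by the expression.
    Query boosted = new FunctionScoreQuery(
        new TermQuery(new Term("field1", text)),
        expr.getDoubleValuesSource(bindings));

    return new BooleanQuery.Builder()
        .add(boosted, BooleanClause.Occur.MUST)
        // Hypothetical filter: documents must match it, but it adds no score.
        .add(new TermQuery(new Term("category", category)), BooleanClause.Occur.FILTER)
        .build();
  }
}
```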

Best regards


On 10/18/19 3:18 PM, Uwe Schindler wrote:
Hi,

Read my original email! The index time values are written using NumericDocValuesField. The expressions docs also refer to that when the bindings are documented.

It's separate from the indexed data (TextField). Think of it like an additional numeric field in your database table with a factor in each row.
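To make the index-time and query-time halves concrete, here is a minimal end-to-end sketch, assuming Lucene 7.x (`lucene-core`, `lucene-queries`, `lucene-expressions` on the classpath). The `id` field, the temporary directory, and the class name are hypothetical. Both documents have identical text, so their plain relevance scores tie and the `_boost1` doc value alone decides the order.

```java
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.SimpleBindings;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

class IndexTimeBoostSketch {

  // One document: searchable text plus a separate per-document factor.
  static Document doc(String id, String text, long boost) {
    Document doc = new Document();
    doc.add(new StringField("id", id, Field.Store.YES));
    doc.add(new TextField("field1", text, Field.Store.YES));
    // The factor lives in its own doc-values column, not on the TextField.
    doc.add(new NumericDocValuesField("_boost1", boost));
    return doc;
  }

  static String topId() throws Exception {
    Path path = Files.createTempDirectory("boost-sketch");
    try (Directory dir = FSDirectory.open(path)) {
      // Index time: the IndexWriter writes text and doc values together.
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
        writer.addDocument(doc("a", "string1", 1L));
        writer.addDocument(doc("b", "string1", 2L));
      }
      // Query time: fold the stored factor into the relevance score.
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        Expression expr = JavascriptCompiler.compile("_score * _boost1");
        SimpleBindings bindings = new SimpleBindings();
        bindings.add(new SortField("_score", SortField.Type.SCORE));
        bindings.add(new SortField("_boost1", SortField.Type.LONG));
        Query query = new FunctionScoreQuery(
            new TermQuery(new Term("field1", "string1")),
            expr.getDoubleValuesSource(bindings));
        TopDocs hits = searcher.search(query, 10);
        return searcher.doc(hits.scoreDocs[0].doc).get("id");
      }
    }
  }
}
```

Document "b" should come first here, because its relevance score is multiplied by 2 instead of 1.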

Uwe

On October 18, 2019 7:14:03 PM UTC, baris.ka...@oracle.com wrote:
Uwe,

Two questions there:

I guess this is applicable to TextField, too?

And I was expecting an IndexWriter object in the example for index-time boosting.

Best regards


On 10/18/19 2:57 PM, Uwe Schindler wrote:
Sorry, I was imprecise. It's a mix of both. The factors are stored per
document in the index (this is why I called it index time). At query
time, the expression uses the index-time values and folds them into the
query boost.
What's your problem with that approach?

Uwe

On October 18, 2019 6:50:40 PM UTC, baris.ka...@oracle.com wrote:
Uwe,

Thanks. If possible, I am looking for a pure Java methodology for index-time boosting.

This example looks like a search time boosting example:


https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html


Best regards

On 10/18/19 2:31 PM, Uwe Schindler wrote:
Hi,

> Is there a working example for this? Is this mentioned in the Lucene
> Javadocs or any other docs so that I can look at it?
To index the docvalues, see NumericDocValuesField (it can be added to
documents like indexed or stored fields). You may have used them for
sorting already.
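For the sorting use mentioned here, a small sketch assuming a NumericDocValuesField named `_boost1` (hypothetical name) was indexed: this orders hits purely by the stored value, whereas wrapping the query in a FunctionScoreQuery blends the value into the relevance score.

```java
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;

class SortByBoostSketch {

  // Sorts hits by the raw "_boost1" doc value, highest first
  // (reverse = true), ignoring relevance entirely.
  static Sort byBoostDescending() {
    return new Sort(new SortField("_boost1", SortField.Type.LONG, true));
  }
}
```

Usage would look like `searcher.search(query, 10, SortByBoostSketch.byBoostDescending())`.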
> This methodology seems sort of like discouraging using index-time boosting.
Not really. Many use this all the time. It's one of the killer
features of both Solr and Elasticsearch. The problem was how
Document.setBoost() worked (it did not work correctly, see below).
> The previous setBoost method call was fine and easy to use.
> Did it have some performance issues, and is that why it was deprecated?
No, it was deprecated for several reasons. setBoost was not doing what
users expected: internally, the boost value was just multiplied into the
document norm factor (which is internally also a docvalues field). The
norm factors are very imprecise floats stored in a single byte, so
precision is poor. If you put some values into it and the length norm
was already consuming all the bits, the boosting was very coarse. It was
also only ever multiplied in, while most users want to do things like
record click counts in the index and then boost with, for example, the
logarithm or some other function. If the boost is just multiplied into
the length norm, you have no flexibility at all.
In addition, you can have several docvalues fields and use their values
in a function (e.g., one field with the click count and another one with
the product price). You can then combine click count and price (each of
which can be modified independently during index updates) and shape the
boost so that lower prices and higher click counts rank higher.
This is what you can do with the expressions module. You just give it a
function.
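A sketch of the click-count/price idea, assuming Lucene 7.x; the field names `clicks`, `price`, `id`, the formula, and all numbers are made up for illustration. It also uses `IndexWriter.updateNumericDocValue`, which rewrites a single doc-values field in place without re-indexing the document's text, as one way to do the "modified independently during index updates" part.

```java
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.DoubleDocValuesField;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.SimpleBindings;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

class ClickPriceBoostSketch {

  // Hypothetical product: identical text for all products, plus two
  // independent doc-values columns for click count and price.
  static Document product(String id, long clicks, double price) {
    Document doc = new Document();
    doc.add(new StringField("id", id, Field.Store.YES));
    doc.add(new TextField("body", "red shoes", Field.Store.NO));
    doc.add(new NumericDocValuesField("clicks", clicks));
    doc.add(new DoubleDocValuesField("price", price));
    return doc;
  }

  // Best hit by: relevance, raised by log of clicks, lowered by price.
  static String topId(Directory dir) throws Exception {
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      Expression expr = JavascriptCompiler.compile("_score * log10(10 + clicks) / sqrt(price)");
      SimpleBindings bindings = new SimpleBindings();
      bindings.add(new SortField("_score", SortField.Type.SCORE));
      bindings.add(new SortField("clicks", SortField.Type.LONG));
      bindings.add(new SortField("price", SortField.Type.DOUBLE));
      Query query = new FunctionScoreQuery(
          new TermQuery(new Term("body", "shoes")), expr.getDoubleValuesSource(bindings));
      TopDocs hits = searcher.search(query, 10);
      return searcher.doc(hits.scoreDocs[0].doc).get("id");
    }
  }

  // Ranks once, bumps the click count of "b" in place, then ranks again.
  static String[] rankBeforeAndAfterUpdate() throws Exception {
    Path path = Files.createTempDirectory("click-price-sketch");
    try (Directory dir = FSDirectory.open(path)) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
        writer.addDocument(product("a", 1000L, 10.0));
        writer.addDocument(product("b", 0L, 10.0));
      }
      String before = topId(dir);
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
        // Rewrites only the "clicks" doc value; the text is not re-analyzed.
        writer.updateNumericDocValue(new Term("id", "b"), "clicks", 1_000_000L);
      }
      String after = topId(dir);
      return new String[] {before, after};
    }
  }
}
```

With these made-up numbers, "a" should rank first before the update and "b" after it, since the text and price are the same and only the click count differs.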
Here is an example; the second example uses a FunctionScoreQuery that
modifies the score based on the function and the given docvalues:
https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
> FunctionScoreQuery usage with MultiFieldQueryParser would also be nice,
> where MultiFieldQueryParser already has a boosts field to do this in its
> constructor.
The boosts in the query parser are applied per field at query time (to
give each field a different weight). Index-time boosting is per
document. So you can combine both.
> Maybe it is not needed with MultiFieldQueryParser.
You use MultiFieldQueryParser to adjust the weights of the fields (e.g.,
title versus body). The parsed query is then wrapped with an expression
that modifies the score per document according to the docvalues.
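A sketch of that combination, assuming Lucene 7.x with `lucene-queryparser`, `lucene-queries`, and `lucene-expressions` available; `title`, `body`, `_boost`, and the class name are hypothetical. The per-field weights go into the MultiFieldQueryParser constructor, and the per-document factor is applied by wrapping the parsed query.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.SimpleBindings;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.SortField;

class MultiFieldBoostSketch {

  static Query build(String userInput) throws Exception {
    // Query-time, per-field weights: title matches count twice as much as body.
    String[] fields = {"title", "body"};
    Map<String, Float> fieldWeights = new HashMap<>();
    fieldWeights.put("title", 2.0f);
    fieldWeights.put("body", 1.0f);
    MultiFieldQueryParser parser =
        new MultiFieldQueryParser(fields, new StandardAnalyzer(), fieldWeights);
    Query parsed = parser.parse(userInput);

    // Index-time, per-document factor read from the "_boost" doc-values field.
    Expression expr = JavascriptCompiler.compile("_score * _boost");
    SimpleBindings bindings = new SimpleBindings();
    bindings.add(new SortField("_score", SortField.Type.SCORE));
    bindings.add(new SortField("_boost", SortField.Type.LONG));
    return new FunctionScoreQuery(parsed, expr.getDoubleValuesSource(bindings));
  }
}
```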
Uwe

On 10/18/19 1:28 PM, Uwe Schindler wrote:

Hi,

That's not true. You can do index-time boosting, but you need to do it
using a separate field. You just index a numeric docvalues field (which
may contain a long or float value per document). Later you wrap your
query with a FunctionScoreQuery (e.g., using the JavaScript function
syntax from the expressions module). This allows you to compile a
JavaScript function that calculates the final score from the score
returned by the inner query, combined with the docvalues that were
indexed per document.
Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: baris.ka...@oracle.com <baris.ka...@oracle.com>
Sent: Friday, October 18, 2019 5:28 PM
To: java-user@lucene.apache.org
Cc: baris.ka...@oracle.com
Subject: Re: Index-time boosting: Deprecated setBoost method

It looks like index-time (field) boosting is no longer possible as of
Lucene version 7.7.2.

In another case I was already using BoostQuery at search time for
boosting, and this seems to be the only boosting option now in Lucene.
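For comparison, a sketch of the purely query-time side using BoostQuery, with the field names and weights borrowed from the field1/field2 setBoost example in this thread (the class name is hypothetical). If a field's weight is the same for every document, a query-time weight like this may be enough; per-document factors still need the doc-values approach.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

class QueryTimeFieldBoostSketch {

  // Matches in field1 weigh twice as much as matches in field2,
  // for every document alike.
  static Query build(String text) {
    Query field1 = new BoostQuery(new TermQuery(new Term("field1", text)), 2.0f);
    Query field2 = new TermQuery(new Term("field2", text)); // implicit weight 1.0
    return new BooleanQuery.Builder()
        .add(field1, BooleanClause.Occur.SHOULD)
        .add(field2, BooleanClause.Occur.SHOULD)
        .build();
  }
}
```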

Best regards


On 10/18/19 10:01 AM, baris.ka...@oracle.com wrote:
Hi,

I saw this in the Field class docs and I am trying to figure out the
following note:

setBoost(float boost)
Deprecated.
Index-time boosts are deprecated, please index index-time scoring
factors into a doc value field and combine them with the score at
query time using eg. FunctionScoreQuery.

I appreciate this note. Is there an example of this? I wish the docs
gave a simple example to help further.


https://lucene.apache.org/core/6_6_0/core/org/apache/lucene/document/Field.html
vs


https://lucene.apache.org/core/7_7_2/core/org/apache/lucene/document/Field.html
Best regards


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail:
java-user-h...@lucene.apache.org