Antw: Re: Boosting of Join Results

Alena Dengler Tue, 22 Mar 2016 06:20:05 -0700

Mikhail, 

Thanks a lot for the suggestion. We have now implemented the query as
follows:
 q=(+geschichte +rom) OR _query_:{!boost b=0.01}{!join from=expandtype fromIndex=pages to=id score=avg v='pageno_content:(+geschichte +rom)'})
With a factor of 0.01 it seems to work well with our data.
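
For reference, here is a minimal sketch of how such a request could be assembled and URL-encoded from a script. The host and core name (server, live) are taken from the example URLs quoted further down; the helper name build_boosted_join_query is hypothetical, and the query is written with balanced parentheses, which is an assumption about the intended form.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def build_boosted_join_query(terms, boost=0.01, score_mode="avg"):
    """Return a Solr q string: metadata match OR down-weighted fulltext join.

    Hypothetical helper; field and core names are copied from this thread.
    """
    main = " ".join(f"+{t}" for t in terms)
    # Join from the 'pages' core (one document per fulltext page) to the book id.
    join = (
        f"{{!join from=expandtype fromIndex=pages to=id score={score_mode} "
        f"v='pageno_content:({main})'}}"
    )
    # {!boost b=...} multiplies the score of the wrapped query by b.
    return f"({main}) OR _query_:{{!boost b={boost}}}{join}"


q = build_boosted_join_query(["geschichte", "rom"])
url = "http://server/solr/live/select?" + urlencode({"q": q, "fl": "id,score"})
print(url)

# Round-trip check: decoding the URL yields the original query string.
assert parse_qs(urlsplit(url).query)["q"] == [q]
```

Note that urlencode sends a literal + as %2B; an unencoded + in a URL query string is normally decoded as a space, so hand-written URLs need the same care.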


Best Regards
Alena


>>> Mikhail Khludnev <mkhlud...@griddynamics.com> 22.03.2016 12:44 >>>
What if you nest the join inside a boost, e.g. q=+foo {!boost ..}{!join ... v=...}?

see
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BoostQueryParser


if it works, you may vote for
https://issues.apache.org/jira/browse/SOLR-7814 

On Tue, Mar 22, 2016 at 12:39 PM, Alena Dengler <
alena.deng...@bsb-muenchen.de> wrote:

> Hello,
>
> we are currently developing a combined index for book metadata and
> fulltexts. Our primary core contains metadata of ~12 million books;
> ~0.5 million of them have fulltexts, which are indexed in a secondary
> core. This secondary core has one index document per fulltext page.
> We are joining all matching fulltext pages with the bookwise metadata
> in the primary core. Currently we have the problem that scores for books
> with matches from the secondary core are not comparable with scores for
> matches from metadata only. So we are trying to normalize fulltext scores
> to be on the same scale as the metadata scores for non-digitized results.
>
> This is a basic query without join using only the primary core
> (metadata):
> http://server/solr/live/select?&q=+geschichte&fl=id,score 
> Top 10 result scores range from 2.0 to 1.7
>
> For fulltexts, the query is extended with a join:
>
> http://server/solr/live/select?q=%28%28+geschichte%29%20OR%20_query_:{!join%20from=expandtype%20fromIndex=pages%20to=id%20score=max%20v=%27pageno_content:%28+geschichte%29%27}%29&fl=id,score
>
> Top 10 result scores range from 5.4 to 4.8. About 4.7 score points of the
> first hit come from the joined secondary core; we would like to reduce
> this value (see explain output below [1]).
>
> This difference will effectively hide any books without fulltexts from
> hitlists, which is not our goal.
>
> We tried to add Lucene boosts to the join subquery, but they do not
> have any effect on the final scores. E.g. we 'down boost' the fulltext
> results by a factor of 0.1:
> q=((+geschichte) OR _query_:{!join from=expandtype fromIndex=pages to=id score=max v='pageno_content:(+geschichte)^0.1'})
> But the resulting scores are the same as from the join example above.
>
> Is this the correct query syntax, or should the boost for the join
> query be put somewhere else?
>
> Thanks for any suggestions.
>
> Best Regards
> Alena
>
> [1] Explain output for the first hit of the join example query
> 5.398742 = sum of:
>   4.816505 = sum of:
>     0.07251295 = max of:
>       0.07251295 = weight(title:geschichte in 10585926) [ClassicSimilarity], result of:
>         0.07251295 = score(doc=10585926,freq=1.0), product of:
>           0.037440736 = queryWeight, product of:
>             5.1646385 = idf(docFreq=197504, maxDocs=12713278)
>             0.00724944 = queryNorm
>           1.9367394 = fieldWeight in 10585926, product of:
>             1.0 = tf(freq=1.0), with freq of:
>               1.0 = termFreq=1.0
>             5.1646385 = idf(docFreq=197504, maxDocs=12713278)
>             0.375 = fieldNorm(doc=10585926)
>       0.005904072 = weight(free_search:geschichte in 10585926) [ClassicSimilarity], result of:
>         0.005904072 = score(doc=10585926,freq=2.0), product of:
>           0.022005465 = queryWeight, product of:
>             3.035471 = idf(docFreq=1660594, maxDocs=12713278)
>             0.00724944 = queryNorm
>           0.26830027 = fieldWeight in 10585926, product of:
>             1.4142135 = tf(freq=2.0), with freq of:
>               2.0 = termFreq=2.0
>             3.035471 = idf(docFreq=1660594, maxDocs=12713278)
>             0.0625 = fieldNorm(doc=10585926)
>     4.743992 = Score based on join value 957245
>   0.58188105 = weight(statusband:F in 10585926) [ClassicSimilarity], result of:
>     0.58188105 = score(doc=10585926,freq=1.0), product of:
>       0.4592555 = queryWeight, product of:
>         50.0 = boost
>         1.2670095 = idf(docFreq=9734121, maxDocs=12713278)
>         0.00724944 = queryNorm
>       1.2670095 = fieldWeight in 10585926, product of:
>         1.0 = tf(freq=1.0), with freq of:
>           1.0 = termFreq=1.0
>         1.2670095 = idf(docFreq=9734121, maxDocs=12713278)
>         1.0 = fieldNorm(doc=10585926)
>   3.5596997E-4 = FunctionQuery(1.0/(3.16E-11*float(ms(const(1458638802405),date(freshness)))+1.0)), product of:
>     0.00491031 = 1.0/(3.16E-11*float(ms(const(1458638802405),date(freshness)=1813-01-01T00:00:01Z))+1.0)
>     0.0724944 = boost
>     1.0 = queryNorm
>
>
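
The explain output quoted above can be checked with simple arithmetic: the top-level score is the sum of its components, and the join accounts for most of it. The sketch below also shows what the first hit's score would become if only the join component were multiplied by 0.01. That part assumes the same score=max join and unchanged other components, which is not the query that was finally used, so the figure is illustrative only.

```python
import math

# Component scores copied from the explain output for doc 10585926.
metadata_match = 0.07251295   # "max of" the title / free_search matches
join_score = 4.743992         # "Score based on join value 957245"
status_boost = 0.58188105     # weight(statusband:F), boost 50.0
freshness = 3.5596997e-4      # FunctionQuery on the freshness date

# Top level: sum of (metadata + join), status boost and freshness function.
total = (metadata_match + join_score) + status_boost + freshness
assert math.isclose(total, 5.398742, abs_tol=1e-5)

# Share of the final score that comes from the fulltext join.
join_share = join_score / total
print(f"total={total:.6f}, join share={join_share:.1%}")

# Hypothetical: scale only the join component by 0.01 (assumption, see above).
scaled_total = (metadata_match + 0.01 * join_score) + status_boost + freshness
print(f"with join * 0.01: {scaled_total:.6f}")
```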


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>
