Re: Understanding the Debug explanations for Query Result Scoring/Ranking

2014-07-28 Thread O. Olson
Thank you very much Chris. I was not aware of debug.explain.structured. It
seems to be what I was looking for. 

Thanks also to Jack Krupansky. Yes, delving into those numbers would be my
next step, but I will get to that later.
O. O.


Chris Hostetter-3 wrote
> Just to be clear, regardless of *which* response writer you use (xml, 
> ruby, json, etc...) the default behavior is to include the score 
> explanation as a single string which uses tabs/newlines to deal with the 
> nesting (this nesting is visible if you view the raw response, no matter 
> which ResponseWriter you use).
> 
> You can however add a param indicating that you want the explanation 
> information to be returned as *structured data* instead of a simple 
> string...
> 
> https://wiki.apache.org/solr/CommonQueryParameters#debug.explain.structured
> 
> ...if you want to programmatically process debug info, this is the 
> recommended way to do so.
> 
> -Hoss
> http://www.lucidworks.com/





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Understanding-the-Debug-explanations-for-Query-Result-Scoring-Ranking-tp4149137p4149521.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Understanding the Debug explanations for Query Result Scoring/Ranking

2014-07-25 Thread Chris Hostetter

: Thank you very much Erik. This is exactly what I was looking for. While at
: the moment I have no clue about these numbers, the ruby formatting makes it
: much easier to understand.

Just to be clear, regardless of *which* response writer you use (xml, 
ruby, json, etc...) the default behavior is to include the score 
explanation as a single string which uses tabs/newlines to deal with the 
nesting (this nesting is visible if you view the raw response, no matter 
which ResponseWriter you use).

You can however add a param indicating that you want the explanation 
information to be returned as *structured data* instead of a simple 
string...

https://wiki.apache.org/solr/CommonQueryParameters#debug.explain.structured

...if you want to programmatically process debug info, this is the 
recommended way to do so.
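
As a rough sketch of what processing the structured form could look like: 
with debug=true and debug.explain.structured=true, each explain entry comes 
back as nested data rather than one string. The Python below assumes each 
node carries "value", "description" and an optional "details" list. That 
shape, the render_explain helper and the abbreviated sample node (numbers 
taken from the explain in the original post) are assumptions for 
illustration, so check them against your own response.

```python
def render_explain(node, depth=0):
    """Return indented text lines for one structured explain node.

    Assumes each node is a dict with "value", "description" and an
    optional "details" list of child nodes (hypothetical shape).
    """
    lines = ["{}{} = {}".format("  " * depth, node["value"], node["description"])]
    for child in node.get("details", []):
        lines.extend(render_explain(child, depth + 1))
    return lines


# Abbreviated sample: only the top two levels of the explain from the
# original post are reproduced here.
sample = {
    "match": True,
    "value": 1.5797625,
    "description": "sum of:",
    "details": [
        {"match": True, "value": 0.4717142,
         "description": "weight(text:televis in 44109) [DefaultSimilarity], result of:"},
        {"match": True, "value": 1.1080483,
         "description": "weight(text:tv in 44109) [DefaultSimilarity], result of:"},
    ],
}

print("\n".join(render_explain(sample)))
```

Walking the tree recursively keeps each value next to the description it 
belongs to, which is the part that gets lost when the string form is 
flattened.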

-Hoss
http://www.lucidworks.com/


Re: Understanding the Debug explanations for Query Result Scoring/Ranking

2014-07-25 Thread Jack Krupansky
The formatting is one thing, but ultimately it is just a giant expression, 
one for each document. The expression is computing the score, based on your 
chosen or default "similarity" algorithm. All the terms in the expressions 
are detailed here:


http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html

Unless you dive into that math (not so bad, really, if you are motivated), 
the expressions are going to be rather opaque to you.


The long floating point numbers are mostly just the intermediate (and final) 
calculations of the math described above.
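
For instance, the numbers in the explain from the original post can be 
recombined by hand. Here is a small Python check that uses only values 
copied from that output (the variable names are mine, not Lucene's):

```python
import math

# Values copied from the "televis" clause of the explain output.
idf = 7.0424104
query_norm = 0.10145303
tf = 1.0
field_norm = 0.09375

query_weight = idf * query_norm               # reported as 0.71447384
field_weight = tf * idf * field_norm          # reported as 0.660226
televis_score = query_weight * field_weight   # reported as 0.4717142

tv_score = 1.1080483                          # the "tv" clause, built the same way
total = televis_score + tv_score              # reported as 1.5797625

# Lucene computes in 32-bit floats, so compare with a small tolerance.
assert math.isclose(query_weight, 0.71447384, abs_tol=1e-5)
assert math.isclose(field_weight, 0.660226, abs_tol=1e-5)
assert math.isclose(televis_score, 0.4717142, abs_tol=1e-5)
assert math.isclose(total, 1.5797625, abs_tol=1e-5)

print(query_weight, field_weight, televis_score, total)
```

So queryNorm feeds into queryWeight, and fieldWeight is a separate product 
that is then multiplied by queryWeight to give the score for that term.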


Try constructing a very simple collection of simple, contrived documents, 
like a short sentence in each, with some common terms, and then try simple 
queries to see how the expression term values change. Try computing TF, DF, 
IDF yourself (just count the terms by hand), and compare to what debug gives 
you.
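
A minimal sketch of that exercise in Python, assuming the DefaultSimilarity 
formulas from the TFIDFSimilarity page linked above, tf = sqrt(freq) and 
idf = 1 + ln(maxDocs / (docFreq + 1)). The three documents and the helper 
names are made up for illustration, and whitespace splitting stands in for 
whatever analysis your field type really does.

```python
import math


def tf(freq):
    # DefaultSimilarity: square root of the term frequency.
    return math.sqrt(freq)


def idf(doc_freq, max_docs):
    # DefaultSimilarity: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


# Contrived collection, one short sentence per document.
docs = [
    "the tv is on",
    "a tv and a radio",
    "the radio is off",
]


def counts(term):
    """Count the term in each document and the number of documents containing it."""
    per_doc = [doc.split().count(term) for doc in docs]
    doc_freq = sum(1 for c in per_doc if c > 0)
    return per_doc, doc_freq


per_doc, doc_freq = counts("tv")
print(per_doc, doc_freq, idf(doc_freq, len(docs)))

# Cross-check against the explain in the original post.
print(idf(896, 377553))   # debug reported 7.0424104
print(idf(1037, 377553))  # debug reported 6.896415
print(tf(6.0))            # debug reported 2.4494898
```

The last digits can differ slightly from what debug prints, since Lucene 
does this arithmetic in 32-bit floats.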


-- Jack Krupansky

-Original Message- 
From: O. Olson

Sent: Thursday, July 24, 2014 6:45 PM
To: solr-user@lucene.apache.org
Subject: Understanding the Debug explanations for Query Result 
Scoring/Ranking


Hi,

If you add /*&debug=true*/ to the Solr request /(and &wt=xml if your
current output is not XML)/, you get a node in the resulting XML that
is named "debug". It has a child node called "explain",
which has a list showing why the results are ranked in a particular order.
I'm curious if there is some documentation on understanding these
numbers/results.

I am new to Solr, so I apologize that I may be using the wrong terms to
describe my problem. I am also aware of
http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
though I have not completely understood it.

My problem is trying to understand something like this:

1.5797625 = (MATCH) sum of: 0.4717142 = (MATCH) weight(text:televis in
44109) [DefaultSimilarity], result of: 0.4717142 = score(doc=44109,freq=1.0
= termFreq=1.0 ), product of: 0.71447384 = queryWeight, product of:
7.0424104 = idf(docFreq=896, maxDocs=377553) 0.10145303 = queryNorm 0.660226
= fieldWeight in 44109, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 =
termFreq=1.0 7.0424104 = idf(docFreq=896, maxDocs=377553) 0.09375 =
fieldNorm(doc=44109) 1.1080483 = (MATCH) weight(text:tv in 44109)
[DefaultSimilarity], result of: 1.1080483 = score(doc=44109,freq=6.0 =
termFreq=6.0 ), product of: 0.6996622 = queryWeight, product of: 6.896415 =
idf(docFreq=1037, maxDocs=377553) 0.10145303 = queryNorm 1.5836904 =
fieldWeight in 44109, product of: 2.4494898 = tf(freq=6.0), with freq of:
6.0 = termFreq=6.0 6.896415 = idf(docFreq=1037, maxDocs=377553) 0.09375 =
fieldNorm(doc=44109)

*Note:* I have searched for "televisions". My search field is a single
catch-all field. The Edismax parser seems to break up my search term into
"televis" and "tv"

Is there some documentation on how to understand these numbers? They do not
seem to be properly delimited. At the minimum, I can understand something
like:
1.5797625 =  0.4717142 + 1.1080483
and
0.71447384  = 7.0424104 * 0.10145303

But, I cannot understand if something like "0.10145303 = queryNorm 0.660226
= fieldWeight in 44109" is used in the calculation anywhere. Also since
there were only two terms /("televis" and "tv")/ I could use subtraction to
find out 1.1080483 was the start of a new result.

I'd also appreciate if someone can tell me which class dumps out the above
data. If I know it, I can edit that class to make the output a bit more
understandable for me.

Thank you,
O. O.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Understanding-the-Debug-explanations-for-Query-Result-Scoring-Ranking-tp4149137.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Understanding the Debug explanations for Query Result Scoring/Ranking

2014-07-25 Thread O. Olson
Thank you very much Erik. This is exactly what I was looking for. While at
the moment I have no clue about these numbers, the ruby formatting makes it
much easier to understand.

Thanks to you Koji. I'm sorry I did not acknowledge you before. I think
Erik's solution is what I was looking for.
O. O.



Erik Hatcher-4 wrote
> The format of the XML explain output is not indented or very readable. 
> When I really need to see the explain indented, I use wt=ruby&indent=true
> (I don’t think the indent parameter is relevant for the explain output,
> but I use it anyway)
> 
>   Erik





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Understanding-the-Debug-explanations-for-Query-Result-Scoring-Ranking-tp4149137p4149226.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Understanding the Debug explanations for Query Result Scoring/Ranking

2014-07-25 Thread Erik Hatcher
The format of the XML explain output is not indented or very readable.  When I 
really need to see the explain indented, I use wt=ruby&indent=true (I don’t 
think the indent parameter is relevant for the explain output, but I use it 
anyway)
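
For what it's worth, here is how such a request URL could be put together 
in Python. The host, port and core name in base_url are placeholders I 
made up, and the query is the one from the original post.

```python
from urllib.parse import urlencode

# Hypothetical host and core name; substitute your own.
base_url = "http://localhost:8983/solr/collection1/select"

params = {
    "q": "televisions",
    "debug": "true",    # ask Solr for the debug section, including explain
    "wt": "ruby",       # ruby response writer
    "indent": "true",   # probably not relevant for explain, but harmless
}

url = base_url + "?" + urlencode(params)
print(url)
```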

Erik

On Jul 25, 2014, at 10:11 AM, O. Olson  wrote:

> Thank you Uwe. Unfortunately, I could not get your explain solr website to
> work. I always get an error saying "Ops. We have internal server error. This
> event was logged. We will try fix this soon. We are sorry for
> inconvenience."
> 
> At this point, I know that I need to have some technical background to
> understand how these numbers are calculated. However, even with that, I am
> sure that the format of this output is not obvious. I am curious about the
> documentation of this output format. It seems to be unintelligible. 
> 
> If this is not documented anywhere, can someone point me to the class that is
> doing this output?
> 
> Thank you,
> O. O.
> 
> 
> an6 wrote
>> Hi,
>> 
>> to get an idea of the meaning of all these numbers, have a look at 
>> http://explain.solr.pl. I like this tool, it's great.
>> 
>> Uwe
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Understanding-the-Debug-explanations-for-Query-Result-Scoring-Ranking-tp4149137p4149217.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Understanding the Debug explanations for Query Result Scoring/Ranking

2014-07-25 Thread O. Olson
Thank you Uwe. Unfortunately, I could not get your explain solr website to
work. I always get an error saying "Ops. We have internal server error. This
event was logged. We will try fix this soon. We are sorry for
inconvenience."

At this point, I know that I need to have some technical background to
understand how these numbers are calculated. However, even with that, I am
sure that the format of this output is not obvious. I am curious about the
documentation of this output format. It seems to be unintelligible. 

If this is not documented anywhere, can someone point me to the class that is
doing this output?

Thank you,
O. O.


an6 wrote
> Hi,
> 
> to get an idea of the meaning of all these numbers, have a look at 
> http://explain.solr.pl. I like this tool, it's great.
> 
> Uwe





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Understanding-the-Debug-explanations-for-Query-Result-Scoring-Ranking-tp4149137p4149217.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Understanding the Debug explanations for Query Result Scoring/Ranking

2014-07-24 Thread Koji Sekiguchi

Hi,

In addition, this might be useful:

Fundamentals of Information Retrieval, Illustration with Apache Lucene
https://www.youtube.com/watch?v=SCsS5ePGmCs

This video is about 40 minutes long, but you can fast forward to 24:00
to learn about scoring based on the vector space model and how Lucene customizes it.

Koji
--
http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html

Re: Understanding the Debug explanations for Query Result Scoring/Ranking

2014-07-24 Thread Uwe Reh

Hi,

to get an idea of the meaning of all these numbers, have a look at 
http://explain.solr.pl. I like this tool, it's great.


Uwe

Understanding the Debug explanations for Query Result Scoring/Ranking

2014-07-24 Thread O. Olson
Hi,

If you add /*&debug=true*/ to the Solr request /(and &wt=xml if your
current output is not XML)/, you get a node in the resulting XML that
is named "debug". It has a child node called "explain",
which has a list showing why the results are ranked in a particular order.
I'm curious if there is some documentation on understanding these
numbers/results. 

I am new to Solr, so I apologize that I may be using the wrong terms to
describe my problem. I am also aware of
http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
though I have not completely understood it. 

My problem is trying to understand something like this: 

1.5797625 = (MATCH) sum of: 0.4717142 = (MATCH) weight(text:televis in
44109) [DefaultSimilarity], result of: 0.4717142 = score(doc=44109,freq=1.0
= termFreq=1.0 ), product of: 0.71447384 = queryWeight, product of:
7.0424104 = idf(docFreq=896, maxDocs=377553) 0.10145303 = queryNorm 0.660226
= fieldWeight in 44109, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 =
termFreq=1.0 7.0424104 = idf(docFreq=896, maxDocs=377553) 0.09375 =
fieldNorm(doc=44109) 1.1080483 = (MATCH) weight(text:tv in 44109)
[DefaultSimilarity], result of: 1.1080483 = score(doc=44109,freq=6.0 =
termFreq=6.0 ), product of: 0.6996622 = queryWeight, product of: 6.896415 =
idf(docFreq=1037, maxDocs=377553) 0.10145303 = queryNorm 1.5836904 =
fieldWeight in 44109, product of: 2.4494898 = tf(freq=6.0), with freq of:
6.0 = termFreq=6.0 6.896415 = idf(docFreq=1037, maxDocs=377553) 0.09375 =
fieldNorm(doc=44109)

*Note:* I have searched for "televisions". My search field is a single
catch-all field. The Edismax parser seems to break up my search term into
"televis" and "tv"

Is there some documentation on how to understand these numbers? They do not
seem to be properly delimited. At the minimum, I can understand something
like: 
1.5797625 =  0.4717142 + 1.1080483
and
0.71447384  = 7.0424104 * 0.10145303

But, I cannot understand if something like "0.10145303 = queryNorm 0.660226
= fieldWeight in 44109" is used in the calculation anywhere. Also since
there were only two terms /("televis" and "tv")/ I could use subtraction to
find out 1.1080483 was the start of a new result.

I'd also appreciate if someone can tell me which class dumps out the above
data. If I know it, I can edit that class to make the output a bit more
understandable for me.

Thank you,
O. O.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Understanding-the-Debug-explanations-for-Query-Result-Scoring-Ranking-tp4149137.html
Sent from the Solr - User mailing list archive at Nabble.com.