[jira] [Created] (SOLR-16942) Improve knn explain output

Marc Byrd (Jira) Fri, 18 Aug 2023 17:57:53 -0700

Marc Byrd created SOLR-16942:
--------------------------------

             Summary: Improve knn explain output
                 Key: SOLR-16942
                 URL: https://issues.apache.org/jira/browse/SOLR-16942
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Marc Byrd



The following is explain output for a query involving both reRank and a {!knn} query:
1.4137135 = combined unscaled first and scaled second pass score 
  0.9137135 = first pass score
    0.9137135 = sum of:
      0.0039847707 = sum of:
        0.0039847707 = max of:
          0.0014896907 = weight(description_t:miles in 113) [SchemaSimilarity], result of:
            0.0014896907 = score(freq=2.0), computed as boost * idf * tf from:
              0.001 = boost
              2.0111222 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
                26 = n, number of documents containing term
                197 = N, total number of documents with field
              0.740726 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
                2.0 = freq, occurrences of term within document
                1.2 = k1, term saturation parameter
                0.75 = b, length normalization parameter
                21.0 = dl, length of field
                47.243656 = avgdl, average length of field
          0.0039847707 = weight(title_t:miles in 113) [SchemaSimilarity], result of:
            0.0039847707 = score(freq=2.0), computed as boost * idf * tf from:
              0.002 = boost
              2.84592 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
                11 = n, number of documents containing term
                197 = N, total number of documents with field
              0.7000848 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
                2.0 = freq, occurrences of term within document
                1.2 = k1, term saturation parameter
                0.75 = b, length normalization parameter
                7.0 = dl, length of field
                11.314721 = avgdl, average length of field
      0.90972877 = within top 100
  1.0 = second pass score scaled between:0-1
    3.9847708 = second pass score
      3.9847708 = sum of:
        3.9847708 = max of:
          1.4896905 = weight(description_t:miles in 113) [SchemaSimilarity], result of:
            1.4896905 = score(freq=2.0), computed as boost * idf * tf from:
              2.0111222 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
                26 = n, number of documents containing term
                197 = N, total number of documents with field
              0.740726 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
                2.0 = freq, occurrences of term within document
                1.2 = k1, term saturation parameter
                0.75 = b, length normalization parameter
                21.0 = dl, length of field
                47.243656 = avgdl, average length of field
          3.9847708 = weight(title_t:miles in 113) [SchemaSimilarity], result of:
            3.9847708 = score(freq=2.0), computed as boost * idf * tf from:
              2.0 = boost
              2.84592 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
                11 = n, number of documents containing term
                197 = N, total number of documents with field
              0.7000848 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
                2.0 = freq, occurrences of term within document
                1.2 = k1, term saturation parameter
                0.75 = b, length normalization parameter
                7.0 = dl, length of field
                11.314721 = avgdl, average length of field
    0.8636209 = min second pass score
    3.9847708 = max sceond pass score
  0.5 = rerank weight
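The numbers in the explain output above are internally consistent, which makes it easy to see how each level is built from the one below it. The following minimal Python sketch recomputes them from the leaf values. It assumes the BM25 formulas exactly as printed, and it assumes the combined score is "first pass + rerank weight * min-max scaled second pass"; that combination rule is inferred from the explain labels, not taken from Solr's source. The helper names (bm25, field_score, etc.) are hypothetical.

```python
import math

N_DOCS = 197  # "N, total number of documents with field" (same for both fields)

# (freq, n, dl, avgdl) for the term "miles" in doc 113, per field
DESC = (2.0, 26, 21.0, 47.243656)
TITLE = (2.0, 11, 7.0, 11.314721)


def bm25_idf(n: int, N: int) -> float:
    # idf = log(1 + (N - n + 0.5) / (n + 0.5)), as printed in the explain
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


def bm25_tf(freq: float, dl: float, avgdl: float, k1: float = 1.2, b: float = 0.75) -> float:
    # tf = freq / (freq + k1 * (1 - b + b * dl / avgdl)), as printed in the explain
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


def field_score(boost: float, stats: tuple) -> float:
    # score = boost * idf * tf
    freq, n, dl, avgdl = stats
    return boost * bm25_idf(n, N_DOCS) * bm25_tf(freq, dl, avgdl)


# First pass: "max of" the two boosted field scores, summed with the lone knn entry
knn_score = 0.90972877  # "within top 100"
first_pass = max(field_score(0.001, DESC), field_score(0.002, TITLE)) + knn_score

# Second pass: same dismax; description_t boost is an implied 1.0 (no boost line printed)
second_pass = max(field_score(1.0, DESC), field_score(2.0, TITLE))

# Assumed min-max scaling of the second pass score into [0, 1]
min_second, max_second = 0.8636209, 3.9847708
scaled_second = (second_pass - min_second) / (max_second - min_second)

rerank_weight = 0.5
combined = first_pass + rerank_weight * scaled_second

print(f"first pass  {first_pass:.7f}")
print(f"second pass {second_pass:.7f} (scaled {scaled_second:.4f})")
print(f"combined    {combined:.7f}")
```

Small differences in the last digits are expected, since Lucene computes these scores in 32-bit floats while the sketch uses Python doubles.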

Note how much detail the reRank explain gives, compared to the knn part, which has only one entry:
  0.90972877 = within top 100

(And we only know that this entry is the knn score because we also ran a knn-only query.)
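Without the similarity function, that single knn value cannot even be mapped back to a raw vector similarity. A minimal Python sketch, assuming the Lucene 9.x score transforms (EUCLIDEAN: 1 / (1 + squared distance); DOT_PRODUCT and COSINE: (1 + similarity) / 2), shows how the same 0.90972877 reads under each function. The raw_similarity helper is hypothetical and only inverts those assumed transforms.

```python
def raw_similarity(score: float, function: str) -> float:
    """Invert the (assumed) Lucene 9.x score transform for a knn match."""
    if function == "EUCLIDEAN":
        # score = 1 / (1 + squared_distance)  =>  squared_distance = 1 / score - 1
        return 1.0 / score - 1.0
    if function in ("DOT_PRODUCT", "COSINE"):
        # score = (1 + similarity) / 2  =>  similarity = 2 * score - 1
        return 2.0 * score - 1.0
    raise ValueError(f"unknown similarity function: {function}")


knn_score = 0.90972877  # the single knn entry from the explain output
for fn in ("EUCLIDEAN", "DOT_PRODUCT", "COSINE"):
    print(f"{fn:12s} -> {raw_similarity(knn_score, fn):.6f}")
```

Under COSINE the score corresponds to a cosine of about 0.8195; under EUCLIDEAN, to a squared distance of about 0.099. The explain output alone cannot tell these apart.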

Perhaps it doesn't need to be (and can't be) as detailed as the above, but it should at least include:
* topK
* dimensions
* scoring method (dot product, cosine similarity, etc.)
* maybe some insight into the HNSW graph traversal?
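To make the request concrete, here is a hypothetical sketch of what a richer knn explain entry could contain, rendered in the same "value = description" layout (two spaces of indent per level) as the output above. The Explanation class, knn_explanation function, field name "vector" and all descriptions are illustrative assumptions, not Solr or Lucene API. It assumes COSINE similarity with the Lucene 9.x transform score = (1 + cosine) / 2, only covers a document that made the top K, and does not model HNSW traversal statistics.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Explanation:
    """Hypothetical stand-in for a Lucene-style explanation node."""
    value: float
    description: str
    details: List["Explanation"] = field(default_factory=list)

    def render(self, depth: int = 0) -> str:
        # "value = description", children indented two more spaces
        lines = ["  " * depth + f"{self.value:g} = {self.description}"]
        lines += [d.render(depth + 1) for d in self.details]
        return "\n".join(lines)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn_explanation(query: Sequence[float], doc: Sequence[float],
                    top_k: int, field_name: str) -> Explanation:
    # Hypothetical explain for a doc inside the top K under COSINE similarity
    raw = cosine(query, doc)
    return Explanation(
        (1 + raw) / 2,
        f"within top {top_k}, computed as (1 + cosine) / 2 from:",
        [
            Explanation(raw, f"cosine similarity on {field_name}"),
            Explanation(float(top_k), "topK"),
            Explanation(float(len(query)), "dimensions"),
        ],
    )


# Toy 4-dimensional vectors; "vector" is a hypothetical field name
exp = knn_explanation([1.0, 0.0, 1.0, 0.0], [1.0, 1.0, 1.0, 0.0],
                      top_k=100, field_name="vector")
print(exp.render())
```

Listing topK, dimensions and the raw similarity as children keeps the top-level line identical in spirit to today's "within top 100" entry, so existing readers of the explain would only gain detail.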


