Re: Understanding query explain information

2011-06-27 Thread Chris Hostetter
: Simply trying to understand why these strings generated such scores, and as
: far as I can understand, the only difference between them is the field
: norms, as all the other results maintain themselves.
...
: Well, if this is true, the field norm for my first document should be 0.5
: (1/sqrt(4)) as  Livro - IPAD - O Guia do Profissional ends up with the
: terms livro|ipad|guia|profissional as tokens.
...
: 3.6808658 = (MATCH) fieldWeight(itemName:ipad in 102507), product of:
:   1.0 = tf(termFreq(itemName:ipad)=1)
:   8.413407 = idf(docFreq=165, maxDocs=275239)
:   0.4375 = fieldNorm(field=itemName, doc=102507)

fieldNorms are encoded into a compact byte representation which loses
some precision...

http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/search/Similarity.html#encodeNorm%28float%29


-Hoss
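
A minimal sketch of the precision loss described above, written in Python for illustration. The bit manipulation follows the one-byte float packing in Lucene 3.x (SmallFloat.floatToByte315 and byte315ToFloat, which Similarity.encodeNorm delegates to) as best recalled here; the function names encode_norm and decode_norm are made up for this example, and the raw norm is assumed to be 1/sqrt(terms) with a boost of 1.0.

```python
import math
import struct

# Offset between float32 exponents and the one-byte format (assumed from Lucene 3.x).
FZERO = (63 - 15) << 3


def encode_norm(f: float) -> int:
    """Pack a float into one unsigned byte (0..255), dropping low mantissa bits."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]  # float32 bit pattern
    small = bits >> 21
    if small <= FZERO:
        return 0 if bits <= 0 else 1  # zero/negative -> 0, underflow -> smallest value
    if small >= FZERO + 0x100:
        return 255  # overflow -> largest value
    return small - FZERO


def decode_norm(b: int) -> float:
    """Expand the byte back into a float; the dropped bits come back as zeros."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


for terms in (4, 5, 6, 8):
    raw = 1.0 / math.sqrt(terms)
    print(terms, round(raw, 4), decode_norm(encode_norm(raw)))
```

Under these assumptions a 4-term field keeps its norm of exactly 0.5, while a 5-term field (raw norm about 0.447) is stored as 0.4375, which is the value reported for doc 102507.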


Re: Understanding query explain information

2011-06-24 Thread lee carroll
Is it possible that synonyms are being added (synonym expansion), or are
at least changing the field length? I've seen this before. Check exactly
which terms have been added.
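
One rough way to check this from the explain output alone is to work backwards from the stored fieldNorm to the term counts that could have produced it. The sketch below is an illustration, not Lucene code: stored_norm and term_counts_for are hypothetical helper names, the raw norm is assumed to be 1/sqrt(terms) with no boost, and the quantization is modelled by zeroing the low 21 bits of the float32 pattern, which matches the one-byte norm encoding only for values well inside its representable range (true for short fields like these).

```python
import math
import struct


def stored_norm(raw: float) -> float:
    """Approximate the norm as read back from the index (low 21 bits dropped)."""
    bits = struct.unpack(">I", struct.pack(">f", raw))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFE00000))[0]


def term_counts_for(norm: float, max_terms: int = 20) -> list:
    """Term counts whose 1/sqrt(n) norm would be stored as the given value."""
    return [n for n in range(1, max_terms + 1)
            if stored_norm(1.0 / math.sqrt(n)) == norm]


for observed in (0.4375, 0.375, 0.3125):
    print(observed, term_counts_for(observed))
```

With these assumptions, 0.4375 corresponds to 5 indexed terms, 0.375 to 6 or 7, and 0.3125 to 8 through 10. That is more terms than the visible words in several of the item names, which would be consistent with extra tokens such as synonyms being indexed.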


On 23 June 2011 22:50, Alexander Ramos Jardim
alexander.ramos.jar...@gmail.com wrote:
 Yes, I am using synonyms at index time.

 2011/6/22 lee carroll lee.a.carr...@googlemail.com

 Hi, are you using synonyms?




Re: Understanding query explain information

2011-06-23 Thread Alexander Ramos Jardim
Yes, I am using synonyms at index time.

2011/6/22 lee carroll lee.a.carr...@googlemail.com

 Hi, are you using synonyms?




Re: Understanding query explain information

2011-06-22 Thread lee carroll
Hi, are you using synonyms?



On 22 June 2011 10:30, Alexander Ramos Jardim
alexander.ramos.jar...@gmail.com wrote:
 Hi guys,

 I have some doubts about how to correctly interpret the debugQuery
 output. I have a field named itemName in my index. It is a plain text
 field. When I run a simple query, ?q=itemName:iPad, I end up with the
 following query result.

 I am simply trying to understand why these strings generated such scores.
 As far as I can tell, the only difference between them is the field
 norm, as all the other factors stay the same.

 Now, how are these field norm values obtained? The field norm is the
 result of this formula, right?

 *1/square root of (terms)*, *where terms is the number of terms in my field
 after it is indexed*


 Well, if this is true, the field norm for my first document should be 0.5
 (1/sqrt(4)), as "Livro - IPAD - O Guia do Profissional" ends up with the
 terms livro|ipad|guia|profissional as tokens.

 What am I forgetting to take into account?
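
For reference, the fieldWeight lines in the explain section of the output can be reproduced by hand. The sketch below assumes the default Lucene 3.x similarity, where tf is the square root of the term frequency and idf is 1 + ln(maxDocs / (docFreq + 1)); the function names are made up for this example, and the fieldNorm values are taken as reported rather than computed.

```python
import math


def tf(term_freq: int) -> float:
    """Term-frequency factor (assumed: square root of the raw frequency)."""
    return math.sqrt(term_freq)


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency (assumed: 1 + ln(maxDocs / (docFreq + 1)))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(term_freq: int, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """Product of the three factors listed under each fieldWeight entry."""
    return tf(term_freq) * idf(doc_freq, max_docs) * field_norm


# Values from the explain output: termFreq=1, docFreq=165, maxDocs=275239.
for norm in (0.4375, 0.375, 0.3125):
    print(norm, round(field_weight(1, 165, 275239, norm), 5))
```

The results agree with the reported scores (3.6808658, 3.1550279, 2.6291897) to about five decimal places; the remaining difference is expected because Lucene does this arithmetic in single precision. Since tf and idf are identical for every hit, the fieldNorm is indeed the only factor separating the three score groups.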

 <?xml version="1.0" encoding="UTF-8"?>
 <response>

 <lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">3</int>
  <lst name="params">
   <str name="debugQuery">on</str>
   <str name="start">0</str>
   <str name="rows">10</str>
   <arr name="indent">
    <str>on</str>
    <str>on</str>
   </arr>
   <str name="fl">itemName,score</str>
   <str name="version">2.2</str>
   <str name="q">itemName:ipad</str>
  </lst>
 </lst>
 <result name="response" numFound="161" start="0" maxScore="3.6808658">
  <doc>
   <float name="score">3.6808658</float>
   <str name="itemName">Livro - IPAD - O Guia do Profissional</str>
  </doc>
  <doc>
   <float name="score">3.1550279</float>
   <str name="itemName">Leitor de Cartão para Ipad - Mobimax</str>
  </doc>
  <doc>
   <float name="score">3.1550279</float>
   <str name="itemName">Sleeve para iPad</str>
  </doc>
  <doc>
   <float name="score">3.1550279</float>
   <str name="itemName">Sleeve de Neoprene para iPad</str>
  </doc>
  <doc>
   <float name="score">3.1550279</float>
   <str name="itemName">Carregador de parede para iPad</str>
  </doc>
  <doc>
   <float name="score">2.6291897</float>
   <str name="itemName">Case Envelope para iPad - Black - Built NY</str>
  </doc>
  <doc>
   <float name="score">2.6291897</float>
   <str name="itemName">Case Protetora p/ IPad de Silicone Duo - Browm - Iskin</str>
  </doc>
  <doc>
   <float name="score">2.6291897</float>
   <str name="itemName">Case Protetora p/ IPad de Silicone Duo - Clear - Iskin</str>
  </doc>
  <doc>
   <float name="score">2.6291897</float>
   <str name="itemName">Case p/ iPad Sleeve - Black - Built NY</str>
  </doc>
  <doc>
   <float name="score">2.6291897</float>
   <str name="itemName">Bolsa de Proteção p/ iPad Preta - Geonav</str>
  </doc>
 </result>
 <lst name="debug">
  <str name="rawquerystring">itemName:ipad</str>
  <str name="querystring">itemName:ipad</str>
  <str name="parsedquery">itemName:ipad</str>
  <str name="parsedquery_toString">itemName:ipad</str>
  <lst name="explain">
   <str name="7369507">
 3.6808658 = (MATCH) fieldWeight(itemName:ipad in 102507), product of:
   1.0 = tf(termFreq(itemName:ipad)=1)
   8.413407 = idf(docFreq=165, maxDocs=275239)
   0.4375 = fieldNorm(field=itemName, doc=102507)
   </str>
   <str name="739">
 3.1550279 = (MATCH) fieldWeight(itemName:ipad in 226401), product of:
   1.0 = tf(termFreq(itemName:ipad)=1)
   8.413407 = idf(docFreq=165, maxDocs=275239)
   0.375 = fieldNorm(field=itemName, doc=226401)
   </str>
   <str name="7356941">
 3.1550279 = (MATCH) fieldWeight(itemName:ipad in 226409), product of:
   1.0 = tf(termFreq(itemName:ipad)=1)
   8.413407 = idf(docFreq=165, maxDocs=275239)
   0.375 = fieldNorm(field=itemName, doc=226409)
   </str>
   <str name="7356931">
 3.1550279 = (MATCH) fieldWeight(itemName:ipad in 226447), product of:
   1.0 = tf(termFreq(itemName:ipad)=1)
   8.413407 = idf(docFreq=165, maxDocs=275239)
   0.375 = fieldNorm(field=itemName, doc=226447)
   </str>
   <str name="7360321">
 3.1550279 = (MATCH) fieldWeight(itemName:ipad in 226583), product of:
   1.0 = tf(termFreq(itemName:ipad)=1)
   8.413407 = idf(docFreq=165, maxDocs=275239)
   0.375 = fieldNorm(field=itemName, doc=226583)
   </str>
   <str name="7428354">
 2.6291897 = (MATCH) fieldWeight(itemName:ipad in 223178), product of:
   1.0 = tf(termFreq(itemName:ipad)=1)
   8.413407 = idf(docFreq=165, maxDocs=275239)
   0.3125 = fieldNorm(field=itemName, doc=223178)
   </str>
   <str name="7366074">
 2.6291897 = (MATCH) fieldWeight(itemName:ipad in 223196), product of:
   1.0 = tf(termFreq(itemName:ipad)=1)
   8.413407 = idf(docFreq=165, maxDocs=275239)
   0.3125 = fieldNorm(field=itemName, doc=223196)
   </str>
   <str name="7366068">
 2.6291897 = (MATCH) fieldWeight(itemName:ipad in 223831), product of:
   1.0 = tf(termFreq(itemName:ipad)=1)
   8.413407 = idf(docFreq=165, maxDocs=275239)
   0.3125 = fieldNorm(field=itemName, doc=223831)
   </str>
   <str name="7428358">
 2.6291897 = (MATCH) fieldWeight(itemName:ipad in 223856), product of:
   1.0 = tf(termFreq(itemName:ipad)=1)
   8.413407 = idf(docFreq=165, maxDocs=275239)
   0.3125 = fieldNorm(field=itemName, doc=223856)
   </str>
   <str name="7422680">
 2.6291897 = (MATCH) fieldWeight(itemName:ipad in 223908), product of:
   1.0 = tf(termFreq(itemName:ipad)=1)
   8.413407 = idf(docFreq=165, maxDocs=275239)
   0.3125 =