Re: Ignore a field in the scoring

2015-01-08 Thread Roger de Cordova Farias
Thank you very much


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAJp2533-8TBoyPmfpqj12T_TVb4z%2BrgLKqtuOxRfReajti7WfA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Ignore a field in the scoring

2015-01-07 Thread Masaru Hasegawa
Hi,

I believe it's intended behavior, according to the TFIDFSimilarity documentation:
https://lucene.apache.org/core/4_10_2/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
It says:
--
Note that CollectionStatistics.maxDoc() is used instead of
IndexReader#numDocs() because also TermStatistics.docFreq() is used, and
when the latter is inaccurate, so is CollectionStatistics.maxDoc(), and in
the same direction. In addition, CollectionStatistics.maxDoc() is more
efficient to compute
--

Masaru

On Thu, Jan 8, 2015 at 12:01 AM, Roger de Cordova Farias 
roger.far...@fontec.inf.br wrote:

 Thank you for your explanation.

 Do you know if it is a bug or intended behavior?

 I don't think documents that are marked as deleted should be counted at all.



Re: Ignore a field in the scoring

2015-01-06 Thread Masaru Hasegawa
Hi,

An update is a delete followed by an add: instead of modifying the existing
document in place, the old document is deleted and added again as a new document.
The deleted document is only marked as deleted; it isn't actually removed
from the index until its segment is merged.

IDF doesn't exclude those deleted-but-not-yet-removed documents (it still
counts them). That's why you see a different IDF score: both maxDocs and
docFreq have been incremented.

Regarding 424 vs. 0: the document had ID 424 (Lucene's internal ID). When
the document was updated (delete + add), it got the new ID 0 in a new segment.

So I don't think it's possible to keep the score stable when you update documents.
You could run optimize with max_num_segments=1 every time you update documents,
but that's not practical (and until the optimize finishes, you would still see
a different score).
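
The delete-then-add behaviour described above can be sketched with a toy
in-memory index. This is only an illustration: ToyIndex and its methods are
hypothetical names, not Lucene or Elasticsearch APIs, and real Lucene assigns
document IDs per segment (which is why the updated document showed up as ID 0
in a new segment rather than with the next global number).

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Hypothetical stand-in for an index; each slot is [terms, deleted_flag]."""
    docs: list = field(default_factory=list)

    def add(self, terms):
        self.docs.append([set(terms), False])
        return len(self.docs) - 1  # toy internal document ID

    def update(self, doc_id, terms):
        self.docs[doc_id][1] = True  # old copy is only marked as deleted
        return self.add(terms)       # new copy is appended with a new ID

    def max_doc(self):
        return len(self.docs)  # counts deleted-but-not-merged documents too

    def num_docs(self):
        return sum(1 for _, deleted in self.docs if not deleted)

    def doc_freq(self, term):
        # Like the statistic discussed above, this ignores the deleted flag.
        return sum(1 for terms, _ in self.docs if term in terms)

    def merge(self):
        # A merge physically drops the documents that were marked as deleted.
        self.docs = [doc for doc in self.docs if not doc[1]]


index = ToyIndex()
old_id = index.add(["roger"])
index.add(["someone", "else"])
stats_before = (index.max_doc(), index.doc_freq("roger"))

new_id = index.update(old_id, ["roger"])
stats_after = (index.max_doc(), index.doc_freq("roger"))
live_after = index.num_docs()

index.merge()
stats_merged = (index.max_doc(), index.doc_freq("roger"))

print(stats_before, stats_after, stats_merged)  # (2, 1) (3, 2) (2, 1)
```

Both counters grow by one after the update even though the number of live
documents is unchanged, and they only return to their old values after the
merge.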


Masaru





Re: Ignore a field in the scoring

2015-01-05 Thread Roger de Cordova Farias
I have now run the query with explain = true. The results are as follows:


*Explain before the update:*


"details": [
  {
    "value": 5.752348,
    "description": "fieldWeight in 424, product of:",
    "details": [
      {
        "value": 1,
        "description": "tf(freq=1.0), with freq of:",
        "details": [
          {
            "value": 1,
            "description": "termFreq=1.0"
          }
        ]
      },
      {
        "value": 9.203756,
        "description": "idf(docFreq=201, maxDocs=738240)"
      },
      {
        "value": 0.625,
        "description": "fieldNorm(doc=424)"
      }
    ]
  }
]



*Update script (scriptLang = groovy, profileId = 1):*

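if (ctx._source.bookmarked_by == null) {
    // The field does not exist yet: create it with this profile.
    ctx._source.bookmarked_by = [profileId]
} else if (ctx._source.bookmarked_by.contains(profileId)) {
    // Already bookmarked by this profile: skip the write.
    ctx.op = "none"
} else {
    ctx._source.bookmarked_by += profileId
}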



*Explain after the update:*

"details": [
  {
    "value": 5.749262,
    "description": "fieldWeight in 0, product of:",
    "details": [
      {
        "value": 1,
        "description": "tf(freq=1.0), with freq of:",
        "details": [
          {
            "value": 1,
            "description": "termFreq=1.0"
          }
        ]
      },
      {
        "value": 9.198819,
        "description": "idf(docFreq=202, maxDocs=738241)"
      },
      {
        "value": 0.625,
        "description": "fieldNorm(doc=0)"
      }
    ]
  }
]



*Query used with the explain:*

{
  "query": {
    "query_string": {
      "fields": [
        "name"
      ],
      "query": "roger"
    }
  }
}





The inverse document frequency (idf) changed after I added a new field that
is not used in the query. The update also changed "fieldWeight in 424" and
"fieldNorm(doc=424)" to "fieldWeight in 0" and "fieldNorm(doc=0)" (I don't
know whether that matters).

Can someone help me keep the document's score unchanged after running the
update? Note that the update creates a new field if it was not found
(== null), but this field is not used in the query.
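
The values in the two explain outputs can be reproduced by hand. The sketch
below assumes the index uses Lucene 4.x DefaultSimilarity, where
tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names
are made up for this example. Lucene computes in single-precision floats, so
the last digits may differ slightly.

```python
import math


def classic_idf(doc_freq, max_docs):
    # Assumed formula: Lucene 4.x DefaultSimilarity.idf()
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight is the product of tf, idf and fieldNorm, as in the explain tree.
    tf = math.sqrt(freq)
    return tf * classic_idf(doc_freq, max_docs) * field_norm


# Numbers taken from the explain output before and after the update.
before = field_weight(1.0, doc_freq=201, max_docs=738240, field_norm=0.625)
after = field_weight(1.0, doc_freq=202, max_docs=738241, field_norm=0.625)

print(round(before, 4), round(after, 4))
```

Only docFreq and maxDocs differ between the two calls, which is enough to
account for the lower fieldWeight after the update.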


Re: Ignore a field in the scoring

2015-01-05 Thread Roger de Cordova Farias
The added field is an array of integers, but we are not using it in the
query at all.

We are not querying the _all field; it is disabled in our type mapping.

Our query is something like this:

{
  "query": {
    "query_string": {
      "fields": [
        "name"
      ],
      "query": "roger"
    }
  }
}


I ran this query. On the first result, I added a new field called
bookmarked_by with a numeric value. Then I ran the same query again. The
document to which I added the new field is no longer the first result.



Re: Ignore a field in the scoring

2014-12-26 Thread Ivan Brusic
Use the field in a filter rather than as part of the query. Is this field free
text?
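
As a rough sketch of that suggestion, the request body below keeps the scored
query_string on "name" and moves the bookmark field into a non-scoring term
filter. It assumes the Elasticsearch 1.x "filtered" query, which was current
when this thread was written (later versions use the filter clause of a bool
query instead). The helper name bookmarked_search is hypothetical; the field
names come from the thread.

```python
import json


def bookmarked_search(text, profile_id):
    """Build a request body that scores on "name" and filters on bookmarked_by."""
    return {
        "query": {
            "filtered": {
                # Scored part: the same query_string used elsewhere in the thread.
                "query": {
                    "query_string": {"fields": ["name"], "query": text}
                },
                # Non-scored part: restricts the hits without affecting _score.
                "filter": {"term": {"bookmarked_by": profile_id}},
            }
        }
    }


body = bookmarked_search("roger", 1)
print(json.dumps(body, indent=2))
```

A filter only decides which documents match. It does not change the idf
statistics of the "name" field, so it would not by itself prevent the score
shift caused by updating a document.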

Ivan


Re: Ignore a field in the scoring

2014-12-26 Thread Doug Turnbull
Are you querying the _all field? How are you doing your searches?
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-all-field.html

The _all field receives a copy of every field you index, so adding data
here could impact scores regardless of the source field.

Otherwise, fields are scored independently before being put together by
other queries like boolean queries or dismax. Are you using
boolean/dismax/etc over multiple fields?

-Doug



-- 
Doug Turnbull
Search & Big Data Architect
OpenSource Connections http://o19s.com



Ignore a field in the scoring

2014-12-23 Thread Roger de Cordova Farias
Hello,

Our documents have metadata indexed with them, but we don't want the
metadata to interfere with the scoring.

After a user searches for documents, they can bookmark them (which means we
add more metadata to the document). In the next search with the same
query, the bookmarked document appears in a lower (worse) position.

Is there a way to completely ignore one or more specific fields in the
scoring of every query, for example at indexing time?

Note that we are not using the metadata field in the query, yet it
lowers the score for every query.

We cannot set the "index" attribute of this field to "no" because we are
going to use it in other queries.
