Re: Ignore a field in the scoring
Thank you very much

2015-01-08 4:35 GMT-02:00 Masaru Hasegawa haniomas...@gmail.com:

> Hi,
>
> I believe it's intended according to https://lucene.apache.org/core/4_10_2/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html . It says:
>
> -- Note that CollectionStatistics.maxDoc() is used instead of IndexReader#numDocs() because also TermStatistics.docFreq() is used, and when the latter is inaccurate, so is CollectionStatistics.maxDoc(), and in the same direction. In addition, CollectionStatistics.maxDoc() is more efficient to compute --
>
> Masaru

-- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAJp2533-8TBoyPmfpqj12T_TVb4z%2BrgLKqtuOxRfReajti7WfA%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
Re: Ignore a field in the scoring
Hi,

I believe it's intended, according to https://lucene.apache.org/core/4_10_2/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html . It says:

-- Note that CollectionStatistics.maxDoc() is used instead of IndexReader#numDocs() because also TermStatistics.docFreq() is used, and when the latter is inaccurate, so is CollectionStatistics.maxDoc(), and in the same direction. In addition, CollectionStatistics.maxDoc() is more efficient to compute --

Masaru

On Thu, Jan 8, 2015 at 12:01 AM, Roger de Cordova Farias roger.far...@fontec.inf.br wrote:

> Thank you for your explanation.
>
> Do you know if it is a bug or intended behavior? I don't think deleted (marked as deleted) docs should be used at all.
>
> 2015-01-07 1:53 GMT-02:00 Masaru Hasegawa haniomas...@gmail.com:
>
>> Hi,
>>
>> Update is delete and add. I mean, instead of updating the existing document, it deletes it and adds it as a new document. Those deleted documents are just marked as deleted and aren't actually removed from the index until the segment merge.
>>
>> IDF doesn't exclude those deleted-but-not-removed documents (it still counts them). That's the reason you see a different IDF score (you see both maxDocs and docFreq are incremented).
>>
>> Regarding 424 vs. 0, the document had ID 424 (Lucene's internal ID). But when the document was updated (delete + add), it got the new ID 0 in a new segment.
>>
>> So, I think it's not possible to keep the score when you update documents. You can run optimize with max_num_segments=1 every time you update documents, but it's not practical (and until the optimize is done, you see a different score).
>>
>> Masaru
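The "inaccurate in the same direction" remark from the Lucene documentation can be illustrated with a small calculation. This is only a sketch: the document counts are made up, and the `idf` helper assumes the classic TF-IDF form `1 + ln(docCount / (docFreq + 1))`, which is consistent with the explain values posted in this thread.

```python
import math


def idf(doc_freq, doc_count):
    # Assumed classic TF-IDF form: 1 + ln(docCount / (docFreq + 1)).
    return 1.0 + math.log(doc_count / (doc_freq + 1))


# Hypothetical index: 1000 live documents, 10 of which contain the term.
live_docs, live_doc_freq = 1000, 10
# Assume every document was updated once, so each one also has a
# deleted-but-not-yet-merged copy that is still counted by the statistics.
deleted_docs, deleted_doc_freq = 1000, 10

max_doc = live_docs + deleted_docs            # counts deleted copies
num_docs = live_docs                          # live documents only
doc_freq = live_doc_freq + deleted_doc_freq   # also counts deleted copies

true_idf = idf(live_doc_freq, live_docs)      # what a fully merged index gives
consistent_idf = idf(doc_freq, max_doc)       # both statistics include deletes
mixed_idf = idf(doc_freq, num_docs)           # docFreq inflated, doc count not

print(f"merged index:       {true_idf:.4f}")
print(f"docFreq + maxDoc:   {consistent_idf:.4f}")
print(f"docFreq + numDocs:  {mixed_idf:.4f}")
```

Because docFreq and maxDoc are inflated together, their ratio stays close to the ratio in a merged index, whereas pairing the inflated docFreq with numDocs moves the value further away.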
Re: Ignore a field in the scoring
Hi,

Update is delete and add. I mean, instead of updating the existing document, it deletes it and adds it as a new document. Those deleted documents are just marked as deleted and aren't actually removed from the index until the segment merge.

IDF doesn't exclude those deleted-but-not-removed documents (it still counts them). That's the reason you see a different IDF score (you see both maxDocs and docFreq are incremented).

Regarding 424 vs. 0, the document had ID 424 (Lucene's internal ID). But when the document was updated (delete + add), it got the new ID 0 in a new segment.

So, I think it's not possible to keep the score when you update documents. You can run optimize with max_num_segments=1 every time you update documents, but it's not practical (and until the optimize is done, you see a different score).

Masaru
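The delete-and-add behaviour described above can be mimicked with a toy model. `ToyIndex` is a hypothetical stand-in written for this illustration, not a Lucene or Elasticsearch API, and the `idf` method assumes the classic `1 + ln(maxDocs / (docFreq + 1))` form.

```python
import math


class ToyIndex:
    """Hypothetical in-memory model of an index with soft deletes."""

    def __init__(self):
        self.docs = []  # each entry: {"id", "terms", "deleted"}

    def add(self, doc_id, terms):
        self.docs.append({"id": doc_id, "terms": set(terms), "deleted": False})

    def update(self, doc_id, terms):
        # An update marks the old copy as deleted and appends a new copy.
        for doc in self.docs:
            if doc["id"] == doc_id and not doc["deleted"]:
                doc["deleted"] = True
        self.add(doc_id, terms)

    def merge(self):
        # A merge physically drops the copies that were marked as deleted.
        self.docs = [doc for doc in self.docs if not doc["deleted"]]

    def max_docs(self):
        # Counts every stored copy, including deleted-but-not-merged ones.
        return len(self.docs)

    def doc_freq(self, term):
        # Also counts deleted-but-not-merged copies.
        return sum(1 for doc in self.docs if term in doc["terms"])

    def idf(self, term):
        return 1.0 + math.log(self.max_docs() / (self.doc_freq(term) + 1))


index = ToyIndex()
index.add("a", ["roger"])
index.add("b", ["alice"])
index.add("c", ["bob"])
index.add("d", ["carol"])

idf_before = index.idf("roger")

# Re-index document "a" with the same terms, as the bookmark update does.
index.update("a", ["roger"])
stats_after_update = (index.max_docs(), index.doc_freq("roger"))
idf_after_update = index.idf("roger")

index.merge()
idf_after_merge = index.idf("roger")

print(idf_before, idf_after_update, idf_after_merge)
```

In this model both counters go up by one after the update, so the IDF drops even though the searched field did not change, and it returns to its earlier value once the deleted copy is merged away.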
Re: Ignore a field in the scoring
Now I ran the query with explain = true. The results are the following:

*Explain before the update:*

    "details": [
      {
        "value": 5.752348,
        "description": "fieldWeight in 424, product of:",
        "details": [
          {
            "value": 1,
            "description": "tf(freq=1.0), with freq of:",
            "details": [ { "value": 1, "description": "termFreq=1.0" } ]
          },
          { "value": 9.203756, "description": "idf(docFreq=201, maxDocs=738240)" },
          { "value": 0.625, "description": "fieldNorm(doc=424)" }
        ]
      }
    ]

*Update script (scriptLang = groovy, profileId = 1):*

    if (ctx._source.bookmarked_by == null) {
        ctx._source.bookmarked_by = [profileId]
    } else if (ctx._source.bookmarked_by.contains(profileId)) {
        ctx.op = "none"
    } else {
        ctx._source.bookmarked_by += profileId
    }

*Explain after the update:*

    "details": [
      {
        "value": 5.749262,
        "description": "fieldWeight in 0, product of:",
        "details": [
          {
            "value": 1,
            "description": "tf(freq=1.0), with freq of:",
            "details": [ { "value": 1, "description": "termFreq=1.0" } ]
          },
          { "value": 9.198819, "description": "idf(docFreq=202, maxDocs=738241)" },
          { "value": 0.625, "description": "fieldNorm(doc=0)" }
        ]
      }
    ]

*Query used with the explain:*

    { "query": { "query_string": { "fields": [ "name" ], "query": "roger" } } }

The inverse document frequency (idf) changed after adding a new field that is not used in the query. Also, "fieldWeight in 424" and "fieldNorm(doc=424)" changed to "fieldWeight in 0" and "fieldNorm(doc=0)" (I don't know if that changes anything).

Can someone help me keep the score of the document unchanged after running the update?

Note that the update creates a new field if it was not found (== null), but this field is not used in the query.

2015-01-05 13:35 GMT-02:00 Roger de Cordova Farias roger.far...@fontec.inf.br:

> The added field is an array of integers, but we are not using it in the query at all.
>
> We are not querying the _all field; it is disabled in our type mapping.
>
> Our query is something like this:
>
>     { "query": { "query_string": { "fields": [ "name" ], "query": "roger" } } }
>
> I ran this query. In the first result, I added a new field called bookmarked_by with a numeric value. Then I ran the same query again. The document to which I added the new field is no longer the first result.
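The two explain outputs in this message can be checked by hand. The sketch below assumes the classic TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), fieldWeight = tf * idf * fieldNorm); the function names are illustrative, and the input numbers are the ones reported by explain.

```python
import math


def tf(freq):
    # Assumed classic term-frequency factor: sqrt(freq).
    return math.sqrt(freq)


def idf(doc_freq, max_docs):
    # Assumed classic inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight is the product of the three factors listed by explain.
    return tf(freq) * idf(doc_freq, max_docs) * field_norm


# Values taken from the explain output before and after the update.
idf_before = idf(201, 738240)
idf_after = idf(202, 738241)
weight_before = field_weight(1.0, 201, 738240, 0.625)
weight_after = field_weight(1.0, 202, 738241, 0.625)

print(f"before: idf={idf_before:.6f} fieldWeight={weight_before:.6f}")
print(f"after:  idf={idf_after:.6f} fieldWeight={weight_after:.6f}")
```

Under these assumptions the computed values agree with the reported 9.203756 / 5.752348 and 9.198819 / 5.749262 up to single-precision rounding, which suggests the whole score difference comes from docFreq and maxDocs each increasing by one; tf and fieldNorm are identical in both outputs.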
Re: Ignore a field in the scoring
The added field is an array of integers, but we are not using it in the query at all.

We are not querying the _all field; it is disabled in our type mapping.

Our query is something like this:

    { "query": { "query_string": { "fields": [ "name" ], "query": "roger" } } }

I ran this query. In the first result, I added a new field called bookmarked_by with a numeric value. Then I ran the same query again. The document to which I added the new field is no longer the first result.

2014-12-26 17:34 GMT-02:00 Doug Turnbull dturnb...@opensourceconnections.com:

> Are you querying the _all field? How are you doing your searches?
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-all-field.html
>
> The _all field receives a copy of every field you index, so adding data here could impact scores regardless of the source field.
>
> Otherwise, fields are scored independently before being put together by other queries like boolean queries or dismax. Are you using boolean/dismax/etc. over multiple fields?
>
> -Doug
Re: Ignore a field in the scoring
Use the field in a filter and not as part of the query. Is this field free text?

Ivan

On Dec 23, 2014 9:12 PM, Roger de Cordova Farias roger.far...@fontec.inf.br wrote:

> Hello
>
> Our documents have metadata indexed with them, but we don't want the metadata to interfere with the scoring.
>
> After a user searches for documents, they can bookmark them (which means we add more metadata to the document); then, in the next search with the same query, the bookmarked document appears in a lower (worse) position.
>
> Is there a way to completely ignore one or more specific fields in the scoring of every query, such as at indexing time?
>
> Note that we are not using the metadata field in the query, and yet it lowers the score in every query.
>
> We cannot set the index attribute of this field to no because we are going to use it in other queries.
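Ivan's suggestion could look roughly like the request body below. It is a sketch that assumes the `filtered` query of the Elasticsearch 1.x query DSL and reuses the `name` and `bookmarked_by` fields and the profile id 1 mentioned elsewhere in the thread; the helper name is made up for the example.

```python
import json


def bookmarked_search_body(text, profile_id):
    """Build a search body where the metadata field is only used as a filter."""
    return {
        "query": {
            "filtered": {
                # Scored part: only the name field contributes to the score.
                "query": {
                    "query_string": {"fields": ["name"], "query": text}
                },
                # Unscored part: restricts the hits without affecting scores.
                "filter": {"term": {"bookmarked_by": profile_id}},
            }
        }
    }


body = bookmarked_search_body("roger", 1)
print(json.dumps(body, indent=2))
```

The filter clause only decides which documents match; the relevance score still comes entirely from the `query_string` part.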
Re: Ignore a field in the scoring
Are you querying the _all field? How are you doing your searches?

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-all-field.html

The _all field receives a copy of every field you index, so adding data here could impact scores regardless of the source field.

Otherwise, fields are scored independently before being put together by other queries like boolean queries or dismax. Are you using boolean/dismax/etc. over multiple fields?

-Doug

On Fri, Dec 26, 2014 at 11:59 AM, Ivan Brusic i...@brusic.com wrote:

> Use the field in a filter and not as part of the query. Is this field free text?
>
> Ivan

-- Doug Turnbull Search Big Data Architect OpenSource Connections http://o19s.com
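For reference, a mapping that keeps the _all field out of the picture might be built like this. It is a sketch assuming Elasticsearch 1.x mapping syntax (`_all.enabled`, `include_in_all`, the `string` type); the type name `document` is hypothetical, while `name` and `bookmarked_by` are the fields discussed in this thread.

```python
import json

# Hypothetical type name; the field names come from the thread.
mapping = {
    "mappings": {
        "document": {
            # Disable the catch-all field so no other field is copied into it.
            "_all": {"enabled": False},
            "properties": {
                "name": {"type": "string"},
                # Alternative when _all stays enabled: exclude only this field.
                "bookmarked_by": {"type": "integer", "include_in_all": False},
            },
        }
    }
}

print(json.dumps(mapping, indent=2))
```

With `_all` disabled, a query restricted to `name` is scored from the statistics of the `name` field alone.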
Ignore a field in the scoring
Hello

Our documents have metadata indexed with them, but we don't want the metadata to interfere with the scoring.

After a user searches for documents, they can bookmark them (which means we add more metadata to the document); then, in the next search with the same query, the bookmarked document appears in a lower (worse) position.

Is there a way to completely ignore one or more specific fields in the scoring of every query, such as at indexing time?

Note that we are not using the metadata field in the query, and yet it lowers the score in every query.

We cannot set the index attribute of this field to no because we are going to use it in other queries.