[jira] [Commented] (LUCENE-2091) Add BM25 Scoring to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120042#comment-13120042 ]

Erick Erickson commented on LUCENE-2091:

Should this be closed as a duplicate of LUCENE-2959?

> Add BM25 Scoring to Lucene
> --------------------------
>
>                 Key: LUCENE-2091
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2091
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/other
>            Reporter: Yuval Feinstein
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: BM25SimilarityProvider.java, LUCENE-2091.patch, persianlucene.jpg
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> http://nlp.uned.es/~jperezi/Lucene-BM25/ describes an implementation of Okapi-BM25 scoring in the Lucene framework,
> as an alternative to the standard Lucene scoring (which is a version of mixed boolean/TFIDF).
> I have refactored this a bit, added unit tests and improved the runtime somewhat.
> I would like to contribute the code to Lucene under contrib.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
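The issue description refers to Okapi-BM25 scoring without spelling out the formula. As a rough, self-contained sketch (plain Java, not the attached BM25SimilarityProvider.java and not a Lucene API), the per-term BM25 weight is idf(t) * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl)), summed over the query terms present in a document. The defaults k1 = 1.2 and b = 0.75 and the "+1 inside the log" IDF variant are assumptions chosen for illustration; BM25 variants differ in exactly these details.

```java
import java.util.List;
import java.util.Map;

/**
 * Sketch of Okapi BM25 scoring in plain Java (illustrative only: not the
 * attached BM25SimilarityProvider and not a Lucene API).
 * k1 = 1.2 and b = 0.75 are assumed, commonly used defaults.
 */
final class Bm25Sketch {
    static final double K1 = 1.2; // term-frequency saturation
    static final double B = 0.75; // strength of document-length normalization

    /** IDF with "+1" inside the log (one common variant), so it stays positive when docFreq <= numDocs. */
    static double idf(long docFreq, long numDocs) {
        return Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** BM25 weight of one term occurring tf times in a document of length docLen. */
    static double termScore(int tf, long docFreq, long numDocs, int docLen, double avgDocLen) {
        double norm = K1 * (1.0 - B + B * docLen / avgDocLen);
        return idf(docFreq, numDocs) * (tf * (K1 + 1.0)) / (tf + norm);
    }

    /** Document score: sum of the weights of the query terms present in the document. */
    static double score(List<String> queryTerms, Map<String, Integer> termFreqs,
                        Map<String, Long> docFreqs, long numDocs, int docLen, double avgDocLen) {
        double total = 0.0;
        for (String term : queryTerms) {
            int tf = termFreqs.getOrDefault(term, 0);
            if (tf == 0) {
                continue; // absent terms contribute nothing
            }
            total += termScore(tf, docFreqs.getOrDefault(term, 0L), numDocs, docLen, avgDocLen);
        }
        return total;
    }
}
```

With tf = 1 in a document of average length, the tf factor works out to 1, so the term contributes just its IDF; larger tf saturates toward idf * (k1 + 1), and documents longer than average are penalized.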
[jira] [Commented] (LUCENE-2091) Add BM25 Scoring to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050508#comment-13050508 ]

ian towey commented on LUCENE-2091:

I'm not sure I am using this BM25BooleanQuery correctly: I get a different number of hits than with QueryParser. Are there limitations on the query strings that BM25BooleanQuery can handle? For example, for "gas OR ((oil AND car) NOT ship)", BM25BooleanQuery seems to return all docs that don't contain the term "ship" (comparing BM25BooleanQuery vs. QueryParser).
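When hit counts differ like this, it can help to compute the expected hit set independently. The sketch below is a hypothetical helper (not part of Lucene or the BM25 patch) that evaluates "gas OR ((oil AND car) NOT ship)" over a toy in-memory index, assuming the usual boolean reading in which "X NOT Y" means "documents in X that are not in Y".

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical reference evaluator (not Lucene, not the BM25 patch): computes the
 * hit set of "gas OR ((oil AND car) NOT ship)" with plain set operations over a
 * toy inverted index (term -> ids of documents containing it).
 */
final class BooleanReference {
    static Set<Integer> postings(Map<String, Set<Integer>> index, String term) {
        return new TreeSet<>(index.getOrDefault(term, Set.of()));
    }

    static Set<Integer> and(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new TreeSet<>(a);
        r.retainAll(b); // intersection
        return r;
    }

    static Set<Integer> or(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new TreeSet<>(a);
        r.addAll(b); // union
        return r;
    }

    /** "a NOT b": documents in a that are not in b (never documents outside a). */
    static Set<Integer> andNot(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new TreeSet<>(a);
        r.removeAll(b); // set difference
        return r;
    }

    static Set<Integer> evaluate(Map<String, Set<Integer>> index) {
        Set<Integer> oilAndCar = and(postings(index, "oil"), postings(index, "car"));
        return or(postings(index, "gas"), andNot(oilAndCar, postings(index, "ship")));
    }
}
```

On a toy index where doc 1 contains gas, docs 2 and 3 contain oil and car, doc 3 also contains ship, doc 4 contains only ship and doc 5 contains only oil, this returns docs 1 and 2. A result equal to "every doc without ship" would also include doc 5, which matches neither branch of the query.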
[jira] [Commented] (LUCENE-2091) Add BM25 Scoring to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034815#comment-13034815 ]

Shrinath commented on LUCENE-2091:

Hi, please don't be harsh if this is the wrong place to ask, but could someone tell me whether the linked patch is better than http://nlp.uned.es/~jperezi/Lucene-BM25/ ?
[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005688#comment-13005688 ]

Robert Muir commented on LUCENE-2091:

{quote}
your attachment (BM25SimilarityProvider) seems to rely on some other code (Stats.DocFieldStats) & AggregatesProvider .. which I guess is part of your DFR patch.. can you provide a pointer to that.
{quote}

Yeah, this is from LUCENE-2392. Unfortunately it won't work with the most recent patch there, but both patches are really just explorations to see how we can divide the work into subtasks.

For an update: the JIRA issues aren't well linked, but we have actually made pretty good progress on some major portions (IMO these are the most interesting):
* Collection term stats: LUCENE-2862
* Per-field similarity: LUCENE-2236
* TermState, to avoid redundant I/O for stats: LUCENE-2694
* Norms cleanup: LUCENE-2771, LUCENE-2846

The next big step is to separate scoring from matching (see the latest patch on LUCENE-2392), so that the Similarity has full responsibility for all calculations and we get full integration with all queries, etc.

This isn't that complicated; however, to do it we first need to refactor Explanations so that a Similarity has the capability (and responsibility!) to fully explain its calculations. So I think this is the next issue to resolve before going any further.
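The idea that a Similarity should both compute and fully explain its scores can be illustrated with a small hypothetical contract. TermSimilarity, Expl and Bm25TermSimilarity below are invented names for this sketch, not the API of LUCENE-2392 or of any Lucene release; the statistics passed in (docFreq, numDocs, average document length) correspond roughly to the collection term stats mentioned in the comment, and the BM25 formula with its idf variant is an assumed, illustrative choice.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical explanation node: a value, a description and its sub-details. */
final class Expl {
    final double value;
    final String description;
    final List<Expl> details = new ArrayList<>();

    Expl(double value, String description) {
        this.value = value;
        this.description = description;
    }

    Expl add(Expl detail) {
        details.add(detail);
        return this;
    }
}

/** Hypothetical contract: the similarity alone computes and explains a term's score. */
interface TermSimilarity {
    double score(int tf, long docFreq, long numDocs, int docLen, double avgDocLen);

    Expl explain(int tf, long docFreq, long numDocs, int docLen, double avgDocLen);
}

/** BM25 plugged into the hypothetical contract; score() and explain() share one code path. */
final class Bm25TermSimilarity implements TermSimilarity {
    private final double k1;
    private final double b;

    Bm25TermSimilarity(double k1, double b) {
        this.k1 = k1;
        this.b = b;
    }

    @Override
    public double score(int tf, long docFreq, long numDocs, int docLen, double avgDocLen) {
        // Reusing explain() guarantees the explanation always matches the score.
        return explain(tf, docFreq, numDocs, docLen, avgDocLen).value;
    }

    @Override
    public Expl explain(int tf, long docFreq, long numDocs, int docLen, double avgDocLen) {
        double idf = Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
        double norm = k1 * (1.0 - b + b * docLen / avgDocLen);
        double tfPart = tf * (k1 + 1.0) / (tf + norm);
        return new Expl(idf * tfPart, "bm25, product of:")
                .add(new Expl(idf, "idf(docFreq=" + docFreq + ", numDocs=" + numDocs + ")"))
                .add(new Expl(tfPart, "tfNorm(tf=" + tf + ", docLen=" + docLen
                        + ", avgDocLen=" + avgDocLen + ")"));
    }
}
```

Routing score() through explain() is the simplest way to keep the two consistent, at the cost of building an explanation object on every call; a performance-sensitive implementation would probably compute the score directly and build explanations only on request.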
[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005428#comment-13005428 ]

Ian Holsman commented on LUCENE-2091:

Hi Rob. Your attachment (BM25SimilarityProvider) seems to rely on some other code (Stats.DocFieldStats and AggregatesProvider), which I guess is part of your DFR patch. Can you provide a pointer to that? TIA.

Also, I'm guessing that those rely on LUCENE-2392 and provide an alternate implementation to this one. Should we just close this as a duplicate of LUCENE-2392?
[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870605#action_12870605 ]

Yuval Feinstein commented on LUCENE-2091:

@Vinay - I have a suggestion, though I am not sure whether it will work.
First, I would build the BM25BooleanQuery and use it to create a QueryWrapperFilter qwf (see http://lucene.apache.org/java/3_0_0/api/all/org/apache/lucene/search/QueryWrapperFilter.html).
Next, I would create a PhraseQuery and call search(phraseQuery, qwf, 50).
This way, the scorer will first look for matches for the BM25 query, and then look among them for matches for the phrase query.
Hope this is understandable.
-- Yuval
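The suggestion relies on a filter only restricting which documents may match, while the query alone supplies the scores (as far as I know, that is how a Lucene Filter behaves in search(Query, Filter, int)). Here is a self-contained sketch of those semantics with invented names and no Lucene dependency: phrase-query scores are kept only for documents in the filter's set, which stands in for the BM25BooleanQuery matches, and the top n are returned. Note that in this arrangement the ranking comes from the phrase query's scores, not from BM25.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Sketch of filtered-search semantics with invented names (no Lucene dependency):
 * the filter set only restricts which documents may be returned; the ranking comes
 * entirely from the phrase-query scores.
 */
final class FilteredSearchSketch {
    /**
     * @param phraseScores doc id -> phrase-query score, for docs the phrase matches
     * @param filterDocs   doc ids accepted by the filter (e.g. the BM25 query's matches)
     * @param n            maximum number of hits to return (50 in the suggestion)
     */
    static List<Integer> topN(Map<Integer, Double> phraseScores, Set<Integer> filterDocs, int n) {
        List<Integer> hits = new ArrayList<>();
        for (Integer doc : phraseScores.keySet()) {
            if (filterDocs.contains(doc)) {
                hits.add(doc); // must match both the phrase and the filter
            }
        }
        hits.sort((x, y) -> {
            int byScore = Double.compare(phraseScores.get(y), phraseScores.get(x)); // higher score first
            return byScore != 0 ? byScore : Integer.compare(x, y); // lower doc id first on ties
        });
        return new ArrayList<>(hits.subList(0, Math.min(n, hits.size())));
    }
}
```

For example, if the phrase matches docs 1 to 4 and doc 4 has the highest phrase score but is not among the BM25 matches, doc 4 is never returned, however well it scores.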
[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869064#action_12869064 ]

Vinay Setty commented on LUCENE-2091:

@Joaquin, I want to use BM25 scoring to evaluate phrase queries. I have created a positional index in Lucene, but I have no clue how to use it to evaluate phrase queries with the BM25 scorer. I had a quick look at the code: by default the queries are boolean, and I could not find an easy way to make them phrase queries. Any ideas?
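One possible approach to this question, sketched under stated assumptions: with a positional index mapping each term to per-document sorted position arrays (an assumed layout, not Lucene's), count exact phrase occurrences and feed that count into the BM25 formula as if the phrase were a single term. Treating phrase frequency as tf is a heuristic chosen for illustration, and the phrase's document frequency (phraseDocFreq) is assumed to be computed separately, for example by counting documents with a nonzero phrase frequency.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/**
 * Sketch, assuming a positional index term -> (doc id -> sorted positions).
 * Counts exact phrase occurrences in one document and scores them with BM25,
 * treating the phrase as a single pseudo-term. Illustrative heuristic only,
 * not the BM25BooleanQuery code from this issue.
 */
final class PhraseBm25Sketch {
    /** Number of start positions p such that terms[i] occurs at position p + i for every i. */
    static int phraseFreq(Map<String, Map<Integer, int[]>> index, List<String> terms, int doc) {
        int[] first = index.getOrDefault(terms.get(0), Map.of()).get(doc);
        if (first == null) {
            return 0; // first term absent from this document
        }
        int count = 0;
        for (int start : first) {
            boolean match = true;
            for (int i = 1; i < terms.size() && match; i++) {
                int[] pos = index.getOrDefault(terms.get(i), Map.of()).get(doc);
                // positions are sorted, so binary search finds an exact offset match
                match = pos != null && Arrays.binarySearch(pos, start + i) >= 0;
            }
            if (match) {
                count++;
            }
        }
        return count;
    }

    /** BM25 with the phrase frequency as tf and the phrase's document frequency as df. */
    static double score(int freq, long phraseDocFreq, long numDocs, int docLen, double avgDocLen,
                        double k1, double b) {
        if (freq == 0) {
            return 0.0;
        }
        double idf = Math.log(1.0 + (numDocs - phraseDocFreq + 0.5) / (phraseDocFreq + 0.5));
        double norm = k1 * (1.0 - b + b * docLen / avgDocLen);
        return idf * freq * (k1 + 1.0) / (freq + norm);
    }
}
```

For "new york" in a document where "new" occurs at positions 0, 5, 9 and "york" at 1, 6, 12, phraseFreq returns 2, and the reversed phrase "york new" returns 0.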