[jira] [Commented] (LUCENE-2091) Add BM25 Scoring to Lucene

2011-10-04 Thread Erick Erickson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120042#comment-13120042
 ] 

Erick Erickson commented on LUCENE-2091:


Should this be closed as a duplicate of LUCENE-2959?

> Add BM25 Scoring to Lucene
> --
>
> Key: LUCENE-2091
> URL: https://issues.apache.org/jira/browse/LUCENE-2091
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/other
>Reporter: Yuval Feinstein
>Priority: Minor
> Fix For: 4.0
>
> Attachments: BM25SimilarityProvider.java, LUCENE-2091.patch, 
> persianlucene.jpg
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> http://nlp.uned.es/~jperezi/Lucene-BM25/ describes an implementation of 
> Okapi-BM25 scoring in the Lucene framework,
> as an alternative to the standard Lucene scoring (which is a version of mixed 
> boolean/TFIDF).
> I have refactored this a bit, added unit tests and improved the runtime 
> somewhat.
> I would like to contribute the code to Lucene under contrib. 
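
For reference, the per-term weight that Okapi BM25 assigns can be sketched as below. This is a minimal illustration of the textbook formula, not the code in the attached patch or on the linked page: the class name, the method names, and the parameter values k1 = 1.2 and b = 0.75 are assumptions chosen for the example, and other idf variants exist.

```java
// Sketch of the classic Okapi BM25 weight of one term in one document.
// All names and default values here are illustrative assumptions.
final class Bm25Sketch {

    /** Robertson/Sparck Jones idf: ln((N - n + 0.5) / (n + 0.5)). */
    static double idf(long docCount, long docFreq) {
        return Math.log((docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /**
     * tf        term frequency in the document
     * docLen    length of the document (in terms)
     * avgDocLen average document length in the collection
     * docCount  number of documents in the collection (N)
     * docFreq   number of documents containing the term (n)
     * k1, b     the usual BM25 tuning parameters
     */
    static double termScore(double tf, double docLen, double avgDocLen,
                            long docCount, long docFreq, double k1, double b) {
        // Length normalisation: 1 when the document has average length.
        double lengthNorm = 1.0 - b + b * (docLen / avgDocLen);
        // Term-frequency saturation: grows with tf but is bounded by k1 + 1.
        double saturation = (tf * (k1 + 1.0)) / (tf + k1 * lengthNorm);
        return idf(docCount, docFreq) * saturation;
    }

    public static void main(String[] args) {
        // Hypothetical statistics: 1000 docs, term occurs in 10 of them.
        System.out.println(termScore(3, 120, 100, 1000, 10, 1.2, 0.75));
    }
}
```

A document score is the sum of termScore over the query terms. Unlike TF-IDF, the contribution of tf saturates, and b controls how strongly long documents are penalised.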

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2091) Add BM25 Scoring to Lucene

2011-06-16 Thread ian towey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050508#comment-13050508
 ] 

ian towey commented on LUCENE-2091:
---

I'm not sure whether I am using BM25BooleanQuery correctly: I get a different 
number of hits than with QueryParser for the same query string. Are there 
limitations on the query strings that BM25BooleanQuery can handle? For example, 
for "gas OR ((oil AND car) NOT ship)", BM25BooleanQuery seems to return all 
docs that don't contain the term "ship", whereas QueryParser does not.
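
To make the expected result concrete, the intended Boolean reading of the example query can be worked out with plain set operations. This is a minimal sketch on a hypothetical five-document collection; it reads "A NOT B" as "A and not B" and says nothing about how BM25BooleanQuery or QueryParser actually parse the string.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Set-based reading of: gas OR ((oil AND car) NOT ship)
final class BooleanQuerySketch {

    static Set<Integer> docs(Integer... ids) {
        return new TreeSet<>(Arrays.asList(ids));
    }

    static Set<Integer> expectedHits(Set<Integer> gas, Set<Integer> oil,
                                     Set<Integer> car, Set<Integer> ship) {
        Set<Integer> inner = new TreeSet<>(oil);
        inner.retainAll(car);   // oil AND car
        inner.removeAll(ship);  // ... NOT ship
        Set<Integer> hits = new TreeSet<>(gas);
        hits.addAll(inner);     // gas OR (...)
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical postings: ids of the documents containing each term.
        Set<Integer> gas = docs(1, 2);
        Set<Integer> oil = docs(3, 4, 5);
        Set<Integer> car = docs(3, 4);
        Set<Integer> ship = docs(2, 4);
        System.out.println(expectedHits(gas, oil, car, ship)); // prints [1, 2, 3]
    }
}
```

Under this reading, document 2 matches even though it contains "ship" (it contains "gas"), and document 5 does not match even though it lacks "ship". A result consisting of every document without "ship" therefore points to a different interpretation of the query string.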





[jira] [Commented] (LUCENE-2091) Add BM25 Scoring to Lucene

2011-05-17 Thread Shrinath (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034815#comment-13034815
 ] 

Shrinath commented on LUCENE-2091:
--

Hi, 

Don't be harsh if I am asking this in the wrong place, 
but could someone tell me whether the linked patch is better than 
http://nlp.uned.es/~jperezi/Lucene-BM25/ ?





[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene

2011-03-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005688#comment-13005688
 ] 

Robert Muir commented on LUCENE-2091:
-

{quote}
Your attachment (BM25SimilarityProvider) seems to rely on some other code 
(Stats.DocFieldStats and AggregatesProvider), which I guess is part of your DFR 
patch. Can you provide a pointer to that?
{quote}

Yeah, this is from LUCENE-2392. Unfortunately it won't work with the most recent 
patch there, but both patches are really just explorations to see how we can 
divide the work into subtasks.

For an update, the JIRA issues aren't well linked but we have actually made 
pretty good progress on some major portions (imo these are the most 
interesting):
* Collection term stats: LUCENE-2862
* per-field similarity: LUCENE-2236
* termstate, to avoid redundant i/o for stats: LUCENE-2694
* norms cleanup: LUCENE-2771, LUCENE-2846

The next big step is to separate scoring from matching (see the latest patch on 
LUCENE-2392) so that similarity has full responsibility for all calculations, 
and so we get full integration with all queries, etc.

This isn't that complicated: however, in order to do this, we need to first 
refactor Explanations, so that a Similarity has the capability (and 
responsibility!) to fully explain its calculations. So I think this is the next 
issue to resolve before going any further.





[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene

2011-03-10 Thread Ian Holsman (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005428#comment-13005428
 ] 

Ian Holsman commented on LUCENE-2091:
-

Hi Rob.
Your attachment (BM25SimilarityProvider) seems to rely on some other code 
(Stats.DocFieldStats and AggregatesProvider), which I guess is part of your DFR 
patch. Can you provide a pointer to that?

TIA
Also, I'm guessing that those rely on LUCENE-2392, which provides an alternate 
implementation of this. Should we just close this as a duplicate of LUCENE-2392?




[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene

2010-05-24 Thread Yuval Feinstein (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870605#action_12870605
 ] 

Yuval Feinstein commented on LUCENE-2091:
-

@Vinay - I have a suggestion, though I am unsure whether it will work. 
First, I would create the BM25BooleanQuery and use it to create a 
QueryWrapperFilter qwf.
(See 
http://lucene.apache.org/java/3_0_0/api/all/org/apache/lucene/search/QueryWrapperFilter.html)
Next, I would create a PhraseQuery and call search(phraseQuery, qwf, 50).
This way, the search will first look for matches for the BM25 query, and then 
look among them for matches for the phrase query.
I hope this is understandable.
-- Yuval 
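
The filter-then-phrase idea can be modelled without Lucene as below. This is a toy sketch, not Lucene code: the class, the method names, and the tiny positional index are hypothetical, and the phrase is limited to two words. The "allowed" set stands in for the documents that pass the QueryWrapperFilter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model: restrict to a filter's doc set, then match a two-word phrase
// using term positions (doc id -> positions of the term in that doc).
final class FilteredPhraseSketch {

    /** True if 'second' occurs at the position right after an occurrence of 'first'. */
    static boolean hasPhrase(List<Integer> first, List<Integer> second) {
        for (int p : first) {
            if (second.contains(p + 1)) {
                return true;
            }
        }
        return false;
    }

    /** Ids of the docs allowed by the filter in which the phrase occurs, ascending. */
    static List<Integer> search(Set<Integer> allowed,
                                Map<Integer, List<Integer>> firstPos,
                                Map<Integer, List<Integer>> secondPos) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : new TreeSet<>(allowed)) {
            List<Integer> a = firstPos.get(doc);
            List<Integer> b = secondPos.get(doc);
            if (a != null && b != null && hasPhrase(a, b)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical positions of "oil" and "tanker" in docs 1..3.
        Map<Integer, List<Integer>> oil = new HashMap<>();
        oil.put(1, Arrays.asList(0, 7));
        oil.put(2, Arrays.asList(4));
        oil.put(3, Arrays.asList(2));
        Map<Integer, List<Integer>> tanker = new HashMap<>();
        tanker.put(1, Arrays.asList(8));
        tanker.put(2, Arrays.asList(9));
        tanker.put(3, Arrays.asList(3));
        // Docs 1 and 2 passed the filter; doc 3 has the phrase but was filtered out.
        Set<Integer> allowed = new TreeSet<>(Arrays.asList(1, 2));
        System.out.println(search(allowed, oil, tanker)); // prints [1]
    }
}
```

Note that in Lucene 3.x a Filter only restricts which documents may match and does not contribute to the score, so with search(phraseQuery, qwf, 50) the ranking comes from the phrase query's own similarity rather than from BM25.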




[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene

2010-05-19 Thread Vinay Setty (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869064#action_12869064
 ] 

Vinay Setty commented on LUCENE-2091:
-

@Joaquin, I want to use BM25 scoring to evaluate phrase queries. I have 
created a positional index in Lucene, but have no clue how to use it to 
evaluate phrase queries with the BM25 scorer. I had a quick look at the code: 
by default the queries are boolean, and I could not find an easy way to make 
one a phrase query. Any ideas?
