[jira] [Updated] (SOLR-2754) create Solr similarity factories for new ranking algorithms
[ https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated SOLR-2754: -- Attachment: SOLR-2754.patch Done.
> create Solr similarity factories for new ranking algorithms
> ---
>
> Key: SOLR-2754
> URL: https://issues.apache.org/jira/browse/SOLR-2754
> Project: Solr
> Issue Type: New Feature
> Affects Versions: 4.0
> Reporter: Robert Muir
> Assignee: Robert Muir
> Attachments: SOLR-2754.patch, SOLR-2754.patch
>
> To make it easy to use some of the new ranking algorithms, we should add factories to solr:
> * for parametric models like LM and BM25 so that parameters can be set from schema.xml
> * for framework models like DFR and IB, so that different basic models/normalizations/lambdas can be chosen
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2754) create Solr similarity factories for new ranking algorithms
[ https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107069#comment-13107069 ] David Mark Nemeskey commented on SOLR-2754: ---
bq. Well, we can do both: we can provide these basic parameters as default values to be friendly, but at the same time in the test or example xml configurations that use these, our examples can have the parameters set.
That's a good idea. I could modify the patch if you want to, and also break the long lines into two in the meantime.
[jira] [Commented] (SOLR-2754) create Solr similarity factories for new ranking algorithms
[ https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13106795#comment-13106795 ] David Mark Nemeskey commented on SOLR-2754: ---
bq. Alternative, another idea would be for all 'parametric' models to require the parameter? ... Part of me likes this solution: if you are using a parametric model then it requires you to think about it?
I can understand the reasoning behind this idea. On the other hand, for some models, the parameter has a value that's optimal in a wide range of cases. In such cases, I think we could make the life of the user easier by falling back to this value. (Actually, that's why {{LMJelinekMercerSimilarity}} does not have a default constructor: there is no single parameter value that is kind-of-optimal in all cases.)
bq. But i started thinking about this, say I created NormalizationRob, and it wants a bunch of parameters...
Yes, I know, it'd be a bit difficult to support that... maybe if all Similarities and models had a constructor with a map as a parameter? I'm not sure we want that, though.
bq. I think the intent here is to support all of lucene-core's capabilities?
In that case let's forget reflection for now.
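The trade-off discussed in this comment (fall back to a widely useful default, or force the user to pick a value) can be sketched in a few lines. This is only an illustration under stated assumptions, not code from the patch: the class SimilarityParams and its method names are hypothetical, and parameters are modelled as a plain string map rather than Solr's own parameter types.

```java
import java.util.Map;

/** Hypothetical helper (not a Solr class): reads similarity parameters from a string map. */
final class SimilarityParams {
    private final Map<String, String> values;

    SimilarityParams(Map<String, String> values) {
        this.values = values;
    }

    /** Optional parameter: use the supplied fallback when the key is absent. */
    float optionalFloat(String name, float fallback) {
        String raw = values.get(name);
        return raw == null ? fallback : Float.parseFloat(raw);
    }

    /** Required parameter: a missing key is treated as a configuration error. */
    float requiredFloat(String name) {
        String raw = values.get(name);
        if (raw == null) {
            throw new IllegalArgumentException("missing required parameter: " + name);
        }
        return Float.parseFloat(raw);
    }
}
```

A factory for a model with a generally good default could call optionalFloat, while a factory for a model without one (the comment names {{LMJelinekMercerSimilarity}}) could call requiredFloat and fail fast on a missing value.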
[jira] [Commented] (SOLR-2754) create Solr similarity factories for new ranking algorithms
[ https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13106683#comment-13106683 ] David Mark Nemeskey commented on SOLR-2754: ---
Robert, I've reviewed the patch. Even though I don't have any experience with Solr, the code is very clear, well documented and easy to understand. I have the following observations (or questions, more like):
1. {{LMDirichletSimilarity}} has a mu-less constructor. Maybe we could avoid defining a constant in two places if we used that? E.g.
{code}
Float mu = params.getFloat("mu");  // null when "mu" is not set
...
LMDirichletSimilarity sim = (mu != null) ? new LMDirichletSimilarity(mu) : new LMDirichletSimilarity();
{code}
Same goes for H3 and Z.
2. I think it is a nice feature of the new framework that the user can create new basic models, normalizations, distributions, etc. and just plug them in to {{DFRSimilarity}} or {{IBSimilarity}}. However, these factories can only handle those that we have defined ourselves. Wouldn't it be good if we could instantiate custom classes via reflection? It could work similarly to Terrier: keep the current code for core models, and use reflection if the user specifies a (fully qualified) class name.
3. I don't know the Lucene/Solr conventions for line length. There are some rather long lines in IB and DFR, but maybe it's not a problem?
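The reflection idea raised in point 2 of the review above could look roughly like the following. This is a minimal sketch under stated assumptions: Model, CoreModelG and ModelResolver are hypothetical stand-ins rather than Lucene or Solr types, and the short name "G" is only an example key. Built-in names are looked up first; anything else is treated as a class name and instantiated through its no-argument constructor.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical resolver: built-in short names first, reflection as a fallback. */
final class ModelResolver {
    /** Stand-in for a pluggable model component (not a Lucene interface). */
    interface Model {
        String describe();
    }

    /** Stand-in for one of the models shipped with the core. */
    static final class CoreModelG implements Model {
        @Override
        public String describe() {
            return "core G";
        }
    }

    private final Map<String, Supplier<Model>> core = new HashMap<>();

    ModelResolver() {
        core.put("G", CoreModelG::new);
    }

    Model resolve(String nameOrClass) {
        Supplier<Model> builtIn = core.get(nameOrClass);
        if (builtIn != null) {
            return builtIn.get();
        }
        try {
            // Not a known short name: treat it as a class name with a no-arg constructor.
            Class<?> cls = Class.forName(nameOrClass);
            return cls.asSubclass(Model.class).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("cannot instantiate model: " + nameOrClass, e);
        }
    }
}
```

The sketch only handles no-argument constructors, which is exactly the limitation raised later in the thread for custom classes that need their own parameters.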
[jira] [Commented] (LUCENE-2959) [GSoC] Implementing State of the Art Ranking for Lucene
[ https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090872#comment-13090872 ] David Mark Nemeskey commented on LUCENE-2959: - Hi Robert, I would very much like to run this test on the other sims as well. How do I do that? David
> [GSoC] Implementing State of the Art Ranking for Lucene
> ---
>
> Key: LUCENE-2959
> URL: https://issues.apache.org/jira/browse/LUCENE-2959
> Project: Lucene - Java
> Issue Type: New Feature
> Components: core/query/scoring, general/javadocs, modules/examples
> Reporter: David Mark Nemeskey
> Assignee: Robert Muir
> Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: flexscoring branch
>
> Attachments: LUCENE-2959_mockdfr.patch, implementation_plan.pdf, proposal.pdf
>
> Lucene employs the Vector Space Model (VSM) to rank documents, which compares unfavorably to state of the art algorithms, such as BM25. Moreover, the architecture is tailored specifically to VSM, which makes the addition of new ranking functions a non-trivial task.
> This project aims to bring state of the art ranking methods to Lucene and to implement a query architecture with pluggable ranking functions.
> The wiki page for the project can be found at http://wiki.apache.org/lucene-java/SummerOfCode2011ProjectRanking.
[jira] [Commented] (LUCENE-2959) [GSoC] Implementing State of the Art Ranking for Lucene
[ https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089449#comment-13089449 ] David Mark Nemeskey commented on LUCENE-2959: - Robert: maybe we could resolve this issue as well? Once we decide what to do with 3173 -- perhaps a won't fix?
[jira] [Commented] (LUCENE-3387) Get javadoc for the similarities package in shape
[ https://issues.apache.org/jira/browse/LUCENE-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089443#comment-13089443 ] David Mark Nemeskey commented on LUCENE-3387: -
bq. This is because of an out of date regexp in the javadocs construction.
I've found that, I just didn't know what to make of it. Since as far as I know a similarities package hadn't existed before I added the new sims, I assumed it was there on purpose.
> Get javadoc for the similarities package in shape
> -
>
> Key: LUCENE-3387
> URL: https://issues.apache.org/jira/browse/LUCENE-3387
> Project: Lucene - Java
> Issue Type: Sub-task
> Components: core/query/scoring, general/javadocs
> Reporter: David Mark Nemeskey
> Assignee: David Mark Nemeskey
> Labels: gsoc, gsoc2011, javadoc
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3387.patch, LUCENE-3387.patch
>
> 1. Create a package.html in the similarities package.
> 2. Update the javadoc of the search package (package.html mentions Similarity)?
> 3. Compile the javadoc to see if there are any warnings.
[jira] [Updated] (LUCENE-3393) Rename EasySimilarity to SimilarityBase
[ https://issues.apache.org/jira/browse/LUCENE-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3393: Attachment: LUCENE-3393.patch
Renamed
- EasySimilarity to SimilarityBase
- EasyStats to BasicStats
- Easy*DocScorer to Basic*DocScorer
> Rename EasySimilarity to SimilarityBase
> ---
>
> Key: LUCENE-3393
> URL: https://issues.apache.org/jira/browse/LUCENE-3393
> Project: Lucene - Java
> Issue Type: Sub-task
> Components: core/query/scoring, general/javadocs, modules/examples
> Reporter: David Mark Nemeskey
> Assignee: David Mark Nemeskey
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3393.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
[jira] [Updated] (LUCENE-3391) Make EasySimilarityProvider a full-fledged class
[ https://issues.apache.org/jira/browse/LUCENE-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3391: Attachment: LUCENE-3391.patch Fixed the issues you mentioned.
> Make EasySimilarityProvider a full-fledged class
> -
>
> Key: LUCENE-3391
> URL: https://issues.apache.org/jira/browse/LUCENE-3391
> Project: Lucene - Java
> Issue Type: Sub-task
> Components: core/query/scoring, general/javadocs
> Reporter: David Mark Nemeskey
> Assignee: David Mark Nemeskey
> Labels: gsoc, gsoc2011, rank, similarity
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3391.patch, LUCENE-3391.patch, LUCENE-3391.patch, LUCENE-3391.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The {{EasySimilarityProvider}} in {{TestEasySimilarity}} would be a good candidate for a full-fledged class. Both {{DefaultSimilarity}} and {{BM25Similarity}} have their own providers, which are effectively the same, so I don't see why we couldn't add one generic provider for convenience.
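The "one generic provider" idea in the issue description can be pictured as a provider that wraps any single similarity and hands it out for every field. The sketch below is an illustration only: Sim, SimProvider and SingleSimProvider are hypothetical, simplified stand-ins, not the interfaces of the flexscoring branch, and only the per-field lookup is modelled.

```java
import java.util.Objects;

/** Hypothetical sketch of a generic, similarity-agnostic provider. */
final class GenericProviderSketch {
    /** Stand-in for a ranking function (not a Lucene type). */
    interface Sim {
        String name();
    }

    /** Stand-in for a provider that selects a similarity per field. */
    interface SimProvider {
        Sim get(String field);
    }

    /** Works for any similarity: the same instance is returned for every field. */
    static final class SingleSimProvider implements SimProvider {
        private final Sim sim;

        SingleSimProvider(Sim sim) {
            this.sim = Objects.requireNonNull(sim, "sim");
        }

        @Override
        public Sim get(String field) {
            return sim;
        }
    }
}
```

With a wrapper like this, separate provider classes per similarity become unnecessary, which matches the later note in this thread that BM25SimilarityProvider was removed.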
[jira] [Created] (LUCENE-3393) Rename EasySimilarity to SimilarityBase
Rename EasySimilarity to SimilarityBase --- Key: LUCENE-3393 URL: https://issues.apache.org/jira/browse/LUCENE-3393 Project: Lucene - Java Issue Type: Sub-task Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey
[jira] [Commented] (LUCENE-3391) Make EasySimilarityProvider a full-fledged class
[ https://issues.apache.org/jira/browse/LUCENE-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088743#comment-13088743 ] David Mark Nemeskey commented on LUCENE-3391: -
(1) I was also hesitant to add the generics, because I wasn't sure about the warnings it gave. So I'll remove that happily.
(2) And I guess the method parameter in queryNorm?
(3) I'm pretty bad at naming things, so I'd take your advice in this. :) Is BasicSimilarityProvider OK?
[jira] [Updated] (LUCENE-3391) Make EasySimilarityProvider a full-fledged class
[ https://issues.apache.org/jira/browse/LUCENE-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3391: Attachment: LUCENE-3391.patch Got rid of BM25SimilarityProvider.
[jira] [Updated] (LUCENE-3387) Get javadoc for the similarities package in shape
[ https://issues.apache.org/jira/browse/LUCENE-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3387: Attachment: LUCENE-3387.patch Fixed a typo.
[jira] [Updated] (LUCENE-3391) Make EasySimilarityProvider a full-fledged class
[ https://issues.apache.org/jira/browse/LUCENE-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3391: Attachment: LUCENE-3391.patch Hinted at EasySimilarityProvider in the package javadoc.
[jira] [Updated] (LUCENE-3391) Make EasySimilarityProvider a full-fledged class
[ https://issues.apache.org/jira/browse/LUCENE-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3391: Attachment: LUCENE-3391.patch EasySimilarityProvider added.
[jira] [Created] (LUCENE-3391) Make EasySimilarityProvider a full-fledged class
Make EasySimilarityProvider a full-fledged class - Key: LUCENE-3391 URL: https://issues.apache.org/jira/browse/LUCENE-3391 Project: Lucene - Java Issue Type: Sub-task Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey
[jira] [Updated] (LUCENE-3391) Make EasySimilarityProvider a full-fledged class
[ https://issues.apache.org/jira/browse/LUCENE-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3391: Component/s: (was: modules/examples) Description: The {{EasySimilarityProvider}} in {{TestEasySimilarity}} would be a good candidate for a full-fledged class. Both {{DefaultSimilarity}} and {{BM25Similarity}} have their own providers, which are effectively the same, so I don't see why we couldn't add one generic provider for convenience. Labels: gsoc gsoc2011 rank similarity (was: )
[jira] [Commented] (LUCENE-3387) Get javadoc for the similarities package in shape
[ https://issues.apache.org/jira/browse/LUCENE-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088622#comment-13088622 ] David Mark Nemeskey commented on LUCENE-3387: - The {{similarities}} package shows up in the 'core', even though it is classified as 'contrib' for javadocs-all. However, since the class {{Similarity}} is now in {{similarities}}, shouldn't it be core as well?
[jira] [Updated] (LUCENE-3387) Get javadoc for the similarities package in shape
[ https://issues.apache.org/jira/browse/LUCENE-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3387: Attachment: LUCENE-3387.patch
1. Fixed the javadoc warnings in EasySimilarity.
2. Okapi paper reference added to BM25Similarity.
3. Added package-level javadoc for the similarities package.
4. Moved the "Changing Similarities" part from search to similarities.
[jira] [Commented] (LUCENE-3386) Integrate MockBM25Similarity and MockLMSimilarity into the framework
[ https://issues.apache.org/jira/browse/LUCENE-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088614#comment-13088614 ] David Mark Nemeskey commented on LUCENE-3386: - I decided against step 5, at least for now, so I propose we resolve this issue.
> Integrate MockBM25Similarity and MockLMSimilarity into the framework
>
>
> Key: LUCENE-3386
> URL: https://issues.apache.org/jira/browse/LUCENE-3386
> Project: Lucene - Java
> Issue Type: Sub-task
> Components: core/query/scoring
> Reporter: David Mark Nemeskey
> Assignee: David Mark Nemeskey
> Labels: gsoc, gsoc2011, rank
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3386.patch, LUCENE-3386.patch, LUCENE-3386.patch, LUCENE-3386.patch, LUCENE-3386.patch
>
> Original Estimate: 4h
> Remaining Estimate: 4h
>
> Steps:
> 1. Decide if {{MockLMSimilarity}} is needed at all (we have {{LMDirichletSimilarity}})
> 2. Move the classes to the similarities package
> 3. Move the similarities package to src/
> 4. Move all sims (inc. Similarity) to similarities
> 5. Make MockBM25Similarity a subclass of EasySimilarity?
[jira] [Updated] (LUCENE-3386) Integrate MockBM25Similarity and MockLMSimilarity into the framework
[ https://issues.apache.org/jira/browse/LUCENE-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3386: Attachment: LUCENE-3386.patch Moved all sims to similarities.
[jira] [Updated] (LUCENE-3387) Get javadoc for the similarities package in shape
[ https://issues.apache.org/jira/browse/LUCENE-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3387: Component/s: (was: modules/examples) Due Date: 21/Aug/11 Description: 1. Create a package.html in the similarities package. 2. Update the javadoc of the search package (package.html mentions Similarity)? 3. Compile the javadoc to see if there are any warnings. Labels: gsoc gsoc2011 javadoc (was: )
[jira] [Created] (LUCENE-3387) Get javadoc for the similarities package in shape
Get javadoc for the similarities package in shape - Key: LUCENE-3387 URL: https://issues.apache.org/jira/browse/LUCENE-3387 Project: Lucene - Java Issue Type: Sub-task Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey
[jira] [Updated] (LUCENE-3386) Integrate MockBM25Similarity and MockLMSimilarity into the framework
[ https://issues.apache.org/jira/browse/LUCENE-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3386: Attachment: LUCENE-3386.patch Moved the similarities package to src; only testing-related classes remain in test.
[jira] [Issue Comment Edited] (LUCENE-3386) Integrate MockBM25Similarity and MockLMSimilarity into the framework
[ https://issues.apache.org/jira/browse/LUCENE-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088249#comment-13088249 ] David Mark Nemeskey edited comment on LUCENE-3386 at 8/20/11 7:10 PM: -- Moved the similarities package to src; only testing-related classes remain in test. was (Author: david_nemeskey): Moved the similarities package to src; only testing-related classes remain test. > Integrate MockBM25Similarity and MockLMSimilarity into the framework > > > Key: LUCENE-3386 > URL: https://issues.apache.org/jira/browse/LUCENE-3386 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/query/scoring >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc, gsoc2011, rank > Fix For: flexscoring branch > > Attachments: LUCENE-3386.patch, LUCENE-3386.patch, LUCENE-3386.patch, > LUCENE-3386.patch > > Original Estimate: 4h > Remaining Estimate: 4h > > Steps: > 1. Decide if {{MockLMSimilarity}} is needed at all (we have > {{LMDirichletSimilarity}}) > 2. Move the classes to the similarities package > 3. Move the similarities package to src/ > 4. Move all sims (inc. Similarity) to similarities > 5. Make MockBM25Similarity a subclass of EasySimilarity? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3386) Integrate MockBM25Similarity and MockLMSimilarity into the framework
[ https://issues.apache.org/jira/browse/LUCENE-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3386: Description: Steps: 1. Decide if {{MockLMSimilarity}} is needed at all (we have {{LMDirichletSimilarity}}) 2. Move the classes to the similarities package 3. Move the similarities package to src/ 4. Move all sims (inc. Similarity) to similarities 5. Make MockBM25Similarity a subclass of EasySimilarity? was: Steps: 1. Decide if {{MockLMSimilarity}} is needed at all (we have {{LMDirichletSimilarity}}) 2. Move the classes to the similarities package 3. Make MockBM25Similarity a subclass of EasySimilarity? > Integrate MockBM25Similarity and MockLMSimilarity into the framework > > > Key: LUCENE-3386 > URL: https://issues.apache.org/jira/browse/LUCENE-3386 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/query/scoring >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc, gsoc2011, rank > Fix For: flexscoring branch > > Attachments: LUCENE-3386.patch, LUCENE-3386.patch, LUCENE-3386.patch > > Original Estimate: 4h > Remaining Estimate: 4h > > Steps: > 1. Decide if {{MockLMSimilarity}} is needed at all (we have > {{LMDirichletSimilarity}}) > 2. Move the classes to the similarities package > 3. Move the similarities package to src/ > 4. Move all sims (inc. Similarity) to similarities > 5. Make MockBM25Similarity a subclass of EasySimilarity? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3386) Integrate MockBM25Similarity and MockLMSimilarity into the framework
[ https://issues.apache.org/jira/browse/LUCENE-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3386: Attachment: LUCENE-3386.patch Apparently mv doesn't refactor the code. Who would have thought...? > Integrate MockBM25Similarity and MockLMSimilarity into the framework > > > Key: LUCENE-3386 > URL: https://issues.apache.org/jira/browse/LUCENE-3386 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/query/scoring >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc, gsoc2011, rank > Fix For: flexscoring branch > > Attachments: LUCENE-3386.patch, LUCENE-3386.patch, LUCENE-3386.patch > > Original Estimate: 4h > Remaining Estimate: 4h > > Steps: > 1. Decide if {{MockLMSimilarity}} is needed at all (we have > {{LMDirichletSimilarity}}) > 2. Move the classes to the similarities package > 3. Make MockBM25Similarity a subclass of EasySimilarity? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3386) Integrate MockBM25Similarity and MockLMSimilarity into the framework
[ https://issues.apache.org/jira/browse/LUCENE-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3386: Attachment: LUCENE-3386.patch Renamed MockBM25Similarity and its provider to BM25... and moved them to the similarities package. All that's left is to decide whether they should be rebased on EasySimilarity or not. > Integrate MockBM25Similarity and MockLMSimilarity into the framework > > > Key: LUCENE-3386 > URL: https://issues.apache.org/jira/browse/LUCENE-3386 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/query/scoring >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc, gsoc2011, rank > Fix For: flexscoring branch > > Attachments: LUCENE-3386.patch, LUCENE-3386.patch > > Original Estimate: 4h > Remaining Estimate: 4h > > Steps: > 1. Decide if {{MockLMSimilarity}} is needed at all (we have > {{LMDirichletSimilarity}}) > 2. Move the classes to the similarities package > 3. Make MockBM25Similarity a subclass of EasySimilarity? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3386) Integrate MockBM25Similarity and MockLMSimilarity into the framework
[ https://issues.apache.org/jira/browse/LUCENE-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3386: Attachment: LUCENE-3386.patch Removed MockLMSimilarity and its provider. > Integrate MockBM25Similarity and MockLMSimilarity into the framework > > > Key: LUCENE-3386 > URL: https://issues.apache.org/jira/browse/LUCENE-3386 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/query/scoring >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc, gsoc2011, rank > Fix For: flexscoring branch > > Attachments: LUCENE-3386.patch > > Original Estimate: 4h > Remaining Estimate: 4h > > Steps: > 1. Decide if {{MockLMSimilarity}} is needed at all (we have > {{LMDirichletSimilarity}}) > 2. Move the classes to the similarities package > 3. Make MockBM25Similarity a subclass of EasySimilarity? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3385) EasySimilarity to interpret document length as float
[ https://issues.apache.org/jira/browse/LUCENE-3385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3385: Labels: gsoc gsoc2011 (was: ) > EasySimilarity to interpret document length as float > > > Key: LUCENE-3385 > URL: https://issues.apache.org/jira/browse/LUCENE-3385 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/query/scoring, general/javadocs, modules/examples >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc, gsoc2011 > Fix For: flexscoring branch > > Attachments: LUCENE-3385.patch > > Original Estimate: 1h > Remaining Estimate: 1h > -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3386) Integrate MockBM25Similarity and MockLMSimilarity into the framework
[ https://issues.apache.org/jira/browse/LUCENE-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3386: Component/s: (was: general/javadocs) (was: modules/examples) Description: Steps: 1. Decide if {{MockLMSimilarity}} is needed at all (we have {{LMDirichletSimilarity}}) 2. Move the classes to the similarities package 3. Make MockBM25Similarity a subclass of EasySimilarity? Labels: gsoc gsoc2011 rank (was: ) > Integrate MockBM25Similarity and MockLMSimilarity into the framework > > > Key: LUCENE-3386 > URL: https://issues.apache.org/jira/browse/LUCENE-3386 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/query/scoring >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc, gsoc2011, rank > Fix For: flexscoring branch > > Original Estimate: 4h > Remaining Estimate: 4h > > Steps: > 1. Decide if {{MockLMSimilarity}} is needed at all (we have > {{LMDirichletSimilarity}}) > 2. Move the classes to the similarities package > 3. Make MockBM25Similarity a subclass of EasySimilarity? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3386) Integrate MockBM25Similarity and MockLMSimilarity into the framework
Integrate MockBM25Similarity and MockLMSimilarity into the framework Key: LUCENE-3386 URL: https://issues.apache.org/jira/browse/LUCENE-3386 Project: Lucene - Java Issue Type: Sub-task Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3385) EasySimilarity to interpret document length as float
[ https://issues.apache.org/jira/browse/LUCENE-3385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3385: Attachment: LUCENE-3385.patch docLen changed from int to float. > EasySimilarity to interpret document length as float > > > Key: LUCENE-3385 > URL: https://issues.apache.org/jira/browse/LUCENE-3385 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/query/scoring, general/javadocs, modules/examples >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Fix For: flexscoring branch > > Attachments: LUCENE-3385.patch > > Original Estimate: 1h > Remaining Estimate: 1h > -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
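The note above only says that docLen changed from int to float. As a rough illustration of why that can matter, the sketch below assumes the document length is recovered from a stored norm as 1 / norm^2. That decoding, the class name `DocLenSketch`, and both method names are assumptions made for this example and are not taken from LUCENE-3385.patch.

```java
// Hypothetical sketch; not code from LUCENE-3385.patch.
final class DocLenSketch {
  /** Assumed decoding: length = 1 / norm^2, kept as a float. */
  static float lengthAsFloat(float norm) {
    return 1.0f / (norm * norm);
  }

  /** The same decoding truncated to int, as an int-typed docLen would force. */
  static int lengthAsInt(float norm) {
    return (int) lengthAsFloat(norm);
  }

  public static void main(String[] args) {
    float norm = 0.4f; // value chosen for illustration only
    System.out.println("float length: " + lengthAsFloat(norm)); // keeps the fractional part
    System.out.println("int length:   " + lengthAsInt(norm));   // fractional part is dropped
  }
}
```

If the decoded length really is fractional, an int parameter would silently discard part of it before any ranking formula sees it; a float parameter passes the value through unchanged.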
[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3357: Attachment: LUCENE-3357.patch Ah, I forgot to modify the explain() methods to handle the omitted norms case in the same way as score(). Fixed it now. > Unit and integration test cases for the new Similarities > > > Key: LUCENE-3357 > URL: https://issues.apache.org/jira/browse/LUCENE-3357 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/query/scoring >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey >Priority: Minor > Labels: gsoc, gsoc2011, test > Fix For: flexscoring branch > > Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, > LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, > LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, > LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, > LUCENE-3357.patch > > > Write test cases to test the new Similarities added in > [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of > test cases will be created: > * unit tests, in which mock statistics are provided to the Similarities and > the score is validated against hand calculations; > * integration tests, in which a small collection is indexed and then > searched using the Similarities. > Performance tests will be performed in a separate issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
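The fix described above, making explain() treat omitted norms the same way score() does, can be sketched by routing both methods through one shared helper. This is a minimal illustration and not the code in the patch: `OmitNormsSketch`, `effectiveLength`, the toy formula freq / length, and the fallback length of 1 are all assumptions chosen for the example.

```java
// Hypothetical sketch; names and the fallback value are illustrative only.
final class OmitNormsSketch {
  /** Placeholder length used when norms are omitted (assumed for this sketch). */
  static final float FALLBACK_LENGTH = 1f;

  /** Single place that decides which length to use; shared by score and explain. */
  static float effectiveLength(boolean hasNorms, float docLen) {
    return hasNorms ? docLen : FALLBACK_LENGTH;
  }

  /** Toy length-normalised score: term frequency divided by document length. */
  static float score(float freq, boolean hasNorms, float docLen) {
    return freq / effectiveLength(hasNorms, docLen);
  }

  /** Explanation built from the same helper, so it cannot drift from score(). */
  static String explain(float freq, boolean hasNorms, float docLen) {
    float len = effectiveLength(hasNorms, docLen);
    return score(freq, hasNorms, docLen) + " = " + freq + " / " + len;
  }

  public static void main(String[] args) {
    System.out.println(explain(3f, true, 6f));   // norms present: uses the given length
    System.out.println(explain(3f, false, 50f)); // norms omitted: docLen argument is ignored
  }
}
```

Because the omitted-norms decision lives in one method, a score-versus-explanation comparison test of the kind mentioned in this thread would pass for both cases by construction.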
[jira] [Created] (LUCENE-3385) EasySimilarity to interpret document length as float
EasySimilarity to interpret document length as float Key: LUCENE-3385 URL: https://issues.apache.org/jira/browse/LUCENE-3385 Project: Lucene - Java Issue Type: Sub-task Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3357) Unit and integration test cases for the new Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088182#comment-13088182 ] David Mark Nemeskey commented on LUCENE-3357: - Robert: with this, all EasySimilarity-based classes have been tested. Do you think we could close this issue? > Unit and integration test cases for the new Similarities > > > Key: LUCENE-3357 > URL: https://issues.apache.org/jira/browse/LUCENE-3357 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/query/scoring >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey >Priority: Minor > Labels: gsoc, gsoc2011, test > Fix For: flexscoring branch > > Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, > LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, > LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, > LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch > > > Write test cases to test the new Similarities added in > [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of > test cases will be created: > * unit tests, in which mock statistics are provided to the Similarities and > the score is validated against hand calculations; > * integration tests, in which a small collection is indexed and then > searched using the Similarities. > Performance tests will be performed in a separate issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3357: Attachment: LUCENE-3357.patch Correctness tests added for the rest of the DFR sims. > Unit and integration test cases for the new Similarities > > > Key: LUCENE-3357 > URL: https://issues.apache.org/jira/browse/LUCENE-3357 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/query/scoring >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey >Priority: Minor > Labels: gsoc, gsoc2011, test > Fix For: flexscoring branch > > Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, > LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, > LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, > LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch > > > Write test cases to test the new Similarities added in > [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of > test cases will be created: > * unit tests, in which mock statistics are provided to the Similarities and > the score is validated against hand calculations; > * integration tests, in which a small collection is indexed and then > searched using the Similarities. > Performance tests will be performed in a separate issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (LUCENE-3357) Unit and integration test cases for the new Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087578#comment-13087578 ] David Mark Nemeskey edited comment on LUCENE-3357 at 8/19/11 6:59 AM: -- bq. I would just shoot for 'breadth' as far as across the different sims? What do you mean by 'breadth'? Unit and integration tests (well... the "heart" test) already cover all the sims, and this includes score vs explanation comparison. As for the correctness tests, both LM and IB sims are tested, as well as four DFR methods. I can write tests for the three missing DFR sims, but that is as much breadth as I can add. Or do you have something else in mind? was (Author: david_nemeskey): bq I would just shoot for 'breadth' as far as across the different sims? What do you mean by 'breadth'? Unit and integration tests (well... the "heart" test) already cover all the sims, and this includes score vs explanation comparison. As for the correctness tests, both LM and IB sims are tested, as well as four DFR methods. I can write tests for the three missing DFR sims, but that is as much breadth as I can add. Or do you have something else in mind? 
> Unit and integration test cases for the new Similarities > > > Key: LUCENE-3357 > URL: https://issues.apache.org/jira/browse/LUCENE-3357 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/query/scoring >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey >Priority: Minor > Labels: gsoc, gsoc2011, test > Fix For: flexscoring branch > > Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, > LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, > LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, > LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch > > > Write test cases to test the new Similarities added in > [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of > test cases will be created: > * unit tests, in which mock statistics are provided to the Similarities and > the score is validated against hand calculations; > * integration tests, in which a small collection is indexed and then > searched using the Similarities. > Performance tests will be performed in a separate issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3357) Unit and integration test cases for the new Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087578#comment-13087578 ] David Mark Nemeskey commented on LUCENE-3357: - bq I would just shoot for 'breadth' as far as across the different sims? What do you mean by 'breadth'? Unit and integration tests (well... the "heart" test) already cover all the sims, and this includes score vs explanation comparison. As for the correctness tests, both LM and IB sims are tested, as well as four DFR methods. I can write tests for the three missing DFR sims, but that is as much breadth as I can add. Or do you have something else in mind? > Unit and integration test cases for the new Similarities > > > Key: LUCENE-3357 > URL: https://issues.apache.org/jira/browse/LUCENE-3357 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/query/scoring >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey >Priority: Minor > Labels: gsoc, gsoc2011, test > Fix For: flexscoring branch > > Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, > LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, > LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, > LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch > > > Write test cases to test the new Similarities added in > [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of > test cases will be created: > * unit tests, in which mock statistics are provided to the Similarities and > the score is validated against hand calculations; > * integration tests, in which a small collection is indexed and then > searched using the Similarities. > Performance tests will be performed in a separate issue. -- This message is automatically generated by JIRA. 
[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3357: Attachment: LUCENE-3357.patch I've added the correctness tests (is there a better name for these?). Do you think that I should re-write the ones where the computation of the gold value is missing? Or the other way around? :) > Unit and integration test cases for the new Similarities > > > Key: LUCENE-3357 > URL: https://issues.apache.org/jira/browse/LUCENE-3357 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/query/scoring >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey >Priority: Minor > Labels: gsoc, gsoc2011, test > Fix For: flexscoring branch > > Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, > LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, > LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, > LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch > > > Write test cases to test the new Similarities added in > [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of > test cases will be created: > * unit tests, in which mock statistics are provided to the Similarities and > the score is validated against hand calculations; > * integration tests, in which a small collection is indexed and then > searched using the Similarities. > Performance tests will be performed in a separate issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
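A correctness test of the kind discussed above, where mock statistics go in and the score is compared with a hand-computed gold value, might look like the sketch below. It uses the textbook BM25 term-frequency component as a stand-in; that formula, the class name `GoldValueSketch`, and the parameter values are assumptions for illustration and are not the tests or formulas from the patch.

```java
// Hypothetical sketch of a "gold value" test; not taken from LUCENE-3357.patch.
final class GoldValueSketch {
  /** Textbook BM25 term-frequency component (assumed formula). */
  static double bm25Tf(double freq, double docLen, double avgDocLen, double k1, double b) {
    double lengthNorm = 1 - b + b * docLen / avgDocLen;
    return freq * (k1 + 1) / (freq + k1 * lengthNorm);
  }

  public static void main(String[] args) {
    // Mock statistics, chosen so the arithmetic is easy to do by hand.
    double freq = 2, docLen = 10, avgDocLen = 10, k1 = 1.2, b = 0.75;

    // Hand calculation, written out so a reader can check it:
    //   lengthNorm = 1 - 0.75 + 0.75 * 10 / 10 = 1
    //   gold       = 2 * 2.2 / (2 + 1.2 * 1) = 4.4 / 3.2 = 1.375
    double gold = 1.375;

    double actual = bm25Tf(freq, docLen, avgDocLen, k1, b);
    if (Math.abs(actual - gold) > 1e-9) {
      throw new AssertionError("expected " + gold + " but got " + actual);
    }
    System.out.println("score matches hand-calculated value");
  }
}
```

Keeping the derivation of the gold value in a comment next to the constant is one way to address the question raised above about tests where the computation of the gold value is missing.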
[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3357: Attachment: LUCENE-3357.patch Fixed the omit norms case. > Unit and integration test cases for the new Similarities > > > Key: LUCENE-3357 > URL: https://issues.apache.org/jira/browse/LUCENE-3357 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/query/scoring >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey >Priority: Minor > Labels: gsoc, gsoc2011, test > Fix For: flexscoring branch > > Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, > LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, > LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, > LUCENE-3357.patch, LUCENE-3357.patch > > > Write test cases to test the new Similarities added in > [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of > test cases will be created: > * unit tests, in which mock statistics are provided to the Similarities and > the score is validated against hand calculations; > * integration tests, in which a small collection is indexed and then > searched using the Similarities. > Performance tests will be performed in a separate issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (LUCENE-3357) Unit and integration test cases for the new Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084096#comment-13084096 ] David Mark Nemeskey edited comment on LUCENE-3357 at 8/12/11 1:11 PM: -- D: good question, I think if F > tfn, then D > 0, but I guess I have to prove it (and fix it if it isn't). Could you tell me which sims were affected negatively? freq: I didn't know about that! Still, I want to provide not "plausible", but at least "safe" statistics in this case. You didn't touch docFreq and numberOfDocuments, so I assumed at least these two are filled with the actual values, is that so? was (Author: david_nemeskey): D: good question, I think if F > tfn, then D > 0, but I guess I have to prove it (and fix it if it isn't). Could you tell me which sims were affected negatively? freq: I didn't know about that! Still, I want to provide not "plausible", but at least "safe" statistics in this case. You didn't touch docFreq and numberOfDocuments, so I assumed at least these two are filled with actual values, is that so? > Unit and integration test cases for the new Similarities > > > Key: LUCENE-3357 > URL: https://issues.apache.org/jira/browse/LUCENE-3357 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/query/scoring >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey >Priority: Minor > Labels: gsoc, gsoc2011, test > Fix For: flexscoring branch > > Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, > LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, > LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, > LUCENE-3357.patch > > > Write test cases to test the new Similarities added in > [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. 
Two types of > test cases will be created: > * unit tests, in which mock statistics are provided to the Similarities and > the score is validated against hand calculations; > * integration tests, in which a small collection is indexed and then > searched using the Similarities. > Performance tests will be performed in a separate issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084098#comment-13084098 ] David Mark Nemeskey commented on LUCENE-3220: - Robert: Since we use [LUCENE-3357|https://issues.apache.org/jira/browse/LUCENE-3357] for testing & bug fixing, I propose we close this issue. If we decide to implement other methods as well, we can do it under a new issue. Or do you have something else in mind (such as to rename EasySimilarity to SimilarityBase)? > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/query/scoring, core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc, gsoc2011 > Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > Done: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. 
Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > * The so-called _Information-Based Models_ -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3357) Unit and integration test cases for the new Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084096#comment-13084096 ] David Mark Nemeskey commented on LUCENE-3357: - D: good question, I think if F > tfn, then D > 0, but I guess I have to prove it (and fix it if it isn't). Could you tell me which sims were affected negatively? freq: I didn't know about that! Still, I want to provide not "plausible", but at least "safe" statistics in this case. You didn't touch docFreq and numberOfDocuments, so I assumed at least these two are filled with actual values, is that so? > Unit and integration test cases for the new Similarities > > > Key: LUCENE-3357 > URL: https://issues.apache.org/jira/browse/LUCENE-3357 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/query/scoring >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey >Priority: Minor > Labels: gsoc, gsoc2011, test > Fix For: flexscoring branch > > Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, > LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, > LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, > LUCENE-3357.patch > > > Write test cases to test the new Similarities added in > [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of > test cases will be created: > * unit tests, in which mock statistics are provided to the Similarities and > the score is validated against hand calculations; > * integration tests, in which a small collection is indexed and then > searched using the Similarities. > Performance tests will be performed in a separate issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mark Nemeskey updated LUCENE-3357:
----------------------------------------

    Attachment: LUCENE-3357.patch

Robert: I modified the nocommits a bit to provide input to the Similarities that looks somewhat plausible. I think it's better to avoid situations where e.g. docLen < freq to minimize the chance of error.
[jira] [Issue Comment Edited] (LUCENE-3357) Unit and integration test cases for the new Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083194#comment-13083194 ]

David Mark Nemeskey edited comment on LUCENE-3357 at 8/11/11 4:08 PM:
----------------------------------------------------------------------

Robert: I modified the nocommits a bit to provide input to the Similarities that looks somewhat plausible. I think it's better to avoid situations where e.g. docLen < freq to minimize the chance of error.

Please let me know what you think of these modifications; if they're OK, I'll nuke the nocommits.

      was (Author: david_nemeskey):
    Robert: I modified the nocommits a bit to provide input to the Similarities that looks somewhat plausible. I think it's better to avoid situations where e.g. docLen < freq to minimize the chance of error.
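The "plausible input" idea from the comment above can be sketched as a small consistency check on mock statistics. This is only an illustration: the class and method names are hypothetical and do not come from the patch, and the only constraints encoded are the ones the thread mentions (docLen should not be smaller than freq, and docFreq should not exceed numberOfDocuments).

```java
/** Hypothetical helper; the names are illustrative and not taken from the patch. */
final class MockStatsCheck {
    private MockStatsCheck() {}

    /**
     * Returns true if the mock statistics are mutually consistent:
     * a term cannot occur in a document more often than the document has tokens,
     * and it cannot appear in more documents than the collection holds.
     */
    static boolean isPlausible(long freq, long docLen, long docFreq, long numberOfDocuments) {
        return freq >= 0
            && docLen >= freq
            && docFreq >= 0
            && numberOfDocuments >= docFreq;
    }
}
```

A test could call such a check before handing the numbers to a Similarity, so that a failure points at the input rather than at the scoring formula.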
[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mark Nemeskey updated LUCENE-3357:
----------------------------------------

    Attachment: LUCENE-3357.patch

I made a change so that D and P (the binomial models) return only positive scores, but it is not theoretically sound, and I don't like it much either.

Robert: could you test D please, to see how the results are affected?
[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mark Nemeskey updated LUCENE-3357:
----------------------------------------

    Attachment: LUCENE-3357.patch

Fixed {{LMDirichletSimilarity}} (see my last comment).
[jira] [Commented] (LUCENE-3357) Unit and integration test cases for the new Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083080#comment-13083080 ]

David Mark Nemeskey commented on LUCENE-3357:
---------------------------------------------

Apparently the Dirichlet method returns a negative score if tf / docLen < corpusTf / corpusLen. Unfortunately the negative number can be arbitrarily large in magnitude, so it's not as easy as adding a constant to the score.

This of course makes sense if all documents are scored, as the function is monotone and consequently documents whose tf is 0 will always be ranked lower than those that contain the word. But this is not how IR engines work.

Having said that, I believe that we could simulate such a system. I don't know exactly how the query architecture works, but I presume the clauses that don't match a document are assigned a zero value. Now instead of this zero, the Scorer (or whatever class does this) could ask for a default value from the Similarity. In this case LMDirichletSimilarity could return score(stats, 0, Integer.MAX_VALUE), which is somewhere around -12.

If we don't do this, we have three options:
1. add score(stats, 0, Integer.MAX_VALUE) to the score
2. if (score < 0) return 0
3. add corpusTf / corpusLen * docLen to tf

All of them ensure a positive score, but each has its own disadvantage:
1. adds a pretty big constant to the score, which may not play well with the other parts of the query
2. some documents that contain the term get the same 0 score as documents that don't (though I cannot say this is not in line with the LM approach)
3. introduces a transformation that is difficult to characterize

For the time being, I'll go with 2, but we have to discuss this.
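The sign condition described above can be checked with a small sketch. It assumes the common rank-equivalent form of Dirichlet-smoothed scoring, ln(1 + tf / (mu * p)) + ln(mu / (docLen + mu)) with p = corpusTf / corpusLen; the thread does not show the patch's actual formula, so the class name, the method names, and the mu value used in the tests are all hypothetical. The sum equals ln((mu + tf / p) / (docLen + mu)), which is negative exactly when tf / docLen < p. The second method illustrates option 2 (clamping at zero).

```java
/** Hypothetical sketch of Dirichlet-smoothed scoring; not the code from the patch. */
final class DirichletSketch {
    private DirichletSketch() {}

    /**
     * Assumed rank-equivalent Dirichlet score:
     * ln(1 + tf / (mu * p)) + ln(mu / (docLen + mu)), where p = corpusTf / corpusLen.
     * Negative whenever tf / docLen is below the collection probability p.
     */
    static double rawScore(double tf, double docLen, double corpusTf, double corpusLen, double mu) {
        double p = corpusTf / corpusLen; // collection probability of the term
        return Math.log(1.0 + tf / (mu * p)) + Math.log(mu / (docLen + mu));
    }

    /** Option 2 from the comment: negative scores are replaced by zero. */
    static double clampedScore(double tf, double docLen, double corpusTf, double corpusLen, double mu) {
        return Math.max(0.0, rawScore(tf, docLen, corpusTf, corpusLen, mu));
    }
}
```

With clamping, a document whose term frequency is below the collection average gets the same score as one that does not contain the term at all, which is the disadvantage listed for option 2.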
[jira] [Issue Comment Edited] (LUCENE-3357) Unit and integration test cases for the new Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082702#comment-13082702 ]

David Mark Nemeskey edited comment on LUCENE-3357 at 8/10/11 9:51 PM:
----------------------------------------------------------------------

Fixed NaN and infinite scores in DFR and IB; all that's left is to fix the negative scores as well.

... and everything else discussed earlier.

      was (Author: david_nemeskey):
    Fixed NaN and infinite scores in DFR and IB; all that's left is to fix the negative scores as well.
[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mark Nemeskey updated LUCENE-3357:
----------------------------------------

    Attachment: LUCENE-3357.patch

Fixed NaN and infinite scores in DFR and IB; all that's left is to fix the negative scores as well.
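For readers unfamiliar with how NaN and infinite scores arise, here is a generic sketch of a finiteness check of the kind a test might apply to every score. The thread does not say which inputs triggered the problem in DFR and IB; the logarithm of zero and 0/0 used in the tests are just two standard floating-point cases, and the class and method names are hypothetical.

```java
/** Hypothetical helper for sanity-checking scores; not taken from the patch. */
final class ScoreSanity {
    private ScoreSanity() {}

    /** True if the score is an ordinary number, i.e. neither NaN nor +/-Infinity. */
    static boolean isFinite(float score) {
        return !Float.isNaN(score) && !Float.isInfinite(score);
    }

    /** Base-2 logarithm; returns -Infinity for 0 and NaN for negative input. */
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }
}
```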
[jira] [Commented] (LUCENE-3357) Unit and integration test cases for the new Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082261#comment-13082261 ]

David Mark Nemeskey commented on LUCENE-3357:
---------------------------------------------

Robert: I'm on the NaN/Inf problems. As for the negative-score test, I'll leave it there for the time being; these Similarities should always return positive scores. I don't feel very confident about this test myself, so I guess I'll remove it (or at least make it optional) once all tests are successful.

As for the PreFlex codec, I must admit I am not familiar with it, so I would be grateful for a few pointers.
[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mark Nemeskey updated LUCENE-3357:
----------------------------------------

    Attachment: LUCENE-3357.patch

Added a spoof version for all search-related classes that are necessary to properly fill the EasyStats object in EasySimilarity subclasses.
[jira] [Commented] (LUCENE-3357) Unit and integration test cases for the new Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081942#comment-13081942 ]

David Mark Nemeskey commented on LUCENE-3357:
---------------------------------------------

Some of the tests fail for certain Similarities, so those have to be fixed as well.
[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mark Nemeskey updated LUCENE-3357:
----------------------------------------

    Attachment: LUCENE-3357.patch

Fixed a bug in TestEasySimilarity that prevented Similarities that use a subclass of EasyStats from being tested. Also, modified EasyStats so that totalBoost is set to the value of queryBoost in the constructor.
[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mark Nemeskey updated LUCENE-3357:
----------------------------------------

    Attachment: LUCENE-3357.patch

Fixed integer division bug in BasicModelG.
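The message above does not show the offending expression, so the following is only a generic illustration of the integer-division pitfall in Java, not the actual BasicModelG code. The class, method, and parameter names are hypothetical. When both operands are integral, the quotient is truncated before it is widened to double; casting one operand first gives a floating-point division.

```java
/** Generic illustration of integer division in Java; not the BasicModelG code. */
final class IntegerDivisionSketch {
    private IntegerDivisionSketch() {}

    /** Buggy: long / long truncates toward zero, then the result is widened to double. */
    static double truncatedRatio(long totalTermFreq, long numberOfDocuments) {
        return totalTermFreq / numberOfDocuments;
    }

    /** Fixed: casting one operand to double makes the division floating point. */
    static double exactRatio(long totalTermFreq, long numberOfDocuments) {
        return (double) totalTermFreq / numberOfDocuments;
    }
}
```

A ratio such as 3 / 10 silently becomes 0 in the truncated version, which can then feed a logarithm or a division further down and distort the score.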
[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mark Nemeskey updated LUCENE-3357:
----------------------------------------

    Attachment: LUCENE-3357.patch

License added.
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mark Nemeskey updated LUCENE-3220:
----------------------------------------

    Attachment: LUCENE-3220.patch

Got rid of all but one of the nocommits.

> Implement various ranking models as Similarities
> ------------------------------------------------
>
>                 Key: LUCENE-3220
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3220
>             Project: Lucene - Java
>          Issue Type: Sub-task
>          Components: core/query/scoring, core/search
>    Affects Versions: flexscoring branch
>            Reporter: David Mark Nemeskey
>            Assignee: David Mark Nemeskey
>              Labels: gsoc, gsoc2011
>         Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch,
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch,
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch,
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch,
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch,
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch,
> LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we
> can finally work on implementing the standard ranking models. Currently DFR,
> BM25 and LM are on the menu.
> Done:
>  * {{EasyStats}}: contains all statistics that might be relevant for a
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
>  * The so-called _Information-Based Models_
[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mark Nemeskey updated LUCENE-3357:
----------------------------------------

    Attachment: LUCENE-3357.patch

Rebased the changes on the current state of trunk.
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mark Nemeskey updated LUCENE-3220:
----------------------------------------

    Attachment: LUCENE-3220.patch

Added discountOverlaps to EasySimilarity.
[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mark Nemeskey updated LUCENE-3357:
----------------------------------------

    Attachment: LUCENE-3357.patch

Unit tests added.
[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mark Nemeskey updated LUCENE-3357:
----------------------------------------

    Attachment: LUCENE-3357.patch

* EasySimilarity subclasses return their names in toString()
* The two test cases report the name of the Similarity that failed the test.
[jira] [Issue Comment Edited] (LUCENE-3357) Unit and integration test cases for the new Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080416#comment-13080416 ] David Mark Nemeskey edited comment on LUCENE-3357 at 8/6/11 3:52 PM: - Integration tests added. There are two of them; however, ant test runs only one? was (Author: david_nemeskey): Integration tests added. There are two of them; however, ant test only runs one? > Unit and integration test cases for the new Similarities > > > Key: LUCENE-3357 > URL: https://issues.apache.org/jira/browse/LUCENE-3357 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/query/scoring >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey >Priority: Minor > Labels: gsoc, gsoc2011, test > Fix For: flexscoring branch > > Attachments: LUCENE-3357.patch > > > Write test cases to test the new Similarities added in > [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of > test cases will be created: > * unit tests, in which mock statistics are provided to the Similarities and > the score is validated against hand calculations; > * integration tests, in which a small collection is indexed and then > searched using the Similarities. > Performance tests will be performed in a separate issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3357: Attachment: LUCENE-3357.patch Integration tests added. There are two of them; however, ant test only runs one? > Unit and integration test cases for the new Similarities > > > Key: LUCENE-3357 > URL: https://issues.apache.org/jira/browse/LUCENE-3357 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/query/scoring >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey >Priority: Minor > Labels: gsoc, gsoc2011, test > Fix For: flexscoring branch > > Attachments: LUCENE-3357.patch > > > Write test cases to test the new Similarities added in > [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of > test cases will be created: > * unit tests, in which mock statistics are provided to the Similarities and > the score is validated against hand calculations; > * integration tests, in which a small collection is indexed and then > searched using the Similarities. > Performance tests will be performed in a separate issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Added a short explanation on the parameter for the Jelinek-Mercer method. > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/query/scoring, core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc, gsoc2011 > Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > Done: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > * The so-called _Information-Based Models_ -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
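For readers unfamiliar with the Jelinek-Mercer parameter mentioned in the update above, here is a minimal sketch of the textbook formulation, in which the smoothed document model is p(w|d) = (1 - lambda) * tf / |d| + lambda * p(w|C). This is not the code from the patch: the class name, method names and sample numbers are hypothetical and chosen only for illustration.

```java
// Hypothetical illustration of Jelinek-Mercer smoothing; not taken from the patch.
final class JelinekMercerSketch {

    /** Smoothed probability p(w|d) = (1 - lambda) * tf / docLen + lambda * p(w|C). */
    static double smoothedProbability(double tf, double docLen,
                                      double collectionProb, double lambda) {
        return (1.0 - lambda) * (tf / docLen) + lambda * collectionProb;
    }

    /**
     * Per-term score log(p(w|d) / (lambda * p(w|C))), rewritten as
     * log(1 + ((1 - lambda) * tf / docLen) / (lambda * p(w|C))).
     * A term that does not occur in the document (tf == 0) scores 0.
     */
    static double score(double tf, double docLen,
                        double collectionProb, double lambda) {
        return Math.log(1.0 + ((1.0 - lambda) * tf / docLen) / (lambda * collectionProb));
    }

    public static void main(String[] args) {
        // Assumed sample values: tf = 2, document length = 100, p(w|C) = 0.01.
        for (double lambda : new double[] {0.1, 0.5, 0.9}) {
            System.out.println("lambda=" + lambda
                + " p(w|d)=" + smoothedProbability(2, 100, 0.01, lambda)
                + " score=" + score(2, 100, 0.01, lambda));
        }
    }
}
```

A small lambda trusts the document statistics more, while a lambda close to 1 pulls every document towards the collection model, so the score differences between documents shrink.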
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Done. Actually, I wanted to implement the norm table in the way you said, but somehow forgot about it. Two questions remain on my side: * the one about discountOverlaps (see above) * what kind of index-time boosts do people usually use? Too large a boost might cause problems if we just divide the length by it. Maybe we should take the logarithm or something like that? > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/query/scoring, core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc, gsoc2011 > Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > Done: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > * The so-called _Information-Based Models_
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Removed reflection from IBSimilarity. > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/query/scoring, core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc, gsoc2011 > Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > Done: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > * The so-called _Information-Based Models_ -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Deleted the accidentally forgotten abstract modifier from the Distribution classes. > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/query/scoring, core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc, gsoc2011 > Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > Done: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > * The so-called _Information-Based Models_ -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079555#comment-13079555 ] David Mark Nemeskey edited comment on LUCENE-3220 at 8/4/11 8:04 PM: - Deleted the accidentally forgotten abstract modifier from the Distribution classes. was (Author: david_nemeskey): Deleted the accidentally forgot abstract modifier from the Distribution classes. > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/query/scoring, core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc, gsoc2011 > Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > Done: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > * The so-called _Information-Based Models_ -- This message is automatically generated by JIRA. 
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch EasySimilarity now computes norms in the same way as DefaultSimilarity. Actually not exactly the same way, as I have not yet added the discountOverlaps property. I think it would be a good idea for EasySimilarity as well (it is for phrases, right?). What do you reckon? I also wrote a quick test to see which norm (length directly or 1/sqrt) is closer to the original value, and it seems that the direct one is usually much closer (RMSE is 0.09689688608375747 vs 0.23787634482532286). Of course, I know it is much more important that the new Similarities can use existing indices. > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/query/scoring, core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc, gsoc2011 > Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > Done: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. 
Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > * The so-called _Information-Based Models_ -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
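The comparison described in the message above (storing the length directly versus storing 1/sqrt(length) in a single byte) can be reproduced roughly as follows. This is a sketch under assumptions, not the author's test: the 8-bit float codec below (3 mantissa bits, 5 exponent bits) is written from memory in the style of Lucene's one-byte norm encoding, and the error measure (relative error of the recovered length over lengths 1..maxLength) is a guess, so the numbers it prints need not match the RMSE values quoted in the message.

```java
// Hypothetical re-creation of the norm precision experiment; codec and metric are assumptions.
final class NormEncodingSketch {

    /** Encodes a float into one byte: 3 mantissa bits, 5 exponent bits, truncating. */
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        if (small <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // zero/negative, or underflow to smallest
        }
        if (small >= ((63 - 15) << 3) + 0x100) {
            return -1;                                // overflow to largest value
        }
        return (byte) (small - ((63 - 15) << 3));
    }

    /** Inverse of encode(), up to the precision lost by truncation. */
    static float decode(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    /** RMSE of the relative length error when the length itself is stored. */
    static double rmseDirect(int maxLength) {
        double sum = 0;
        for (int len = 1; len <= maxLength; len++) {
            double recovered = decode(encode((float) len));
            double rel = (recovered - len) / len;
            sum += rel * rel;
        }
        return Math.sqrt(sum / maxLength);
    }

    /** RMSE of the relative length error when 1/sqrt(length) is stored. */
    static double rmseInvSqrt(int maxLength) {
        double sum = 0;
        for (int len = 1; len <= maxLength; len++) {
            double decoded = decode(encode((float) (1.0 / Math.sqrt(len))));
            double recovered = 1.0 / (decoded * decoded);
            double rel = (recovered - len) / len;
            sum += rel * rel;
        }
        return Math.sqrt(sum / maxLength);
    }

    public static void main(String[] args) {
        System.out.println("direct:  " + rmseDirect(1000));
        System.out.println("1/sqrt:  " + rmseInvSqrt(1000));
    }
}
```

The intuition is that squaring the decoded 1/sqrt value to get the length back roughly doubles the relative truncation error, which is consistent with the direction of the result reported in the message.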
[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3357: Labels: gsoc gsoc2011 (was: ) > Unit and integration test cases for the new Similarities > > > Key: LUCENE-3357 > URL: https://issues.apache.org/jira/browse/LUCENE-3357 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/query/scoring >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey >Priority: Minor > Labels: gsoc, gsoc2011, test > Fix For: flexscoring branch > > > Write test cases to test the new Similarities added in > [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of > test cases will be created: > * unit tests, in which mock statistics are provided to the Similarities and > the score is validated against hand calculations; > * integration tests, in which a small collection is indexed and then > searched using the Similarities. > Performance tests will be performed in a separate issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3357: Labels: gsoc gsoc2011 test (was: gsoc gsoc2011) > Unit and integration test cases for the new Similarities > > > Key: LUCENE-3357 > URL: https://issues.apache.org/jira/browse/LUCENE-3357 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/query/scoring >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey >Priority: Minor > Labels: gsoc, gsoc2011, test > Fix For: flexscoring branch > > > Write test cases to test the new Similarities added in > [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of > test cases will be created: > * unit tests, in which mock statistics are provided to the Similarities and > the score is validated against hand calculations; > * integration tests, in which a small collection is indexed and then > searched using the Similarities. > Performance tests will be performed in a separate issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Component/s: core/query/scoring Labels: gsoc gsoc2011 (was: gsoc) > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/query/scoring, core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc, gsoc2011 > Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > Done: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > * The so-called _Information-Based Models_ -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3357) Unit and integration test cases for the new Similarities
Unit and integration test cases for the new Similarities Key: LUCENE-3357 URL: https://issues.apache.org/jira/browse/LUCENE-3357 Project: Lucene - Java Issue Type: Sub-task Components: core/query/scoring Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Priority: Minor Fix For: flexscoring branch Write test cases to test the new Similarities added in [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of test cases will be created: * unit tests, in which mock statistics are provided to the Similarities and the score is validated against hand calculations; * integration tests, in which a small collection is indexed and then searched using the Similarities. Performance tests will be performed in a separate issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
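To make the first kind of test concrete, here is a minimal sketch of checking a score against a hand calculation using mock statistics. It does not use the Similarity classes from LUCENE-3220; the class is hypothetical, and the BM25 variant (idf = ln(1 + (N - n + 0.5) / (n + 0.5)), k1 = 1.2, b = 0.75) is an assumption chosen because the arithmetic is easy to verify by hand.

```java
// Hypothetical stand-alone check; the BM25 variant and all names are assumptions.
final class Bm25HandCheckSketch {

    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    static double score(double tf, double docLen, double avgDocLen,
                        long docCount, long docFreq, double k1, double b) {
        double lengthNorm = 1.0 - b + b * docLen / avgDocLen;
        return idf(docCount, docFreq) * tf * (k1 + 1.0) / (tf + k1 * lengthNorm);
    }

    public static void main(String[] args) {
        // Mock statistics: 10 documents, the term occurs in 1 of them,
        // tf = 2 in a document of exactly average length.
        double actual = score(2, 50, 50, 10, 1, 1.2, 0.75);

        // Hand calculation:
        //   idf     = ln(1 + 9.5 / 1.5) = ln(22 / 3)
        //   tf part = 2 * 2.2 / (2 + 1.2 * 1) = 4.4 / 3.2 = 1.375
        double expected = Math.log(22.0 / 3.0) * 1.375;

        if (Math.abs(actual - expected) > 1e-9) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
        System.out.println("BM25 mock-statistics check passed");
    }
}
```

Picking a document of average length makes the length normalization factor exactly 1, which keeps the hand calculation short; further cases would vary one statistic at a time.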
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Added norm decoding table to EasySimilarity, and removed sumTotalFreq. Sorry I could only upload this patch now but I didn't have time to work on Lucene the last week. As I see, all the problems you mentioned have been corrected, so maybe we can go on with the review? > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > Done: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > * The so-called _Information-Based Models_ -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
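The idea behind a norm decoding table, as mentioned in the update above, is that a one-byte norm can take only 256 values, so the decoded value for each can be computed once and then looked up during scoring. The sketch below illustrates that idea only; the class name and the pluggable decoder are hypothetical and do not reflect how EasySimilarity is actually written.

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical illustration of a 256-entry norm cache; not the patch's code.
final class NormTableSketch {

    private final float[] table = new float[256];

    /** Precomputes the decoded value for every possible unsigned byte value 0..255. */
    NormTableSketch(IntToDoubleFunction decoder) {
        for (int i = 0; i < table.length; i++) {
            table[i] = (float) decoder.applyAsDouble(i);
        }
    }

    /** Looks up the decoded value; the mask turns the signed byte into an index 0..255. */
    float decode(byte norm) {
        return table[norm & 0xFF];
    }

    public static void main(String[] args) {
        // Assumed toy decoder: the stored byte is half the document length.
        NormTableSketch lengths = new NormTableSketch(i -> i * 2.0);
        System.out.println(lengths.decode((byte) 3));
        System.out.println(lengths.decode((byte) -1));
    }
}
```

The `& 0xFF` mask matters because Java bytes are signed; without it, any stored value above 127 would produce a negative array index.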
[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070654#comment-13070654 ] David Mark Nemeskey commented on LUCENE-3220: - I think I realized what I wanted with numberOfFieldTokens. I was afraid that sumTotalTermFreq is affected by norms / index-time boost / etc., and I wanted to make numberOfFieldTokens unaffected by those (I no longer know how); only I forgot to do so. But if sumTotalTermFreq is really just the number of tokens in the field, I will delete one of them. I am not sure which, because to me numberOfFieldTokens seems a more descriptive name than sumTotalTermFreq, but the latter is used everywhere in Lucene. May I ask your opinion on this question? > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > Done: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. 
Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > * The so-called _Information-Based Models_ -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Fixed two of the issues you mentioned: * Apache license header added to all files in the similarities package; * cleaned up the constructor of DFRSimilarity and added a few new ones. I have not yet moved the NoNormalization and NoAfterEffect classes to their own files, because I feel a bit uncomfortable about the naming, since it's different from that of the other classes, e.g. NormalizationH2 vs NoNormalization. > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > Done: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > * The so-called _Information-Based Models_ -- This message is automatically generated by JIRA. 
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Made the score() and explain() methods in Similarity components final. > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > Done: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > * The so-called _Information-Based Models_ -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Explanation added to LM models; query boost added. > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > Done: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > * The so-called _Information-Based Models_ -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Description: With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. Done: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current "mock" implementation might be OK * _LM_ * _DFR_ * The so-called _Information-Based Models_ was: With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current "mock" implementation might be OK * _LM_ * _DFR_ Done: > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. 
Currently DFR, > BM25 and LM are on the menu. > Done: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > * The so-called _Information-Based Models_ -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Added LMSimilarity so that the two LM methods have a common parent. It also defines the CollectionModel interface which computes p(w|C) in a pluggable way (and only once per query, though I am not sure this improves performance as I need a cast in score()). > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > TODO: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
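The pluggable p(w|C) computation described in the update above could look roughly like the following. This is only a sketch: the names CollectionModelSketch, CollectionModelDemo and QueryStats, the method signature, and both estimators are assumptions made for illustration and are not taken from the attached patch.

```java
/** Hypothetical interface: computes the collection probability p(w|C) of a term. */
interface CollectionModelSketch {
    /**
     * @param totalTermFreq    occurrences of the term in the whole collection
     * @param totalFieldTokens number of tokens in the whole collection
     */
    double probability(long totalTermFreq, long totalFieldTokens);
}

final class CollectionModelDemo {
    /** Assumed default: plain maximum-likelihood estimate. */
    static final CollectionModelSketch MLE =
        (ttf, tokens) -> (double) ttf / tokens;

    /** Alternative estimator, included only to show that the model is swappable. */
    static final CollectionModelSketch ADD_ONE =
        (ttf, tokens) -> (ttf + 1.0) / (tokens + 1.0);

    /** Per-query state: p(w|C) is computed once here and reused for every document. */
    static final class QueryStats {
        final double collectionProbability;

        QueryStats(CollectionModelSketch model, long totalTermFreq, long totalFieldTokens) {
            this.collectionProbability = model.probability(totalTermFreq, totalFieldTokens);
        }
    }

    public static void main(String[] args) {
        QueryStats stats = new QueryStats(MLE, 50, 10_000);
        System.out.println("p(w|C) = " + stats.collectionProbability);
    }
}
```

Storing the value in a typed stats object is what makes the "once per query" part work; the cast mentioned in the update would be needed wherever score() receives the stats through a more general type.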
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch * Fixed #1 * Added a totalBoost to EasySimilarity, and a getter method -- no one uses it yet * Added basic implementations for the Jelinek-Mercer and the Dirichlet LM methods. As for the last one: the implementation is very basic now; I want to factor a few things out (e.g. p(w|C) to LMStats, possibly in a pluggable way so people can implement it however they want). It also doesn't seem right to have the same LM method implemented twice (both as MockLMSimilarity and here), so I'll take a look to see if I can merge those two. Finally, I am wondering whether I should implement the absolute discounting method, which, according to the paper, seems inferior to the Jelinek-Mercer and Dirichlet methods. Right now I am more on the "no" side. > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > TODO: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities.
Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
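For reference, the two smoothing methods named in the update above can be written down in a few lines. The sketch below uses the textbook forms of Jelinek-Mercer and Dirichlet smoothing; the class name, method names and parameter names are hypothetical and do not come from the patch, and a real Similarity would typically score with a log-based transformation of these probabilities rather than return them directly.

```java
/** Hypothetical helper showing the smoothed document language model p(w|d). */
final class LmSmoothingSketch {
    /**
     * Jelinek-Mercer: linear interpolation between the document model and the
     * collection model, p(w|d) = (1 - lambda) * tf / |d| + lambda * p(w|C).
     */
    static double jelinekMercer(double tf, double docLen, double pwc, double lambda) {
        return (1.0 - lambda) * (tf / docLen) + lambda * pwc;
    }

    /**
     * Dirichlet prior: p(w|d) = (tf + mu * p(w|C)) / (|d| + mu).
     * Longer documents are smoothed less than shorter ones.
     */
    static double dirichlet(double tf, double docLen, double pwc, double mu) {
        return (tf + mu * pwc) / (docLen + mu);
    }

    public static void main(String[] args) {
        // Illustrative numbers only: tf = 2, |d| = 10, p(w|C) = 0.01.
        System.out.println("JM:        " + jelinekMercer(2, 10, 0.01, 0.5));
        System.out.println("Dirichlet: " + dirichlet(2, 10, 0.01, 10));
    }
}
```

Both methods take p(w|C) as an input, which is why factoring that computation out into a shared, pluggable place is a natural next step.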
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch * log2() moved from DFRSimilarity to EasySimilarity, * changed DFRSimilarity so that its constructor does not use reflection. > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > TODO: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
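A shared log2() helper of the kind mentioned in the update above is usually a one-liner based on the change-of-base rule. The sketch below is an assumption about its shape, not the code from the patch, and the class name Log2Sketch is made up for the example.

```java
/** Hypothetical stand-in for a base class that offers log2() to all similarities. */
final class Log2Sketch {
    /** Base-2 logarithm via change of base: log2(x) = ln(x) / ln(2). */
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    public static void main(String[] args) {
        System.out.println(log2(8.0));
    }
}
```

Because the result goes through floating-point division, callers should compare it against expected values with a small tolerance rather than with ==.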
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Fixed a few things in MockBM25Similarity. > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > TODO: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Information-based model framework due to Clinchant and Gaussier added. > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > TODO: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Explanation-handling added to EasySimilarity and DFRSimilarity. > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > TODO: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Made the signature of EasySimilarity.score() a bit saner. > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > TODO: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Implementation of the DFR framework added. Lots of nocommits, though. A few things to think about: * lots of (float) conversions. Maybe the inner API (BasicModel, etc.) could use doubles? In my experience, double is faster anyway, at least on 64-bit architectures * I am not overly happy about the naming scheme, i.e. BasicModelBE, etc. Maybe we should do it the same way as in Terrier, with a basicmodel package and class names like BE? * A regular SimilarityProvider implementation won't play well with DFRSimilarity, in case the user wants to use several different setups. Actually, this is a problem for all similarities that have parameters (e.g. BM25 with b and k). Also, I think we need that NormConverter we talked about earlier on IRC, so that the Similarities can run on any index. > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > TODO: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities.
Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
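To make the parameter problem raised above concrete, here is a minimal sketch of a BM25 term score whose parameters live in the instance, so that two differently tuned setups can coexist. It uses the textbook BM25 formula with double arithmetic throughout; the class name Bm25ParamsSketch is hypothetical, the "k" from the comment is written as k1, and none of this is the code from the patch.

```java
/** Hypothetical BM25 scorer that carries its own parameters. */
final class Bm25ParamsSketch {
    private final double k1; // term-frequency saturation
    private final double b;  // strength of document-length normalization, in [0, 1]

    Bm25ParamsSketch(double k1, double b) {
        this.k1 = k1;
        this.b = b;
    }

    /** Textbook BM25 term score: idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl)). */
    double score(double idf, double tf, double docLen, double avgDocLen) {
        double norm = k1 * (1.0 - b + b * docLen / avgDocLen);
        return idf * tf * (k1 + 1.0) / (tf + norm);
    }

    public static void main(String[] args) {
        // Two setups side by side, e.g. one per field.
        Bm25ParamsSketch standard = new Bm25ParamsSketch(1.2, 0.75);
        Bm25ParamsSketch noLengthNorm = new Bm25ParamsSketch(1.2, 0.0);
        System.out.println(standard.score(1.0, 3.0, 200.0, 100.0));
        System.out.println(noLengthNorm.score(1.0, 3.0, 200.0, 100.0));
    }
}
```

A provider that hands out one shared instance per Similarity class cannot express this; it would need to return differently configured instances depending on the field or setup.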
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch EasySimilarity added. Lots of questions and nocommit in the code. > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch, LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > TODO: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Comment: was deleted (was: Done.) > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > TODO: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Done. > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > TODO: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: (was: LUCENE-3220.patch) > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > TODO: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Done. > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, > LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > TODO: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Oh, sorry, how lame of me :( Actually I am working now on a different machine than the one I usually use, so that's why I made those mistakes. Anyhow, I have fixed them. > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > TODO: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch, LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > TODO: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052025#comment-13052025 ] David Mark Nemeskey commented on LUCENE-3220: - * I was wondering about that too -- actually docNo is a mistake, it should have been noDocs or noOfDocs anyway, but I guess I'll just go with numberOfDocuments. * I'll put a nocommit there for the time being, and if no sims use it, I'll just remove it from the Stats. Terrier has it, though, so I guess there should be at least one method that depends on it. > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > TODO: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch EasyStats object added. > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > TODO: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Issue Type: Sub-task (was: New Feature) Parent: LUCENE-2959 > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > TODO: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Description: With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current "mock" implementation might be OK * _LM_ * _DFR_ Done: was: With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO: * `EasyStats`: contains all statistics that might be relevant for a ranking algorithm * `EasySimilarity`: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current "mock" implementation might be OK * _LM_ * _DFR_ Done: > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: New Feature > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > TODO: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. 
Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3220) Implement various ranking models as Similarities
Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: New Feature Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO: * `EasyStats`: contains all statistics that might be relevant for a ranking algorithm * `EasySimilarity`: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current "mock" implementation might be OK * _LM_ * _DFR_ Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org