[jira] [Updated] (SOLR-2754) create Solr similarity factories for new ranking algorithms

2011-09-18 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated SOLR-2754:
--

Attachment: SOLR-2754.patch

Done.

> create Solr similarity factories for new ranking algorithms
> ---
>
> Key: SOLR-2754
> URL: https://issues.apache.org/jira/browse/SOLR-2754
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 4.0
>Reporter: Robert Muir
>Assignee: Robert Muir
> Attachments: SOLR-2754.patch, SOLR-2754.patch
>
>
> To make it easy to use some of the new ranking algorithms, we should add 
> factories to solr:
> * for parametric models like LM and BM25 so that parameters can be set from 
> schema.xml
> * for framework models like DFR and IB, so that different basic 
> models/normalizations/lambdas can be chosen
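
As a rough illustration of the second bullet, a factory for a framework model could map short names from the configuration to component choices. The sketch below is plain Java with hypothetical stand-in types ({{DfrConfigSketch}}, its two enums) and made-up parameter keys ({{basicModel}}, {{normalization}}); it does not reproduce the API of the attached patch.

```java
import java.util.Map;

final class DfrConfigSketch {
    // Illustrative component names only; the real set is whatever the patch defines.
    enum BasicModel { G, P }
    enum Normalization { H1, H2, NONE }

    final BasicModel basicModel;
    final Normalization normalization;

    private DfrConfigSketch(BasicModel basicModel, Normalization normalization) {
        this.basicModel = basicModel;
        this.normalization = normalization;
    }

    // Reads the hypothetical "basicModel" and "normalization" keys.
    // Enum.valueOf rejects unknown names with an IllegalArgumentException,
    // so a typo in the configuration fails loudly.
    static DfrConfigSketch fromParams(Map<String, String> params) {
        String model = params.get("basicModel");
        if (model == null) {
            throw new IllegalArgumentException("basicModel is required");
        }
        String norm = params.getOrDefault("normalization", "NONE");
        return new DfrConfigSketch(BasicModel.valueOf(model), Normalization.valueOf(norm));
    }
}
```

Failing on unknown names, rather than falling back silently, is a design choice of this sketch and not something the issue prescribes.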

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2754) create Solr similarity factories for new ranking algorithms

2011-09-17 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107069#comment-13107069
 ] 

David Mark Nemeskey commented on SOLR-2754:
---

bq. Well, we can do both: we can provide these basic parameters as default 
values to be friendly, but at the same time in the test or example xml 
configurations that use these, our examples can have the parameters set.

That's a good idea. I could modify the patch if you want me to, and also break the 
long lines into two in the meantime.

> create Solr similarity factories for new ranking algorithms
> ---
>
> Key: SOLR-2754
> URL: https://issues.apache.org/jira/browse/SOLR-2754
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 4.0
>Reporter: Robert Muir
>Assignee: Robert Muir
> Attachments: SOLR-2754.patch
>
>
> To make it easy to use some of the new ranking algorithms, we should add 
> factories to solr:
> * for parametric models like LM and BM25 so that parameters can be set from 
> schema.xml
> * for framework models like DFR and IB, so that different basic 
> models/normalizations/lambdas can be chosen




[jira] [Commented] (SOLR-2754) create Solr similarity factories for new ranking algorithms

2011-09-16 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13106795#comment-13106795
 ] 

David Mark Nemeskey commented on SOLR-2754:
---

bq. Alternative, another idea would be for all 'parametric' models to require 
the parameter? ... Part of me likes this solution: if you are using a 
parametric model then it requires you to think about it?

I can understand the reasoning behind this idea. On the other hand, for some 
models, the parameter has a value that's optimal in a wide range of cases. In 
such cases, I think we could make the life of the user easier by falling 
back to this value. (Actually, that's why {{LMJelinekMercerSimilarity}} does 
not have a default constructor; there is no single parameter value that is 
kind-of-optimal in all cases).
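
The two policies discussed here could be sketched roughly as follows. This is plain Java over a {{Map}}, not the Solr parameter API; the class name, the parameter keys and the fallback value of 2000 for mu are assumptions made for illustration.

```java
import java.util.Map;

final class ParamDefaultsSketch {
    // Assumed fallback for the Dirichlet smoothing parameter; illustrative only.
    static final float ASSUMED_DEFAULT_MU = 2000f;

    // Optional parameter: fall back to a broadly usable value when it is absent.
    static float muOrDefault(Map<String, String> params) {
        String raw = params.get("mu");
        return raw == null ? ASSUMED_DEFAULT_MU : Float.parseFloat(raw);
    }

    // Required parameter: no single value suits every collection,
    // so a missing setting is reported as a configuration error.
    static float requiredLambda(Map<String, String> params) {
        String raw = params.get("lambda");
        if (raw == null) {
            throw new IllegalArgumentException("lambda must be set explicitly");
        }
        return Float.parseFloat(raw);
    }
}
```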

bq. But i started thinking about this, say I created NormalizationRob, and it 
wants a bunch of parameters...

Yes, I know, it'd be a bit difficult to support that... maybe if all 
Similarities and models had a constructor with a map as a parameter? I'm not 
sure we want that, though.
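
For the sake of discussion, a map-taking constructor might look like the sketch below. {{MapConfiguredNormalization}}, its keys {{c}} and {{z}}, and their fallback values are all hypothetical; nothing like this exists in the patch.

```java
import java.util.Map;

// Hypothetical custom component that needs several settings at once.
final class MapConfiguredNormalization {
    final float c;
    final float z;

    // A single Map-taking constructor lets a generic factory pass along
    // whatever the configuration contains without knowing the keys.
    MapConfiguredNormalization(Map<String, String> params) {
        this.c = Float.parseFloat(params.getOrDefault("c", "1.0"));
        this.z = Float.parseFloat(params.getOrDefault("z", "0.3"));
    }
}
```

One visible drawback is that the keys are unchecked strings, so a misspelled key is silently ignored and the fallback value is used instead.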

bq. I think the intent here is to support all of lucene-core's capabilities?

In that case let's forget reflection for now.

> create Solr similarity factories for new ranking algorithms
> ---
>
> Key: SOLR-2754
> URL: https://issues.apache.org/jira/browse/SOLR-2754
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 4.0
>Reporter: Robert Muir
>Assignee: Robert Muir
> Attachments: SOLR-2754.patch
>
>
> To make it easy to use some of the new ranking algorithms, we should add 
> factories to solr:
> * for parametric models like LM and BM25 so that parameters can be set from 
> schema.xml
> * for framework models like DFR and IB, so that different basic 
> models/normalizations/lambdas can be chosen




[jira] [Commented] (SOLR-2754) create Solr similarity factories for new ranking algorithms

2011-09-16 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13106683#comment-13106683
 ] 

David Mark Nemeskey commented on SOLR-2754:
---

Robert, I've reviewed the patch. Even though I don't have any experience with 
Solr, the code is very clear, well documented and easy to understand. I have 
the following observations (or questions, more like):

1. {{LMDirichletSimilarity}} has a mu-less constructor. Maybe we could avoid 
defining a constant in two places if we used that? E.g.
{code}
// assumes getFloat returns null when "mu" is not configured
Float mu = params.getFloat("mu");
...

LMDirichletSimilarity sim = (mu != null) ? new LMDirichletSimilarity(mu)
                                         : new LMDirichletSimilarity();
{code}
Same goes for H3 and Z.

2. I think it is a nice feature of the new framework that the user can create 
new basic models, normalizations, distributions, etc. and just plug them in to 
{{DFRSimilarity}} or {{IBSimilarity}}. However, these factories can only handle 
those that we have defined ourselves. Wouldn't it be good if we could 
instantiate custom classes via reflection? It could work similarly to 
Terrier: keep the current code for core models, and use reflection if the user 
specifies a fully qualified class name.

3. I don't know the Lucene/Solr conventions for line length. There are some 
rather long lines in IB and DFR, but maybe it's not a problem?
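
The reflection idea in point 2 could be sketched roughly as below. The {{Component}} interface and {{BuiltInG}} class are hypothetical stand-ins for a basic model or normalization; only the standard {{java.lang.Class}} calls are real.

```java
final class ReflectiveLoaderSketch {
    // Hypothetical plug-in interface standing in for a basic model or normalization.
    interface Component { String name(); }

    static final class BuiltInG implements Component {
        public String name() { return "G"; }
    }

    static Component load(String name) {
        // Core components keep their short names, as in the current factories.
        if ("G".equals(name)) {
            return new BuiltInG();
        }
        // Anything else is treated as a fully qualified class name
        // and must provide a no-argument constructor.
        try {
            Class<? extends Component> type = Class.forName(name).asSubclass(Component.class);
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("cannot instantiate " + name, e);
        }
    }
}
```

Requiring a no-argument constructor keeps the sketch short; components that need parameters would require some further convention.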

> create Solr similarity factories for new ranking algorithms
> ---
>
> Key: SOLR-2754
> URL: https://issues.apache.org/jira/browse/SOLR-2754
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 4.0
>Reporter: Robert Muir
>Assignee: Robert Muir
> Attachments: SOLR-2754.patch
>
>
> To make it easy to use some of the new ranking algorithms, we should add 
> factories to solr:
> * for parametric models like LM and BM25 so that parameters can be set from 
> schema.xml
> * for framework models like DFR and IB, so that different basic 
> models/normalizations/lambdas can be chosen




[jira] [Commented] (LUCENE-2959) [GSoC] Implementing State of the Art Ranking for Lucene

2011-08-25 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090872#comment-13090872
 ] 

David Mark Nemeskey commented on LUCENE-2959:
-

Hi Robert,

I would very much like to run this test on the other sims as well. How do I do 
that?

David



> [GSoC] Implementing State of the Art Ranking for Lucene
> ---
>
> Key: LUCENE-2959
> URL: https://issues.apache.org/jira/browse/LUCENE-2959
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/query/scoring, general/javadocs, modules/examples
>Reporter: David Mark Nemeskey
>Assignee: Robert Muir
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: flexscoring branch
>
> Attachments: LUCENE-2959_mockdfr.patch, implementation_plan.pdf, 
> proposal.pdf
>
>
> Lucene employs the Vector Space Model (VSM) to rank documents, which compares
> unfavorably to state of the art algorithms, such as BM25. Moreover, the
> architecture is tailored specifically to VSM, which makes the addition of new
> ranking functions a non-trivial task.
> This project aims to bring state of the art ranking methods to Lucene and to
> implement a query architecture with pluggable ranking functions.
> The wiki page for the project can be found at
> http://wiki.apache.org/lucene-java/SummerOfCode2011ProjectRanking.
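
To make "pluggable ranking functions" concrete, here is a minimal sketch in plain Java. The {{TermScorer}} interface and both factory methods are hypothetical and unrelated to the classes on the flexscoring branch. The BM25 formula is the textbook per-term form with parameters k1 and b, and the tf-idf variant is a deliberately crude stand-in for a VSM-style weight.

```java
final class RankingSketch {
    // Hypothetical plug-in point: scores one term in one document.
    interface TermScorer {
        double score(double tf, double docLen, double avgDocLen, long docCount, long docFreq);
    }

    // Textbook BM25 term score: idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl)).
    static TermScorer bm25(double k1, double b) {
        return (tf, docLen, avgDocLen, docCount, docFreq) -> {
            double idf = Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
            double norm = k1 * (1.0 - b + b * docLen / avgDocLen);
            return idf * tf * (k1 + 1.0) / (tf + norm);
        };
    }

    // Crude tf-idf weight, only to show that callers do not care which scorer they get.
    static TermScorer tfIdf() {
        return (tf, docLen, avgDocLen, docCount, docFreq) ->
            Math.sqrt(tf) * (1.0 + Math.log((double) docCount / (docFreq + 1.0)));
    }
}
```

A caller would hold a {{TermScorer}} reference and swap {{RankingSketch.bm25(1.2, 0.75)}} for {{RankingSketch.tfIdf()}} without any other change; the values 1.2 and 0.75 are commonly used settings, not a recommendation from this thread.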




[jira] [Commented] (LUCENE-2959) [GSoC] Implementing State of the Art Ranking for Lucene

2011-08-23 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089449#comment-13089449
 ] 

David Mark Nemeskey commented on LUCENE-2959:
-

Robert: maybe we could resolve this issue as well? Once we decide what to do 
with 3173 -- perhaps a won'tfix?

> [GSoC] Implementing State of the Art Ranking for Lucene
> ---
>
> Key: LUCENE-2959
> URL: https://issues.apache.org/jira/browse/LUCENE-2959
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/query/scoring, general/javadocs, modules/examples
>Reporter: David Mark Nemeskey
>Assignee: Robert Muir
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: flexscoring branch
>
> Attachments: LUCENE-2959_mockdfr.patch, implementation_plan.pdf, 
> proposal.pdf
>
>
> Lucene employs the Vector Space Model (VSM) to rank documents, which compares
> unfavorably to state of the art algorithms, such as BM25. Moreover, the
> architecture is tailored specifically to VSM, which makes the addition of new
> ranking functions a non-trivial task.
> This project aims to bring state of the art ranking methods to Lucene and to
> implement a query architecture with pluggable ranking functions.
> The wiki page for the project can be found at
> http://wiki.apache.org/lucene-java/SummerOfCode2011ProjectRanking.




[jira] [Commented] (LUCENE-3387) Get javadoc for the similarities package in shape

2011-08-23 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089443#comment-13089443
 ] 

David Mark Nemeskey commented on LUCENE-3387:
-

bq. This is because of an out of date regexp in the javadocs construction.

I've found that, I just didn't know what to make of it. Since as far as I know 
a similarities package hadn't existed before I added the new sims, I assumed it 
was there on purpose.

> Get javadoc for the similarities package in shape
> -
>
> Key: LUCENE-3387
> URL: https://issues.apache.org/jira/browse/LUCENE-3387
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring, general/javadocs
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011, javadoc
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3387.patch, LUCENE-3387.patch
>
>
> 1. Create a package.html in the similarities package.
> 2. Update the javadoc of the search package (package.html mentions 
> Similarity)?
> 3. Compile the javadoc to see if there are any warnings.




[jira] [Updated] (LUCENE-3393) Rename EasySimilarity to SimilarityBase

2011-08-22 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3393:


Attachment: LUCENE-3393.patch

Renamed
- EasySimilarity to SimilarityBase
- EasyStats to BasicStats
- Easy*DocScorer to Basic*DocScorer


> Rename EasySimilarity to SimilarityBase
> ---
>
> Key: LUCENE-3393
> URL: https://issues.apache.org/jira/browse/LUCENE-3393
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring, general/javadocs, modules/examples
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3393.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>





[jira] [Updated] (LUCENE-3391) Make EasySimilarityProvider a full-fledged class

2011-08-22 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3391:


Attachment: LUCENE-3391.patch

Fixed the issues you mentioned.

> Make EasySimilarityProvider a full-fledged class 
> -
>
> Key: LUCENE-3391
> URL: https://issues.apache.org/jira/browse/LUCENE-3391
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring, general/javadocs
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011, rank, similarity
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3391.patch, LUCENE-3391.patch, LUCENE-3391.patch, 
> LUCENE-3391.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The {{EasySimilarityProvider}} in {{TestEasySimilarity}} would be a good 
> candidate for a full-fledged class. Both {{DefaultSimilarity}} and 
> {{BM25Similarity}} have their own providers, which are effectively the 
> same, so I don't see why we couldn't add one generic provider for convenience.
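
The "one generic provider" idea from the description can be sketched as follows. The {{Similarity}} and {{SimilarityProvider}} interfaces here are simplified, hypothetical stand-ins, and {{SingleSimilarityProvider}} is an invented name; the real types on the flexscoring branch have more methods.

```java
final class ProviderSketch {
    // Simplified, hypothetical stand-ins for the branch's types.
    interface Similarity { }
    interface SimilarityProvider { Similarity get(String field); }

    // One generic provider that hands out the same Similarity for every field,
    // so individual similarities no longer need a provider class of their own.
    static final class SingleSimilarityProvider implements SimilarityProvider {
        private final Similarity sim;

        SingleSimilarityProvider(Similarity sim) {
            this.sim = sim;
        }

        @Override
        public Similarity get(String field) {
            return sim;
        }
    }
}
```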




[jira] [Created] (LUCENE-3393) Rename EasySimilarity to SimilarityBase

2011-08-22 Thread David Mark Nemeskey (JIRA)
Rename EasySimilarity to SimilarityBase
---

 Key: LUCENE-3393
 URL: https://issues.apache.org/jira/browse/LUCENE-3393
 Project: Lucene - Java
  Issue Type: Sub-task
Reporter: David Mark Nemeskey
Assignee: David Mark Nemeskey







[jira] [Commented] (LUCENE-3391) Make EasySimilarityProvider a full-fledged class

2011-08-22 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088743#comment-13088743
 ] 

David Mark Nemeskey commented on LUCENE-3391:
-

(1) I was also hesitant to add the generics, because I wasn't sure about the 
warnings it gave. So I'll remove that happily.
(2) And I guess the method parameter in queryNorm?
(3) I'm pretty bad at naming things, so I'd take your advice in this. :) Is 
BasicSimilarityProvider OK?

> Make EasySimilarityProvider a full-fledged class 
> -
>
> Key: LUCENE-3391
> URL: https://issues.apache.org/jira/browse/LUCENE-3391
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring, general/javadocs
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011, rank, similarity
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3391.patch, LUCENE-3391.patch, LUCENE-3391.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The {{EasySimilarityProvider}} in {{TestEasySimilarity}} would be a good 
> candidate for a full-fledged class. Both {{DefaultSimilarity}} and 
> {{BM25Similarity}} have their own providers, which are effectively the 
> same, so I don't see why we couldn't add one generic provider for convenience.




[jira] [Updated] (LUCENE-3391) Make EasySimilarityProvider a full-fledged class

2011-08-22 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3391:


Attachment: LUCENE-3391.patch

Got rid of BM25SimilarityProvider.

> Make EasySimilarityProvider a full-fledged class 
> -
>
> Key: LUCENE-3391
> URL: https://issues.apache.org/jira/browse/LUCENE-3391
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring, general/javadocs
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011, rank, similarity
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3391.patch, LUCENE-3391.patch, LUCENE-3391.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The {{EasySimilarityProvider}} in {{TestEasySimilarity}} would be a good 
> candidate for a full-fledged class. Both {{DefaultSimilarity}} and 
> {{BM25Similarity}} have their own providers, which are effectively the 
> same, so I don't see why we couldn't add one generic provider for convenience.




[jira] [Updated] (LUCENE-3387) Get javadoc for the similarities package in shape

2011-08-22 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3387:


Attachment: LUCENE-3387.patch

Fixed a typo.

> Get javadoc for the similarities package in shape
> -
>
> Key: LUCENE-3387
> URL: https://issues.apache.org/jira/browse/LUCENE-3387
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring, general/javadocs
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011, javadoc
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3387.patch, LUCENE-3387.patch
>
>
> 1. Create a package.html in the similarities package.
> 2. Update the javadoc of the search package (package.html mentions 
> Similarity)?
> 3. Compile the javadoc to see if there are any warnings.




[jira] [Updated] (LUCENE-3391) Make EasySimilarityProvider a full-fledged class

2011-08-22 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3391:


Attachment: LUCENE-3391.patch

Hinted at EasySimilarityProvider in the package javadoc.

> Make EasySimilarityProvider a full-fledged class 
> -
>
> Key: LUCENE-3391
> URL: https://issues.apache.org/jira/browse/LUCENE-3391
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring, general/javadocs
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011, rank, similarity
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3391.patch, LUCENE-3391.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The {{EasySimilarityProvider}} in {{TestEasySimilarity}} would be a good 
> candidate for a full-fledged class. Both {{DefaultSimilarity}} and 
> {{BM25Similarity}} have their own providers, which are effectively the 
> same, so I don't see why we couldn't add one generic provider for convenience.




[jira] [Updated] (LUCENE-3391) Make EasySimilarityProvider a full-fledged class

2011-08-22 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3391:


Attachment: LUCENE-3391.patch

EasySimilarityProvider added.

> Make EasySimilarityProvider a full-fledged class 
> -
>
> Key: LUCENE-3391
> URL: https://issues.apache.org/jira/browse/LUCENE-3391
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring, general/javadocs
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011, rank, similarity
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3391.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The {{EasySimilarityProvider}} in {{TestEasySimilarity}} would be a good 
> candidate for a full-fledged class. Both {{DefaultSimilarity}} and 
> {{BM25Similarity}} have their own providers, which are effectively the 
> same, so I don't see why we couldn't add one generic provider for convenience.




[jira] [Created] (LUCENE-3391) Make EasySimilarityProvider a full-fledged class

2011-08-22 Thread David Mark Nemeskey (JIRA)
Make EasySimilarityProvider a full-fledged class 
-

 Key: LUCENE-3391
 URL: https://issues.apache.org/jira/browse/LUCENE-3391
 Project: Lucene - Java
  Issue Type: Sub-task
Reporter: David Mark Nemeskey
Assignee: David Mark Nemeskey







[jira] [Updated] (LUCENE-3391) Make EasySimilarityProvider a full-fledged class

2011-08-22 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3391:


Component/s: (was: modules/examples)
Description: The {{EasySimilarityProvider}} in {{TestEasySimilarity}} would 
be a good candidate for a full-fledged class. Both {{DefaultSimilarity}} and 
{{BM25Similarity}} have their own providers, which are effectively the same, so 
I don't see why we couldn't add one generic provider for convenience.
 Labels: gsoc gsoc2011 rank similarity  (was: )

> Make EasySimilarityProvider a full-fledged class 
> -
>
> Key: LUCENE-3391
> URL: https://issues.apache.org/jira/browse/LUCENE-3391
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring, general/javadocs
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011, rank, similarity
> Fix For: flexscoring branch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The {{EasySimilarityProvider}} in {{TestEasySimilarity}} would be a good 
> candidate for a full-fledged class. Both {{DefaultSimilarity}} and 
> {{BM25Similarity}} have their own providers, which are effectively the 
> same, so I don't see why we couldn't add one generic provider for convenience.




[jira] [Commented] (LUCENE-3387) Get javadoc for the similarities package in shape

2011-08-22 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088622#comment-13088622
 ] 

David Mark Nemeskey commented on LUCENE-3387:
-

The {{similarities}} package shows up in the 'core', even though it is 
classified as 'contrib' for javadocs-all. However, since the class 
{{Similarity}} is now in {{similarities}}, shouldn't it be core as well?

> Get javadoc for the similarities package in shape
> -
>
> Key: LUCENE-3387
> URL: https://issues.apache.org/jira/browse/LUCENE-3387
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring, general/javadocs
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011, javadoc
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3387.patch
>
>
> 1. Create a package.html in the similarities package.
> 2. Update the javadoc of the search package (package.html mentions 
> Similarity)?
> 3. Compile the javadoc to see if there are any warnings.




[jira] [Updated] (LUCENE-3387) Get javadoc for the similarities package in shape

2011-08-22 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3387:


Attachment: LUCENE-3387.patch

1. Fixed the javadoc warnings in EasySimilarity.
2. Okapi paper reference added to BM25Similarity.
3. Added package-level javadoc for the similarities package.
4. Moved the "Changing Similarities" part from search to similarities.


> Get javadoc for the similarities package in shape
> -
>
> Key: LUCENE-3387
> URL: https://issues.apache.org/jira/browse/LUCENE-3387
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring, general/javadocs
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011, javadoc
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3387.patch
>
>
> 1. Create a package.html in the similarities package.
> 2. Update the javadoc of the search package (package.html mentions 
> Similarity)?
> 3. Compile the javadoc to see if there are any warnings.




[jira] [Commented] (LUCENE-3386) Integrate MockBM25Similarity and MockLMSimilarity into the framework

2011-08-22 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088614#comment-13088614
 ] 

David Mark Nemeskey commented on LUCENE-3386:
-

I decided against step 5, at least for now, so I propose we resolve this issue.

> Integrate MockBM25Similarity and MockLMSimilarity into the framework
> 
>
> Key: LUCENE-3386
> URL: https://issues.apache.org/jira/browse/LUCENE-3386
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011, rank
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3386.patch, LUCENE-3386.patch, LUCENE-3386.patch, 
> LUCENE-3386.patch, LUCENE-3386.patch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Steps:
> 1. Decide if {{MockLMSimilarity}} is needed at all (we have 
> {{LMDirichletSimilarity}})
> 2. Move the classes to the similarities package
> 3. Move the similarities package to src/
> 4. Move all sims (inc. Similarity) to similarities
> 5. Make MockBM25Similarity a subclass of EasySimilarity?




[jira] [Updated] (LUCENE-3386) Integrate MockBM25Similarity and MockLMSimilarity into the framework

2011-08-21 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3386:


Attachment: LUCENE-3386.patch

Moved all sims to similarities.

> Integrate MockBM25Similarity and MockLMSimilarity into the framework
> 
>
> Key: LUCENE-3386
> URL: https://issues.apache.org/jira/browse/LUCENE-3386
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011, rank
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3386.patch, LUCENE-3386.patch, LUCENE-3386.patch, 
> LUCENE-3386.patch, LUCENE-3386.patch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Steps:
> 1. Decide if {{MockLMSimilarity}} is needed at all (we have 
> {{LMDirichletSimilarity}})
> 2. Move the classes to the similarities package
> 3. Move the similarities package to src/
> 4. Move all sims (inc. Similarity) to similarities
> 5. Make MockBM25Similarity a subclass of EasySimilarity?




[jira] [Updated] (LUCENE-3387) Get javadoc for the similarities package in shape

2011-08-21 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3387:


Component/s: (was: modules/examples)
   Due Date: 21/Aug/11
Description: 
1. Create a package.html in the similarities package.
2. Update the javadoc of the search package (package.html mentions Similarity)?
3. Compile the javadoc to see if there are any warnings.
 Labels: gsoc gsoc2011 javadoc  (was: )

> Get javadoc for the similarities package in shape
> -
>
> Key: LUCENE-3387
> URL: https://issues.apache.org/jira/browse/LUCENE-3387
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring, general/javadocs
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011, javadoc
> Fix For: flexscoring branch
>
>
> 1. Create a package.html in the similarities package.
> 2. Update the javadoc of the search package (package.html mentions 
> Similarity)?
> 3. Compile the javadoc to see if there are any warnings.




[jira] [Created] (LUCENE-3387) Get javadoc for the similarities package in shape

2011-08-21 Thread David Mark Nemeskey (JIRA)
Get javadoc for the similarities package in shape
-

 Key: LUCENE-3387
 URL: https://issues.apache.org/jira/browse/LUCENE-3387
 Project: Lucene - Java
  Issue Type: Sub-task
Reporter: David Mark Nemeskey
Assignee: David Mark Nemeskey





[jira] [Updated] (LUCENE-3386) Integrate MockBM25Similarity and MockLMSimilarity into the framework

2011-08-20 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3386:


Attachment: LUCENE-3386.patch

Moved the similarities package to src; only testing-related classes remain in test.

> Integrate MockBM25Similarity and MockLMSimilarity into the framework
> 
>
> Key: LUCENE-3386
> URL: https://issues.apache.org/jira/browse/LUCENE-3386
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011, rank
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3386.patch, LUCENE-3386.patch, LUCENE-3386.patch, 
> LUCENE-3386.patch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Steps:
> 1. Decide if {{MockLMSimilarity}} is needed at all (we have 
> {{LMDirichletSimilarity}})
> 2. Move the classes to the similarities package
> 3. Move the similarities package to src/
> 4. Move all sims (inc. Similarity) to similarities
> 5. Make MockBM25Similarity a subclass of EasySimilarity?


[jira] [Issue Comment Edited] (LUCENE-3386) Integrate MockBM25Similarity and MockLMSimilarity into the framework

2011-08-20 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088249#comment-13088249
 ] 

David Mark Nemeskey edited comment on LUCENE-3386 at 8/20/11 7:10 PM:
--

Moved the similarities package to src; only testing-related classes remain in 
test.

  was (Author: david_nemeskey):
Moved the similarities package to src; only testing-related classes remain 
test.
  
> Integrate MockBM25Similarity and MockLMSimilarity into the framework
> 
>
> Key: LUCENE-3386
> URL: https://issues.apache.org/jira/browse/LUCENE-3386
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011, rank
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3386.patch, LUCENE-3386.patch, LUCENE-3386.patch, 
> LUCENE-3386.patch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Steps:
> 1. Decide if {{MockLMSimilarity}} is needed at all (we have 
> {{LMDirichletSimilarity}})
> 2. Move the classes to the similarities package
> 3. Move the similarities package to src/
> 4. Move all sims (inc. Similarity) to similarities
> 5. Make MockBM25Similarity a subclass of EasySimilarity?


[jira] [Updated] (LUCENE-3386) Integrate MockBM25Similarity and MockLMSimilarity into the framework

2011-08-20 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3386:


Description: 
Steps:
1. Decide if {{MockLMSimilarity}} is needed at all (we have 
{{LMDirichletSimilarity}})
2. Move the classes to the similarities package
3. Move the similarities package to src/
4. Move all sims (inc. Similarity) to similarities
5. Make MockBM25Similarity a subclass of EasySimilarity?

  was:
Steps:
1. Decide if {{MockLMSimilarity}} is needed at all (we have 
{{LMDirichletSimilarity}})
2. Move the classes to the similarities package
3. Make MockBM25Similarity a subclass of EasySimilarity?


> Integrate MockBM25Similarity and MockLMSimilarity into the framework
> 
>
> Key: LUCENE-3386
> URL: https://issues.apache.org/jira/browse/LUCENE-3386
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011, rank
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3386.patch, LUCENE-3386.patch, LUCENE-3386.patch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Steps:
> 1. Decide if {{MockLMSimilarity}} is needed at all (we have 
> {{LMDirichletSimilarity}})
> 2. Move the classes to the similarities package
> 3. Move the similarities package to src/
> 4. Move all sims (inc. Similarity) to similarities
> 5. Make MockBM25Similarity a subclass of EasySimilarity?


[jira] [Updated] (LUCENE-3386) Integrate MockBM25Similarity and MockLMSimilarity into the framework

2011-08-20 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3386:


Attachment: LUCENE-3386.patch

Apparently mv doesn't refactor the code. Who would have thought...?

> Integrate MockBM25Similarity and MockLMSimilarity into the framework
> 
>
> Key: LUCENE-3386
> URL: https://issues.apache.org/jira/browse/LUCENE-3386
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011, rank
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3386.patch, LUCENE-3386.patch, LUCENE-3386.patch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Steps:
> 1. Decide if {{MockLMSimilarity}} is needed at all (we have 
> {{LMDirichletSimilarity}})
> 2. Move the classes to the similarities package
> 3. Make MockBM25Similarity a subclass of EasySimilarity?


[jira] [Updated] (LUCENE-3386) Integrate MockBM25Similarity and MockLMSimilarity into the framework

2011-08-20 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3386:


Attachment: LUCENE-3386.patch

Renamed MockBM25Similarity and its provider to BM25... and moved them to the 
similarities package. All that's left is to decide whether they should be 
rebased on EasySimilarity or not.

> Integrate MockBM25Similarity and MockLMSimilarity into the framework
> 
>
> Key: LUCENE-3386
> URL: https://issues.apache.org/jira/browse/LUCENE-3386
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011, rank
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3386.patch, LUCENE-3386.patch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Steps:
> 1. Decide if {{MockLMSimilarity}} is needed at all (we have 
> {{LMDirichletSimilarity}})
> 2. Move the classes to the similarities package
> 3. Make MockBM25Similarity a subclass of EasySimilarity?


[jira] [Updated] (LUCENE-3386) Integrate MockBM25Similarity and MockLMSimilarity into the framework

2011-08-20 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3386:


Attachment: LUCENE-3386.patch

Removed MockLMSimilarity and its provider.

> Integrate MockBM25Similarity and MockLMSimilarity into the framework
> 
>
> Key: LUCENE-3386
> URL: https://issues.apache.org/jira/browse/LUCENE-3386
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011, rank
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3386.patch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Steps:
> 1. Decide if {{MockLMSimilarity}} is needed at all (we have 
> {{LMDirichletSimilarity}})
> 2. Move the classes to the similarities package
> 3. Make MockBM25Similarity a subclass of EasySimilarity?


[jira] [Updated] (LUCENE-3385) EasySimilarity to interpret document length as float

2011-08-20 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3385:


Labels: gsoc gsoc2011  (was: )

> EasySimilarity to interpret document length as float
> 
>
> Key: LUCENE-3385
> URL: https://issues.apache.org/jira/browse/LUCENE-3385
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring, general/javadocs, modules/examples
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3385.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>



[jira] [Updated] (LUCENE-3386) Integrate MockBM25Similarity and MockLMSimilarity into the framework

2011-08-20 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3386:


Component/s: (was: general/javadocs)
 (was: modules/examples)
Description: 
Steps:
1. Decide if {{MockLMSimilarity}} is needed at all (we have 
{{LMDirichletSimilarity}})
2. Move the classes to the similarities package
3. Make MockBM25Similarity a subclass of EasySimilarity?
 Labels: gsoc gsoc2011 rank  (was: )

> Integrate MockBM25Similarity and MockLMSimilarity into the framework
> 
>
> Key: LUCENE-3386
> URL: https://issues.apache.org/jira/browse/LUCENE-3386
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011, rank
> Fix For: flexscoring branch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Steps:
> 1. Decide if {{MockLMSimilarity}} is needed at all (we have 
> {{LMDirichletSimilarity}})
> 2. Move the classes to the similarities package
> 3. Make MockBM25Similarity a subclass of EasySimilarity?


[jira] [Created] (LUCENE-3386) Integrate MockBM25Similarity and MockLMSimilarity into the framework

2011-08-20 Thread David Mark Nemeskey (JIRA)
Integrate MockBM25Similarity and MockLMSimilarity into the framework


 Key: LUCENE-3386
 URL: https://issues.apache.org/jira/browse/LUCENE-3386
 Project: Lucene - Java
  Issue Type: Sub-task
Reporter: David Mark Nemeskey
Assignee: David Mark Nemeskey





[jira] [Updated] (LUCENE-3385) EasySimilarity to interpret document length as float

2011-08-20 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3385:


Attachment: LUCENE-3385.patch

docLen changed from int to float.
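
One possible motivation, sketched under assumptions: if the document length is recovered from a stored norm rather than counted directly, the decoded value is generally not a whole number, so truncating it to int loses information. The class and method below are hypothetical and are not part of the patch; the encoding (norm = 1 / sqrt(length), no boost) is assumed only for illustration.

```java
// Hypothetical sketch; not the EasySimilarity code from the patch.
final class DocLenSketch {
    /**
     * Assumption: the norm stores 1 / sqrt(length) with no boost,
     * so the length can be recovered as 1 / norm^2.
     */
    static float decodeDocLen(float norm) {
        return 1f / (norm * norm);
    }

    /** The same value truncated to int, as an int-typed docLen would hold it. */
    static int truncatedDocLen(float norm) {
        return (int) decodeDocLen(norm);
    }
}
```

With a norm of 0.3, the decoded length is roughly 11.1, while the int version keeps only 11.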

> EasySimilarity to interpret document length as float
> 
>
> Key: LUCENE-3385
> URL: https://issues.apache.org/jira/browse/LUCENE-3385
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring, general/javadocs, modules/examples
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3385.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities

2011-08-20 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3357:


Attachment: LUCENE-3357.patch

Ah, I forgot to modify the explain() methods to handle the omitted norms case 
in the same way as score(). Fixed it now.
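
A minimal sketch of how score() and explain() can be kept in agreement, assuming both take the document length from one shared helper. All names here are hypothetical, the scoring formula is a toy stand-in, and the fallback length of 1 for omitted norms is an assumption, not necessarily what the patch does.

```java
// Hypothetical sketch; not the actual EasySimilarity code.
final class OmitNormsSketch {
    /** Length used by both code paths; a null norm means norms were omitted (assumed fallback: 1). */
    static float docLength(Float decodedNorm) {
        return decodedNorm == null ? 1f : decodedNorm;
    }

    /** Toy length-normalized term frequency, standing in for a real ranking model. */
    static float score(float freq, Float decodedNorm) {
        float len = docLength(decodedNorm);
        return freq / (freq + len);
    }

    /** Repeats the same computation as score() and reports the inputs it used. */
    static String explain(float freq, Float decodedNorm) {
        float len = docLength(decodedNorm);
        float value = freq / (freq + len);
        return value + " = score(freq=" + freq + ", docLen=" + len + ")";
    }
}
```

Because both methods call docLength(), the omitted-norms case cannot be handled in one and forgotten in the other.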

> Unit and integration test cases for the new Similarities
> 
>
> Key: LUCENE-3357
> URL: https://issues.apache.org/jira/browse/LUCENE-3357
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
>  Labels: gsoc, gsoc2011, test
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch
>
>
> Write test cases to test the new Similarities added in 
> [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of 
> test cases will be created:
>  * unit tests, in which mock statistics are provided to the Similarities and 
> the score is validated against hand calculations;
>  * integration tests, in which a small collection is indexed and then 
> searched using the Similarities.
> Performance tests will be performed in a separate issue.


[jira] [Created] (LUCENE-3385) EasySimilarity to interpret document length as float

2011-08-20 Thread David Mark Nemeskey (JIRA)
EasySimilarity to interpret document length as float


 Key: LUCENE-3385
 URL: https://issues.apache.org/jira/browse/LUCENE-3385
 Project: Lucene - Java
  Issue Type: Sub-task
Reporter: David Mark Nemeskey
Assignee: David Mark Nemeskey





[jira] [Commented] (LUCENE-3357) Unit and integration test cases for the new Similarities

2011-08-20 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088182#comment-13088182
 ] 

David Mark Nemeskey commented on LUCENE-3357:
-

Robert: with this, all EasySimilarity-based classes have been tested. Do you 
think we could close this issue?

> Unit and integration test cases for the new Similarities
> 
>
> Key: LUCENE-3357
> URL: https://issues.apache.org/jira/browse/LUCENE-3357
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
>  Labels: gsoc, gsoc2011, test
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch
>
>
> Write test cases to test the new Similarities added in 
> [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of 
> test cases will be created:
>  * unit tests, in which mock statistics are provided to the Similarities and 
> the score is validated against hand calculations;
>  * integration tests, in which a small collection is indexed and then 
> searched using the Similarities.
> Performance tests will be performed in a separate issue.


[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities

2011-08-20 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3357:


Attachment: LUCENE-3357.patch

Correctness tests added for the rest of the DFR sims.

> Unit and integration test cases for the new Similarities
> 
>
> Key: LUCENE-3357
> URL: https://issues.apache.org/jira/browse/LUCENE-3357
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
>  Labels: gsoc, gsoc2011, test
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch
>
>
> Write test cases to test the new Similarities added in 
> [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of 
> test cases will be created:
>  * unit tests, in which mock statistics are provided to the Similarities and 
> the score is validated against hand calculations;
>  * integration tests, in which a small collection is indexed and then 
> searched using the Similarities.
> Performance tests will be performed in a separate issue.


[jira] [Issue Comment Edited] (LUCENE-3357) Unit and integration test cases for the new Similarities

2011-08-19 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087578#comment-13087578
 ] 

David Mark Nemeskey edited comment on LUCENE-3357 at 8/19/11 6:59 AM:
--

bq. I would just shoot for 'breadth' as far as across the different sims?
What do you mean by 'breadth'? Unit and integration tests (well... the "heart" 
test) already cover all the sims, and this includes score vs explanation 
comparison. As for the correctness tests, both LM and IB sims are tested, as 
well as four DFR methods. I can write tests for the three missing DFR sims, but 
that is as much breadth as I can add. Or do you have something else in mind?

  was (Author: david_nemeskey):
bq I would just shoot for 'breadth' as far as across the different sims?
What do you mean by 'breadth'? Unit and integration tests (well... the "heart" 
test) already cover all the sims, and this includes score vs explanation 
comparison. As for the correctness tests, both LM and IB sims are tested, as 
well as four DFR methods. I can write tests for the three missing DFR sims, but 
that is as much breadth as I can add. Or do you have something else in mind?
  
> Unit and integration test cases for the new Similarities
> 
>
> Key: LUCENE-3357
> URL: https://issues.apache.org/jira/browse/LUCENE-3357
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
>  Labels: gsoc, gsoc2011, test
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch
>
>
> Write test cases to test the new Similarities added in 
> [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of 
> test cases will be created:
>  * unit tests, in which mock statistics are provided to the Similarities and 
> the score is validated against hand calculations;
>  * integration tests, in which a small collection is indexed and then 
> searched using the Similarities.
> Performance tests will be performed in a separate issue.


[jira] [Commented] (LUCENE-3357) Unit and integration test cases for the new Similarities

2011-08-18 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087578#comment-13087578
 ] 

David Mark Nemeskey commented on LUCENE-3357:
-

bq. I would just shoot for 'breadth' as far as across the different sims?
What do you mean by 'breadth'? Unit and integration tests (well... the "heart" 
test) already cover all the sims, and this includes score vs explanation 
comparison. As for the correctness tests, both LM and IB sims are tested, as 
well as four DFR methods. I can write tests for the three missing DFR sims, but 
that is as much breadth as I can add. Or do you have something else in mind?

> Unit and integration test cases for the new Similarities
> 
>
> Key: LUCENE-3357
> URL: https://issues.apache.org/jira/browse/LUCENE-3357
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
>  Labels: gsoc, gsoc2011, test
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch
>
>
> Write test cases to test the new Similarities added in 
> [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of 
> test cases will be created:
>  * unit tests, in which mock statistics are provided to the Similarities and 
> the score is validated against hand calculations;
>  * integration tests, in which a small collection is indexed and then 
> searched using the Similarities.
> Performance tests will be performed in a separate issue.


[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities

2011-08-18 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3357:


Attachment: LUCENE-3357.patch

I've added the correctness tests (is there a better name for these?). Do you 
think that I should re-write the ones where the computation of the gold value 
is missing? Or the other way around? :)
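
For illustration, a correctness test that spells out the computation of its gold value might look like the sketch below. It uses the textbook BM25 formula with a smoothed idf, written from scratch; it does not call the classes in the patch, and the class name, method name and mock statistics are all made up for the example.

```java
// Hypothetical sketch; the formula is textbook BM25, not the patch's implementation.
final class GoldValueSketch {
    /** BM25 score of a single term in a single document. */
    static double bm25(double tf, double docLen, double avgDocLen,
                       long docFreq, long numDocs, double k1, double b) {
        double idf = Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
        double lengthNorm = k1 * (1.0 - b + b * docLen / avgDocLen);
        return idf * tf * (k1 + 1.0) / (tf + lengthNorm);
    }

    public static void main(String[] args) {
        // Mock statistics: tf=2, docLen=avgDocLen=10, docFreq=1, numDocs=3, k1=1.2, b=0.75.
        double actual = bm25(2, 10, 10, 1, 3, 1.2, 0.75);

        // Gold value, computed by hand:
        //   idf        = ln(1 + 2.5 / 1.5)       = ln(8/3)
        //   lengthNorm = 1.2 * (0.25 + 0.75 * 1) = 1.2
        //   tf part    = 2 * 2.2 / (2 + 1.2)     = 1.375
        double gold = Math.log(8.0 / 3.0) * 1.375;

        if (Math.abs(actual - gold) > 1e-9) {
            throw new AssertionError("expected " + gold + " but got " + actual);
        }
        System.out.println("score matches the gold value");
    }
}
```

Keeping the hand calculation in a comment next to the constant makes it possible to re-derive the gold value if the formula or the mock statistics change.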

> Unit and integration test cases for the new Similarities
> 
>
> Key: LUCENE-3357
> URL: https://issues.apache.org/jira/browse/LUCENE-3357
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
>  Labels: gsoc, gsoc2011, test
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch
>
>
> Write test cases to test the new Similarities added in 
> [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of 
> test cases will be created:
>  * unit tests, in which mock statistics are provided to the Similarities and 
> the score is validated against hand calculations;
>  * integration tests, in which a small collection is indexed and then 
> searched using the Similarities.
> Performance tests will be performed in a separate issue.


[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities

2011-08-12 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3357:


Attachment: LUCENE-3357.patch

Fixed the omit norms case.

> Unit and integration test cases for the new Similarities
> 
>
> Key: LUCENE-3357
> URL: https://issues.apache.org/jira/browse/LUCENE-3357
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
>  Labels: gsoc, gsoc2011, test
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch
>
>
> Write test cases to test the new Similarities added in 
> [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of 
> test cases will be created:
>  * unit tests, in which mock statistics are provided to the Similarities and 
> the score is validated against hand calculations;
>  * integration tests, in which a small collection is indexed and then 
> searched using the Similarities.
> Performance tests will be performed in a separate issue.


[jira] [Issue Comment Edited] (LUCENE-3357) Unit and integration test cases for the new Similarities

2011-08-12 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084096#comment-13084096
 ] 

David Mark Nemeskey edited comment on LUCENE-3357 at 8/12/11 1:11 PM:
--

D: good question, I think if F > tfn, then D > 0, but I guess I have to prove 
it (and fix it if it isn't).

Could you tell me which sims were affected negatively?

freq: I didn't know about that! Still, I want to provide not "plausible", but 
at least "safe" statistics in this case. You didn't touch docFreq and 
numberOfDocuments, so I assumed at least these two are filled with the actual 
values, is that so?

  was (Author: david_nemeskey):
D: good question, I think if F > tfn, then D > 0, but I guess I have to 
prove it (and fix it if it isn't).

Could you tell me which sims were affected negatively?

freq: I didn't know about that! Still, I want to provide not "plausible", but 
at least "safe" statistics in this case. You didn't touch docFreq and 
numberOfDocuments, so I assumed at least these two are filled with actual 
values, is that so?
  
> Unit and integration test cases for the new Similarities
> 
>
> Key: LUCENE-3357
> URL: https://issues.apache.org/jira/browse/LUCENE-3357
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
>  Labels: gsoc, gsoc2011, test
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch
>
>
> Write test cases to test the new Similarities added in 
> [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of 
> test cases will be created:
>  * unit tests, in which mock statistics are provided to the Similarities and 
> the score is validated against hand calculations;
>  * integration tests, in which a small collection is indexed and then 
> searched using the Similarities.
> Performance tests will be performed in a separate issue.


[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities

2011-08-12 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084098#comment-13084098
 ] 

David Mark Nemeskey commented on LUCENE-3220:
-

Robert: Since we use 
[LUCENE-3357|https://issues.apache.org/jira/browse/LUCENE-3357] for testing and 
bug fixing, I propose we close this issue. If we decide to implement other 
methods as well, we can do it under a new issue. Or do you have something else 
in mind (such as renaming EasySimilarity to SimilarityBase)?

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring, core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> Done:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
>  * The so-called _Information-Based Models_




[jira] [Commented] (LUCENE-3357) Unit and integration test cases for the new Similarities

2011-08-12 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084096#comment-13084096
 ] 

David Mark Nemeskey commented on LUCENE-3357:
-

D: good question. I think that if F > tfn, then D > 0, but I guess I have to 
prove it (and fix it if it doesn't hold).

Could you tell me which sims were affected negatively?

freq: I didn't know about that! Still, in this case I want to provide 
statistics that are at least "safe", even if not "plausible". You didn't touch 
docFreq and numberOfDocuments, so I assumed that at least these two are filled 
with actual values. Is that so?

> Unit and integration test cases for the new Similarities
> 
>
> Key: LUCENE-3357
> URL: https://issues.apache.org/jira/browse/LUCENE-3357
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
>  Labels: gsoc, gsoc2011, test
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch
>
>
> Write test cases to test the new Similarities added in 
> [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of 
> test cases will be created:
>  * unit tests, in which mock statistics are provided to the Similarities and 
> the score is validated against hand calculations;
>  * integration tests, in which a small collection is indexed and then 
> searched using the Similarities.
> Performance tests will be performed in a separate issue.




[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities

2011-08-11 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3357:


Attachment: LUCENE-3357.patch

Robert: I modified the nocommits a bit to provide input to the Similarities 
that looks somewhat plausible. I think it's better to avoid situations where 
e.g. docLen < freq to minimize the chance of error.
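
One way to keep randomly generated test input consistent is to draw each value 
inside the bounds implied by the others. The sketch below is only an 
illustration: the MockStats holder, its field names and the chosen ranges are 
hypothetical and are not the EasyStats class or the test code from the patch.

```java
import java.util.Random;

public class PlausibleStatsSketch {
    /** Hypothetical value holder; not the EasyStats class from the patch. */
    static final class MockStats {
        final long numberOfDocuments;
        final long docFreq;
        final long totalTermFreq;
        final int docLen;
        final int freq;

        MockStats(long numberOfDocuments, long docFreq, long totalTermFreq,
                  int docLen, int freq) {
            this.numberOfDocuments = numberOfDocuments;
            this.docFreq = docFreq;
            this.totalTermFreq = totalTermFreq;
            this.docLen = docLen;
            this.freq = freq;
        }
    }

    /** Draws statistics so that freq <= docLen and docFreq <= numberOfDocuments. */
    static MockStats random(Random rnd) {
        int numberOfDocuments = 1 + rnd.nextInt(1000);     // 1..1000
        int docFreq = 1 + rnd.nextInt(numberOfDocuments);  // 1..numberOfDocuments
        int docLen = 1 + rnd.nextInt(500);                 // 1..500
        int freq = 1 + rnd.nextInt(docLen);                // 1..docLen
        // Every other matching document contributes at least one occurrence.
        long totalTermFreq = (long) freq + (docFreq - 1) + rnd.nextInt(1000);
        return new MockStats(numberOfDocuments, docFreq, totalTermFreq, docLen, freq);
    }

    public static void main(String[] args) {
        MockStats s = random(new Random(42));
        System.out.println("docLen=" + s.docLen + " freq=" + s.freq
            + " docFreq=" + s.docFreq + " numberOfDocuments=" + s.numberOfDocuments
            + " totalTermFreq=" + s.totalTermFreq);
    }
}
```

Deriving the bounds from each other, rather than drawing every value 
independently, rules out combinations such as docLen < freq by construction.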

> Unit and integration test cases for the new Similarities
> 
>
> Key: LUCENE-3357
> URL: https://issues.apache.org/jira/browse/LUCENE-3357
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
>  Labels: gsoc, gsoc2011, test
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch
>
>
> Write test cases to test the new Similarities added in 
> [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of 
> test cases will be created:
>  * unit tests, in which mock statistics are provided to the Similarities and 
> the score is validated against hand calculations;
>  * integration tests, in which a small collection is indexed and then 
> searched using the Similarities.
> Performance tests will be performed in a separate issue.




[jira] [Issue Comment Edited] (LUCENE-3357) Unit and integration test cases for the new Similarities

2011-08-11 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083194#comment-13083194
 ] 

David Mark Nemeskey edited comment on LUCENE-3357 at 8/11/11 4:08 PM:
--

Robert: I modified the nocommits a bit to provide input to the Similarities 
that looks somewhat plausible. I think it's better to avoid situations where 
e.g. docLen < freq to minimize the chance of error.

Please let me know what you think of these modifications; if they're OK, I'll 
nuke the nocommits.

  was (Author: david_nemeskey):
Robert: I modified the nocommits a bit to provide input to the Similarities 
that looks somewhat plausible. I think it's better to avoid situations where 
e.g. docLen < freq to minimize the chance of error.
  
> Unit and integration test cases for the new Similarities
> 
>
> Key: LUCENE-3357
> URL: https://issues.apache.org/jira/browse/LUCENE-3357
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
>  Labels: gsoc, gsoc2011, test
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch
>
>
> Write test cases to test the new Similarities added in 
> [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of 
> test cases will be created:
>  * unit tests, in which mock statistics are provided to the Similarities and 
> the score is validated against hand calculations;
>  * integration tests, in which a small collection is indexed and then 
> searched using the Similarities.
> Performance tests will be performed in a separate issue.




[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities

2011-08-11 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3357:


Attachment: LUCENE-3357.patch

Made a change so that D and P (the binomial models) return only positive 
scores, but it is neither theoretically sound nor something I like much.

Robert: could you test D please, to see how the results are affected?

> Unit and integration test cases for the new Similarities
> 
>
> Key: LUCENE-3357
> URL: https://issues.apache.org/jira/browse/LUCENE-3357
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
>  Labels: gsoc, gsoc2011, test
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch
>
>
> Write test cases to test the new Similarities added in 
> [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of 
> test cases will be created:
>  * unit tests, in which mock statistics are provided to the Similarities and 
> the score is validated against hand calculations;
>  * integration tests, in which a small collection is indexed and then 
> searched using the Similarities.
> Performance tests will be performed in a separate issue.




[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities

2011-08-11 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3357:


Attachment: LUCENE-3357.patch

Fixed {{LMDirichletSimilarity}} (see my last comment).

> Unit and integration test cases for the new Similarities
> 
>
> Key: LUCENE-3357
> URL: https://issues.apache.org/jira/browse/LUCENE-3357
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
>  Labels: gsoc, gsoc2011, test
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch
>
>
> Write test cases to test the new Similarities added in 
> [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of 
> test cases will be created:
>  * unit tests, in which mock statistics are provided to the Similarities and 
> the score is validated against hand calculations;
>  * integration tests, in which a small collection is indexed and then 
> searched using the Similarities.
> Performance tests will be performed in a separate issue.




[jira] [Commented] (LUCENE-3357) Unit and integration test cases for the new Similarities

2011-08-11 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083080#comment-13083080
 ] 

David Mark Nemeskey commented on LUCENE-3357:
-

Apparently the Dirichlet method returns a negative score if tf / docLen < 
corpusTf / corpusLen. Unfortunately, the negative number can be arbitrarily 
large in magnitude, so it's not as easy as adding a constant to the score. This 
of course makes sense if all documents are scored, as the function is monotone 
and consequently documents whose tf is 0 will always be ranked lower than those 
that contain the word. But this is not how IR engines work.

Having said that, I believe that we could simulate such a system. I don't know 
exactly how the query architecture works, but I presume the clauses that don't 
match a document are assigned a zero value. Now instead of this zero, the 
Scorer (or whatever class does this) could ask for a default value from the 
Similarity. In this case LMDirichletSimilarity could return score(stats, 0, 
Integer.MAX_VALUE), which is somewhere around -12.

If we don't do this, we have three options:
1. add score(stats, 0, Integer.MAX_VALUE) to the score
2. if (score < 0) return 0
3. add corpusTf / corpusLen * docLen to tf

All three avoid negative scores, but each has its own disadvantage:
1. adds a pretty big constant to the score, which may not play well with the 
other parts of the query
2. some documents that contain the term get the same 0 score as documents that 
don't (though I cannot say this is not in line with the LM approach)
3. this introduces a transformation that is difficult to characterize

For the time being, I'll go with 2, but we have to discuss this.
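
To make the sign condition and options 2 and 3 concrete, here is a minimal 
sketch. It assumes the usual Dirichlet-smoothed form log(1 + tf / (mu * p)) + 
log(mu / (docLen + mu)) with p = corpusTf / corpusLen; the class name, method 
names and the value mu = 2000 are illustrative assumptions, not the code in the 
patch.

```java
public class DirichletSketch {
    /** Raw score; p is the collection probability corpusTf / corpusLen. */
    static double rawScore(double tf, double docLen, double p, double mu) {
        return Math.log(1.0 + tf / (mu * p)) + Math.log(mu / (docLen + mu));
    }

    /** Option 2: clamp negative scores to zero. */
    static double clampedScore(double tf, double docLen, double p, double mu) {
        return Math.max(0.0, rawScore(tf, docLen, p, mu));
    }

    /** Option 3: add p * docLen to tf before scoring. */
    static double shiftedScore(double tf, double docLen, double p, double mu) {
        return rawScore(tf + p * docLen, docLen, p, mu);
    }

    public static void main(String[] args) {
        double mu = 2000.0;  // assumed smoothing parameter
        double p = 0.01;     // assumed corpusTf / corpusLen
        // tf / docLen = 0.001 < p, so the raw score is negative.
        System.out.println("raw     = " + rawScore(1, 1000, p, mu));
        System.out.println("clamped = " + clampedScore(1, 1000, p, mu));
        System.out.println("shifted = " + shiftedScore(1, 1000, p, mu));
        // tf / docLen = 0.02 > p, so the raw score is positive.
        System.out.println("raw     = " + rawScore(20, 1000, p, mu));
    }
}
```

The two logarithms combine into log((mu + tf / p) / (docLen + mu)), which is 
negative exactly when tf / p < docLen, i.e. when tf / docLen < p. With option 3 
the numerator becomes mu + tf / p + docLen, so the ratio never drops below 1.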

> Unit and integration test cases for the new Similarities
> 
>
> Key: LUCENE-3357
> URL: https://issues.apache.org/jira/browse/LUCENE-3357
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
>  Labels: gsoc, gsoc2011, test
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch
>
>
> Write test cases to test the new Similarities added in 
> [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of 
> test cases will be created:
>  * unit tests, in which mock statistics are provided to the Similarities and 
> the score is validated against hand calculations;
>  * integration tests, in which a small collection is indexed and then 
> searched using the Similarities.
> Performance tests will be performed in a separate issue.




[jira] [Issue Comment Edited] (LUCENE-3357) Unit and integration test cases for the new Similarities

2011-08-10 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082702#comment-13082702
 ] 

David Mark Nemeskey edited comment on LUCENE-3357 at 8/10/11 9:51 PM:
--

Fixed NaN and infinite scores in DFR and IB; all that's left is to fix the 
negative scores as well. ... and everything else discussed earlier.

  was (Author: david_nemeskey):
Fixed NaN and infinite scores in DFR and IB; all that's left is to fix the 
negative scores as well.
  
> Unit and integration test cases for the new Similarities
> 
>
> Key: LUCENE-3357
> URL: https://issues.apache.org/jira/browse/LUCENE-3357
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
>  Labels: gsoc, gsoc2011, test
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch
>
>
> Write test cases to test the new Similarities added in 
> [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of 
> test cases will be created:
>  * unit tests, in which mock statistics are provided to the Similarities and 
> the score is validated against hand calculations;
>  * integration tests, in which a small collection is indexed and then 
> searched using the Similarities.
> Performance tests will be performed in a separate issue.




[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities

2011-08-10 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3357:


Attachment: LUCENE-3357.patch

Fixed NaN and infinite scores in DFR and IB; all that's left is to fix the 
negative scores as well.
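
A test-side check for the three problems mentioned here (NaN, infinite and 
negative scores) could look roughly like the following. The helper name is 
hypothetical and the snippet is a sketch, not the assertion used in the patch.

```java
public class ScoreSanitySketch {
    /** True if the score is a finite, non-negative number. */
    static boolean isSane(float score) {
        return !Float.isNaN(score) && !Float.isInfinite(score) && score >= 0f;
    }

    public static void main(String[] args) {
        float negInf = (float) Math.log(0.0); // log of zero is -Infinity
        float nan = 0.0f / 0.0f;              // zero divided by zero is NaN
        System.out.println(isSane(negInf));   // false
        System.out.println(isSane(nan));      // false
        System.out.println(isSane(1.5f));     // true
    }
}
```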

> Unit and integration test cases for the new Similarities
> 
>
> Key: LUCENE-3357
> URL: https://issues.apache.org/jira/browse/LUCENE-3357
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
>  Labels: gsoc, gsoc2011, test
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch
>
>
> Write test cases to test the new Similarities added in 
> [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of 
> test cases will be created:
>  * unit tests, in which mock statistics are provided to the Similarities and 
> the score is validated against hand calculations;
>  * integration tests, in which a small collection is indexed and then 
> searched using the Similarities.
> Performance tests will be performed in a separate issue.




[jira] [Commented] (LUCENE-3357) Unit and integration test cases for the new Similarities

2011-08-10 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082261#comment-13082261
 ] 

David Mark Nemeskey commented on LUCENE-3357:
-

Robert: I'm on the NaN/Inf problems. As for the negative score, I'll leave it 
there for the time being; these Similarities should always return positive 
scores. I don't feel very confident about this test myself, so I guess I'll 
remove it (or at least make it optional) once all tests are successful.

As for the PreFlex codec, I must admit I am not familiar with it, so I would be 
grateful for a few pointers.

> Unit and integration test cases for the new Similarities
> 
>
> Key: LUCENE-3357
> URL: https://issues.apache.org/jira/browse/LUCENE-3357
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
>  Labels: gsoc, gsoc2011, test
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch
>
>
> Write test cases to test the new Similarities added in 
> [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of 
> test cases will be created:
>  * unit tests, in which mock statistics are provided to the Similarities and 
> the score is validated against hand calculations;
>  * integration tests, in which a small collection is indexed and then 
> searched using the Similarities.
> Performance tests will be performed in a separate issue.




[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities

2011-08-10 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3357:


Attachment: LUCENE-3357.patch

Added spoof versions of all search-related classes that are necessary to 
properly fill the EasyStats object in EasySimilarity subclasses.

> Unit and integration test cases for the new Similarities
> 
>
> Key: LUCENE-3357
> URL: https://issues.apache.org/jira/browse/LUCENE-3357
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
>  Labels: gsoc, gsoc2011, test
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch
>
>
> Write test cases to test the new Similarities added in 
> [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of 
> test cases will be created:
>  * unit tests, in which mock statistics are provided to the Similarities and 
> the score is validated against hand calculations;
>  * integration tests, in which a small collection is indexed and then 
> searched using the Similarities.
> Performance tests will be performed in a separate issue.




[jira] [Commented] (LUCENE-3357) Unit and integration test cases for the new Similarities

2011-08-09 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081942#comment-13081942
 ] 

David Mark Nemeskey commented on LUCENE-3357:
-

Some of the tests fail for certain Similarities, so those have to be fixed as 
well.

> Unit and integration test cases for the new Similarities
> 
>
> Key: LUCENE-3357
> URL: https://issues.apache.org/jira/browse/LUCENE-3357
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
>  Labels: gsoc, gsoc2011, test
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch
>
>
> Write test cases to test the new Similarities added in 
> [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of 
> test cases will be created:
>  * unit tests, in which mock statistics are provided to the Similarities and 
> the score is validated against hand calculations;
>  * integration tests, in which a small collection is indexed and then 
> searched using the Similarities.
> Performance tests will be performed in a separate issue.




[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities

2011-08-09 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3357:


Attachment: LUCENE-3357.patch

Fixed a bug in TestEasySimilarity that prevented Similarities that use a 
subclass of EasyStats from being tested. Also, modified EasyStats so that 
totalBoost is set to the value of queryBoost in the constructor.

> Unit and integration test cases for the new Similarities
> 
>
> Key: LUCENE-3357
> URL: https://issues.apache.org/jira/browse/LUCENE-3357
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
>  Labels: gsoc, gsoc2011, test
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch
>
>
> Write test cases to test the new Similarities added in 
> [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of 
> test cases will be created:
>  * unit tests, in which mock statistics are provided to the Similarities and 
> the score is validated against hand calculations;
>  * integration tests, in which a small collection is indexed and then 
> searched using the Similarities.
> Performance tests will be performed in a separate issue.




[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities

2011-08-09 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3357:


Attachment: LUCENE-3357.patch

Fixed integer division bug in BasicModelG.
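
The patch itself is not quoted in this message, so the following is only a 
generic illustration of this class of bug in Java. It assumes a ratio of the 
form F / (N + F) computed from long counters; the class, method and parameter 
names are hypothetical.

```java
public class IntegerDivisionSketch {
    /** Buggy: long / long is truncated to 0 before being widened to double. */
    static double ratioTruncated(long totalTermFreq, long numberOfDocuments) {
        return totalTermFreq / (numberOfDocuments + totalTermFreq);
    }

    /** Fixed: casting one operand to double forces floating-point division. */
    static double ratio(long totalTermFreq, long numberOfDocuments) {
        return totalTermFreq / (double) (numberOfDocuments + totalTermFreq);
    }

    public static void main(String[] args) {
        System.out.println(ratioTruncated(5, 100)); // 0.0
        System.out.println(ratio(5, 100));          // 5 / 105, roughly 0.0476
    }
}
```

A truncated ratio of 0 is particularly harmful when it later appears inside a 
logarithm or as a divisor, since that is one way to end up with infinite or 
NaN scores.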

> Unit and integration test cases for the new Similarities
> 
>
> Key: LUCENE-3357
> URL: https://issues.apache.org/jira/browse/LUCENE-3357
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
>  Labels: gsoc, gsoc2011, test
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch
>
>
> Write test cases to test the new Similarities added in 
> [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of 
> test cases will be created:
>  * unit tests, in which mock statistics are provided to the Similarities and 
> the score is validated against hand calculations;
>  * integration tests, in which a small collection is indexed and then 
> searched using the Similarities.
> Performance tests will be performed in a separate issue.




[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities

2011-08-09 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3357:


Attachment: LUCENE-3357.patch

License added.

> Unit and integration test cases for the new Similarities
> 
>
> Key: LUCENE-3357
> URL: https://issues.apache.org/jira/browse/LUCENE-3357
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
>  Labels: gsoc, gsoc2011, test
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch, LUCENE-3357.patch
>
>
> Write test cases to test the new Similarities added in 
> [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of 
> test cases will be created:
>  * unit tests, in which mock statistics are provided to the Similarities and 
> the score is validated against hand calculations;
>  * integration tests, in which a small collection is indexed and then 
> searched using the Similarities.
> Performance tests will be performed in a separate issue.




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-08-09 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

Got rid of all but one of the nocommits.

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring, core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> Done:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
>  * The so-called _Information-Based Models_




[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities

2011-08-09 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3357:


Attachment: LUCENE-3357.patch

Rebased the changes on the current state of trunk.

> Unit and integration test cases for the new Similarities
> 
>
> Key: LUCENE-3357
> URL: https://issues.apache.org/jira/browse/LUCENE-3357
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
>  Labels: gsoc, gsoc2011, test
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, 
> LUCENE-3357.patch
>
>
> Write test cases to test the new Similarities added in 
> [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of 
> test cases will be created:
>  * unit tests, in which mock statistics are provided to the Similarities and 
> the score is validated against hand calculations;
>  * integration tests, in which a small collection is indexed and then 
> searched using the Similarities.
> Performance tests will be performed in a separate issue.




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-08-08 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

Added discountOverlaps to EasySimilarity.
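
For readers unfamiliar with the option, here is a minimal sketch of what a discountOverlaps flag is usually taken to mean: tokens that sit at the same position as the previous token (position increment 0, for example injected synonyms) are left out of the field length that feeds the norm. The function name and the list-of-increments input are hypothetical and only illustrate the idea; this is not the code from the patch.

```python
def field_length(position_increments, discount_overlaps=True):
    """Return the token count used for length normalization.

    position_increments holds one int per token; 0 means the token
    overlaps the previous one (for example a synonym at the same position).
    """
    total = len(position_increments)
    if not discount_overlaps:
        return total
    overlaps = sum(1 for inc in position_increments if inc == 0)
    return total - overlaps


# "quick fox" with the synonym "fast" stacked on top of "quick".
increments = [1, 0, 1]
print(field_length(increments, discount_overlaps=True))
print(field_length(increments, discount_overlaps=False))
```

With the flag on, the stacked synonym does not make the field look longer, so it does not lower the score of documents that happen to contain expandable terms.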

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring, core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> Done:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
>  * The so-called _Information-Based Models_




[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities

2011-08-08 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3357:


Attachment: LUCENE-3357.patch

Unit tests added.

> Unit and integration test cases for the new Similarities
> 
>
> Key: LUCENE-3357
> URL: https://issues.apache.org/jira/browse/LUCENE-3357
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
>  Labels: gsoc, gsoc2011, test
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch
>
>
> Write test cases to test the new Similarities added in 
> [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of 
> test cases will be created:
>  * unit tests, in which mock statistics are provided to the Similarities and 
> the score is validated against hand calculations;
>  * integration tests, in which a small collection is indexed and then 
> searched using the Similarities.
> Performance tests will be performed in a separate issue.




[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities

2011-08-08 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3357:


Attachment: LUCENE-3357.patch

 * EasySimilarity subclasses return their names in toString()
 * The two test cases return the name of the Similarity that failed the test.

> Unit and integration test cases for the new Similarities
> 
>
> Key: LUCENE-3357
> URL: https://issues.apache.org/jira/browse/LUCENE-3357
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
>  Labels: gsoc, gsoc2011, test
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3357.patch, LUCENE-3357.patch
>
>
> Write test cases to test the new Similarities added in 
> [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of 
> test cases will be created:
>  * unit tests, in which mock statistics are provided to the Similarities and 
> the score is validated against hand calculations;
>  * integration tests, in which a small collection is indexed and then 
> searched using the Similarities.
> Performance tests will be performed in a separate issue.




[jira] [Issue Comment Edited] (LUCENE-3357) Unit and integration test cases for the new Similarities

2011-08-06 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080416#comment-13080416
 ] 

David Mark Nemeskey edited comment on LUCENE-3357 at 8/6/11 3:52 PM:
-

Integration tests added. There are two of them; however, ant test runs only one?

  was (Author: david_nemeskey):
Integration tests added. There are two of them; however, ant test only runs 
one?
  
> Unit and integration test cases for the new Similarities
> 
>
> Key: LUCENE-3357
> URL: https://issues.apache.org/jira/browse/LUCENE-3357
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
>  Labels: gsoc, gsoc2011, test
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3357.patch
>
>
> Write test cases to test the new Similarities added in 
> [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of 
> test cases will be created:
>  * unit tests, in which mock statistics are provided to the Similarities and 
> the score is validated against hand calculations;
>  * integration tests, in which a small collection is indexed and then 
> searched using the Similarities.
> Performance tests will be performed in a separate issue.




[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities

2011-08-06 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3357:


Attachment: LUCENE-3357.patch

Integration tests added. There are two of them; however, ant test only runs one?

> Unit and integration test cases for the new Similarities
> 
>
> Key: LUCENE-3357
> URL: https://issues.apache.org/jira/browse/LUCENE-3357
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
>  Labels: gsoc, gsoc2011, test
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3357.patch
>
>
> Write test cases to test the new Similarities added in 
> [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of 
> test cases will be created:
>  * unit tests, in which mock statistics are provided to the Similarities and 
> the score is validated against hand calculations;
>  * integration tests, in which a small collection is indexed and then 
> searched using the Similarities.
> Performance tests will be performed in a separate issue.




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-08-06 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

Added a short explanation on the parameter for the Jelinek-Mercer method.
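
As background on that parameter, the sketch below shows the textbook form of Jelinek-Mercer smoothing, in which a single weight (called lam here) interpolates between the document's own term distribution and the collection-wide one. The function and argument names are hypothetical, and the formula is the standard one from the language-modeling literature, not necessarily the exact expression used in the patch.

```python
def jelinek_mercer(tf, doc_len, collection_prob, lam):
    """Smoothed probability of a term under a document language model.

    tf              -- occurrences of the term in the document
    doc_len         -- number of tokens in the document (must be > 0)
    collection_prob -- probability of the term in the whole collection
    lam             -- smoothing weight in [0, 1]; larger means more smoothing
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    if doc_len <= 0:
        raise ValueError("doc_len must be positive")
    return (1.0 - lam) * tf / doc_len + lam * collection_prob


# The same term statistics under light and heavy smoothing.
print(jelinek_mercer(tf=2, doc_len=10, collection_prob=0.01, lam=0.1))
print(jelinek_mercer(tf=2, doc_len=10, collection_prob=0.01, lam=0.7))
```

At lam = 0 the estimate is the raw relative frequency in the document; at lam = 1 the document is ignored and only the collection statistics remain.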

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring, core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> Done:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
>  * The so-called _Information-Based Models_




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-08-06 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

Done. Actually, I wanted to implement the norm table in the way you said, but 
somehow forgot about it.

Two questions remain on my side:
 * the one about discountOverlaps (see above)
 * what kind of index-time boosts do people usually use? Too big a boost might 
cause problems if we just divide the length by it. Maybe we should take the 
logarithm or something like that?
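
To make the concern about large boosts concrete, the sketch below compares plain division of the length by the boost with a logarithmic damping of the boost. Everything here is an assumption for illustration: the function name, the restriction to boosts of at least 1, and the particular damping 1 + ln(boost) are not taken from the patch.

```python
import math


def effective_length(length, boost, dampen=False):
    """Field length after folding in an index-time boost.

    dampen=False divides by the boost directly; dampen=True divides by
    1 + ln(boost), which grows far more slowly for large boosts.
    This sketch only handles boost >= 1.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    if boost < 1:
        raise ValueError("this sketch only handles boost >= 1")
    divisor = 1.0 + math.log(boost) if dampen else float(boost)
    return length / divisor


# A 200-token field with a moderate and a very large boost.
for boost in (2, 100):
    print(boost,
          effective_length(200, boost),
          effective_length(200, boost, dampen=True))
```

With plain division a boost of 100 makes a 200-token field look like a 2-token one, which would dominate any length normalization; the damped variant keeps the effect of the boost but compresses its range.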

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring, core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> Done:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
>  * The so-called _Information-Based Models_




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-08-04 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

Removed reflection from IBSimilarity.

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring, core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> Done:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
>  * The so-called _Information-Based Models_




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-08-04 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

Deleted the accidentally forgotten abstract modifier from the Distribution classes.

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring, core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> Done:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
>  * The so-called _Information-Based Models_




[jira] [Issue Comment Edited] (LUCENE-3220) Implement various ranking models as Similarities

2011-08-04 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079555#comment-13079555
 ] 

David Mark Nemeskey edited comment on LUCENE-3220 at 8/4/11 8:04 PM:
-

Deleted the accidentally forgotten abstract modifier from the Distribution 
classes.

  was (Author: david_nemeskey):
Deleted the accidentally forgot abstract modifier from the Distribution 
classes.
  
> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring, core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> Done:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
>  * The so-called _Information-Based Models_




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-08-02 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

EasySimilarity now computes norms in the same way as DefaultSimilarity.

Actually, not in exactly the same way, as I have not yet added the 
discountOverlaps property. I think it would be a good idea for EasySimilarity 
as well (it is for phrases, right?). What do you reckon?

I also wrote a quick test to see which stored norm (the length directly, or 
1/sqrt(length)) decodes closer to the original value, and it seems that the 
direct one is usually much closer (RMSE is 0.09689688608375747 vs 
0.23787634482532286). Of course, I know it is much more important that the new 
Similarities can use existing indices.
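
A comparison of this kind can be sketched as follows. The sketch assumes the norm is squeezed into a single byte as a tiny float with 3 mantissa bits and 5 exponent bits, obtained by truncating a 32-bit float; the bit layout, the range of lengths and the use of relative error are all assumptions, so the numbers it prints are not expected to match the RMSE values quoted above.

```python
import math
import struct


def float_to_byte315(f):
    """Truncate a float to one byte: 3 mantissa bits, 5 exponent bits."""
    if f <= 0.0:
        return 0
    bits = struct.unpack(">I", struct.pack(">f", f))[0]
    small = bits >> 21            # exponent plus the top 3 mantissa bits
    low = (63 - 15) << 3          # offset of the smallest encodable exponent
    if small <= low:
        return 1                  # underflow: smallest positive value
    if small >= low + 0x100:
        return 255                # overflow: largest value
    return small - low


def byte315_to_float(b):
    """Inverse of float_to_byte315 (up to the truncation loss)."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def compare(lengths):
    """Relative RMSE of the recovered length for both storage schemes."""
    direct, inverse = [], []
    for n in lengths:
        # Scheme 1: store the length itself.
        got = byte315_to_float(float_to_byte315(float(n)))
        direct.append((got - n) / n)
        # Scheme 2: store 1/sqrt(length) and invert it after decoding.
        norm = byte315_to_float(float_to_byte315(1.0 / math.sqrt(n)))
        inverse.append((1.0 / (norm * norm) - n) / n)
    return rmse(direct), rmse(inverse)


rmse_direct, rmse_inverse = compare(range(1, 1001))
print(rmse_direct, rmse_inverse)
```

Squaring the decoded 1/sqrt value roughly doubles its relative truncation error, which is one plausible reason for the direct encoding coming out ahead.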

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring, core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> Done:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
>  * The so-called _Information-Based Models_




[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities

2011-08-02 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3357:


Labels: gsoc gsoc2011  (was: )

> Unit and integration test cases for the new Similarities
> 
>
> Key: LUCENE-3357
> URL: https://issues.apache.org/jira/browse/LUCENE-3357
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
>  Labels: gsoc, gsoc2011, test
> Fix For: flexscoring branch
>
>
> Write test cases to test the new Similarities added in 
> [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of 
> test cases will be created:
>  * unit tests, in which mock statistics are provided to the Similarities and 
> the score is validated against hand calculations;
>  * integration tests, in which a small collection is indexed and then 
> searched using the Similarities.
> Performance tests will be performed in a separate issue.




[jira] [Updated] (LUCENE-3357) Unit and integration test cases for the new Similarities

2011-08-02 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3357:


Labels: gsoc gsoc2011 test  (was: gsoc gsoc2011)

> Unit and integration test cases for the new Similarities
> 
>
> Key: LUCENE-3357
> URL: https://issues.apache.org/jira/browse/LUCENE-3357
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
>  Labels: gsoc, gsoc2011, test
> Fix For: flexscoring branch
>
>
> Write test cases to test the new Similarities added in 
> [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of 
> test cases will be created:
>  * unit tests, in which mock statistics are provided to the Similarities and 
> the score is validated against hand calculations;
>  * integration tests, in which a small collection is indexed and then 
> searched using the Similarities.
> Performance tests will be performed in a separate issue.




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-08-02 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Component/s: core/query/scoring
 Labels: gsoc gsoc2011  (was: gsoc)

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/query/scoring, core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc, gsoc2011
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> Done:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
>  * The so-called _Information-Based Models_




[jira] [Created] (LUCENE-3357) Unit and integration test cases for the new Similarities

2011-08-02 Thread David Mark Nemeskey (JIRA)
Unit and integration test cases for the new Similarities


 Key: LUCENE-3357
 URL: https://issues.apache.org/jira/browse/LUCENE-3357
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/query/scoring
Affects Versions: flexscoring branch
Reporter: David Mark Nemeskey
Assignee: David Mark Nemeskey
Priority: Minor
 Fix For: flexscoring branch


Write test cases to test the new Similarities added in 
[LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of 
test cases will be created:
 * unit tests, in which mock statistics are provided to the Similarities and 
the score is validated against hand calculations;
 * integration tests, in which a small collection is indexed and then searched 
using the Similarities.

Performance tests will be performed in a separate issue.
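
The unit-test style described above can be sketched like this: a small container of mock statistics is handed to a scoring function, and the result is compared with a value worked out by hand. The class and function names are hypothetical, and the formula is a textbook BM25 variant chosen for illustration, not necessarily the one in the patch.

```python
import math
from dataclasses import dataclass


@dataclass
class MockStats:
    """Hand-picked collection statistics; no index is involved."""
    number_of_documents: int
    doc_freq: int
    avg_field_length: float


def bm25(stats, tf, doc_len, k1=1.2, b=0.75):
    """Textbook BM25 score of one term in one document."""
    idf = math.log(1.0 + (stats.number_of_documents - stats.doc_freq + 0.5)
                   / (stats.doc_freq + 0.5))
    norm = k1 * (1.0 - b + b * doc_len / stats.avg_field_length)
    return idf * tf * (k1 + 1.0) / (tf + norm)


def test_bm25_matches_hand_calculation():
    # Hand calculation: N = 2 and df = 1 give idf = ln(1 + 1.5 / 1.5) = ln 2.
    # With doc_len equal to the average, norm = k1, so for tf = 1 the
    # tf factor is (k1 + 1) / (1 + k1) = 1 and the score is just ln 2.
    stats = MockStats(number_of_documents=2, doc_freq=1, avg_field_length=10.0)
    assert math.isclose(bm25(stats, tf=1, doc_len=10), math.log(2.0))


test_bm25_matches_hand_calculation()
```

Choosing statistics that make most factors collapse to 1 keeps the hand calculation short enough to verify by eye, which is the point of this kind of test.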




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-08-02 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

Added a norm decoding table to EasySimilarity, and removed sumTotalFreq. Sorry 
that I could only upload this patch now; I didn't have time to work on Lucene 
last week.

As far as I can see, all the problems you mentioned have been corrected, so 
maybe we can go on with the review?
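
The idea behind a norm decoding table can be sketched as follows: since a norm is a single byte, every possible decoded value can be computed once and then looked up during scoring. The sketch assumes the byte holds 1/sqrt(length) in a 3-mantissa-bit, 5-exponent-bit float; that layout, the names, and the choice to map byte 0 to length 0 are assumptions, not details taken from the patch.

```python
import struct


def byte315_to_float(b):
    """Decode one byte (3 mantissa bits, 5 exponent bits) to a float."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def build_length_table():
    """Precompute the approximate field length for each of the 256 bytes."""
    table = []
    for b in range(256):
        norm = byte315_to_float(b)
        # The byte is assumed to store 1/sqrt(length), so invert and square.
        table.append(0.0 if norm == 0.0 else 1.0 / (norm * norm))
    return table


LENGTH_TABLE = build_length_table()


def decoded_length(norm_byte):
    """Constant-time lookup used while scoring."""
    return LENGTH_TABLE[norm_byte & 0xFF]


print(decoded_length(124), decoded_length(120))
```

The table costs 256 floats and replaces the bit manipulation, division and multiplication per scored document with a single array access.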

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> Done:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
>  * The so-called _Information-Based Models_




[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities

2011-07-25 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070654#comment-13070654
 ] 

David Mark Nemeskey commented on LUCENE-3220:
-

I think I realized what I wanted with numberOfFieldTokens. I was afraid that 
sumTotalTermFreq is affected by norms / index time boost / etc, and I wanted to 
make numberOfFieldTokens to unaffected by those (I don't know now how); only I 
forgot to do so.

But if sumTotalTermFreq is really just the number of tokens in the field, I 
will delete one of them. Not sure which, because for me numberOfFieldTokens 
seems a more descriptive name than sumTotalTermFreq, but the latter is used 
everywhere in Lucene. May I ask your opinion on this question?
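
To illustrate why only one of the two fields is needed, here is a minimal sketch. It assumes, as the comment above does, that sumTotalTermFreq is simply the number of tokens in the field. The class and method names are hypothetical and are not taken from the patch.

```java
/** Hypothetical holder for per-field statistics; not the EasyStats class from the patch. */
final class FieldStatsSketch {
    private final long sumTotalTermFreq;   // total term occurrences (tokens) in the field
    private final long numberOfDocuments;  // documents that contribute to the field

    FieldStatsSketch(long sumTotalTermFreq, long numberOfDocuments) {
        if (sumTotalTermFreq < 0 || numberOfDocuments <= 0) {
            throw new IllegalArgumentException("invalid statistics");
        }
        this.sumTotalTermFreq = sumTotalTermFreq;
        this.numberOfDocuments = numberOfDocuments;
    }

    /** Under the stated assumption this is just a more descriptive name for sumTotalTermFreq. */
    long numberOfFieldTokens() {
        return sumTotalTermFreq;
    }

    /** Average field length in tokens, derived from the single stored value. */
    double avgFieldLength() {
        return (double) sumTotalTermFreq / numberOfDocuments;
    }
}
```

With this shape the descriptive name survives as an accessor while only one value is stored.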

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> Done:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
>  * The so-called _Information-Based Models_




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-07-20 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

Fixed two of the issues you mentioned:
 * Apache license header added to all files in the similarities package;
 * cleaned up the constructor of DFRSimilarity and added a few new ones.

I have not yet moved the NoNormalization and NoAfterEffect classes to their own 
files, because I feel a bit uncomfortable about the naming, since it's 
different from that of the other classes, e.g. NormalizationH2 vs 
NoNormalization. 

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> Done:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
>  * The so-called _Information-Based Models_




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-07-14 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

Made the score() and explain() methods in Similarity components final.

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> Done:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
>  * The so-called _Information-Based Models_




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-07-14 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

Explanation added to LM models; query boost added.

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> Done:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
>  * The so-called _Information-Based Models_




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-07-13 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Description: 
With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
can finally work on implementing the standard ranking models. Currently DFR, 
BM25 and LM are on the menu.

Done:
 * {{EasyStats}}: contains all statistics that might be relevant for a ranking 
algorithm
 * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
DocScorers and as much implementation detail as possible
 * _BM25_: the current "mock" implementation might be OK
 * _LM_
 * _DFR_
 * The so-called _Information-Based Models_



  was:
With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
can finally work on implementing the standard ranking models. Currently DFR, 
BM25 and LM are on the menu.

TODO:
 * {{EasyStats}}: contains all statistics that might be relevant for a ranking 
algorithm
 * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
DocScorers and as much implementation detail as possible
 * _BM25_: the current "mock" implementation might be OK
 * _LM_
 * _DFR_

Done:


> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> Done:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
>  * The so-called _Information-Based Models_




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-07-13 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

Added LMSimilarity so that the two LM methods have a common parent. It also 
defines the CollectionModel interface, which computes p(w|C) in a pluggable way 
(and only once per query, though I am not sure this improves performance, as I 
need a cast in score()).
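
A minimal sketch of what such a pluggable collection model could look like follows. All names here are hypothetical, and the add-one estimate in the default implementation is an assumption made for the example, not something stated in this message. The stats object computes p(w|C) once in its constructor, which mirrors the "only once per query" idea.

```java
/** Hypothetical pluggable estimator of p(w|C); names are illustrative only. */
interface CollectionModelSketch {
    /** Returns p(w|C) from the term's total frequency and the field's token count. */
    double probability(long totalTermFreq, long numberOfFieldTokens);
}

/** Assumed default: relative frequency with add-one smoothing to avoid zeros. */
final class DefaultCollectionModelSketch implements CollectionModelSketch {
    @Override
    public double probability(long totalTermFreq, long numberOfFieldTokens) {
        return (totalTermFreq + 1.0) / (numberOfFieldTokens + 1.0);
    }
}

/** Hypothetical per-query statistics that cache p(w|C) so scoring never recomputes it. */
final class LmStatsSketch {
    final double collectionProbability;

    LmStatsSketch(CollectionModelSketch model, long totalTermFreq, long numberOfFieldTokens) {
        // Computed once here; every later score() call would only read the field.
        this.collectionProbability = model.probability(totalTermFreq, numberOfFieldTokens);
    }
}
```

Swapping in a different estimator then only requires another implementation of the interface.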

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-07-10 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

 * Fixed #1
 * Added a totalBoost to EasySimilarity, and a getter method -- no one uses it 
yet
 * Added basic implementations for the Jelinek-Mercer and the Dirichlet LM 
methods.

As for the last one: the implementation is very basic now; I want to factor a 
few things out (e.g. p(w|C) to LMStats, possibly in a pluggable way so people can 
implement it however they want). It also doesn't seem right to have the same LM 
method implemented twice (both as MockLMSimilarity and here), so I'll take a 
look to see if I can merge those two. Finally, I am wondering whether I should 
implement the absolute discounting method, which, according to the paper, seems 
inferior to the Jelinek-Mercer and Dirichlet methods. Right now I am more on 
the "no" side.
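
For reference, the two smoothing methods mentioned above can be sketched in their standard textbook forms. This is not the code from the patch: the class name, method names and parameter names (lambda, mu, pwc for p(w|C)) are chosen for the illustration only.

```java
/** Textbook forms of two language-model smoothing methods; hypothetical helper class. */
final class LmSmoothingSketch {
    private LmSmoothingSketch() {}

    /**
     * Jelinek-Mercer: linear interpolation between the document model and the
     * collection model, p(w|d) = (1 - lambda) * tf / |d| + lambda * p(w|C).
     */
    static double jelinekMercer(double tf, double docLen, double pwc, double lambda) {
        return (1.0 - lambda) * (tf / docLen) + lambda * pwc;
    }

    /**
     * Dirichlet prior: the collection model acts as mu pseudo-counts,
     * p(w|d) = (tf + mu * p(w|C)) / (|d| + mu).
     */
    static double dirichlet(double tf, double docLen, double pwc, double mu) {
        return (tf + mu * pwc) / (docLen + mu);
    }
}
```

Both methods take p(w|C) as an input, which is why factoring that computation out into a shared place makes sense.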

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-07-05 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

 * log2() moved from DFRSimilarity to EasySimilarity,
 * changed DFRSimilarity so that its constructor does not use reflection.
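
A shared log2() helper in a common base class might look like the sketch below. The class names and the length normalization that uses the helper are hypothetical and only show how a subclass would reuse it; they are not the classes from the patch.

```java
/** Hypothetical base class; stands in for EasySimilarity only in this illustration. */
abstract class SimilarityBaseSketch {
    private static final double LOG_2 = Math.log(2.0);

    /** Base-2 logarithm via the change-of-base identity log2(x) = ln(x) / ln(2). */
    static double log2(double x) {
        return Math.log(x) / LOG_2;
    }
}

/** A subclass calls the inherited helper instead of defining its own copy. */
final class LengthNormalizationSketch extends SimilarityBaseSketch {
    private LengthNormalizationSketch() {}

    /** Hypothetical length normalization: tf * log2(1 + avgLen / len). */
    static double tfn(double tf, double len, double avgLen) {
        return tf * log2(1.0 + avgLen / len);
    }
}
```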

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-07-04 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

Fixed a few things in MockBM25Similarity.

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-06-27 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

Information-based model framework due to Clinchant and Gaussier added.

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-06-27 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

Explanation-handling added to EasySimilarity and DFRSimilarity.

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-06-26 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

Made the signature of EasySimilarity.score() a bit saner.

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-06-25 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

Implementation of the DFR framework added. Lots of nocommits, though. Things 
to think about:
 * lots of (float) conversions. Maybe the inner API (BasicModel, etc.) could 
use doubles? In my experience, double is faster anyway, at least on 
64-bit architectures
 * I am not overly happy about the naming scheme, i.e. BasicModelBE, etc. Maybe 
we should do it the same way as in Terrier, with a basicmodel package and class 
names like BE?
 * A regular SimilarityProvider implementation won't play well with 
DFRSimilarity, in case the user wants to use several different setups. 
Actually, this is a problem for all similarities that have parameters (e.g. 
BM25 with b and k).

Also, I think we need that NormConverter we talked about earlier on IRC, so that 
the Similarities can run on any index.
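
To make the first point concrete, here is a rough sketch of a DFR-style composition whose inner API uses doubles throughout. The interface and class names are hypothetical, the method signatures are simplified compared with anything in the patch, and the way the three components are combined (basic model times after effect, both applied to the normalized term frequency) is the usual DFR recipe rather than a copy of the attached code.

```java
/** Hypothetical component interfaces using double everywhere; not the patch's API. */
interface BasicModelSketch {
    double score(double tfn);
}

interface AfterEffectSketch {
    double score(double tfn);
}

interface NormalizationSketch {
    double tfn(double tf, double docLen, double avgDocLen);
}

/** Combines the three components; instances are passed in directly, with no reflection. */
final class DfrSketch {
    private final BasicModelSketch basicModel;
    private final AfterEffectSketch afterEffect;
    private final NormalizationSketch normalization;

    DfrSketch(BasicModelSketch basicModel,
              AfterEffectSketch afterEffect,
              NormalizationSketch normalization) {
        this.basicModel = basicModel;
        this.afterEffect = afterEffect;
        this.normalization = normalization;
    }

    /** Normalizes the term frequency, then multiplies basic model and after effect. */
    double score(double tf, double docLen, double avgDocLen) {
        double tfn = normalization.tfn(tf, docLen, avgDocLen);
        return basicModel.score(tfn) * afterEffect.score(tfn);
    }
}
```

Because each setup is just a different combination of component instances, several differently configured similarities can coexist, which is the situation the third point says a regular SimilarityProvider handles poorly.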

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-06-22 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

EasySimilarity added. Lots of questions and nocommits in the code.

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-06-21 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Comment: was deleted

(was: Done.)

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-06-21 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

Done.

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-06-21 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: (was: LUCENE-3220.patch)

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-06-21 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

Done.

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
> LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-06-20 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

Oh, sorry, how lame of me :( I am currently working on a different machine 
than the one I usually use, which is why I made those mistakes. Anyhow, I have 
fixed them.

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-06-20 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:




[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities

2011-06-20 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052025#comment-13052025
 ] 

David Mark Nemeskey commented on LUCENE-3220:
-

 * I was wondering about that too -- actually, docNo is a mistake; it should 
have been noDocs or noOfDocs anyway, but I guess I'll just go with 
numberOfDocuments.
 * I'll put a nocommit there for the time being, and if no sims use it, I'll 
just remove it from the Stats. Terrier has it, though, so I guess there should 
be at least one method that depends on it.

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-06-20 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

EasyStats object added.

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-06-20 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Issue Type: Sub-task  (was: New Feature)
Parent: LUCENE-2959

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-06-20 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Description: 
With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
can finally work on implementing the standard ranking models. Currently DFR, 
BM25 and LM are on the menu.

TODO:
 * {{EasyStats}}: contains all statistics that might be relevant for a ranking 
algorithm
 * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
DocScorers and as much implementation detail as possible
 * _BM25_: the current "mock" implementation might be OK
 * _LM_
 * _DFR_

Done:

  was:
With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
can finally work on implementing the standard ranking models. Currently DFR, 
BM25 and LM are on the menu.

TODO:
 * `EasyStats`: contains all statistics that might be relevant for a ranking 
algorithm
 * `EasySimilarity`: the ancestor of all the other similarities. Hides the 
DocScorers and as much implementation detail as possible
 * _BM25_: the current "mock" implementation might be OK
 * _LM_
 * _DFR_

Done:


> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:




[jira] [Created] (LUCENE-3220) Implement various ranking models as Similarities

2011-06-20 Thread David Mark Nemeskey (JIRA)
Implement various ranking models as Similarities


 Key: LUCENE-3220
 URL: https://issues.apache.org/jira/browse/LUCENE-3220
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: flexscoring branch
Reporter: David Mark Nemeskey
Assignee: David Mark Nemeskey


With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
can finally work on implementing the standard ranking models. Currently DFR, 
BM25 and LM are on the menu.

TODO:
 * `EasyStats`: contains all statistics that might be relevant for a ranking 
algorithm
 * `EasySimilarity`: the ancestor of all the other similarities. Hides the 
DocScorers and as much implementation detail as possible
 * _BM25_: the current "mock" implementation might be OK
 * _LM_
 * _DFR_

Done:
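
To make the TODO list above more concrete, here is a rough sketch of how a 
statistics holder, a common ancestor class and a BM25 subclass could fit 
together. It is only an illustration: the class names (SketchStats, 
SketchSimilarity, SketchBM25), the fields and the score() signature are 
hypothetical and are not taken from the attached patches or from the Lucene 
API. The formula is the textbook BM25 with the usual k1 and b parameters.

```java
/** Hypothetical statistics holder; field names are illustrative only. */
final class SketchStats {
    final long numberOfDocuments;  // total number of documents in the index
    final long docFreq;            // number of documents containing the term
    final double avgFieldLength;   // average field length, in tokens

    SketchStats(long numberOfDocuments, long docFreq, double avgFieldLength) {
        this.numberOfDocuments = numberOfDocuments;
        this.docFreq = docFreq;
        this.avgFieldLength = avgFieldLength;
    }
}

/** Hypothetical common ancestor: a ranking model only implements score(). */
abstract class SketchSimilarity {
    abstract double score(SketchStats stats, double termFreq, double docLength);
}

/** Textbook BM25, written against the hypothetical classes above. */
final class SketchBM25 extends SketchSimilarity {
    private final double k1;  // term frequency saturation
    private final double b;   // strength of length normalization

    SketchBM25(double k1, double b) {
        this.k1 = k1;
        this.b = b;
    }

    @Override
    double score(SketchStats s, double termFreq, double docLength) {
        // Smoothed inverse document frequency; the "1 +" keeps it non-negative.
        double idf = Math.log(
            1.0 + (s.numberOfDocuments - s.docFreq + 0.5) / (s.docFreq + 0.5));
        // Length normalization relative to the average field length.
        double norm = k1 * (1.0 - b + b * docLength / s.avgFieldLength);
        return idf * termFreq * (k1 + 1.0) / (termFreq + norm);
    }
}
```

With this split, a model such as LM or DFR would be one more subclass that 
reads the same statistics object, which is roughly the separation the TODO 
items describe.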



