[ 
https://issues.apache.org/jira/browse/LUCENE-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033801#comment-13033801
 ] 

Martijn van Groningen edited comment on LUCENE-3098 at 5/15/11 7:58 PM:
------------------------------------------------------------------------

{quote}
 * Maybe only one ctor for TopGroups?  (Ie, we just pass in null as
   totalGroupCount).  I'm wary of ctor explosion over time...

 * In TestGrouping, you don't need a separate uniqueGroupCount int?
   Can't you just use knownGroups.size() in the end?

 * For TotalGroupCountCollector, in the jdocs for the ctor maybe
   state that caller should set initialSize to rough estimate of how
   many unique groups are expected, but that this uses up 4 bytes *
   initialSize?  Maybe we should also add a ctor that sets a default
   for this (128?) and mark the other ctor as expert?
{quote}
I agree. I've updated the patch.
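
To illustrate the constructor suggestion above, here is a minimal sketch of a convenience ctor that delegates to an expert ctor. It is not the code in the patch: the class name GroupCountSketch, the String group values and the HashSet backing are assumptions made only for this example; the default of 128 is the value floated in the quote.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the collector discussed above; not the class in the patch.
class GroupCountSketch {

    // Default suggested in the review comment ("128?").
    static final int DEFAULT_INITIAL_SIZE = 128;

    private final Set<String> groups;

    // Convenience ctor: uses the default initial size.
    GroupCountSketch() {
        this(DEFAULT_INITIAL_SIZE);
    }

    // Expert ctor: initialSize should be a rough estimate of the number of
    // unique groups expected; a larger value reserves more memory up front.
    GroupCountSketch(int initialSize) {
        this.groups = new HashSet<String>(initialSize);
    }

    // Records the group value of one matching document.
    void collect(String groupValue) {
        groups.add(groupValue);
    }

    // Number of distinct group values seen so far.
    int getGroupCount() {
        return groups.size();
    }
}
```

Most callers would use the no-arg ctor; only callers that know roughly how many groups to expect need the expert one.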

{quote}
Hmm, it's a little odd to have TopGroups hold the totalGroupCount?
Ie, it's only the test case that makes use of this, because the 2nd
pass collector just sets it to null?  It'd be nice to find some way to
have 2nd pass collector be able to set this...
{quote}
That would be nice. Future collectors might need something similar. I'm 
currently thinking about a TopGroupsEnrich interface that collectors can 
implement. This allows them to add data to the TopGroups, like the total group 
count. The SecondPassGroupingCollector has a list of collectors that implement 
the TopGroupsEnrich interface. When the getTopGroups() method is executed, it 
iterates over these collectors and the TopGroups is enriched with data. 
The downside is that the fields inside TopGroups can't be final and we probably 
need setters. I think if we do something like this we should do it in a new 
Jira issue. 
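
A rough sketch of how the TopGroupsEnrich idea could look. Only the interface name and getTopGroups() come from the comment above; the enrich(...) method signature and the TopGroupsSketch, CountingEnricher and SecondPassSketch classes are hypothetical simplifications, not the real Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, stripped-down TopGroups. totalGroupCount is not final and has
// a setter, which is the downside mentioned above.
class TopGroupsSketch {
    private final int totalHitCount;
    private Integer totalGroupCount; // stays null until an enricher sets it

    TopGroupsSketch(int totalHitCount) {
        this.totalHitCount = totalHitCount;
    }

    int getTotalHitCount() {
        return totalHitCount;
    }

    Integer getTotalGroupCount() {
        return totalGroupCount;
    }

    void setTotalGroupCount(Integer totalGroupCount) {
        this.totalGroupCount = totalGroupCount;
    }
}

// Interface name taken from the comment; the method signature is an assumption.
interface TopGroupsEnrich {
    void enrich(TopGroupsSketch topGroups);
}

// Hypothetical enricher that contributes a total group count computed elsewhere.
class CountingEnricher implements TopGroupsEnrich {
    private final int groupCount;

    CountingEnricher(int groupCount) {
        this.groupCount = groupCount;
    }

    public void enrich(TopGroupsSketch topGroups) {
        topGroups.setTotalGroupCount(groupCount);
    }
}

// Hypothetical second pass collector: holds the enrichers and applies each one
// when the result is built.
class SecondPassSketch {
    private final List<TopGroupsEnrich> enrichers = new ArrayList<TopGroupsEnrich>();
    private final int totalHitCount;

    SecondPassSketch(int totalHitCount) {
        this.totalHitCount = totalHitCount;
    }

    void addEnricher(TopGroupsEnrich enricher) {
        enrichers.add(enricher);
    }

    TopGroupsSketch getTopGroups() {
        TopGroupsSketch topGroups = new TopGroupsSketch(totalHitCount);
        for (TopGroupsEnrich enricher : enrichers) {
            enricher.enrich(topGroups);
        }
        return topGroups;
    }
}
```

Without any enricher registered, getTotalGroupCount() returns null, which matches what the 2nd pass collector does today.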

> Grouped total count
> -------------------
>
>                 Key: LUCENE-3098
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3098
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Martijn van Groningen
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3098.patch, LUCENE-3098.patch, LUCENE-3098.patch, 
> LUCENE-3098.patch
>
>
> When grouping, you can currently get two counts:
> * Total hit count, which counts all documents that matched the query.
> * Total grouped hit count, which counts all documents that have been grouped 
> in the top N groups.
> Since with grouping the end user gets groups in the search result instead of 
> plain documents, the total number of groups makes more sense as the total 
> count in many situations. 
