[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064144#comment-13064144
 ] 

Yonik Seeley commented on SOLR-2242:
------------------------------------

This issue was a bit tricky to review, given that the output doesn't seem to 
quite match the examples.
I also wasn't exactly sure what the latest patch was, so I just looked at the 
patch uploaded on 28/Jun/11.

Here's my summary of what the patch currently does:

If you add facet.numFacetTerms=2 to a faceting request, you get the following:

{code}
<lst name="facet_fields">
  <lst name="text">
    <int name="electronics">14</int>
    <int name="inc">8</int>
    <int name="2.0">5</int>
    <int name="lcd">5</int>
    <int name="memory">5</int>
    <int name="numFacetTerms">385</int>
  </lst>
</lst>
{code}

If you add facet.numFacetTerms=1 to a faceting request, you get the following:

{code}
<lst name="facet_fields">
  <lst name="text">
    <int name="numFacetTerms">385</int>
  </lst>
</lst>
{code}

w.r.t. the interface, I agree with a number of Lance's observations.

- facet.numFacetTerms name: the second "Facet" is a bit redundant.  And we 
probably should be talking in terms of "constraints" instead of "terms".  
Perhaps facet.numConstraints (or facet.nconstraints, to be consistent with 
group.ngroups).
- facet.nconstraints should just be a boolean... no need for "1" or "2".  If 
the user doesn't want to see any constraints, then they can set facet.limit=0.  
This is also consistent with grouping.
- we're mixing units in the same list, and that's probably not a great idea?  
Constraints have units of documents (number of documents that matched that 
constraint) while "numFacetTerms" has units of number of constraints.
- I think this also breaks distributed faceting due to mixing of units?  The 
distributed faceting code thinks that numFacetTerms is a constraint.
- We need to figure out what we are going to do in distributed mode... it 
doesn't seem easy to actually figure out the number of constraints without 
streaming them *all* back and merging (i.e. you can't just add up the numbers).
- I also agree that we should not build the entire list in memory just to get 
the size of that list.
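
To make the distributed point concrete, here is a minimal sketch (not Solr 
code) of why per-shard constraint totals can't simply be added, while 
per-constraint document counts can. The shard contents and function names 
below are made up for illustration, and it assumes each document lives on 
exactly one shard.

```python
# Hypothetical per-shard results for one facet field.
# Keys are constraints; values are the number of matching docs on that shard.
shard_a = {"electronics": 9, "inc": 5, "lcd": 5}
shard_b = {"electronics": 5, "inc": 3, "memory": 5}
shards = [shard_a, shard_b]


def naive_nconstraints(shards):
    """Add up each shard's own distinct count (overcounts shared constraints)."""
    return sum(len(counts) for counts in shards)


def merged_nconstraints(shards):
    """Collect every constraint from every shard, then count the union."""
    seen = set()
    for counts in shards:
        seen.update(counts)
    return len(seen)


def merged_counts(shards):
    """Document counts, by contrast, do add up across shards."""
    total = {}
    for counts in shards:
        for name, n in counts.items():
            total[name] = total.get(name, 0) + n
    return total


print(naive_nconstraints(shards))             # 6 (wrong: shared names counted twice)
print(merged_nconstraints(shards))            # 4 (needs all constraints streamed back)
print(merged_counts(shards)["electronics"])   # 14
```

The union version is the expensive one: it needs every constraint from every 
shard, which is exactly the streaming cost mentioned above.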

It seems like rather than adding more magic names to the list (and risk a real 
collision with the actual name of a constraint), we should add more structure 
to the response, as previously discussed.

So if we added facet.nconstraints=true, we would get
{code}
<lst name="facet_fields">
  <lst name="text">
    <int name="numFacetTerms">385</int>
    <lst name="counts">
      <int name="electronics">14</int>
      <int name="inc">8</int>
      <int name="2.0">5</int>
      <int name="lcd">5</int>
      <int name="memory">5</int>
   </lst>
  </lst>
</lst>
{code}
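
As a rough illustration of what the extra structure buys a client, the sketch 
below parses a response shaped like the one above using Python's standard 
xml.etree.ElementTree. The sample response is an assumption: it is a trimmed 
copy of the example with one made-up constraint that is itself named 
"numFacetTerms", to show that the total and the counts no longer collide. 
parse_field is a hypothetical helper, not part of any Solr client.

```python
import xml.etree.ElementTree as ET

# Trimmed sample in the proposed nested layout. The inner "numFacetTerms"
# entry is a made-up constraint that happens to share the magic name.
RESPONSE = """
<lst name="facet_fields">
  <lst name="text">
    <int name="numFacetTerms">385</int>
    <lst name="counts">
      <int name="electronics">14</int>
      <int name="numFacetTerms">3</int>
    </lst>
  </lst>
</lst>
"""


def parse_field(field):
    """Split one field's <lst> into (number of constraints, {constraint: docs})."""
    total = None
    counts = {}
    for child in field:  # direct children only
        if child.tag == "int" and child.get("name") == "numFacetTerms":
            total = int(child.text)
        elif child.tag == "lst" and child.get("name") == "counts":
            counts = {c.get("name"): int(c.text) for c in child}
    return total, counts


root = ET.fromstring(RESPONSE.strip())
total, counts = parse_field(root.find("lst[@name='text']"))
print(total)    # 385
print(counts)   # {'electronics': 14, 'numFacetTerms': 3}
```

With the flat list, the same input would leave the client unable to tell which 
"numFacetTerms" entry is the total and which is a real constraint.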

And when we use this new format, we should consider using a separate "missing" 
name for facet.missing=true instead of putting the null name in with the counts.

This format change is where we need to be careful about back compat: this 
interface is one of the most widely used, and with all the 3rd party clients 
and libraries out there, we should still support the old format via a 
facet.format parameter or something similar.

Bill: You originally opened this issue for use with grouping to get the total 
number of groups. Are you aware of the group.ngroups parameter that was added 
that does this?


> Get distinct count of names for a facet field
> ---------------------------------------------
>
>                 Key: SOLR-2242
>                 URL: https://issues.apache.org/jira/browse/SOLR-2242
>             Project: Solr
>          Issue Type: New Feature
>          Components: Response Writers
>    Affects Versions: 4.0
>            Reporter: Bill Bell
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: NumFacetTermsFacetsTest.java, 
> SOLR-2242-notworkingtest.patch, SOLR-2242.patch, SOLR-2242.patch, 
> SOLR-2242.shard.patch, SOLR-2242.shard.patch, 
> SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1.patch, 
> SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch
>
>
> When returning facet.field=<name of field> you will get a list of matches for 
> distinct values. This is normal behavior. This patch tells you how many 
> distinct values you have (# of rows). Use with limit=-1 and mincount=1.
> The feature is called "namedistinct". Here is an example:
> http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price
> http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price
> http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price
> This currently only works on facet.field.
> {code}
> <lst name="facet_fields">
>   <lst name="price">
>     <int name="numFacetTerms">14</int>
>     <int name="0.0">3</int>
>     <int name="11.5">1</int>
>     <int name="19.95">1</int>
>     <int name="74.99">1</int>
>     <int name="92.0">1</int>
>     <int name="179.99">1</int>
>     <int name="185.0">1</int>
>     <int name="279.95">1</int>
>     <int name="329.95">1</int>
>     <int name="350.0">1</int>
>     <int name="399.0">1</int>
>     <int name="479.95">1</int>
>     <int name="649.99">1</int>
>     <int name="2199.0">1</int>
>   </lst>
> </lst>
> {code} 
> Several people use this to get the group.field count (the # of groups).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
