[ 
https://issues.apache.org/jira/browse/SOLR-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522177#comment-14522177
 ] 

Crawdaddy commented on SOLR-7214:
---------------------------------

Yonik, I think I found a JSON faceting bug when sub-faceting a field on 
unique(another_field).  As part of the upgrade from Heliosearch (HS) to Solr 
5.1, I wanted to A/B test my queries between the two. I set up two identical 
5-shard Solr installs, 35M docs each: one running HS 0.09 and the other Solr 
5.1.  Issuing my facet query, I noticed that the unique counts differed 
between the two.  

This query, issued to my Solr 5.1 instance, demonstrates the inconsistency 
between native facets and JSON facets (limits set low enough to repro the 
issue):

rows=0&q="John Lennon"&fq=keywords:[* TO 
*]&facet=true&facet.pivot=keywords,top_private_domain_s&facet.limit=10&
json.facet={
  keywords:{
    terms:{
      field:keywords,
      limit: 2,
      facet:{       
           unique_domains: 'unique(top_private_domain_s)'
      }
    }
  }
}
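
For anyone who wants to script the same A/B comparison, here is a minimal Python sketch that assembles the request parameters shown above. The function name `build_params` is hypothetical, `wt=json` is an assumption (the original request does not show it), and the `json.facet` body is emitted as strict JSON rather than the relaxed syntax used above. The sketch only builds the query string; the host and collection to send it to are left out because they are not given here.

```python
import json
from urllib.parse import urlencode


def build_params(query, facet_field, unique_field, pivot_limit=10, terms_limit=2):
    """Build the combined native-pivot + JSON-facet parameters (hypothetical helper)."""
    json_facet = {
        facet_field: {
            "terms": {
                "field": facet_field,
                "limit": terms_limit,
                # Sub-facet: count distinct values of unique_field per bucket.
                "facet": {"unique_domains": f"unique({unique_field})"},
            }
        }
    }
    return {
        "rows": 0,
        "q": query,
        "fq": f"{facet_field}:[* TO *]",
        "facet": "true",
        "facet.pivot": f"{facet_field},{unique_field}",
        "facet.limit": pivot_limit,
        "json.facet": json.dumps(json_facet),
        "wt": "json",  # assumption: JSON response format
    }


params = build_params('"John Lennon"', "keywords", "top_private_domain_s")
query_string = urlencode(params)  # URL-encoded form, ready to append to a select URL
```

Sending the same `query_string` to both installs keeps the two requests identical, so any difference in the response comes from the server side.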

A snippet of the results shows that the native facets return at least 10 unique 
values (there are more) for the keyword "Paul McCartney":

   "facet_pivot":{
      "keywords,top_private_domain_s":[{
          "field":"keywords",
          "value":"Paul McCartney",
          "count":602,
          "pivot":[{
              "field":"top_private_domain_s",
              "value":"taringa.net",
              "count":35},
            {
              "field":"top_private_domain_s",
              "value":"dailymail.co.uk",
              "count":34},
            {
              "field":"top_private_domain_s",
              "value":"beatlesbible.com",
              "count":33},
            {
              "field":"top_private_domain_s",
              "value":"examiner.com",
              "count":22},
            {
              "field":"top_private_domain_s",
              "value":"blogspot.com",
              "count":14},
            {
              "field":"top_private_domain_s",
              "value":"musicradar.com",
              "count":13},
            {
              "field":"top_private_domain_s",
              "value":"liverpoolecho.co.uk",
              "count":11},
            {
              "field":"top_private_domain_s",
              "value":"rollingstone.com",
              "count":11},
            {
              "field":"top_private_domain_s",
              "value":"about.com",
              "count":9},
            {
              "field":"top_private_domain_s",
              "value":"answers.com",
              "count":8}]},

...

But the JSON facets report only 4 unique values:

 "facets":{
    "count":11859,
    "keywords":{
      "buckets":[{
          "val":"Paul McCartney",
          "count":602,
          "unique_domains":4}]}}}

The results are correct when issuing the same search in Heliosearch:

"facets":{
    "count":11859,
    "keywords":{
      "buckets":[{
          "val":"Paul McCartney",
          "count":602,
          "unique_domains":228}]}}}

In all cases the doc count (602) is the same so I know it's hitting the same 
documents.
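
The mismatch above can also be detected mechanically. The Python sketch below is one way to do it, under these assumptions: `find_undercounts` is a hypothetical helper, `pivot_entries` is the list found under the "keywords,top_private_domain_s" key of "facet_pivot", and `json_buckets` is the "buckets" list of the JSON facet. Because facet.limit caps the pivot list, the number of pivot values is only a lower bound on the true distinct count, so the check flags buckets whose unique() value falls below that bound.

```python
def find_undercounts(pivot_entries, json_buckets, stat_name):
    """Return {value: (pivot_distinct, json_unique)} where unique() is below the pivot lower bound."""
    # Number of distinct sub-values the native pivot facet listed per parent value.
    pivot_distinct = {e["value"]: len(e.get("pivot", [])) for e in pivot_entries}
    suspicious = {}
    for bucket in json_buckets:
        seen = pivot_distinct.get(bucket["val"])
        if seen is not None and bucket[stat_name] < seen:
            suspicious[bucket["val"]] = (seen, bucket[stat_name])
    return suspicious


# Data copied from the Solr 5.1 responses quoted in this message.
domains = [
    ("taringa.net", 35), ("dailymail.co.uk", 34), ("beatlesbible.com", 33),
    ("examiner.com", 22), ("blogspot.com", 14), ("musicradar.com", 13),
    ("liverpoolecho.co.uk", 11), ("rollingstone.com", 11),
    ("about.com", 9), ("answers.com", 8),
]
pivot_entries = [{
    "field": "keywords",
    "value": "Paul McCartney",
    "count": 602,
    "pivot": [
        {"field": "top_private_domain_s", "value": d, "count": c}
        for d, c in domains
    ],
}]
json_buckets = [{"val": "Paul McCartney", "count": 602, "unique_domains": 4}]

result = find_undercounts(pivot_entries, json_buckets, "unique_domains")
print(result)  # {'Paul McCartney': (10, 4)}
```

Run against the Heliosearch response (unique_domains of 228), the same check returns an empty dict, which matches the expectation that 228 is consistent with at least 10 distinct domains.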

Any advice you can offer as to whether you think this is a bug, or if the 
behavior is intentionally different between the two systems, would be much 
appreciated.  If it is a bug but you think there's a workaround, that'd be 
great to know too.




> JSON Facet API
> --------------
>
>                 Key: SOLR-7214
>                 URL: https://issues.apache.org/jira/browse/SOLR-7214
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Yonik Seeley
>            Assignee: Yonik Seeley
>             Fix For: 5.1
>
>         Attachments: SOLR-7214.patch
>
>
> Overview is here: http://yonik.com/json-facet-api/
> The structured nature of nested sub-facets is more naturally expressed in a 
> nested structure like JSON rather than the flat structure that normal query 
> parameters provide.
> Goals:
> - First class JSON support
> - Easier programmatic construction of complex nested facet commands
> - Support a much more canonical response format that is easier for clients to 
> parse
> - First class analytics support
> - Support a cleaner way to do distributed faceting
> - Support better integration with other search features



