[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993233#comment-13993233
 ] 

Hoss Man commented on SOLR-2894:
--------------------------------


I still haven't had a chance to really dig into the implementation details of 
the patch, but I wanted to spend some time testing things out from a user 
perspective...

----

One of the first things I noticed is that the refinement requests seem to be 
extra verbose.  For example, given this user request (using the example data, 
with a 2 shard cloud setup):

http://localhost:8983/solr/select?q=*:*&sort=id+desc&rows=2&facet=true&facet.pivot=cat,inStock&facet.limit=3&facet.pivot=manu_id_s,inStock

This is what the refinement requests in the logs of each shard looked like...

{noformat}
3434041 [qtp1282186295-19] INFO  org.apache.solr.core.SolrCore  – [collection1] 
webapp=/solr path=/select 
params={manu_id_s,inStock_8__terms=samsung&facet=true&sort=id+desc&facet.limit=3&manu_id_s,inStock_9__terms=viewsonic&distrib=false&cat,inStock_1__terms=search&wt=javabin&version=2&rows=0&manu_id_s,inStock_6__terms=maxtor&manu_id_s,inStock_7__terms=nor&NOW=1399584452682&shard.url=http://127.0.1.1:8983/solr/collection1/&df=text&cat,inStock_2__terms=software&q=*:*&manu_id_s,inStock_3__terms=canon&manu_id_s,inStock_4__terms=ati&facet.pivot.mincount=-1&isShard=true&cat,inStock_0__terms=hard+drive&facet.pivot={!terms%3D$cat,inStock_0__terms}cat,inStock&facet.pivot={!terms%3D$cat,inStock_1__terms}cat,inStock&facet.pivot={!terms%3D$cat,inStock_2__terms}cat,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_3__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_4__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_5__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_6__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_7__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_8__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_9__terms}manu_id_s,inStock&manu_id_s,inStock_5__terms=eu}
 hits=14 status=0 QTime=3 

3424918 [qtp1282186295-16] INFO  org.apache.solr.core.SolrCore  – [collection1] 
webapp=/solr path=/select 
params={cat,inStock_10__terms=memory&manu_id_s,inStock_15__terms=dell&facet=true&manu_id_s,inStock_12__terms=apple&sort=id+desc&facet.limit=3&manu_id_s,inStock_13__terms=asus&manu_id_s,inStock_16__terms=uk&distrib=false&wt=javabin&manu_id_s,inStock_14__terms=boa&version=2&rows=0&NOW=1399584452682&shard.url=http://127.0.1.1:7574/solr/collection1/&df=text&manu_id_s,inStock_11__terms=corsair&q=*:*&facet.pivot.mincount=-1&isShard=true&facet.pivot={!terms%3D$cat,inStock_10__terms}cat,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_11__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_12__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_13__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_14__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_15__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_16__terms}manu_id_s,inStock}
 hits=18 status=0 QTime=2 
{noformat}

Or if we prune that down to just the interesting params (as far as pivot 
faceting goes)...

{noformat:title=shard1}
facet.pivot.mincount=-1
cat,inStock_0__terms=hard+drive
cat,inStock_1__terms=search
cat,inStock_2__terms=software
manu_id_s,inStock_3__terms=canon
manu_id_s,inStock_4__terms=ati
manu_id_s,inStock_5__terms=eu
manu_id_s,inStock_6__terms=maxtor
manu_id_s,inStock_7__terms=nor
manu_id_s,inStock_8__terms=samsung
manu_id_s,inStock_9__terms=viewsonic
facet.pivot={!terms=$cat,inStock_0__terms}cat,inStock
facet.pivot={!terms=$cat,inStock_1__terms}cat,inStock
facet.pivot={!terms=$cat,inStock_2__terms}cat,inStock
facet.pivot={!terms=$manu_id_s,inStock_3__terms}manu_id_s,inStock
facet.pivot={!terms=$manu_id_s,inStock_4__terms}manu_id_s,inStock
facet.pivot={!terms=$manu_id_s,inStock_5__terms}manu_id_s,inStock
facet.pivot={!terms=$manu_id_s,inStock_6__terms}manu_id_s,inStock
facet.pivot={!terms=$manu_id_s,inStock_7__terms}manu_id_s,inStock
facet.pivot={!terms=$manu_id_s,inStock_8__terms}manu_id_s,inStock
facet.pivot={!terms=$manu_id_s,inStock_9__terms}manu_id_s,inStock
{noformat}

{noformat:title=shard2}
facet.pivot.mincount=-1
cat,inStock_10__terms=memory
manu_id_s,inStock_11__terms=corsair
manu_id_s,inStock_12__terms=apple
manu_id_s,inStock_13__terms=asus
manu_id_s,inStock_14__terms=boa
manu_id_s,inStock_15__terms=dell
manu_id_s,inStock_16__terms=uk
facet.pivot={!terms=$cat,inStock_10__terms}cat,inStock
facet.pivot={!terms=$manu_id_s,inStock_11__terms}manu_id_s,inStock
facet.pivot={!terms=$manu_id_s,inStock_12__terms}manu_id_s,inStock
facet.pivot={!terms=$manu_id_s,inStock_13__terms}manu_id_s,inStock
facet.pivot={!terms=$manu_id_s,inStock_14__terms}manu_id_s,inStock
facet.pivot={!terms=$manu_id_s,inStock_15__terms}manu_id_s,inStock
facet.pivot={!terms=$manu_id_s,inStock_16__terms}manu_id_s,inStock
{noformat}

I believe that what's going on here is basically:

* top level params are being used for the individual terms that need to be 
refined (which is smart, it helps eliminate the risk of terms needing special 
escaping within local params)
* the top level param names for these terms use a per-(user)request "global" 
counter to ensure that they are unique (+1)
* the top level term param names also include the facet.pivot spec they are 
needed for -- this seems redundant since the counter is clearly global (even 
across multiple "facet.pivot" specs)
* these top level term param names are then added only to the shard requests 
where refinement is actually needed for those terms (+1) and are referenced as 
variables in facet.pivot commands using the "terms" local param (which the 
shards evidently look for to know when this is a refinement request)
* because many terms may need refinement, each user specified 
{{facet.pivot=X,Y}} param results in *many* shard params of 
{{facet.pivot=\{!terms=$N\}X,Y}}
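
To make the bullets above concrete, here is a rough Python sketch of how the 
logged param names appear to be generated.  This is only my reading of the 
logs, not the patch's actual code: the function name 
{{verbose_refinement_params}} and its arguments are made up for illustration, 
and values are left un-encoded (the HTTP client would turn "hard drive" into 
"hard+drive").

```python
def verbose_refinement_params(terms_by_pivot, start=0):
    """Mimic the logged scheme: one top level param plus one facet.pivot
    param per term, named with a counter shared across all pivot specs.

    terms_by_pivot: dict of pivot spec -> list of terms needing refinement
    start: first counter value (hypothetical; lets shard2 continue at 10)
    Returns (list of (name, value) pairs, next counter value).
    """
    term_params = [("facet.pivot.mincount", "-1")]
    pivot_params = []
    counter = start
    for pivot_spec, terms in terms_by_pivot.items():
        for term in terms:
            # e.g. "cat,inStock_0__terms" -- the spec in the name is redundant
            name = f"{pivot_spec}_{counter}__terms"
            term_params.append((name, term))
            pivot_params.append(
                ("facet.pivot", "{!terms=$" + name + "}" + pivot_spec))
            counter += 1
    return term_params + pivot_params, counter


shard1 = {
    "cat,inStock": ["hard drive", "search", "software"],
    "manu_id_s,inStock": ["canon", "ati", "eu", "maxtor", "nor",
                          "samsung", "viewsonic"],
}
shard1_params, next_counter = verbose_refinement_params(shard1)
```

With the shard1 terms from the logs, that yields 21 params for 10 terms: two 
params per term, plus the mincount.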

I realize that local params don't play nice with multi-valued params at all, 
let alone make it easy to use a single variable to refer to a multi-valued 
param -- but wouldn't it be simpler (and less verbose over the wire) to just 
ignore Solr's built in param variable dereferencing and instead generate 1 
unique param name to use for *all* the terms we care about (for each unique 
pivot spec), and then refer to that name once in a local param for a _single_ 
facet.pivot param (which the pivot facet code would then go and explicitly 
fetch from the top level SolrParams as a multi-value)?

The result being that, instead of the refinement requests shown above, the 
refinement requests for each shard could be something much simpler like...

{noformat:title=shard1_proposed}
facet.pivot.mincount=-1
_fpt_1=hard+drive
_fpt_1=search
_fpt_1=software
_fpt_2=canon
_fpt_2=ati
_fpt_2=eu
_fpt_2=maxtor
_fpt_2=nor
_fpt_2=samsung
_fpt_2=viewsonic
facet.pivot={!fpt=1}cat,inStock
facet.pivot={!fpt=2}manu_id_s,inStock
{noformat}

{noformat:title=shard2_proposed}
facet.pivot.mincount=-1
_fpt_1=memory
_fpt_2=corsair
_fpt_2=apple
_fpt_2=asus
_fpt_2=boa
_fpt_2=dell
_fpt_2=uk
facet.pivot={!fpt=1}cat,inStock
facet.pivot={!fpt=2}manu_id_s,inStock
{noformat}

(where {{\_fpt\_}} is just a short prefix for "facet pivot terms" that i pulled 
out of my ass)
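
A minimal Python sketch of the proposed encoding, and of how a shard might 
read the terms back, assuming the {{\_fpt\_N}} / {{fpt=N}} naming from the 
examples above.  Both function names are hypothetical and nothing here is a 
real Solr API; values are left un-encoded.

```python
def compact_refinement_params(terms_by_pivot):
    """One multi-valued _fpt_<n> param per pivot spec, referenced once.

    terms_by_pivot: dict of pivot spec -> list of terms needing refinement
    Returns a list of (name, value) pairs; repeated names are multi-values.
    """
    term_params = [("facet.pivot.mincount", "-1")]
    pivot_params = []
    for n, (pivot_spec, terms) in enumerate(terms_by_pivot.items(), start=1):
        for term in terms:
            term_params.append((f"_fpt_{n}", term))
        # A single facet.pivot per spec, no matter how many terms it has
        pivot_params.append(("facet.pivot", "{!fpt=%d}%s" % (n, pivot_spec)))
    return term_params + pivot_params


def terms_for(params, fpt_id):
    """Shard side: fetch every value of the multi-valued _fpt_<id> param."""
    return [value for name, value in params if name == f"_fpt_{fpt_id}"]


shard1 = {
    "cat,inStock": ["hard drive", "search", "software"],
    "manu_id_s,inStock": ["canon", "ati", "eu", "maxtor", "nor",
                          "samsung", "viewsonic"],
}
shard1_params = compact_refinement_params(shard1)
```

For the same shard1 terms this comes to 13 params instead of 21, and the 
number of facet.pivot params no longer grows with the number of terms.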

----

Another thing I noticed is that with my 2 shard exampledocs setup, the 
following URL seems to send the pivot faceting into an infinite loop of 
refinement requests (note the typo: there's a space embedded in a field name, 
{{manu_id_s != manu_+id_s}} )...

http://localhost:8983/solr/select?q=*:*&sort=id+desc&rows=2&facet=true&facet.pivot=cat,manu_+id_s,inStock&facet.limit=3

...not clear what's going on there, but it's definitely something that needs 
to be fixed before committing ("garbage in -> garbage out" is one thing, 
"garbage in -> crash your cluster" is another)
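
For anyone wondering where the space comes from: in a query string, {{+}} 
decodes to a space, so the middle pivot field in that URL is not 
{{manu_id_s}}.  A quick check using only Python's standard {{urllib.parse}} 
(nothing Solr specific, and it says nothing about why the loop happens):

```python
from urllib.parse import parse_qs, urlsplit

url = ("http://localhost:8983/solr/select?q=*:*&sort=id+desc&rows=2"
       "&facet=true&facet.pivot=cat,manu_+id_s,inStock&facet.limit=3")

# parse_qs applies form decoding, so "+" becomes a space
query = parse_qs(urlsplit(url).query)
pivot_fields = query["facet.pivot"][0].split(",")
print(pivot_fields)  # ['cat', 'manu_ id_s', 'inStock']
```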

----

the multi-level refinement is sooooooo sweet.


> Implement distributed pivot faceting
> ------------------------------------
>
>                 Key: SOLR-2894
>                 URL: https://issues.apache.org/jira/browse/SOLR-2894
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Erik Hatcher
>             Fix For: 4.9, 5.0
>
>         Attachments: SOLR-2894-reworked.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> dateToObject.patch
>
>
> Following up on SOLR-792, pivot faceting currently only supports 
> undistributed mode.  Distributed pivot faceting needs to be implemented.


