[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-2894:
---------------------------

    Attachment: SOLR-2894.patch

I'm going to spend some time this week reviewing the state of things.

First up, some minor tweaks to the latest patch...

* fixed a typo in TestDistributedSearch ("facet.fiedl" -> "facet.field")
** this is my biggest anoyance about most of our existing distributed serach 
tests -- they just assert that queries return the same thing as single node 
tests, but don't assert anything about the response, so mistakes in the input, 
or mistakes in indexing hte docs, resulting in a useless test aren't caught)
** this also relates to marks comment about removing "ndate" since that field 
no longer exists in the test configs - using tdate_a & tdate_b here should be 
fine
* removed the "nocommit" mark mentioned regarding DateField - that method moved 
to TrieDateField so his fix is correct.


Some comments/questions based on what I've reviewed so far (note: many of these 
comments/questions come from a place of genuine ignorance since i've only 
reviewed about 30% of the patch so far)...

* even at a glance, it's obvious the SimpleFacets changes are a simple 
refactoring and totally fine.
* In FacetComponent - Setting asside the core pivot facet changes...
** Most of the other changes in seem like straight forward (and much needed!) 
variable renaming (+1) to help eliminate ambiguity between the existing field 
faceting refinement and the new pivot faceting refinement
** the new "fieldsToOverRequestOn" Map confuses me in a few ways...
*** As is, i don't understand why this is a Map and not a Set.
*** Some odd conditional logic is used when iterating over this "Set" to 
determine the overrequest limit - i'm still trying to wrap my head arround this 
but in particular the comment {{// dff has the info, the params have been 
scrubbed}} confuses me -- where are these params "scrubbed" ?
*** I like these new explict overrequest count/ratio params, and i get that the 
end-game here is that they can be used to affect the amount of overrequest done 
for both fact.field and facet.pivot -- but i'm not understanding the value of 
building up this "fieldsToOverRequestOn" set of names (for every shard request) 
and then iterating over it and consulting *either* the DFF or the params to 
decide which limit value to use on the shard requests, and then 
(conditionally?) removing the limit/offset/overrequest params from the shard 
requests.  Wouldn't it be simplier to have modifyRequest *always* remove the 
limit/offset/overrequest params from the shard params, and then have the 
individual code paths (for both facet.field & facet.pivot) take responsibility 
for adding back the new limit params based on the overrequest calculations 
using the *original* request params (ie: {{rb.req.getParams()}}).
*** My chief concern here being that (at first glance) this change seems like 
it adds a small amount of overhead to the overrequest limit calculations, and 
makes this bit of code more confusing, w/o any obvious (to me) advantage.
* I don't yet understand the need for the new "PURPOSE_REFINE_PIVOT_FACETS" 
stage of shard requests? ... can someone clarify why  pivot facets can't just 
be refined during the existing "PURPOSE_REFINE_FACETS" stage?
* I notice that the new DistributedFacetPivotTest directly extends 
BaseDistributedSearchTestCase and uses a fixed shard count, and indexes some 
docs directly to certain clients
** is there something about the functionality (or about the test) that requires 
certain data locality (ie: certain docs on same shard) to work?
** if not: is there any other reason we can't switch this over to a Cloud based 
test with a variable numbers of shards and compairons against the control 
collection?



> Implement distributed pivot faceting
> ------------------------------------
>
>                 Key: SOLR-2894
>                 URL: https://issues.apache.org/jira/browse/SOLR-2894
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Erik Hatcher
>             Fix For: 4.9, 5.0
>
>         Attachments: SOLR-2894-reworked.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> dateToObject.patch
>
>
> Following up on SOLR-792, pivot faceting currently only supports 
> undistributed mode.  Distributed pivot faceting needs to be implemented.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to