[jira] [Commented] (SOLR-13399) compositeId support for shard splitting

Hoss Man (JIRA) Thu, 25 Jul 2019 08:48:06 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-13399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892894#comment-16892894
 ]


Hoss Man commented on SOLR-13399:
---------------------------------

bq. (unless you mean we've generally moved to doing docs as part of the 
initial commit? If so, I missed that.)

yes, that's the entire value add of keeping the ref-guide in the same repo as 
the source, and having it as part of the main build w/precommit.

we've been trying to move to having the "code release process" and the 
"ref-guide release process" be a single process, with a single vote -- and 
we're getting close -- but the main holdup is people who add features w/o docs, 
which then forces a scramble during the release process to backfill docs on new 
features.

> compositeId support for shard splitting
> ---------------------------------------
>
>                 Key: SOLR-13399
>                 URL: https://issues.apache.org/jira/browse/SOLR-13399
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Yonik Seeley
>            Assignee: Yonik Seeley
>            Priority: Major
>             Fix For: 8.3
>
>         Attachments: SOLR-13399.patch, SOLR-13399.patch
>
>
> Shard splitting does not currently have a way to automatically take into 
> account the actual distribution (number of documents) in each hash bucket 
> created by using compositeId hashing.
> We should probably add a parameter *splitByPrefix* to the *SPLITSHARD* 
> command that would look at the number of docs sharing each compositeId prefix 
> and use that to create roughly equal sized buckets by document count rather 
> than just assuming an equal distribution across the entire hash range.
> Like normal shard splitting, we should bias against splitting within hash 
> buckets unless necessary (since that leads to larger query fanout).  Perhaps 
> this warrants a parameter that would control how much of a size mismatch is 
> tolerable before resorting to splitting within a bucket: 
> *allowedSizeDifference*?
> To more quickly calculate the number of docs in each bucket, we could index 
> the prefix in a different field.  Iterating over the terms for this field 
> would quickly give us the number of docs in each (i.e., Lucene keeps track of 
> the doc count for each term already).  Perhaps the implementation could be a 
> flag on the *id* field... something like *indexPrefixes* and poly-fields that 
> would cause the indexing to be automatically done and alleviate having to 
> pass in an additional field during indexing and during the call to 
> *SPLITSHARD*.  This whole part is an optimization though and could be split 
> off into its own issue if desired.
>  
>  
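
The splitByPrefix idea in the issue description above can be sketched roughly as 
follows. This is an illustration only, not the code in the attached patches. It 
assumes ids of the form "prefix!rest", takes the bucket order as given rather 
than computing the real hash order, and reads allowedSizeDifference as a 
fraction of the total doc count. All function names here are hypothetical.

```python
from collections import Counter


def prefix_of(doc_id, sep="!"):
    """Return the compositeId prefix (text before the first separator), or None."""
    head, found, _ = doc_id.partition(sep)
    return head if found else None


def count_by_prefix(doc_ids):
    """Count docs per prefix; ids without a prefix are ignored in this sketch."""
    prefixes = (prefix_of(d) for d in doc_ids)
    return Counter(p for p in prefixes if p is not None)


def plan_split(buckets, allowed_size_difference=0.2):
    """Pick the bucket boundary that best balances doc counts.

    buckets: list of (prefix, doc_count), assumed already ordered by hash range.
    Returns (index, within_tolerance), where buckets[:index] and buckets[index:]
    are the two halves. within_tolerance is False when even the best boundary
    leaves a mismatch larger than allowed_size_difference (a fraction of all
    docs, an assumed interpretation), meaning a split inside a bucket would be
    needed to do better.
    """
    total = sum(count for _, count in buckets)
    if len(buckets) < 2 or total == 0:
        return None, False  # no boundary to split on

    best_index, best_diff = None, None
    left = 0
    for i in range(1, len(buckets)):
        left += buckets[i - 1][1]
        diff = abs(total - 2 * left)  # |left side - right side|
        if best_diff is None or diff < best_diff:
            best_index, best_diff = i, diff
    return best_index, (best_diff / total) <= allowed_size_difference


ids = ["a!1", "a!2", "a!3", "b!1", "c!1", "c!2", "plain"]
counts = count_by_prefix(ids)
buckets = [(p, counts[p]) for p in ("a", "b", "c")]  # order assumed, not hashed
split_index, ok = plan_split(buckets)
```

With the sample ids, prefix "a" holds half of the prefixed docs, so the boundary 
after "a" balances the two sides without splitting inside any bucket.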



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

