[
https://issues.apache.org/jira/browse/SOLR-15052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253820#comment-17253820
]
Noble Paul commented on SOLR-15052:
-----------------------------------
{quote}Then the {{R5}} update is also going to read the directory listing and
execute.
{quote}
R5 would have gotten a callback and would have updated the per-replica states
anyway. So all we are doing is an extra {{stat}} read, which is extremely
cheap.
{quote}With 500 children znodes, getChildren took on my laptop about 10-15ms
while getData on a single file with equivalent amount of text took longer at
~20ms. This came as a surprise to me.
{quote}
Reads are not such a big deal. Even writes are not a big deal. But CAS writes
are a big deal: we would like to minimize contention while doing CAS writes.
{quote}The multi operation (delete znode, create znode) took about 40ms while
the CAS of the text file was faster at 30ms,
{quote}
CAS in itself is not slow. As the number of parallel writes grows, the
performance degrades dramatically. If you have thousands of replicas trying to
update using CAS, the performance is going to be unacceptably low. Whereas the
{{multi}} approach on individual znodes will perform the same irrespective of
whether we have 2 replicas or 20,000 replicas.
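The contention argument can be illustrated with a small, deliberately pessimistic simulation (illustrative only; the real operations are ZooKeeper {{setData}} with an expected version for CAS, and a {{multi}} of delete+create on a replica's own znode). In the worst case, every pending writer reads the same version, one wins, and the rest must re-read and retry, so total attempts grow quadratically; per-replica writes never conflict by construction.

```python
def cas_rounds(n_writers):
    """Worst-case rounds of optimistic CAS on one shared znode: each
    round, every pending writer has read the current version, exactly
    one conditional write wins, and the losers must re-read and retry.
    Total attempts = n + (n-1) + ... + 1 = n(n+1)/2."""
    version = 0
    pending = n_writers
    attempts = 0
    while pending:
        attempts += pending   # all pending writers try the same version
        version += 1          # one writer's conditional write succeeds
        pending -= 1          # the rest retry in the next round
    return attempts

def per_replica_attempts(n_writers):
    """Per-replica model: each writer updates only its own znode
    (delete old name, create new name in one multi), so there is no
    shared version to race on and every attempt succeeds first try."""
    return n_writers

for n in (2, 200, 20000):
    print(n, "writers:", cas_rounds(n), "CAS attempts vs",
          per_replica_attempts(n), "per-replica attempts")
```

Real contention is rarely this bad, but the shape holds: CAS cost on one shared node scales superlinearly with concurrent writers, while independent per-replica writes scale linearly.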
{quote}The implementation in the PR could easily avoid systematically
re-reading the znode children list by attempting the multi operation on the
cached PerReplicaStates of the DocCollection
{quote}
It already uses the cached data. Yes, it does an extra version check, but
that's cheap.
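For context, the per-replica state lives entirely in a child znode's *name* (the znode body is empty), so a state change is just a name swap via a delete+create {{multi}}. A rough sketch of such an encode/parse scheme follows; the field order and state codes here are illustrative, not necessarily the exact format in the PR:

```python
from collections import namedtuple

ReplicaState = namedtuple("ReplicaState", "name version state leader")

def encode(rs):
    # e.g. "core_node1:3:A:L" -- replica name, a version that bumps on
    # every change, a one-letter state code, optional leader flag
    # (illustrative layout, not the PR's exact format).
    parts = [rs.name, str(rs.version), rs.state]
    if rs.leader:
        parts.append("L")
    return ":".join(parts)

def decode(znode_name):
    parts = znode_name.split(":")
    return ReplicaState(parts[0], int(parts[1]), parts[2],
                        len(parts) > 3 and parts[3] == "L")

# A state change deletes the old name and creates the new one in one multi:
old = ReplicaState("core_node1", 3, "D", False)
new = old._replace(version=4, state="A", leader=True)
print(encode(old), "->", encode(new))
```

Because readers only need {{getChildren}} (plus a cheap {{stat}} or version check) rather than parsing a large state.json body, watches stay inexpensive even for big collections.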
> Reducing overseer bottlenecks using per-replica states
> ------------------------------------------------------
>
> Key: SOLR-15052
> URL: https://issues.apache.org/jira/browse/SOLR-15052
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Ishan Chattopadhyaya
> Priority: Major
> Attachments: per-replica-states-gcp.pdf
>
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> This work has the same goal as SOLR-13951, that is to reduce overseer
> bottlenecks by avoiding replica state updates from going to the state.json
> via the overseer. However, the approach taken here is different from
> SOLR-13951 and hence this work supersedes that work.
> The design proposed is here:
> https://docs.google.com/document/d/1xdxpzUNmTZbk0vTMZqfen9R3ArdHokLITdiISBxCFUg/edit
> Briefly,
> # Every replica's state will be in a separate znode nested under
> state.json. Its name encodes the replica name, state, and leadership
> status.
> # An additional children watcher is set on state.json to observe state changes.
> # Upon a state change, a ZK multi-op to delete the previous znode and add a
> new znode with new state.
> Differences between this and SOLR-13951,
> # In SOLR-13951, we planned to leverage shard terms for per shard states.
> # As a consequence, the code changes required for SOLR-13951 were massive (we
> needed a shard state provider abstraction and introduce it everywhere in the
> codebase).
> # This approach is a drastically simpler change and design.
> Credit for this design and the PR is due to [~noble.paul].
> [[email protected]], [~noble.paul] and I have collaborated on this
> effort. The reference branch takes a conceptually similar (but not identical)
> approach.
> I shall attach a PR and performance benchmarks shortly.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]