[
https://issues.apache.org/jira/browse/SOLR-8973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241580#comment-15241580
]
Janmejay Singh commented on SOLR-8973:
--------------------------------------
No, there is a difference in what the overseer and the core-api (on a different
node) see at the same instant. Some ZK nodes may be lagging (ZK does not ensure
that changes become visible across all nodes at the same time). When clients
can't tolerate a delay in the visibility of changes, they need to execute a
sync operation before the read.
The overseer's session may be connected to a zk-node that is ahead of the
zk-node that the core-node is connected to. So while the overseer sees the
change, the core-node will not (unless it executes sync before read).
If all nodes saw the same version as the overseer, the race wouldn't exist at
all.
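To make the lag concrete, here is a minimal toy model of it in plain Java. It is not the ZooKeeper client API: ToyEnsemble, Replica and their methods are hypothetical names invented for this illustration, and sync is modelled as a simple blocking catch-up. In the real client the corresponding call is ZooKeeper.sync(path, callback, ctx), which is asynchronous, so the read has to be issued after the callback fires.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a replicated store. Hypothetical; NOT the ZooKeeper API. */
class ToyEnsemble {
    /** Values committed by the ensemble, in commit order. */
    private final List<String> committedLog = new ArrayList<>();

    /** A server that may have applied only a prefix of the committed log. */
    class Replica {
        private int applied = 0; // number of committed entries applied here

        /** Plain read: returns what this replica has applied, possibly stale. */
        String read() {
            return applied == 0 ? null : committedLog.get(applied - 1);
        }

        /** Models sync: catch up with everything committed before this call. */
        void sync() {
            applied = committedLog.size();
        }
    }

    Replica newReplica() {
        return new Replica();
    }

    /** Commits a value; only the replicas listed in upToDate apply it right away. */
    void write(String value, Replica... upToDate) {
        committedLog.add(value);
        for (Replica r : upToDate) {
            r.applied = committedLog.size();
        }
    }
}

class SyncBeforeReadDemo {
    public static void main(String[] args) {
        ToyEnsemble zk = new ToyEnsemble();
        ToyEnsemble.Replica overseerSide = zk.newReplica(); // server the overseer talks to
        ToyEnsemble.Replica coreSide = zk.newReplica();     // server the core-node talks to

        // The collection is created; only the overseer's server has applied it so far.
        zk.write("collection1-state", overseerSide);

        System.out.println("overseer sees:         " + overseerSide.read());
        System.out.println("core-node sees:        " + coreSide.read());

        // Sync before read closes the gap for the core-node.
        coreSide.sync();
        System.out.println("core-node, after sync: " + coreSide.read());
    }
}
```

In this model the plain read on the core-node side returns nothing until sync has run, which is the window in which the race shows up.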
We can change the patch to lazily set up a watch for a collection that is
fetched using the active (on-demand) fetcher. In this model, once the fetch
completes successfully, it will set up a watch for the collection before
returning the fetched collection-def.
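A rough sketch of that model follows, under stated assumptions: StateSource, InMemorySource and LazyWatchingCache are hypothetical names made up for this illustration and are not classes from Solr or from the attached patch. The point it tries to show is that only the first successful on-demand fetch goes to the store; after that the watch keeps the local copy current and later requests are served locally.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical stand-in for the ZK-backed state source; not a Solr interface. */
interface StateSource {
    /** Returns the current state, or null if the collection is not visible yet. */
    String fetch(String collection);

    /** Registers a listener that receives the new state on every change. */
    void watch(String collection, Consumer<String> onChange);
}

/** On-demand fetch that switches to watch + local cache after the first success. */
class LazyWatchingCache {
    private final StateSource source;
    private final Map<String, String> cached = new ConcurrentHashMap<>();

    LazyWatchingCache(StateSource source) {
        this.source = source;
    }

    String get(String collection) {
        String state = cached.get(collection);
        if (state != null) {
            return state; // served locally, no round trip
        }
        state = source.fetch(collection);
        if (state == null) {
            return null; // still not visible; the next request fetches again
        }
        // Fetch succeeded: cache it and set up the watch before returning.
        // This toy does fetch and watch as two steps; a real ZooKeeper read
        // can register its watcher in the same getData call.
        cached.put(collection, state);
        source.watch(collection, updated -> {
            if (updated == null) {
                cached.remove(collection);
            } else {
                cached.put(collection, updated);
            }
        });
        return state;
    }
}

/** In-memory source that counts fetches, used only to exercise the sketch. */
class InMemorySource implements StateSource {
    final Map<String, String> data = new ConcurrentHashMap<>();
    final Map<String, Consumer<String>> watchers = new ConcurrentHashMap<>();
    int fetchCount = 0;

    public String fetch(String collection) {
        fetchCount++;
        return data.get(collection);
    }

    public void watch(String collection, Consumer<String> onChange) {
        watchers.put(collection, onChange);
    }

    void update(String collection, String state) {
        data.put(collection, state);
        Consumer<String> listener = watchers.get(collection);
        if (listener != null) {
            listener.accept(state);
        }
    }
}

class LazyWatchDemo {
    public static void main(String[] args) {
        InMemorySource source = new InMemorySource();
        LazyWatchingCache cache = new LazyWatchingCache(source);

        cache.get("c1");            // collection not visible yet
        source.update("c1", "v1");  // collection becomes visible
        for (int i = 0; i < 1000; i++) {
            cache.get("c1");        // only the first of these reaches the source
        }
        source.update("c1", "v2");  // delivered through the watch
        System.out.println("state=" + cache.get("c1") + " fetches=" + source.fetchCount);
    }
}
```

Without the watch step, each of the thousand requests in the demo would have been a separate fetch, which is the proportional traffic the issue describes.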
> TX-frenzy on Zookeeper when collection is put to use
> ----------------------------------------------------
>
> Key: SOLR-8973
> URL: https://issues.apache.org/jira/browse/SOLR-8973
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 5.0, 5.1, 5.2, 5.3, 5.4, 5.5, master, 5.6
> Reporter: Janmejay Singh
> Assignee: Shalin Shekhar Mangar
> Labels: collections, patch-available, solrcloud, zookeeper
> Attachments: SOLR-8973.patch
>
>
> This is to do with a distributed data-race. Core-creation happens at a time
> when collection is not yet visible to the node. In this case a fallback
> code-path is used which de-references collection-state lazily (on demand) as
> opposed to setting a watch and keeping it cached locally.
> Due to this, as requests towards the core mount, it generates ZK fetches for
> the collection proportionately. On a large solr-cloud cluster, this generates
> several Gbps of TX traffic on ZK nodes. This affects indexing
> throughput (which floors) in addition to running the ZK node out of network
> bandwidth.
> On smaller solr-cloud clusters it's hard to run into, because the probability
> of this race materializing is lower.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)