[ https://issues.apache.org/jira/browse/SOLR-8973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241923#comment-15241923 ]
Scott Blum edited comment on SOLR-8973 at 4/14/16 9:16 PM:
-----------------------------------------------------------

[~shalinmangar] I've come to the conclusion that ZkStateReader isn't doing as well as it could be. Adding watchers in constructState() seems (retroactively) like a hack. It doesn't correctly cover the case where a collection parent node exists (e.g. /solr/collections/coll1) but no state.json child yet appears.

I believe I have a patch and test to fix this. Attached it to this JIRA, but not sure if I should create a new one.

was (Author: dragonsinth):
[~shalinmangar] I've come to the conclusion that ZkStateReader isn't doing as well as it could be. Adding watchers in constructState() seems (retroactively) like a hack. It doesn't correctly cover the case where a collection parent node exists (e.g. /solr/collections/coll1) but no state.json child yet appears.

I believe I have a patch and test to fix this. Not sure whether I should attach to this JIRA or create a new one.

> TX-frenzy on Zookeeper when collection is put to use
> ----------------------------------------------------
>
>                 Key: SOLR-8973
>                 URL: https://issues.apache.org/jira/browse/SOLR-8973
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>   Affects Versions: 5.0, 5.1, 5.2, 5.3, 5.4, 5.5, master, 5.6
>            Reporter: Janmejay Singh
>            Assignee: Shalin Shekhar Mangar
>              Labels: collections, patch-available, solrcloud, zookeeper
>         Attachments: SOLR-8973-ZkStateReader.patch, SOLR-8973.patch
>
>
> This is to do with a distributed data-race. Core-creation happens at a time
> when collection is not yet visible to the node. In this case a fallback
> code-path is used which de-references collection-state lazily (on demand) as
> opposed to setting a watch and keeping it cached locally.
> Due to this, as requests towards the core mount, it generates ZK fetch for
> collection proportionately. On a large solr-cloud cluster, this generates
> several Gbps of TX traffic on ZK nodes.
> This affects indexing throughput (which floors), in addition to running the
> ZK node out of network bandwidth.
> On smaller solr-cloud clusters it's hard to run into, because the
> probability of this race materializing is lower.
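
To make the difference between the two code paths concrete, here is a toy model of the race described above. It is only a sketch and not Solr code: FakeZk, LazyRef, WatchedRef and readsFor are hypothetical names invented for illustration, and FakeZk is an in-memory stand-in for ZooKeeper with a one-shot, exists()-style watch that can be registered before the node is created. Both references are created before state.json is written, which mimics a core coming up before its collection is visible.

```java
import java.util.HashMap;
import java.util.Map;

final class CollectionRefSketch {

    /** Hypothetical in-memory stand-in for ZooKeeper that counts reads. */
    static final class FakeZk {
        private final Map<String, String> data = new HashMap<>();
        private final Map<String, Runnable> watches = new HashMap<>();
        int reads = 0;

        String read(String path) {
            reads++;
            return data.get(path); // null if the node does not exist yet
        }

        /** One-shot watch; may be set on a path that does not exist yet. */
        void watch(String path, Runnable onChange) {
            watches.put(path, onChange);
        }

        void write(String path, String value) {
            data.put(path, value);
            Runnable w = watches.remove(path); // fires once, then is gone
            if (w != null) {
                w.run();
            }
        }
    }

    interface CollectionRef {
        String get();
    }

    /** Fallback path: every lookup goes back to the store. */
    static final class LazyRef implements CollectionRef {
        private final FakeZk zk;
        private final String path;

        LazyRef(FakeZk zk, String path) {
            this.zk = zk;
            this.path = path;
        }

        public String get() {
            return zk.read(path);
        }
    }

    /** Watched path: read once, cache, and re-read only when the watch fires. */
    static final class WatchedRef implements CollectionRef {
        private final FakeZk zk;
        private final String path;
        private String cached;

        WatchedRef(FakeZk zk, String path) {
            this.zk = zk;
            this.path = path;
            refresh();
        }

        private void refresh() {
            zk.watch(path, this::refresh); // re-arm the one-shot watch first
            cached = zk.read(path);
        }

        public String get() {
            return cached;
        }
    }

    /** Creates the ref before state.json exists, then serves `requests` lookups. */
    static int readsFor(boolean watched, int requests) {
        FakeZk zk = new FakeZk();
        String path = "/solr/collections/coll1/state.json";
        CollectionRef ref = watched ? new WatchedRef(zk, path) : new LazyRef(zk, path);
        zk.write(path, "{\"coll1\":{}}"); // the collection state appears later
        for (int i = 0; i < requests; i++) {
            ref.get();
        }
        return zk.reads;
    }

    public static void main(String[] args) {
        System.out.println("lazy reads:    " + readsFor(false, 1000));
        System.out.println("watched reads: " + readsFor(true, 1000));
    }
}
```

In this model the lazy reference costs one read per request, while the watched reference costs two reads in total (one at construction, one when the watch fires on creation), regardless of request volume. With the real ZooKeeper client, exists() can leave a watch on a node that is not there yet, whereas getData() on a missing node fails with NoNodeException and leaves no watch. Whether the attached patch takes that route is not stated in this message.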