Re: Distributed commits in CloudSolrServer
>Are distributed commits also done in parallel across shards?

I meant 'sequentially' across shards.

On Wed, Apr 16, 2014 at 9:08 AM, Peter Keegan wrote:
> Are distributed commits also done in parallel across shards?
>
> Peter
Re: Distributed commits in CloudSolrServer
Are distributed commits also done in parallel across shards?

Peter

On Tue, Apr 15, 2014 at 3:50 PM, Mark Miller wrote:
> Inline responses below.
Re: Distributed commits in CloudSolrServer
Inline responses below.
--
Mark Miller
about.me/markrmiller

On April 15, 2014 at 2:12:31 PM, Peter Keegan (peterlkee...@gmail.com) wrote:

> I have a SolrCloud index, 1 shard, with a leader and one replica, and 3
> ZKs. The Solr indexes are behind a load balancer. There is one
> CloudSolrServer client updating the indexes. The index schema includes 3
> ExternalFileFields. When the CloudSolrServer client issues a hard commit,
> I observe that the commits occur sequentially, not in parallel, on the
> leader and replica. The duration of each commit is about a minute. Most
> of this time is spent reloading the 3 ExternalFileField files. Because of
> the sequential commits, there is a period of time (1 minute+) when the
> index searchers will return different results, which can cause a bad user
> experience. This will get worse as replicas are added to handle
> auto-scaling. The goal is to keep all replicas in sync w.r.t. the user
> queries.
>
> My questions:
>
> 1. Is there a reason that the distributed commits are done in sequence,
> not in parallel? Is there a way to change this behavior?

The reason is that updates are currently done this way - it's the only safe
way to do it without solving some more problems. I don't think you can
easily change this. I think we should probably file a JIRA issue to track a
better solution for commit handling. I think there are some complications
because of how commits can be added on update requests, but it's something
we probably want to try and solve before tackling *all* updates to replicas
in parallel with the leader.

> 2. If instead, the commits were done in parallel by a separate client via
> a GET to each Solr instance, how would this client get the host/port
> values for each Solr instance from zookeeper? Are there any downsides to
> doing commits this way?

Not really, other than the extra management.
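For question 2, a minimal sketch of what such a separate commit client
could look like, assuming SolrJ 4.x. The ZooKeeper host string and the
collection name "collection1" are placeholders, and commit_end_point is an
internal Solr parameter that keeps a commit local to the receiving core
instead of letting it fan out again - worth verifying against your Solr
version:

// A sketch only (SolrJ 4.x assumed): enumerate active replicas from
// ZooKeeper and commit to each core in parallel. The ZK host string and
// "collection1" are placeholders.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.Replica;
import org.apache.solr.common.cloud.Slice;
import org.apache.solr.common.cloud.ZkCoreNodeProps;
import org.apache.solr.common.cloud.ZkStateReader;

public class ParallelCommitClient {
  public static void main(String[] args) throws Exception {
    CloudSolrServer cloud = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
    cloud.connect();

    // The cluster state in ZooKeeper records the base_url and core name
    // for every replica of every shard, so no extra configuration is
    // needed beyond the ZK connect string.
    ClusterState state = cloud.getZkStateReader().getClusterState();
    List<String> coreUrls = new ArrayList<String>();
    for (Slice slice : state.getActiveSlices("collection1")) {
      for (Replica replica : slice.getReplicas()) {
        if (ZkStateReader.ACTIVE.equals(replica.getStr(ZkStateReader.STATE_PROP))) {
          coreUrls.add(new ZkCoreNodeProps(replica).getCoreUrl());
        }
      }
    }

    // One commit per core, all fired concurrently, so every searcher
    // (and its ExternalFileField data) reopens at about the same time.
    ExecutorService pool = Executors.newFixedThreadPool(coreUrls.size());
    List<Callable<Object>> commits = new ArrayList<Callable<Object>>();
    for (final String url : coreUrls) {
      commits.add(new Callable<Object>() {
        public Object call() throws Exception {
          HttpSolrServer solr = new HttpSolrServer(url);
          try {
            UpdateRequest commit = new UpdateRequest();
            commit.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
            // Without this, a commit sent to one core is itself
            // distributed to the whole collection; commit_end_point is
            // the internal parameter Solr uses to stop that fan-out
            // (verify against your Solr version).
            commit.setParam("commit_end_point", "true");
            return commit.process(solr);
          } finally {
            solr.shutdown();
          }
        }
      });
    }
    pool.invokeAll(commits); // blocks until every commit has returned
    pool.shutdown();
    cloud.shutdown();
  }
}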
Distributed commits in CloudSolrServer
I have a SolrCloud index, 1 shard, with a leader and one replica, and 3
ZKs. The Solr indexes are behind a load balancer. There is one
CloudSolrServer client updating the indexes. The index schema includes 3
ExternalFileFields.

When the CloudSolrServer client issues a hard commit, I observe that the
commits occur sequentially, not in parallel, on the leader and replica. The
duration of each commit is about a minute. Most of this time is spent
reloading the 3 ExternalFileField files. Because of the sequential commits,
there is a period of time (1 minute+) when the index searchers will return
different results, which can cause a bad user experience. This will get
worse as replicas are added to handle auto-scaling. The goal is to keep all
replicas in sync w.r.t. the user queries.

My questions:

1. Is there a reason that the distributed commits are done in sequence, not
in parallel? Is there a way to change this behavior?

2. If instead, the commits were done in parallel by a separate client via a
GET to each Solr instance, how would this client get the host/port values
for each Solr instance from zookeeper? Are there any downsides to doing
commits this way?

Thanks,
Peter
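For concreteness, a minimal sketch of the kind of updating client described
above, assuming SolrJ 4.x; the ZooKeeper hosts, collection name, and
document fields are placeholders:

// A sketch only (SolrJ 4.x assumed): the single CloudSolrServer client
// described above. ZK hosts, "collection1", and the document fields are
// placeholders.
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexUpdater {
  public static void main(String[] args) throws Exception {
    CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
    server.setDefaultCollection("collection1");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");
    server.add(doc);

    // Explicit hard commit (waitFlush=true, waitSearcher=true). As
    // discussed in this thread, the leader runs its commit first and then
    // forwards it to the replica, so each core reopens its searcher (and
    // reloads the ExternalFileField files) at a different time.
    server.commit(true, true);

    server.shutdown();
  }
}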