Re: SOLR Cloud - Disable Transaction Logs
Right, NRT is not tied to cloud, but it is tied to the update log.

And you bring up an interesting issue when you talk about availability zones. SolrCloud is fairly chatty in that all of the nodes need to talk to all the other nodes in the network, and they will. If the nodes are separated by an expensive connection (however you measure "expensive": latency, cost to use, or something else), then this may well be a bottleneck. For instance, the leader needs to talk to every one of its followers for an update. Imagine a leader in zone1 and all 15 replicas in zone2: the expensive pipe will be used 15 times to send the update. Same for queries: there's an internal software load balancer that sends queries to one node in each shard with no control over what zone it's in. The same argument applies to separate physical data centers, FWIW.

We're largely speculating that this may lead to bottlenecks, but it's something to keep in mind. There are thoughts about making SolrCloud rack aware in a way that will ameliorate this, but nobody has had time to work on it yet. We'd _love_ to hear about any real-life experience in this area!

Best
Erick

On Tue, Jun 18, 2013 at 4:37 PM, Rishi Easwaran <rishi.easwa...@aol.com> wrote:
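Erick's fan-out arithmetic can be made concrete with a toy count. This is a minimal sketch under stated assumptions: the leader sends one copy of each update to every replica individually, and the "relayed" variant is a hypothetical zone-aware scheme of the kind the rack-awareness idea might enable, not something SolrCloud does. Both function names are illustrative.

```python
def cross_zone_copies(leader_zone: str, replica_zones: list[str]) -> int:
    """Copies of one update that cross a zone boundary when the leader
    forwards it to each replica directly (assumed, simplified model)."""
    return sum(1 for zone in replica_zones if zone != leader_zone)


def cross_zone_copies_relayed(leader_zone: str, replica_zones: list[str]) -> int:
    """Hypothetical zone-aware alternative: one copy per remote zone,
    relayed locally inside that zone. Not an existing SolrCloud feature."""
    return len({zone for zone in replica_zones if zone != leader_zone})


if __name__ == "__main__":
    replicas = ["zone2"] * 15  # Erick's example: leader in zone1, 15 replicas in zone2
    print(cross_zone_copies("zone1", replicas))          # 15
    print(cross_zone_copies_relayed("zone1", replicas))  # 1
```

The same counting applies to queries if the load balancer happens to pick a node in a remote zone for each shard.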
Re: SOLR Cloud - Disable Transaction Logs
Yes, but at what cost? You are thinking of replacing disk IO with even slower network IO. The transaction log is an append-only log -- it is pretty cheap, especially if you compare it with the indexing process. Plus your write requests/sec will drop a lot once you start doing synchronous replication.

--
Regards,
Shalin Shekhar Mangar.

On Tue, Jun 18, 2013 at 2:18 AM, Rishi Easwaran <rishi.easwa...@aol.com> wrote:
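Shalin's point that an append-only log is cheap rests on the fact that each update is a sequential write at the end of a file. A minimal sketch, assuming a JSON-lines format chosen purely for illustration (Solr's actual tlog format differs, and `AppendOnlyLog` is a hypothetical name):

```python
import json
import os
import tempfile


class AppendOnlyLog:
    """Hypothetical append-only update log stored as JSON lines."""

    def __init__(self, path: str, fsync: bool = False) -> None:
        self.path = path
        self.fsync = fsync
        self._file = open(path, "a", encoding="utf-8")

    def append(self, update: dict) -> None:
        # Every write goes to the end of the file: no seeks, no rewrites.
        self._file.write(json.dumps(update) + "\n")
        self._file.flush()
        if self.fsync:
            # Forcing the bytes to disk buys durability at a per-update cost.
            os.fsync(self._file.fileno())

    def replay(self) -> list[dict]:
        # Read every logged update back in the order it was written.
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f]

    def close(self) -> None:
        self._file.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = AppendOnlyLog(os.path.join(tmp, "tlog.jsonl"))
        log.append({"op": "add", "id": "doc1"})
        log.append({"op": "delete", "id": "doc0"})
        print(log.replay())
        log.close()
```

The `fsync` flag marks the real trade-off: a plain buffered append is cheap, while forcing each record to disk costs a synchronous I/O per update.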
Re: SOLR Cloud - Disable Transaction Logs
SolrJ already has access to the ZooKeeper cluster state. The network I/O bottleneck can be avoided by sending requests in parallel: you are only as slow as your slowest responding server, which with the current setup could be your single leader. Wouldn't this lessen the burden on the leader, since it would not have to maintain transaction logs or distribute updates to replicas?

-----Original Message-----
From: Shalin Shekhar Mangar <shalinman...@gmail.com>
To: solr-user <solr-user@lucene.apache.org>
Sent: Tue, Jun 18, 2013 2:05 am
Subject: Re: SOLR Cloud - Disable Transaction Logs
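Rishi's "only as slow as your slowest server" claim describes a parallel fan-out: if the client sends an update to every replica at once and waits for all answers, total latency tracks the slowest replica rather than the sum. A minimal sketch with simulated replicas, where `time.sleep` stands in for network and indexing time; `fan_out` and `fake_replica` are hypothetical names, not SolrJ APIs.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

SendFn = Callable[[dict], tuple]


def fan_out(send_fns: list[SendFn], doc: dict) -> list[tuple]:
    """Send `doc` to every replica concurrently and wait for all replies."""
    with ThreadPoolExecutor(max_workers=max(1, len(send_fns))) as pool:
        futures = [pool.submit(send, doc) for send in send_fns]
        # result() blocks, so this returns only after the slowest replica answers.
        return [f.result() for f in futures]


def fake_replica(name: str, delay_s: float) -> SendFn:
    """Simulated replica: waits `delay_s` seconds, then acknowledges."""
    def send(doc: dict) -> tuple:
        time.sleep(delay_s)
        return (name, doc["id"])
    return send


if __name__ == "__main__":
    replicas = [fake_replica("r1", 0.1), fake_replica("r2", 0.1), fake_replica("r3", 0.2)]
    start = time.perf_counter()
    acks = fan_out(replicas, {"id": "doc1"})
    elapsed = time.perf_counter() - start
    print(acks, f"{elapsed:.2f}s")  # elapsed is close to the slowest delay, not the sum
```

The sketch only addresses latency. An acknowledgement here means the replica received the update, which says nothing about whether it survives a crash.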
Re: SOLR Cloud - Disable Transaction Logs
bq: the replica can take over and maintain a durable state of my index

This is not true. On an update, all the nodes in a slice have already written the data to the tlog, not just the leader. So if a leader goes down, the replicas have enough local info to ensure that data is not lost. Without tlogs this would not be true, since documents are not durably saved until a hard commit. tlogs save data between hard commits. As Yonik explained to me once, soft commits are about visibility, hard commits are about durability, and tlogs fill up the gap between hard commits.

So to reinforce Shalin's comment: yes, you can disable tlogs if
1> you don't want any of SolrCloud's HA/DR capabilities
2> NRT is unimportant

IOW, if you're using 4.x just like you would 3.x in terms of replication, HA/DR, etc. This is perfectly reasonable, but don't get hung up on disabling tlogs.

And you haven't told us _why_ you want to do this. They don't consume much memory or disk space unless you have configured your hard commits (with openSearcher true or false) to happen at quite long intervals. Do you have any proof at all that the tlogs are placing enough load on the system to go down this road?

Best
Erick

On Tue, Jun 18, 2013 at 10:49 AM, Rishi Easwaran <rishi.easwa...@aol.com> wrote:
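Erick's summary (soft commits for visibility, hard commits for durability, tlogs filling the gap) can be illustrated with a toy model of a single core that crashes right after a soft commit. This is a deliberate simplification with hypothetical names; real segments, searchers and tlog rollover are far more involved.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCore:
    """Toy single-core model: what survives a crash, with or without a tlog."""
    use_tlog: bool
    durable: list = field(default_factory=list)   # flushed by a hard commit
    buffered: list = field(default_factory=list)  # in RAM only, lost on a crash
    tlog: list = field(default_factory=list)      # durable record since the last hard commit
    visible: list = field(default_factory=list)   # what the current searcher sees

    def add(self, doc_id: str) -> None:
        self.buffered.append(doc_id)
        if self.use_tlog:
            self.tlog.append(doc_id)  # in this model, logged as part of the add

    def soft_commit(self) -> None:
        # Visibility only: a new searcher sees buffered docs, nothing becomes durable.
        self.visible = self.durable + self.buffered

    def hard_commit(self) -> None:
        # Durability: buffered docs are flushed, so this toy drops the old log
        # (real Solr may keep some older tlogs around, e.g. for peer sync).
        self.durable.extend(self.buffered)
        self.buffered.clear()
        self.tlog.clear()

    def crash_and_restart(self) -> list:
        self.buffered.clear()           # RAM contents are gone
        self.durable.extend(self.tlog)  # replay updates logged since the last hard commit
        self.tlog.clear()
        self.visible = list(self.durable)
        return list(self.durable)


if __name__ == "__main__":
    for use_tlog in (True, False):
        core = ToyCore(use_tlog)
        core.add("a")
        core.hard_commit()
        core.add("b")
        core.soft_commit()  # "b" is searchable now, but not yet durable
        print(use_tlog, core.visible, core.crash_and_restart())
```

In both runs "b" was visible before the crash; only the run with a tlog still has it afterwards.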
Re: SOLR Cloud - Disable Transaction Logs
Erick,

We at AOL Mail have been using SOLR for quite a while; our system is pretty write-heavy, and disk I/O is one of our bottlenecks. At present we use regular SOLR in the LotsOfCores configuration, and I am in the process of benchmarking SOLR Cloud for our use case. I don't have concrete data that tLogs are placing a lot of load on the system, but for a large-scale system like ours even a minimal load gets magnified.

From the Cloud design, for a properly set up cluster you usually have replicas in different availability zones. The probability of losing more than one availability zone at any given time should be pretty low. Why have tLogs if all replicas get the update request anyway? In theory, one replica must be able to commit eventually. NRT is an optional feature and probably not tied to Cloud, correct?

Thanks,
Rishi.

-----Original Message-----
From: Erick Erickson <erickerick...@gmail.com>
To: solr-user <solr-user@lucene.apache.org>
Sent: Tue, Jun 18, 2013 4:07 pm
Subject: Re: SOLR Cloud - Disable Transaction Logs
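Rishi's argument leans on the chance of losing more than one availability zone at once. Under the strong, hypothetical assumption that each zone is down independently with probability p, that chance is a binomial tail. A minimal sketch (`p_zones_down` is an illustrative name; the numbers in the example are made up):

```python
from math import comb


def p_zones_down(n_zones: int, p: float, at_least: int = 2) -> float:
    """P(at least `at_least` of `n_zones` zones are down simultaneously),
    assuming independent zone failures with probability `p` each."""
    return sum(
        comb(n_zones, k) * p**k * (1 - p) ** (n_zones - k)
        for k in range(at_least, n_zones + 1)
    )


if __name__ == "__main__":
    # Illustrative numbers only: 3 zones, each down 1% of the time.
    print(p_zones_down(3, 0.01))
```

With three zones at p = 0.01 this works out to 3 x 0.0001 x 0.99 + 0.000001 = 0.000298. The independence assumption is the weak spot: it does not capture correlated events such as a cluster-wide restart, which is where durably logged updates still matter.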
SOLR Cloud - Disable Transaction Logs
Hi,

Is there a way to disable transaction logs in SOLR cloud? As far as I can tell, no. Just curious why we need transaction logs; it seems like an I/O-intensive operation. As long as I have replicationFactor > 1, if a node (leader) goes down, the replica can take over and maintain a durable state of my index.

I understand from previous discussions that it was intended for update durability and realtime get. But unless I am missing something, an ability to disable it in SOLR cloud when not needed would be good.

Thanks,
Rishi.
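Rishi mentions realtime get as one reason the tlog exists. Roughly, a realtime get by id can return a document that has been added but is not yet visible to any searcher, because the update log still knows about it. A toy sketch of that lookup order; the class and method names are hypothetical, and the in-memory dict stands in for what Solr tracks via its update log.

```python
class ToyRealtimeGet:
    """Toy lookup order for realtime get (hypothetical names throughout)."""

    def __init__(self) -> None:
        self.searcher: dict[str, dict] = {}  # docs visible to normal queries
        self.pending: dict[str, dict] = {}   # updates recorded since the last commit

    def add(self, doc: dict) -> None:
        self.pending[doc["id"]] = doc

    def commit(self) -> None:
        # Opening a new searcher makes pending docs visible to queries.
        self.searcher.update(self.pending)
        self.pending.clear()

    def query_by_id(self, doc_id: str):
        # A normal search only sees what the current searcher sees.
        return self.searcher.get(doc_id)

    def realtime_get(self, doc_id: str):
        # Check recent, not-yet-visible updates first, then fall back to the index.
        if doc_id in self.pending:
            return self.pending[doc_id]
        return self.searcher.get(doc_id)


if __name__ == "__main__":
    core = ToyRealtimeGet()
    core.add({"id": "doc1", "subject": "hello"})
    print(core.query_by_id("doc1"))   # None: not committed yet
    print(core.realtime_get("doc1"))  # {'id': 'doc1', 'subject': 'hello'}
```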
Re: SOLR Cloud - Disable Transaction Logs
It is also necessary for near real-time replication, peer sync and recovery.

--
Regards,
Shalin Shekhar Mangar.

On Tue, Jun 18, 2013 at 1:04 AM, Rishi Easwaran <rishi.easwa...@aol.com> wrote:
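Shalin's point about peer sync and recovery can be pictured with a toy decision rule. A replica that missed only a few recent updates can fetch just those, provided the leader still has them in a log. One that fell further behind needs a full index copy. This is a hypothetical simplification: version numbers are increasing integers, `window` is an assumed count of retained updates, and it is not Solr's actual PeerSync protocol.

```python
def plan_recovery(leader_versions: list[int], replica_versions: list[int], window: int = 100):
    """Return ("in_sync", []), ("peer_sync", missing) or ("full_replication", []).

    leader_versions: every update version the leader has applied, oldest first.
    window: how many recent updates the leader's log still retains (assumed).
    """
    split = max(0, len(leader_versions) - window)
    too_old, recent = leader_versions[:split], leader_versions[split:]
    have = set(replica_versions)
    if any(v not in have for v in too_old):
        # The gap reaches past what the log retains: copy the whole index instead.
        return ("full_replication", [])
    missing = [v for v in recent if v not in have]
    return ("peer_sync", missing) if missing else ("in_sync", [])


if __name__ == "__main__":
    leader = list(range(1, 11))  # leader has applied versions 1..10
    print(plan_recovery(leader, list(range(1, 9)), window=5))  # ('peer_sync', [9, 10])
    print(plan_recovery(leader, [1, 2, 3], window=5))          # ('full_replication', [])
```

Without a log of recent updates, only the expensive full-copy branch remains.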
Re: SOLR Cloud - Disable Transaction Logs
Shalin,

Just some thoughts.

Near real-time replication: don't we use SolrCmdDistributor, which sends requests immediately to replicas with a cloned request? As an option, can't we achieve something similar from CloudSolrServer in SolrJ instead of the leader doing it? As long as 2 nodes receive writes and acknowledge, durability should be high.

Peer sync and recovery: can we achieve that by merging indexes from the leader as needed, instead of replaying the transaction logs?

Rishi.

-----Original Message-----
From: Shalin Shekhar Mangar <shalinman...@gmail.com>
To: solr-user <solr-user@lucene.apache.org>
Sent: Mon, Jun 17, 2013 3:43 pm
Subject: Re: SOLR Cloud - Disable Transaction Logs
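Rishi's proposal comes down to an acknowledgement-counting rule: the client writes to the replicas itself and treats the update as safe once 2 nodes acknowledge. A minimal sequential sketch, assuming each hypothetical `send` callable returns True on acknowledgement and raises `ConnectionError` when the replica is unreachable:

```python
from typing import Callable


def quorum_write(senders: list[Callable[[dict], bool]], doc: dict, min_acks: int = 2) -> bool:
    """Send `doc` to each replica; succeed if at least `min_acks` acknowledge."""
    acks = 0
    for send in senders:
        try:
            if send(doc):
                acks += 1
        except ConnectionError:
            pass  # an unreachable replica simply does not count as an ack
    return acks >= min_acks


if __name__ == "__main__":
    def up(doc: dict) -> bool:
        return True

    def down(doc: dict) -> bool:
        raise ConnectionError("replica unreachable")

    print(quorum_write([up, up, down], {"id": "doc1"}))    # True
    print(quorum_write([up, down, down], {"id": "doc1"}))  # False
```

What an acknowledgement means matters here: a replica that has received an update but not yet hard-committed it holds it only in memory unless something like a tlog records it.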