Re: SOLR Cloud - Disable Transaction Logs

2013-06-19 Thread Erick Erickson
Right, NRT is not tied to cloud, but it is tied to the update log.

And you bring up an interesting issue when you talk about availability zones.
SolrCloud is fairly chatty in that all of the nodes need to talk to all the
other nodes in the network, and they will. If the nodes are separated by
an expensive connection (however you measure expensive: latency,
cost to use, or something else) then this may well be a bottleneck. For instance,
the leader needs to talk to every one of its followers for an update. Imagine
a leader in zone1 and all 15 replicas in zone2. Now the expensive pipe
will be used 15 times to send the update.
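
A back-of-the-envelope sketch of that fan-out, assuming one copy of the
update is sent per replica. The function name and the mixed layout are
hypothetical and only illustrate the counting; this is not SolrCloud code.

```python
def cross_zone_sends(leader_zone, replica_zones):
    """Count how many copies of one update must cross a zone boundary.

    Assumes the leader forwards one copy of the update to each replica.
    """
    return sum(1 for zone in replica_zones if zone != leader_zone)


# The scenario from the message: leader in zone1, all 15 replicas in zone2.
all_remote = cross_zone_sends("zone1", ["zone2"] * 15)

# A hypothetical layout for comparison: 7 replicas share the leader's zone.
mixed = cross_zone_sends("zone1", ["zone1"] * 7 + ["zone2"] * 8)

print(all_remote, mixed)
```

Under this simple model the cost grows linearly with the number of replicas
on the far side of the expensive link.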

Same for queries, there's an internal software load balancer that sends
queries to one node in each shard with no control over what zone it's
in.

The same argument applies to separate physical data centers FWIW.

We're largely speculating that this may lead to bottlenecks, but it's
something to keep in mind. There are thoughts about making SolrCloud
rack aware in a way that will ameliorate this, but nobody has had
time to work on this yet.

We'd _love_ to hear about any real-life experience in this area!

Best
Erick


Re: SOLR Cloud - Disable Transaction Logs

2013-06-18 Thread Shalin Shekhar Mangar
Yes, but at what cost? You are thinking of replacing disk IO with even
slower network IO. The transaction log is an append-only log -- it is
pretty cheap, especially if you compare it with the indexing process.
Plus your write requests/sec will drop a lot once you start doing
synchronous replication.


-- 
Regards,
Shalin Shekhar Mangar.


Re: SOLR Cloud - Disable Transaction Logs

2013-06-18 Thread Rishi Easwaran
SolrJ already has access to the ZooKeeper cluster state. The network I/O bottleneck can
be avoided by parallel requests.
You are only as slow as your slowest responding server, which could be your
single leader with the current setup.

Wouldn't this lessen the burden of the leader, as he does not have to maintain 
transaction logs or distribute to replicas? 
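
A minimal sketch of the latency argument above, assuming each node's
response time is independent and that a client waits for every
acknowledgement. The latency figures are made up for illustration, and the
model ignores bandwidth, retries and failures.

```python
def sequential_latency(latencies_ms):
    """Total wait if the update is sent to one node after another."""
    return sum(latencies_ms)


def parallel_latency(latencies_ms):
    """Total wait if the update is sent to all nodes at once:
    the client is only as slow as the slowest responder."""
    return max(latencies_ms)


# Hypothetical per-node response times in milliseconds.
latencies = [12, 15, 90]

print(sequential_latency(latencies), parallel_latency(latencies))
```

This only covers the waiting time seen by the client; it says nothing about
how the nodes would agree on update ordering without a leader.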

 

 

 


Re: SOLR Cloud - Disable Transaction Logs

2013-06-18 Thread Erick Erickson
bq: the replica can take over and maintain a durable
state of my index

This is not true. On an update, all the nodes in a slice
have already written the data to the tlog, not just the
leader. So if a leader goes down, the replicas have
enough local info to ensure that data is not lost. Without
tlogs this would not be true since documents are not
durably saved until a hard commit.

tlogs save data between hard commits. As Yonik
explained to me once, soft commits are about
visibility, hard commits are about durability and
tlogs fill up the gap between hard commits.
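
A toy model of that distinction, as a sketch only. The class and its
fields are hypothetical and do not mirror Solr's internals; the model just
assumes that committed data and the log survive a crash while uncommitted
in-memory documents do not.

```python
class ToyNode:
    """Toy model of visibility vs. durability; not how Solr is implemented."""

    def __init__(self, use_tlog=True):
        self.use_tlog = use_tlog
        self.committed = []  # survives a crash (stands in for on-disk segments)
        self.tlog = []       # survives a crash (stands in for the on-disk log)
        self.pending = []    # lost on a crash (uncommitted, in memory)
        self.visible = []    # what searches can see

    def add(self, doc):
        if self.use_tlog:
            self.tlog.append(doc)  # logged before it is committed
        self.pending.append(doc)

    def soft_commit(self):
        # Visibility only: nothing becomes durable here.
        self.visible = self.committed + self.pending

    def hard_commit(self):
        # Durability: pending docs are flushed and the log can start over.
        self.committed = self.committed + self.pending
        self.pending = []
        self.tlog = []

    def crash_and_restart(self):
        # In-memory state is gone; replay whatever the log still holds.
        self.pending = list(self.tlog)
        self.visible = []

    def recoverable(self):
        return self.committed + self.pending


def run(use_tlog):
    node = ToyNode(use_tlog=use_tlog)
    node.add("a")
    node.hard_commit()
    node.add("b")        # arrives after the last hard commit
    node.soft_commit()   # "b" is searchable but not yet durable
    node.crash_and_restart()
    return node.recoverable()


with_tlog = run(use_tlog=True)
without_tlog = run(use_tlog=False)
print(with_tlog, without_tlog)
```

In this model the document added after the last hard commit is only
recovered when the log is enabled, which is the gap the tlog covers.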

So to reinforce Shalin's comment: yes, you can disable tlogs
if
1. you don't want any of SolrCloud's HA/DR capabilities
2. NRT is unimportant

IOW if you're using 4.x just like you would 3.x in terms
of replication, HA/DR, etc. This is perfectly reasonable,
but don't get hung up on disabling tlogs.

And you haven't told us _why_ you want to do this. They
don't consume much memory or disk space unless you
have configured your hard commits (with openSearcher
true or false) to be quite long. Do you have any proof at
all that the tlogs are placing enough load on the system
to go down this road?
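
A rough way to see why the hard commit interval matters for tlog size,
assuming the log grows by roughly one update's worth of bytes per update
until the next hard commit. The rates and sizes are invented for
illustration, and the estimate ignores any additional log retention.

```python
def tlog_bytes_between_commits(updates_per_sec, avg_update_bytes,
                               hard_commit_interval_sec):
    """Approximate bytes appended to the log between two hard commits."""
    return updates_per_sec * avg_update_bytes * hard_commit_interval_sec


# Hypothetical write load: 500 updates/sec at about 2 KiB each.
short_interval = tlog_bytes_between_commits(500, 2048, 15)    # 15 seconds
long_interval = tlog_bytes_between_commits(500, 2048, 3600)   # 1 hour

print(short_interval, long_interval)
```

The estimate scales linearly with the interval, so a benchmark that varies
only the hard commit setting should show whether the log is a real cost.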

Best
Erick


Re: SOLR Cloud - Disable Transaction Logs

2013-06-18 Thread Rishi Easwaran

Erick,

We at AOL mail have been using SOLR for quite a while, and our system is pretty
write heavy; disk I/O is one of our bottlenecks. At present we use regular
SOLR in the lotsOfCore configuration, and I am in the process of benchmarking
SOLR cloud for our use case. I don't have concrete data that tLogs are placing
a lot of load on the system, but for a large-scale system like ours even minimal
load gets magnified.


From the Cloud design, for a properly set up cluster, you usually have
replicas in different availability zones. The probability of losing more than 1
availability zone at any given time should be pretty low. Why have tLogs if
all replicas get the update request anyway? In theory, at least 1 replica must be
able to commit eventually.

NRT is an optional feature and probably not tied to Cloud, correct?


Thanks,

Rishi.



 

 


SOLR Cloud - Disable Transaction Logs

2013-06-17 Thread Rishi Easwaran
Hi,

Is there a way to disable transaction logs in SOLR cloud? As far as I can tell,
no.
Just curious why we need transaction logs; it seems like an I/O-intensive
operation.
As long as I have replicationFactor > 1, if a node (leader) goes down, the
replica can take over and maintain a durable state of my index.

I understand from the previous discussions that it was intended for update
durability and realtime get.
But, unless I am missing something, an ability to disable it in SOLR cloud if
not needed would be good.

Thanks,

Rishi.  



Re: SOLR Cloud - Disable Transaction Logs

2013-06-17 Thread Shalin Shekhar Mangar
It is also necessary for near real-time replication, peer sync and recovery.



-- 
Regards,
Shalin Shekhar Mangar.


Re: SOLR Cloud - Disable Transaction Logs

2013-06-17 Thread Rishi Easwaran
Shalin,
 
Just some thoughts.

Near real-time replication: don't we use SolrCmdDistributor, which sends
requests immediately to replicas with a clonedRequest? As an option, can't we
achieve something similar from CloudSolrServer in SolrJ instead of the leader doing
it? As long as 2 nodes receive the writes and acknowledge them, durability should be
high.
Peer sync and recovery: can we achieve that by merging indexes from the leader as
needed, instead of replaying the transaction logs?

Rishi.

 

 

 
