Re: [ovs-dev] [PATCH 1/4] docs: OVSDB replication design document

2016-04-18 Thread Marcelo E. Magallon
Hi,

 sorry about the delay in responding. I was catching up with emails on
 the mailing list to gauge whether we are indeed trying to accomplish
 the same thing.

On Mon, Apr 11, 2016 at 03:44:09PM -0700, Ben Pfaff wrote:
> On Fri, Apr 01, 2016 at 10:52:26AM -0700, Ben Pfaff wrote:
> > I don't think it makes sense to stack replication and Raft-based HA.
> > 
> > Thinking about OpenSwitch, I guess that your use case is something
> > like this: an OpenSwitch instance maintains, on-box, an
> > authoritative database instance, and then the replication feature
> > allows that database's content to be backed up somewhere else.  I
> > see how that differs from the use case for Raft, where there is no
> > preference for a particular database server to be authoritative.
> > What I don't see yet is how or why that's useful.  What is the use
> > case?
> 
> In case it wasn't clear, I didn't mean my message above to sound like
> a "no, we won't take this".  Instead, I'm trying to understand the use
> case better.  Perhaps there is room for both replication and HA in
> OVSDB, but before I say "yes" to that, I want to understand both
> cases.

 Yes, that's totally fair.

 We do not need only 1+1 redundancy. We need to remain operational with
 fewer than a quorum of instances running, which Raft can't do unless
 you modify the algorithm (as etcd or consul does, I can't remember
 which one exactly).

 Also, Raft assumes that everybody's vote is equal. If you're treating
 multiple instances of OVS as one large virtual switch, you are not
 running a separate OSPF instance on each switch, each feeding its own
 version of the routing table into the database.  You have one OSPF
 instance on a "stack commander" feeding the entire routing table into
 the database. That is the "correct" state, no matter how many Raft
 members have voted on it. We grow to more than 2 members by setting up
 multiple one-way replications, all originating from the "commander". In
 future patches we will also implement two-way replication so that a
 member can write to its local database to reflect state that the
 commander cannot know about (like port state) ... until that happens,
 daemons on a "member" can connect directly to the commander's OVSDB
 instance and update the commander's state directly.
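
 To make this concrete, here is a minimal, purely illustrative Python
 sketch of the one-way fan-out described above. This is not OVSDB code
 and all the names are made up: the commander is the single writer and
 pushes every change to the members, which never write back.

    class Commander:
        """Authoritative database; the single writer in the stack."""
        def __init__(self):
            self.table = {}      # e.g. the routing table fed by OSPF
            self.members = []    # one-way replication targets

        def attach(self, member):
            member.table = dict(self.table)   # initial full copy
            self.members.append(member)

        def update(self, key, value):
            self.table[key] = value
            for m in self.members:            # fan out to every member
                m.table[key] = value

    class Member:
        """Read-only replica of the commander's state."""
        def __init__(self):
            self.table = {}

    commander = Commander()
    members = [Member() for _ in range(3)]
    for m in members:
        commander.attach(m)
    commander.update("10.0.0.0/24", "via 192.0.2.1")
    assert all(m.table == commander.table for m in members)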

 This work is done in the context of OpenSwitch (http://openswitch.net/;
 probably http://openswitch.net/documents/user/architecture is more
 relevant to this discussion).  With the proposed patch we can have two
 OVSDB instances, each running on a ToR switch. One of the switches is
 active and the other is a standby. The standby instance constantly
 replicates the active one. If the active fails, the standby can take
 over and the control plane can be rebuilt from the state stored in the
 database.

 I don't think the two approaches conflict with each other; in fact,
 they complement each other. What I'm trying to figure out is where
 they overlap (from a code point of view).

 Marcelo


Re: [ovs-dev] [PATCH 1/4] docs: OVSDB replication design document

2016-03-30 Thread Marcelo E. Magallon

Hi Ben,

On 03/30/2016 05:27 PM, Ben Pfaff wrote:

I'm in the midst of implementing high availability for OVSDB, based on
the Raft algorithm.  When I'm done, it should be possible to set up
OVSDB clusters with automatic failover.  Is this the same use case as
your code?


No, in this case replication works by making the second server an OVSDB
client of the first, so it gets notified of all changes in the remote
server.
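
To illustrate the mechanism, here is a rough Python sketch of what such a
client does at the protocol level (RFC 7047): it opens a JSON-RPC
connection and sends a "monitor" request, after which the server returns
the current contents and then sends "update" notifications for every
change. The host, port, database and table names below are just
placeholders; a real client would use the OVS jsonrpc/IDL libraries and
handle message framing and reconnection.

    import json
    import socket

    # Hypothetical address of the active ovsdb-server (TCP passive remote).
    sock = socket.create_connection(("active-server", 6640))

    monitor = {
        "method": "monitor",
        "params": [
            "Open_vSwitch",                     # database to replicate
            None,                               # monitor id echoed in updates
            {"Bridge": {"columns": ["name"]}},  # tables/columns to watch
        ],
        "id": 0,
    }
    sock.sendall(json.dumps(monitor).encode())

    # The reply carries the current table contents; every later change
    # arrives as an "update" notification that a standby can apply locally.
    print(sock.recv(65536).decode())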


In your case, how many OVSDB servers are required for replication to 
work? Raft has a quorum requirement of (N/2)+1 servers to be available 
in order for consensus to be reached. That makes the minimum number of 
servers 3, if you want to allow one to become unavailable. If you have 2 
servers and one goes down, there's no quorum.
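
For reference, the arithmetic behind that question, as a tiny Python
illustration (this is the standard Raft quorum rule, nothing
OVSDB-specific):

    # quorum = floor(N/2) + 1; the cluster tolerates N - quorum failures
    for n in (2, 3, 4, 5, 6):
        quorum = n // 2 + 1
        print(f"N={n}: quorum={quorum}, tolerates {n - quorum} failure(s)")
    # N=2 tolerates 0 (losing either server halts writes), N=3 tolerates 1,
    # N=6 tolerates 2.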


The patch Mario is proposing has failover characteristics, not
load-distribution characteristics: if one of the servers goes down, the
other can take over because it has a copy of the data up to the last
notification the active server was able to send. Also, in the proposed
patch transactions are not delayed until consensus is reached; clients
talk only to the active server, which applies transactions immediately.
The active server notifies the standby about the new data just as it
does any other client.


Marcelo



Re: [ovs-dev] [PATCH 1/4] docs: OVSDB replication design document

2016-03-31 Thread Marcelo E. Magallon

Hi Ben,

On 03/30/2016 06:13 PM, Ben Pfaff wrote:

I understand the technical differences between the approaches. My 
question is whether high availability is your actual goal. If it is, 
then it probably does not make sense to have multiple implementations. 
If you are trying to accomplish something else, then it could be that 
there is something complementary about the two implementations. 


I believe the two approaches are complementary.

As I said, the proposed patch aims at having a standby database
available, but since there's no proxy or anything like that, if the
active database goes down, clients have to reconnect. Ideally, after
failover, the standby database becomes the active one and clients can
reuse the same connection parameters, but a reconnection must happen. If
someone is interested in filling that gap, haproxy is an option, but I
have not yet tested it. The same applies to a Raft-based solution.
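
For what it's worth, here is a minimal Python sketch of the reconnection
a client would have to do, assuming it is configured with both addresses
(the host names are made up); a front end such as haproxy would hide this
step instead:

    import socket

    CANDIDATES = [("active-server", 6640), ("standby-server", 6640)]

    def connect_to_ovsdb():
        for host, port in CANDIDATES:
            try:
                return socket.create_connection((host, port), timeout=5)
            except OSError:
                continue        # server unreachable; try the next candidate
        raise RuntimeError("no OVSDB server reachable")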


What you are doing with Raft is complementary in the sense that you can
have six database servers, expose three of them to the clients, and make
the other three standbys for the three active ones. If any of the three
actives goes down, the corresponding standby steps up. With Raft and 6
active databases, you can lose 2 (4 are needed for consensus). With this
approach you only have 3 databases in service, but you can lose all
three. Obviously you can come up with other topologies like 3+1 or 5+1,
etc. The proposed patch is the "+" part in that design.
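
Put as numbers (illustrative only, using the same quorum rule as before):

    raft_nodes = 6
    raft_quorum = raft_nodes // 2 + 1      # 4
    print("Raft with 6 servers tolerates",
          raft_nodes - raft_quorum, "failures")          # 2

    pairs = 3                              # 3 active + 3 standby
    # Each active is backed by its own standby, so all three actives can
    # fail (one failure per pair) and service continues on the standbys.
    print("3+3 active/standby tolerates", pairs, "active failures")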


Marcelo
