Hi Ben,

On 13.04.2017 20:53, Ben Pfaff wrote:
On Wed, Apr 12, 2017 at 06:09:28PM +0500, Valentine Sinitsyn wrote:
Hi,

On 04.04.2017 15:29, Valentine Sinitsyn wrote:
On 03.04.2017 20:29, Valentine Sinitsyn wrote:
Hi Ben,

On 23.03.2017 08:11, Ben Pfaff wrote:
Hello everyone.  I am not sure whether I am going to be able to attend
the OVN meeting tomorrow, because I will be in another possibly
distracting meeting, so I'm going to give my report here.

Toward the end of last week I did a full pass of reviews through
patchwork.  The most notable result, I think, is that I applied patches
that add 802.1ad support.  For OVN, this makes it more reasonable to
consider adding support for tagged logical ports--currently, OVN drops
all tagged logical packets--which I've heard requested once or twice,
because it means that they can now be gatewayed to physical ports within
an outer VLAN.  I don't have any plans to work on that, but I think that
it is worth pointing out.

The OVS "Open Source Day" talks have been scheduled at OpenStack
Boston.  They are all on Wednesday:
https://www.openstack.org/summit/boston-2017/summit-schedule/#track=135

I've been spending what dev time I have on database clustering.  Today,
I managed to get it working, with many caveats.  It will take weeks or
months longer to get it finished, tested, and ready for posting.  (If
you want what I have, check out the raft3 branch in my ovs-reviews repo
at github.)
I've checked out your raft3 branch, and even learned how to create an
OVSDB cluster. Thanks for the docs!

What I don't get, though, is how to instruct the IDL to connect to the
cluster now. Do I just connect to a random server, or should there be
some sort of dispatcher?
OK, I see this is ongoing work in your branch.
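In case it helps, this is roughly what I had pictured on the client
side; just a sketch, and it assumes ovsdb_idl_create() learns to accept
a comma-separated list of remotes, which it does not today:

    /* Hand the IDL every cluster member and let it pick one.  The
     * comma-separated remote list is an assumption on my side, not
     * something the branch implements; sbrec_idl_class is the OVN
     * southbound IDL class. */
    struct ovsdb_idl *idl = ovsdb_idl_create(
        "tcp:10.0.0.1:6642,tcp:10.0.0.2:6642,tcp:10.0.0.3:6642",
        &sbrec_idl_class, true, true);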

I had some time to play with the raft3 branch last week.

I added very basic and hacky replica-set support to the IDL and brought
up an OVN setup with a clustered southbound database. It works to some
extent, yet if I throw several hundred logical ports into the mix, the
database becomes inconsistent. The likely culprit is the race window
between the moment the Raft leader appends a log entry on the other
nodes (so a client such as ovn-northd can already see it) and the
moment the entry actually appears in the leader's own log. I'm not sure
whether this bug is mine or not. The original code had some minor
issues as well (which is absolutely normal for WIP); I can send my
(rather trivial) patches if there is any interest.
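To make the suspected race concrete, this is the leader-side ordering
I would expect; every identifier below is a hypothetical stand-in, not
taken from the raft3 code:

    /* Hypothetical sketch: append (and sync) the entry to the leader's
     * own log before replicating it, so that no follower can ever
     * acknowledge an entry the leader itself has not logged. */
    static void
    leader_append(struct raft_server *leader, const struct raft_entry *e)
    {
        local_log_append(leader, e);        /* hypothetical helper */
        local_log_sync(leader);             /* hypothetical: fsync */
        replicate_to_followers(leader, e);  /* hypothetical: AppendEntries */
    }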

I'm not surprised that there are inconsistency bugs.  The testing I've
done so far is really sketchy.  Let me assure you that I will implement
much more thorough testing before I propose anything to be merged.
Sure, I didn't expect it to be bug-free either.


Is there a design outline for the missing implementation bits?
Specifically, it would be good to know the following:

1. With a clustered OVSDB, a client such as the IDL needs two JSON-RPC
connections: one to the leader (to commit transactions) and a read-only
one to an arbitrary replica-set member (to scale reads). Will this be
implemented at the ovsdb_idl level or encapsulated inside
jsonrpc_session? The former seems natural, yet multiple-remote support
already went into jsonrpc_session.
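To illustrate the ovsdb_idl-level variant I have in mind (a sketch
only; the struct and field names below are mine, nothing like this
exists in the branch):

    /* Hypothetical: the IDL keeps two JSON-RPC sessions, one pinned to
     * the leader for write transactions and one to an arbitrary member
     * for the read-only monitor stream. */
    struct ovsdb_idl_cluster {
        struct jsonrpc_session *leader_session;   /* commits transactions */
        struct jsonrpc_session *monitor_session;  /* read-only monitor */
    };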

There are multiple possible approaches here.  The one that I am planning
to try out first is to have a client connect to only one randomly
selected server, and then have that server be responsible for relaying
write transactions to the leader.
Yes, this is an option. However, our tests suggest that ovsdb-server
doesn't scale well to large numbers of connections (hundreds to
thousands). The relay approach adds up to one new intra-cluster
connection per client connection, which could become a bottleneck.

Thanks,
Valentine


2. How does the client know which replica-set member is currently the
leader? I just loop over the remotes until one accepts the transaction
(which is an awful idea). It would be nice to send some sort of cluster
metadata snapshot to the JSON-RPC client during the initial handshake.
Alternatively, one could extend the "not leader" error object with a
leader URL.

If we do adopt the idea that followers relay write transactions to the
leader, then the client doesn't need to know the leader.  But if that
isn't practical, then the Raft thesis, section 6.2, suggests the same
idea as yours: have the follower point the client to the leader if it
knows it.

3. For eventual-consistency reasons, if an IDL reads from one member
(A) but writes to another (B), it can try to delete a row that is not
yet in A's database. This makes all further requests fail with an
"inconsistent data" error, and it is basically what I observe in my
tests. How do you plan to overcome this?

This sounds like a bug in the existing code (not too surprising).
What is supposed to happen is that the client waits until it receives
updated data from the server; it knows the update will eventually
arrive because its write was made against an inconsistent copy.  Then
it recomposes its changes against the updated database and sends a new
transaction.  This is similar to what clients already do when their
transactions fail because another client has simultaneously made a
conflicting change, so it should not be difficult for the clients.
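Roughly the pattern clients follow today, which the clustered case
could reuse (a simplified sketch; the surrounding poll loop and the
TXN_INCOMPLETE handling are elided):

    /* Compose a transaction against the current local snapshot; if the
     * commit cannot be applied cleanly, wait for updated data and then
     * build a fresh transaction against the new snapshot. */
    struct ovsdb_idl_txn *txn = ovsdb_idl_txn_create(idl);
    /* ... make changes through the generated IDL accessors ... */
    enum ovsdb_idl_txn_status status = ovsdb_idl_txn_commit(txn);
    if (status == TXN_TRY_AGAIN) {
        /* ovsdb_idl_run() will bring in the updated data; recompose
         * the changes and commit again on the next iteration. */
    }
    ovsdb_idl_txn_destroy(txn);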
