Re: [ovs-dev] RFC: OVN database options

2016-03-10 Thread Ivan Kelly
> - Zookeeper.  The issues here are similar to those for etcd.
>   Also, Zookeeper transactions don't seem to be isolated.
Zookeeper transactions can be isolated depending on what level of
isolation you need.
A setData operation on a node can carry a version, so that it fails
if the node has changed since that version. This means that with a
multi[1] of setData operations, you can effectively get snapshot
isolation. For serializable, you could probably shoehorn it
in by rewriting all nodes that you've read.
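
To make the version check concrete, here is a minimal sketch in Python.
It is a toy in-memory model and not the ZooKeeper client API: ToyStore,
multi, and VersionConflict are hypothetical names, and the only behaviour
modelled is "every write in the batch names the version it expects, and
the batch applies completely or not at all".

```python
class VersionConflict(Exception):
    """Raised when a node changed after the caller read its version."""


class ToyStore:
    """Toy model of versioned nodes (hypothetical, not a real client)."""

    def __init__(self):
        self._nodes = {}  # path -> (data, version)

    def create(self, path, data):
        self._nodes[path] = (data, 0)

    def get(self, path):
        return self._nodes[path]  # (data, version)

    def multi(self, ops):
        """Apply every (path, data, expected_version) write, or none."""
        # Phase 1: validate all expected versions before changing anything.
        for path, _data, expected in ops:
            _current_data, current_version = self._nodes[path]
            if current_version != expected:
                raise VersionConflict(path)
        # Phase 2: all checks passed, so apply every write.
        for path, data, expected in ops:
            self._nodes[path] = (data, expected + 1)


store = ToyStore()
store.create("/a", "1")
store.create("/b", "1")
_, va = store.get("/a")
_, vb = store.get("/b")

store.multi([("/a", "2", va), ("/b", "2", vb)])  # versions match: commits

try:
    # Reusing the old version of /a simulates a concurrent writer.
    store.multi([("/b", "3", vb + 1), ("/a", "3", va)])
    outcome = "committed"
except VersionConflict:
    outcome = "aborted"
```

The second batch fails on /a, and /b is left untouched even though its
own version was current, which is the all-or-nothing property that the
isolation argument relies on.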

Regarding the notification issue, it sounds like what you'd want is to
access the transaction log. ZooKeeper kinda does this with observer
nodes, where the transaction log is shipped to read only nodes to
scale out reads. OVN could do something similar, whereby each replica
tails the transaction log directly and applies the updates to its
local copy. Note that the transaction log isn't officially exposed
right now, but it's easy to get at, since observers already do it.
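
A rough sketch of the replica side, in Python. The log-entry shape
(txn_id, op, key, value) and the Replica class are assumptions made for
illustration; this is not ZooKeeper's log format or observer protocol.

```python
class Replica:
    """Keeps a local copy by applying an ordered transaction log."""

    def __init__(self):
        self.data = {}
        self.last_applied = 0  # highest transaction id applied so far

    def apply(self, entry):
        txn_id, op, key, value = entry
        if txn_id <= self.last_applied:
            return False  # already applied, so replays are harmless
        if op == "set":
            self.data[key] = value
        elif op == "delete":
            self.data.pop(key, None)
        else:
            raise ValueError("unknown op: %r" % (op,))
        self.last_applied = txn_id
        return True

    def tail(self, log):
        """Apply every entry of an ordered log that is not yet applied."""
        for entry in log:
            self.apply(entry)


log = [
    (1, "set", "port1", "up"),
    (2, "set", "port2", "down"),
    (3, "delete", "port1", None),
]

replica = Replica()
replica.tail(log[:2])  # catch up to transaction 2
replica.tail(log)      # re-reading from the start only applies entry 3
```

Tracking last_applied is what lets a replica reconnect and re-read part
of the log without corrupting its copy.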

-Ivan
[1] ZooKeeper's term for a transaction
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] RFC: OVN database options

2016-03-10 Thread Liran Schour
I'd like to raise the following issues for discussion:

1. That the client side is abstracted from the specific choice of 
server-side database by using a db-abstraction layer on the client side. 
We already have some kind of an abstraction layer in the code: ovsdb-idl. 
Maybe we can start from there. 
2. I think if clients receive all updates this will pose a scalability 
concern. I'd like to propose adding a pub/sub-like subsystem to be used 
for keeping clients up-to-date about updates in the DB. This can serve as 
a mechanism for table tracking and could also enable clients to receive 
updates only on a small subset of changes instead of all changes. This 
would greatly improve scalability in number of clients.
3. Do we really need an on-disk DB for the southbound DB? I think an 
in-memory DB for the Southbound is worth discussing.
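
As a rough illustration of point 2, a per-table subscription hub could
look like the Python sketch below. UpdateBroker and its methods are
hypothetical names, and the table names are only illustrative; this is
not an existing OVSDB or ovsdb-idl interface.

```python
from collections import defaultdict


class UpdateBroker:
    """Toy pub/sub hub: clients register interest per table."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # table name -> callbacks

    def subscribe(self, table, callback):
        self._subscribers[table].append(callback)

    def publish(self, table, change):
        """Deliver one change to interested clients; return how many."""
        callbacks = self._subscribers.get(table, [])
        for callback in callbacks:
            callback(table, change)
        return len(callbacks)


broker = UpdateBroker()
hv1_seen, hv2_seen = [], []
broker.subscribe("Port_Binding", lambda t, c: hv1_seen.append((t, c)))
broker.subscribe("Chassis", lambda t, c: hv2_seen.append((t, c)))

delivered = broker.publish("Port_Binding", {"logical_port": "lp1"})
```

Only the first client is called for the Port_Binding change; the second
never sees it, which is the fan-out reduction being proposed.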



Re: [ovs-dev] RFC: OVN database options

2016-03-10 Thread Dan Mihai Dumitriu
These are great points Liran. These points are also very closely related to
one another.

I agree that the SB DB could be entirely in memory - though for high
availability it should of course be replicated. As a bonus, replication of
an in-memory data structure is easier than of a durable data structure -
atomic broadcast could be used. There are questions of what happens in the
extremely rare eventuality that the entire SB cluster goes down and its
state is lost - I'm not sure how realistic that is. There are also
questions about how to do an upgrade of the SB cluster. I'm assuming that
the NB2SB translations could be regenerated, but there may also be data in
the SB DB that comes from (is written by) the ovn-controller agents, or
some other agents, like ovn-controller-vtep. It may be possible to "replay"
that as well, in case of a disconnect. In other words, northd (the
translator) is the authoritative owner of the entries it writes into the SB
DB. There may be other processes that write into the SB DB, such as the
ovn-controller-vtep, and we could consider such processes as authoritative
for that state, so that they could replay it into the SB DB.

Ben indicated that the size of the data isn't too large, but I agree that
sending updates of everything to thousands of clients could become
problematic, particularly if the churn in the system is high, such as in a
container environment. A pub/sub-like mechanism could definitely work here
- it wouldn't even need to be too fancy, just per-table interest, perhaps.
Another thing to add to the protocol would be per-table versioning, so that
when a client gets disconnected, if it happens to reconnect to another
server in the cluster, it can exchange table versions and resync, coming up
to speed by getting changes from a changelog, without necessarily
downloading a full snapshot. Perhaps OVSDB already works like this and I'm
demonstrating my ignorance. :)
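
A minimal sketch of the version exchange, in Python. TableServer, resync,
and the bounded changelog are assumptions for illustration, and the sketch
also assumes that every server in the cluster assigns the same version
numbers to the same changes. It does not describe how OVSDB behaves.

```python
class TableServer:
    """One table with a version counter and a bounded changelog."""

    def __init__(self, max_log=1000):
        self.rows = {}
        self.version = 0
        self.changelog = []  # (version, key, value); value None = delete
        self.max_log = max_log

    def write(self, key, value):
        self.version += 1
        if value is None:
            self.rows.pop(key, None)
        else:
            self.rows[key] = value
        self.changelog.append((self.version, key, value))
        if len(self.changelog) > self.max_log:
            del self.changelog[0]  # forget the oldest change

    def resync(self, client_version):
        """Send a delta if the log still covers the gap, else a snapshot."""
        if self.changelog:
            oldest = self.changelog[0][0]
        else:
            oldest = self.version + 1
        if oldest - 1 <= client_version <= self.version:
            delta = [c for c in self.changelog if c[0] > client_version]
            return "delta", delta
        return "snapshot", dict(self.rows)


server = TableServer(max_log=2)
server.write("a", 1)     # version 1
server.write("b", 2)     # version 2
server.write("a", None)  # version 3, and version 1 falls out of the log

recent = server.resync(1)  # the log still holds versions 2 and 3
stale = server.resync(0)   # version 1 is gone, so a snapshot is needed
```

The size of the retained changelog sets the trade-off: a longer log lets
more reconnecting clients avoid a full download, at the cost of memory.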

On Thu, Mar 10, 2016 at 11:15 PM, Liran Schour  wrote:

> I'd like to raise the following issues for discussion:
>
> 1. That the client side is abstracted from the specific choice of
> server-side database by using a db-abstraction layer on the client side.
> We already have some kind of an abstraction layer in the code: ovsdb-idl.
> Maybe we can start from there.
> 2. I think if clients receive all updates this will pose a scalability
> concern. I'd like to propose adding a pub/sub-like subsystem to be used
> for keeping clients up-to-date about updates in the DB. This can serve as
> a mechanism for table tracking and could also enable clients to receive
> updates only on a small subset of changes instead of all changes. This
> would greatly improve scalability in number of clients.
> 3. Do we really need an on-disk DB for the southbound DB? I think an
> in-memory DB for the Southbound is worth discussing.


Re: [ovs-dev] RFC: OVN database options

2016-03-10 Thread Dan Mihai Dumitriu
Great writeup Ben.

The NB DB does need HA and ACID transactions, but it has few clients, so
it's probably not a very hard problem - could even use BDB with log
shipping -
http://www.oracle.com/technetwork/database/database-technologies/berkeleydb/overview/index-085366.html
.

However, one more potential requirement for the NB DB is secondary indices,
because the NB clients may expect to query the NB models in various ways
that weren't considered a priori. I bring this up because in the OpenStack
context the NB DB could be used to store the Neutron data model entirely,
thus obviating the need for the Neutron DB, and eliminating the "syncing
problem" between Neutron and the NB DB. I could see the same applying in
the context of containers.

Regarding the SB DB, as Liran pointed out, it doesn't necessarily need
durable persistence. It would be possible to make the whole thing work with
an in memory SB DB. (I am waiting for you to start shooting holes in this
hypothesis, but I'm reasonably confident those holes can be filled.) That
said, it does need to be replicated for HA - luckily the replication of an
in memory data structure is easier and more performant than that of a
durably persistent data structure. In order to support efficient syncing
with clients (ovn-controller agents) the in memory replication should be a
form of log shipping, so that clients that disconnect from one SB DB
instance and reconnect to a different SB DB instance can do a resync without
a full table download. Is this premature optimization?

On Thu, Mar 10, 2016 at 4:11 PM, Ben Pfaff  wrote:

> Requirements
> ============
>
> OVN uses two databases, the "northbound" and "southbound" databases,
> in a somewhat idiosyncratic manner.  Each client of one of these
> databases maintains an in-memory replica of the database (or some
> subset of it), and the server sends it updates to this replica as they
> are committed.  Thus, at any given time, a client has a consistent
> snapshot of the database, although it might be old if the database has
> changed but the updates have not yet made it from the server to the
> client.
>
> Beyond supporting this usage model, the basic requirements for the OVN
> use case are:
>
> - Size: 20 MB to 100 MB of data (estimated database size to hold
>   data for our target scale of 1,000 hypervisors and 20,000
>   logical ports).
>
> - Scale: The northbound database has only a single-digit number of
>   clients.  Each hypervisor is a client to the southbound
>   database, so about 1,000 clients for our target scale of 1,000
>   hypervisors.
>
> - Performance: Hundreds of transactions per second.  (Because of
>   the usage model described above, all transactions are write
>   transactions; clients read from their local replicas.)
>
> - Transactions: Clients expect atomic, consistent, isolated
>   transactions.
>
>   Durability is not essential, because the clients will reissue
>   lost transactions (up to and including completely refilling an
>   empty database, although this can be slow).
>
> - High availability: If the database server goes down, then this
>   freezes the OVN configuration.  This is OK briefly for running
>   clients--the existing configuration continues to work, it just
>   can't be updated--but it prevents new clients or clients that
>   restart from using OVN at all.
>
>   For the same reason that durability is not essential, it is
>   acceptable if an occasional fail-over between database servers
>   loses a few transactions, though of course it's best to minimize
>   the probability and the amount of data lost.
>
> - Open source.  Some "open source" databases only provide high
>   availability and transactions as proprietary extensions; that's
>   undesirable.
>
> Desirable features:
>
> - C client, since OVN is written in C; otherwise, we'll likely
>   have to write one.  (We've had suggestions that OVN should be
>   written in another language, such as Java, but we have not
>   decided to change the language yet.)
>
> - Python client, since OVS includes tools written in Python.
>
> - Table structured.  We could layer tables on top of a key-value
>   store if necessary.
>
> - Schema support, with referential integrity constraints.  We find
>   this helpful for increasing our confidence in the system.  This
>   is something that we could leave out or layer on top.
>
> - Network protocol.  Some databases are just designed for local
>   access.  If such a database were otherwise just right, we could
>   wrap it for distributed use.  The analysis below mostly ignores
>   databases that are local-only or in which remote access appears
>   to be an afterthought.
>
>
> Options
> =======
>
> Each entry has the columns listed below.  In general, all-caps answers
> are problematic for the OVN use case.
>
> - Database: The database 

Re: [ovs-dev] RFC: OVN database options

2016-03-10 Thread Dan Mihai Dumitriu
On Fri, Mar 11, 2016 at 12:55 AM, Dan Mihai Dumitriu 
wrote:

> Great writeup Ben.
>
> The NB DB does need HA and ACID transactions, but it has few clients, so
> it's probably not a very hard problem - could even use BDB with log
> shipping -
> http://www.oracle.com/technetwork/database/database-technologies/berkeleydb/overview/index-085366.html
> .
>
> However, one more potential requirement for the NB DB is secondary
> indices, because the NB clients may expect to query the NB models in
> various ways that weren't considered a priori. I bring this up because in
> the OpenStack context the NB DB could be used to store the Neutron data
> model entirely, thus obviating the need for the Neutron DB, and eliminating
> the "syncing problem" between Neutron and the NB DB. I could see the same
> applying in the context of containers.
>

My colleague Ivan pointed out that ZK could be used for the NB DB. I think
that could be a reasonable choice actually.



> Regarding the SB DB, as Liran pointed out, it doesn't necessarily need
> durable persistence. It would be possible to make the whole thing work with
> an in memory SB DB. (I am waiting for you to start shooting holes in this
> hypothesis, but I'm reasonably confident those holes can be filled.) That
> said, it does need to be replicated for HA - luckily the replication of an
> in memory data structure is easier and more performant than that of a
> durably persistent data structure. In order to support efficient syncing
> with clients (ovn-controller agents) the in memory replication should be a
> form of log shipping, so that clients that disconnect from one SB DB
> instance and reconnect to different SB DB instance can do a resync without
> a full table download. Is this premature optimization?
>
> On Thu, Mar 10, 2016 at 4:11 PM, Ben Pfaff  wrote:
>
>> Requirements
>> 
>>
>> OVN uses two databases, the "northbound" and "southbound" databases,
>> in a somewhat idiosyncratic manner.  Each client of one of these
>> databases maintains an in-memory replica of the database (or some
>> subset of it), and the server sends it updates to this replica as they
>> are committed.  Thus, at any given time, a client has a consistent
>> snapshot of the database, although it might be old if the database has
>> changed but the updates have not yet made it from the server to the
>> client.
>>
>> Beyond supporting this usage model, the basic requirements for the OVN
>> use case are:
>>
>> - Size: 20 MB to 100 MB of data (estimated database size to hold
>>   data for our target scale of 1,000 hypervisors and 20,000
>>   logical ports).
>>
>> - Scale: The northbound database has only a single-digit number of
>>   clients.  Each hypervisor is a client to the southbound
>>   database, so about 1,000 clients for our target scale of 1,000
>>   hypervisors.
>>
>> - Performance: Hundreds of transactions per second.  (Because of
>>   the usage model described above, all transactions are write
>>   transactions; clients read from their local replicas.)
>>
>> - Transactions: Clients expect atomic, consistent, isolated
>>   transactions.
>>
>>   Durability is not essential, because the clients will reissue
>>   lost transactions (up to and including completely refilling an
>>   empty database, although this can be slow).
>>
>> - High availability: If the database server goes down, then this
>>   freezes the OVN configuration.  This is OK briefly for running
>>   clients--the existing configuration continues to work, it just
>>   can't be updated--but it prevents new clients or clients that
>>   restart from using OVN at all.
>>
>>   For the same reason that durability is not essential, it is
>>   acceptable if an occasional fail-over between database servers
>>   loses a few transactions, though of course it's best to minimize
>>   the probability and the amount of data lost.
>>
>> - Open source.  Some "open source" databases only provide high
>>   availability and transactions as proprietary extensions; that's
>>   undesirable.
>>
>> Desirable features:
>>
>> - C client, since OVN is written in C; otherwise, we'll likely
>>   have to write one.  (We've had suggestions that OVN should be
>>   written in another language, such as Java, but we have not
>>   decided to change the language yet.)
>>
>> - Python client, since OVS includes tools written in Python.
>>
>> - Table structured.  We could layer tables on top of a key-value
>>   store if necessary.
>>
>> - Schema support, with referential integrity constraints.  We find
>>   this helpful for increasing our confidence in the system.  This
>>   is something that we could leave out or layer on top.
>>
>> - Network protocol.  Some databases are just designed for local
>>   access.  If such a database were otherwise just right, we could
>>   wrap it for distributed use.  

Re: [ovs-dev] RFC: OVN database options

2016-03-10 Thread Russell Bryant
On Thu, Mar 10, 2016 at 2:11 AM, Ben Pfaff  wrote:

> Database        txn  ACID  consist   trk   HA   OS    C   Py  format
> --------------  ---  ----  -------  ----  ---  ---  ---  ---  ------
> ActorDB         yes  ACID   strong    NO  yes  yes  yes  yes     sql
> Aerospike       yes  ACID   strong    NO  yes  yes  yes  yes   db/KV
> Cassandra        NO  -C-D  tunable    NO  yes  yes   NO  yes   table
> Cockroach DB    yes  ACID   strong    NO  yes  yes   ??          sql
> Couchbase        NO                   NO  yes  NO?  yes  yes    JSON
> CrateIO          NO        EVNTUAL    NO  yes  yes   NO  yes     sql
> etcd             NO  ACID   strong  yes?  yes  yes  yes  yes      KV
> Gigaspaces XAP  yes  ACID   strong   yes  yes   NO   NO   NO   multi
> HBase            NO  ACID   strong    NO  yes  yes   NO  yes   table
> Hyperdex        yes  ACID   strong    NO  yes   NO  yes  yes      KV
> Hypertable       NO                   NO  yes  yes   NO  yes   table
> MongoDB          NO  ACID   strong    ??  yes  yes  yes  yes    JSON
> RAMCloud        yes         strong    NO  yes  yes   NO  yes      KV
> Redis           yes  -C?D             NO  yes  yes  yes  yes      KV
> Riak             NO  ---D  EVNTUAL    NO  yes  yes  yes  yes      KV
> Scalaris        yes  ACI-   strong    NO  yes  yes   NO  yes      KV
> ScyllaDB         NO  -C-D  tunable    NO  yes  yes   NO  yes   table
> Voldemort        NO        EVNTUAL    NO  yes  yes   NO  yes      KV
> Zookeeper       yes  AC-D   strong   yes  yes  yes  yes  yes      KV
>
> OVSDB           yes  ACID   strong   yes   NO  yes  yes  yes   table
>

I've shared this message with a few people that I work with that have more
experience evaluating these things than I do to gather some feedback.

Julien Danjou offered the following alternative that I felt was worth
adding to the conversation:

> Database        txn  ACID  consist   trk   HA   OS    C   Py  format
> --------------  ---  ----  -------  ----  ---  ---  ---  ---  ------
> PostgreSQL      yes  ACID   strong   yes  yes  yes  yes  yes     sql
>
> PostgreSQL is way more powerful than most people know, and I think it
> offers all the features you need:
>
> - Size: not an issue, it can handle terabytes and you only seem to need
>   a few hundred megabytes
> - Scale: not an issue either, it can handle thousands of connections
>   per second
> - Performance: again, not an issue, can handle very large amounts of it
> - Transactions: yup
> - HA: this can be done, though it probably requires a bit more manual
>   work to put in place. This can be automated in some way, but it's easy
>   to do streaming replication with a hot standby server to recover.
>   Recent versions of PostgreSQL offer tools for that.
> - Open source, as you know :)
> - C client
> - Python client (psycopg2 is pretty good, used it a lot)
> - Table obviously
> - Schema but you know it
> - Network, amirite?
> - Easy to have a lot of read-only replicas
>
> So yes, it has everything OVN needs. It can push notifications to
> clients via the NOTIFY¹ command (that you can use in any
> procedure/trigger). For example, you could imagine creating a trigger
> that sends a JSON payload for each new update/insert in the database.
> That's literally 10 lines of PL/pgSQL.
>
> ¹  http://www.postgresql.org/docs/9.5/static/sql-notify.html
>
> I think that PostgreSQL would be the safer bet in this move, as:
> - building something on top of etcd would seem weak w.r.t your
> schema/table requirements
> - investing in OVSDB (though keep in mind I don't know it :-) would
> probably end up in redoing a job PostgreSQL people already have done
> better than you would ;-)
>
> The only questions that this raises to me are:
> - whether PostgreSQL is too large/complex to deploy for OVN. Seeing the
>   list of candidates that were evaluated, I wouldn't think so, but there
>   can be a lot of different opinions on that based on different
>   perception of PostgreSQL. And since you're targeting a network DB, you
>   definitely need a daemon configured and set-up so I'm only partially
>   worried here. :)

I wouldn't think so, but it sounds like I need to study the HA model.

Specific to the OVN+OpenStack use case, I imagine a frequent question would
be, "why do I have to use MariaDB+Galera AND PostgreSQL in the same
environment?!"  I suppose OpenStack works with PostgreSQL, too, and it's
just a deployment choice that most people seem to be using MariaDB+Galera.

Further on that note, I take it MariaDB+Galera is lacking against these
requirements?  Or should it be on this list?

> - if the HA model(s) provided by PostgreSQL can fit the requirement of
>   OVN. I would think so, but I'm not sure on how exactly OVN is on the
>   scale of "resilient vs integrity".

Based on this info, PostgreSQL sounds worth a close look.  (Thanks, Julien!)

-- 
Russell Bryant


Re: [ovs-dev] RFC: OVN database options

2016-03-10 Thread Ben Pfaff
On Thu, Mar 10, 2016 at 11:26:29AM +0100, Ivan Kelly wrote:
> > - Zookeeper.  The issues here are similar to those for etcd.
> >   Also, Zookeeper transactions don't seem to be isolated.
> Zookeeper transactions can be isolated depending on what level of
> isolation you need.
> A setData operation on a node can carry a version, so that it fails
> if the node has changed since that version. This means that with a
> multi[1] of setData operations, you can effectively get snapshot
> isolation. For serializable, you could probably shoehorn it
> in by rewriting all nodes that you've read.

Thanks, that helps.  (I couldn't spend too much time on each database
given the number I looked at.)

> Regarding the notification issue, it sounds like what you'd want is to
> access the transaction log. ZooKeeper kinda does this with observer
> nodes, where the transaction log is shipped to read only nodes to
> scale out reads. OVN could do something similar, whereby each replica
> tails the transaction log directly and applies the updates to its
> local copy. Note that the transaction log isn't officially exposed
> right now, but it's easy to get at, since observers already do it.

Thanks.  I've updated the paragraph:

- Zookeeper.  The ZK model is similar to etcd so it may have
  similar issues.  Also, ZK makes the transaction log available,
  for use by observer nodes to scale out reads, and this may be
  another way for the clients to track table changes.

It sounds like you know ZK well, so maybe you can answer a question for
me.  The documentation I read for ZK talks about how it's often used to
manage pointers to other resources.  This implies, to my mind, that it's
not suitable for handling significant volumes of data.  Is that a
true inference?


Re: [ovs-dev] RFC: OVN database options

2016-03-10 Thread Ivan Kelly
> - Zookeeper.  The ZK model is similar to etcd so it may have
>   similar issues.  Also, ZK makes the transaction log available,
>   for use by observer nodes to scale out reads, and this may be
>   another way for the clients to track table changes.
Not quite. It does make the log available in that there's nothing
stopping you from connecting to the leader, reading the log, and
using it as you like, but it's not an officially supported use case, so
they don't actively encourage you to do it (though in fact they don't
discourage it either).

> The documentation I read for ZK talks about how it's often used to
> manage pointers to other resources.  This implies, to my mind, that it's
> not suitable for handling significant volumes of data.  Is that a
> true inference?
Yes and No. Zookeeper is often used to point to larger data resources,
but when they're talking large, they're talking HBase, HDFS, etc.
Large data stores with multiple petabytes of data. In the case of OVN,
I don't think it applies. It can easily handle at least a gig of data,
and from your original email, I think we're talking a fraction of
that.

-Ivan


Re: [ovs-dev] RFC: OVN database options

2016-03-10 Thread Ben Pfaff
On Thu, Mar 10, 2016 at 04:15:13PM +0200, Liran Schour wrote:
> I'd like to raise the following issues for discussion:
> 
> 1. That the client side is abstracted from the specific choice of 
> server-side database by using a db-abstraction layer on the client side. 
> We already have some kind of an abstraction layer in the code: ovsdb-idl. 
> Maybe we can start from there. 

I think that this issue is orthogonal to the question of which database
or databases are suitable for OVN, and can be discussed separately.

> 2. I think if clients receive all updates this will pose a scalability 
> concern. I'd like to propose adding a pub/sub-like subsystem to be used 
> for keeping clients up-to-date about updates in the DB. This can serve as 
> a mechanism for table tracking and could also enable clients to receive 
> updates only on a small subset of changes instead of all changes. This 
> would greatly improve scalability in number of clients.

If clients receive updates that they do not need, that can be a
scalability problem.  Currently, OVN clients do receive some updates
that they do not need.  However, the "conditional monitoring" patches
(currently under review) should allow us to fix that.  I don't know how
that differs from what you're proposing; to me, it sounds the same.

> 3. Do we really need an on-disk DB for the southbound DB? I think an 
> in-memory DB for the Southbound is worth discussing.

OVSDB is an in-memory DB.

I think that my summary covers this:

- Transactions: Clients expect atomic, consistent, isolated
  transactions.

  Durability is not essential, because the clients will reissue
  lost transactions (up to and including completely refilling an
  empty database, although this can be slow).

- High availability: If the database server goes down, then this
  freezes the OVN configuration.  This is OK briefly for running
  clients--the existing configuration continues to work, it just
  can't be updated--but it prevents new clients or clients that
  restart from using OVN at all.

  For the same reason that durability is not essential, it is
  acceptable if an occasional fail-over between database servers
  loses a few transactions, though of course it's best to minimize
  the probability and the amount of data lost.


Re: [ovs-dev] RFC: OVN database options

2016-03-10 Thread Ben Pfaff
On Fri, Mar 11, 2016 at 12:43:19AM +0900, Dan Mihai Dumitriu wrote:
> Another thing to add to the protocol would be per table versioning, so that
> when a client gets disconnected, if it happens to reconnect to another
> server in the cluster, it can exchange table versions and resync, coming up
> to speed by getting changes from a changelog, without necessarily
> downloading a full snapshot. Perhaps OVSDB already works like this and I'm
> demonstrating my ignorance. :)

It doesn't already work like that, but there is a to-do item (in
ovn/TODO) that proposes one approach.  Andy Zhou prototyped this
approach but I don't recall whether he posted any RFC patches.

The approach you mention is also a possibility of course.

** Reducing startup time.

   As-is, if ovsdb-server restarts, every client will fetch a fresh
   copy of the part of the database that it cares about.  With
   hundreds of clients, this could cause heavy CPU load on
   ovsdb-server and use excessive network bandwidth.  It would be
   better to allow incremental updates even across connection loss.
   One way might be to use "Difference Digests" as described in
   Eppstein et al., "What's the Difference? Efficient Set
   Reconciliation Without Prior Context".  (I'm not yet aware of
   previous non-academic use of this technique.)
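
For readers unfamiliar with the technique, here is a simplified Python
sketch of the invertible Bloom filter that the paper builds on. It is an
illustration only: the IBF class, the hash construction, and the table
size of 60 cells are my own choices, and the paper's estimator for sizing
the table is left out. Row identifiers are assumed to be integers below
2**64.

```python
import hashlib


def _h(key, salt):
    """64-bit hash of an integer key under a small integer salt."""
    raw = bytes([salt]) + key.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(raw).digest()[:8], "big")


class IBF:
    K = 3  # each key is stored in K cells, one per partition

    def __init__(self, size):
        assert size % self.K == 0
        self.size = size
        self.count = [0] * size
        self.key_sum = [0] * size   # XOR of the keys in the cell
        self.hash_sum = [0] * size  # XOR of the keys' check hashes

    def _cells(self, key):
        part = self.size // self.K
        return [i * part + _h(key, i) % part for i in range(self.K)]

    def _update(self, key, delta):
        check = _h(key, 255)
        for c in self._cells(key):
            self.count[c] += delta
            self.key_sum[c] ^= key
            self.hash_sum[c] ^= check

    def insert(self, key):
        self._update(key, 1)

    def subtract(self, other):
        """Cell-wise difference; keys present in both sets cancel out."""
        out = IBF(self.size)
        for i in range(self.size):
            out.count[i] = self.count[i] - other.count[i]
            out.key_sum[i] = self.key_sum[i] ^ other.key_sum[i]
            out.hash_sum[i] = self.hash_sum[i] ^ other.hash_sum[i]
        return out

    def decode(self):
        """Peel single-key cells; return (only_self, only_other, ok)."""
        mine, theirs = set(), set()
        progress = True
        while progress:
            progress = False
            for i in range(self.size):
                sign = self.count[i]
                key = self.key_sum[i]
                if sign in (1, -1) and self.hash_sum[i] == _h(key, 255):
                    (mine if sign == 1 else theirs).add(key)
                    self._update(key, -sign)  # remove it from all cells
                    progress = True
        ok = not (any(self.count) or any(self.key_sum)
                  or any(self.hash_sum))
        return mine, theirs, ok


server_rows = set(range(1, 101))
client_rows = (server_rows - {7, 42}) | {1000}

a, b = IBF(60), IBF(60)
for row in server_rows:
    a.insert(row)
for row in client_rows:
    b.insert(row)

only_server, only_client, ok = a.subtract(b).decode()
```

The digest size depends on the number of differences, not on the size of
the sets. Decoding can fail when the table is too small for the
difference; in that case ok is false and the caller would have to fall
back to a larger digest or a full transfer.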


Re: [ovs-dev] RFC: OVN database options

2016-03-10 Thread Ben Pfaff
On Fri, Mar 11, 2016 at 12:55:54AM +0900, Dan Mihai Dumitriu wrote:
> The NB DB does need HA and ACID transactions, but it has few clients, so
> it's probably not a very hard problem - could even use BDB with log
> shipping -
> http://www.oracle.com/technetwork/database/database-technologies/berkeleydb/overview/index-085366.html
> .

I don't think that BDB is a great idea for new use in an open source
project, because Oracle is increasingly hostile to open source and due
to recent license changes some distributions, including Debian, have
elected not to package BDB 6.0:
https://lwn.net/Articles/557820/

> However, one more potential requirement for the NB DB is secondary indices,
> because the NB clients may expect to query the NB models in various ways
> that weren't considered a priori. I bring this up because in the OpenStack
> context the NB DB could be used to store the Neutron data model entirely,
> thus obviating the need for the Neutron DB, and eliminating the "syncing
> problem" between Neutron and the NB DB. I could see the same applying in
> the context of containers.

I had not considered the possibility that the NB DB might become the
database of record for Neutron.  This might be hard to do while allowing
OVN to work gracefully with a variety of CMSes, rather than making it
OpenStack/Neutron specific.

> Regarding the SB DB, as Liran pointed out, it doesn't necessarily need
> durable persistence. It would be possible to make the whole thing work with
> an in memory SB DB. (I am waiting for you to start shooting holes in this
> hypothesis, but I'm reasonably confident those holes can be filled.) 

I basically agree.  I think that the requirements that I wrote cover
this pretty well:

- Transactions: Clients expect atomic, consistent, isolated
  transactions.

  Durability is not essential, because the clients will reissue
  lost transactions (up to and including completely refilling an
  empty database, although this can be slow).

- High availability: If the database server goes down, then this
  freezes the OVN configuration.  This is OK briefly for running
  clients--the existing configuration continues to work, it just
  can't be updated--but it prevents new clients or clients that
  restart from using OVN at all.

  For the same reason that durability is not essential, it is
  acceptable if an occasional fail-over between database servers
  loses a few transactions, though of course it's best to minimize
  the probability and the amount of data lost.

> That said, it does need to be replicated for HA - luckily the
> replication of an in memory data structure is easier and more
> performant than that of a durably persistent data structure. 

Yes.

> In order to support efficient syncing with clients (ovn-controller
> agents) the in memory replication should be a form of log shipping, so
> that clients that disconnect from one SB DB instance and reconnect to
> different SB DB instance can do a resync without a full table
> download. Is this premature optimization?

I think it's a prudent plan, since OVSDB already works in terms of log
shipping, although we don't call it that.


Re: [ovs-dev] RFC: OVN database options

2016-03-10 Thread Ryan Moats

"dev"  wrote on 03/10/2016 01:11:09 AM:

> From: Ben Pfaff 
> To: dev@openvswitch.org
> Date: 03/10/2016 01:31 AM
> Subject: [ovs-dev] RFC: OVN database options
> Sent by: "dev" 
>
> Requirements
> ============
>
> OVN uses two databases, the "northbound" and "southbound" databases,
> in a somewhat idiosyncratic manner.  Each client of one of these
> databases maintains an in-memory replica of the database (or some
> subset of it), and the server sends it updates to this replica as they
> are committed.  Thus, at any given time, a client has a consistent
> snapshot of the database, although it might be old if the database has
> changed but the updates have not yet made it from the server to the
> client.
>
> Beyond supporting this usage model, the basic requirements for the OVN
> use case are
>
> - Size: 20 MB to 100 MB of data (estimated database size to hold
>   data for our target scale of 1,000 hypervisors and 20,000
>   logical ports).
>
> - Scale: The northbound database has only a single-digit number of
>   clients.  Each hypervisor is a client to the southbound
>   database, so about 1,000 clients for our target scale of 1,000
>   hypervisors.
>
> - Performance: Hundreds of transactions per second.  (Because of
>   the usage model described above, all transactions are write
>   transactions; clients read from their local replicas.)
>
> - Transactions: Clients expect atomic, consistent, isolated
>   transactions.
>
>   Durability is not essential, because the clients will reissue
>   lost transactions (up to and including completely refilling an
>   empty database, although this can be slow).
>
> - High availability: If the database server goes down, then this
>   freezes the OVN configuration.  This is OK briefly for running
>   clients--the existing configuration continues to work, it just
>   can't be updated--but it prevents new clients or clients that
>   restart from using OVN at all.
>
>   For the same reason that durability is not essential, it is
>   acceptable if an occasional fail-over between database servers
>   loses a few transactions, though of course it's best to minimize
>   the probability and the amount of data lost.
>
> - Open source.  Some "open source" databases only provide high
>   availability and transactions as proprietary extensions; that's
>   undesirable.
>
> Desirable features:
>
> - C client, since OVN is written in C; otherwise, we'll likely
>   have to write one.  (We've had suggestions that OVN should be
>   written in another language, such as Java, but we have not
>   decided to change the language yet.)
>
> - Python client, since OVS includes tools written in Python.
>
> - Table structured.  We could layer tables on top of a key-value
>   store if necessary.
>
> - Schema support, with referential integrity constraints.  We find
>   this helpful for increasing our confidence in the system.  This
>   is something that we could leave out or layer on top.
>
> - Network protocol.  Some databases are just designed for local
>   access.  If such a database were otherwise just right, we could
>   wrap it for distributed use.  The analysis below mostly ignores
>   databases that are local-only or in which remote access appears
>   to be an afterthought.
>
>
> Options
> =======
>
> Each entry has the columns listed below.  In general, all-caps answers
> are problematic for the OVN use case.
>
> - Database: The database being evaluated.
>
> - txn: "yes" if the database supports transactions across
>   arbitrary data, "NO" if its transactions are limited to a single
>   data item, such as a single key-value pair, or perhaps even more
>   limited.
>
> - ACID: The transactional properties that the database supports,
>   within the transactions that the database supports.  (Thus, a
>   database whose transactions cover only a single data item can be
>   listed as ACID, but this is only for those limited
>   transactions.)
>
> - consist: The distributed consistency model that the database
>   supports, one of "strong" for strong or linearizable
>   consistency, "tunable" for consistency that can be tuned to be
>   strong or linearizable or weaker, or "EVNTUAL" for eventual
>   consistency.
>
> - trk: "yes" if the database can automatically report data changes
>   to clients, "NO" if the database requires clients to poll for
>   changes.
>
> - HA: "yes" if the database can be configured for high
>   availability, so that loss of a single node does not stop
>   database activity, "NO" otherwise.
>
> - OS: "yes" if the database is open source or free (libre)
>   software, "NO" if it is proprietary.  When a database has open
>   source and proprietary editions, this is "yes" and only the
>   features in the open sour

Re: [ovs-dev] RFC: OVN database options

2016-03-10 Thread Ben Pfaff
On Thu, Mar 10, 2016 at 01:14:41PM -0600, Ryan Moats wrote:
> Do we want to be *in* the DB business?  I think the answer is no,
> which means we should be doing the work to *not* be in the DB business
> - refactoring the IDL to allow different DBs to be attached while
> ensuring that the requirements above are still met.  We can then have
> project code that allows various DBs to be plugged in (ovsdb would be
> one of them certainly) and then we've moved to a place where we can
> say "here's different DBs that ovn works with, if you want a different
> one, here's how to connect it..."

I think that the problem of selecting a good database is orthogonal to
the question of whether we should support more than one.  Let's treat
them separately.


Re: [ovs-dev] RFC: OVN database options

2016-03-10 Thread Ben Pfaff
On Thu, Mar 10, 2016 at 12:52:43PM -0500, Russell Bryant wrote:
> On Thu, Mar 10, 2016 at 2:11 AM, Ben Pfaff  wrote:
> 
> > Database        txn  ACID  consist   trk   HA    OS    C   Py  format
> > --------------  ---  ----  -------  ----  ---  ----  ---  ---  ------
> > ActorDB         yes  ACID   strong    NO  yes   yes  yes  yes     sql
> > Aerospike       yes  ACID   strong    NO  yes   yes  yes  yes   db/KV
> > Cassandra        NO  -C-D  tunable    NO  yes   yes   NO  yes   table
> > Cockroach DB    yes  ACID   strong    NO  yes   yes   ??            sql
> > Couchbase        NO                   NO  yes   NO?  yes  yes    JSON
> > CrateIO          NO        EVNTUAL    NO  yes   yes   NO  yes     sql
> > etcd             NO  ACID   strong  yes?  yes   yes  yes  yes      KV
> > Gigaspaces XAP  yes  ACID   strong   yes  yes    NO   NO   NO   multi
> > HBase            NO  ACID   strong    NO  yes   yes   NO  yes   table
> > Hyperdex        yes  ACID   strong    NO  yes    NO  yes  yes      KV
> > Hypertable       NO                   NO  yes   yes   NO  yes   table
> > MongoDB          NO  ACID   strong    ??  yes   yes  yes  yes    JSON
> > RAMCloud        yes         strong    NO  yes   yes   NO  yes      KV
> > Redis           yes  -C?D             NO  yes   yes  yes  yes      KV
> > Riak             NO  ---D  EVNTUAL    NO  yes   yes  yes  yes      KV
> > Scalaris        yes  ACI-   strong    NO  yes   yes   NO  yes      KV
> > ScyllaDB         NO  -C-D  tunable    NO  yes   yes   NO  yes   table
> > Voldemort        NO        EVNTUAL    NO  yes   yes   NO  yes      KV
> > Zookeeper       yes  AC-D   strong   yes  yes   yes  yes  yes      KV
> >
> > OVSDB           yes  ACID   strong   yes   NO   yes  yes  yes   table
> >
> 
> I've shared this message with a few people that I work with that have more
> experience evaluating these things than I do to gather some feedback.
> 
> Julien Danjou offered the following alternative that I felt was worth
> adding to the conversation:
> 
> > Database        txn  ACID  consist   trk   HA    OS    C   Py  format
> > --------------  ---  ----  -------  ----  ---  ----  ---  ---  ------
> > PostgreSQL      yes  ACID   strong   yes  yes   yes  yes  yes     sql

I've been a fan of Postgres since I used it in the 1990s for a web-based
application.  It didn't occur to me that it was appropriate here.
Julien, thanks so much for joining the discussion.

> > So yes, it has everything OVN needs. It can push notifications to
> > clients via the NOTIFY¹ command (that you can use in any
> > procedure/trigger). For example, you could imagine creating a trigger
> > that sends a JSON payload for each new update/insert in the database.
> > That's literally 10 lines of PL/SQL.

That's good to know.  I hadn't figured out how to do this kind of thing
with SQL-based systems.

> > ¹  http://www.postgresql.org/docs/9.5/static/sql-notify.html
> >
> > I think that PostgreSQL would be the safer bet in this move, as:
> > - building something on top of etcd would seem weak w.r.t your
> > schema/table requirements
> > - investing in OVSDB (though keep in mind I don't know it :-) would
> > probably end up in redoing a job PostgreSQL people already have done
> > better than you would ;-)
> >
> > The only questions that this raises to me are:
> > - whether PostgreSQL is too large/complex to deploy for OVN. Seeing the
> >   list of candidates that were evaluated, I wouldn't think so, but there
> >   can be a lot of different opinions on that based on different
> >   perception of PostgreSQL. And since you're targeting a network DB, you
> >   definitely need a daemon configured and set-up so I'm only partially
> >   worried here. :)
> 
> I wouldn't think so, but it sounds like I need to study the HA model.

OK, I added the entry and this note:

- PostgreSQL.  Julien Danjou writes:

      HA can be done, though it requires probably a bit more manual
      work to be put in place.  This can be automated in some way, but
      it's easy to do stream replication with a hot standby server
      to recover.  Recent versions of PostgreSQL offer tools for
      that.

  (Looking at the PostgreSQL 9.4 manual, transaction log shipping
  with asynchronous transaction log streaming seems appropriate.)

  Julien continues:

      It can push notifications to clients via the NOTIFY¹ command
      (that you can use in any procedure/trigger). For example, you
      could imagine creating a trigger that sends a JSON payload for
      each new update/insert in the database.  That's literally 10
      lines of PL/SQL.
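For illustration, the trigger-plus-NOTIFY idea might look roughly like the sketch below. It is only a sketch under assumptions: the table name `logical_port`, the channel name `ovn_changes`, and the function and trigger names are hypothetical, the trigger is written in PL/pgSQL (PostgreSQL's procedural language), and the listener assumes the third-party psycopg2 driver, which is imported only when the listener is actually used.

```python
import json

# Hypothetical names: table "logical_port", channel "ovn_changes".
TRIGGER_SQL = """
CREATE OR REPLACE FUNCTION notify_change() RETURNS trigger AS $$
BEGIN
    -- NOTIFY payloads must stay under 8000 bytes in the default configuration.
    PERFORM pg_notify(
        'ovn_changes',
        json_build_object('table', TG_TABLE_NAME,
                          'op', TG_OP,
                          'row', row_to_json(NEW))::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER logical_port_notify
AFTER INSERT OR UPDATE ON logical_port
FOR EACH ROW EXECUTE PROCEDURE notify_change();
"""


def listen_for_changes(dsn, channel="ovn_changes", timeout=5.0):
    """Yield decoded JSON payloads as the server delivers notifications."""
    import select
    import psycopg2
    import psycopg2.extensions

    conn = psycopg2.connect(dsn)
    # LISTEN/NOTIFY needs autocommit so notifications arrive promptly.
    conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
    cur = conn.cursor()
    # The channel is an identifier, so it cannot be passed as a query parameter.
    cur.execute("LISTEN %s;" % channel)
    while True:
        if select.select([conn], [], [], timeout) == ([], [], []):
            continue  # timed out; wait again
        conn.poll()
        while conn.notifies:
            note = conn.notifies.pop(0)
            yield json.loads(note.payload)
```

A client would run `TRIGGER_SQL` once against the database and then iterate over `listen_for_changes(dsn)`. Note that this only delivers change events; it does not by itself give a client the initial consistent copy of the data.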

> Specific to the OVN+OpenStack use case, I imagine a frequent question would
> be, "why do I have to use MariaDB+Galera AND PostgreSQL in the same
> environment?!"  I suppose OpenStack works with PostgreSQL, too, and it's
> just a deployment choice that most people seem to be using MariaDB+Galera.
> 
> Further on that note, I take it MariaDB+Galera is lacking against these
> requirements?  Or should it be on this list?

MariaDB doesn't seem to 

Re: [ovs-dev] RFC: OVN database options

2016-03-10 Thread Ben Pfaff
On Thu, Mar 10, 2016 at 07:11:55PM +0100, Ivan Kelly wrote:
> > - Zookeeper.  The ZK model is similar to etcd so it may have
> >   similar issues.  Also, ZK makes the transaction log available,
> >   for use by observer nodes to scale out reads, and this may be
> >   another way for the clients to track table changes.
> Not quite. It does make the log available in that there's nothing
> stopping you from connecting to the leader and reading the log and
> using it as you like, but it's not an officially supported usecase, so
> they don't actively encourage you to do it (they don't discourage
> either in fact).

OK.

> > The documentation I read for ZK talks about how it's often used to
> > manage pointers to other resources.  This implies, to my mind, that it's
> > not suitable for handling significant volumes of data.  Is that a
> > true inference?
> Yes and No. Zookeeper is often used to point to larger data resources,
> but when they're talking large, they're talking HBase, HDFS, etc.
> Large data stores with multiple petabytes of data. In the case of OVN,
> I don't think it applies. It can easily handle at least a gig of data,
> and from your original email, I think we're talking a fraction of
> that.

I am reassured.  Thanks.


Re: [ovs-dev] RFC: OVN database options

2016-03-10 Thread Han Zhou
On Wed, Mar 9, 2016 at 11:11 PM, Ben Pfaff  wrote:
>
> Beyond supporting this usage model, the basic requirements for the OVN
> use case are:
>
> - Size: 20 MB to 100 MB of data (estimated database size to hold
>   data for our target scale of 1,000 hypervisors and 20,000
>   logical ports).
>
> - Scale: The northbound database has only a single-digit number of
>   clients.  Each hypervisor is a client to the southbound
>   database, so about 1,000 clients for our target scale of 1,000
>   hypervisors.
>

Ben, is there any reason we limit the scale to 1,000 hypervisors? OVN seems
to have much more potential than that based on current tests, even with ovsdb.
As for ports, considering container use cases, there can be more than
20,000 even for 1,000 hypervisors.


> - OVSDB.  If we choose to use OVSDB, we'll have to add
>   high-availability support.  Also, the table doesn't mention
>   scaling, since it's hard to compare objectively, but the OVSDB
>   server currently doesn't scale well to the 1000 clients required
>   for the southbound database, although Andy has started working
>   on that.
>

That's not true :). With probing disabled, the OVSDB server is not yet the
bottleneck in our scale testing with 1000 clients connected to the southbound DB.


--
Best regards,
Han


Re: [ovs-dev] RFC: OVN database options

2016-03-11 Thread Ivan Kelly
> Zookeeper transactions can be isolated depending on what level of
> isolation you need.
> A setData on a node operation can contain a version, so that it fails
> if that node has changed since the version. This means with a multi[1]
> of setData operations, you can effectively get a snapshot isolation
> level of isolation. For serializable, you could probably shoehorn it
> in by rewriting all nodes that you've written.
Thinking about this more, and refreshing my cache on isolation levels,
I realized that zookeeper doesn't in fact offer SI, since reads are
done one at a time, and the state in the database may change between
reads to the same node. So zookeeper offers "Read committed" rather
than SI.
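To make the distinction concrete, here is a small in-memory sketch. It is not the ZooKeeper API: `VersionedStore`, its methods, and the use of -1 as the version of a missing node are all hypothetical stand-ins. It shows that version-checked multi writes are atomic and reject stale updates, while two separate reads can still straddle a commit.

```python
class VersionedStore:
    """In-memory stand-in for a store with per-node versions (hypothetical)."""

    def __init__(self):
        self._nodes = {}  # path -> (data, version)

    def get(self, path):
        """Read a single node; a missing node reads as (None, -1)."""
        return self._nodes.get(path, (None, -1))

    def multi(self, ops):
        """Apply every (path, data, expected_version) write, or none of them."""
        for path, _data, expected in ops:
            if self.get(path)[1] != expected:
                return False  # some node changed since it was read
        for path, data, expected in ops:
            self._nodes[path] = (data, expected + 1)
        return True


store = VersionedStore()
assert store.multi([("/a", 1, -1), ("/b", 1, -1)])  # create both nodes

# A reader fetches /a, a writer then commits, and the reader fetches /b.
a, a_ver = store.get("/a")
assert store.multi([("/a", 2, 0), ("/b", 2, 0)])    # concurrent writer
b, b_ver = store.get("/b")
mixed_view = (a, b)  # mixes two different committed states

# A write based on the stale read of /a fails the version check.
stale_write_ok = store.multi([("/a", a + 10, a_ver)])
```

The reader ends up with a pair of values that never existed together in the store, which is what separates read committed from snapshot isolation; the version check only protects the write path.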

-Ivan


Re: [ovs-dev] RFC: OVN database options

2016-03-11 Thread Julien Danjou
On Thu, Mar 10 2016, Russell Bryant wrote:

> Specific to the OVN+OpenStack use case, I imagine a frequent question would
> be, "why do I have to use MariaDB+Galera AND PostgreSQL in the same
> environment?!"  I suppose OpenStack works with PostgreSQL, too, and it's
> just a deployment choice that most people seem to be using MariaDB+Galera.

OpenStack (supposedly) works with PostgreSQL. Galera is picked for most
deployments because it's an HA layer on top of MariaDB that is much
easier to deploy than a PostgreSQL solution is.

Other than that, in terms of RDBMS/SQL features, PostgreSQL is generally
ahead of MySQL.

FTR, Gnocchi¹ is one of the OpenStack projects pushing PostgreSQL as
recommended over MySQL, so it can leverage features such as precise
timestamps or time range computing which MySQL was/is unable to do (I
know, it sounds ridiculous). We actually wish we could use even more
PostgreSQL features and drop MySQL, but that'd be a bit harsh.

A great thing for OpenStack would be to capitalize on the work that OVN
could do if it picked the path of PostgreSQL in HA mode.

And sorry for getting a little bit off-topic here, :-)

Cheers,

¹  http://gnocchi.xyz/configuration.html

-- 
Julien Danjou
-- Free Software hacker
-- https://julien.danjou.info


Re: [ovs-dev] RFC: OVN database options

2016-03-11 Thread Ben Pfaff
On Fri, Mar 11, 2016 at 09:58:18AM +0100, Ivan Kelly wrote:
> > Zookeeper transactions can be isolated depending on what level of
> > isolation you need.
> > A setData on a node operation can contain a version, so that it fails
> > if that node has changed since the version. This means with a multi[1]
> > of setData operations, you can effectively get a snapshot isolation
> > level of isolation. For serializable, you could probably shoehorn it
> > in by rewriting all nodes that you've written.
> Thinking about this more, and refreshing my cache on isolation levels,
> I realized that zookeeper doesn't in fact offer SI, since reads are
> done one at a time, and the state in the database may change between
> reads to the same node. So zookeeper offers "Read committed" rather
> than SI.

Just to make sure, does this mean that a Zookeeper client cannot read a
consistent snapshot of the entire database?


Re: [ovs-dev] RFC: OVN database options

2016-03-11 Thread Ivan Kelly
> Just to make sure, does this mean that a Zookeeper client cannot read a
> consistent snapshot of the entire database?
Yes, exactly. It can only read one node at a time, so writes can occur
between the reading of two nodes.

-Ivan


Re: [ovs-dev] RFC: OVN database options

2016-03-11 Thread Ben Pfaff
On Fri, Mar 11, 2016 at 05:10:15PM +0100, Ivan Kelly wrote:
> > Just to make sure, does this mean that a Zookeeper client cannot read a
> > consistent snapshot of the entire database?
> Yes, exactly. It can only read one node at a time, so writes can occur
> between the reading of two nodes.

OK.  That's a major downside for this use case, because the OVN clients
are accustomed to viewing a consistent snapshot of the database.


Re: [ovs-dev] RFC: OVN database options

2016-03-11 Thread Ivan Kelly
On Fri, Mar 11, 2016 at 5:20 PM, Ben Pfaff  wrote:
> On Fri, Mar 11, 2016 at 05:10:15PM +0100, Ivan Kelly wrote:
>> > Just to make sure, does this mean that a Zookeeper client cannot read a
>> > consistent snapshot of the entire database?
>> Yes, exactly. It can only read one node at a time, so writes can occur
>> between the reading of two nodes.
>
> OK.  That's a major downside for this use case, because the OVN clients
> are accustomed to viewing a consistent snapshot of the database.
Well, if you do the log tailing thing I suggested, then the client
will have access to a consistent snapshot, since they would only read
from the database directly once, and all client updates after that
would come from the log which arrive in a well defined order.
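A minimal sketch of that idea, under stated assumptions: the class name `LogTailingReplica`, the `(index, changes)` shape of a log entry, and the use of `None` to mark a deleted key are all hypothetical and not taken from ZooKeeper or OVSDB.

```python
class LogTailingReplica:
    """Client-side copy built from one snapshot plus an ordered log."""

    def __init__(self, snapshot, snapshot_index):
        self.data = dict(snapshot)     # the single direct read of the database
        self.applied = snapshot_index  # index of the last transaction applied

    def apply(self, index, changes):
        """Apply one committed transaction; changes maps key -> value or None."""
        if index <= self.applied:
            return  # already reflected in the snapshot or a previous apply
        if index != self.applied + 1:
            raise ValueError("gap in log: expected %d, got %d"
                             % (self.applied + 1, index))
        # The whole transaction is applied in one step, so the replica only
        # ever moves from one committed state to the next.
        for key, value in changes.items():
            if value is None:
                self.data.pop(key, None)
            else:
                self.data[key] = value
        self.applied = index


replica = LogTailingReplica({"port1": "up"}, snapshot_index=3)
log = [
    (3, {"port1": "up"}),                   # already in the snapshot
    (4, {"port1": "down", "port2": "up"}),
    (5, {"port2": None}),                   # delete port2
]
for index, changes in log:
    replica.apply(index, changes)
```

Because entries are applied strictly in log order and a gap is treated as an error, the replica never exposes a state that mixes two transactions.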

-Ivan


Re: [ovs-dev] RFC: OVN database options

2016-03-11 Thread Ben Pfaff
On Fri, Mar 11, 2016 at 05:26:06PM +0100, Ivan Kelly wrote:
> On Fri, Mar 11, 2016 at 5:20 PM, Ben Pfaff  wrote:
> > On Fri, Mar 11, 2016 at 05:10:15PM +0100, Ivan Kelly wrote:
> >> > Just to make sure, does this mean that a Zookeeper client cannot read a
> >> > consistent snapshot of the entire database?
> >> Yes, exactly. It can only read one node at a time, so writes can occur
> >> between the reading of two nodes.
> >
> > OK.  That's a major downside for this use case, because the OVN clients
> > are accustomed to viewing a consistent snapshot of the database.
> Well, if you do the log tailing thing I suggested, then the client
> will have access to a consistent snapshot, since they would only read
> from the database directly once, and all client updates after that
> would come from the log which arrive in a well defined order.

OK.

I'm concerned about the log tailing solution, because it seems likely to
me that each hypervisor would have to examine every transaction, not
just those related to the logical switches that they're interested in.
This could become a scale issue.


Re: [ovs-dev] RFC: OVN database options

2016-03-11 Thread Ivan Kelly
>> Well, if you do the log tailing thing I suggested, then the client
>> will have access to a consistent snapshot, since they would only read
>> from the database directly once, and all client updates after that
>> would come from the log which arrive in a well defined order.
>
> OK.
>
> I'm concerned about the log tailing solution, because it seems likely to
> me that each hypervisor would have to examine every transaction, not
> just those related to the logical switches that they're interested in.
> This could become a scale issue.
As I have it in my head, the hypervisors wouldn't access the log
directly, but there'd be a facade process which handles it. This facade
could easily do filtering to avoid having the whole log go to all
clients. TBH, I don't fully understand the semantics of ovsdb yet, and
its interaction with OVN, so I'm not 100% sure that this approach would
work. It's something I plan to study next week though.
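Roughly, the facade could look like the sketch below. Everything here is an assumption made for illustration: the `LogFacade` class, the per-client set of logical switch names, and the `(switch, key, value)` shape of a change are hypothetical, not an existing OVN or ZooKeeper interface.

```python
from collections import defaultdict


class LogFacade:
    """Tails the full log once and forwards each client only what it asked for."""

    def __init__(self):
        self._subs = {}                  # client -> set of logical switch names
        self.outbox = defaultdict(list)  # client -> [(index, [changes])]

    def subscribe(self, client, switches):
        self._subs[client] = set(switches)

    def on_log_entry(self, index, changes):
        """changes is a list of (switch, key, value) tuples from one transaction."""
        for client, wanted in self._subs.items():
            relevant = [c for c in changes if c[0] in wanted]
            if relevant:
                # Keep the log index so clients still see a well-defined order.
                self.outbox[client].append((index, relevant))


facade = LogFacade()
facade.subscribe("hv1", ["ls1"])
facade.subscribe("hv2", ["ls2"])

facade.on_log_entry(1, [("ls1", "p1", "up")])
facade.on_log_entry(2, [("ls2", "p2", "up"), ("ls1", "p3", "up")])
facade.on_log_entry(3, [("ls3", "p4", "up")])  # nobody subscribed to ls3
```

Only the facade examines every transaction; each hypervisor receives just the slices that touch its own logical switches, still tagged with the log index.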

-Ivan


Re: [ovs-dev] RFC: OVN database options

2016-03-11 Thread Mike Bayer



On 03/10/2016 06:50 PM, Ben Pfaff wrote:
>
> I've been a fan of Postgres since I used it in the 1990s for a web-based
> application.  It didn't occur to me that it was appropriate here.
> Julien, thanks so much for joining the discussion.
>
>>> So yes, it has everything OVN needs. It can push notifications to
>>> clients via the NOTIFY¹ command (that you can use in any
>>> procedure/trigger). For example, you could imagine creating a trigger
>>> that sends a JSON payload for each new update/insert in the database.
>>> That's literally 10 lines of PL/SQL.
>
> That's good to know.  I hadn't figured out how to do this kind of thing
> with SQL-based systems.
>
>>> ¹  http://www.postgresql.org/docs/9.5/static/sql-notify.html
>>>
>>> I think that PostgreSQL would be the safer bet in this move, as:
>>> - building something on top of etcd would seem weak w.r.t your
>>> schema/table requirements
>>> - investing in OVSDB (though keep in mind I don't know it :-) would
>>> probably end up in redoing a job PostgreSQL people already have done
>>> better than you would ;-)
>>>
>>> The only questions that this raises to me are:
>>> - whether PostgreSQL is too large/complex to deploy for OVN. Seeing the
>>>   list of candidates that were evaluated, I wouldn't think so, but there
>>>   can be a lot of different opinions on that based on different
>>>   perception of PostgreSQL. And since you're targeting a network DB, you
>>>   definitely need a daemon configured and set-up so I'm only partially
>>>   worried here. :)


Hi there, Russell Bryant invited me to this list to chime in on this
discussion.  If it were me, I *might* not build out based on NOTIFY as
the core system of notifying clients.  I'd likely stick with a tool
that's designed for cluster communication, and in this case the custom
service that's already there seems like it might be the best bet; I'd
actually build out the service and use RAFT to keep it in sync with
itself.

The reason is that PostgreSQL does not supply an easy out-of-the-box HA
component in any case (Galera does, but then you don't get NOTIFY), so
you're going to have to build out something like RAFT on the PG side in
any case in order to handle failover.  PostgreSQL's HA story is not very
good right now; it's very much roll-your-own, and it is nowhere near the
sophistication of Galera's multi-master approach, which would be an
enormous multi-year undertaking to recreate on PostgreSQL.  IMO building
out the HA part from scratch is the difficult part; being able to send
events to clients is pretty easy from any kind of custom service.  Since
to do HA in PG you'd have to build your own event-dispatch system anyway
(e.g. to determine that a node is down and send out the call to pick a
new master node, as well as some method to get all the clients to send
data updates to this node), you might as well just build your custom
service to do just the thing you need.


Re: [ovs-dev] RFC: OVN database options

2016-03-11 Thread Ben Pfaff
On Fri, Mar 11, 2016 at 12:13:25PM -0500, Mike Bayer wrote:
> On 03/10/2016 06:50 PM, Ben Pfaff wrote:
> >
> >I've been a fan of Postgres since I used it in the 1990s for a web-based
> >application.  It didn't occur to me that it was appropriate here.
> >Julien, thanks so much for joining the discussion.
> >
> >>>So yes, it has everything OVN needs. It can push notifications to
> >>>clients via the NOTIFY¹ command (that you can use in any
> >>>procedure/trigger). For example, you could imagine creating a trigger
> >>>that sends a JSON payload for each new update/insert in the database.
> >>>That's literally 10 lines of PL/SQL.
> >
> >That's good to know.  I hadn't figured out how to do this kind of thing
> >with SQL-based systems.
> >
> >>>¹  http://www.postgresql.org/docs/9.5/static/sql-notify.html
> >>>
> >>>I think that PostgreSQL would be the safer bet in this move, as:
> >>>- building something on top of etcd would seem weak w.r.t your
> >>>schema/table requirements
> >>>- investing in OVSDB (though keep in mind I don't know it :-) would
> >>>probably end up in redoing a job PostgreSQL people already have done
> >>>better than you would ;-)
> >>>
> >>>The only questions that this raises to me are:
> >>>- whether PostgreSQL is too large/complex to deploy for OVN. Seeing the
> >>>   list of candidates that were evaluated, I wouldn't think so, but there
> >>>   can be a lot of different opinions on that based on different
> >>>   perception of PostgreSQL. And since you're targeting a network DB, you
> >>>   definitely need a daemon configured and set-up so I'm only partially
> >>>   worried here. :)
> 
> Hi there, Russell Bryant invited me to this list to chime in on this
> discussion.   If it were me, I *might* not build out based on NOTIFY as the
> core system of notifying clients, and I'd likely stick with a tool
> that's designed for cluster communication and in this case the custom
> service that's already there seems like it might be the best bet; I'd
> actually build out the service and use RAFT to keep it in sync with itself.
> 
> The reason is because Postgresql is not supplying you with an easy
> out-of-the-box HA component in any case (Galera does, but then you don't get
> NOTIFY), so you're going to have to build out something like RAFT or such on
> the PG side in any case in order to handle failover. Postgresql's HA story
> is not very good right now, it's very much roll-your-own, and it is nowhere
> near the sophistication of Galera's multi-master approach which would be an
> enormous multi-year undertaking to recreate on PostgreSQL. IMO building
> out the HA part from scratch is the difficult part; being able to send
> events to clients is pretty easy from any kind of custom service.   Since to
> do HA in PG you'd have to build your own event-dispatch system anyway (e.g.
> to determine a node is down and send out the call to pick a new master node
> as well as some method to get all the clients to send data updates to this
> node), might as well just build your custom service to do just the thing you
> need.

Thanks a lot for the comments!  I've added this to my notes.


Re: [ovs-dev] RFC: OVN database options

2016-03-11 Thread Ben Pfaff
On Fri, Mar 11, 2016 at 05:49:07PM +0100, Ivan Kelly wrote:
> >> Well, if you do the log tailing thing I suggested, then the client
> >> will have access to a consistent snapshot, since they would only read
> >> from the database directly once, and all client updates after that
> >> would come from the log which arrive in a well defined order.
> >
> > OK.
> >
> > I'm concerned about the log tailing solution, because it seems likely to
> > me that each hypervisor would have to examine every transaction, not
> > just those related to the logical switches that they're interested in.
> > This could become a scale issue.
> As I have it in my head, the hypervisors wouldn't access the log
> directly, but there'd be a facade process which handles it. This facade
> could easily do filtering to avoid having the whole log go to all
> clients. TBH, I don't fully understand the semantics of ovsdb yet, and
> its interaction with OVN, so I'm not 100% sure that this approach would
> work. It's something I plan to study next week though.

Thanks.  I've added that to my notes.