See attached files for raw notes.

--Colin
--Replacing-- Adding an option for MD-SAL Eventing with an off-the-shelf 
message bus

The MD-SAL provides many kinds of events, including RPCs, YANG notifications, 
and data change notifications. It does so using a custom eventing system whose 
delivery semantics vary from one event kind to the next (both within and 
outside clustering) and which doesn't lend itself to easy tapping and 
debugging. It would be exciting if we could replace it all with an 
off-the-shelf (or at least pluggable) message bus. 

MD-SAL has (at least) 7 kinds of events:
* YANG Notifications
        * delivered only locally, on the same node that raised them
        * best-effort delivery
        * code triggered
* Data (Tree) Change Notifications
        * delivered to the shard leader for the data that was changed
        * when data is changed in the data store
        * only triggered by the data store
        * boundaries of writes aren't necessarily preserved
        * on reboot, you get one big notification for all the data that was 
there before
* Clustered Data Change Notifications
        * same as data (tree) change notification, but go to all nodes in the 
cluster
        * need another mechanism to suppress it on some nodes
                * singleton service does this for you
        * unclear if delivery is zero-or-more, at-most-once, at-least-once, or 
something else
* Global RPCs (2 events)
        * delivered locally on the same node where the call was made
* Mounted RPCs ???
        * routed to the node with a NETCONF connection and forwarded
* Mounted YANG Notifications
        * can't get them via RESTCONF, but otherwise like YANG notifications
* Routed RPCs
        * delivered to the (last or first, but effectively random) node that 
registered to handle it
        * if you're careful about who registers, you can govern where it goes
                * singleton service does this for you
        * otherwise, :-(
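The scattered semantics above can be collected in one place. A sketch of such a summary as a Java enum; the scope/guarantee strings are my reading of these notes, not an authoritative MD-SAL reference:

```java
// Sketch: one-place summary of the delivery semantics described above.
// Values reflect the notes in this thread, not authoritative MD-SAL docs.
enum EventKind {
    YANG_NOTIFICATION        ("local node only",         "best-effort"),
    DATA_TREE_CHANGE         ("shard leader",            "guaranteed, ordered"),
    CLUSTERED_DATA_CHANGE    ("all nodes",               "unclear (0+? at-least-once?)"),
    GLOBAL_RPC               ("caller's node",           "request/response"),
    MOUNTED_RPC              ("node with NETCONF conn",  "forwarded"),
    MOUNTED_YANG_NOTIFICATION("local node only",         "best-effort, no RESTCONF"),
    ROUTED_RPC               ("last/first registrant",   "effectively random target");

    final String deliveredTo;
    final String guarantee;

    EventKind(String deliveredTo, String guarantee) {
        this.deliveredTo = deliveredTo;
        this.guarantee = guarantee;
    }
}
```

Having the semantics reified like this (rather than implicit in seven code paths) is part of what a real message bus would buy us.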


Different delivery:
* we'd really like to have shard-leader delivery for improved performance
        * RPCs/requests end up where the data is



If we agree this is a problem:
* We either need to clean up our mess
* or we could do that + rely on an off-the-shelf message bus
        * tracing, tapping, parsing, plugging into from outside are all 
well-defined
        * we would get well-defined delivery semantics (both who and how many)
        * ordering between events
* OSGi event system exists and can bridge to anything
        * why don't we use this? at least first?
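For concreteness, OSGi's Event Admin delivers events on hierarchical topics, and handlers can subscribe to a topic or a wildcard prefix, which is exactly what makes tapping and tracing easy. A minimal in-process sketch of that model in plain Java (illustration only, not the real org.osgi.service.event API):

```java
import java.util.*;
import java.util.function.BiConsumer;

// Minimal sketch of the OSGi Event Admin model: events are published on
// hierarchical topics ("md-sal/data/changed") and handlers subscribe to an
// exact topic or a wildcard prefix ("md-sal/*"). Plain Java, not the real
// org.osgi.service.event API.
class MiniEventAdmin {
    private final Map<String, List<BiConsumer<String, Map<String, Object>>>> handlers
            = new HashMap<>();

    // topicFilter is either an exact topic or a prefix ending in "/*"
    void subscribe(String topicFilter, BiConsumer<String, Map<String, Object>> handler) {
        handlers.computeIfAbsent(topicFilter, k -> new ArrayList<>()).add(handler);
    }

    // Synchronous delivery to every matching subscriber
    void sendEvent(String topic, Map<String, Object> properties) {
        handlers.forEach((filter, list) -> {
            boolean match = filter.endsWith("/*")
                    ? topic.startsWith(filter.substring(0, filter.length() - 1))
                    : topic.equals(filter);
            if (match) {
                list.forEach(h -> h.accept(topic, properties));
            }
        });
    }
}
```

A debugging tap is then just one more subscriber on `"md-sal/*"` that logs everything it sees.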
        


* brokered vs. brokerless?
        * brokered tends to give delivery guarantees, but has external 
requirements

Potential issues:
* ordering: only Data Change Notifications are ordered
* delivery semantics: only Data Change Notifications are guaranteed
* performance: latency vs. throughput
        * could you make Java function call vs. message a runtime option 
rather than a compile-time one?
        * currently we have apps that will make use of O(10^6) "messages"/sec
        * real users (AT&T) using O(10^3) in their deployment
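One way to make "function call vs. message" a runtime choice rather than a compile-time one is to hide both behind a single publish interface and pick the implementation per deployment. A hypothetical sketch (the interface and class names are mine, not MD-SAL's):

```java
import java.util.function.Consumer;

// Hypothetical sketch: publishers code against one interface, and
// deployment configuration decides whether an event is a plain Java call
// (same address space, the O(10^6)/sec case) or a message on an external bus.
interface EventSink {
    void publish(String topic, Object payload);
}

// Fast path: direct function call, no serialization, no broker.
class LocalSink implements EventSink {
    private final Consumer<Object> handler;

    LocalSink(Consumer<Object> handler) {
        this.handler = handler;
    }

    public void publish(String topic, Object payload) {
        handler.accept(payload);
    }
}

// Slow path stub: a real implementation would serialize the payload and
// hand it to a broker client; elided here.
class BusSink implements EventSink {
    public void publish(String topic, Object payload) {
        // serialize(topic, payload) and send to the bus -- elided
    }
}
```

The apps doing O(10^6) "messages"/sec would be wired to `LocalSink`; everyone else could take the bus and its tooling.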
        
Using etcd3 as a pluggable option for an MD-SAL Data Store


It seems as though etcd3 might provide an off-the-shelf replacement for our own 
data store that offers tree-based data items with good performance and the 
ability to watch for data changes on arbitrary subtrees. It's at least worth 
investigating.

    Two past mailing list threads:
        https://lists.opendaylight.org/pipermail/dev/2016-July/002336.html
        https://lists.opendaylight.org/pipermail/dev/2016-August/002566.html


Core things we need from our data store:
* tree-based data store (ORM is a possibility, but has performance implications)
* we allow for flexible notifications about data changes
        * specific node in the tree
        * specific node and its children
        * specific node and its subtree
        * this is the requirement that seems to hang us up
                * do we really need this?
* O(10^6) tx/sec
        * same address space
        * not clustered
        * lower in a cluster
* stable, well-tested, clustered performance

* we have O(10^6) lines of code that rely on these assumptions

Key idea:
* etcd has a flat key-value store
* keys can have application structure
* allow for notifications that use that structure
        * e.g., listen to all keys that start with foo/bar/...
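The key idea can be sketched with a sorted map: store tree paths as flat keys, and a "watch on subtree foo/bar/" becomes a range scan over the half-open interval [prefix, successor-of-prefix), which (as I understand it) is also how etcd3 implements prefix ranges internally. A self-contained illustration, not the etcd/jetcd API:

```java
import java.util.*;

// Sketch: tree paths stored as flat sorted keys; a query or watch on
// subtree "foo/bar/" is a range scan over [prefix, successor(prefix)).
// Self-contained illustration of the etcd3 key model, not the etcd API.
class FlatTreeStore {
    private final NavigableMap<String, String> kv = new TreeMap<>();

    void put(String path, String value) {
        kv.put(path, value);
    }

    // All entries whose key starts with the given prefix: the half-open
    // range from the prefix to the prefix with its last char incremented
    // (e.g. "foo/bar/" .. "foo/bar0").
    SortedMap<String, String> subtree(String prefix) {
        String end = prefix.substring(0, prefix.length() - 1)
                + (char) (prefix.charAt(prefix.length() - 1) + 1);
        return kv.subMap(prefix, end);
    }
}
```

The same range trick would back change notifications: a watcher registered on a prefix fires for any write whose key falls in that range, which covers the "specific node and its subtree" requirement above.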
        
Advantages:
* off-the-shelf data stores come with tools and a bigger constellation of 
expertise

Big challenges:
* etcd targets O(10^3) tx/s sustained, 10^4 burst
        * persisted to disk
        * remote calls
        * it would be interesting to do an apples-to-apples comparison
        

YANG 1.1 apparently couples RPCs and data items
* RPCs include pointers into the data store
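For reference, the coupling comes from YANG 1.1's action statement (RFC 7950), which ties an RPC-like operation to a specific data node instance, so the invocation carries a pointer into the data tree. A small illustrative module (names are made up):

```yang
module example-device {
  yang-version 1.1;
  namespace "urn:example:device";
  prefix dev;

  list interface {
    key "name";
    leaf name { type string; }

    // "action" (new in YANG 1.1) is invoked on one specific
    // interface entry, so the RPC carries a pointer into the tree.
    action reset {
      input {
        leaf delay { type uint32; }
      }
    }
  }
}
```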


Next steps:
* prototype etcd3 as a shard provider for some of the MD-SAL
_______________________________________________
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev
