Hi Jozef/Tomas/Luis,

I was investigating Bug 7736
<https://bugs.opendaylight.org/show_bug.cgi?id=7736> and came across a few
issues in our clustering implementation, as well as some limitations of the
singleton clustering service.

Issue 1: Applications registering on data change notifications
In the current implementation, when the plugin receives a connection from a
device, it registers itself as a service instance with the clustering
singleton service. After registering, it receives the notification to
initialize the instance; it then tries to set the master role on the device
and writes the device data to the data store.
Forwarding-Rule-Manager (FRM) then listens for data store notifications, and
whenever it sees that a node has been added to the data store, it registers
itself as a service instance for that node. Because we are using
ClusteredDataTreeChangeListener, all the FRM instances get the node-added
notification from the data store, and all the cluster nodes end up
registering themselves as service instances on the same service identifier.
So even if the device is connected to only one controller, FRM registers
itself on all three nodes, which is not correct behavior.

This bug can make the openflowplugin cluster almost unusable. We have seen
the following: connect the device to two controllers, then disconnect the
device from the first controller and connect it back; ownership goes to the
second controller, where the device is also connected. Then disconnect the
device from the second controller and reconnect it; ownership goes to the
third controller. Because ownership of that service identifier is now with
controller 3, even when the device connects back to controller 1 or 2, those
controllers don't push the master role down. This scenario can be triggered
the moment the device disconnects from any of the controllers.
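
To make the drift easier to follow, here is a toy Java model of the scenario above. It is a sketch under stated assumptions, not the MD-SAL singleton API: the first controller in a candidate list owns the service group, a controller leaves the list when the device disconnects from it and rejoins at the back when the device reconnects, and the controller names c1/c2/c3 are hypothetical. With FRM registering on every node, c3 becomes a candidate even though the device never connects to it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Toy model of the ownership drift in Issue 1 (hypothetical, not the MD-SAL
 * singleton API): the first entry in the candidate list owns the service group,
 * a controller leaves the list when the device disconnects from it, and it
 * rejoins at the back when the device reconnects.
 */
public class OwnershipDriftSketch {

    /** Controllers the device is actually connected to. */
    static final Set<String> CONNECTED = Set.of("c1", "c2");

    /**
     * Replays the scenario: the device flaps on c1, then on c2.
     * With registerOnAllNodes, c3 is also a candidate because its FRM saw the
     * node through a clustered data-tree listener. Returns the final owner.
     */
    static String finalOwner(boolean registerOnAllNodes) {
        List<String> candidates = new ArrayList<>(List.of("c1", "c2"));
        if (registerOnAllNodes) {
            candidates.add("c3");
        }
        for (String flapping : List.of("c1", "c2")) {
            candidates.remove(flapping); // disconnect: registration closed
            candidates.add(flapping);    // reconnect: registers again, joins at the back
        }
        return candidates.get(0);
    }

    /** Only an owner with a live device connection can push the master role. */
    static boolean canPushMasterRole(String owner) {
        return CONNECTED.contains(owner);
    }

    public static void main(String[] args) {
        for (boolean allNodes : new boolean[] {true, false}) {
            String owner = finalOwner(allNodes);
            System.out.println("registerOnAllNodes=" + allNodes + " owner=" + owner
                    + " canPushMasterRole=" + canPushMasterRole(owner));
        }
    }
}
```

Under these assumptions, the run with FRM registered on all nodes ends with c3 as owner and no connected owner to push the master role, while the run with only connected controllers ends on c1.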

The problem is that applications have no way to find out whether the device
is connected to their host controller instance (unless we write some
hard-coded controller number/name into the data store for each device,
recording where it's connected). The only way I can see is through YANG
notifications: the plugin can send nodeAdded/nodeRemoved notifications, and
applications can register themselves as service instances when they receive
those events. That way we avoid the problem I mentioned above. I pushed a
patch that does exactly this, and it resolves the issue.

https://git.opendaylight.org/gerrit/#/c/51489/
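
A rough Java sketch of the idea (not the code in the Gerrit change above): each application instance registers for a device only when the plugin on its own controller reports nodeAdded, and unregisters on nodeRemoved. Registry, App, onNodeAdded and onNodeRemoved are hypothetical names standing in for the singleton registration and the notification handlers, and openflow:1 is just an example node ID.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch of notification-driven registration. Registry, App,
 * onNodeAdded and onNodeRemoved are illustrative names, not the
 * openflowplugin or MD-SAL API.
 */
public class LocalNotificationSketch {

    static final String DEVICE = "openflow:1";

    /** Stand-in for the singleton service: candidate controllers per device. */
    static final class Registry {
        private final Map<String, Set<String>> candidates = new HashMap<>();

        void register(String deviceId, String controller) {
            candidates.computeIfAbsent(deviceId, k -> new TreeSet<>()).add(controller);
        }

        void unregister(String deviceId, String controller) {
            Set<String> set = candidates.get(deviceId);
            if (set != null) {
                set.remove(controller);
            }
        }

        Set<String> candidatesFor(String deviceId) {
            return candidates.getOrDefault(deviceId, new TreeSet<>());
        }
    }

    /** One application instance (e.g. FRM) per controller. */
    static final class App {
        private final String controller;
        private final Registry registry;

        App(String controller, Registry registry) {
            this.controller = controller;
            this.registry = registry;
        }

        void onNodeAdded(String deviceId) {
            registry.register(deviceId, controller);
        }

        void onNodeRemoved(String deviceId) {
            registry.unregister(deviceId, controller);
        }
    }

    /**
     * localOnly=false: a clustered data-tree change reaches every controller.
     * localOnly=true: only the plugin on a connected controller emits nodeAdded.
     */
    static Set<String> candidates(List<String> controllers, Set<String> connectedTo,
                                  boolean localOnly) {
        Registry registry = new Registry();
        for (String c : controllers) {
            if (!localOnly || connectedTo.contains(c)) {
                new App(c, registry).onNodeAdded(DEVICE);
            }
        }
        return registry.candidatesFor(DEVICE);
    }

    public static void main(String[] args) {
        List<String> cluster = List.of("c1", "c2", "c3");
        Set<String> connected = Set.of("c1", "c2");
        System.out.println("clustered data-tree: " + candidates(cluster, connected, false));
        System.out.println("local notification:  " + candidates(cluster, connected, true));
    }
}
```

The design point is simply where the trigger comes from: a clustered listener fires on every member, whereas a node-local event only fires where the device is actually connected.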

Issue 2: Data change notifications every time a device disconnects from any
node in the cluster

In the current implementation, even if the device is connected to all three
controllers, the moment it disconnects from one of them, applications
receive a data change notification where the node data is removed, and
shortly afterwards another notification where the node data is added back.
The application thinks the device just disconnected from the controllers
and reconnected, but in reality the device is still connected to the
remaining two controllers. I think the reason is that the current
implementation of the singleton service doesn't send any notification to
non-owner controllers about the ownership of the device (e.g. isOwner=false,
hasOwner=false, wasOwner=false). Because of this limitation, I think, we
wrote the code so that whenever closeServiceInstance() is called, the plugin
removes the data from the data store, and when another controller gets
instantiateServiceInstance(), it puts the data back into the data store.
That generates two events for the application. Given that the device is
connected to all the controllers, this behavior is not correct. I can't
think of any solution that fixes this unless the singleton clustering
service provides a specific notification about it to the other controllers,
so that they can decide whether to clean up the data or ignore it because
one of them is still an owner of the device.
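
The double event, and what ownership information for non-owners could buy, can be illustrated with a toy Java model. This is a hypothetical sketch: the in-memory Store stands in for the operational data store and only records presence changes, OwnershipState simply mirrors the isOwner/hasOwner/wasOwner flags mentioned above, and shouldRemoveNodeData is an assumed cleanup rule rather than anything the singleton service provides today.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model of Issue 2, not the openflowplugin code. The toy store
 * records an event only when a node appears or disappears.
 */
public class HandoverEventsSketch {

    static final String NODE = "openflow:1";

    /** Mirrors the flags named in the mail; per the mail, non-owners never receive these today. */
    static final class OwnershipState {
        final boolean wasOwner;
        final boolean isOwner;
        final boolean hasOwner;

        OwnershipState(boolean wasOwner, boolean isOwner, boolean hasOwner) {
            this.wasOwner = wasOwner;
            this.isOwner = isOwner;
            this.hasOwner = hasOwner;
        }
    }

    static final class Store {
        final Set<String> nodes = new HashSet<>();
        final List<String> events = new ArrayList<>();

        void put(String node) {
            if (nodes.add(node)) {
                events.add("ADDED " + node);
            }
        }

        void delete(String node) {
            if (nodes.remove(node)) {
                events.add("REMOVED " + node);
            }
        }
    }

    /** Today: old owner deletes in closeServiceInstance(), new owner writes in instantiateServiceInstance(). */
    static List<String> handoverToday() {
        Store store = new Store();
        store.put(NODE);
        store.events.clear(); // only look at the handover
        store.delete(NODE);   // old owner: closeServiceInstance()
        store.put(NODE);      // new owner: instantiateServiceInstance()
        return store.events;
    }

    /** Hypothetical rule: remove node data only if no controller owns the device any more. */
    static boolean shouldRemoveNodeData(OwnershipState state) {
        return state.wasOwner && !state.isOwner && !state.hasOwner;
    }

    /** Same handover, assuming the old owner is told whether another owner exists. */
    static List<String> handoverWithOwnershipInfo(boolean anotherOwnerExists) {
        Store store = new Store();
        store.put(NODE);
        store.events.clear();
        if (shouldRemoveNodeData(new OwnershipState(true, false, anotherOwnerExists))) {
            store.delete(NODE);
        }
        if (anotherOwnerExists) {
            store.put(NODE); // already present, so no event in this toy store
        }
        return store.events;
    }

    public static void main(String[] args) {
        System.out.println("today:                " + handoverToday());
        System.out.println("with info, new owner: " + handoverWithOwnershipInfo(true));
        System.out.println("with info, no owner:  " + handoverWithOwnershipInfo(false));
    }
}
```

In this model, today's handover yields a REMOVED followed by an ADDED event, while the ownership-aware variant yields no event when another controller still owns the device and a single REMOVED when none does.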

This same behavior can cause another issue. If the device is connected to
only one controller in the cluster and the user kills that controller, stale
data is left in the data store, because the other controllers are not
notified, given that they didn't register as service instances for that
service group ID. I think this is a major limitation, and I'm not sure the
plugin can resolve it by itself (unless we use an EOS (Entity Ownership
Service) + Singleton Clustering Service hack to make it work).
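
A minimal Java model of the stale-data case, assuming (as described above) that only controllers registered in a service group receive callbacks for it. ServiceGroup and memberDown are hypothetical names, and this is not how the MD-SAL singleton service is implemented internally.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model of the stale-data case, not the MD-SAL singleton
 * implementation: only controllers registered in a service group ever get
 * callbacks for it.
 */
public class StaleDataSketch {

    static final String NODE = "openflow:1";

    static final class ServiceGroup {
        final List<String> registered = new ArrayList<>();
        final Set<String> store;
        String owner;

        ServiceGroup(Set<String> store) {
            this.store = store;
        }

        void register(String controller) {
            registered.add(controller);
            if (owner == null) {
                owner = controller;
                store.add(NODE); // instantiateServiceInstance(): owner writes node data
            }
        }

        /** A cluster member dies; unregistered controllers are never told. */
        void memberDown(String controller) {
            registered.remove(controller);
            if (controller.equals(owner)) {
                // The dead owner cannot run closeServiceInstance(), so nobody deletes NODE.
                // If another controller had registered, ownership would pass to it.
                owner = registered.isEmpty() ? null : registered.get(0);
            }
        }
    }

    public static void main(String[] args) {
        Set<String> store = new HashSet<>();
        ServiceGroup group = new ServiceGroup(store);
        group.register("c1");   // device connected to c1 only
        group.memberDown("c1"); // user kills c1
        System.out.println("owner=" + group.owner + " store=" + store); // node data is left behind
    }
}
```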

Let me know your thoughts.

Side question: does anybody know whether any enhancement has been proposed
in the MD-SAL project that could help solve this issue?

-- 
Thanks
Anil
_______________________________________________
openflowplugin-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev
