Hi Openflowjava/Openflowplugin developers,

I have run into a concurrency problem in the ODL openflowplugin.
I use a script to create OVS bridges as switches and connect them to my 
controller one by one:
```python
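import subprocess

# OVS_START_NUM, OVS_NUM and controller (the controller IP) are defined elsewhere in the script.
for ovs_id in range(OVS_START_NUM, OVS_NUM):
    # Create the bridge, then point it at the controller.
    subprocess.call('ovs-vsctl add-br bridge%d' % ovs_id, shell=True)
    subprocess.call('ovs-vsctl set-controller bridge%d tcp:%s:6633'
                    % (ovs_id, controller), shell=True)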
```


What I see is that all connected switches suddenly disconnect at the same 
time. The log shows:
```
2018-07-24T20:55:00,333 | INFO  | epollEventLoopGroup-9-42 | SystemNotificationsListenerImpl  | 326 - org.opendaylight.openflowplugin.impl - 0.6.2.netease1 | ConnectionEvent: Connection closed by device, Device:/xxxx:35922, NodeId:openflow:xxx
```


I found the corresponding openflowjava code in `DelegatingInboundHandler.java`; 
it means the channel between the controller and the switch has become inactive.
```
    @Override
    public void channelInactive(final ChannelHandlerContext ctx) {
        LOG.debug("Channel inactive");
        if (!inactiveMessageSent) {
            DisconnectEventBuilder builder = new DisconnectEventBuilder();
            builder.setInfo("Channel inactive");
            consumer.consume(builder.build());
            inactiveMessageSent = true;
        }
    }

```


However, I am quite sure there is no problem with the network links between 
the switches and the controller. I have tested many times and the issue keeps 
reproducing. I also found this notable log:
```
2018-07-27T19:07:37,499 | WARN  | opendaylight-cluster-data-akka.actor.default-dispatcher-28 | ContextChainImpl | 326 - org.opendaylight.openflowplugin.impl - 0.6.2.netease1 | Not able to start clustering services for node openflow:86750152275523
2018-07-27T19:07:37,500 | WARN  | ofppool-358      | ContextChainHolderImpl | 326 - org.opendaylight.openflowplugin.impl | Not able to set MASTER role on device openflow:86750152275523, reason: java.lang.RuntimeException: Failed to initialize device openflow:86750152275523 in 9s: java.util.concurrent.TimeoutException: Waited 9000 milliseconds for com.google.common.util.concurrent.AbstractTransformFuture$AsyncTransformFuture@7ae82a2[status=PENDING]
```
Before this log, everything on the controller is fine. After this log, I see 
many `Connection closed by device` entries.
The log shows a `java.util.concurrent.TimeoutException` and a future whose 
status is still PENDING.
I found the code that throws the exception in `DeviceContextImpl.java`:
```
            final Future<Void> initialize = initializer
                    .get()
                    .initialize(this, switchFeaturesMandatory, skipTableFeatures, writerProvider, convertorExecutor);

            try {
                initialize.get(DEVICE_INIT_TIMEOUT, TimeUnit.MILLISECONDS);
            } catch (TimeoutException ex) {
                initialize.cancel(true);
                throw new RuntimeException(String.format("Failed to initialize device %s in %ss: %s",
                        deviceInfo.toString(), String.valueOf(DEVICE_INIT_TIMEOUT / 1000), ex.toString()), ex);
            } catch (ExecutionException | InterruptedException ex) {
                throw new RuntimeException(
                        String.format("Device %s cannot be initialized: %s", deviceInfo.toString(), ex.toString()), ex);
            }
```


The device initialization process fetches 
OFPMPDESC/OFPMPMETERFEATURES/OFPMPGROUPFEATURES... from the switch. Aren't 
these requests executed by the channel's EventLoop?
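
To make the question concrete, here is a minimal sketch of one general way a 
timed `Future.get()` can fail with the future still PENDING: the only thread 
that could complete the future is busy with something else. It uses only 
`java.util.concurrent` and is not openflowplugin or Netty code. The class name 
`PendingFutureDemo`, the method `timesOutWhenLoopIsBusy`, the single-thread 
executor standing in for an event loop, and the reply string are all made up 
for illustration. I do not know whether this is what actually happens in the 
controller.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class PendingFutureDemo {

    // Returns true if the timed wait expired while the future was still pending.
    static boolean timesOutWhenLoopIsBusy(long timeoutMillis) throws Exception {
        // Assumption: one thread plays the role of a single event loop.
        ExecutorService loop = Executors.newSingleThreadExecutor();
        CountDownLatch release = new CountDownLatch(1);
        CompletableFuture<String> reply = new CompletableFuture<>();
        try {
            // Task 1 occupies the only thread until it is released.
            loop.submit(() -> {
                release.await();
                return null;
            });
            // Task 2 would deliver the reply, but it is queued behind task 1.
            loop.execute(() -> reply.complete("hypothetical multipart reply"));
            try {
                // Same pattern as initialize.get(DEVICE_INIT_TIMEOUT, ...) in the quoted code.
                reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return false;
            } catch (TimeoutException ex) {
                // The future was never completed, so it is still pending here.
                return !reply.isDone();
            }
        } finally {
            release.countDown();
            loop.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("timed out while pending: " + timesOutWhenLoopIsBusy(200));
    }
}
```

In this sketch the waiter times out not because the peer is slow, but because 
the thread that would process the reply never gets to run the task.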


I suspect this is a concurrency problem in openflowplugin, openflowjava, or Netty.


In addition, I tested with switches connecting to both port 6633 and port 
6653 on the same controller. When all switches on port 6633 disconnected, the 
switches connected to port 6653 had no problem.
As far as I know, ports 6653 and 6633 use different Netty EventLoopGroups.
Could it be that when device initialization fails for one channel in the 6633 
EventLoopGroup, all channels in that EventLoopGroup become inactive, while the 
other EventLoopGroup (6653) is not affected?
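
The isolation I observed is at least consistent with how independent thread 
pools behave in general. The following sketch shows that with plain 
`java.util.concurrent` executors: stalling the only thread of one pool makes 
that pool unresponsive, while a second pool keeps answering. The class 
`IsolatedPoolsDemo`, its methods, and the mapping of `poolA`/`poolB` to the 
6633/6653 groups are assumptions for illustration only. This says nothing 
about how openflowjava actually configures Netty.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class IsolatedPoolsDemo {

    // Submits a trivial task and reports whether it finished within the timeout.
    static boolean respondsWithin(ExecutorService pool, long timeoutMillis) throws Exception {
        Future<String> pong = pool.submit(() -> "pong");
        try {
            pong.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException ex) {
            pong.cancel(true);
            return false;
        }
    }

    // Stalls poolA's only thread and returns {poolA responsive, poolB responsive}.
    static boolean[] stallFirstPool(long timeoutMillis) throws Exception {
        ExecutorService poolA = Executors.newSingleThreadExecutor(); // stand-in for the 6633 group
        ExecutorService poolB = Executors.newSingleThreadExecutor(); // stand-in for the 6653 group
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Block poolA's single thread until the latch is released.
            poolA.submit(() -> {
                release.await();
                return null;
            });
            return new boolean[] {
                respondsWithin(poolA, timeoutMillis),
                respondsWithin(poolB, timeoutMillis)
            };
        } finally {
            release.countDown();
            poolA.shutdownNow();
            poolB.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        boolean[] result = stallFirstPool(300);
        System.out.println("pool A responsive: " + result[0] + ", pool B responsive: " + result[1]);
    }
}
```

If something similar happens inside the controller, one stalled thread in the 
6633 group could explain why only the switches on that port are affected.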


I have read the openflowjava and openflowplugin code in great detail, but I 
cannot find the problem in the source.
Is this a concurrency bug in the way ofp uses Netty, or something else?




I hope you can help. Thanks.
-----------
SimonCZW
_______________________________________________
openflowplugin-dev mailing list
openflowplugin-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev
