Sorry for top posting, but here are two netvirt jobs, just started, that will restart each controller with tell-based = true before running netvirt CSIT:
https://jenkins.opendaylight.org/sandbox/job/netvirt-csit-3node-openstack-ocata-gate-stateful-oxygen/2
https://jenkins.opendaylight.org/releng/job/netvirt-csit-3node-openstack-ocata-gate-stateful-nitrogen/13

The sandbox job will be gone in approx. 36 hours. The other should stay for 6 months.

JamO

On 01/11/2018 09:40 PM, Muthukumaran K wrote:
> Hi Sam, Robert,
>
> On the observations which were made as early as September 2017 -
> https://lists.opendaylight.org/pipermail/netvirt-dev/2017-September/005518.html
> (thanks to Jamo for testing this out)
> Enabling tell-based protocol had 22% failure of CSIT at releng level. More
> details on the last sandbox and releng runs below.
>
> Having said that, since this is a 3-month-old result and multiple changes
> would have gone into netvirt + genius itself, it would be prudent to test
> the same with the latest Oxygen build (at least it would reduce the
> possibility of misinterpreting netvirt + genius related issues as MD-SAL
> related issues). We will do one more sandbox run here at Ericsson with latest
> ODL Master and re-publish the results with and without tell-based protocol
> enabled by the middle of next week. We will also try to run one round of
> bulk-flow provisioning with OFPlugin's bulk-o-matic test driver to see the
> scale behavior of tell-based protocol too.
>
> Actually two runs were performed, one on releng and another in sandbox,
> between the last week of August and mid-September 2017 against Nitrogen:
>
> Releng run :
> ==========
> https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-3node-openstack-ocata-gate-stateful-nitrogen/7/log.html.gz
>
> Sandbox run :
> ===========
> https://logs.opendaylight.org/sandbox/jenkins091/netvirt-csit-3node-openstack-ocata-jamo-upstream-stateful-nitrogen/1/odl1_karaf.log.gz
>
> Jamo's observations from sandbox run :
> results are not good. Looks like things pass from a black box perspective in
> our first l2 connectivity suite, but then lots of failures after that.
>
> I also notice that our non-failing keyword to write to the karaf log using
> ssh to the karaf shell is failing, even in the above passing suite.
>
> Also, it's worth noting that in order to enable tell-based protocol I'm just
> stealing a controller robot suite to do the work and running it first.
> It makes the config change and reboots all the controllers.
>
> In one karaf log (I only looked at one) I saw a bunch of WARN messages about
> "Unknown history .... ignoring..."
> example:
>
> FrontendClientMetadataBuilder | 215 -
> org.opendaylight.controller.sal-distributed-datastore - 1.7.0.SNAPSHOT |
> member-1-shard-topology-operational: Unknown history for aborted transaction
> member-1-datastore-operational-fe-4-txn-7810-1, ignoring
>
> I also saw an ERROR about failure to serialize something or other:
>
> 2017-08-29 04:25:12,719 | ERROR | -dispatcher-3279 | EndpointWriter
> | 41 - com.typesafe.akka.slf4j - 2.4.18
> | Failed to serialize remote message [class akka.actor.Status$Failure]
> | using serializer [class
> akka.serialization.JavaSerializer]. Transient association error (association
> remains live)
> akka.remote.MessageSerializer$SerializationException: Failed to serialize
> remote message [class akka.actor.Status$Failure] using serializer [class
> akka.serialization.JavaSerializer].
>
> Observations:
> ===========
>
> -----Original Message-----
> From: Robert Varga [mailto:[email protected]]
> Sent: Friday, January 12, 2018 2:11 AM
> To: Sam Hague
> Cc: Michael Vorburger; Muthukumaran K; Tom Pantelis; controller-dev;
> [email protected]; Kency Kurian
> Subject: Re: [controller-dev] Should application code persist do retries on
> TransactionCommitFailedException caused by AskTimeoutException or could CDS
> be configured to retry more?
>
> Regards
> Muthu
>
>
> On 11/01/18 21:26, Sam Hague wrote:
>> Robert,
>>
>> when you mention odlparent/yangtools integrated - what does that mean?
>
> I meant the yangtools-2.0.0 stuff needs to be merged up -- which obviously
> was delayed way longer than anticipated.
>
>> do we think that will happen for oxygen?
>
> I would love to have it in, but it does have potential to cause breakage
> -- hence I am afraid we are out of runway.
>
>> There are a number of clustering bugs open that all have
>> AskTimeoutException listed in the traces. I think the idea is the tell
>> based change will help and then we can dig deeper if the bugs still exist.
>
> Yup.
>
>> Muthu,
>>
>> how did your testing with tell for netvirt tests go? Were we safe
>> switching to it?
>
> *This* is the most critical question that needs to be answered. If netvirt
> and BGP greenlight it, I think we can make the switch ...
>
> Regards,
> Robert
>
> _______________________________________________
> controller-dev mailing list
> [email protected]
> https://lists.opendaylight.org/mailman/listinfo/controller-dev
>
_______________________________________________
controller-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/controller-dev
