Sorry for top posting, but here are two netvirt jobs, just started, that
will restart each controller with the tell-based protocol enabled before
running netvirt CSIT:

https://jenkins.opendaylight.org/sandbox/job/netvirt-csit-3node-openstack-ocata-gate-stateful-oxygen/2
https://jenkins.opendaylight.org/releng/job/netvirt-csit-3node-openstack-ocata-gate-stateful-nitrogen/13

The sandbox job will be gone in approx. 36 hours. The other should stay for 6
months.

JamO


On 01/11/2018 09:40 PM, Muthukumaran K wrote:
> Hi Sam, Robert, 
> 
> On the observations which were made as early as September 2017 - 
> https://lists.opendaylight.org/pipermail/netvirt-dev/2017-September/005518.html
>  (thanks to Jamo for testing this out)
> Enabling the tell-based protocol resulted in a 22% CSIT failure rate at the 
> releng level. More details on the last sandbox and releng runs are below.
> 
> Having said that, since this is a 3-month-old result and multiple changes 
> will have gone into netvirt + genius since then, it would be prudent to test 
> the same with the latest Oxygen build (at least it would reduce the 
> possibility of misinterpreting netvirt + genius related issues as MD-SAL 
> related issues). We will do one more sandbox run here at Ericsson with the 
> latest ODL master and re-publish the results, with and without the tell-based 
> protocol enabled, by the middle of next week. We will also try to run one 
> round of bulk-flow provisioning with OFPlugin's bulk-o-matic test driver to 
> see the scale behavior of the tell-based protocol too. 
> 
> Actually, two runs were performed against Nitrogen, one on releng and another 
> in the sandbox, between the last week of August and mid-September 2017: 
> 
> Releng run :
> ========== 
> https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-3node-openstack-ocata-gate-stateful-nitrogen/7/log.html.gz
> 
> Sandbox run : 
> ===========
> https://logs.opendaylight.org/sandbox/jenkins091/netvirt-csit-3node-openstack-ocata-jamo-upstream-stateful-nitrogen/1/odl1_karaf.log.gz
> 
> Jamo's observations from the sandbox run:
> Results are not good. Things appear to pass from a black-box perspective in 
> our first l2 connectivity suite, but there are lots of failures after that.
> 
> I also notice that our non-failing keyword to write to the karaf log using 
> ssh to the karaf shell is failing, even in the above passing suite.
> 
> Also, it's worth noting that in order to enable tell-based protocol I'm just 
> stealing a controller robot suite to do the work and running it first.
> It makes the config change and reboots all the controllers.
> 
> In one karaf log (I only looked at one) I saw a bunch of WARN messages about 
> "Unknown history .... ignoring..."
> example:
> 
>   FrontendClientMetadataBuilder | 215 - org.opendaylight.controller.sal-distributed-datastore - 1.7.0.SNAPSHOT | member-1-shard-topology-operational: Unknown history for aborted transaction member-1-datastore-operational-fe-4-txn-7810-1, ignoring
> 
> I also saw an ERROR about failure to serialize something or other:
> 
> 2017-08-29 04:25:12,719 | ERROR | -dispatcher-3279 | EndpointWriter | 41 - com.typesafe.akka.slf4j - 2.4.18 | Failed to serialize remote message [class akka.actor.Status$Failure] using serializer [class akka.serialization.JavaSerializer]. Transient association error (association remains live)
> akka.remote.MessageSerializer$SerializationException: Failed to serialize remote message [class akka.actor.Status$Failure] using serializer [class akka.serialization.JavaSerializer].
> 
> Observations:
> ===========
> 
> Regards
> Muthu
> 
> -----Original Message-----
> From: Robert Varga [mailto:[email protected]] 
> Sent: Friday, January 12, 2018 2:11 AM
> To: Sam Hague
> Cc: Michael Vorburger; Muthukumaran K; Tom Pantelis; controller-dev; 
> [email protected]; Kency Kurian
> Subject: Re: [controller-dev] Should application code persist do retries on 
> TransactionCommitFailedException caused by AskTimeoutException or could CDS 
> be configured to retry more?
> 
> 
> On 11/01/18 21:26, Sam Hague wrote:
>> Robert,
>>
>> when you mention odlparent/yangtools integrated - what does that mean?
> 
> I meant the yangtools-2.0.0 stuff needs to be merged up -- which obviously 
> was delayed way longer than anticipated.
> 
>> do we think that will happen for oxygen?
> 
> I would love to have it in, but it does have potential to cause breakage
> -- hence I am afraid we are out of runway.
> 
>> There are a number of clustering bugs open that all have 
>> AskTimeoutException listed in the traces. I think the idea is the tell 
>> based change will help and then we can dig deeper if the bugs still exist.
> 
> Yup.
> 
>> Muthu,
>>
>> how did your testing with tell for netvirt tests go? Were we safe 
>> switching to it?
> 
> *This* is the most critical question that needs to be answered. If netvirt 
> and BGP greenlight it, I think we can make the switch ...
> 
> Regards,
> Robert
> 
> _______________________________________________
> controller-dev mailing list
> [email protected]
> https://lists.opendaylight.org/mailman/listinfo/controller-dev
> 