Sanjib, after looking at the test results in CI, some of the failures do not look that sporadic. For example, there is a consistent flow push failure when the leader restarts and the switch connects to the just-restarted leader (now a follower). Could you change your test so that, after the leader restart, mininet connects to the old leader instead of a fixed follower (follower #2)? That way we get a consistent error that is easier to reproduce.
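To illustrate the idea, here is a rough Python sketch, not the suite's actual Robot keywords. The function names, the `use_old_leader` switch, and the choice of 10.183.181.52 as the old leader are all assumptions made up for the example; only the controller IPs and the aggregate reply text come from the local run quoted below.

```python
import re


def choose_switch_target(controllers, old_leader, use_old_leader=True):
    """Pick the controller mininet should connect to after the leader restart.

    Hypothetical helper: 'controllers' is the list of cluster member IPs and
    'old_leader' is the member that was leader before the restart.
    """
    if old_leader not in controllers:
        raise ValueError("old_leader must be one of the cluster members")
    if use_old_leader:
        # Proposed behavior: always hit the just-restarted node.
        return old_leader
    # Previous behavior (assumed): a fixed follower, here the second one.
    followers = [ip for ip in controllers if ip != old_leader]
    return followers[1]


def flow_count(aggregate_reply):
    """Extract flow_count from an OFPST_AGGREGATE reply string."""
    match = re.search(r"flow_count=(\d+)", aggregate_reply)
    if match is None:
        raise ValueError("no flow_count field in reply")
    return int(match.group(1))


# Controller IPs from the local run; which one was leader is an assumption.
controllers = ["10.183.181.51", "10.183.181.52", "10.183.181.53"]
target = choose_switch_target(controllers, old_leader="10.183.181.52")

# Reply text copied from the failing test case in the quoted log.
reply = ("OFPST_AGGREGATE reply (OF1.3) (xid=0x2): "
         "packet_count=0 byte_count=0 flow_count=0")
flows_restored = flow_count(reply) == 1
```

With the switch always pointed at the restarted node, the `flow_count` check should fail on every run if the problem is really tied to that node, rather than only when the fixed follower happens to be the old leader.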
BR/Luis

> On Apr 4, 2016, at 10:21 PM, Luis Gomez <[email protected]> wrote:
>
> The suite looks good to me too, also I cannot reproduce this issue using He
> plugin in my local setup so lets observe the behavior in CI.
>
> BR/Luis
>
>
>> On Apr 4, 2016, at 5:06 PM, Jamo Luhrsen <[email protected]> wrote:
>>
>> Sanjib,
>>
>> Thanks for the work. I gave +1 and some comments. Luis should be
>> the one to take it across the finish line.
>>
>> As for the random failures, do you know which open bugs we can
>> associate those with?
>>
>> Thanks,
>> JamO
>>
>> On 04/04/2016 04:35 AM, Sanjib Mohapatra wrote:
>>> Hi Luis/Jamo
>>>
>>> I have raised a review request https://git.opendaylight.org/gerrit/37054
>>> regarding a new test suite Cluster HA Data Recovery Leader Follower
>>> Failover. it covers following tests.
>>>
>>> Description: In a 3 node cluster initial inventory shard status is
>>> verified and following tests are performed.
>>>
>>> - Mininet switch is connected to a follower node and flow is added via
>>> another follower node. Disconnect the switch and restarts the Leader.
>>> Connect the switch again once Cluster is formed and verify flow is
>>> installed in the switch again
>>>
>>> - Disconnect the switch and restarts one of the follower node. Connect
>>> the switch and verify the flow is installed in the switch again.
>>>
>>> - Disconnect the switch and restarts the Cluster. Connect the switch
>>> again when Cluster is formed and verify the flow is installed
>>> successfully.
>>>
>>> Please do the review.
>>>
>>> Test Result of local run in a 3 node cluster is given below, the sporadic
>>> failure needs to be investigated and BUG to be raised.
>>>
>>> root@mininet-vm:/home/mininet/integration/test/csit/suites/openflowplugin/Clustering#
>>> pybot -L TRACE -v MININET_USER:mininet -v USER_HOME:/home/mininet -v
>>> CONTROLLER:10.183.181.51 -v CONTROLLER1:10.183.181.52 -v
>>> CONTROLLER2:10.183.181.53 -v USER:root -v PASSWORD:rootroot -v
>>> DEFAULT_LINUX_PROMPT:\# -v NUM_ODL_SYSTEM:3 -v MININET_PASSWORD:rootroot
>>> -v ODL_OF_PLUGIN:helium -v
>>> KARAF_HOME:/home/mininet/controller-Be/deploy/current/odl -v
>>> WORKSPACE:/home/mininet -v BUNDLEFOLDER:controller-Be/deploy/current/odl
>>> 030__Cluster_HA_Data_Recovery_Leader_Follower_Failover.robot
>>> ==============================================================================
>>> Cluster HA Data Recovery Leader Follower Failover :: Test suite for Cluster...
>>> ==============================================================================
>>> Create Original Cluster List :: Create original cluster list. | PASS |
>>> ------------------------------------------------------------------------------
>>> Check Shards Status Before Leader Restart :: Check Status for all ... | PASS |
>>> ------------------------------------------------------------------------------
>>> Get inventory Leader Before Leader Restart :: Find leader in the i... | PASS |
>>> ------------------------------------------------------------------------------
>>> Start Mininet Connect To Follower Node1 :: Start mininet with conn... | PASS |
>>> ------------------------------------------------------------------------------
>>> Add Flows In Follower Node2 and Verify Before Leader Restart :: Ad... | PASS |
>>> ------------------------------------------------------------------------------
>>> Stop Mininet Connected To Follower Node1 and Exit :: Stop mininet ... | PASS |
>>> ------------------------------------------------------------------------------
>>> Restart Leader From Cluster Node :: Kill Leader Node and Start it ... | PASS |
>>> ------------------------------------------------------------------------------
>>> Get inventory Follower After Leader Restart :: Find new Followers ... | PASS |
>>> ------------------------------------------------------------------------------
>>> Start Mininet Connect To Follower Node2 :: Start mininet with conn... | PASS |
>>> ------------------------------------------------------------------------------
>>> Verify Flows In Switch After Leader Restart :: Verify flows are in... | PASS |
>>> ------------------------------------------------------------------------------
>>> Stop Mininet Connected To Follower Node2 and Exit :: Stop mininet ... | PASS |
>>> ------------------------------------------------------------------------------
>>> Restart Follower Node2 :: Kill Follower Node2 and Start it Up, Ver... | PASS |
>>> ------------------------------------------------------------------------------
>>> Get inventory Follower After Follower Restart :: Find Followers an... | PASS |
>>> ------------------------------------------------------------------------------
>>> Start Mininet Connect To Leader :: Start mininet with connection t... | PASS |
>>> ------------------------------------------------------------------------------
>>> Verify Flows In Switch After Follower Restart :: Verify flows are ... | FAIL |
>>> Keyword 'MininetKeywords.Mininet Sync Status' failed after retrying for 15
>>> seconds. The last error was: '*** s1
>>> ------------------------------------------------------------------------
>>> OFPST_AGGREGATE reply (OF1.3) (xid=0x2): packet_count=0 byte_count=0
>>> flow_count=0
>>> mininet>' contains 'flow_count=1' 0 times, not 1 time.
>>> ------------------------------------------------------------------------------
>>> Stop Mininet Connected To Leader and Exit :: Stop mininet Connecte... | PASS |
>>> ------------------------------------------------------------------------------
>>> Restart Full Cluster :: Kill all Cluster Nodes and Start it Up All. | PASS |
>>> ------------------------------------------------------------------------------
>>> Get inventory Status After Cluster Restart :: Find New Followers a... | PASS |
>>> ------------------------------------------------------------------------------
>>> Start Mininet Connect To Follower Node2 After Cluster Restart :: S... | PASS |
>>> ------------------------------------------------------------------------------
>>> Verify Flows In Switch After Cluster Restart :: Verify flows are i... | PASS |
>>> ------------------------------------------------------------------------------
>>> Delete Flows In Follower Node1 and Verify After Leader Restart :: ... | PASS |
>>> ------------------------------------------------------------------------------
>>> Stop Mininet Connected To Follower Node2 and Exit After Cluster Re... | PASS |
>>> ------------------------------------------------------------------------------
>>> Cluster HA Data Recovery Leader Follower Failover :: Test suite fo... | FAIL |
>>> 22 critical tests, 21 passed, 1 failed
>>> 22 tests total, 21 passed, 1 failed
>>> ==============================================================================
>>> Output: /home/mininet/integration/test/csit/suites/openflowplugin/Clustering/output.xml
>>> Log: /home/mininet/integration/test/csit/suites/openflowplugin/Clustering/log.html
>>> Report: /home/mininet/integration/test/csit/suites/openflowplugin/Clustering/report.html
>>>
>>> Thanks
>>> Sanjib
>>>
>>>
>>> -----Original Message-----
>>> From: Gerrit Code Review [mailto:[email protected]]
>>> Sent: 04 April 2016 16:57
>>> To: Sanjib Mohapatra
>>> Subject: Change in integration/test[master]: Added test suites for cluster
>>> HA data recovery leader follow...
>>>
>>> From jenkins-releng <[email protected]>:
>>>
>>> jenkins-releng has posted comments on this change.
>>>
>>> Change subject: Added test suites for cluster HA data recovery leader
>>> follower failover
>>> ......................................................................
>>>
>>>
>>> Patch Set 1: Verified+1
>>>
>>> Build Unstable
>>>
>>> https://jenkins.opendaylight.org/releng/job/openflowplugin-csit-verify-1node-flow-services/98/ : SUCCESS
>>>
>>> https://jenkins.opendaylight.org/releng/job/openflowplugin-csit-verify-3node-clustering-helium-design/41/ : SUCCESS
>>>
>>> https://jenkins.opendaylight.org/releng/job/integration-csit-verify-1node-library/1460/ : UNSTABLE
>>>
>>> https://jenkins.opendaylight.org/releng/job/openflowplugin-csit-verify-3node-clustering/74/ : UNSTABLE
>>>
>>> https://jenkins.opendaylight.org/releng/job/integration-test-verify-python-boron/214/ : SUCCESS
>>>
>>> --
>>> To view, visit https://git.opendaylight.org/gerrit/37054
>>> To unsubscribe, visit https://git.opendaylight.org/gerrit/settings
>>>
>>> Gerrit-MessageType: comment
>>> Gerrit-Change-Id: I1cc767712bac3694ba4bf9b052765903bce28ae8
>>> Gerrit-PatchSet: 1
>>> Gerrit-Project: integration/test
>>> Gerrit-Branch: master
>>> Gerrit-Owner: Sanjib Mohapatra <[email protected]>
>>> Gerrit-Reviewer: jenkins-releng <[email protected]>
>>> Gerrit-HasComments: No
>>> _______________________________________________
>>> integration-dev mailing list
>>> [email protected]
>>> https://lists.opendaylight.org/mailman/listinfo/integration-dev
>>>
>
_______________________________________________
openflowplugin-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev
