----- Original Message -----
> From: "Ryan Moats" <rmo...@us.ibm.com>
> To: "Lance Richardson" <lrich...@redhat.com>
> Cc: dev@openvswitch.org
> Sent: Tuesday, July 19, 2016 10:20:37 AM
> Subject: Re: [ovs-dev] [PATCH] netdev-dummy: fix crash with more than one
> passive connection
>
> Lance Richardson <lrich...@redhat.com> wrote on 07/19/2016 09:07:41 AM:
>
> > From: Lance Richardson <lrich...@redhat.com>
> > To: Ryan Moats/Omaha/IBM@IBMUS
> > Cc: dev@openvswitch.org
> > Date: 07/19/2016 09:07 AM
> > Subject: Re: [ovs-dev] [PATCH] netdev-dummy: fix crash with more
> > than one passive connection
> >
> > ----- Original Message -----
> > > From: "Ryan Moats" <rmo...@us.ibm.com>
> > > To: "Lance Richardson" <lrich...@redhat.com>
> > > Cc: dev@openvswitch.org
> > > Sent: Monday, July 18, 2016 11:36:27 PM
> > > Subject: Re: [ovs-dev] [PATCH] netdev-dummy: fix crash with more
> > than one passive connection
> > >
> > > "dev" <dev-boun...@openvswitch.org> wrote on 07/18/2016 11:29:28 AM:
> > >
> > > >
> > > > > From: "Lance Richardson" <lrich...@redhat.com>
> > > > > To: dev@openvswitch.org
> > > > > Sent: Wednesday, July 6, 2016 7:39:52 PM
> > > > > Subject: [ovs-dev] [PATCH] netdev-dummy: fix crash with more than
> > > > one passive connection
> > > > >
> > > > > Investigation found that Some of the occasional failures in the
> > > > > "ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS" test case are caused
> > > > > by ovs-vswitchd crashing with SIGSEGV. It turns out that the
> > > > > crash occurrs when the number of netdev-dummy passive connections
> > > > > transitions from 1 to 2. When xrealloc() copies the array of
> > > > > dummy_packet_stream structures from the original buffer to a
> > > > > newly allocated one, the struct ovs_list txq member of the
> structure
> > > > > becomes corrupt (e.g. if ovs_list_is_empty() would have returned
> > > > > false before the copy, it will return true after the copy, which
> > > > > will lead to a crash when the bogus packet buffer on the list is
> > > > > dereferenced).
> > > > >
> > > > > Fix by taking a hint from David Wheeler and adding a level of
> > > > > indirection.
> > > > >
> > > > > Signed-off-by: Lance Richardson <lrich...@redhat.com>
> > >
> > > [snip]
> > >
> > > >
> > > > Here is a small script that reliably reproduces the crash in
> > > ovs-vswitchd.
> > > > I don't have an explanation for why we have two connections to the
> same
> > > > port in the "ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS" test case
> (this
> > > > happens infrequently), perhaps it's something like the active connect
> > > from
> > > > a peer ovs-vswitchd being interrupted and re-tried.
> > > >
> > > > #!/bin/bash
> > > >
> > > > set -ex
> > > >
> > > > PWD=$(pwd)
> > > > export PATH=${PWD}/vswitchd:${PWD}/utilities:${PWD}/ovsdb:$PATH
> > > >
> > > > ovs_setenv() {
> > > > export OVS_RUNDIR=${PWD}/$1
> > > > export OVS_LOGDIR=${PWD}/$1
> > > > export OVS_DBDIR=${PWD}/$1
> > > > export OVS_SYSCONFDIR=${PWD}/$1
> > > > export OVS_PKGDATADIR=${PWD}/$1
> > > > }
> > > >
> > > > for x in repro_main repro_hv1 repro_hv2; do
> > > > mkdir -p $x
> > > > rm -f $x/*
> > > > ovs_setenv $x
> > > > : > $OVS_RUNDIR/.conf.db.~lock~
> > > > ovsdb-tool create $OVS_RUNDIR/conf.db vswitchd/vswitch.ovsschema
> > > > ovsdb-server --remote=punix:$OVS_RUNDIR/db.sock -vconsole:off
> > > > --detach --no-chdir --pidfile --log-file
> > > > ovs-vsctl --no-wait -- init
> > > > ovs-vswitchd --enable-dummy=system -vvconn -vofproto_dpif -
> > > > vunixctl -vconsole:off --detach --no-chdir --pidfile --log-file
> > > > done
> > > >
> > > > ovs_setenv repro_main
> > > > ovs-vsctl add-br foo \
> > > > -- add-port foo p1 \
> > > > -- set Interface p1
> > > options:pstream="punix:$PWD/repro_main/p1.sock"
> > > >
> > > > ovs_setenv repro_hv1
> > > > ovs-vsctl add-br br1 \
> > > > -- add-port br1 p1 \
> > > > -- set Interface p1
> > > options:stream="unix:$PWD/repro_main/p1.sock"
> > > >
> > > > ovs_setenv repro_hv2
> > > > ovs-vsctl add-br br1 \
> > > > -- add-port br1 p1 \
> > > > -- set Interface p1
> > > options:stream="unix:$PWD/repro_main/p1.sock"
> > >
> > > I like this, but I think I'm going to be consistent and ask if you
> > > can spin a new version with the above script as a unit test so that
> > > we can see the crash before applying the rest of the patch and then
> > > verify that it doesn't happen with the patch.
> > >
> > > Ryan
> > >
> >
> > Hi Ryan,
> >
> > Thanks for the review. I'm a little unclear on whether it's a good idea
> > to add a test case that doesn't verify any functionality and is no longer
> > useful as soon as it is merged. I was rather hoping that the script
> provided
> > above would serve the "see the crash before applying" function.
> >
> > I'm fine with writing and posting such a test case if that's the policy,
> > but I do wonder how many new test cases we'll have if one is written for
> > every crash that has been fixed, and how effective this will be at
> detecting
> > future crash causes given the enormous number of ways code can be changed
> > to produce a crash.
> >
> > On the other hand it probably would be a good idea to add test cases to
> > verify netdev-dummy functionality (especially for things like multiple
> > passive connection support that aren't already used in other test cases).
> > This is probably a larger effort though.
>
> My reasoning (and it is purely personal, I'm not a committer or anything :)
> was that since we *know* there is a crash lurking there, a canary test
> would be useful to make sure we don't regress and allow it to happen
> again...
>
> If others think that I'm being overly-paranoid, then I'll happily drop the
> request.
>
> Ryan
>
OK, thanks Ryan... will wait for input from other reviewers.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev