On 5 April 2017 at 00:38, Roi Dayan <r...@mellanox.com> wrote:
>
>
> On 05/04/2017 07:18, Roi Dayan wrote:
>>
>>
>>
>> On 04/04/2017 23:53, Joe Stringer wrote:
>>>
>>> On 3 April 2017 at 10:53, Joe Stringer <j...@ovn.org> wrote:
>>>>
>>>> On 3 April 2017 at 03:27, Roi Dayan <r...@mellanox.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 29/03/2017 20:13, Joe Stringer wrote:
>>>>>>
>>>>>>
>>>>>> On 29 March 2017 at 04:50, Roi Dayan <r...@mellanox.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 23/03/2017 09:01, Joe Stringer wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I ran the make check-offloads tests on a recent net-next kernel and
>>>>>>>> they failed; the output was not as expected:
>>>>>>>>
>>>>>>>> ../../tests/system-offloaded-traffic.at:54: ovs-appctl dpctl/dump-flows |
>>>>>>>> grep "eth_type(0x0800)" | sed -e
>>>>>>>> 's/used:[0-9].[0-9]*s/used:0.001s/;s/eth(src=[a-z0-9:]*,dst=[a-z0-9:]*)/eth(macs)/;s/actions:[0-9,]*/actions:output/;s/recirc_id(0),//' | sort
>>>>>>>> --- - 2017-03-22 16:43:37.598689692 -0700
>>>>>>>> +++ /home/vagrant/ovs/_build-clang/tests/system-offloads-testsuite.dir/at-groups/2/stdout 2017-03-22 16:43:37.595628000 -0700
>>>>>>>> @@ -1,3 +1,3 @@
>>>>>>>> -in_port(2),eth(macs),eth_type(0x0800), packets:9, bytes:756, used:0.001s, actions:output
>>>>>>>> -in_port(3),eth(macs),eth_type(0x0800), packets:9, bytes:756, used:0.001s, actions:output
>>>>>>>> +in_port(2),eth(macs),eth_type(0x0800),ipv4(frag=no), packets:9, bytes:882, used:0.001s, actions:output
>>>>>>>> +in_port(3),eth(macs),eth_type(0x0800),ipv4(frag=no), packets:9, bytes:882, used:0.001s, actions:output
>>>>>>>>
>>>>>>>
>>>>>>> Hi Joe,
>>>>>>>
>>>>>>> Can you tell me what kernel you used here?
>>>>>>> Maybe tc offloads were not supported and there was a fallback to
>>>>>>> the OVS datapath.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I believe that it was a snapshot of net-next relatively recently,
>>>>>> 01461abe62df ("Merge branch 'fib-notifications-cleanup'"). I could try
>>>>>> again with latest net-next? Or do you think there may be some
>>>>>> userspace dependency the test relies on?
>>>>>>
>>>>>
>>>>> I installed a net-next kernel and make check-offloads passes for me.
>>>>> The last commit I'm on is
>>>>> 397df70 Merge branch '40GbE' of
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
>>>>> and the last tag is 4.11-rc3+.
>>>>> I'm thinking the test may fail for you because of something else, like a
>>>>> second openvswitch process already running?
>>>>
>>>>
>>>> I don't think that was the case, but let me try again with your latest
>>>> series and the above commit.
>>>
>>>
>>> I saw the same behaviour with upstream net-next 397df7092a15. My host
>>> is Ubuntu 14.04 with this kernel.
>>>
>>> I thought it might be because I'm not running any of your hardware and
>>> I assumed that the testsuite doesn't require hardware to run. Looking
>>> at the test it seems that assumption was wrong, but when I tried to
>>> configure the tc-policy to skip_hw with the following modification,
>>> the OVSDB change didn't seem to propagate into OVS (there were no log
>>> messages about changing the tc-policy):
>>>
>>> diff --git a/tests/system-offloaded-traffic.at b/tests/system-offloaded-traffic.at
>>> index 7aec8a3f430e..3ddf23a939a8 100644
>>> --- a/tests/system-offloaded-traffic.at
>>> +++ b/tests/system-offloaded-traffic.at
>>> @@ -40,6 +40,7 @@ AT_SETUP([offloads - ping between two ports - offloads enabled])
>>> OVS_TRAFFIC_VSWITCHD_START()
>>>
>>> AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:hw-offload=true])
>>> +AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:tc-policy="skip_hw"])
>>> AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
>>>
>>> ADD_NAMESPACES(at_ns0, at_ns1)
>>>
>>> ---
>>>
>>> Looking again, my kernel config had CLS_FLOWER disabled so that's
>>> probably what caused the issue. My ovs-vswitchd log from the test is
>>> below.
>>>
>>> 2017-04-04T20:41:50.737Z|00001|vlog|INFO|opened log file
>>>
>>> /home/joe/git/openvswitch/_build-gcc/tests/system-offloads-testsuite.dir/2/ovs-vswitchd.log
>>>
>>> 2017-04-04T20:41:50.737Z|00002|ovs_numa|INFO|Discovered 2 CPU cores on
>>> NUMA node 0
>>> 2017-04-04T20:41:50.737Z|00003|ovs_numa|INFO|Discovered 1 NUMA nodes
>>> and 2 CPU cores
>>>
>>> 2017-04-04T20:41:50.738Z|00004|reconnect|INFO|unix:/home/joe/git/openvswitch/_build-gcc/tests/system-offloads-testsuite.dir/2/db.sock:
>>>
>>> connecting...
>>>
>>> 2017-04-04T20:41:50.738Z|00005|reconnect|INFO|unix:/home/joe/git/openvswitch/_build-gcc/tests/system-offloads-testsuite.dir/2/db.sock:
>>>
>>> connected
>>> 2017-04-04T20:41:50.743Z|00006|bridge|INFO|ovs-vswitchd (Open vSwitch)
>>> 2.7.90
>>> 2017-04-04T20:41:50.757Z|00007|ofproto_dpif|INFO|system@ovs-system:
>>> Datapath supports recirculation
>>> 2017-04-04T20:41:50.757Z|00008|ofproto_dpif|INFO|system@ovs-system:
>>> MPLS label stack length probed as 1
>>> 2017-04-04T20:41:50.757Z|00009|ofproto_dpif|INFO|system@ovs-system:
>>> Datapath supports truncate action
>>> 2017-04-04T20:41:50.757Z|00010|ofproto_dpif|INFO|system@ovs-system:
>>> Datapath supports unique flow ids
>>> 2017-04-04T20:41:50.757Z|00011|ofproto_dpif|INFO|system@ovs-system:
>>> Datapath does not support clone action
>>> 2017-04-04T20:41:50.757Z|00012|ofproto_dpif|INFO|system@ovs-system:
>>> Max sample nesting level probed as 10
>>> 2017-04-04T20:41:50.757Z|00013|ofproto_dpif|INFO|system@ovs-system:
>>> Datapath supports ct_state
>>> 2017-04-04T20:41:50.757Z|00014|ofproto_dpif|INFO|system@ovs-system:
>>> Datapath supports ct_zone
>>> 2017-04-04T20:41:50.757Z|00015|ofproto_dpif|INFO|system@ovs-system:
>>> Datapath supports ct_mark
>>> 2017-04-04T20:41:50.757Z|00016|ofproto_dpif|INFO|system@ovs-system:
>>> Datapath supports ct_label
>>> 2017-04-04T20:41:50.757Z|00017|ofproto_dpif|INFO|system@ovs-system:
>>> Datapath supports ct_state_nat
>>> 2017-04-04T20:41:50.757Z|00018|ofproto_dpif|INFO|system@ovs-system:
>>> Datapath supports ct_orig_tuple
>>>
>>> 2017-04-04T20:41:50.758Z|00001|ofproto_dpif_upcall(handler1)|INFO|received
>>>
>>> packet on unassociated datapath port 0
>>> 2017-04-04T20:41:50.762Z|00019|bridge|INFO|bridge br0: added interface
>>> br0 on port 65534
>>> 2017-04-04T20:41:50.762Z|00020|bridge|INFO|bridge br0: using datapath
>>> ID 00003625ace05f40
>>> 2017-04-04T20:41:50.762Z|00021|connmgr|INFO|br0: added service
>>> controller
>>>
>>> "punix:/home/joe/git/openvswitch/_build-gcc/tests/system-offloads-testsuite.dir/2/br0.mgmt"
>>>
>>> 2017-04-04T20:41:50.774Z|00022|netdev|INFO|netdev: Flow API Enabled
>>> 2017-04-04T20:41:50.774Z|00023|tc|INFO|tc: Using policy 'none'
>>> 2017-04-04T20:41:50.789Z|00024|vconn|DBG|unix: sent (Success):
>>> OFPT_HELLO (OF1.5) (xid=0x1):
>>>  version bitmap: 0x01, 0x02, 0x03, 0x04, 0x05, 0x06
>>> 2017-04-04T20:41:50.789Z|00025|vconn|DBG|unix: received: OFPT_HELLO
>>> (xid=0x1):
>>>  version bitmap: 0x01
>>> 2017-04-04T20:41:50.789Z|00026|vconn|DBG|unix: negotiated OpenFlow
>>> version 0x01 (we support version 0x06 and earlier, peer supports
>>> version 0x01)
>>> 2017-04-04T20:41:50.789Z|00027|vconn|DBG|unix: received: OFPT_FLOW_MOD
>>> (xid=0x2): ADD actions=NORMAL
>>> 2017-04-04T20:41:50.789Z|00028|vconn|DBG|unix: received:
>>> OFPT_BARRIER_REQUEST (xid=0x3):
>>> 2017-04-04T20:41:50.789Z|00029|vconn|DBG|unix: sent (Success):
>>> OFPT_BARRIER_REPLY (xid=0x3):
>>> 2017-04-04T20:41:50.790Z|00030|connmgr|INFO|br0<->unix: 1 flow_mods in
>>> the last 0 s (1 adds)
>>> 2017-04-04T20:41:50.866Z|00031|netdev_tc_offloads|INFO|added ingress
>>> qdisc to ovs-p0
>>> 2017-04-04T20:41:50.866Z|00032|bridge|INFO|bridge br0: added interface
>>> ovs-p0 on port 1
>>> 2017-04-04T20:41:50.894Z|00002|dpif_netlink(handler1)|ERR|failed
>>> adding flow: No such file or directory
>>> 2017-04-04T20:41:50.926Z|00033|netdev_tc_offloads|INFO|added ingress
>>> qdisc to ovs-p1
>>> 2017-04-04T20:41:50.926Z|00034|bridge|INFO|bridge br0: added interface
>>> ovs-p1 on port 2
>>> 2017-04-04T20:41:50.969Z|00003|dpif_netlink(handler1)|ERR|failed
>>> adding flow: No such file or directory
>>> 2017-04-04T20:41:50.971Z|00004|dpif_netlink(handler1)|ERR|failed
>>> adding flow: No such file or directory
>>> 2017-04-04T20:41:50.972Z|00005|dpif_netlink(handler1)|ERR|failed
>>> adding flow: No such file or directory
>>> 2017-04-04T20:41:51.909Z|00035|unixctl|DBG|received request
>>> dpctl/dump-flows["type=ovs"], id=0
>>> 2017-04-04T20:41:51.909Z|00036|netdev_tc_offloads|INFO|added ingress
>>> qdisc to ovs-p0
>>> 2017-04-04T20:41:51.909Z|00037|netdev_tc_offloads|INFO|added ingress
>>> qdisc to ovs-p1
>>> 2017-04-04T20:41:51.910Z|00038|unixctl|DBG|replying with success,
>>> id=0:
>>>
>>> "recirc_id(0),in_port(2),eth(src=36:ac:11:23:b4:67,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=10.1.1.1,tip=10.1.1.2,op=1/0xff),
>>>
>>> packets:0, bytes:0, used:never, actions:1,3
>>>
>>> recirc_id(0),in_port(3),eth(src=5a:7d:e9:0f:21:58,dst=36:ac:11:23:b4:67),eth_type(0x0806),
>>>
>>> packets:0, bytes:0, used:never, actions:2
>>>
>>> recirc_id(0),in_port(3),eth(src=5a:7d:e9:0f:21:58,dst=33:33:ff:0f:21:58),eth_type(0x86dd),ipv6(frag=no),
>>>
>>> packets:0, bytes:0, used:never, actions:1,2
>>>
>>> recirc_id(0),in_port(3),eth(src=5a:7d:e9:0f:21:58,dst=36:ac:11:23:b4:67),eth_type(0x0800),ipv4(frag=no),
>>>
>>> packets:9, bytes:882, used:0.012s, actions:2
>>>
>>> recirc_id(0),in_port(2),eth(src=36:ac:11:23:b4:67,dst=33:33:00:00:00:16),eth_type(0x86dd),ipv6(frag=no),
>>>
>>> packets:1, bytes:90, used:0.092s, actions:1,3
>>>
>>> recirc_id(0),in_port(3),eth(src=5a:7d:e9:0f:21:58,dst=33:33:00:00:00:16),eth_type(0x86dd),ipv6(frag=no),
>>>
>>> packets:1, bytes:90, used:0.284s, actions:1,2
>>>
>>> recirc_id(0),in_port(2),eth(src=36:ac:11:23:b4:67,dst=33:33:ff:23:b4:67),eth_type(0x86dd),ipv6(frag=no),
>>>
>>> packets:0, bytes:0, used:never, actions:1,3
>>>
>>> recirc_id(0),in_port(2),eth(src=36:ac:11:23:b4:67,dst=5a:7d:e9:0f:21:58),eth_type(0x0800),ipv4(frag=no),
>>>
>>> packets:9, bytes:882, used:0.012s, actions:3
>>> "
>>> 2017-04-04T20:41:51.918Z|00039|netdev_linux|WARN|ethtool command
>>> ETHTOOL_GSET on network device ovs-p1 failed: No such device
>>> 2017-04-04T20:41:51.922Z|00040|bridge|WARN|could not open network
>>> device ovs-p1 (No such device)
>>>
>>> 2017-04-04T20:41:51.927Z|00001|netdev_linux(revalidator3)|WARN|ioctl(SIOCGIFINDEX)
>>>
>>> on ovs-p1 device failed: No such device
>>>
>>> 2017-04-04T20:41:51.927Z|00002|netdev_tc_offloads(revalidator3)|ERR|failed
>>>
>>> to get ifindex for ovs-p1: No such device
>>> 2017-04-04T20:41:51.927Z|00003|dpif_netlink(revalidator3)|ERR|failed
>>> adding flow: No such device
>>> 2017-04-04T20:41:51.927Z|00004|dpif_netlink(revalidator3)|ERR|failed
>>> adding flow: No such device
>>> 2017-04-04T20:41:51.927Z|00005|dpif_netlink(revalidator3)|ERR|failed
>>> adding flow: No such device
>>> 2017-04-04T20:41:51.927Z|00006|dpif_netlink(revalidator3)|ERR|failed
>>> adding flow: No such device
>>>
>>> 2017-04-04T20:41:51.928Z|00007|netdev_tc_offloads(revalidator3)|ERR|failed
>>>
>>> to get ifindex for ovs-p1: No such device
>>>
>>> 2017-04-04T20:41:51.933Z|00008|netdev_tc_offloads(revalidator3)|ERR|failed
>>>
>>> to get ifindex for ovs-p1: No such device
>>> 2017-04-04T20:41:51.941Z|00001|fatal_signal(urcu2)|WARN|terminating
>>> with signal 15 (Terminated)
>>>
>>> Do you think it's worth doing better detection that CLS_FLOWER is
>>> unavailable and attempting to recover a bit better from that? The log
>>> messages certainly suggest something went wrong, but don't really
>>> point in the right direction.
>>>
>>
>>
>> Hi Joe,
>>
>> Thanks for pointing this out. We'll check about better handling if
>> cls_flower is not available.
>
>
>
> I looked at this a little, and the error can come from cls_flower or any
> module used with it, e.g. act_vlan, act_tunnel, etc.
> All those modules are auto-loaded when needed.
> If a module fails to load we get ENOENT, though a loaded module can also
> return ENOENT itself.

Those are all valid cases, but mine went one step further: I didn't have
CLS_FLOWER available at all, not even as a module.

ENOENT seems a bit strange as a way to indicate lack of support. I wonder
if we can do something about that in the kernel (return -EOPNOTSUPP
instead)?

> We could log a warning once if we receive ENOENT, saying that the module
> might not be loaded, like the message openvswitch logs when its own module
> is not loaded:
> "WARN|Generic Netlink family 'ovs_datapath' does not exist. The Open vSwitch
> kernel module is probably not loaded."

That sounds reasonable.
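
To make the warn-once idea concrete, here is a rough standalone sketch. The
helper name tc_warn_enoent_once() and the message text are made up for
illustration and are not existing OVS code; in the tree this would
presumably go through the vlog macros (e.g. VLOG_WARN_ONCE) rather than
fprintf().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not an existing OVS function.
 *
 * Logs a single hint the first time a tc flow operation fails with ENOENT,
 * since that error may mean cls_flower (or a related tc action module) is
 * not loaded or not built into the kernel.
 *
 * Returns true if this call emitted the hint, false otherwise. */
static bool
tc_warn_enoent_once(int error, FILE *log)
{
    static bool warned = false;   /* Remembers whether we already warned. */

    if (error != ENOENT || warned) {
        return false;
    }
    warned = true;
    fprintf(log, "WARN|failed adding tc flow: %s. The cls_flower kernel "
                 "module (or a related tc action module) is probably not "
                 "loaded.\n", strerror(error));
    return true;
}
```

The caller would invoke this with the error it got back from the flow add,
so the first ENOENT produces the hint and later ones stay quiet. The
per-flow "failed adding flow" messages would remain as they are today.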

> I'm not sure we want ovs-vswitchd to try to preload all the possibly
> relevant modules. E.g. if a user never uses VLANs then act_vlan is not
> needed and normally will not be auto-loaded. It would also require
> maintenance if more modules are needed later.

If the kernel will autoload the relevant modules when they get used,
then I don't see a reason to add this logic to OVS.

> Should we maybe only try to load the basic module that is always needed,
> cls_flower?
> Can we discuss the options here without letting this block the
> submission? This could go in a later fix.

ovs-vswitchd doesn't even try to load 'openvswitch', so it's probably
reasonable to say that this should be left up to the init scripts and
so on, just like the openvswitch module.

> We haven't updated any openvswitch documentation yet, but this can be
> documented: that we require cls_flower and related modules, plus a
> troubleshooting section covering this error. What do you think?

Sure. Useful documentation is always good.
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
