On 12/3/21 17:47, Van Haaren, Harry wrote:
>> -----Original Message-----
>> From: David Marchand <david.march...@redhat.com>
>> Sent: Thursday, December 2, 2021 3:18 PM
>> To: Van Haaren, Harry <harry.van.haa...@intel.com>
>> Cc: Amber, Kumar <kumar.am...@intel.com>; d...@openvswitch.org;
>> i.maxim...@ovn.org; f...@sysclose.org; maxime.coque...@redhat.com
>> Subject: Re: [ovs-dev] [PATCH v3 2/4] system-dpdk: Use dummy-pmd port for
>> packet injection.
>>
>> On Thu, Dec 2, 2021 at 2:56 PM Van Haaren, Harry
>> <harry.van.haa...@intel.com> wrote:
>>>
>>>> -----Original Message-----
>>>> From: dev <ovs-dev-boun...@openvswitch.org> On Behalf Of David
>> Marchand
>>>> Sent: Thursday, December 2, 2021 12:21 PM
>>>> To: Amber, Kumar <kumar.am...@intel.com>
>>>> Cc: d...@openvswitch.org; i.maxim...@ovn.org; f...@sysclose.org;
>>>> maxime.coque...@redhat.com
>>>> Subject: Re: [ovs-dev] [PATCH v3 2/4] system-dpdk: Use dummy-pmd port
>> for
>>>> packet injection.
>>>>
>>>> On Wed, Dec 1, 2021 at 3:52 PM Amber, Kumar <kumar.am...@intel.com>
>>>> wrote:
>>>>>> diff --git a/tests/genpkts.py b/tests/genpkts.py new file mode 100755
>> index
>>>>>> 0000000000..f64f786ccb
>>>>>> --- /dev/null
>>>>>> +++ b/tests/genpkts.py
>>>>>> @@ -0,0 +1,56 @@
>>>>>> +#!/usr/bin/env python3
>>>>>> +
>>>>>> +import sys
>>>>>> +
>>>>>> +from scapy.all import RandMAC, RandIP, RandIP6, RandShort, fuzz from
>>>>>> +scapy.all import IPv6, Dot1Q, IP, Ether, UDP, TCP
>>>>>> +
>>>>>> +if len(sys.argv) < 2:
>>>>>> +    print('usage: {} packets_count [fuzz]'.format(sys.argv[0]))
>>>>>> +    sys.exit(1)
>>>>>> +
>>>>>> +tmpl = []
>>>>>> +
>>>>>> +if len(sys.argv) == 2:
>>>>>> +    eth = Ether(dst='ff:ff:ff:ff:ff:ff')
>>>>>> +    vlan = eth / Dot1Q(vlan=1)
>>>>>> +    p = eth / IP() / TCP(sport=20, dport=80, flags='SA', window=8192)
>>>>>> +    tmpl += [p.build().hex()]
>>>>>> +    p = eth / IP() / UDP(sport=53, dport=53)
>>>>>> +    tmpl += [p.build().hex()]
>>>>>> +    p = eth / IP() / TCP(sport=20, dport=80, flags='S', window=8192)
>>>>>> +    tmpl += [p.build().hex()]
>>>>>> +    p = eth / IP() / UDP(sport=53, dport=53)
>>>>>> +    tmpl += [p.build().hex()]
>>>>>> +    p = vlan / IP() / UDP(sport=53, dport=53)
>>>>>> +    tmpl += [p.build().hex()]
>>>>>> +    p = vlan / IP() / TCP(sport=20, dport=80, flags='S', window=8192)
>>>>>> +    tmpl += [p.build().hex()]
>>>>>
>>>>> Hardcoding the values here is not preferable as we wanted to test the
>>>> optimized implementations
>>>>> with various values contained inside the header.
>>>>
>>>> Those hardcoded values comes from the pcap file that was previously used.
>>>> If you want to add more protocols, it is easier with this patch as you
>>>> only need to update some python script rather than rewrite a pcap
>>>> file.
>>>
>>> The PCAP was written by a script, which generated random MAC addresses;
>>>   previously:  eth = Ether(src=RandMAC(), dst=RandMAC())
>>
>> We have 3 MFEX tests.
>>
>> OVS-DPDK - MFEX Autovalidator
>> OVS-DPDK - MFEX Autovalidator fuzzy
>> OVS-DPDK - MFEX Configuration
>>
>> In the first and 3rd ones, a pcap file tests/pcap/mfex_test.pcap
>> (committed to OVS sources) was used before this patch.
>> The hunk above from my patch is about replacing this "hardcoded" pcap
>> binary file with a python script that generates hardcoded packets.
> 
> Apologies, my misunderstanding - I was under the impression the intent
> was to replace all of them.
> 
>> For the 2nd test, if you look at the patch, you'll notice that the
>> fuzzy part is handled like before and generate random packets.
> 
> Understood - gotcha.
> 
>>> I do not see the value add here, and I'm concerned that when regressions in
>> these methods are pointed
>>> out that these are not acted on, but instead we are told "this hardcoded
>> method is better". It is not.
>>
>> Read what I wrote as: "hardcoded python" is better than "hardcoded
>> pcap" binary file.
> 
> Yes, agree.
> 
>>>>>> +on_exit "pkill -f -x -9 '$PYTHON3 $srcdir/genpkts.py -1 fuzz'"
>>>>>> +($PYTHON3 $srcdir/genpkts.py -1 fuzz | while read pkt; do
>>>>>> +     ovs-appctl netdev-dummy/receive p1 "$pkt" || break
>>>>>> + done) &
>>>>>> +
>>>>>>  AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set autovalidator],
>> [0],
>>>> [dnl
>>>>>> Miniflow extract implementation set to autovalidator.
>>>>>>  ])
>>>>>>
>>>>>> -OVS_WAIT_UNTIL([test `ovs-vsctl get interface p1 statistics | grep -oP
>>>>>> 'rx_packets=\s*\K\d+'` -ge 100000])
>>>>>
>>>>> We should increase the packet count to at-least 10x the current number to
>>>> have a proper fuzzy testing and we have measured it would only take 10~ to
>> 15~
>>>> sec more. The current runtime did not catch issues when we purposely broke
>> the
>>>> implementation and by allowing to run for 10000 packets, it did catch the
>> induced
>>>> error.
>>>>
>>>> For me, the fuzzy testing does not have its place in a CI, because it
>>>> is not reproducible.
>>>> I let it in place and just made sure it would not reach the timeout.
>>>
>>> Disagree here, Fuzzing is *ideal* for CI, as it tests different inputs 
>>> continuously,
>>> and each CI run improves the confidence in the system. A large number of
>> open
>>> source projects are actively doing large-scale fuzzing in CI instances.
>>> e.g.; https://google.github.io/clusterfuzz/
>>>
>>> If the fuzzing autotests fails in CI, it still flags that there is *an 
>>> issue*. In our
>>> case with AutoValidators for DPCLS and MFEX, it even prints a whole debug 
>>> log
>>> of "good" miniflow, as well as "bad" results of the optimized 
>>> implementation.
>>> These VLOG_ERR results would be hugely helpful in identifying & debugging.
>>>
>>> I do not understand the motivation for disabling/limiting fuzzing in CI.
>>
>> Fuzzing is sure a cool thing when run on an identified version to stress it.
>>
>> But those unit tests will be used for gating patches sent on the mailing 
>> list.
>> Having non reproducible input means the test may flag a patch as
>> failing because of some previously merged change that did introduce
>> the regression.

I agree that fuzzing is a great testing method and OVS actually uses it
via OSS-Fuzz project.  But I also agree that unit/system tests should be
as stable as possible, i.e. not be flaky.  Hence, fuzzing should not
really be part of them.

Also, fuzzing is not a replacement for unit tests, because it's unreliable.
For example, I tried to generate the pcap file with mfex_fuzz script and
didn't found a single valid TCP packet in it.  So, it doesn't cover even
very basic cases of valid traffic, and obviously any special cases of
valid traffic are not covered too.

Not in this patch set, but in general, we need at least pass all the
packets generated by tests/flowgen.py through the autovalidator.

Harry, Is that something you can work on?

>>
>> I did not remove the fuzzing part, just made it so it runs in a
>> reasonable amount of time.

This looks reasonable to me.  See below.

>>
>>
>>
>>>
>>>> On the system I used, this test takes 5s with 1k and timeouts with 10k.
>>>> So I guess the 10s/15s evaluation is dependent on the system.
>>>>
>>>> I prefer to stick to current value.
>>>>
>>>>
>>>> Thanks.
>>>>
>>>> --
>>>> David Marchand
>>>
>>> Removing automated randomness from Fuzzy testing, and removing/limiting
>> fuzz-testing in CI runs
>>> are both big steps backwards in my view, please reconsider what this patch 
>>> is
>> actually trying to achieve.
>>
>> This patch removes dependency on DPDK pcap to inject traffic in OVS.
>> OVS has its own mechanism and this dependency is unjustified.
>>
>> That's what this patch is about.
>>
>> What do you propose so that we can move forward?
> 
> I guess this is a tradeoff, we have fuzzing quantity & validation vs time.
> 
> This patch reduces the performance of the fuzzing (due to dummy-inject instead
> of PCAP PMD read with DPDK+PCAP dependency), and trades off the amount
> of fuzzing that will be done in the CI as a result.
> 
> Given that the other MFEX fuzzing tests are still valuable for stress-testing 
> using
> the DPDK PCAP PMD, I wonder is there enough value to have an option of PCAP
> or dummy-inject instead of removing the PCAP approach?
> 
> For example, can we runtime-detect if OVS is DPDK-PCAP enabled? And use the
> high performance method in that case?

I think the end goal should be to remove the DPDK dependency entirely,
so tests can be moved out of system-dpdk (patch #3), because they are
not actual system tests (in OVS's meaning of that term).

This move will compensate the reduced number of fuzzy runs with the
increased number of testsuite runs in general, since people who actually
has the hardware will run these tests while running generic testsuite.
I don't think that a lot of people actually runs system tests.

For your in-house testing, Harry, I assume, you might just run mfex tests
as many times as you think you need (Assuming that packets are actually
different every time, otherwise the whole fuzzy testing concept is kind
of pointless) until the right solution for fuzzing is in place.  That
should not be hard to do.

The right solution for the fuzzing should be: implement a fuzzing target
for the OSS-Fuzz and let it do its job.  All the found issues could be
added as regression tests to the generic testsuite along with other unit
tests.  It's a current workflow for all fuzzing tests we have.
See tests/oss-fuzz/ and tests/fuzz-regression/ directories for details.

Harry, Amber, can you work on that?

Once this is done, we can remove all the fuzzing parts for autovalidator
from the current testsuite keeping only static non-random tests.

For the current patch: Taking into account all that I said, I don't think
that reduced number of fuzzy packets should be a blocker here.

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to