On Thu, May 7, 2015 at 9:22 AM, Oleg Strikov <oleg.strikov at canonical.com> wrote:

> Hi DPDK users and developers,
>
> Few weeks ago I came up with the idea to run openvswitch with dpdk backend
> inside qemu-kvm virtual machine. I don't have enough supported NICs yet and
> my plan was to start experimenting inside the virtualized environment,
> achieve functional state of all the components and then switch to the real
> hardware. Additional useful side-effect of doing things inside the vm is
> that issues can be easily reproduced by someone else in a different
> environment.
>
> I (fondly) hoped that running openvswitch/dpdk inside the vm would be
> simpler than running the same set of components on the real hardware.
> Unfortunately I met a bunch of issues on the way. All these issues lie on a
> borderline between dpdk and openvswitch but I think that you might be
> interested in my story. Please note that I still don't have
> openvswitch/dpdk working inside the vm. I definitely have some progress
> though.
>

Thanks for summarizing all the issues. DPDK testing is done on real
hardware, and we are planning to test it in a VM as well. This will
certainly help in fixing issues sooner.

> Q: Does it sound okay from functional (not performance) standpoint to run
> openvswitch/dpdk inside the vm? Do we want to be able to do this? Does
> anyone from the dpdk development team do this?
>
> ## Issue 1 ##
>
> Openvswitch requires backend pmd driver to provide N_CORES tx queues where
> N_CORES is the amount of cores available on the machine (openvswitch counts
> the amount of cpu* entries inside /sys/devices/system/node/node0/ folder).
> To my understanding it doesn't take into account the actual amount of cores
> used by dpdk and just allocates tx queue for each available core. You may
> refer to this chunk of code for details:
> https://github.com/openvswitch/ovs/blob/master/lib/dpif-netdev.c#L1067
>

In the case of OVS DPDK, there is no separate dpdk thread. All polling
cores are managed by OVS, so there is no need to account for cores used by
DPDK. You can assign specific cores to OVS to limit the number of cores it
uses.

> This approach works fine on the real hardware but makes some issues when we
> run openvswitch/dpdk inside the virtual machine. I tried both emulated
> e1000 NIC and virtio NIC and neither of them worked just from the box.
> Emulated e1000 NIC doesn't support multiple tx queues at all (see
> http://dpdk.org/browse/dpdk/tree/lib/librte_pmd_e1000/em_ethdev.c#n884) and
> virtio NIC doesn't support multiple tx queues by default. To enable
> multiple tx queue for virtio NIC I had to add the following line to the
> interface section of my libvirt config: '<driver name="vhost" queues="4"/>'
>

Good point. We should document this. Can you send a patch to update
README.DPDK?

> ## Issue 2 ##
>
> Openvswitch calls rte_eth_tx_queue_setup() twice for the same
> port_id/queue_id. First call takes place during device initialization (see
> call to dpdk_eth_dev_init() inside netdev_dpdk_init():
> https://github.com/openvswitch/ovs/blob/master/lib/netdev-dpdk.c#L522).
> Second call takes place when openvswitch tries to add more tx queues to the
> device (see call to dpdk_eth_dev_init() inside netdev_dpdk_set_multiq():
> https://github.com/openvswitch/ovs/blob/master/lib/netdev-dpdk.c#L697).
> Second call not only initialized new queues but tries to re-initialize
> existing ones.
>
> Unfortunately virtio driver can't handle second call of
> rte_eth_tx_queue_setup() and returns error here:
> http://dpdk.org/browse/dpdk/tree/lib/librte_pmd_virtio/virtio_ethdev.c#n316
> This happens because memzone with the name portN_tvqN already exists when
> second call takes place (memzone has been created during the first call).
> To deal with this issue I had to manually add rte_memzone_lookup-based
> check for this situation and avoid allocation of a new memzone if it
> already exists.
>

This sounds like an issue with the virtio driver. I think we need to fix
DPDK upstream for this to work correctly.

> Q: Is it okay that openvswitch calls rte_eth_tx_queue_setup() twice? Right
> now I can't understand if it's the issue with the virtio pmd driver or
> incorrect API usage by openvswitch? Could someone shed some light on this
> so I can move forward and maybe propose a fix.
>
> ## Issue 3 ##
>
> This issue is also (somehow) related to the fact that openvswitch calls
> rte_eth_tx_queue_setup() twice. I fix the previous issue by the method
> described above and initialization finishes. The whole machinery starts to
> work but crashes at the very beginning (while fetching the first packet
> from the NIC maybe).
> This crash happens here:
> http://dpdk.org/browse/dpdk/tree/lib/librte_pmd_virtio/virtio_rxtx.c#n588
> It takes place because vq_ring structure contains zeros instead of correct
> values:
> vq_ring = {num = 0, desc = 0x0, avail = 0x0, used = 0x0}
> My understanding is that vq_ring gets initialized after the first call to
> rte_eth_tx_queue_setup(), then overwritten by the second call to
> rte_eth_tx_queue_setup() but without an appropriate initialization for the
> second time. I'm trying to fix this issue right now.
>

This also sounds like a DPDK issue.

> Q: Does it sound like a realistic goal to make virtio driver work in
> openvswitch-like scenarios? I'm definitely not an expert in the area of
> dpdk and can't estimate time and resources required. Maybe it's better to
> wait until I get a proper hardware?
>

It would be nice to make OVS-DPDK work in a VM. As I said, I am also
planning to work on it. Thanks for the heads up.
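
P.S. On Issue 1: a quick way to see how many tx queues OVS is likely to ask
for inside the guest is to count the cpu* entries yourself. Below is a
minimal sketch in Python. It assumes, as described in the quoted text, that
the count comes from entries named cpu<digits> under
/sys/devices/system/node/node0/ and that one tx queue is requested per
entry. count_cpu_entries is a hypothetical helper written for illustration;
it is not part of OVS or DPDK.

```python
import os
import re

# Match entries such as "cpu0" or "cpu12". Sibling files such as "cpumap"
# and "cpulist" live in the same directory and must not be counted.
CPU_ENTRY = re.compile(r"^cpu\d+$")


def count_cpu_entries(node_dir="/sys/devices/system/node/node0"):
    """Return the number of cpu<N> entries in node_dir (0 if it is missing)."""
    try:
        names = os.listdir(node_dir)
    except FileNotFoundError:
        # No NUMA node directory at this path, e.g. on a non-Linux host.
        return 0
    return sum(1 for name in names if CPU_ENTRY.match(name))


if __name__ == "__main__":
    n_cores = count_cpu_entries()
    print(f"cpu entries on node0: {n_cores}")
    # Assumption from the thread: one tx queue per counted core.
    print(f"tx queues the NIC would need to offer: {n_cores}")
```

If the assumption holds, the queues value in the libvirt driver line would
need to be at least the number this prints inside the guest.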