[dpdk-dev] [PATCH v2 0/5] add dpdk packet capture support for tcpdump

2016-02-24 Thread Pavel Fedin
 Hello!

> >  2. What if i don't want separate RX and TX streams either? It only 
> > prevents me
> > from seeing the complete picture.
> 
> Do you mean not to have separate pcap files for tx and rx? If so, I would 
> prefer to keep this
> as it is.

 I mean - add an option not to have separate files.

> Because pcap changes need to be replaced with TUN/TAP pmd once available in 
> future.

 I believe it's a lo-o-o-ong way to get there...

> >  3. vhostuser ports are missing. Perhaps not really related to this 
> > patchset, i just
> > don't know how much code "server" part of vhostuser shares with normal PMDs,
> > but anyway, ability to dump them too would be nice to have.
> >
> 
> I think this can be done in future i.e. when vhost as PMD is available. But 
> as of now vhost
> is library.

 I expected the "server" side of vhost to be essentially the same as the "client" part (AKA virtio), just using another mechanism for exchanging control information (via a socket). Is that not true? I suppose driving the queues from both sides should be quite symmetric.

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH v2 0/5] add dpdk packet capture support for tcpdump

2016-02-18 Thread Pavel Fedin
 Hello!

 With the aforementioned fix (disabling src_ip_filter if zero) I've got the patch series working. Now I have some more notes on usability:

> 2)Start proc_info(runs as secondary process by default)application with new 
> parameters for
> tcpdump.
> ex: sudo ./build/app/proc_info/dpdk_proc_info -c 0x4 -n 2 -- -p 0x3 --tcpdump 
> '(0,0)(1,0)' --
> src-ip-filter="2.2.2.2"

 1. Perhaps the ability to select individual queues is useful for something, but not always. What if I want to capture all the traffic that passes through some interface (a common use case)? For example, with Open vSwitch I can have 9 queues on my networking card, so I have to enumerate all of them: (0,0)(0,1)(0,2)... It's insane and inconvenient with many queues. What if you could have a shorthand notation, like (0) or (0,*), for this?
 2. What if I don't want separate RX and TX streams either? It only prevents me from seeing the complete picture.
 3. vhostuser ports are missing. Perhaps this is not really related to this patchset; I just don't know how much code the "server" part of vhostuser shares with normal PMDs, but the ability to dump them too would be nice to have.

 Not directly related, but could we have some interface to tcpdump or Wireshark? It would be good to have the ability to dump packets in real time.
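
A shorthand like the one suggested in point 1 is cheap to support in the option parser. Below is a rough standalone sketch, assuming a "(port)", "(port,*)", or "(port,queue)" token format; the function name and the -1 wildcard convention are invented for illustration, not taken from the patch:

```c
#include <stdio.h>

/*
 * Parse one capture-spec token.  On success returns 0 and fills *port;
 * *queue is the queue index, or -1 for the "(port)" / "(port,*)" forms,
 * meaning "all queues of this port".
 */
static int
parse_capture_spec(const char *tok, int *port, int *queue)
{
	char wild;

	if (sscanf(tok, "(%d,%d)", port, queue) == 2)
		return 0;                /* explicit (port,queue) */
	if (sscanf(tok, "(%d,%c)", port, &wild) == 2) {
		if (wild != '*')
			return -1;       /* e.g. "(0,x)" is rejected */
		*queue = -1;             /* (port,*) wildcard */
		return 0;
	}
	if (sscanf(tok, "(%d)", port) == 1) {
		*queue = -1;             /* bare (port) shorthand */
		return 0;
	}
	return -1;
}
```

The capture loop would then treat queue == -1 as "iterate over all configured queues of the port", so (0,0)(0,1)...(0,8) collapses to just (0).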

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH v2 4/5] lib/librte_eal: add tcpdump support in primary process

2016-02-17 Thread Pavel Fedin
 Hello!

> +static int
> +compare_filter(struct rte_mbuf *pkt)
> +{
> + struct ipv4_hdr *pkt_hdr = rte_pktmbuf_mtod_offset(pkt, struct ipv4_hdr 
> *,
> + sizeof(struct ether_hdr));
> + if (pkt_hdr->src_addr != src_ip_filter)
> + return -1;
> +
> + return 0;
> +}

 Some criticism of this...
 What if I want to capture packets coming from more than one host?
 What if I want to capture all packets?
 What if it's not IPv4 at all?

 Maybe this function should always return 0 if src_ip_filter == 0? That would at least be a quick way to disable filtering.
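
A minimal standalone sketch of a filter addressing these points: capture everything when no filter is set, and never match non-IPv4 frames. It works on a raw frame buffer instead of an rte_mbuf, and the offsets assume a plain Ethernet+IPv4 layout; none of this is the patch's actual code:

```c
#include <arpa/inet.h>  /* htons() */
#include <stdint.h>
#include <string.h>

#define ETHTYPE_OFF   12        /* EtherType field in the Ethernet header */
#define IPV4_SRC_OFF (14 + 12)  /* IPv4 source address: 14B Ethernet + 12B into IP */

/* 0 = capture the packet, -1 = skip it.  src_ip_filter is in network
 * byte order; zero means "no filter configured", i.e. capture everything. */
static int
compare_filter(const uint8_t *frame, uint32_t src_ip_filter)
{
	uint16_t ethtype;
	uint32_t src;

	if (src_ip_filter == 0)
		return 0;                /* filtering disabled */
	memcpy(&ethtype, frame + ETHTYPE_OFF, sizeof(ethtype));
	if (ethtype != htons(0x0800))
		return -1;               /* not IPv4: the filter cannot apply */
	memcpy(&src, frame + IPV4_SRC_OFF, sizeof(src));
	return src == src_ip_filter ? 0 : -1;
}
```

Matching multiple source hosts would then be a list of such filters OR-ed together, rather than a single src_ip_filter value.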

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH v2 2/5] drivers/net/pcap: add public api to create pcap device

2016-02-17 Thread Pavel Fedin
 Hello!

> diff --git a/drivers/net/pcap/rte_pmd_pcap_version.map
> b/drivers/net/pcap/rte_pmd_pcap_version.map
> index ef35398..104dc4d 100644
> --- a/drivers/net/pcap/rte_pmd_pcap_version.map
> +++ b/drivers/net/pcap/rte_pmd_pcap_version.map
> @@ -2,3 +2,11 @@ DPDK_2.0 {
> 
>   local: *;
>  };
> +
> +DPDK_2.3 {
> + global:
> +
> + rte_eth_from_pcapsndumpers;
> +
> +} DPDK_2.0;
> +

 This one produces a style warning upon git am:
--- cut ---
Applying: drivers/net/pcap: add public api to create pcap
/home/p.fedin/dpdk/.git/rebase-apply/patch:333: new blank line at EOF.
+
--- cut ---

 I guess the last empty line is not needed.

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia




[dpdk-dev] PING: [PATCH v2] pmd/virtio: fix cannot start virtio dev after stop

2016-02-04 Thread Pavel Fedin
 Hello! Is there any news about this patch? We have hit this problem for the second time; it reproduces every time we try to use ovs-dpdk inside a virtual machine with a virtio-net adapter.

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia


> -Original Message-
> From: Jianfeng Tan [mailto:jianfeng.tan at intel.com]
> Sent: Monday, January 11, 2016 9:16 AM
> To: dev at dpdk.org
> Cc: p.fedin at samsung.com; yuanhan.liu at linux.intel.com; Jianfeng Tan
> Subject: [PATCH v2] pmd/virtio: fix cannot start virtio dev after stop
> 
> v2 changes:
> - Address compiling error.
> - Add Reported-by.
> 
> Fix the issue that virtio device cannot be started after stopped.
> 
> The field, hw->started, should be changed by virtio_dev_start/stop instead
> of virtio_dev_close.
> 
> Reported-by: Pavel Fedin 
> Signed-off-by: Jianfeng Tan 
> Acked-by: Yuanhan Liu 
> 
> ---
>  drivers/net/virtio/virtio_ethdev.c | 14 +-
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/virtio/virtio_ethdev.c 
> b/drivers/net/virtio/virtio_ethdev.c
> index d928339..5bdd305 100644
> --- a/drivers/net/virtio/virtio_ethdev.c
> +++ b/drivers/net/virtio/virtio_ethdev.c
> @@ -490,11 +490,13 @@ virtio_dev_close(struct rte_eth_dev *dev)
> 
>   PMD_INIT_LOG(DEBUG, "virtio_dev_close");
> 
> + if (hw->started == 1)
> + virtio_dev_stop(dev);
> +
>   /* reset the NIC */
>   if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
>   vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR);
>   vtpci_reset(hw);
> - hw->started = 0;
>   virtio_dev_free_mbufs(dev);
>   virtio_free_queues(dev);
>  }
> @@ -1408,10 +1410,9 @@ eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev)
>   if (rte_eal_process_type() == RTE_PROC_SECONDARY)
>   return -EPERM;
> 
> - if (hw->started == 1) {
> - virtio_dev_stop(eth_dev);
> - virtio_dev_close(eth_dev);
> - }
> + /* Close it anyway since there's no way to know if closed */
> + virtio_dev_close(eth_dev);
> +
>   pci_dev = eth_dev->pci_dev;
> 
>   eth_dev->dev_ops = NULL;
> @@ -1612,9 +1613,12 @@ static void
>  virtio_dev_stop(struct rte_eth_dev *dev)
>  {
>   struct rte_eth_link link;
> + struct virtio_hw *hw = dev->data->dev_private;
> 
>   PMD_INIT_LOG(DEBUG, "stop");
> 
> + hw->started = 0;
> +
>   if (dev->data->dev_conf.intr_conf.lsc)
>   rte_intr_disable(&dev->pci_dev->intr_handle);
> 
> --
> 2.1.4




[dpdk-dev] [RESEND PATCH] vhost_user: Make sure that memory map is set before attempting address translation

2016-01-15 Thread Pavel Fedin
 Hello!

> If this is the case, i am wondering whether we should include
> "malfunctioning clients" in commit message. It triggers me to think if
> there are existing buggy implementations.

 Well... Can you suggest how to rephrase it? Maybe "if a client is malfunctioning it can..."? I lack imagination, really, and to tell the truth I don't care that much about the exact phrasing; I'm OK with anything.

> Anyway, check is OK.

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [RESEND PATCH] vhost_user: Make sure that memory map is set before attempting address translation

2016-01-13 Thread Pavel Fedin
 Hello!

> Do you have examples for the malfunctioning clients? If so, could you
> list them in the commit message?

 The only malfunctioning client was DPDK itself, with the virtio-for-container RFC applied. The client-side problem has been fixed afterwards by http://dpdk.org/ml/archives/dev/2016-January/031169.html. See the RFC discussion thread.

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [RESEND PATCH] vhost_user: Make sure that memory map is set before attempting address translation

2016-01-13 Thread Pavel Fedin
Malfunctioning virtio clients may not send VHOST_USER_SET_MEM_TABLE for
some reason. This causes NULL dereference in qva_to_vva().

Signed-off-by: Pavel Fedin 
Acked-by: Yuanhan Liu 
---
 lib/librte_vhost/virtio-net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 0ba5045..3e7cec0 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -630,7 +630,7 @@ set_vring_addr(struct vhost_device_ctx ctx, struct 
vhost_vring_addr *addr)
struct vhost_virtqueue *vq;

dev = get_device(ctx);
-   if (dev == NULL)
+   if ((dev == NULL) || (dev->mem == NULL))
return -1;

/* addr->index refers to the queue index. The txq 1, rxq is 0. */
-- 
2.1.1




[dpdk-dev] [PATCH] vhost_user: Make sure that memory map is set before attempting address translation

2016-01-13 Thread Pavel Fedin
 Hello!

> > Change-Id: Ibc8f6637fb5fb9885b02c316adf18afd45e0d49a
> 
> What's this? An internal track id?

 Yes, it's from our Gerrit. I've just done git format-patch.

> If so, you should not include it
> here: it's just meaningless to us.
> 
> Otherwise, this patch looks good to me.

 Should I repost, or can you just drop this tag yourself when applying?

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-12 Thread Pavel Fedin
 Hello!

> Could anyone please point out, how it can be tested further(how can
> traffic be sent across host and container)  ?

 Have you applied all three fixes discussed here?

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH] vhost_user: Make sure that memory map is set before attempting address translation

2016-01-12 Thread Pavel Fedin
Malfunctioning virtio clients may not send VHOST_USER_SET_MEM_TABLE for
some reason. This causes NULL dereference in qva_to_vva().

Change-Id: Ibc8f6637fb5fb9885b02c316adf18afd45e0d49a
Signed-off-by: Pavel Fedin 
---
 lib/librte_vhost/virtio-net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 0ba5045..3e7cec0 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -630,7 +630,7 @@ set_vring_addr(struct vhost_device_ctx ctx, struct 
vhost_vring_addr *addr)
struct vhost_virtqueue *vq;

dev = get_device(ctx);
-   if (dev == NULL)
+   if ((dev == NULL) || (dev->mem == NULL))
return -1;

/* addr->index refers to the queue index. The txq 1, rxq is 0. */
-- 
2.1.1



[dpdk-dev] [PATCH 2/4] mem: add API to obstain memory-backed file info

2016-01-12 Thread Pavel Fedin
 Hello!

> I might be missing something obvious here but, aside from having memory
> SHARED which most DPDK apps using hugepages will have anyway, what is
> the backward compatibility issues that you see here?

 Heh, sorry once again for the confusion. Indeed, with hugepages we always get MAP_SHARED; I missed that. So we indeed need --shared-mem only in addition to --no-huge.

 The backwards compatibility issue is stated in the description of PATCH 1/4:
--- cut ---
b. possible ABI break, originally, --no-huge uses anonymous memory
instead of file-backed way to create memory.
--- cut ---
 The patch unconditionally changes that to SHARED. That's all.

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH 2/4] mem: add API to obstain memory-backed file info

2016-01-12 Thread Pavel Fedin
 Hello!

> >   .repeated depends on CONFIG_RTE_EAL_SIGLE_FILE_SEGMENTS. By the way, 
> > looks like it does
> the same thing as you are trying to do with --single-file, but with 
> hugepages, doesn't it? I
> see it's currently used by ivshmem (which is AFAIK very immature and 
> half-abandoned).
> 
> Similar but not the same.
> --single-file: a single file for all mapped hugepages.
> SINGLE_FILE_SEGMENTS: a file per set of physically contiguous mapped
> hugepages (what DPDK calls memseg , memory segment). So there could be
> more than one file.

 Thank you for the explanation.

 By this time, I've done more testing. The current patchset breaks --no-huge. I did not study why:
--- cut ---
Program received signal SIGBUS, Bus error.
malloc_elem_init (elem=elem at entry=0x7fffe51e6000, heap=0x77fe5a1c, ms=ms 
at entry=0x77fb301c, size=size at entry=268435392) at 
/home/p.fedin/dpdk/lib/librte_eal/common/malloc_elem.c:62
62  /home/p.fedin/dpdk/lib/librte_eal/common/malloc_elem.c: No such file or 
directory.
Missing separate debuginfos, use: dnf debuginfo-install 
keyutils-libs-1.5.9-7.fc23.x86_64 krb5-libs-1.13.2-11.fc23.x86_64 
libcap-ng-0.7.7-2.fc23.x86_64 libcom_err-1.42.13-3.fc23.x86_64 
libselinux-2.4-4.fc23.x86_64 openssl-libs-1.0.2d-2.fc23.x86_64 
pcre-8.37-4.fc23.x86_64 zlib-1.2.8-9.fc23.x86_64
(gdb) where
#0  malloc_elem_init (elem=elem at entry=0x7fffe51e6000, heap=0x77fe5a1c, 
ms=ms at entry=0x77fb301c, size=size at entry=268435392)
at /home/p.fedin/dpdk/lib/librte_eal/common/malloc_elem.c:62
#1  0x004a50b5 in malloc_heap_add_memseg (ms=0x77fb301c, 
heap=) at 
/home/p.fedin/dpdk/lib/librte_eal/common/malloc_heap.c:109
#2  rte_eal_malloc_heap_init () at 
/home/p.fedin/dpdk/lib/librte_eal/common/malloc_heap.c:232
#3  0x004be896 in rte_eal_memzone_init () at 
/home/p.fedin/dpdk/lib/librte_eal/common/eal_common_memzone.c:427
#4  0x0042ab02 in rte_eal_init (argc=argc at entry=11, argv=argv at 
entry=0x7fffeb80) at 
/home/p.fedin/dpdk/lib/librte_eal/linuxapp/eal/eal.c:799
#5  0x0066dfb9 in dpdk_init (argc=11, argv=0x7fffeb80) at 
lib/netdev-dpdk.c:2192
#6  0x0040ddd9 in main (argc=12, argv=0x7fffeb78) at 
vswitchd/ovs-vswitchd.c:74
--- cut ---

 And now I tend to think that we do not need --single-file at all. Because:
a) It's just a temporary workaround for the "more than 8 regions" problem.
b) It's not compatible with physical hardware anyway.

 So I think that we could easily use the "--no-huge --shared-mem" combination. We could address the hugepages compatibility problem later.

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH 2/4] mem: add API to obstain memory-backed file info

2016-01-12 Thread Pavel Fedin
 Hello!

> So are you suggesting to not introduce --single-file option but instead
> --shared-mem?
> AFAIK --single-file was trying to workaround the limitation of just
> being able to map 8 fds.

 Heh, yes, you're right... Indeed, sorry, I was not patient enough; I see it uses hpi->hugedir instead of /dev/shm... I was confused by the code path... It seemed that --single-file was an alias for --no-hugepages.
 And the patch still changes the mmap() mode to SHARED unconditionally, which is not good in terms of backwards compatibility (and this is explicitly noted in the cover letter).

 So, let's try to sort this out...
 a) By default we should still have MAP_PRIVATE.
 b) Let's say that we need --shared-mem in order to make it MAP_SHARED. This can be combined with --no-hugepages if necessary (this is what I tried to implement based on the old RFC).
 c) Let's say that --single-file uses hugetlbfs but maps everything via a single file. This can still be combined with --shared-mem.

 Wouldn't this be clearer, more straightforward, and implication-free?
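
The MAP_PRIVATE/MAP_SHARED distinction in a) and b) is exactly what the vhost side observes: a private mapping is copy-on-write, so writes never reach the backing file, and a second process mapping the same file sees stale data. A small standalone demonstration (nothing DPDK-specific, just the two mmap modes):

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a temp file twice: once with the given flags ("guest" side) and once
 * MAP_SHARED read-only ("vhost" side).  Returns 1 if a write through the
 * first mapping is visible through the second. */
static int
write_visible(int mmap_flags)
{
	char path[] = "/tmp/map_demo_XXXXXX";
	int fd = mkstemp(path);
	int visible;
	char *guest, *vhost;

	assert(fd >= 0 && ftruncate(fd, 4096) == 0);
	guest = mmap(NULL, 4096, PROT_READ | PROT_WRITE, mmap_flags, fd, 0);
	vhost = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
	assert(guest != MAP_FAILED && vhost != MAP_FAILED);

	guest[0] = 'X';               /* the "guest" writes into its ring */
	visible = (vhost[0] == 'X');  /* does the "vhost" side see it? */

	munmap(guest, 4096);
	munmap(vhost, 4096);
	close(fd);
	unlink(path);
	return visible;
}
```

write_visible(MAP_SHARED) yields 1 and write_visible(MAP_PRIVATE) yields 0, which is why the shared mapping is mandatory for cvio but should stay opt-in (--shared-mem) everywhere else.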

 And if we agree on that, we could now try to decrease the number of options:
 a) We could imply MAP_SHARED if cvio is used, because shared memory is mandatory in this case.
 b) (c) above again raises a question: doesn't it make CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS obsolete? Or maybe we could use that one instead of --single-file (however, I'm not a fan of compile-time configuration like this)?

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH 2/4] mem: add API to obstain memory-backed file info

2016-01-12 Thread Pavel Fedin
 Hello!

> Oh I get it and recognize the problem here. The actual problem lies in
> the API rte_eal_get_backfile_info().
> backfiles[i].size = hugepage_files[i].size;
> Should use statfs or hugepage_files[i].size * hugepage_files[i].repeated
> to calculate the total size.

 .repeated depends on CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS. By the way, it looks like it does the same thing as you are trying to do with --single-file, but with hugepages, doesn't it? I see it's currently used by ivshmem (which is AFAIK very immature and half-abandoned).
 Or should we just move .repeated out of the #ifdef?

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH 2/4] mem: add API to obstain memory-backed file info

2016-01-12 Thread Pavel Fedin
 Hello!

> >   BTW, i'm still unhappy about ABI breakage here. I think we could easily 
> > add --shared-mem
> option, which would simply change mapping mode to SHARED. So, we could use it 
> with both
> hugepages (default) and plain mmap (with --no-hugepages).
> 
> You mean, use "--no-hugepages --shared-mem" together, right?

 Yes. This would be perfectly backwards-compatible.

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-12 Thread Pavel Fedin
 Hello!

> Your guess makes sense because current implementation does not support
> multi-queues.
> 
>  From you log, only 0 and 1 are "ready for processing"; others are "not
> ready for processing".

 Yes, and if we study it even more carefully, we see that we initialize all TX queues but only a single RX queue (#0).
 After some more code browsing and comparing the two patchsets, I figured out that the problem is caused by an inappropriate VIRTIO_NET_F_CTRL_VQ flag. In your RFC you used a different capability set, while in v1 you seem to have forgotten about this.
 I suggest temporarily moving the hw->guest_features assignment out of virtio_negotiate_features() into the caller, where we have eth_dev->dev_type and can choose the right set depending on it.
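
In other words, mask VIRTIO_NET_F_CTRL_VQ (bit 17 of the virtio feature bits, per the virtio spec) out of the advertised set when the device is a vdev. A sketch of the caller-side selection; the device-type enum and function name here are placeholders, not the driver's actual identifiers:

```c
#include <stdint.h>

#define VIRTIO_NET_F_CTRL_VQ (1u << 17)  /* control channel available (virtio spec) */

enum dev_type { DEV_PCI, DEV_VDEV };     /* stand-in for eth_dev->dev_type */

/* Feature set to hand to virtio_negotiate_features(), per device type. */
static uint32_t
select_guest_features(uint32_t pmd_features, enum dev_type type)
{
	if (type == DEV_VDEV)
		return pmd_features & ~VIRTIO_NET_F_CTRL_VQ;  /* no control queue */
	return pmd_features;
}
```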

 With all the mentioned fixes I've got the ping running.
 Tested-by: Pavel Fedin 

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH 2/4] mem: add API to obstain memory-backed file info

2016-01-12 Thread Pavel Fedin
 Hello!

>> Should this be "hugepage->size = internal_config.memory"? Otherwise the 
>> vhost-user
>> memtable entry has a size of only 2MB.

> I don't think so. See the definition:

> 47 struct hugepage_file {
> 48 void *orig_va;  /**< virtual addr of first mmap() */
> 49 void *final_va; /**< virtual addr of 2nd mmap() */
> 50 uint64_t physaddr;  /**< physical addr */
> 51 size_t size;/**< the page size */
> 52 int socket_id;  /**< NUMA socket ID */
> 53 int file_id;/**< the '%d' in HUGEFILE_FMT */
> 54 int memseg_id;  /**< the memory segment to which page belongs 
> */
> 
> 55 #ifdef RTE_EAL_SINGLE_FILE_SEGMENTS
> 56 int repeated;   /**< number of times the page size is 
> repeated */   
> 
> 57 #endif
> 58 char filepath[MAX_HUGEPAGE_PATH]; /**< path to backing file on 
> filesystem */ 
>
> 59 };

> size stands for the page size instead of total size.

 But in this case the host takes this page size as the total region size, so qva_to_vva() fails.
 I haven't worked with hugepages, but I guess that with real hugepages we get one file per page, so page size == mapping size. With the newly introduced --single-file we now have something that pretends to be a single "uber-huge-page", so we need to specify the total size of the mapping here.
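
Under that guess, the size handed to the vhost side has to be the page size times the number of pages backed by the file, not the bare page size. A sketch with a stand-in struct (in the real struct hugepage_file, .repeated lives under RTE_EAL_SINGLE_FILE_SEGMENTS, as quoted above):

```c
#include <stddef.h>

/* Stand-in for the fields of struct hugepage_file that matter here. */
struct backfile {
	size_t size;      /* the page size */
	int    repeated;  /* pages sharing this backing file; 1 if one file per page */
};

/* Total mapping size of one backing file, as the vhost memtable needs it. */
static size_t
backfile_total_size(const struct backfile *bf)
{
	return bf->size * (bf->repeated > 0 ? (size_t)bf->repeated : 1);
}
```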

 BTW, I'm still unhappy about the ABI breakage here. I think we could easily add a --shared-mem option, which would simply change the mapping mode to SHARED. So we could use it with both hugepages (the default) and plain mmap (with --no-hugepages).

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-12 Thread Pavel Fedin
_SET_VRING_CALL
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: vring call 
idx:15 file:67
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message 
VHOST_USER_SET_VRING_NUM
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message 
VHOST_USER_SET_VRING_BASE
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message 
VHOST_USER_SET_VRING_ADDR
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message 
VHOST_USER_SET_VRING_KICK
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: vring kick 
idx:15 file:68
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: virtio is not 
ready for processing.
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message 
VHOST_USER_SET_VRING_CALL
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: vring call 
idx:17 file:69
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message 
VHOST_USER_SET_VRING_NUM
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message 
VHOST_USER_SET_VRING_BASE
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message 
VHOST_USER_SET_VRING_ADDR
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message 
VHOST_USER_SET_VRING_KICK
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: vring kick 
idx:17 file:70
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: virtio is not 
ready for processing.
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message 
VHOST_USER_SET_FEATURES
--- cut ---

 Note that during multiqueue setup the host state reverts from "now ready for processing" back to "not ready for processing". I guess this is the reason for the problem.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH 4/4] virtio/vdev: add a new vdev named eth_cvio

2016-01-12 Thread Pavel Fedin
p = (uintptr_t)&sw_ring[i]->rearm_data;
>   *(uint64_t *)p = rxvq->mbuf_initializer;
> 
> - start_dp[i].addr =
> - (uint64_t)((uintptr_t)sw_ring[i]->buf_physaddr +
> - RTE_PKTMBUF_HEADROOM - sizeof(struct virtio_net_hdr));
> + start_dp[i].addr = RTE_MBUF_DATA_DMA_ADDR(sw_ring[i], 
> rxvq->offset)
> + - sizeof(struct virtio_net_hdr);
>   start_dp[i].len = sw_ring[i]->buf_len -
>   RTE_PKTMBUF_HEADROOM + sizeof(struct virtio_net_hdr);
>   }
> @@ -366,7 +365,7 @@ virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf 
> **tx_pkts,
>   txvq->vq_descx[desc_idx + i].cookie = tx_pkts[i];
>   for (i = 0; i < nb_tail; i++) {
>   start_dp[desc_idx].addr =
> - RTE_MBUF_DATA_DMA_ADDR(*tx_pkts);
> + RTE_MBUF_DATA_DMA_ADDR(*tx_pkts, txvq->offset);
>   start_dp[desc_idx].len = (*tx_pkts)->pkt_len;
>   tx_pkts++;
>   desc_idx++;
> @@ -377,7 +376,8 @@ virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf 
> **tx_pkts,
>   for (i = 0; i < nb_commit; i++)
>   txvq->vq_descx[desc_idx + i].cookie = tx_pkts[i];
>   for (i = 0; i < nb_commit; i++) {
> - start_dp[desc_idx].addr = RTE_MBUF_DATA_DMA_ADDR(*tx_pkts);
> + start_dp[desc_idx].addr = RTE_MBUF_DATA_DMA_ADDR(*tx_pkts,
> + txvq->offset);
>   start_dp[desc_idx].len = (*tx_pkts)->pkt_len;
>   tx_pkts++;
>   desc_idx++;
> diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
> index 61b3137..dc0b656 100644
> --- a/drivers/net/virtio/virtqueue.h
> +++ b/drivers/net/virtio/virtqueue.h
> @@ -66,8 +66,14 @@ struct rte_mbuf;
> 
>  #define VIRTQUEUE_MAX_NAME_SZ 32
> 
> -#define RTE_MBUF_DATA_DMA_ADDR(mb) \
> +#ifdef RTE_VIRTIO_VDEV
> +#define RTE_MBUF_DATA_DMA_ADDR(mb, offset) \
> + (uint64_t)((uintptr_t)(*(void **)((uintptr_t)mb + offset)) \
> + + (mb)->data_off)
> +#else
> +#define RTE_MBUF_DATA_DMA_ADDR(mb, offset) \
>   (uint64_t) ((mb)->buf_physaddr + (mb)->data_off)
> +#endif /* RTE_VIRTIO_VDEV */
> 
>  #define VTNET_SQ_RQ_QUEUE_IDX 0
>  #define VTNET_SQ_TQ_QUEUE_IDX 1
> @@ -167,7 +173,8 @@ struct virtqueue {
> 
>   void*vq_ring_virt_mem;/**< linear address of vring*/
>   unsigned int vq_ring_size;
> - phys_addr_t vq_ring_mem;  /**< physical address of vring */
> + phys_addr_t vq_ring_mem;  /**< phys address of vring for pci 
> dev,
> + 
> virt addr of vring for vdev */
> 
>   struct vring vq_ring;/**< vring keeping desc, used and avail */
>   uint16_tvq_free_cnt; /**< num of desc available */
> @@ -186,8 +193,10 @@ struct virtqueue {
>*/
>   uint16_t vq_used_cons_idx;
>   uint16_t vq_avail_idx;
> + uint16_t offset; /**< relative offset to obtain addr in mbuf */
>   uint64_t mbuf_initializer; /**< value to init mbufs. */
>   phys_addr_t virtio_net_hdr_mem; /**< hdr for each xmit packet */
> + void*virtio_net_hdr_vaddr;/**< linear address of vring*/
> 
>   struct rte_mbuf **sw_ring; /**< RX software ring. */
>   /* dummy mbuf, for wraparound when processing RX ring. */
> --
> 2.1.4

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia



[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-12 Thread Pavel Fedin
 Hello!

> > b) With --single-file - ovs runs, but doesn't get any packets at all. When 
> > i try to ping
> the container from within host side, it
> > counts drops on vhost-user port.
> Can you check the OVS in host side, if it prints out message of "virtio
> is now ready for processing"?

 No, I get errors:
--- cut ---
Jan 12 10:27:43 nfv_test_x86_64 ovs-vswitchd[18858]: VHOST_CONFIG: new virtio 
connection is 38
Jan 12 10:27:43 nfv_test_x86_64 ovs-vswitchd[18858]: VHOST_CONFIG: new device, 
handle is 0
Jan 12 10:27:43 nfv_test_x86_64 ovs-vswitchd[18858]: VHOST_CONFIG: read message 
VHOST_USER_SET_OWNER
Jan 12 10:27:43 nfv_test_x86_64 ovs-vswitchd[18858]: VHOST_CONFIG: read message 
VHOST_USER_GET_FEATURES
Jan 12 10:27:43 nfv_test_x86_64 ovs-vswitchd[18858]: VHOST_CONFIG: read message 
VHOST_USER_SET_FEATURES
Jan 12 10:27:43 nfv_test_x86_64 kernel: device ovs-netdev entered promiscuous 
mode
Jan 12 10:27:43 nfv_test_x86_64 kernel: device ovs0 entered promiscuous mode
Jan 12 10:27:43 nfv_test_x86_64 ovs-vswitchd[18858]: VHOST_CONFIG: read message 
VHOST_USER_SET_MEM_TABLE
Jan 12 10:27:43 nfv_test_x86_64 ovs-vswitchd[18858]: VHOST_CONFIG: mapped 
region 0 fd:39 to:0x7f079c60 sz:0x20 off:0x0
align:0x20
Jan 12 10:27:43 nfv_test_x86_64 ovs-vswitchd[18858]: VHOST_CONFIG: read message 
VHOST_USER_SET_VRING_CALL
Jan 12 10:27:43 nfv_test_x86_64 ovs-vswitchd[18858]: VHOST_CONFIG: vring call 
idx:0 file:49
Jan 12 10:27:43 nfv_test_x86_64 ovs-vswitchd[18858]: VHOST_CONFIG: read message 
VHOST_USER_SET_VRING_NUM
Jan 12 10:27:43 nfv_test_x86_64 ovs-vswitchd[18858]: VHOST_CONFIG: read message 
VHOST_USER_SET_VRING_BASE
Jan 12 10:27:43 nfv_test_x86_64 ovs-vswitchd[18858]: VHOST_CONFIG: read message 
VHOST_USER_SET_VRING_ADDR
Jan 12 10:27:43 nfv_test_x86_64 ovs-vswitchd[18858]: VHOST_CONFIG: (0) Failed 
to find desc ring address.
--- cut ---

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH v2] pmd/virtio: fix cannot start virtio dev after stop

2016-01-11 Thread Pavel Fedin
 Tested-by: Pavel Fedin 

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


> -Original Message-
> From: Jianfeng Tan [mailto:jianfeng.tan at intel.com]
> Sent: Monday, January 11, 2016 9:16 AM
> To: dev at dpdk.org
> Cc: p.fedin at samsung.com; yuanhan.liu at linux.intel.com; Jianfeng Tan
> Subject: [PATCH v2] pmd/virtio: fix cannot start virtio dev after stop
> 
> v2 changes:
> - Address compiling error.
> - Add Reported-by.
> 
> Fix the issue that virtio device cannot be started after stopped.
> 
> The field, hw->started, should be changed by virtio_dev_start/stop instead
> of virtio_dev_close.
> 
> Reported-by: Pavel Fedin 
> Signed-off-by: Jianfeng Tan 
> Acked-by: Yuanhan Liu 
> 
> ---
>  drivers/net/virtio/virtio_ethdev.c | 14 +-
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/virtio/virtio_ethdev.c 
> b/drivers/net/virtio/virtio_ethdev.c
> index d928339..5bdd305 100644
> --- a/drivers/net/virtio/virtio_ethdev.c
> +++ b/drivers/net/virtio/virtio_ethdev.c
> @@ -490,11 +490,13 @@ virtio_dev_close(struct rte_eth_dev *dev)
> 
>   PMD_INIT_LOG(DEBUG, "virtio_dev_close");
> 
> + if (hw->started == 1)
> + virtio_dev_stop(dev);
> +
>   /* reset the NIC */
>   if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
>   vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR);
>   vtpci_reset(hw);
> - hw->started = 0;
>   virtio_dev_free_mbufs(dev);
>   virtio_free_queues(dev);
>  }
> @@ -1408,10 +1410,9 @@ eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev)
>   if (rte_eal_process_type() == RTE_PROC_SECONDARY)
>   return -EPERM;
> 
> - if (hw->started == 1) {
> - virtio_dev_stop(eth_dev);
> - virtio_dev_close(eth_dev);
> - }
> + /* Close it anyway since there's no way to know if closed */
> + virtio_dev_close(eth_dev);
> +
>   pci_dev = eth_dev->pci_dev;
> 
>   eth_dev->dev_ops = NULL;
> @@ -1612,9 +1613,12 @@ static void
>  virtio_dev_stop(struct rte_eth_dev *dev)
>  {
>   struct rte_eth_link link;
> + struct virtio_hw *hw = dev->data->dev_private;
> 
>   PMD_INIT_LOG(DEBUG, "stop");
> 
> + hw->started = 0;
> +
>   if (dev->data->dev_conf.intr_conf.lsc)
>   rte_intr_disable(&dev->pci_dev->intr_handle);
> 
> --
> 2.1.4




[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-11 Thread Pavel Fedin
 Hello!


> This patchset is to provide high performance networking interface (virtio)
> for container-based DPDK applications. The way of starting DPDK apps in
> containers with ownership of NIC devices exclusively is beyond the scope.
> The basic idea here is to present a new virtual device (named eth_cvio),
> which can be discovered and initialized in container-based DPDK apps using
> rte_eal_init(). To minimize the change, we reuse already-existing virtio
> frontend driver code (driver/net/virtio/).

 With the aforementioned fixes I tried to run it inside libvirt-lxc. I got the following:
a) With hugepages - "abort with 256 hugepage files exceed the maximum of 8 for vhost-user" - I set -m 512.
b) With --single-file - ovs runs but doesn't get any packets at all. When I try to ping the container from the host side, it counts drops on the vhost-user port.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH 3/4] virtio/vdev: add ways to interact with vhost

2016-01-11 Thread Pavel Fedin
> +#define IFF_MULTI_QUEUE  0x0100
> +#define IFF_ATTACH_QUEUE 0x0200
> +#define IFF_DETACH_QUEUE 0x0400
> +
> +/* Features for GSO (TUNSETOFFLOAD). */
> +#define TUN_F_CSUM   0x01/* You can hand me unchecksummed packets. */
> +#define TUN_F_TSO4   0x02/* I can handle TSO for IPv4 packets */
> +#define TUN_F_TSO6   0x04/* I can handle TSO for IPv6 packets */
> +#define TUN_F_TSO_ECN0x08/* I can handle TSO with ECN bits. */
> +#define TUN_F_UFO0x10/* I can handle UFO packets */
> +
> +#define PATH_NET_TUN "/dev/net/tun"
> +
> +#endif
> diff --git a/drivers/net/virtio/virtio_ethdev.h 
> b/drivers/net/virtio/virtio_ethdev.h
> index ae2d47d..9e1ecb3 100644
> --- a/drivers/net/virtio/virtio_ethdev.h
> +++ b/drivers/net/virtio/virtio_ethdev.h
> @@ -122,5 +122,8 @@ uint16_t virtio_xmit_pkts_simple(void *tx_queue, struct 
> rte_mbuf
> **tx_pkts,
>  #define VTNET_LRO_FEATURES (VIRTIO_NET_F_GUEST_TSO4 | \
>   VIRTIO_NET_F_GUEST_TSO6 | VIRTIO_NET_F_GUEST_ECN)
> 
> -
> +#ifdef RTE_VIRTIO_VDEV
> +void virtio_vdev_init(struct rte_eth_dev_data *data, const char *path,
> + int nb_rx, int nb_tx, int nb_cq, int queue_num, char *mac);
> +#endif
>  #endif /* _VIRTIO_ETHDEV_H_ */
> diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
> index 47f722a..af05ae2 100644
> --- a/drivers/net/virtio/virtio_pci.h
> +++ b/drivers/net/virtio/virtio_pci.h
> @@ -147,7 +147,6 @@ struct virtqueue;
>   * rest are per-device feature bits.
>   */
>  #define VIRTIO_TRANSPORT_F_START 28
> -#define VIRTIO_TRANSPORT_F_END   32
> 
>  /* The Guest publishes the used index for which it expects an interrupt
>   * at the end of the avail ring. Host should ignore the avail->flags field. 
> */
> @@ -165,6 +164,7 @@ struct virtqueue;
> 
>  struct virtio_hw {
>   struct virtqueue *cvq;
> +#define VIRTIO_VDEV_IO_BASE  0x
>   uint32_tio_base;
>   uint32_tguest_features;
>   uint32_tmax_tx_queues;
> @@ -174,6 +174,21 @@ struct virtio_hw {
>   uint8_t use_msix;
>   uint8_t started;
>   uint8_t mac_addr[ETHER_ADDR_LEN];
> +#ifdef RTE_VIRTIO_VDEV
> +#define VHOST_KERNEL 0
> +#define VHOST_USER   1
> + int type; /* type of backend */
> + uint32_tqueue_num;
> + char*path;
> + int mac_specified;
> + int vhostfd;
> + int backfd; /* tap device used in vhost-net */
> + int callfds[VIRTIO_MAX_VIRTQUEUES * 2 + 1];
> + int kickfds[VIRTIO_MAX_VIRTQUEUES * 2 + 1];
> + uint32_tqueue_sel;
> + uint8_t status;
> + struct rte_eth_dev_data *data;
> +#endif
>  };
> 
>  /*
> @@ -229,6 +244,39 @@ outl_p(unsigned int data, unsigned int port)
>  #define VIRTIO_PCI_REG_ADDR(hw, reg) \
>   (unsigned short)((hw)->io_base + (reg))
> 
> +#ifdef RTE_VIRTIO_VDEV
> +uint32_t virtio_ioport_read(struct virtio_hw *, uint64_t);
> +void virtio_ioport_write(struct virtio_hw *, uint64_t, uint32_t);
> +
> +#define VIRTIO_READ_REG_1(hw, reg) \
> + (hw->io_base != VIRTIO_VDEV_IO_BASE) ? \
> + inb((VIRTIO_PCI_REG_ADDR((hw), (reg \
> + :virtio_ioport_read(hw, reg)
> +#define VIRTIO_WRITE_REG_1(hw, reg, value) \
> + (hw->io_base != VIRTIO_VDEV_IO_BASE) ? \
> + outb_p((unsigned char)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg \
> + :virtio_ioport_write(hw, reg, value)
> +
> +#define VIRTIO_READ_REG_2(hw, reg) \
> + (hw->io_base != VIRTIO_VDEV_IO_BASE) ? \
> + inw((VIRTIO_PCI_REG_ADDR((hw), (reg \
> + :virtio_ioport_read(hw, reg)
> +#define VIRTIO_WRITE_REG_2(hw, reg, value) \
> + (hw->io_base != VIRTIO_VDEV_IO_BASE) ? \
> + outw_p((unsigned short)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg \
> + :virtio_ioport_write(hw, reg, value)
> +
> +#define VIRTIO_READ_REG_4(hw, reg) \
> + (hw->io_base != VIRTIO_VDEV_IO_BASE) ? \
> + inl((VIRTIO_PCI_REG_ADDR((hw), (reg \
> + :virtio_ioport_read(hw, reg)
> +#define VIRTIO_WRITE_REG_4(hw, reg, value) \
> + (hw->io_base != VIRTIO_VDEV_IO_BASE) ? \
> + outl_p((unsigned int)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg \
> + :virtio_ioport_write(hw, reg, value)
> +
> +#else /* RTE_VIRTIO_VDEV */
> +
>  #define VIRTIO_READ_REG_1(hw, reg) \
>   inb((VIRTIO_PCI_REG_ADDR((hw), (reg
>  #define VIRTIO_WRITE_REG_1(hw, reg, value) \
> @@ -244,6 +292,8 @@ outl_p(unsigned int data, unsigned int port)
>  #define VIRTIO_WRITE_REG_4(hw, reg, value) \
>   outl_p((unsigned int)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg
> 
> +#endif /* RTE_VIRTIO_VDEV */
> +
>  static inline int
>  vtpci_with_feature(struct virtio_hw *hw, uint32_t bit)
>  {
> --
> 2.1.4

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH 2/4] mem: add API to obstain memory-backed file info

2016-01-11 Thread Pavel Fedin
 Hello!

> -Original Message-
> From: Jianfeng Tan [mailto:jianfeng.tan at intel.com]
> Sent: Sunday, January 10, 2016 2:43 PM
> To: dev at dpdk.org
> Cc: rich.lane at bigswitch.com; yuanhan.liu at linux.intel.com; mst at 
> redhat.com;
> nakajima.yoshihiro at lab.ntt.co.jp; huawei.xie at intel.com; mukawa at 
> igel.co.jp;
> p.fedin at samsung.com; michael.qiu at intel.com; ann.zhuangyanying at 
> huawei.com; Jianfeng Tan
> Subject: [PATCH 2/4] mem: add API to obstain memory-backed file info

 "obtain" - typo in subject

> 
> A new API named rte_eal_get_backfile_info() and a new data
> struct back_file is added to obstain information of memory-
> backed file info.
> 
> Signed-off-by: Huawei Xie 
> Signed-off-by: Jianfeng Tan 
> ---
>  lib/librte_eal/common/include/rte_memory.h | 16 +
>  lib/librte_eal/linuxapp/eal/eal_memory.c   | 37 
> ++
>  2 files changed, 53 insertions(+)
> 
> diff --git a/lib/librte_eal/common/include/rte_memory.h
> b/lib/librte_eal/common/include/rte_memory.h
> index 9c9e40f..75ef8db 100644
> --- a/lib/librte_eal/common/include/rte_memory.h
> +++ b/lib/librte_eal/common/include/rte_memory.h
> @@ -109,6 +109,22 @@ struct rte_memseg {
>  } __rte_packed;
> 
>  /**
> + * This struct is used to store information about memory-backed file that
> + * we mapped in memory initialization.
> + */
> +struct back_file {
> + void *addr; /**< virtual addr */
> + size_t size;/**< the page size */
> + char filepath[PATH_MAX]; /**< path to backing file on filesystem */
> +};
> +
> +/**
> +  * Get the hugepage file information. Caller to free.
> +  * Return number of hugepage files used.
> +  */
> +int rte_eal_get_backfile_info(struct back_file **);
> +
> +/**
>   * Lock page in physical memory and prevent from swapping.
>   *
>   * @param virt
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c
> b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 2bb1163..6ca1404 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -758,6 +758,9 @@ sort_by_physaddr(struct hugepage_file *hugepg_tbl, struct 
> hugepage_info
> *hpi)
>   return 0;
>  }
> 
> +static struct hugepage_file *hugepage_files;
> +static int num_hugepage_files;
> +
>  /*
>   * Uses mmap to create a shared memory area for storage of data
>   * Used in this file to store the hugepage file map on disk
> @@ -776,9 +779,29 @@ create_shared_memory(const char *filename, const size_t 
> mem_size)
>   retval = mmap(NULL, mem_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 
> 0);
>   close(fd);
> 
> + hugepage_files = retval;
> + num_hugepage_files = mem_size / (sizeof(struct hugepage_file));
> +
>   return retval;
>  }
> 
> +int
> +rte_eal_get_backfile_info(struct back_file **p)
> +{
> + struct back_file *backfiles;
> + int i, num_backfiles = num_hugepage_files;
> +
> + backfiles = malloc(sizeof(struct back_file) * num_backfiles);
> + for (i = 0; i < num_backfiles; ++i) {
> + backfiles[i].addr = hugepage_files[i].final_va;
> + backfiles[i].size = hugepage_files[i].size;
> + strcpy(backfiles[i].filepath, hugepage_files[i].filepath);
> + }
> +
> + *p = backfiles;
> + return num_backfiles;
> +}
> +
>  /*
>   * this copies *active* hugepages from one hugepage table to another.
>   * destination is typically the shared memory.
> @@ -1157,6 +1180,20 @@ rte_eal_hugepage_init(void)
>   mcfg->memseg[0].len = internal_config.memory;
>   mcfg->memseg[0].socket_id = socket_id;
> 
> + hugepage = create_shared_memory(eal_hugepage_info_path(),
> + sizeof(struct hugepage_file));
> + hugepage->orig_va = addr;
> + hugepage->final_va = addr;
> + hugepage->physaddr = rte_mem_virt2phy(addr);
> + hugepage->size = pagesize;
> + hugepage->socket_id = socket_id;
> + hugepage->file_id = 0;
> + hugepage->memseg_id = 0;
> +#ifdef RTE_EAL_SINGLE_FILE_SEGMENTS
> + hugepage->repeated = internal_config.memory / pagesize;
> +#endif
> + strncpy(hugepage->filepath, filepath, MAX_HUGEPAGE_PATH);
> +
>   close(fd);
> 
>   return 0;
> --
> 2.1.4

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia



[dpdk-dev] [PATCH] pmd/virtio: fix cannot start virtio dev after stop

2016-01-11 Thread Pavel Fedin
 Hello!

 I tried to apply your patch to master and got compile errors. See inline.

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jianfeng Tan
> Sent: Tuesday, January 05, 2016 4:08 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] pmd/virtio: fix cannot start virtio dev after stop
> 
> Fix the issue that virtio device cannot be started after stopped.
> 
> The field, hw->started, should be changed by virtio_dev_start/stop instead
> of virtio_dev_close.
> 
> Signed-off-by: Jianfeng Tan 
> ---
>  drivers/net/virtio/virtio_ethdev.c | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/virtio/virtio_ethdev.c 
> b/drivers/net/virtio/virtio_ethdev.c
> index d928339..07fe271 100644
> --- a/drivers/net/virtio/virtio_ethdev.c
> +++ b/drivers/net/virtio/virtio_ethdev.c
> @@ -490,11 +490,13 @@ virtio_dev_close(struct rte_eth_dev *dev)
> 
>   PMD_INIT_LOG(DEBUG, "virtio_dev_close");
> 
> + if (hw->started == 1)
> + virtio_dev_stop(eth_dev);
> +

 'dev', not 'eth_dev', here.

>   /* reset the NIC */
>   if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
>   vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR);
>   vtpci_reset(hw);
> - hw->started = 0;
>   virtio_dev_free_mbufs(dev);
>   virtio_free_queues(dev);
>  }
> @@ -1408,10 +1410,9 @@ eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev)
>   if (rte_eal_process_type() == RTE_PROC_SECONDARY)
>   return -EPERM;
> 
> - if (hw->started == 1) {
> - virtio_dev_stop(eth_dev);
> - virtio_dev_close(eth_dev);
> - }
> + /* Close it anyway since there's no way to know if closed */
> + virtio_dev_close(eth_dev);
> +
>   pci_dev = eth_dev->pci_dev;
> 
>   eth_dev->dev_ops = NULL;
> @@ -1615,6 +1616,8 @@ virtio_dev_stop(struct rte_eth_dev *dev)
> 
>   PMD_INIT_LOG(DEBUG, "stop");
> 
> + hw->started = 0;
> +

 'hw' is not declared in this function; you have to add it.

>   if (dev->data->dev_conf.intr_conf.lsc)
>   rte_intr_disable(&dev->pci_dev->intr_handle);
> 
> --
> 2.1.4

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH 3/4] virtio/vdev: add ways to interact with vhost

2016-01-11 Thread Pavel Fedin
> +#define TUN_F_CSUM   0x01/* You can hand me unchecksummed packets. */
> +#define TUN_F_TSO4   0x02/* I can handle TSO for IPv4 packets */
> +#define TUN_F_TSO6   0x04/* I can handle TSO for IPv6 packets */
> +#define TUN_F_TSO_ECN0x08/* I can handle TSO with ECN bits. */
> +#define TUN_F_UFO0x10/* I can handle UFO packets */
> +
> +#define PATH_NET_TUN "/dev/net/tun"
> +
> +#endif
> diff --git a/drivers/net/virtio/virtio_ethdev.h 
> b/drivers/net/virtio/virtio_ethdev.h
> index ae2d47d..9e1ecb3 100644
> --- a/drivers/net/virtio/virtio_ethdev.h
> +++ b/drivers/net/virtio/virtio_ethdev.h
> @@ -122,5 +122,8 @@ uint16_t virtio_xmit_pkts_simple(void *tx_queue, struct 
> rte_mbuf
> **tx_pkts,
>  #define VTNET_LRO_FEATURES (VIRTIO_NET_F_GUEST_TSO4 | \
>   VIRTIO_NET_F_GUEST_TSO6 | VIRTIO_NET_F_GUEST_ECN)
> 
> -
> +#ifdef RTE_VIRTIO_VDEV
> +void virtio_vdev_init(struct rte_eth_dev_data *data, const char *path,
> + int nb_rx, int nb_tx, int nb_cq, int queue_num, char *mac);
> +#endif
>  #endif /* _VIRTIO_ETHDEV_H_ */
> diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
> index 47f722a..af05ae2 100644
> --- a/drivers/net/virtio/virtio_pci.h
> +++ b/drivers/net/virtio/virtio_pci.h
> @@ -147,7 +147,6 @@ struct virtqueue;
>   * rest are per-device feature bits.
>   */
>  #define VIRTIO_TRANSPORT_F_START 28
> -#define VIRTIO_TRANSPORT_F_END   32

 I understand that this #define is not used, but... Maybe we should do this 
cleanup as a separate patch? Otherwise it's hard to track this change (I 
believe this definition had some use in the past).

> 
>  /* The Guest publishes the used index for which it expects an interrupt
>   * at the end of the avail ring. Host should ignore the avail->flags field. 
> */
> @@ -165,6 +164,7 @@ struct virtqueue;
> 
>  struct virtio_hw {
>   struct virtqueue *cvq;
> +#define VIRTIO_VDEV_IO_BASE  0x
>   uint32_tio_base;
>   uint32_tguest_features;
>   uint32_tmax_tx_queues;
> @@ -174,6 +174,21 @@ struct virtio_hw {
>   uint8_t use_msix;
>   uint8_t started;
>   uint8_t mac_addr[ETHER_ADDR_LEN];
> +#ifdef RTE_VIRTIO_VDEV
> +#define VHOST_KERNEL 0
> +#define VHOST_USER   1
> + int type; /* type of backend */
> + uint32_tqueue_num;
> + char*path;
> + int mac_specified;
> + int vhostfd;
> + int backfd; /* tap device used in vhost-net */
> + int callfds[VIRTIO_MAX_VIRTQUEUES * 2 + 1];
> + int kickfds[VIRTIO_MAX_VIRTQUEUES * 2 + 1];
> + uint32_tqueue_sel;
> + uint8_t status;
> + struct rte_eth_dev_data *data;
> +#endif

 Actually I am currently working on this too, and I decided to use a different 
approach. I moved these extra fields into a separate structure, changed 
'io_base' to a pointer, and now I can store a pointer to this extra structure 
there. The device type can easily be determined by the 
(dev->dev_type == RTE_ETH_DEV_PCI) check, so you don't need the 
VIRTIO_VDEV_IO_BASE magic value.

>  };
> 
>  /*
> @@ -229,6 +244,39 @@ outl_p(unsigned int data, unsigned int port)
>  #define VIRTIO_PCI_REG_ADDR(hw, reg) \
>   (unsigned short)((hw)->io_base + (reg))
> 
> +#ifdef RTE_VIRTIO_VDEV
> +uint32_t virtio_ioport_read(struct virtio_hw *, uint64_t);
> +void virtio_ioport_write(struct virtio_hw *, uint64_t, uint32_t);
> +
> +#define VIRTIO_READ_REG_1(hw, reg) \
> + (hw->io_base != VIRTIO_VDEV_IO_BASE) ? \
> + inb((VIRTIO_PCI_REG_ADDR((hw), (reg \
> + :virtio_ioport_read(hw, reg)
> +#define VIRTIO_WRITE_REG_1(hw, reg, value) \
> + (hw->io_base != VIRTIO_VDEV_IO_BASE) ? \
> + outb_p((unsigned char)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg \
> + :virtio_ioport_write(hw, reg, value)
> +
> +#define VIRTIO_READ_REG_2(hw, reg) \
> + (hw->io_base != VIRTIO_VDEV_IO_BASE) ? \
> + inw((VIRTIO_PCI_REG_ADDR((hw), (reg \
> + :virtio_ioport_read(hw, reg)
> +#define VIRTIO_WRITE_REG_2(hw, reg, value) \
> + (hw->io_base != VIRTIO_VDEV_IO_BASE) ? \
> + outw_p((unsigned short)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg \
> + :virtio_ioport_write(hw, reg, value)
> +
> +#define VIRTIO_READ_REG_4(hw, reg) \
> + (hw->io_base != VIRTIO_VDEV_IO_BASE) ? \
> + inl((VIRTIO_PCI_REG_ADDR((hw), (reg \
> + :virtio_ioport_read(hw, reg)
> +#define VIRTIO_WRITE_REG_4(hw, reg, value) \
> + (hw->io_base != VIRTIO_VDEV_IO_BASE) ? \
> + outl_p((unsigned int)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg \
> + :virtio_ioport_write(hw, reg, value)

 I also decided to add two fields to 'hw' where pointers to these accessors 
are stored. I think this should be faster; however, yes, this is not 
performance-critical code, because it's executed only during initialization.

> +
> +#else /* RTE_VIRTIO_VDEV */
> +
>  #define VIRTIO_READ_REG_1(hw, reg) \
>   inb((VIRTIO_PCI_REG_ADDR((hw), (reg
>  #define VIRTIO_WRITE_REG_1(hw, reg, value) \
> @@ -244,6 +292,8 @@ outl_p(unsigned int data, unsigned int port)
>  #define VIRTIO_WRITE_REG_4(hw, reg, value) \
>   outl_p((unsigned int)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg
> 
> +#endif /* RTE_VIRTIO_VDEV */
> +
>  static inline int
>  vtpci_with_feature(struct virtio_hw *hw, uint32_t bit)
>  {
> --
> 2.1.4

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia



[dpdk-dev] [RFC 0/5] virtio support for container

2015-12-31 Thread Pavel Fedin
 Hello!

 Last-minute note. I have found the problem but have no time to research and 
fix it.
 It happens because ovs first creates the device, starts it, then stops it and 
reconfigures the queues. The second queue allocation happens from within 
netdev_set_multiq(). Then ovs restarts the device and proceeds to actually 
using it.
 But the queues are not initialized properly in DPDK after the second 
allocation, because of this check:

/* On restart after stop do not touch queues */
if (hw->started)
return 0;

 It prevents us from calling virtio_dev_rxtx_start(), which should in turn 
call virtio_dev_vring_start(), which calls vring_init(). So VIRTQUEUE_NUSED() 
dies badly, because vq->vq_ring contains only NULLs.
 See you all after 10th. And happy New Year again!

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

> -Original Message-
> From: Pavel Fedin [mailto:p.fedin at samsung.com]
> Sent: Thursday, December 31, 2015 4:47 PM
> To: 'Tan, Jianfeng'; 'dev at dpdk.org'
> Subject: RE: [dpdk-dev] [RFC 0/5] virtio support for container
> 
>  Hello!
> 
> > > a) ovs_in_container does not send VHOST_USER_SET_MEM_TABLE
> > Please check if rte_eth_dev_start() is called.
> > (rte_eth_dev_start -> virtio_dev_start -> vtpci_reinit_complete -> 
> > kick_all_vq)
> 
>  I've figured out what happened, and it's my fault only :( I have modified 
> your patchset and
> added --shared-mem option. And forgot to specify it to gdb :) Without it 
> memory is not shared,
> and rte_memseg_info_get() returned fd = -1. And if you put it into control 
> message for
> sendmsg(), you get your -EBADF.
>  So please ignore this.
>  But, nevertheless, ovs in container still dies with:
> --- cut ---
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fff97fff700 (LWP 3866)]
> virtio_recv_mergeable_pkts (rx_queue=0x7fffd46a9a80, rx_pkts=0x7fff97ffe850, 
> nb_pkts=32) at
> /home/p.fedin/dpdk/drivers/net/virtio/virtio_rxtx.c:683
> 683   /home/p.fedin/dpdk/drivers/net/virtio/virtio_rxtx.c: No such file or 
> directory.
> Missing separate debuginfos, use: dnf debuginfo-install 
> keyutils-libs-1.5.9-7.fc23.x86_64
> krb5-libs-1.13.2-11.fc23.x86_64 libcap-ng-0.7.7-2.fc23.x86_64 
> libcom_err-1.42.13-
> 3.fc23.x86_64 libselinux-2.4-4.fc23.x86_64 openssl-libs-1.0.2d-2.fc23.x86_64 
> pcre-8.37-
> 4.fc23.x86_64 zlib-1.2.8-9.fc23.x86_64
> (gdb) where
> #0  virtio_recv_mergeable_pkts (rx_queue=0x7fffd46a9a80, 
> rx_pkts=0x7fff97ffe850, nb_pkts=32)
> at /home/p.fedin/dpdk/drivers/net/virtio/virtio_rxtx.c:683
> #1  0x00669ee8 in rte_eth_rx_burst (nb_pkts=32, 
> rx_pkts=0x7fff97ffe850, queue_id=0,
> port_id=0 '\000') at /home/p.fedin/dpdk/build/include/rte_ethdev.h:2510
> #2  netdev_dpdk_rxq_recv (rxq_=, packets=0x7fff97ffe850, 
> c=0x7fff97ffe84c) at
> lib/netdev-dpdk.c:1033
> #3  0x005e8ca1 in netdev_rxq_recv (rx=,
> buffers=buffers at entry=0x7fff97ffe850, cnt=cnt at entry=0x7fff97ffe84c) at 
> lib/netdev.c:654
> #4  0x005cb338 in dp_netdev_process_rxq_port (pmd=pmd at 
> entry=0x7fffac7f8010,
> rxq=, port=, port=) at 
> lib/dpif-netdev.c:2510
> #5  0x005cc649 in pmd_thread_main (f_=0x7fffac7f8010) at 
> lib/dpif-netdev.c:2671
> #6  0x00628424 in ovsthread_wrapper (aux_=) at 
> lib/ovs-thread.c:340
> #7  0x770f660a in start_thread () from /lib64/libpthread.so.0
> #8  0x76926bbd in clone () from /lib64/libc.so.6
> (gdb)
> --- cut ---
> 
>  and l2fwd does not reproduce this. So, let's wait until 11.01.2016. And 
> happy New Year to
> everybody who reads it (and who doesn't) :)
> 
> Kind regards,
> Pavel Fedin
> Expert Engineer
> Samsung Electronics Research center Russia




[dpdk-dev] [RFC 0/5] virtio support for container

2015-12-31 Thread Pavel Fedin
 Hello!

> > a) ovs_in_container does not send VHOST_USER_SET_MEM_TABLE
> Please check if rte_eth_dev_start() is called.
> (rte_eth_dev_start -> virtio_dev_start -> vtpci_reinit_complete -> 
> kick_all_vq)

 I've figured out what happened, and it's my fault only :( I have modified 
your patchset and added a --shared-mem option. And I forgot to specify it to 
gdb :) Without it memory is not shared, and rte_memseg_info_get() returned 
fd = -1. And if you put that into the control message for sendmsg(), you get 
your -EBADF.
 So please ignore this.
 But, nevertheless, ovs in container still dies with:
--- cut ---
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff97fff700 (LWP 3866)]
virtio_recv_mergeable_pkts (rx_queue=0x7fffd46a9a80, rx_pkts=0x7fff97ffe850, 
nb_pkts=32) at
/home/p.fedin/dpdk/drivers/net/virtio/virtio_rxtx.c:683
683 /home/p.fedin/dpdk/drivers/net/virtio/virtio_rxtx.c: No such file or 
directory.
Missing separate debuginfos, use: dnf debuginfo-install 
keyutils-libs-1.5.9-7.fc23.x86_64 krb5-libs-1.13.2-11.fc23.x86_64
libcap-ng-0.7.7-2.fc23.x86_64 libcom_err-1.42.13-3.fc23.x86_64 
libselinux-2.4-4.fc23.x86_64 openssl-libs-1.0.2d-2.fc23.x86_64
pcre-8.37-4.fc23.x86_64 zlib-1.2.8-9.fc23.x86_64
(gdb) where
#0  virtio_recv_mergeable_pkts (rx_queue=0x7fffd46a9a80, 
rx_pkts=0x7fff97ffe850, nb_pkts=32) at
/home/p.fedin/dpdk/drivers/net/virtio/virtio_rxtx.c:683
#1  0x00669ee8 in rte_eth_rx_burst (nb_pkts=32, rx_pkts=0x7fff97ffe850, 
queue_id=0, port_id=0 '\000') at
/home/p.fedin/dpdk/build/include/rte_ethdev.h:2510
#2  netdev_dpdk_rxq_recv (rxq_=, packets=0x7fff97ffe850, 
c=0x7fff97ffe84c) at lib/netdev-dpdk.c:1033
#3  0x005e8ca1 in netdev_rxq_recv (rx=, buffers=buffers 
at entry=0x7fff97ffe850, cnt=cnt at entry=0x7fff97ffe84c)
at lib/netdev.c:654
#4  0x005cb338 in dp_netdev_process_rxq_port (pmd=pmd at 
entry=0x7fffac7f8010, rxq=, port=,
port=) at lib/dpif-netdev.c:2510
#5  0x005cc649 in pmd_thread_main (f_=0x7fffac7f8010) at 
lib/dpif-netdev.c:2671
#6  0x00628424 in ovsthread_wrapper (aux_=) at 
lib/ovs-thread.c:340
#7  0x770f660a in start_thread () from /lib64/libpthread.so.0
#8  0x76926bbd in clone () from /lib64/libc.so.6
(gdb)
--- cut ---

 and l2fwd does not reproduce this. So, let's wait until 11.01.2016. And happy 
New Year to everybody who reads it (and who doesn't)
:)

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [RFC 0/5] virtio support for container

2015-12-31 Thread Pavel Fedin
 Hello!

> >  Here you ignore errors. And this particular request for some reason ends up
> > in EBADF. The most magic part is that sometimes it just
> > works...
> >  Not sure if i can finish it today, and here in Russia we have New Year 
> > holidays
> > until 11th.
> 
> Oops, I made a mistake here. I got vhost_user_read() and vhost_user_write() 
> backwards.

 But nevertheless they do the right thing. vhost_user_read() actually writes 
the message into the socket, and vhost_user_write() reads it. So they should 
work correctly.
 I've just checked; the fd number is not corrupted.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [RFC 0/5] virtio support for container

2015-12-31 Thread Pavel Fedin
Hello!

> > a) ovs_in_container does not send VHOST_USER_SET_MEM_TABLE
> Please check if rte_eth_dev_start() is called.
> (rte_eth_dev_start -> virtio_dev_start -> vtpci_reinit_complete -> 
> kick_all_vq)
> 
> > b) set_vring_addr() does not make sure that dev->mem is set.
> >  I am preparing a patch to fix (b).
> 
> Yes, it seems like a bug, lack of necessary check.

 I've made some progress on (a). It's tricky. It is caused by this fragment:

if (vhost_user_read(vhost->sockfd, &msg, len, fds, fd_num) < 0)
return 0;

 Here you ignore errors. And this particular request for some reason ends up 
in EBADF. The most magical part is that sometimes it just works...
 I'm not sure I can finish it today, and here in Russia we have New Year 
holidays until the 11th.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [RFC 0/5] virtio support for container

2015-12-31 Thread Pavel Fedin
 Hello!

> Before you start another ovs_in_container, previous ones get killed?

 Of course. It crashes.

> If so, vhost information in ovs_on_host will be wiped as the unix socket is 
> broken.

 Yes. And ovs_on_host crashes because:
a) ovs_in_container does not send VHOST_USER_SET_MEM_TABLE (I don't know why 
yet)
b) set_vring_addr() does not make sure that dev->mem is set.

 I am preparing a patch to fix (b).

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [RFC 0/5] virtio support for container

2015-12-31 Thread Pavel Fedin
 Hello!

> First of all, when you say openvswitch, are you referring to ovs-dpdk?

 I am referring to mainline ovs, compiled with dpdk and using the userspace 
dataplane.
 AFAIK ovs-dpdk is an early Intel fork, which is abandoned at the moment.

> And can you detail your test case? Like, how do you want ovs_on_host and 
> ovs_in_container to
> be connected?
> Through two-direct-connected physical NICs, or one vhost port in ovs_on_host 
> and one virtio
> port in ovs_in_container?

 vhost port, i.e.:

                       |
 LOCAL--dpdkvhostuser<-+->cvio--LOCAL
        ovs            |    ovs
                       |
        host           | container

 By this time I had advanced in my research. ovs not only crashes by itself, 
but manages to crash the host side as well. It does this by performing the 
reconfiguration sequence without sending VHOST_USER_SET_MEM_TABLE; therefore 
host-side ovs tries to refer to old addresses and dies badly.
 Those messages about the memory pool already being present are perhaps OK.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [RFC 0/5] virtio support for container

2015-12-30 Thread Pavel Fedin
 Hello everybody!

 I am currently working on an improved version of this patchset, and I am 
testing it with openvswitch. I run two openvswitch instances: one on the host 
and one in a container. Both ovs instances forward packets between their 
LOCAL port and a vhost/virtio port. This way I can comfortably run PING 
between my host and the container.
 The problem is that the patchset seems to be broken somehow. ovs-vswitchd 
fails to open the dpdk0 device, and if I set --log-level=9 for DPDK, I see 
this in the console:
--- cut ---
Broadcast message from systemd-journald at localhost.localdomain (Wed 
2015-12-30 11:13:00 MSK):

ovs-vswitchd[557]: EAL: TSC frequency is ~3400032 KHz


Broadcast message from systemd-journald at localhost.localdomain (Wed 
2015-12-30 11:13:00 MSK):

ovs-vswitchd[560]: EAL: memzone_reserve_aligned_thread_unsafe(): memzone 
 already exists


Broadcast message from systemd-journald at localhost.localdomain (Wed 
2015-12-30 11:13:00 MSK):

ovs-vswitchd[560]: RING: Cannot reserve memory
--- cut ---

 How can I debug this?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia





[dpdk-dev] [PATCH v1 2/2] virtio: Extend virtio-net PMD to support container environment

2015-12-28 Thread Pavel Fedin
 Hello!

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Tetsuya Mukawa
> Sent: Wednesday, December 16, 2015 11:37 AM
> To: dev at dpdk.org
> Cc: nakajima.yoshihiro at lab.ntt.co.jp; mst at redhat.com
> Subject: [dpdk-dev] [PATCH v1 2/2] virtio: Extend virtio-net PMD to support 
> container
> environment
> 
> The patch adds a new virtio-net PMD configuration that allows the PMD to
> work on host as if the PMD is in VM.
> Here is new configuration for virtio-net PMD.
>  - CONFIG_RTE_LIBRTE_VIRTIO_HOST_MODE
> To use this mode, EAL needs physically contiguous memory. To allocate
> such memory, enable below option, and add "--contig-mem" option to
> application command line.
>  - CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS
> 
> To prepare virtio-net device on host, the users need to invoke QEMU process
> in special qtest mode. This mode is mainly used for testing QEMU devices
> from outer process. In this mode, no guest runs.
> Here is QEMU command line.
> 
>  $ qemu-system-x86_64 \
>   -machine pc-i440fx-1.4,accel=qtest \
>   -display none -qtest-log /dev/null \
>   -qtest unix:/tmp/socket,server \
>   -netdev type=tap,script=/etc/qemu-ifup,id=net0,queues=1 \
>   -device virtio-net-pci,netdev=net0,mq=on \
>   -chardev socket,id=chr1,path=/tmp/ivshmem,server \
>   -device ivshmem,size=1G,chardev=chr1,vectors=1
> 
> * QEMU process is needed per port.
> * In most cases, just using above command is enough.
> * The vhost backends like vhost-net and vhost-user can be specified.
> * Only checked "pc-i440fx-1.4" machine, but may work with other
>   machines. It depends on a machine has piix3 south bridge.
>   If the machine doesn't have, virtio-net PMD cannot receive status
>   changed interrupts.
> * Should not add "--enable-kvm" to QEMU command line.
> 
> After invoking QEMU, the PMD can connect to QEMU process using unix
> domain sockets. Over these sockets, virtio-net, ivshmem and piix3
> device in QEMU are probed by the PMD.
> Here is example of command line.
> 
>  $ testpmd -c f -n 1 -m 1024 --contig-mem \
>  --vdev="eth_virtio_net0,qtest=/tmp/socket,ivshmem=/tmp/ivshmem" \
>  -- --disable-hw-vlan --txqflags=0xf00 -i
> 
> Please specify same unix domain sockets and memory size in both QEMU and
> DPDK command lines like above.
> The share memory size should be power of 2, because ivshmem only accepts
> such memry size.
> 
> Also, "--contig-mem" option is needed for the PMD like above. This option
> allocates contiguous memory, and create one hugepage file on hugetlbfs.
> If there is no enough contiguous memory, initialization will be failed.
> 
> This contiguous memory is used as shared memory between DPDK application
> and ivshmem device in QEMU.
> 
> Signed-off-by: Tetsuya Mukawa 
> ---
>  config/common_linuxapp |1 +
>  drivers/net/virtio/Makefile|4 +
>  drivers/net/virtio/qtest.c | 1107 
> 
>  drivers/net/virtio/virtio_ethdev.c |  341 ++-
>  drivers/net/virtio/virtio_ethdev.h |   12 +
>  drivers/net/virtio/virtio_pci.h|   25 +
>  6 files changed, 1461 insertions(+), 29 deletions(-)
>  create mode 100644 drivers/net/virtio/qtest.c
> 
> diff --git a/config/common_linuxapp b/config/common_linuxapp
> index 74bc515..eaa720c 100644
> --- a/config/common_linuxapp
> +++ b/config/common_linuxapp
> @@ -269,6 +269,7 @@ CONFIG_RTE_LIBRTE_PMD_SZEDATA2=n
>  # Compile burst-oriented VIRTIO PMD driver
>  #
>  CONFIG_RTE_LIBRTE_VIRTIO_PMD=y
> +CONFIG_RTE_LIBRTE_VIRTIO_HOST_MODE=n
>  CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_INIT=n
>  CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_RX=n
>  CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_TX=n
> diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
> index 43835ba..697e629 100644
> --- a/drivers/net/virtio/Makefile
> +++ b/drivers/net/virtio/Makefile
> @@ -52,6 +52,10 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx.c
>  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_ethdev.c
>  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple.c
> 
> +ifeq ($(CONFIG_RTE_LIBRTE_VIRTIO_HOST_MODE),y)
> + SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += qtest.c
> +endif
> +
>  # this lib depends upon:
>  DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_eal lib/librte_ether
>  DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_mempool lib/librte_mbuf
> diff --git a/drivers/net/virtio/qtest.c b/drivers/net/virtio/qtest.c
> new file mode 100644
> index 000..4ffdefb
> --- /dev/null
> +++ b/drivers/net/virtio/qtest.c
> @@ -0,0 +1,1107 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2015 IGEL Co., Ltd. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + * * Redistributions of source code must retain the above

[dpdk-dev] [PATCH v2 5/6] vhost: claim that we support GUEST_ANNOUNCE feature

2015-12-22 Thread Pavel Fedin
 Hello!

> > diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
> > index 03044f6..0ba5045 100644
> > --- a/lib/librte_vhost/virtio-net.c
> > +++ b/lib/librte_vhost/virtio-net.c
> > @@ -74,6 +74,7 @@ static struct virtio_net_config_ll *ll_root;
> >  #define VHOST_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
> > (1ULL << VIRTIO_NET_F_CTRL_VQ) | \
> > (1ULL << VIRTIO_NET_F_CTRL_RX) | \
> > +   (1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) | \
> 
> Do we really need this? I can understand when guest declare with
> this VIRTIO_NET_F_GUEST_ANNOUNCE flag. With that, guest itself will
> handle the announcement after migration. However, how could I
> understand if it's declared by a vhost-user backend?

 I guess the documentation is unclear. This is due to the way qemu works. It 
queries the vhost-user backend for its features, then offers them to the 
guest. The guest then responds with the features FROM THE SUGGESTED SET that 
it supports. So, if the backend does not claim to support this feature, qemu 
will not offer it to the guest, and therefore the guest will not try to 
activate it.
 I think this is done because this feature is only useful for migration. If a 
vhost-user backend does not support migration, it needs neither 
VHOST_USER_SEND_RARP nor the guest-side announce.
 Actually, I was once thinking about patching qemu, but... The changeset 
seemed too complicated, and I imagined the situation described in the above 
sentence, so I decided to abandon it.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH v2 0/6] vhost-user live migration support

2015-12-21 Thread Pavel Fedin
 Works fine.

 Tested-by: Pavel Fedin 

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

> -Original Message-
> From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com]
> Sent: Thursday, December 17, 2015 6:12 AM
> To: dev at dpdk.org
> Cc: huawei.xie at intel.com; Michael S. Tsirkin; Victor Kaplansky; Iremonger 
> Bernard; Pavel
> Fedin; Peter Xu; Yuanhan Liu; Chen Zhihui; Yang Maggie
> Subject: [PATCH v2 0/6] vhost-user live migration support
> 
> This patch set adds the vhost-user live migration support.
> 
> The major task behind that is to log pages we touched during
> live migration, including used vring and desc buffer. So, this
> patch set is basically about adding vhost log support, and
> using it.
> 
> Patchset
> 
> - Patch 1 handles VHOST_USER_SET_LOG_BASE, which tells us where
>   the dirty memory bitmap is.
> 
> - Patch 2 introduces a vhost_log_write() helper function to log
>   pages we are gonna change.
> 
> - Patch 3 logs changes we made to used vring.
> 
> - Patch 4 logs changes we made to vring desc buffer.
> 
> - Patch 5 and 6 add some feature bits related to live migration.
> 
> 
> A simple test guide (on same host)
> ==
> 
> The following test is based on OVS + DPDK (check [0] for
> how to setup OVS + DPDK):
> 
> [0]: http://wiki.qemu.org/Features/vhost-user-ovs-dpdk
> 
> Here is the rough test guide:
> 
> 1. start ovs-vswitchd
> 
> 2. Add two ovs vhost-user port, say vhost0 and vhost1
> 
> 3. Start a VM1 to connect to vhost0. Here is my example:
> 
>$ $QEMU -enable-kvm -m 1024 -smp 4 \
>-chardev socket,id=char0,path=/var/run/openvswitch/vhost0  \
>-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
>-device virtio-net-pci,netdev=mynet1,mac=52:54:00:12:34:58 \
>-object 
> memory-backend-file,id=mem,size=1024M,mem-path=$HOME/hugetlbfs,share=on \
>-numa node,memdev=mem -mem-prealloc \
>-kernel $HOME/iso/vmlinuz -append "root=/dev/sda1" \
>-hda fc-19-i386.img \
>-monitor telnet::,server,nowait -curses
> 
> 4. run "ping $host" inside VM1
> 
> 5. Start VM2 to connect to vhost0, and marking it as the target
>of live migration (by adding -incoming tcp:0: option)
> 
>$ $QEMU -enable-kvm -m 1024 -smp 4 \
>-chardev socket,id=char0,path=/var/run/openvswitch/vhost1  \
>-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
>-device virtio-net-pci,netdev=mynet1,mac=52:54:00:12:34:58 \
>-object 
> memory-backend-file,id=mem,size=1024M,mem-path=$HOME/hugetlbfs,share=on \
>-numa node,memdev=mem -mem-prealloc \
>-kernel $HOME/iso/vmlinuz -append "root=/dev/sda1" \
>-hda fc-19-i386.img \
>-monitor telnet::3334,server,nowait -curses \
>-incoming tcp:0:
> 
> 6. connect to VM1 monitor, and start migration:
> 
>> migrate tcp:0:
> 
> 7. After a while, you will find that VM1 has been migrated to VM2,
>and the "ping" command continues running, perfectly.
> 
> 
> Cc: Chen Zhihui 
> Cc: Yang Maggie 
> ---
> Yuanhan Liu (6):
>   vhost: handle VHOST_USER_SET_LOG_BASE request
>   vhost: introduce vhost_log_write
>   vhost: log used vring changes
>   vhost: log vring desc buffer changes
>   vhost: claim that we support GUEST_ANNOUNCE feature
>   vhost: enable log_shmfd protocol feature
> 
>  lib/librte_vhost/rte_virtio_net.h | 36 ++-
>  lib/librte_vhost/vhost_rxtx.c | 88 
> +++
>  lib/librte_vhost/vhost_user/vhost-net-user.c  |  7 ++-
>  lib/librte_vhost/vhost_user/vhost-net-user.h  |  6 ++
>  lib/librte_vhost/vhost_user/virtio-net-user.c | 48 +++
>  lib/librte_vhost/vhost_user/virtio-net-user.h |  5 +-
>  lib/librte_vhost/virtio-net.c |  5 ++
>  7 files changed, 165 insertions(+), 30 deletions(-)
> 
> --
> 1.9.0




[dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support

2015-12-16 Thread Pavel Fedin
 Hello!

> However, I'm more curious about the ping loss. Did you still see
> that? And to be more specific, has wireshark captured the
> GARP from the guest?

 Yes, everything is fine.

root at nfv_test_x86_64 /var/log/libvirt/qemu # tshark -i ovs-br0
Running as user "root" and group "root". This could be dangerous.
Capturing on 'ovs-br0'
  1   0.00 RealtekU_3b:83:1a -> BroadcastARP 42 Gratuitous ARP for 
192.168.6.2 (Request)
  2   0.24 fe80::5054:ff:fe3b:831a -> ff02::1  ICMPv6 86 Neighbor 
Advertisement fe80::5054:ff:fe3b:831a (ovr) is at
52:54:00:3b:83:1a
  3   0.049490 RealtekU_3b:83:1a -> BroadcastARP 42 Gratuitous ARP for 
192.168.6.2 (Request)
  4   0.049497 fe80::5054:ff:fe3b:831a -> ff02::1  ICMPv6 86 Neighbor 
Advertisement fe80::5054:ff:fe3b:831a (ovr) is at
52:54:00:3b:83:1a
  5   0.199485 RealtekU_3b:83:1a -> BroadcastARP 42 Gratuitous ARP for 
192.168.6.2 (Request)
  6   0.199492 fe80::5054:ff:fe3b:831a -> ff02::1  ICMPv6 86 Neighbor 
Advertisement fe80::5054:ff:fe3b:831a (ovr) is at
52:54:00:3b:83:1a
  7   0.449500 RealtekU_3b:83:1a -> BroadcastARP 42 Gratuitous ARP for 
192.168.6.2 (Request)
  8   0.449508 fe80::5054:ff:fe3b:831a -> ff02::1  ICMPv6 86 Neighbor 
Advertisement fe80::5054:ff:fe3b:831a (ovr) is at
52:54:00:3b:83:1a
  9   0.517229  192.168.6.2 -> 192.168.6.1  ICMP 98 Echo (ping) request  
id=0x04af, seq=70/17920, ttl=64
 10   0.517277  192.168.6.1 -> 192.168.6.2  ICMP 98 Echo (ping) reply
id=0x04af, seq=70/17920, ttl=64 (request in 9)
 11   0.799521 RealtekU_3b:83:1a -> BroadcastARP 42 Gratuitous ARP for 
192.168.6.2 (Request)
 12   0.799553 fe80::5054:ff:fe3b:831a -> ff02::1  ICMPv6 86 Neighbor 
Advertisement fe80::5054:ff:fe3b:831a (ovr) is at
52:54:00:3b:83:1a
 13   1.517210  192.168.6.2 -> 192.168.6.1  ICMP 98 Echo (ping) request  
id=0x04af, seq=71/18176, ttl=64
 14   1.517238  192.168.6.1 -> 192.168.6.2  ICMP 98 Echo (ping) reply
id=0x04af, seq=71/18176, ttl=64 (request in 13)
 15   2.517219  192.168.6.2 -> 192.168.6.1  ICMP 98 Echo (ping) request  
id=0x04af, seq=72/18432, ttl=64
 16   2.517256  192.168.6.1 -> 192.168.6.2  ICMP 98 Echo (ping) reply
id=0x04af, seq=72/18432, ttl=64 (request in 15)
 17   3.517497  192.168.6.2 -> 192.168.6.1  ICMP 98 Echo (ping) request  
id=0x04af, seq=73/18688, ttl=64
 18   3.517518  192.168.6.1 -> 192.168.6.2  ICMP 98 Echo (ping) reply
id=0x04af, seq=73/18688, ttl=64 (request in 17)
 19   4.517219  192.168.6.2 -> 192.168.6.1  ICMP 98 Echo (ping) request  
id=0x04af, seq=74/18944, ttl=64
 20   4.517237  192.168.6.1 -> 192.168.6.2  ICMP 98 Echo (ping) reply
id=0x04af, seq=74/18944, ttl=64 (request in 19)
 21   5.517222  192.168.6.2 -> 192.168.6.1  ICMP 98 Echo (ping) request  
id=0x04af, seq=75/19200, ttl=64
 22   5.517242  192.168.6.1 -> 192.168.6.2  ICMP 98 Echo (ping) reply
id=0x04af, seq=75/19200, ttl=64 (request in 21)
 23   6.517235  192.168.6.2 -> 192.168.6.1  ICMP 98 Echo (ping) request  
id=0x04af, seq=76/19456, ttl=64
 24   6.517256  192.168.6.1 -> 192.168.6.2  ICMP 98 Echo (ping) reply
id=0x04af, seq=76/19456, ttl=64 (request in 23)
 25   6.531466 be:e1:71:c1:47:4d -> RealtekU_3b:83:1a ARP 42 Who has 
192.168.6.2?  Tell 192.168.6.1
 26   6.531619 RealtekU_3b:83:1a -> be:e1:71:c1:47:4d ARP 42 192.168.6.2 is at 
52:54:00:3b:83:1a
 27   7.517212  192.168.6.2 -> 192.168.6.1  ICMP 98 Echo (ping) request  
id=0x04af, seq=77/19712, ttl=64
 28   7.517229  192.168.6.1 -> 192.168.6.2  ICMP 98 Echo (ping) reply
id=0x04af, seq=77/19712, ttl=64 (request in 27)

 But there's one important detail here. Any replicated network interfaces 
(the LOCAL port in my example) should be fully cloned on both
hosts, including MAC addresses. Otherwise, after the migration, the guest 
continues to send packets to the old MAC, and, obviously, there is
still ping loss until it redoes the ARP for its ping target.

>  And what's the output of 'grep virtio /proc/interrupts' inside guest?

11:  0  0  0  0   IO-APIC  11-fasteoi   
uhci_hcd:usb1, virtio3
 24:  0  0  0  0   PCI-MSI 114688-edge  
virtio2-config
 25:   3544  0  0  0   PCI-MSI 114689-edge  
virtio2-req.0
 26: 10  0  0  0   PCI-MSI 49152-edge  
virtio0-config
 27:852  0  0  0   PCI-MSI 49153-edge  
virtio0-input.0
 28:  3  0  0  0   PCI-MSI 49154-edge  
virtio0-output.0
 29: 10  0  0  0   PCI-MSI 65536-edge  
virtio1-config
 30:172  0  0      0   PCI-MSI 65537-edge  
virtio1-input.0
 31:  1  0  0  0   PCI-MSI 65538-edge  
virtio1-output.0

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support

2015-12-16 Thread Pavel Fedin
 Hello!

> I can reproduce your issue on my side with above patch (and only when
> F_GUEST_ANNOUNCE is not set at DPDK vhost lib). TBH, I don't know
> why that happened, the cause could be subtle, and I don't think it's
> worthwhile to dig it, especially it's not the right way to do it.

 Maybe not right, maybe it can be done... Actually, I found what was wrong. 
qemu tries to feed features back to vhost-user via
VHOST_USER_SET_FEATURES, and DPDK barfs on the unknown bit. More tweaking is 
needed for qemu to do the trick correctly.

> So, would you please try to set the F_GUEST_ANNOUNCE flag on DPDK vhost
> lib side, as my early diff showed and have another test?

 Tried it, works fine, thank you.
 I had almost implemented the workaround in qemu... However, now I am starting 
to think that you are right. Theoretically, the application
may want to suppress GUEST_ANNOUNCE for some reason. So, let it stay this way. 
Please include this bit in your v2.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support

2015-12-16 Thread Pavel Fedin
 Hello!

> 1. if vhost-user backend (or say, DPDK) supports GUEST_ANNOUNCE, and
>send another RARP (or say, GARP, I will use RARP as example),
>then there will be two RARP later on the line, right? (since the
>QEMU one is sent unconditionally from qemu_announce_self).

 qemu_announce_self() is NOT unconditional. It applies only to emulated 
physical NICs and bypasses virtio/vhost. So it will not send anything at all 
for vhost-user.

> 2. if the only thing vhost-user backend is to send another same RARP
>when got SEND_RARP request, why would it bother if QEMU will
>unconditionally send one?

 See above, it won't send one.
 It looks to me like qemu_announce_self() is just a poor man's solution which 
doesn't even always work (because a GARP should reassociate an existing IP with 
a new MAC, shouldn't it? And qemu doesn't know the IP, so it just sets both src 
and dst to 0.0.0.0).

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support

2015-12-15 Thread Pavel Fedin
 Hello!

> After a migration, to avoid a network outage, all interfaces of the guest must 
> send a packet to update switches' mappings (ideally a GARP).
> As some interfaces do not do it, QEMU does it on behalf of the guest by 
> sending a RARP (this RARP is not forged by the guest but by QEMU). This is the
> qemu_announce_self purpose that "spoofs" a RARP to all backends of guest 
> ethernet interfaces. For vhost-user backends, QEMU cannot do it directly

 Aha, I see it now. qemu_announce_self() uses qemu_foreach_nic(), which actually 
iterates only over NET_CLIENT_OPTIONS_KIND_NIC interfaces. I expect these are 
fully emulated hardware controllers. virtio uses another type (see enum 
NetClientOptionsKind).
 So, we can happily ignore qemu_announce_self(), it does not do anything for 
us. Thanks for pointing it out.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support

2015-12-15 Thread Pavel Fedin
RealtekU_3b:83:1a -> BroadcastARP 42 Who has 192.168.6.1?  
Tell 192.168.6.2
 14  22.538943 be:e1:71:c1:47:4d -> RealtekU_3b:83:1a ARP 42 192.168.6.1 is at 
be:e1:71:c1:47:4d
 15  23.540937 RealtekU_3b:83:1a -> BroadcastARP 42 Who has 192.168.6.1?  
Tell 192.168.6.2
 16  23.540942 be:e1:71:c1:47:4d -> RealtekU_3b:83:1a ARP 42 192.168.6.1 is at 
be:e1:71:c1:47:4d
 17  25.537519 RealtekU_3b:83:1a -> BroadcastARP 42 Who has 192.168.6.1?  
Tell 192.168.6.2
 18  25.537525 be:e1:71:c1:47:4d -> RealtekU_3b:83:1a ARP 42 192.168.6.1 is at 
be:e1:71:c1:47:4d
 19  26.538939 RealtekU_3b:83:1a -> BroadcastARP 42 Who has 192.168.6.1?  
Tell 192.168.6.2
 20  26.538944 be:e1:71:c1:47:4d -> RealtekU_3b:83:1a ARP 42 192.168.6.1 is at 
be:e1:71:c1:47:4d
 21  27.540937 RealtekU_3b:83:1a -> BroadcastARP 42 Who has 192.168.6.1?  
Tell 192.168.6.2
 22  27.540942 be:e1:71:c1:47:4d -> RealtekU_3b:83:1a ARP 42 192.168.6.1 is at 
be:e1:71:c1:47:4d
 23  29.538475 RealtekU_3b:83:1a -> BroadcastARP 42 Who has 192.168.6.1?  
Tell 192.168.6.2
 24  29.538482 be:e1:71:c1:47:4d -> RealtekU_3b:83:1a ARP 42 192.168.6.1 is at 
be:e1:71:c1:47:4d
 25  30.538935 RealtekU_3b:83:1a -> BroadcastARP 42 Who has 192.168.6.1?  
Tell 192.168.6.2
 26  30.538941 be:e1:71:c1:47:4d -> RealtekU_3b:83:1a ARP 42 192.168.6.1 is at 
be:e1:71:c1:47:4d
 27  31.540935 RealtekU_3b:83:1a -> BroadcastARP 42 Who has 192.168.6.1?  
Tell 192.168.6.2
 28  31.540941 be:e1:71:c1:47:4d -> RealtekU_3b:83:1a ARP 42 192.168.6.1 is at 
be:e1:71:c1:47:4d
^C28 packets captured

 Obviously, the guest simply doesn't read incoming packets. ifconfig for the 
interface on the guest side shows:

RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 9 overruns 0 frame 9

 BTW, the number 9 exactly matches the number of ARP replies from the host. The 
question is - why? It looks like guest behavior changes
somehow. Is it a bug in the guest? It's very strange, because between these 
sessions I see only one difference, in the IPv6 packets:

  4   0.858957   :: -> ff02::1:ff3b:831a ICMPv6 78 Neighbor 
Solicitation for fe80::5054:ff:fe3b:831a

This is present in session #1 and missing from session #2. Can it affect the 
whole thing somehow? But I don't even use IPv6.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[dpdk-dev] problem vhost-user sockets

2015-12-15 Thread Pavel Fedin
 Hello!

> I'm thinking you can't simply unlink a file given by a user inside
> a library unconditionally. Say, what if a user gives a wrong socket
> path?

 Well... We can improve the security by checking that:

a) The file exists and it's a socket.
b) Nobody is listening on it.

> I normally write a short script to handle it automatically.

 I know, you can always hack up some kludges; IMHO it's just not a 
production-grade solution. What if you are a cloud administrator, and
you have 1000 users, each of them using 100 vhost-user interfaces? List all of 
them in some script? Too huge a job, I would say.
 And without it the thing just appears too fragile, requiring manual 
maintenance after a single stupid failure.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support

2015-12-15 Thread Pavel Fedin
 Hello!

> >  Wrong. I tried to unconditionally enforce it in qemu (my guest does 
> > support it), and the
> link stopped working at all. I don't understand why.
> 
> I'm wondering how did you do that? Why do you need enforece it in QEMU?
> Isn't it already supported so far?

 I mean - qemu first asks the vhost-user server (ovs/DPDK in our case) about 
capabilities, then negotiates them with the guest. And DPDK
doesn't report VIRTIO_NET_F_GUEST_ANNOUNCE, so I just ORed this flag in qemu 
before the negotiation with the guest (because indeed my
logic says that the host should not do anything special about it). So the 
overall effect is the same as in your patch

> diff --git a/lib/librte_vhost/virtio-net.c
> b/lib/librte_vhost/virtio-net.c
> index 03044f6..0ba5045 100644
> --- a/lib/librte_vhost/virtio-net.c
> +++ b/lib/librte_vhost/virtio-net.c
> @@ -74,6 +74,7 @@ static struct virtio_net_config_ll *ll_root;
>  #define VHOST_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
> (1ULL << VIRTIO_NET_F_CTRL_VQ) | \
> (1ULL << VIRTIO_NET_F_CTRL_RX) | \
> +   (1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) | \
> (VHOST_SUPPORTS_MQ)| \
> (1ULL << VIRTIO_F_VERSION_1)   | \
> (1ULL << VHOST_F_LOG_ALL)  | \

 But I was somehow wrong, and this causes the whole thing to stop working 
instead. Even right after booting up, the network doesn't
work and PINGs do not pass.

> However, I found the GARP is not sent out at all, due to an error
> I met and reported before:
> 
> KVM: injection failed, MSI lost (Operation not permitted)

 Interesting, I don't have this problem here. Some bug in your kernel/hardware?

> One thing worth noting is that it happened only when I did live migration
> on two different hosts (the two hosts happened to be using a same old
> kernel: v3.11.10).  It works pretty well on same host. So, seems like
> a KVM bug then?

 3.18.9 here, and no such problem.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[dpdk-dev] problem vhost-user sockets

2015-12-15 Thread Pavel Fedin
 Hello!

 I have a question regarding vhostuser. If we cannot bind to a socket, why do 
we simply fail with an error instead of just unlink()ing
the path before binding?

 This causes a very annoying problem with ovs. After ovs is stopped (I use 
the supplied system integration), these sockets are not
removed. It looks like ovs just exits without correct cleanup. This effectively 
causes my vhostuser interfaces to go down until I clean
them up manually. And I have to do it after every ovs restart, every system 
reboot, etc. It is very annoying.
 I understand that the app should really do correct cleanup upon exit. But what 
if it crashes abnormally for some reason
(bug, attack, etc.)? Shouldn't it be able to recover automatically?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia





[dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support

2015-12-15 Thread Pavel Fedin
 Hello!

> After a migration, to avoid network outage, the guest must announce its new 
> location to the L2 layer, typically with a GARP. Otherwise requests sent to
> the guest arrive to the old host until a ARP request is sent (after 30 
> seconds) or the guest sends some data.
> QEMU implementation of self announce after a migration with a vhost backend 
> is the following:
> - If the VIRTIO_GUEST_ANNOUNCE feature has been negotiated the guest sends 
> automatically a GARP.
> - Else, if the vhost backend implements VHOST_USER_SEND_RARP, this request is 
> sent to the vhost backend. When this message is received, the vhost backend
> must act as if it had received a RARP from the guest (the purpose of this 
> RARP is to update switches' MAC->port mapping, like a GARP). This RARP is a 
> fake one, created by the vhost backend.
> - Else, nothing is done and we have a network outage until an ARP is sent or 
> the guest sends some data.

 But what is qemu_announce_self() then? It's just unconditionally triggered 
after migration, and indeed it sends a rather strange packet.

> VIRTIO_GUEST_ANNOUNCE feature is negotiated if:
>  - the vhost backend announces the support of this feature. Maybe QEMU can be 
> updated to support unconditionnaly this feature

 Wrong. I tried to unconditionally enforce it in qemu (my guest does support 
it), and the link stopped working entirely. I don't understand why.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support

2015-12-15 Thread Pavel Fedin
 Hello!

> Note quite sure. I found Thibaut submitted a patch to send
> VHOST_USER_SEND_RARP request after migration is done months
> ago. Thibaut, would you please elaborate it a bit more what
> should be done on vhost-user backend? To construct a gratuitous
> ARP request and broadcast it?

 By the way, some more info for you all.
1. I've just examined qemu_announce_self() and I see that the IPs are all set 
to 0 in the packet it generates. That's quite logical, because qemu has no 
idea what address is used by the guest; even more, theoretically it might not 
be IPv4 at all. But then - how can it work at all, and what's the use of this 
packet?
2. I tried to work around it by adding VIRTIO_NET_F_GUEST_ANNOUNCE. I expected 
that the guest would see it and make the announcement by itself. But the result 
was quite the opposite - PING stopped working at all, right from the beginning, 
even without migration.

 Can local qemu/DPDK/etc gurus give some explanation?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support

2015-12-15 Thread Pavel Fedin
 Hello!

> I mean I do find that qemu_announce_self composes an ARP
> broadcast request, but it seems that I didn't catch it on
> the target host.
> 
> Something wrong, or someting I missed?

 To tell the truth, I don't know. I am also learning qemu internals on the fly. 
Indeed, I see that it should announce itself. But
this brings up a question: why do we need a special announce procedure in 
vhost-user then?
 I think you can add some debug output and see how it works in real time. This 
is what I normally do when I don't understand in which
sequence things happen.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support

2015-12-14 Thread Pavel Fedin
 Hello!

> > I _guess_ the problem for ping might be: guest ARP entry for
> > 192.168.100.1 is not updated. Or say, after guest migrated to host2
> > from host1, guest is still trying to send packet to host1's NIC (no
> > one is telling it to update, right?), so no one is responding the
> > ping. When the entry is expired, guest will resend the ARP request,
> > and host2 will respond this time, with mac address on host2 provided
> > this time. After that, ping works again.
> 
> Peter,
> 
> Thanks for your input, and that sounds reasonable. You just reminded
> me that the host1's NIC is indeed different with host2's NIC: the ovs
> bridge mac address is different.

 Yes, this is indeed what is happening, and actually I already wrote about it. 
In wireshark it looks exactly like that: some
PINGs are sent without replies, then the guest redoes ARP, and PING replies resume.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support

2015-12-14 Thread Pavel Fedin
 Hello!

> > Host<--->openVSwitch<--->guest
> >   LOCAL   vhostuser
> >
> >  So, in order to migrate the guest, i simply replicated this setup on both 
> > hosts, with the
> same IPs on host side. And on both hosts i set up the following ruleset for 
> openvswitch:
> 
> Regarding to "with the same IPs on host side": do you mean that you
> configured the same IP on two hosts in the intranet?

 No intranet. You can think of it as an isolated network between the host and 
guest, and that's all. I just assigned an IP to ovs' LOCAL interface on both 
hosts, and these ovs instances knew nothing about each other, nor did they 
forward packets to each other. I didn't want to make things 
overcomplicated and decided not to mess with the host's own connection to the 
intranet; something that sits on the other side of vhost-user and replies 
to PINGs was perfectly OK for me.

> I think this
> does not matter if we are testing it functionally (whether live
> migration could work), However I would still perfer to try ping
> another host (say, host3) inside the intranet. What do you think?

 Yes, perhaps this would be a better test; maybe next time I'll do it. Anyway, 
IIRC, PATCH v2 is coming.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support

2015-12-14 Thread Pavel Fedin
 Hello!

> When doing the ping, was it from the guest (to another host) or to
> the guest (from another host)?
> 
> In any case, I still could not understand why the ping loss happened
> in this test.
> 
> If ping from guest, no ARP refresh is required at all?

 ping from guest to host.

 Ok, my setup was:

Host<--->openVSwitch<--->guest
  LOCAL   vhostuser

 So, in order to migrate the guest, I simply replicated this setup on both 
hosts, with the same IPs on the host side. And on both hosts I set up the 
following ruleset for openvswitch:

ovs-ofctl add-flow ovs-br0 in_port=1,actions=output:LOCAL
ovs-ofctl add-flow ovs-br0 in_port=LOCAL,actions=output:1

 And on the second host, for some reason, the vhostuser port got number 2 in 
the database instead of 1. Probably because first I added a wrong port, then 
added the correct one, then removed the wrong one. So, as I wrote before - 
please don't worry, the patch works fine; it was totally my lame fault.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support

2015-12-11 Thread Pavel Fedin
 Hello!

> On Fri, Dec 11, 2015 at 11:26:55AM +0300, Pavel Fedin wrote:
> >  Hello!
> >
> >  I am currently testing this patchset with qemu and have problems.
> 
> Hi,
> 
> Thanks for testing!

 Not at all :)

 BTW, it works, and it was my bad. openvswitch was configured incorrectly on 
the other side; the vhost port number was different for
some reason, while the ruleset was the same. I reconfigured it and now everything 
migrates correctly, except for increased downtime because
of the missing GARP (the guest misses some PINGs, then it retries ARP, which 
brings the link back up).

 Tested-by: Pavel Fedin 

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support

2015-12-11 Thread Pavel Fedin
 Hello!

 I am currently testing this patchset with qemu and have problems.

 The guest migrates correctly, but after the migration it cries in the log:

Vhost user backend fails to broadcast fake RARP

 and pinging the (new) host doesn't work. When I migrate it back to the old 
host, the network resumes working.

 I have analyzed the code, and this problem happens because we support neither 
VHOST_USER_PROTOCOL_F_RARP nor
VIRTIO_NET_F_GUEST_ANNOUNCE. Since the latter seems to be related only to the 
guest, I simply enabled it in qemu by force, and after
this the network doesn't work at all.
 Can anybody help me and explain how the thing works? I expected that 
gratuitous ARP packets are harmless, but they seem to break
things somehow. And what was used for testing the implementation?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia