[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-12 Thread Tan, Jianfeng
Hello!

On 1/12/2016 10:45 PM, Amit Tomer wrote:
> Hello,
>
> I ran l2fwd from inside docker with the following logs:
>
> But I don't see the port statistics getting updated.
>

In vhost-switch, a virtio device is only judged ready for processing
after a pkt has been received from it. So you'd better construct a pkt
and send it out first from l2fwd.

Thanks,
Jianfeng


[dpdk-dev] [PATCH v4 0/6] vmxnet3 TSO, tx cksum offload and cleanups

2016-01-12 Thread Stephen Hemminger
On Tue, 12 Jan 2016 18:08:31 -0800
Yong Wang  wrote:

> v4:
> * moved cleanups to separate patches
> * correctly handled multi-seg pkts with data ring used
> 
> v3:
> * fixed comments from Stephen
> * added performance number for tx data ring
> 
> v2:
> * fixed some logging issues when debug option turned on
> * updated the txq_flags check in vmxnet3_dev_tx_queue_setup()
> 
> This patchset adds TCP/UDP checksum offload and TSO to vmxnet3 PMD.
> One of the use cases is to support STT.  It also restores the tx
> data ring feature that was removed from a previous patch.
> 
> Yong Wang (6):
>   vmxnet3: fix typos and remove unused struct
>   vmxnet3: restore tx data ring support
>   vmxnet3: cleanup txNumDeferred usage
>   vmxnet3: add tx l4 cksum offload
>   vmxnet3: add TSO support
>   vmxnet3: announce device offload capability
> 
>  doc/guides/rel_notes/release_2_3.rst    |  11 +++
>  drivers/net/vmxnet3/base/includeCheck.h |  39 
>  drivers/net/vmxnet3/base/vmxnet3_defs.h |   9 +-
>  drivers/net/vmxnet3/vmxnet3_ethdev.c    |  16 +++-
>  drivers/net/vmxnet3/vmxnet3_ring.h      |  13 ---
>  drivers/net/vmxnet3/vmxnet3_rxtx.c      | 160 +---
>  6 files changed, 151 insertions(+), 97 deletions(-)
>  delete mode 100644 drivers/net/vmxnet3/base/includeCheck.h
> 

Looks good. The only thing maybe worth adding would be some more checks
in vmxnet3_dev_configure() for unsupported offload bits, etc.

Acked-by: Stephen Hemminger 


[dpdk-dev] [PATCH v3 1/4] vmxnet3: restore tx data ring support

2016-01-12 Thread Stephen Hemminger
On Wed, 13 Jan 2016 02:20:01 +
Yong Wang  wrote:

> >Good idea to use a local region, which optimizes the copy in the host,
> >but this implementation needs to be more general.
> >
> >As written it is broken for multi-segment packets. A multi-segment
> >packet will have a pktlen >= datalen as in:
> >  m -> nb_segs=3, pktlen=1200, datalen=200  
> >-> datalen=900
> >-> datalen=100  
> >
> >There are two ways to fix this. You could test for nb_segs == 1,
> >or, better yet, optimize each segment: it might be that the first
> >segment (or tail segment) would fit in the available data area.  
> 
> Currently the vmxnet3 backend limits the data area to 128B, so
> it should work even for the multi-segmented pkt shown above. But
> I agree it does not work for all multi-segmented packets. The
> following packet is such an example.
> 
> m -> nb_segs=3, pktlen=128, datalen=64
> -> datalen=32
> -> datalen=32  
> 
> 
> It's unclear if/how we might get into such a multi-segmented pkt,
> but I agree we should handle this case.  Patch updated taking the
> simple approach (checking for nb_segs == 1).  I'll leave the
> optimization as a future patch.

Such a packet can happen when adding a tunnel header such as VXLAN
and the underlying packet is shared (refcnt > 1) or does not have
enough headroom for the tunnel header.
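The multi-segment pitfall can be made concrete with a small, self-contained sketch. `struct seg` is a hypothetical stand-in for an rte_mbuf chain (only `data_len` and `next` are modelled), and `HDR_COPY_SIZE` assumes the 128B data area mentioned above. A length-only test accepts the 64/32/32 chain even though only its first 64 bytes are contiguous, so copying `pkt_len` bytes from the first segment would read past it; the `nb_segs == 1` test adopted in the updated patch rejects it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed size of one tx data ring slot (the 128B limit discussed above). */
#define HDR_COPY_SIZE 128u

/* Hypothetical stand-in for an rte_mbuf segment chain. */
struct seg {
	uint16_t data_len;      /* bytes held by this segment */
	const struct seg *next; /* next segment, NULL for the last one */
};

/* Total packet length, i.e. what the head mbuf's pkt_len would report. */
static uint32_t pkt_len(const struct seg *m)
{
	uint32_t len = 0;

	for (; m != NULL; m = m->next)
		len += m->data_len;
	return len;
}

static unsigned nb_segs(const struct seg *m)
{
	unsigned n = 0;

	for (; m != NULL; m = m->next)
		n++;
	return n;
}

/* Length-only test: unsafe for chains, because the copy would read
 * pkt_len bytes starting from the first segment's data only. */
static bool fits_len_only(const struct seg *m)
{
	return pkt_len(m) <= HDR_COPY_SIZE;
}

/* Test used by the updated patch: one segment, and small enough. */
static bool fits_data_ring(const struct seg *m)
{
	assert(m != NULL);
	return nb_segs(m) == 1 && pkt_len(m) <= HDR_COPY_SIZE;
}
```

The per-segment optimization Stephen suggests (copying whichever segment fits) is not modelled here.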


[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-12 Thread Amit Tomer
Hello,

> In vhost-switch, a virtio device is only judged ready for processing
> after a pkt has been received from it. So you'd better construct a pkt
> and send it out first from l2fwd.

I tried to ping the socket interface from the host for the same purpose,
but it didn't work.

Could you please suggest some other approach for achieving the same (i.e.
how a pkt can be sent out to l2fwd)?

Also, before trying this, I verified that vhost-switch works
fine with testpmd.

Thanks,
Amit.


[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-12 Thread Amit Tomer
Hello,

>  Have you applied all three fixes discussed here?

I am running it with only the RFC patches applied, and with "--no-huge" in l2fwd.

Thanks
Amit.


[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-12 Thread Amit Tomer
Hello,

I ran l2fwd from inside docker with the following logs:

But I don't see the port statistics getting updated.

#/home/ubuntu/dpdk# sudo docker run -i -t -v /home/ubuntu/dpdk/usvhost:/usr/src/dpdk/usvhost l4
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 1 on socket 0
EAL: Detected lcore 2 as core 2 on socket 0
EAL: Detected lcore 3 as core 3 on socket 0
EAL: Detected lcore 4 as core 4 on socket 0
EAL: Detected lcore 5 as core 5 on socket 0
EAL: Detected lcore 6 as core 6 on socket 0
EAL: Detected lcore 7 as core 7 on socket 0
EAL: Detected lcore 8 as core 8 on socket 0
EAL: Setting up physically contiguous memory...
EAL: TSC frequency is ~9 KHz
EAL: Master lcore 1 is ready (tid=b5968000;cpuset=[1])
Notice: odd number of ports in portmask.
Lcore 1: RX port 0
Initializing port 0... done:
Port 0, MAC address: F6:9F:7A:47:A4:99

Checking link statusdone
Port 0 Link Up - speed 1 Mbps - full-duplex
L2FWD: entering main loop on lcore 1
L2FWD:  -- lcoreid=1 portid=0


Port statistics 
Statistics for port 0 --
Packets sent:0
Packets received:0
Packets dropped: 0
Aggregate statistics ===
Total packets sent:  0
Total packets received:  0
Total packets dropped:   0


Host side logs after running

# ./vhost-switch -c 0x3 f -n 4 --socket-mem 2048 --huge-dir /dev/hugepages -- -p 0x1  --dev-basename usvhost

PMD: eth_ixgbe_dev_init(): MAC: 4, PHY: 3
PMD: eth_ixgbe_dev_init(): port 1 vendorID=0x8086 deviceID=0x1528
pf queue num: 0, configured vmdq pool num: 64, each vmdq pool has 2 queues
VHOST_PORT: Max virtio devices supported: 64
VHOST_PORT: Port 0 MAC: d8 9d 67 ee 55 f0
VHOST_PORT: Skipping disabled port 1
VHOST_DATA: Procesing on Core 1 started
VHOST_CONFIG: socket created, fd:20
VHOST_CONFIG: bind to usvhost
VHOST_CONFIG: new virtio connection is 21
VHOST_CONFIG: new device, handle is 0
VHOST_CONFIG: read message VHOST_USER_GET_FEATURES
VHOST_CONFIG: read message VHOST_USER_SET_FEATURES
VHOST_CONFIG: read message VHOST_USER_SET_MEM_TABLE
VHOST_CONFIG: mapped region 0 fd:22 to 0x7f3400 sz:0x400 off:0x0
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:0 file:23
VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
VHOST_CONFIG: vring kick idx:0 file:24
VHOST_CONFIG: virtio isn't ready for processing.
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:1 file:25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
VHOST_CONFIG: vring kick idx:1 file:26
VHOST_CONFIG: virtio is now ready for processing.
VHOST_DATA: (0) Device has been added to data core 1

Could anyone please point out how it can be tested further (i.e. how
traffic can be sent between the host and the container)?

Thanks,
Amit.

On Tue, Jan 12, 2016 at 4:18 PM, Pavel Fedin  wrote:
>  Hello!
>
>> Your guess makes sense because current implementation does not support
>> multi-queues.
>>
>>  From your log, only 0 and 1 are "ready for processing"; others are "not
>> ready for processing".
>
>  Yes, and if we study it even more carefully, we see that we initialize all tx
> queues but only a single rx queue (#0).
>  After some more code browsing and comparing the two patchsets, I figured out
> that the problem is caused by an inappropriate VIRTIO_NET_F_CTRL_VQ flag. In
> your RFC you used a different capability set, while in v1 you seem to have
> forgotten about this.
>  I suggest temporarily moving the hw->guest_features assignment out of
> virtio_negotiate_features() into the caller, where we have eth_dev->dev_type
> and can choose the right set depending on it.
>
>  With all the mentioned fixes I've got the ping running.
>  Tested-by: Pavel Fedin 
>
> Kind regards,
> Pavel Fedin
> Expert Engineer
> Samsung Electronics Research center Russia
>
>
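Pavel's suggestion, choosing the offered feature set in the caller where the device type is known, might look roughly like the sketch below. Only the virtio-net feature bit numbers come from the virtio specification; `enum dev_kind`, `base_features`, `select_guest_features()` and `negotiate()` are hypothetical names, and the sketch assumes, following Pavel's diagnosis, that the container (vdev) case should not offer VIRTIO_NET_F_CTRL_VQ.

```c
#include <assert.h>
#include <stdint.h>

/* Feature bit numbers from the virtio-net specification. */
#define VIRTIO_NET_F_MAC      5
#define VIRTIO_NET_F_STATUS  16
#define VIRTIO_NET_F_CTRL_VQ 17

#define BIT64(n) (UINT64_C(1) << (n))

/* Hypothetical device kinds, standing in for eth_dev->dev_type. */
enum dev_kind { DEV_KIND_PCI, DEV_KIND_VDEV };

/* Hypothetical feature set the PMD would offer by default. */
static const uint64_t base_features =
	BIT64(VIRTIO_NET_F_MAC) | BIT64(VIRTIO_NET_F_STATUS) |
	BIT64(VIRTIO_NET_F_CTRL_VQ);

/* Chosen in the caller, where the device kind is known: the vdev case
 * drops the control virtqueue (assumption, see lead-in). */
static uint64_t select_guest_features(enum dev_kind kind)
{
	uint64_t features = base_features;

	if (kind == DEV_KIND_VDEV)
		features &= ~BIT64(VIRTIO_NET_F_CTRL_VQ);
	return features;
}

/* Negotiation itself stays device-agnostic: intersect with the host. */
static uint64_t negotiate(uint64_t guest, uint64_t host)
{
	uint64_t acked = guest & host;

	assert((acked & ~guest) == 0); /* never ack what was not offered */
	return acked;
}
```

Keeping `negotiate()` free of any device-type logic is what allows the feature choice to move out of it into the caller.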


[dpdk-dev] [PATCH 2/4] mem: add API to obtain memory-backed file info

2016-01-12 Thread Tan, Jianfeng
Hello!

>   But in this case the host takes this page size as the total region size,
> therefore qva_to_vva() fails.
>   I haven't worked with hugepages, but I guess that with real hugepages we
> get one file per page, therefore page size == mapping size. With the newly
> introduced --single-file we now have something that pretends to be a single
> "uber-huge-page", so we need to specify the total size of the mapping here.

Oh, I get it and recognize the problem here. The actual problem lies in
the API rte_eal_get_backfile_info():
backfiles[i].size = hugepage_files[i].size;
It should use statfs or hugepage_files[i].size * hugepage_files[i].repeated
to calculate the total size.
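The total-size calculation described here can be sketched as follows. `struct backfile_entry` and `backfile_total_size()` are hypothetical names that mirror only the `size` and `repeated` fields of `struct hugepage_file` in a single-file build; the statfs alternative is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of struct hugepage_file (single-file segments). */
struct backfile_entry {
	size_t size;  /* the page size */
	int repeated; /* number of times the page size is repeated */
};

/* Size of the whole mapping backed by one file: page size times the
 * number of pages folded into that file. Widened to 64 bits so the
 * product cannot overflow a 32-bit size_t. */
static uint64_t backfile_total_size(const struct backfile_entry *e)
{
	assert(e->repeated > 0);
	return (uint64_t)e->size * (uint64_t)e->repeated;
}
```

For example, 512 repeats of a 2MB page give a 1GB mapping, which is the size the vhost-user memory table entry should carry.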

>
>   BTW, I'm still unhappy about the ABI breakage here. I think we could easily add
> a --shared-mem option, which would simply change the mapping mode to SHARED. So we
> could use it with both hugepages (default) and plain mmap (with
> --no-hugepages).

You mean, use "--no-hugepages --shared-mem" together, right?
That makes sense to me.

Thanks,
Jianfeng

>
> Kind regards,
> Pavel Fedin
> Expert Engineer
> Samsung Electronics Research center Russia
>
>



[dpdk-dev] [RFC] cryptodev: Change burst APIs to crypto operation oriented

2016-01-12 Thread Declan Doherty
In this RFC I'm looking to get some feedback on a proposal to change the
cryptodev burst API from the current implementation, which accepts bursts
of rte_mbufs, to a burst API based on rte_crypto_ops.

-static inline uint16_t
-rte_cryptodev_dequeue_burst(uint8_t dev_id, uint16_t qp_id,
-   struct rte_mbuf **pkts, uint16_t nb_pkts)
+static inline uint16_t
+rte_cryptodev_dequeue_op_burst(uint8_t dev_id, uint16_t qp_id,
+   struct rte_crypto_op **ops, uint16_t nb_ops)


-static inline uint16_t
-rte_cryptodev_dequeue_burst(uint8_t dev_id, uint16_t qp_id,
-   struct rte_mbuf **pkts, uint16_t nb_pkts)
+static inline uint16_t
+rte_cryptodev_dequeue_op_burst(uint8_t dev_id, uint16_t qp_id,
+   struct rte_crypto_op **ops, uint16_t nb_ops)


The motivation for these changes is to address the concerns
raised in the discussion of the rte_mbuf_offload library patch
(http://dpdk.org/ml/archives/dev/2015-November/028247.html) by both
Thomas and Olivier. By changing to an API which accepts bursts of
rte_crypto_op structures, we no longer need a specific field in the
rte_mbuf for offload operations. Instead, with a small modification to
the rte_crypto_op structure (adding a field for the source rte_mbuf on
which the crypto operation is to be performed), the same functionality
can be achieved. This will break the current dependency between the
rte_mbuf and the rte_mbuf_offload library and, by proxy, the
rte_cryptodev library.

struct rte_crypto_op {
 enum rte_crypto_op_sess_type type;
 enum rte_crypto_op_status status;

+   struct rte_mbuf *m_src; /**< source mbuf */
struct rte_mbuf *m_dst; /**< Destination mbuf */


}

Another advantage of this approach is that it simplifies and speeds up
the processing of bursts within crypto PMDs as they no longer have to 
search for the crypto operation within the rte_mbuf_offload structure 
and can instead just operate on the crypto operation directly.
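To illustrate the claimed simplification, here is a minimal sketch of a PMD-side burst handler that works directly on operations. All types and names (`struct mbuf`, `struct crypto_op`, `process_op_burst()`, the XOR "cipher") are hypothetical stand-ins, not the rte_cryptodev API; only the `m_src`/`m_dst` idea comes from the proposal above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-ins for rte_mbuf and rte_crypto_op. */
struct mbuf {
	uint8_t data[64];
	uint16_t data_len;
};

enum op_status { OP_NOT_PROCESSED, OP_SUCCESS, OP_ERROR };

struct crypto_op {
	enum op_status status;
	struct mbuf *m_src; /* source mbuf: the field proposed above */
	struct mbuf *m_dst; /* destination mbuf, NULL for in-place */
};

/* Toy transform (XOR with a key) standing in for real crypto work. */
static void toy_cipher(const struct mbuf *src, struct mbuf *dst, uint8_t key)
{
	assert(src->data_len <= sizeof(src->data));
	for (uint16_t i = 0; i < src->data_len; i++)
		dst->data[i] = src->data[i] ^ key;
	dst->data_len = src->data_len;
}

/* Hypothetical PMD burst handler: it iterates over the ops themselves and
 * reaches packet data through op->m_src, with no per-mbuf offload lookup. */
static uint16_t process_op_burst(struct crypto_op **ops, uint16_t nb_ops,
				 uint8_t key)
{
	uint16_t done = 0;

	for (uint16_t i = 0; i < nb_ops; i++) {
		struct crypto_op *op = ops[i];

		if (op->m_src == NULL) {
			op->status = OP_ERROR;
			continue;
		}
		toy_cipher(op->m_src, op->m_dst ? op->m_dst : op->m_src, key);
		op->status = OP_SUCCESS;
		done++;
	}
	return done;
}
```

Because the status lives in the op, the caller can inspect each result after dequeue without touching the mbuf.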


Regarding the rte_mbuf_offload library, I think that it should be removed
and that we can look at adding a more general solution for managing
external metadata to the rte_mbuf library when that functionality is
required.


[dpdk-dev] [PATCH v4 6/6] vmxnet3: announce device offload capability

2016-01-12 Thread Yong Wang
Signed-off-by: Yong Wang 
---
 drivers/net/vmxnet3/vmxnet3_ethdev.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c b/drivers/net/vmxnet3/vmxnet3_ethdev.c
index d90e62f..8a40127 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
@@ -693,7 +693,8 @@ vmxnet3_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
 }

 static void
-vmxnet3_dev_info_get(__attribute__((unused))struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
+vmxnet3_dev_info_get(__attribute__((unused))struct rte_eth_dev *dev,
+struct rte_eth_dev_info *dev_info)
 {
dev_info->max_rx_queues = VMXNET3_MAX_RX_QUEUES;
dev_info->max_tx_queues = VMXNET3_MAX_TX_QUEUES;
@@ -716,6 +717,17 @@ vmxnet3_dev_info_get(__attribute__((unused))struct rte_eth_dev *dev, struct rte_
.nb_min = VMXNET3_DEF_TX_RING_SIZE,
.nb_align = 1,
};
+
+   dev_info->rx_offload_capa =
+   DEV_RX_OFFLOAD_VLAN_STRIP |
+   DEV_RX_OFFLOAD_UDP_CKSUM |
+   DEV_RX_OFFLOAD_TCP_CKSUM;
+
+   dev_info->tx_offload_capa =
+   DEV_TX_OFFLOAD_VLAN_INSERT |
+   DEV_TX_OFFLOAD_TCP_CKSUM |
+   DEV_TX_OFFLOAD_UDP_CKSUM |
+   DEV_TX_OFFLOAD_TCP_TSO;
 }

 /* return 0 means link status changed, -1 means not changed */
-- 
1.9.1
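Stephen's review suggested checking for unsupported offload bits at configure time. A minimal sketch of such a check against the tx capability set announced in this patch follows. The `TX_*` constants are local stand-ins with illustrative values, not the real `DEV_TX_OFFLOAD_*` definitions from rte_ethdev.h, and `check_tx_offloads()` is a hypothetical helper that assumes the application supplies a bitmask of the offloads it intends to use.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the DEV_TX_OFFLOAD_* bits (illustrative values). */
#define TX_VLAN_INSERT (UINT32_C(1) << 0)
#define TX_UDP_CKSUM   (UINT32_C(1) << 1)
#define TX_TCP_CKSUM   (UINT32_C(1) << 2)
#define TX_SCTP_CKSUM  (UINT32_C(1) << 3)
#define TX_TCP_TSO     (UINT32_C(1) << 4)

/* Mirrors the tx_offload_capa set advertised by the patch. */
static const uint32_t tx_capa =
	TX_VLAN_INSERT | TX_TCP_CKSUM | TX_UDP_CKSUM | TX_TCP_TSO;

/* Requested bits the device cannot honour (0 means all supported). */
static uint32_t unsupported_tx_offloads(uint32_t requested)
{
	return requested & ~tx_capa;
}

/* Fail early instead of silently ignoring an unsupported request. */
static int check_tx_offloads(uint32_t requested)
{
	uint32_t bad = unsupported_tx_offloads(requested);

	assert((bad & tx_capa) == 0);
	return bad ? -1 : 0;
}
```

SCTP checksum offload is the obvious case this would catch, since it is absent from the advertised set.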



[dpdk-dev] [PATCH v4 5/6] vmxnet3: add TSO support

2016-01-12 Thread Yong Wang
This commit adds vmxnet3 TSO support.

Verified with test-pmd (set fwd csum) that both tso and
non-tso pkts can be successfully transmitted and all
segments for a tso pkt are correct on the receiver side.

Signed-off-by: Yong Wang 
---
 doc/guides/rel_notes/release_2_3.rst |   3 +
 drivers/net/vmxnet3/vmxnet3_rxtx.c   | 108 ++-
 2 files changed, 84 insertions(+), 27 deletions(-)

diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst
index 58205fe..ae487bb 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -24,6 +24,9 @@ Drivers

   Support TCP/UDP checksum offload.

+* **vmxnet3: add TSO support.**
+
+
 Libraries
 ~

diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index 2c1bc3c..103294a 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -295,27 +295,45 @@ vmxnet3_dev_clear_queues(struct rte_eth_dev *dev)
}
 }

+static int
+vmxnet3_unmap_pkt(uint16_t eop_idx, vmxnet3_tx_queue_t *txq)
+{
+   int completed = 0;
+   struct rte_mbuf *mbuf;
+
+   /* Release cmd_ring descriptor and free mbuf */
+   VMXNET3_ASSERT(txq->cmd_ring.base[eop_idx].txd.eop == 1);
+
+   mbuf = txq->cmd_ring.buf_info[eop_idx].m;
+   if (mbuf == NULL)
+   rte_panic("EOP desc does not point to a valid mbuf");
+   rte_pktmbuf_free(mbuf);
+
+   txq->cmd_ring.buf_info[eop_idx].m = NULL;
+
+   while (txq->cmd_ring.next2comp != eop_idx) {
+   /* no out-of-order completion */
+   VMXNET3_ASSERT(txq->cmd_ring.base[txq->cmd_ring.next2comp].txd.cq == 0);
+   vmxnet3_cmd_ring_adv_next2comp(&txq->cmd_ring);
+   completed++;
+   }
+
+   /* Mark the txd for which tcd was generated as completed */
+   vmxnet3_cmd_ring_adv_next2comp(&txq->cmd_ring);
+
+   return completed + 1;
+}
+
 static void
 vmxnet3_tq_tx_complete(vmxnet3_tx_queue_t *txq)
 {
int completed = 0;
-   struct rte_mbuf *mbuf;
vmxnet3_comp_ring_t *comp_ring = &txq->comp_ring;
struct Vmxnet3_TxCompDesc *tcd = (struct Vmxnet3_TxCompDesc *)
(comp_ring->base + comp_ring->next2proc);

while (tcd->gen == comp_ring->gen) {
-   /* Release cmd_ring descriptor and free mbuf */
-   VMXNET3_ASSERT(txq->cmd_ring.base[tcd->txdIdx].txd.eop == 1);
-   while (txq->cmd_ring.next2comp != tcd->txdIdx) {
-   mbuf = txq->cmd_ring.buf_info[txq->cmd_ring.next2comp].m;
-   txq->cmd_ring.buf_info[txq->cmd_ring.next2comp].m = NULL;
-   rte_pktmbuf_free_seg(mbuf);
-
-   /* Mark the txd for which tcd was generated as completed */
-   vmxnet3_cmd_ring_adv_next2comp(&txq->cmd_ring);
-   completed++;
-   }
+   completed += vmxnet3_unmap_pkt(tcd->txdIdx, txq);

vmxnet3_comp_ring_adv_next2proc(comp_ring);
tcd = (struct Vmxnet3_TxCompDesc *)(comp_ring->base +
@@ -351,21 +369,43 @@ vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
struct rte_mbuf *txm = tx_pkts[nb_tx];
struct rte_mbuf *m_seg = txm;
int copy_size = 0;
+   bool tso = (txm->ol_flags & PKT_TX_TCP_SEG) != 0;
+   /* # of descriptors needed for a packet. */
+   unsigned count = txm->nb_segs;

-   /* Is this packet execessively fragmented, then drop */
-   if (unlikely(txm->nb_segs > VMXNET3_MAX_TXD_PER_PKT)) {
-   ++txq->stats.drop_too_many_segs;
-   ++txq->stats.drop_total;
+   avail = vmxnet3_cmd_ring_desc_avail(&txq->cmd_ring);
+   if (count > avail) {
+   /* Is command ring full? */
+   if (unlikely(avail == 0)) {
+   PMD_TX_LOG(DEBUG, "No free ring descriptors");
+   txq->stats.tx_ring_full++;
+   txq->stats.drop_total += (nb_pkts - nb_tx);
+   break;
+   }
+
+   /* Command ring is not full but cannot handle the
+* multi-segmented packet. Let's try the next packet
+* in this case.
+*/
+   PMD_TX_LOG(DEBUG, "Running out of ring descriptors "
+  "(avail %d needed %d)", avail, count);
+   txq->stats.drop_total++;
+   if (tso)
+   txq->stats.drop_tso++;
rte_pktmbuf_free(txm);
-   ++nb_tx;
+   nb_tx++;
continue;
}

-   /* Is command ring full? 
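The descriptor accounting in this patch (one command descriptor per segment, stop the burst when the ring is full, drop a packet that does not fit and move on, and release descriptors up to the EOP index on completion) can be modelled with the small ring below. `struct cmd_ring`, `ring_desc_avail()`, `try_reserve()` and `complete_pkt()` are hypothetical; the free-slot formula assumes the common convention of leaving one descriptor unused, and generation bits, TSO header handling and mbuf freeing are omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a vmxnet3-style command ring (indices only). */
struct cmd_ring {
	uint32_t size;      /* number of descriptors */
	uint32_t next2fill; /* next descriptor the driver will fill */
	uint32_t next2comp; /* oldest descriptor not yet completed */
};

enum tx_verdict { TX_SENT, TX_DROPPED, TX_STOP };

/* Free descriptors, leaving one slot unused so that a full ring and an
 * empty ring can be told apart (assumed convention). */
static uint32_t ring_desc_avail(const struct cmd_ring *r)
{
	return (r->next2comp > r->next2fill ? 0 : r->size) +
	       r->next2comp - r->next2fill - 1;
}

/* Reserve `count` descriptors (one per segment) for a packet. */
static enum tx_verdict try_reserve(struct cmd_ring *r, uint32_t count)
{
	uint32_t avail = ring_desc_avail(r);

	if (count > avail)
		/* Full ring: stop the burst. Partial room: drop, try next. */
		return avail == 0 ? TX_STOP : TX_DROPPED;
	r->next2fill = (r->next2fill + count) % r->size;
	return TX_SENT;
}

/* Completion of one packet whose EOP descriptor is eop_idx: release every
 * descriptor up to and including it and return how many were released. */
static int complete_pkt(struct cmd_ring *r, uint32_t eop_idx)
{
	int completed = 0;

	assert(eop_idx < r->size);
	while (r->next2comp != eop_idx) {
		r->next2comp = (r->next2comp + 1) % r->size;
		completed++;
	}
	r->next2comp = (r->next2comp + 1) % r->size; /* the EOP itself */
	return completed + 1;
}
```

Dropping rather than stopping when some room remains keeps one oversized (for example, heavily segmented TSO) packet from stalling the rest of the burst.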

[dpdk-dev] [PATCH v4 4/6] vmxnet3: add tx l4 cksum offload

2016-01-12 Thread Yong Wang
Support TCP/UDP checksum offload.

Signed-off-by: Yong Wang 
---
 doc/guides/rel_notes/release_2_3.rst |  3 +++
 drivers/net/vmxnet3/vmxnet3_rxtx.c   | 26 +++---
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst
index a23c8ac..58205fe 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -20,6 +20,9 @@ Drivers
   Tx data ring has been shown to improve small pkt forwarding performance
   on vSphere environment.

+* **vmxnet3: add tx l4 cksum offload.**
+
+  Support TCP/UDP checksum offload.

 Libraries
 ~
diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index f3af2f2..2c1bc3c 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -415,7 +415,27 @@ vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
gdesc->txd.tci = txm->vlan_tci;
}

-   /* TODO: Add transmit checksum offload here */
+   if (txm->ol_flags & PKT_TX_L4_MASK) {
+   gdesc->txd.om = VMXNET3_OM_CSUM;
+   gdesc->txd.hlen = txm->l2_len + txm->l3_len;
+
+   switch (txm->ol_flags & PKT_TX_L4_MASK) {
+   case PKT_TX_TCP_CKSUM:
+   gdesc->txd.msscof = gdesc->txd.hlen + offsetof(struct tcp_hdr, cksum);
+   break;
+   case PKT_TX_UDP_CKSUM:
+   gdesc->txd.msscof = gdesc->txd.hlen + offsetof(struct udp_hdr, dgram_cksum);
+   break;
+   default:
+   PMD_TX_LOG(WARNING, "requested cksum offload not supported %#llx",
+  txm->ol_flags & PKT_TX_L4_MASK);
+   abort();
+   }
+   } else {
+   gdesc->txd.hlen = 0;
+   gdesc->txd.om = VMXNET3_OM_NONE;
+   gdesc->txd.msscof = 0;
+   }

/* flip the GEN bit on the SOP */
rte_compiler_barrier();
@@ -729,8 +749,8 @@ vmxnet3_dev_tx_queue_setup(struct rte_eth_dev *dev,
PMD_INIT_FUNC_TRACE();

if ((tx_conf->txq_flags & ETH_TXQ_FLAGS_NOXSUMS) !=
-   ETH_TXQ_FLAGS_NOXSUMS) {
-   PMD_INIT_LOG(ERR, "TX no support for checksum offload yet");
+   ETH_TXQ_FLAGS_NOXSUMSCTP) {
+   PMD_INIT_LOG(ERR, "SCTP checksum offload not supported");
return -EINVAL;
}

-- 
1.9.1
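The checksum offset programmed into `msscof` is the L2+L3 header length plus the offset of the checksum field inside the L4 header. The sketch below reproduces that arithmetic with local copies of the standard TCP and UDP header layouts; `tcp_hdr_sketch`, `udp_hdr_sketch` and `csum_offset()` are hypothetical names, and an assert stands in for the patch's `abort()` on unsupported types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard TCP header layout (RFC 793); local copy for illustration. */
struct tcp_hdr_sketch {
	uint16_t src_port;
	uint16_t dst_port;
	uint32_t sent_seq;
	uint32_t recv_ack;
	uint8_t  data_off;
	uint8_t  tcp_flags;
	uint16_t rx_win;
	uint16_t cksum;    /* byte 16 of the TCP header */
	uint16_t tcp_urp;
};

/* Standard UDP header layout (RFC 768); local copy for illustration. */
struct udp_hdr_sketch {
	uint16_t src_port;
	uint16_t dst_port;
	uint16_t dgram_len;
	uint16_t dgram_cksum; /* byte 6 of the UDP header */
};

enum l4_kind { L4_TCP, L4_UDP };

/* hlen covers L2+L3; the result is the byte offset of the L4 checksum
 * from the start of the frame, as written into msscof by the patch. */
static uint32_t csum_offset(uint32_t l2_len, uint32_t l3_len, enum l4_kind k)
{
	uint32_t hlen = l2_len + l3_len;

	switch (k) {
	case L4_TCP:
		return hlen + (uint32_t)offsetof(struct tcp_hdr_sketch, cksum);
	case L4_UDP:
		return hlen + (uint32_t)offsetof(struct udp_hdr_sketch, dgram_cksum);
	}
	assert(0 && "unsupported L4 type");
	return 0;
}
```

With a 14-byte Ethernet header and a 20-byte IPv4 header, the TCP checksum sits at byte 50 of the frame and the UDP checksum at byte 40.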



[dpdk-dev] [PATCH v4 3/6] vmxnet3: cleanup txNumDeferred usage

2016-01-12 Thread Yong Wang
Signed-off-by: Yong Wang 
---
 drivers/net/vmxnet3/vmxnet3_rxtx.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index 4ccab0e..f3af2f2 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -332,6 +332,8 @@ vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_tx;
vmxnet3_tx_queue_t *txq = tx_queue;
struct vmxnet3_hw *hw = txq->hw;
+   Vmxnet3_TxQueueCtrl *txq_ctrl = &txq->shared->ctrl;
+   uint32_t deferred = rte_le_to_cpu_32(txq_ctrl->txNumDeferred);

if (unlikely(txq->stopped)) {
PMD_TX_LOG(DEBUG, "Tx queue is stopped.");
@@ -419,15 +421,14 @@ vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
rte_compiler_barrier();
gdesc->dword[2] ^= VMXNET3_TXD_GEN;

-   txq->shared->ctrl.txNumDeferred++;
+   txq_ctrl->txNumDeferred = rte_cpu_to_le_32(++deferred);
nb_tx++;
}

-   PMD_TX_LOG(DEBUG, "vmxnet3 txThreshold: %u", txq->shared->ctrl.txThreshold);
+   PMD_TX_LOG(DEBUG, "vmxnet3 txThreshold: %u", rte_le_to_cpu_32(txq_ctrl->txThreshold));

-   if (txq->shared->ctrl.txNumDeferred >= txq->shared->ctrl.txThreshold) {
-
-   txq->shared->ctrl.txNumDeferred = 0;
+   if (deferred >= rte_le_to_cpu_32(txq_ctrl->txThreshold)) {
+   txq_ctrl->txNumDeferred = 0;
/* Notify vSwitch that packets are available. */
VMXNET3_WRITE_BAR0_REG(hw, (VMXNET3_REG_TXPROD + txq->queue_id * VMXNET3_REG_ALIGN),
   txq->cmd_ring.next2fill);
-- 
1.9.1



[dpdk-dev] [PATCH v4 2/6] vmxnet3: restore tx data ring support

2016-01-12 Thread Yong Wang
Tx data ring support was removed in a previous change that
added multi-seg transmit.  This change adds it back.

According to the original commit (2e849373), the 64B pkt
rate with l2fwd improved by ~20% on an Ivy Bridge
server, at which point we started to hit a bottleneck
on the rx side.

I also re-did the same test on a different setup (Haswell
processor, ~2.3GHz clock rate) on top of master
and still observed a ~17% performance gain.

Fixes: 7ba5de417e3c ("vmxnet3: support multi-segment transmit")

Signed-off-by: Yong Wang 
---
 doc/guides/rel_notes/release_2_3.rst |  5 +
 drivers/net/vmxnet3/vmxnet3_rxtx.c   | 17 -
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst
index 99de186..a23c8ac 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -15,6 +15,11 @@ EAL
 Drivers
 ~~~

+* **vmxnet3: restore tx data ring.**
+
+  Tx data ring has been shown to improve small pkt forwarding performance
+  on vSphere environment.
+

 Libraries
 ~
diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index a3154bc..4ccab0e 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -348,6 +348,7 @@ vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
uint32_t first2fill, avail, dw2;
struct rte_mbuf *txm = tx_pkts[nb_tx];
struct rte_mbuf *m_seg = txm;
+   int copy_size = 0;

/* Is this packet execessively fragmented, then drop */
if (unlikely(txm->nb_segs > VMXNET3_MAX_TXD_PER_PKT)) {
@@ -365,6 +366,14 @@ vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
break;
}

+   if (txm->nb_segs == 1 && rte_pktmbuf_pkt_len(txm) <= VMXNET3_HDR_COPY_SIZE) {
+   struct Vmxnet3_TxDataDesc *tdd;
+
+   tdd = txq->data_ring.base + txq->cmd_ring.next2fill;
+   copy_size = rte_pktmbuf_pkt_len(txm);
+   rte_memcpy(tdd->data, rte_pktmbuf_mtod(txm, char *), copy_size);
+   }
+
/* use the previous gen bit for the SOP desc */
dw2 = (txq->cmd_ring.gen ^ 0x1) << VMXNET3_TXD_GEN_SHIFT;
first2fill = txq->cmd_ring.next2fill;
@@ -377,7 +386,13 @@ vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
   transmit buffer size (16K) is greater than
   maximum sizeof mbuf segment size. */
gdesc = txq->cmd_ring.base + txq->cmd_ring.next2fill;
-   gdesc->txd.addr = RTE_MBUF_DATA_DMA_ADDR(m_seg);
+   if (copy_size)
+   gdesc->txd.addr = rte_cpu_to_le_64(txq->data_ring.basePA +
+   txq->cmd_ring.next2fill *
+   sizeof(struct Vmxnet3_TxDataDesc));
+   else
+   gdesc->txd.addr = RTE_MBUF_DATA_DMA_ADDR(m_seg);
+
gdesc->dword[2] = dw2 | m_seg->data_len;
gdesc->dword[3] = 0;

-- 
1.9.1
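The data-ring address used in place of the mbuf DMA address is the ring's base address plus the command-descriptor index times the slot size, so each command descriptor owns one fixed slot. A minimal sketch, assuming a 128-byte slot and using hypothetical names (`struct tx_data_desc`, `data_slot_addr()`, `copy_to_data_ring()`); the little-endian conversion of the address is not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HDR_COPY_SIZE 128u /* assumed size of one data ring slot */

/* Hypothetical data ring slot, paired 1:1 with a command descriptor. */
struct tx_data_desc {
	uint8_t data[HDR_COPY_SIZE];
};

/* Device-visible address of the slot for command descriptor idx. */
static uint64_t data_slot_addr(uint64_t base_pa, uint32_t idx)
{
	return base_pa + (uint64_t)idx * sizeof(struct tx_data_desc);
}

/* Copy a small single-segment packet into its slot; returns copy_size,
 * or 0 when the packet must be sent from the mbuf buffer instead. */
static uint32_t copy_to_data_ring(struct tx_data_desc *ring, uint32_t idx,
				  const void *pkt, uint32_t len)
{
	assert(ring != NULL && pkt != NULL);
	if (len > HDR_COPY_SIZE)
		return 0;
	memcpy(ring[idx].data, pkt, len);
	return len;
}
```

A non-zero `copy_size` is then what selects the slot address over the mbuf address when the descriptor is filled.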



[dpdk-dev] [PATCH v4 1/6] vmxnet3: fix typos and remove unused struct

2016-01-12 Thread Yong Wang
Signed-off-by: Yong Wang 
---
 drivers/net/vmxnet3/base/includeCheck.h | 39 -
 drivers/net/vmxnet3/base/vmxnet3_defs.h |  9 +---
 drivers/net/vmxnet3/vmxnet3_ethdev.c|  2 +-
 drivers/net/vmxnet3/vmxnet3_ring.h  | 13 ---
 drivers/net/vmxnet3/vmxnet3_rxtx.c  |  2 +-
 5 files changed, 3 insertions(+), 62 deletions(-)
 delete mode 100644 drivers/net/vmxnet3/base/includeCheck.h

diff --git a/drivers/net/vmxnet3/base/includeCheck.h b/drivers/net/vmxnet3/base/includeCheck.h
deleted file mode 100644
index 310cebe..000
--- a/drivers/net/vmxnet3/base/includeCheck.h
+++ /dev/null
@@ -1,39 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- * * Redistributions of source code must retain the above copyright
- *   notice, this list of conditions and the following disclaimer.
- * * Redistributions in binary form must reproduce the above copyright
- *   notice, this list of conditions and the following disclaimer in
- *   the documentation and/or other materials provided with the
- *   distribution.
- * * Neither the name of Intel Corporation nor the names of its
- *   contributors may be used to endorse or promote products derived
- *   from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef _INCLUDECHECK_H
-#define _INCLUDECHECK_H
-
-#include "vmxnet3_osdep.h"
-
-#endif /* _INCLUDECHECK_H */
diff --git a/drivers/net/vmxnet3/base/vmxnet3_defs.h b/drivers/net/vmxnet3/base/vmxnet3_defs.h
index 2b56574..68ae8b6 100644
--- a/drivers/net/vmxnet3/base/vmxnet3_defs.h
+++ b/drivers/net/vmxnet3/base/vmxnet3_defs.h
@@ -35,14 +35,7 @@
 #ifndef _VMXNET3_DEFS_H_
 #define _VMXNET3_DEFS_H_

-#define INCLUDE_ALLOW_USERLEVEL
-#define INCLUDE_ALLOW_VMKERNEL
-#define INCLUDE_ALLOW_DISTRIBUTE
-#define INCLUDE_ALLOW_VMKDRIVERS
-#define INCLUDE_ALLOW_VMCORE
-#define INCLUDE_ALLOW_MODULE
-#include "includeCheck.h"
-
+#include "vmxnet3_osdep.h"
 #include "upt1_defs.h"

 /* all registers are 32 bit wide */
diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c b/drivers/net/vmxnet3/vmxnet3_ethdev.c
index c363bf6..d90e62f 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
@@ -819,7 +819,7 @@ vmxnet3_dev_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vid, int on)
else
VMXNET3_CLEAR_VFTABLE_ENTRY(hw->shadow_vfta, vid);

-   /* don't change active filter if in promiscious mode */
+   /* don't change active filter if in promiscuous mode */
if (rxConf->rxMode & VMXNET3_RXM_PROMISC)
return 0;

diff --git a/drivers/net/vmxnet3/vmxnet3_ring.h b/drivers/net/vmxnet3/vmxnet3_ring.h
index 612487e..15b19e1 100644
--- a/drivers/net/vmxnet3/vmxnet3_ring.h
+++ b/drivers/net/vmxnet3/vmxnet3_ring.h
@@ -130,18 +130,6 @@ struct vmxnet3_txq_stats {
uint64_t tx_ring_full;
 };

-typedef struct vmxnet3_tx_ctx {
-   int  ip_type;
-   bool is_vlan;
-   bool is_cso;
-
-   uint16_t evl_tag;   /* only valid when is_vlan == TRUE */
-   uint32_t eth_hdr_size;  /* only valid for pkts requesting tso or csum
-* offloading */
-   uint32_t ip_hdr_size;
-   uint32_t l4_hdr_size;
-} vmxnet3_tx_ctx_t;
-
 typedef struct vmxnet3_tx_queue {
struct vmxnet3_hw *hw;
struct vmxnet3_cmd_ring  cmd_ring;
@@ -155,7 +143,6 @@ typedef struct vmxnet3_tx_queue {
uint8_t  port_id;   /**< Device port identifier. */
 } vmxnet3_tx_queue_t;

-
 struct vmxnet3_rxq_stats {
uint64_t drop_total;
uint64_t drop_err;
diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index 4de5d89..a3154bc 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ 

[dpdk-dev] [PATCH v4 0/6] vmxnet3 TSO, tx cksum offload and cleanups

2016-01-12 Thread Yong Wang
v4:
* moved cleanups to separate patches
* correctly handled multi-seg pkts with data ring used

v3:
* fixed comments from Stephen
* added performance number for tx data ring

v2:
* fixed some logging issues when debug option turned on
* updated the txq_flags check in vmxnet3_dev_tx_queue_setup()

This patchset adds TCP/UDP checksum offload and TSO to vmxnet3 PMD.
One of the use cases is to support STT.  It also restores the tx
data ring feature that was removed from a previous patch.

Yong Wang (6):
  vmxnet3: fix typos and remove unused struct
  vmxnet3: restore tx data ring support
  vmxnet3: cleanup txNumDeferred usage
  vmxnet3: add tx l4 cksum offload
  vmxnet3: add TSO support
  vmxnet3: announce device offload capability

 doc/guides/rel_notes/release_2_3.rst    |  11 +++
 drivers/net/vmxnet3/base/includeCheck.h |  39 
 drivers/net/vmxnet3/base/vmxnet3_defs.h |   9 +-
 drivers/net/vmxnet3/vmxnet3_ethdev.c    |  16 +++-
 drivers/net/vmxnet3/vmxnet3_ring.h      |  13 ---
 drivers/net/vmxnet3/vmxnet3_rxtx.c      | 160 +---
 6 files changed, 151 insertions(+), 97 deletions(-)
 delete mode 100644 drivers/net/vmxnet3/base/includeCheck.h

-- 
1.9.1



[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-12 Thread Pavel Fedin
 Hello!

> Could anyone please point out, how it can be tested further(how can
> traffic be sent across host and container)  ?

 Have you applied all three fixes discussed here?

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH] vhost_user: Make sure that memory map is set before attempting address translation

2016-01-12 Thread Pavel Fedin
Malfunctioning virtio clients may, for some reason, not send
VHOST_USER_SET_MEM_TABLE. This causes a NULL dereference in qva_to_vva().

Change-Id: Ibc8f6637fb5fb9885b02c316adf18afd45e0d49a
Signed-off-by: Pavel Fedin 
---
 lib/librte_vhost/virtio-net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 0ba5045..3e7cec0 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -630,7 +630,7 @@ set_vring_addr(struct vhost_device_ctx ctx, struct vhost_vring_addr *addr)
struct vhost_virtqueue *vq;

dev = get_device(ctx);
-   if (dev == NULL)
+   if ((dev == NULL) || (dev->mem == NULL))
return -1;

/* addr->index refers to the queue index. The txq 1, rxq is 0. */
-- 
2.1.1
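The failure mode is easy to see in a simplified model: address translation walks the memory table, so if SET_MEM_TABLE never arrived there is no table to walk. The sketch below uses hypothetical types and names (`struct mem_table`, `qva_to_vva_sketch()`, `set_vring_addr_sketch()`) and places the same early `mem == NULL` check that the patch adds to set_vring_addr().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified memory table built from SET_MEM_TABLE. */
struct mem_region {
	uint64_t qemu_va; /* start address in the front-end's address space */
	uint64_t size;
	uint64_t host_va; /* where the back-end mapped the region */
};

struct mem_table {
	unsigned nregions;
	struct mem_region regions[8];
};

struct device {
	struct mem_table *mem; /* stays NULL until SET_MEM_TABLE arrives */
};

/* Translate a front-end virtual address; 0 means "not mapped". */
static uint64_t qva_to_vva_sketch(const struct mem_table *mem, uint64_t qva)
{
	assert(mem->nregions <= 8);
	for (unsigned i = 0; i < mem->nregions; i++) {
		const struct mem_region *r = &mem->regions[i];

		if (qva >= r->qemu_va && qva - r->qemu_va < r->size)
			return r->host_va + (qva - r->qemu_va);
	}
	return 0;
}

static int set_vring_addr_sketch(const struct device *dev, uint64_t qva,
				 uint64_t *vva)
{
	if (dev == NULL || dev->mem == NULL) /* the check the patch adds */
		return -1;
	*vva = qva_to_vva_sketch(dev->mem, qva);
	return *vva ? 0 : -1;
}
```

Rejecting the request up front turns a crash caused by a misbehaving client into an ordinary error return.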



[dpdk-dev] [PATCH 4/4] virtio/vdev: add a new vdev named eth_cvio

2016-01-12 Thread Tan, Jianfeng

Hi Fedin,

On 1/12/2016 4:39 PM, Tan, Jianfeng wrote:
> Hi Fedin,
>
> On 1/12/2016 3:45 PM, Pavel Fedin wrote:
>>   Hello!
>>
>>   See inline
>>
>>> ...
>>>   }
>>>
>>> +struct rte_mbuf *m = NULL;
>>> +if (dev->dev_type == RTE_ETH_DEV_PCI)
>>> +vq->offset = (uintptr_t)&m->buf_addr;
>>> +#ifdef RTE_VIRTIO_VDEV
>>> +else {
>>> +vq->offset = (uintptr_t)&m->buf_physaddr;
>>   Not sure, but shouldn't these be swapped? Originally, for PCI 
>> devices, we used buf_physaddr.
>
> Oops, it seems that you are right. I'm trying to figure out why I can
> rx/tx pkts using the wrong version.
>

I figured out why. When we run apps without root privileges, the mempool's
elt_pa is assigned the same value as elt_va_start, so it happens to be the
right value for translating addresses. But it's definitely a bug. Thanks for
pointing this out.

Thanks,
Jianfeng
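The offset trick discussed in this thread stores, per virtqueue, the byte offset of the mbuf field holding the address the back-end needs, and then reads that field generically. A minimal sketch with hypothetical names follows. It keeps both addresses as 64-bit integers for simplicity, uses `offsetof()` (taking `&m->field` through a NULL pointer, as in the quoted code, is undefined behaviour in ISO C), and assumes, per Pavel's remark, that PCI devices need `buf_physaddr` while the vdev needs `buf_addr`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical trimmed-down mbuf; both addresses kept as integers. */
struct mbuf_sketch {
	uint64_t buf_addr;     /* virtual address of the data buffer */
	uint64_t buf_physaddr; /* physical address of the same buffer */
};

enum dev_kind { DEV_PCI, DEV_VDEV };

/* Offset stored per virtqueue at setup time. */
static size_t addr_offset_for(enum dev_kind k)
{
	return k == DEV_PCI ? offsetof(struct mbuf_sketch, buf_physaddr)
			    : offsetof(struct mbuf_sketch, buf_addr);
}

/* Read the 64-bit address field located `off` bytes into the mbuf. */
static uint64_t mbuf_addr_at(const struct mbuf_sketch *m, size_t off)
{
	uint64_t v;

	assert(off + sizeof(v) <= sizeof(*m));
	memcpy(&v, (const char *)m + off, sizeof(v));
	return v;
}
```

When `buf_physaddr` happens to equal `buf_addr`, as observed above for non-root runs, both offsets yield the same value, which is why the swapped assignment went unnoticed.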




[dpdk-dev] [PATCH 2/4] mem: add API to obtain memory-backed file info

2016-01-12 Thread Tan, Jianfeng

Hi!

On 1/12/2016 4:26 AM, Rich Lane wrote:
> On Sun, Jan 10, 2016 at 3:43 AM, Jianfeng Tan wrote:
>
> @@ -1157,6 +1180,20 @@ rte_eal_hugepage_init(void)
> mcfg->memseg[0].len = internal_config.memory;
> mcfg->memseg[0].socket_id = socket_id;
>
> +   hugepage = create_shared_memory(eal_hugepage_info_path(),
> +   sizeof(struct hugepage_file));
> +   hugepage->orig_va = addr;
> +   hugepage->final_va = addr;
> +   hugepage->physaddr = rte_mem_virt2phy(addr);
> +   hugepage->size = pagesize;
>
>
> Should this be "hugepage->size = internal_config.memory"? Otherwise
> the vhost-user memtable entry has a size of only 2MB.

I don't think so. See the definition:

struct hugepage_file {
    void *orig_va;      /**< virtual addr of first mmap() */
    void *final_va;     /**< virtual addr of 2nd mmap() */
    uint64_t physaddr;  /**< physical addr */
    size_t size;        /**< the page size */
    int socket_id;      /**< NUMA socket ID */
    int file_id;        /**< the '%d' in HUGEFILE_FMT */
    int memseg_id;      /**< the memory segment to which page belongs */
#ifdef RTE_EAL_SINGLE_FILE_SEGMENTS
    int repeated;       /**< number of times the page size is repeated */
#endif
    char filepath[MAX_HUGEPAGE_PATH]; /**< path to backing file on filesystem */
};

size stands for the page size instead of total size.
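Since size holds the page size, a consumer that needs the total length of the backing memory (as in the vhost-user memtable question above) has to derive it. Below is a minimal sketch using a trimmed, hypothetical copy of the struct. It assumes, following the field's comment, that an entry covers `repeated` consecutive pages when RTE_EAL_SINGLE_FILE_SEGMENTS is enabled, and exactly one page otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Trimmed, hypothetical copy of struct hugepage_file. */
struct hp_sketch_file {
	uint64_t physaddr;  /* physical address of the first page */
	size_t size;        /* the PAGE size, not the total length */
	int repeated;       /* pages covered by this entry (1 without SINGLE_FILE_SEGMENTS) */
};

/* Total number of bytes described by n entries. */
static uint64_t hp_sketch_total_size(const struct hp_sketch_file *e, size_t n)
{
	uint64_t total = 0;
	for (size_t i = 0; i < n; i++)
		total += (uint64_t)e[i].size * (uint64_t)e[i].repeated;
	return total;
}
```

For a single 2MB entry this returns 2MB, which matches the value Rich observed; the memtable needs the accumulated total instead.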

Thanks,
Jianfeng


[dpdk-dev] [PATCH v2 6/7] eal: pci: export pci_map_device

2016-01-12 Thread Yuanhan Liu
On Tue, Jan 12, 2016 at 04:40:43PM +0800, Yuanhan Liu wrote:
> On Tue, Jan 12, 2016 at 09:31:05AM +0100, David Marchand wrote:
> > On Tue, Jan 12, 2016 at 7:59 AM, Yuanhan Liu
> > wrote:
> > 
> > Normally we could set the RTE_PCI_DRV_NEED_MAPPING flag so that eal will
> > invoke pci_map_device internally for us. From that point of view, there
> > is no need to export pci_map_device.
> > 
> > However, the virtio pmd driver is designed to work without binding
> > UIO (or something similar) first, so pci_map_device() would fail and
> > the virtio pmd driver would end up being skipped. Therefore, we cannot
> > set RTE_PCI_DRV_NEED_MAPPING blindly in the virtio pmd driver.
> > 
> > Therefore, this patch exports pci_map_device and lets the virtio pmd
> > call it when necessary.
> > 
> > 
> > Well, if you introduce map function, I suppose, for hotplug, you would need
> > unmap.
> 
> Good reminder. Thanks. I will export pci_unmap_device as well.

And here you go.

--yliu

-- >8 --
From aa3d9d0fa827781d1563fd4c06ba04a8fafdc41c Mon Sep 17 00:00:00 2001
From: Yuanhan Liu 
Date: Mon, 11 Jan 2016 16:51:35 +0800
Subject: [PATCH] eal: pci: export pci_[un]map_device

Normally we could set the RTE_PCI_DRV_NEED_MAPPING flag so that eal will
invoke pci_map_device internally for us. From that point of view, there
is no need to export pci_map_device.

However, the virtio pmd driver is designed to work without binding
UIO (or something similar) first, so pci_map_device() would fail and
the virtio pmd driver would end up being skipped. Therefore, we cannot
set RTE_PCI_DRV_NEED_MAPPING blindly in the virtio pmd driver.

Therefore, this patch exports pci_map_device and lets the virtio pmd
call it when necessary.

Signed-off-by: Yuanhan Liu 
---
v2: - export pci_unmap_device as well

- Add a few more comments about rte_eal_pci_map_device().
---
 lib/librte_eal/bsdapp/eal/eal_pci.c |  4 ++--
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  7 +++
 lib/librte_eal/common/eal_common_pci.c  |  4 ++--
 lib/librte_eal/common/eal_private.h | 18 -
 lib/librte_eal/common/include/rte_pci.h | 27 +
 lib/librte_eal/linuxapp/eal/eal_pci.c   |  4 ++--
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  7 +++
 7 files changed, 47 insertions(+), 24 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 6c21fbd..95c32c1 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -93,7 +93,7 @@ pci_unbind_kernel_driver(struct rte_pci_device *dev __rte_unused)

 /* Map pci device */
 int
-pci_map_device(struct rte_pci_device *dev)
+rte_eal_pci_map_device(struct rte_pci_device *dev)
 {
int ret = -1;

@@ -115,7 +115,7 @@ pci_map_device(struct rte_pci_device *dev)

 /* Unmap pci device */
 void
-pci_unmap_device(struct rte_pci_device *dev)
+rte_eal_pci_unmap_device(struct rte_pci_device *dev)
 {
/* try unmapping the NIC resources */
switch (dev->kdrv) {
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 9d7adf1..1b28170 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -135,3 +135,10 @@ DPDK_2.2 {
rte_xen_dom0_supported;

 } DPDK_2.1;
+
+DPDK_2.3 {
+   global:
+
+   rte_eal_pci_map_device;
+   rte_eal_pci_unmap_device;
+} DPDK_2.2;
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index dcfe947..96d5113 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -188,7 +188,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
pci_config_space_set(dev);
 #endif
/* map resources for devices that use igb_uio */
-   ret = pci_map_device(dev);
+   ret = rte_eal_pci_map_device(dev);
if (ret != 0)
return ret;
} else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
@@ -254,7 +254,7 @@ rte_eal_pci_detach_dev(struct rte_pci_driver *dr,

if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING)
/* unmap resources for devices that use igb_uio */
-   pci_unmap_device(dev);
+   rte_eal_pci_unmap_device(dev);

return 0;
}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 072e672..2342fa1 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -165,24 +165,6 @@ struct rte_pci_device;
 int pci_unbind_kernel_driver(struct rte_pci_device *dev);

 /**
- * Map this device
- *
- * This function is private to EAL.
- 

[dpdk-dev] [PATCH 2/4] mem: add API to obstain memory-backed file info

2016-01-12 Thread Pavel Fedin
 Hello!

> I might be missing something obvious here but, aside from having memory
> SHARED which most DPDK apps using hugepages will have anyway, what are
> the backward compatibility issues that you see here?

 Heh, sorry once again for the confusion. Indeed, with hugepages we always get
MAP_SHARED. I missed that. So we need --shared-mem only in addition to
--no-huge.

 The backwards compatibility issue is stated in the description of PATCH 1/4:
--- cut ---
b. possible ABI break, originally, --no-huge uses anonymous memory
instead of file-backed way to create memory.
--- cut ---
 The patch unconditionally changes that to SHARED. That's all.
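The distinction matters for vhost-user because the backend maps guest memory from a file descriptor passed over the socket, which only works when the memory is file-backed and MAP_SHARED; anonymous memory has no descriptor to hand over. The POSIX sketch below uses plain mmap(), not DPDK's allocator, and the temporary path template is an arbitrary assumption. It maps the same file twice, as the two processes would, and checks that a write through one mapping is visible through the other.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Create an unlinked temp file of `len` bytes, map it twice with MAP_SHARED
 * (as a frontend and a backend process would), write through the first
 * mapping and check the second. Returns 1 if the write is visible, 0 if not,
 * -1 on error. */
static int shm_sketch_visible_through_fd(size_t len)
{
	char path[] = "/tmp/shm-sketch-XXXXXX";  /* arbitrary template */
	if (len < 8)
		return -1;
	int fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);  /* the open fd keeps the file alive */
	if (ftruncate(fd, (off_t)len) != 0) {
		close(fd);
		return -1;
	}
	char *a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	char *b = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);  /* mappings remain valid after the fd is closed */
	int ret = -1;
	if (a != MAP_FAILED && b != MAP_FAILED) {
		strcpy(a, "vhost");
		ret = strcmp(b, "vhost") == 0;
	}
	if (a != MAP_FAILED)
		munmap(a, len);
	if (b != MAP_FAILED)
		munmap(b, len);
	return ret;
}
```

A MAP_PRIVATE | MAP_ANONYMOUS region, which is what --no-huge used originally, has no fd to pass, which is why the patch switches that path to shared, file-backed memory.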

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-12 Thread Tan, Jianfeng
Hi Fedin,

On 1/12/2016 4:39 PM, Pavel Fedin wrote:
>   Hello!
>
>> See my reply to "mem: add API to obstain memory-backed file info" for a 
>> workaround. With fixes for that and the TUNSETVNETHDRSZ issue I was able to
>> get traffic running over vhost-user.
>   With ovs or test apps? I still have problems with ovs after this. Packets
> go from host to container, but not back. Here is the host-side log (I also added
> GPA display in order to debug the problem you pointed at):
> --- cut ---
> ...
> --- cut ---
>
>   Note that during multiqueue setup host state reverts back from "now ready 
> for processing" to "not ready for processing". I guess this is the reason for 
> the problem.

Your guess makes sense because the current implementation does not support
multiqueue.

From your log, only 0 and 1 are "ready for processing"; the others are "not
ready for processing".

Thanks,
Jianfeng


> Kind regards,
> Pavel Fedin
> Expert Engineer
> Samsung Electronics Research center Russia
>
>



[dpdk-dev] [PATCH v2 6/7] eal: pci: export pci_map_device

2016-01-12 Thread Yuanhan Liu
On Tue, Jan 12, 2016 at 09:31:05AM +0100, David Marchand wrote:
> On Tue, Jan 12, 2016 at 7:59 AM, Yuanhan Liu 
> wrote:
> 
> Normally we could set the RTE_PCI_DRV_NEED_MAPPING flag so that eal will
> invoke pci_map_device internally for us. From that point of view, there
> is no need to export pci_map_device.
> 
> However, the virtio pmd driver is designed to work without binding
> UIO (or something similar) first, so pci_map_device() would fail and
> the virtio pmd driver would end up being skipped. Therefore, we cannot
> set RTE_PCI_DRV_NEED_MAPPING blindly in the virtio pmd driver.
> 
> Therefore, this patch exports pci_map_device and lets the virtio pmd
> call it when necessary.
> 
> 
> Well, if you introduce map function, I suppose, for hotplug, you would need
> unmap.

Good reminder. Thanks. I will export pci_unmap_device as well.

> [snip]
> 
> 
> diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
> index 334c12e..e9e1725 100644
> --- a/lib/librte_eal/common/include/rte_pci.h
> +++ b/lib/librte_eal/common/include/rte_pci.h
> @@ -485,6 +485,17 @@ int rte_eal_pci_read_config(const struct rte_pci_device *device,
>   */
>  int rte_eal_pci_write_config(const struct rte_pci_device *device,
>                               const void *buf, size_t len, off_t offset);
> +/**
> + * Map this device
> + *
> + * This function is private to EAL.
> + *
> + * @return
> + *   0 on success, negative on error and positive if no driver
> + *   is found for the device.
> + */
> +int rte_eal_pci_map_device(struct rte_pci_device *dev);
> +
> 
> 
> If you export it, then this can not be marked as private anymore.

Oops, a silly C error. Will fix it.

> Description could be better (I agree it was not that great before).
> And a little comment on when to call: driver should not set
> RTE_PCI_DRV_NEED_MAPPING flag if it wants to use it.

Good suggestion.

> The rest looks good to me.

Thanks.

--yliu
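The calling convention discussed in this thread (a driver either sets RTE_PCI_DRV_NEED_MAPPING and lets EAL map the device, or leaves the flag clear and calls the exported map function itself, pairing it with unmap on detach for hotplug) can be modelled in a self-contained sketch. Everything below is hypothetical: the flag value, the structure and the stub mapper stand in for EAL and are not its real definitions, and the "legacy path" is only a placeholder for whatever non-mapped access the driver uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for RTE_PCI_DRV_NEED_MAPPING; not the real value. */
#define MAP_SKETCH_NEED_MAPPING 0x1u

struct map_sketch_dev {
	bool kmod_bound;  /* is a UIO-like kernel module bound to the device? */
	bool mapped;      /* are the BAR resources currently mapped? */
};

/* Stub playing the role of the exported map function: fails without a kmod. */
static int map_sketch_map(struct map_sketch_dev *dev)
{
	if (!dev->kmod_bound)
		return -1;
	dev->mapped = true;
	return 0;
}

static void map_sketch_unmap(struct map_sketch_dev *dev)
{
	dev->mapped = false;
}

/* Probe: with the flag set, the "EAL" maps first and a failure skips the
 * driver; without it, the driver tries mapping itself and otherwise falls
 * back to a legacy access path. */
static int map_sketch_probe(unsigned int drv_flags, struct map_sketch_dev *dev,
			    bool *use_legacy)
{
	if (drv_flags & MAP_SKETCH_NEED_MAPPING) {
		if (map_sketch_map(dev) != 0)
			return -1;  /* driver skipped */
		*use_legacy = false;
		return 0;
	}
	*use_legacy = map_sketch_map(dev) != 0;
	return 0;
}

/* Detach (e.g. hotplug removal): undo whatever mapping the probe created. */
static void map_sketch_detach(struct map_sketch_dev *dev)
{
	if (dev->mapped)
		map_sketch_unmap(dev);
}
```

The first branch is why the flag cannot be set blindly: on a device with nothing bound, the probe returns -1 and the driver is skipped, whereas the second branch still brings the device up.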


[dpdk-dev] [PATCH 4/4] virtio/vdev: add a new vdev named eth_cvio

2016-01-12 Thread Tan, Jianfeng
Hi Fedin,

On 1/12/2016 3:45 PM, Pavel Fedin wrote:
>   Hello!
>
>   See inline
>
>> ...
>>  }
>>
>> +struct rte_mbuf *m = NULL;
>> +if (dev->dev_type == RTE_ETH_DEV_PCI)
>> +vq->offset = (uintptr_t)>buf_addr;
>> +#ifdef RTE_VIRTIO_VDEV
>> +else {
>> +vq->offset = (uintptr_t)>buf_physaddr;
>   Not sure, but shouldn't these be swapped? Originally, for PCI devices, we 
> used buf_physaddr.

Oops, it seems that you are right. I'm trying to figure out why I can rx/tx
pkts using the wrong version.

>>   #define VIRTIO_READ_REG_1(hw, reg) \
>> -(hw->io_base != VIRTIO_VDEV_IO_BASE) ? \
>> +((hw->io_base != VIRTIO_VDEV_IO_BASE) ? \
>>  inb((VIRTIO_PCI_REG_ADDR((hw), (reg \
>> -:virtio_ioport_read(hw, reg)
>> +:virtio_ioport_read(hw, reg))
>>   #define VIRTIO_WRITE_REG_1(hw, reg, value) \
>> -(hw->io_base != VIRTIO_VDEV_IO_BASE) ? \
>> +((hw->io_base != VIRTIO_VDEV_IO_BASE) ? \
>>  outb_p((unsigned char)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg \
>> -:virtio_ioport_write(hw, reg, value)
>> +:virtio_ioport_write(hw, reg, value))
>>
>>   #define VIRTIO_READ_REG_2(hw, reg) \
>> -(hw->io_base != VIRTIO_VDEV_IO_BASE) ? \
>> +((hw->io_base != VIRTIO_VDEV_IO_BASE) ? \
>>  inw((VIRTIO_PCI_REG_ADDR((hw), (reg \
>> -:virtio_ioport_read(hw, reg)
>> +:virtio_ioport_read(hw, reg))
>>   #define VIRTIO_WRITE_REG_2(hw, reg, value) \
>> -(hw->io_base != VIRTIO_VDEV_IO_BASE) ? \
>> +((hw->io_base != VIRTIO_VDEV_IO_BASE) ? \
>>  outw_p((unsigned short)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg \
>> -:virtio_ioport_write(hw, reg, value)
>> +:virtio_ioport_write(hw, reg, value))
>>
>>   #define VIRTIO_READ_REG_4(hw, reg) \
>> -(hw->io_base != VIRTIO_VDEV_IO_BASE) ? \
>> +((hw->io_base != VIRTIO_VDEV_IO_BASE) ? \
>>  inl((VIRTIO_PCI_REG_ADDR((hw), (reg \
>> -:virtio_ioport_read(hw, reg)
>> +:virtio_ioport_read(hw, reg))
>>   #define VIRTIO_WRITE_REG_4(hw, reg, value) \
>> -(hw->io_base != VIRTIO_VDEV_IO_BASE) ? \
>> +((hw->io_base != VIRTIO_VDEV_IO_BASE) ? \
>>  outl_p((unsigned int)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg \
>> -:virtio_ioport_write(hw, reg, value)
>> +:virtio_ioport_write(hw, reg, value))
>   These bracket fixups should be squashed into #3
>

I'll rewrite this into function pointers according to Yuanhan's patch 
for virtio 1.0.
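The bracket fixups matter because these macros are expressions that callers combine with other operators. Here is a minimal illustration using hypothetical macros and stub read functions (not the virtio ones). Because `?:` binds more loosely than `&`, the unparenthesized version applies the mask only to the second branch. Function pointers, as planned above, avoid this class of problem entirely.

```c
#include <assert.h>

/* Stub "register reads" with distinguishable values. */
static int macro_sketch_read_a(void) { return 0x12; }
static int macro_sketch_read_b(void) { return 0x34; }

/* Unsafe: the expansion is a bare `cond ? a : b`. */
#define MACRO_SKETCH_READ_UNSAFE(cond) \
	(cond) ? macro_sketch_read_a() : macro_sketch_read_b()

/* Safe: the whole conditional is wrapped, so it acts as a single operand. */
#define MACRO_SKETCH_READ_SAFE(cond) \
	((cond) ? macro_sketch_read_a() : macro_sketch_read_b())
```

With the unsafe macro, `MACRO_SKETCH_READ_UNSAFE(1) & 0x0f` parses as `1 ? a() : (b() & 0x0f)` and yields 0x12. The safe macro yields the intended 0x02.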

Thanks,
Jianfeng



[dpdk-dev] [PATCH 4/4] virtio/vdev: add a new vdev named eth_cvio

2016-01-12 Thread Yuanhan Liu
On Tue, Jan 12, 2016 at 10:45:59AM +0300, Pavel Fedin wrote:
>  Hello!
> 
>  See inline

Hi,

Please strip unrelated context, so that people can reach your
comments as quickly as possible; otherwise, people can easily get
lost in a long patch.

> 
> > -Original Message-
> > From: Jianfeng Tan [mailto:jianfeng.tan at intel.com]
> > +   struct rte_mbuf *m = NULL;
> > +   if (dev->dev_type == RTE_ETH_DEV_PCI)
> > +   vq->offset = (uintptr_t)>buf_addr;
> > +#ifdef RTE_VIRTIO_VDEV
> > +   else {
> > +   vq->offset = (uintptr_t)>buf_physaddr;
> 
>  Not sure, but shouldn't these be swapped? Originally, for PCI devices, we 
> used buf_physaddr.

And this reply just serves as an example :)

--yliu


[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-12 Thread Tetsuya Mukawa
On 2016/01/12 15:14, Yuanhan Liu wrote:
> On Tue, Jan 12, 2016 at 03:01:01PM +0900, Tetsuya Mukawa wrote:
>> On 2016/01/12 14:46, Tan, Jianfeng wrote:
>>> Hi Tetsuya,
>>>  
>>>
>>>> Hi Jianfeng and Xie,
>>>>
>>>> I guess my implementation and yours have a lot of common code, so I will
>>>> try to rebase my patch on yours.
>>> We also think so. And before you rebase your code, I think we can rely
>>> on Yuanhan's
>>> struct virtio_pci_ops to make the code structure brief and clear, as
>>> discussed in your
>>> patch's thread, i.e., we both rebase our code according to Yuanhan's
>>> code. Is that OK?
>>>
>> Yes, I agree with it.
> I will send v2 out today, and hopefully someone will ACK and test it
> soon.  After that, I'm also hoping Thomas could do a quick merge then.
>
>   --yliu
>

Hi Yuanhan,

Thanks, I will review and test it also.

Tetsuya


[dpdk-dev] librte_power w/ intel_pstate cpufreq governor

2016-01-12 Thread Zhang, Helin
Hi Matthew

Yes, you have pointed out the key: the power management module has changed
or been upgraded.
Could you try the legacy one to see if it still works, as indicated in
your link?

Taking control of the governor from the kernel to user space might need one
more check beforehand.
But it is actually not a big issue, as the user can switch it back to anything
via 'echo'.
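One possible shape for that extra check, following Matthew's first point below, is to verify that scaling_available_frequencies exists and is readable before switching the governor to userspace. The helpers are a hypothetical sketch, not the librte_power API. They take the sysfs root as a parameter so they can be exercised without real hardware. Under intel_pstate, as Matthew reports, the file is absent and the check returns 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Build <root>/devices/system/cpu/cpu<lcore>/cpufreq/scaling_available_frequencies.
 * Returns 0 on success, -1 if the buffer is too small. */
static int gov_sketch_freqs_path(char *buf, size_t len, const char *sysfs_root,
				 unsigned int lcore)
{
	int n = snprintf(buf, len,
			 "%s/devices/system/cpu/cpu%u/cpufreq/scaling_available_frequencies",
			 sysfs_root, lcore);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Returns 1 only if the frequency list is readable, i.e. it seems reasonable
 * to take over the governor; 0 otherwise, so the kernel keeps control. */
static int gov_sketch_can_take_governor(const char *sysfs_root, unsigned int lcore)
{
	char path[512];
	if (gov_sketch_freqs_path(path, sizeof(path), sysfs_root, lcore) != 0)
		return 0;
	return access(path, R_OK) == 0;
}
```

Calling gov_sketch_can_take_governor("/sys", lcore) first and only then writing "userspace" to the governor file would avoid leaving the core with no active frequency control.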

Yes, it seems that librte_power has been out of date for a while. It is not
easy to track all the kernel versions.
Now we have a good chance to do that, as you have reported issues. Let's have
a look at the new power management mechanism and then see if we can do something.

Thanks a lot for your questions!

Regards,
Helin

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Matthew Hall
> Sent: Sunday, January 3, 2016 3:51 PM
> To: dev at dpdk.org
> Subject: Re: [dpdk-dev] librte_power w/ intel_pstate cpufreq governor
> 
> Hello,
> 
> In about a month, I have not received any response about all these major
> issues I was finding with librte_power and the intel_pstate-based CPU
> clock rate control driver used in all the new Linux kernels.
> 
>  From what I can tell, none of this librte_power code ever worked right in the
> first place on Sandy Bridge and newer, because the chip secretly ignores
> clockrate adjustments from outside.
> 
> Can anyone who is more expert about Intel Power Management please help
> me check this and point me to some documentation which explains how this
> is supposed to work?
> 
> I am kind of blocked on doing performance / production quality
> improvements on my code, without some kind of basic help understanding
> how this librte_power stuff should work.
> 
> Thanks,
> Matthew.
> 
> On 12/5/15 4:08 PM, Matthew Hall wrote:
> > Hello all,
> >
> > I wanted to ask some questions about librte_power and the great
> > adaptive polling / IRQ mode example in l3fwd-power.
> >
> > I am very interested in getting this to work in my project because it
> > will make it much friendlier to attract new community developers if I
> > am as cooperative as possible with system resources.
> >
> > Let's discuss the init process for a moment. It has some problems on
> > my system, and I need some help to figure out how to handle this right.
> >
> > 1. Begins with the call to rte_power_init.
> >
> > 2. Attempts to init ACPI cpufreq mode.
> >
> > 2.1. Sets lcore cpufreq governor to userspace mode.
> >
> > 2.2. Function power_get_available_freqs checks lcore CPU frequencies from:
> >
> > /sys/devices/system/cpu/cpuX/cpufreq/scaling_available_frequencies
> >
> > 2.3. This fails with (cryptic) error "POWER: ERR: File not openned". I
> > am planning to write a patch for this error a bit later.
> >
> > My kernel is using the intel_pstate driver, so
> > scaling_available_frequencies does not exist:
> >
> > http://askubuntu.com/questions/544266/why-are-missing-the-frequency-options-on-cpufreq-utils-indicator
> >
> > 3. When power_get_available_freqs fails, rte_power_acpi_cpufreq_init fails.
> >
> > 4. rte_power_init will try rte_power_kvm_vm_init. That will fail
> > because it's a physical Skylake system not some kind of VM.
> >
> > 5. Now rte_power_init totally fails, with error "POWER: ERR: Unable to
> > set Power Management Environment for lcore 0".
> >
> > So, I have a couple of questions to figure out from here:
> >
> > 1. It seems bad to switch the governor into userspace before verifying
> > the frequencies available in scaling_available_frequencies. If there
> > are no frequencies available, it seems like it should not be trying to
> > take over control of an effectively uncontrollable value.
> >
> > 2. If the governor is switched to userspace, and then no governing is
> > done, it seems like the clockrate will necessarily always be wrong
> > also because nothing will be configuring it anymore, neither kernel,
> > nor failed DPDK userspace code, since rte_power_freq_up / down
> > function pointers will always be NULL. Is this true? This seems bad if so.
> >
> > It seems that the librte_power code is basically out of date, as
> > pstate has been present since Sandy Bridge, which is quite old by now
> > for network processing. I am not sure how to make this work right now.
> > So far I see a couple options but I really don't know much about this stuff:
> >
> > 1) skip rte_power_init completely, and let intel_pstate handle it
> > using HWP mode
> >
> > 2) disable intel_pstate, switch to the legacy ACPI cpufreq (but people
> > warned this old driver is mostly a no-op and the CPU ignores its frequency requests).
> >
> > The Internet advice says it's possible, but not a very good idea, to
> > switch from the modern intel_pstate driver to the legacy ACPI mode.
> > The kernel docs (below) state that it's better to use HWP (Hardware P-State)
> > mode:
> >
> > https://www.kernel.org/doc/Documentation/cpu-freq/intel-pstate.txt
> >
> > If none of this rte_power_init stuff 

[dpdk-dev] [PATCH v3 09/12] virtio: vfio: Enable RTE_PCI_DRV_NEED_MAPPING flag in driver

2016-01-12 Thread Yuanhan Liu
On Sat, Jan 09, 2016 at 06:08:46PM +0530, Santosh Shukla wrote:
> On Thu, Jan 7, 2016 at 11:50 PM, Stephen Hemminger
>  wrote:
> > On Thu,  7 Jan 2016 22:03:06 +0530
> > Santosh Shukla  wrote:
> >
> >> +#ifdef RTE_EAL_VFIO
> >> + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_DETACHABLE,
> >> +#else
> >>   .drv_flags = RTE_PCI_DRV_DETACHABLE,
> >> +#endif
> >
> > Since VFIO is determined at runtime not compile time, the flags should
> > be updated at runtime not compile time.
> >
> >
> In general, yes, it's a wrong approach, i.e. wrapping the __need_mapping
> flag only for the vfio case. I am thinking of adding a vfio parser routine,
> something similar to the virtio_xxx_xx_uio_xx() / virtio_xx_xx_ioport()
> routines that currently exist. This will remove the RTE_EAL_VFIO ifdef clutter
> from this patch and the [08/12] patch, and the virtio pmd driver can then
> initialize the device in vfio mode.
> 
> _but_ I still need the _MAPPING flag enabled in the virtio driver,
> because for the vfio case I want the vfio_xx_mmap() routine to create the
> vfio container/group_id and then create a vfio_dev_fd for each
> virtio-net-pci interface.

I'm thinking my following patch will help:

http://dpdk.org/dev/patchwork/patch/9814/

--yliu

> Let me know if my approach is aligned with your suggestion.
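Stephen's suggestion, deciding the flag at runtime rather than with #ifdef RTE_EAL_VFIO, can be sketched as below. The flag values, the struct and the vfio_present argument are hypothetical placeholders. How a real driver would detect a usable VFIO setup, and at which point before probing EAL reads drv_flags, is left open here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for RTE_PCI_DRV_NEED_MAPPING / RTE_PCI_DRV_DETACHABLE. */
#define RT_SKETCH_NEED_MAPPING 0x1u
#define RT_SKETCH_DETACHABLE   0x2u

struct rt_sketch_driver {
	unsigned int drv_flags;
};

/* Set the flags from what is true on THIS system at runtime, instead of from
 * what was compiled in. Must run before the probe code reads drv_flags. */
static void rt_sketch_update_flags(struct rt_sketch_driver *drv, bool vfio_present)
{
	drv->drv_flags = RT_SKETCH_DETACHABLE;
	if (vfio_present)
		drv->drv_flags |= RT_SKETCH_NEED_MAPPING;
}
```

The same binary then behaves correctly on hosts with and without VFIO, which a compile-time #ifdef cannot guarantee.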


[dpdk-dev] [PATCH v2 0/7] virtio 1.0 enabling for virtio pmd driver

2016-01-12 Thread Yuanhan Liu
On Tue, Jan 12, 2016 at 02:58:57PM +0800, Yuanhan Liu wrote:
> v2: - fix a data corruption reported by Qian, due to hdr size mismatch.
>   Check details in patch 5.
> 
> - Add missing config_irq and isr reading support from v1.
> 
> - fix comments from v1.
> 
> Almost all differences in virtio 1.0 come from the PCI layout change:
> the major configuration structures are stored in BAR space, and their
> locations are stored in the corresponding PCI capability structures.
> Reading/parsing them is one of the major pieces of work in patch 7.
> 
> To make virtio v1.0 and v0.95 handling co-exist well, this patch set
> introduces a virtio_pci_ops structure, adding another layer so that
> we can keep those vtpci_foo_bar "APIs". With that, we can add virtio
> 1.0 support with minimal changes.
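The extra layer described above is a table of function pointers chosen once at init time, so callers keep using the same vtpci_* style helpers whichever device version is underneath. Below is a self-contained sketch of the idea. The member names, the two backends and the fake register fields are illustrative assumptions, not the actual struct virtio_pci_ops from the patch set.

```c
#include <assert.h>
#include <stdint.h>

struct ops_sketch_hw;

/* Hypothetical ops table, one instance per device flavour. */
struct ops_sketch_pci_ops {
	uint8_t (*get_status)(struct ops_sketch_hw *hw);
	void (*set_status)(struct ops_sketch_hw *hw, uint8_t status);
};

struct ops_sketch_hw {
	const struct ops_sketch_pci_ops *ops;
	uint8_t legacy_reg;    /* stands in for a legacy I/O port register */
	uint8_t modern_field;  /* stands in for a field in the modern common cfg */
};

static uint8_t legacy_get_status(struct ops_sketch_hw *hw) { return hw->legacy_reg; }
static void legacy_set_status(struct ops_sketch_hw *hw, uint8_t s) { hw->legacy_reg = s; }
static uint8_t modern_get_status(struct ops_sketch_hw *hw) { return hw->modern_field; }
static void modern_set_status(struct ops_sketch_hw *hw, uint8_t s) { hw->modern_field = s; }

static const struct ops_sketch_pci_ops ops_sketch_legacy = { legacy_get_status, legacy_set_status };
static const struct ops_sketch_pci_ops ops_sketch_modern = { modern_get_status, modern_set_status };

/* Chosen once at init, e.g. after probing for the modern capabilities. */
static void ops_sketch_select(struct ops_sketch_hw *hw, int modern)
{
	hw->ops = modern ? &ops_sketch_modern : &ops_sketch_legacy;
}

/* Generic helper: callers do not care which backend is underneath.
 * Non-zero status bits accumulate; 0 means reset. */
static void ops_sketch_set_status(struct ops_sketch_hw *hw, uint8_t status)
{
	if (status != 0)
		status |= hw->ops->get_status(hw);
	hw->ops->set_status(hw, status);
}
```

Only ops_sketch_select() knows about device versions; every other caller stays unchanged, which is what keeps the virtio 1.0 change small.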

Oops, I just noticed that I left out the simple test guide I promised
earlier. Here it is:

First, you need a QEMU that supports virtio 1.0 (say, v2.5). Then add the
option "disable-modern=false" to the QEMU virtio-net-pci device to enable
virtio 1.0 (which is disabled by default).

If you see something like the following from 'lspci -v', virtio
1.0 is indeed enabled:

00:04.0 Ethernet controller: Red Hat, Inc Virtio network device
Subsystem: Red Hat, Inc Device 0001 
Physical Slot: 4 
Flags: bus master, fast devsel, latency 0, IRQ 11 
I/O ports at c040 [size=64] 
Memory at febf1000 (32-bit, non-prefetchable) [size=4K] 
Memory at fe00 (64-bit, prefetchable) [size=8M] 
Expansion ROM at feb8 [disabled] [size=256K] 
Capabilities: [98] MSI-X: Enable+ Count=6 Masked- 
==> Capabilities: [84] Vendor Specific Information: Len=14  
==> Capabilities: [70] Vendor Specific Information: Len=14  
==> Capabilities: [60] Vendor Specific Information: Len=10  
==> Capabilities: [50] Vendor Specific Information: Len=10  
==> Capabilities: [40] Vendor Specific Information: Len=10  
Kernel driver in use: virtio-pci 
Kernel modules: virtio_pci

After that, there is nothing special compared to the old virtio
0.95 pmd driver.

--yliu
> 
> ---
> Yuanhan Liu (7):
>   virtio: don't set vring address again at queue startup
>   virtio: introduce struct virtio_pci_ops
>   virtio: move left pci stuff to virtio_pci.c
>   viritio: switch to 64 bit features
>   virtio: retrieve hdr_size from hw->vtnet_hdr_size
>   eal: pci: export pci_map_device
>   virtio: add 1.0 support
> 
>  doc/guides/rel_notes/release_2_3.rst|   3 +
>  drivers/net/virtio/virtio_ethdev.c  | 301 +-
>  drivers/net/virtio/virtio_ethdev.h  |   3 +-
>  drivers/net/virtio/virtio_pci.c | 768 
> +++-
>  drivers/net/virtio/virtio_pci.h | 102 +++-
>  drivers/net/virtio/virtio_rxtx.c|  21 +-
>  drivers/net/virtio/virtqueue.h  |   4 +-
>  lib/librte_eal/bsdapp/eal/eal_pci.c |   2 +-
>  lib/librte_eal/bsdapp/eal/rte_eal_version.map   |   6 +
>  lib/librte_eal/common/eal_common_pci.c  |   2 +-
>  lib/librte_eal/common/eal_private.h |  11 -
>  lib/librte_eal/common/include/rte_pci.h |  11 +
>  lib/librte_eal/linuxapp/eal/eal_pci.c   |   2 +-
>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |   6 +
>  14 files changed, 899 insertions(+), 343 deletions(-)
> 
> -- 
> 1.9.0


[dpdk-dev] [PATCH 2/4] mem: add API to obstain memory-backed file info

2016-01-12 Thread Pavel Fedin
 Hello!

> >   .repeated depends on CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS. By the way,
> > it looks like it does the same thing as you are trying to do with
> > --single-file, but with hugepages, doesn't it? I see it's currently used
> > by ivshmem (which is AFAIK very immature and half-abandoned).
> 
> Similar but not the same.
> --single-file: a single file for all mapped hugepages.
> SINGLE_FILE_SEGMENTS: a file per set of physically contiguous mapped
> hugepages (what DPDK calls a memseg, or memory segment). So there could be
> more than one file.
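To make the difference concrete: --single-file produces one file regardless of layout, while SINGLE_FILE_SEGMENTS produces one file per run of physically contiguous pages. Here is a small sketch that counts such runs. The page addresses in the example are made up, and the function assumes they are sorted in ascending order and share a single page size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count runs of physically contiguous pages; each run would be one memseg
 * (and so one backing file with SINGLE_FILE_SEGMENTS). Assumes `pa` is
 * sorted in ascending order and every page has size `page_size`. */
static size_t seg_sketch_count(const uint64_t *pa, size_t n, uint64_t page_size)
{
	if (n == 0)
		return 0;
	size_t segs = 1;
	for (size_t i = 1; i < n; i++)
		if (pa[i] != pa[i - 1] + page_size)
			segs++;  /* gap: a new segment starts here */
	return segs;
}
```

Five 2MB pages at 0x40000000, 0x40200000, 0x40400000, 0x80000000 and 0x80200000 form two segments, so SINGLE_FILE_SEGMENTS would create two files, while --single-file would create one.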

 Thank you for the explanation.

 By this time, I've done more testing. The current patchset breaks --no-huge. I
have not studied why yet:
--- cut ---
Program received signal SIGBUS, Bus error.
malloc_elem_init (elem=elem@entry=0x7fffe51e6000, heap=0x77fe5a1c, ms=ms@entry=0x77fb301c, size=size@entry=268435392) at /home/p.fedin/dpdk/lib/librte_eal/common/malloc_elem.c:62
62  /home/p.fedin/dpdk/lib/librte_eal/common/malloc_elem.c: No such file or directory.
Missing separate debuginfos, use: dnf debuginfo-install keyutils-libs-1.5.9-7.fc23.x86_64 krb5-libs-1.13.2-11.fc23.x86_64 libcap-ng-0.7.7-2.fc23.x86_64 libcom_err-1.42.13-3.fc23.x86_64 libselinux-2.4-4.fc23.x86_64 openssl-libs-1.0.2d-2.fc23.x86_64 pcre-8.37-4.fc23.x86_64 zlib-1.2.8-9.fc23.x86_64
(gdb) where
#0  malloc_elem_init (elem=elem@entry=0x7fffe51e6000, heap=0x77fe5a1c, ms=ms@entry=0x77fb301c, size=size@entry=268435392)
    at /home/p.fedin/dpdk/lib/librte_eal/common/malloc_elem.c:62
#1  0x004a50b5 in malloc_heap_add_memseg (ms=0x77fb301c, heap=) at /home/p.fedin/dpdk/lib/librte_eal/common/malloc_heap.c:109
#2  rte_eal_malloc_heap_init () at /home/p.fedin/dpdk/lib/librte_eal/common/malloc_heap.c:232
#3  0x004be896 in rte_eal_memzone_init () at /home/p.fedin/dpdk/lib/librte_eal/common/eal_common_memzone.c:427
#4  0x0042ab02 in rte_eal_init (argc=argc@entry=11, argv=argv@entry=0x7fffeb80) at /home/p.fedin/dpdk/lib/librte_eal/linuxapp/eal/eal.c:799
#5  0x0066dfb9 in dpdk_init (argc=11, argv=0x7fffeb80) at lib/netdev-dpdk.c:2192
#6  0x0040ddd9 in main (argc=12, argv=0x7fffeb78) at vswitchd/ovs-vswitchd.c:74
--- cut ---

 And now I tend to think that we do not need --single-file at all, because:
a) It's just a temporary workaround for the "more than 8 regions" problem.
b) It's not compatible with physical hardware anyway.

 So I think that we could easily use the "--no-huge --shared-mem" combination. We
could address the hugepages compatibility problem later.

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-12 Thread Tetsuya Mukawa
On 2016/01/12 14:46, Tan, Jianfeng wrote:
>
> Hi Tetsuya,
>  
>
>> Hi Jianfeng and Xie,
>>
>> I guess my implementation and yours have a lot of common code, so I will
>> try to rebase my patch on yours.
>
> We also think so. And before you rebase your code, I think we can rely
> on Yuanhan's
> struct virtio_pci_ops to make the code structure brief and clear, as
> discussed in your
> patch's thread, i.e., we both rebase our code according to Yuanhan's
> code. Is that OK?
>

Yes, I agree with it.

Thanks,
Tetsuya

>
>>
>> BTW, one thing I need to change in your memory allocation is that the
>> mmaped address should be under 44 bits (32 + PAGE_SHIFT) to work with my
>> patch.
>> This is because the VIRTIO_PCI_QUEUE_PFN register only accepts such addresses.
>> (I may need to add one more EAL parameter like "--mmap-under ")
>
> It makes sense.
>
> Thanks,
> Jianfeng
>
>>
>> Thanks,
>> Tetsuya
>
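The 44-bit limit follows from the legacy register layout: VIRTIO_PCI_QUEUE_PFN takes a 32-bit page frame number, i.e. the address shifted right by 12 bits (the legacy queue address shift), so only page-aligned addresses below 2^44 can be expressed. Here is a sketch of the check; the function name is hypothetical. On x86-64 Linux, mmap() commonly returns addresses near 0x7f0000000000, which this check rejects, which is presumably why a dedicated allocation option is being considered.

```c
#include <assert.h>
#include <stdint.h>

#define PFN_SKETCH_ADDR_SHIFT 12  /* legacy virtio: PFN = address >> 12 */

/* Returns 1 if `addr` can be programmed into a 32-bit PFN register. The
 * address must also be page aligned, because the low 12 bits are dropped. */
static int pfn_sketch_addr_ok(uint64_t addr)
{
	if (addr & ((1ULL << PFN_SKETCH_ADDR_SHIFT) - 1))
		return 0;  /* not page aligned */
	return (addr >> PFN_SKETCH_ADDR_SHIFT) <= UINT32_MAX;
}
```

The highest acceptable address is therefore (1ULL << 44) - 4096, whose PFN is exactly UINT32_MAX.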



[dpdk-dev] [PATCH v2 7/7] virtio: add 1.0 support

2016-01-12 Thread Yuanhan Liu
A modern (v1.0) virtio PCI device defines several PCI capabilities.
Each cap has a corresponding configuration structure, and the
cap.bar and cap.offset fields tell us where to find it.

First, we map the PCI resources with rte_eal_pci_map_device().
We can then easily locate a cfg structure by:

cfg_addr = dev->mem_resource[cap.bar].addr + cap.offset;

Therefore, the entry point for enabling modern (v1.0) PCI device support
is to iterate over the PCI capability list and locate the configs
we care about. They are:

- common cfg

  For generic virtio and virtqueue configuration, such as setting/getting
  features, enabling a specific queue, and so on.

- notify cfg

  Combined with `queue_notify_off' from the common cfg, it can be used to
  notify a specific virtqueue.

- device cfg

  Where the virtio_net_config structure is located.

- isr cfg

  Where to read isr (interrupt status).

If any of the above caps is not found, we fall back to legacy virtio
handling.

If all of them are found, hw->vtpci_ops is set to modern_ops, where all
operations are implemented by reading/writing one (or a few) specific
fields of the above 4 cfg structures. That's basically
how this patch works.
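The capability walk described above can be sketched against a plain byte array standing in for PCI configuration space. The constants are assumed to follow the PCI and virtio 1.0 specifications: status register bit 0x10 at offset 0x06 signals a capability list, the capabilities pointer is at 0x34, vendor-specific capabilities have ID 0x09, and struct virtio_pci_cap stores cfg_type, bar and offset at bytes 3, 4 and 8. The function and type names are hypothetical, and a real driver reads config space through EAL rather than from a buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offsets/IDs from the PCI spec (assumed): status register, its
 * "capabilities list" bit, the capabilities pointer, vendor-specific cap ID. */
#define CAP_SKETCH_PCI_STATUS      0x06
#define CAP_SKETCH_STATUS_CAP_LIST 0x10
#define CAP_SKETCH_PCI_CAP_PTR     0x34
#define CAP_SKETCH_ID_VNDR         0x09

/* virtio 1.0 cfg_type values for the four structures the patch needs. */
enum {
	CAP_SKETCH_CFG_COMMON = 1,
	CAP_SKETCH_CFG_NOTIFY = 2,
	CAP_SKETCH_CFG_ISR = 3,
	CAP_SKETCH_CFG_DEVICE = 4,
};

struct cap_sketch_loc {
	int found;
	uint8_t bar;      /* which BAR holds the structure */
	uint32_t offset;  /* byte offset of the structure inside that BAR */
};

static uint32_t cap_sketch_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Walk the capability list of a 256-byte config space image. Fills loc[type]
 * for each virtio cfg type found; returns 0 if all four are present (modern
 * device) or -1 (caller falls back to legacy handling). */
static int cap_sketch_find(const uint8_t cfg[256], struct cap_sketch_loc loc[5])
{
	memset(loc, 0, 5 * sizeof(loc[0]));
	if (!(cfg[CAP_SKETCH_PCI_STATUS] & CAP_SKETCH_STATUS_CAP_LIST))
		return -1;

	uint8_t pos = cfg[CAP_SKETCH_PCI_CAP_PTR] & 0xfc;
	for (int guard = 0; pos != 0 && guard < 48; guard++) {
		if (pos > 256 - 16)
			break;  /* malformed: a virtio cap would run past the end */
		const uint8_t *cap = &cfg[pos];
		uint8_t type = cap[3];  /* cfg_type */
		if (cap[0] == CAP_SKETCH_ID_VNDR &&
		    type >= CAP_SKETCH_CFG_COMMON && type <= CAP_SKETCH_CFG_DEVICE) {
			loc[type].found = 1;
			loc[type].bar = cap[4];                      /* bar */
			loc[type].offset = cap_sketch_le32(&cap[8]); /* offset */
		}
		pos = cap[1] & 0xfc;  /* cap_next */
	}

	for (int t = CAP_SKETCH_CFG_COMMON; t <= CAP_SKETCH_CFG_DEVICE; t++)
		if (!loc[t].found)
			return -1;
	return 0;
}
```

From loc[type], the structure's address is the mapped BAR base plus the offset, as in the cfg_addr formula above. The guard counter keeps a malformed, looping capability chain from hanging the walk.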

Besides those changes, virtio 1.0 introduces a new status bit,
FEATURES_OK, which is set after feature negotiation is done.

Lastly, the VIRTIO_F_VERSION_1 feature flag is set.
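The FEATURES_OK handshake in the virtio_negotiate_features() hunk below amounts to this: require VERSION_1, write the bit, read the status back, and treat a cleared bit as the device rejecting the negotiated features. Here is a device-free sketch of that logic. The values 0x08 for FEATURES_OK and 32 for the VERSION_1 feature bit follow the virtio 1.0 specification. The fake device model, which keeps the bit only for feature sets it offered, is an assumption for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define FEAT_SKETCH_STATUS_FEATURES_OK 0x08u  /* virtio 1.0 device status bit */
#define FEAT_SKETCH_F_VERSION_1 32            /* virtio 1.0 feature bit number */

/* Hypothetical device model: it keeps FEATURES_OK set only if the driver
 * accepted a subset of what the device offered. */
struct feat_sketch_dev {
	uint64_t host_features;
	uint64_t guest_features;
	uint8_t status;
};

static void feat_sketch_set_status(struct feat_sketch_dev *d, uint8_t bits)
{
	d->status |= bits;
	if ((bits & FEAT_SKETCH_STATUS_FEATURES_OK) &&
	    (d->guest_features & ~d->host_features) != 0)
		d->status &= (uint8_t)~FEAT_SKETCH_STATUS_FEATURES_OK;  /* device refuses */
}

/* Driver side, following the same order of checks as the patched function. */
static int feat_sketch_negotiate(struct feat_sketch_dev *d, uint64_t wanted)
{
	d->guest_features = wanted;
	if (!(d->guest_features & (1ULL << FEAT_SKETCH_F_VERSION_1)))
		return -1;  /* a modern device requires VERSION_1 */
	feat_sketch_set_status(d, FEAT_SKETCH_STATUS_FEATURES_OK);
	if (!(d->status & FEAT_SKETCH_STATUS_FEATURES_OK))
		return -1;  /* re-read shows the bit did not stick */
	return 0;
}
```

Re-reading the status after setting FEATURES_OK, as the v2 notes mention, is what makes the device's refusal visible to the driver.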

Signed-off-by: Yuanhan Liu 
---

v2: - re-read status after setting FEATURES_OK to make sure status is
  set correctly.

- Add isr reading and config irq setting support.

- Define some PCI macros on our own to avoid depending on
  linux/pci_regs.h, as there is no such file on non-Linux
  platforms
---
 doc/guides/rel_notes/release_2_3.rst |   3 +
 drivers/net/virtio/virtio_ethdev.c   |  24 ++-
 drivers/net/virtio/virtio_ethdev.h   |   3 +-
 drivers/net/virtio/virtio_pci.c  | 335 ++-
 drivers/net/virtio/virtio_pci.h  |  67 +++
 drivers/net/virtio/virtqueue.h   |   2 +
 6 files changed, 429 insertions(+), 5 deletions(-)

diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst
index 99de186..c390d97 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -4,6 +4,9 @@ DPDK Release 2.3
 New Features
 

+* **Virtio 1.0 support.**
+
+  Enabled virtio 1.0 support for virtio pmd driver.

 Resolved Issues
 ---
diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 94e0c4a..1afaba4 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -927,7 +927,7 @@ virtio_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
return virtio_send_command(hw->cvq, , , 1);
 }

-static void
+static int
 virtio_negotiate_features(struct virtio_hw *hw)
 {
uint64_t host_features;
@@ -949,6 +949,22 @@ virtio_negotiate_features(struct virtio_hw *hw)
hw->guest_features = vtpci_negotiate_features(hw, host_features);
PMD_INIT_LOG(DEBUG, "features after negotiate = %"PRIx64,
hw->guest_features);
+
+   if (hw->modern) {
+   if (!vtpci_with_feature(hw, VIRTIO_F_VERSION_1)) {
+   PMD_INIT_LOG(ERR,
+   "VIRTIO_F_VERSION_1 features is not enabled.");
+   return -1;
+   }
+   vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_FEATURES_OK);
+   if (!(vtpci_get_status(hw) & VIRTIO_CONFIG_STATUS_FEATURES_OK)) {
+   PMD_INIT_LOG(ERR,
+   "failed to set FEATURES_OK status!");
+   return -1;
+   }
+   }
+
+   return 0;
 }

 /*
@@ -1032,7 +1048,8 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)

/* Tell the host we've known how to drive the device. */
vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER);
-   virtio_negotiate_features(hw);
+   if (virtio_negotiate_features(hw) < 0)
+   return -1;

/* If host does not support status then disable LSC */
if (!vtpci_with_feature(hw, VIRTIO_NET_F_STATUS))
@@ -1043,7 +1060,8 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
rx_func_get(eth_dev);

/* Setting up rx_header size for the device */
-   if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF))
+   if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF) ||
+   vtpci_with_feature(hw, VIRTIO_F_VERSION_1))
hw->vtnet_hdr_size = sizeof(struct virtio_net_hdr_mrg_rxbuf);
else
hw->vtnet_hdr_size = sizeof(struct virtio_net_hdr);
diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index ae2d47d..fed9571 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -64,7 +64,8 @@
 1u << VIRTIO_NET_F_CTRL_VQ   | \
 1u 

[dpdk-dev] [PATCH v2 6/7] eal: pci: export pci_map_device

2016-01-12 Thread Yuanhan Liu
Normally we could set the RTE_PCI_DRV_NEED_MAPPING flag so that EAL
invokes pci_map_device internally for us. From that point of view, there
is no need to export pci_map_device.

However, for the virtio pmd driver, which is designed to work without
first binding UIO (or something similar), pci_map_device() will fail,
which ends up with the virtio pmd driver being skipped. Therefore, we
cannot set RTE_PCI_DRV_NEED_MAPPING blindly in the virtio pmd driver.

So this patch exports pci_map_device (renamed to rte_eal_pci_map_device),
and lets the virtio pmd call it when necessary.

Signed-off-by: Yuanhan Liu 
---
 lib/librte_eal/bsdapp/eal/eal_pci.c |  2 +-
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  6 ++
 lib/librte_eal/common/eal_common_pci.c  |  2 +-
 lib/librte_eal/common/eal_private.h | 11 ---
 lib/librte_eal/common/include/rte_pci.h | 11 +++
 lib/librte_eal/linuxapp/eal/eal_pci.c   |  2 +-
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  6 ++
 7 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 6c21fbd..adb0915 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -93,7 +93,7 @@ pci_unbind_kernel_driver(struct rte_pci_device *dev __rte_unused)

 /* Map pci device */
 int
-pci_map_device(struct rte_pci_device *dev)
+rte_eal_pci_map_device(struct rte_pci_device *dev)
 {
int ret = -1;

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 9d7adf1..b166c3c 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -135,3 +135,9 @@ DPDK_2.2 {
rte_xen_dom0_supported;

 } DPDK_2.1;
+
+DPDK_2.3 {
+   global:
+
+   rte_eal_pci_map_device;
+} DPDK_2.2;
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index dcfe947..486d921 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -188,7 +188,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
pci_config_space_set(dev);
 #endif
/* map resources for devices that use igb_uio */
-   ret = pci_map_device(dev);
+   ret = rte_eal_pci_map_device(dev);
if (ret != 0)
return ret;
} else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 072e672..ae710b7 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -165,17 +165,6 @@ struct rte_pci_device;
 int pci_unbind_kernel_driver(struct rte_pci_device *dev);

 /**
- * Map this device
- *
- * This function is private to EAL.
- *
- * @return
- *   0 on success, negative on error and positive if no driver
- *   is found for the device.
- */
-int pci_map_device(struct rte_pci_device *dev);
-
-/**
  * Unmap this device
  *
  * This function is private to EAL.
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 334c12e..e9e1725 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -485,6 +485,17 @@ int rte_eal_pci_read_config(const struct rte_pci_device *device,
  */
 int rte_eal_pci_write_config(const struct rte_pci_device *device,
 const void *buf, size_t len, off_t offset);
+/**
+ * Map this device
+ *
+ * This function is private to EAL.
+ *
+ * @return
+ *   0 on success, negative on error and positive if no driver
+ *   is found for the device.
+ */
+int rte_eal_pci_map_device(struct rte_pci_device *dev);
+

 #ifdef RTE_PCI_CONFIG
 /**
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index bc5b5be..a8cef37 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -124,7 +124,7 @@ pci_get_kernel_driver_by_path(const char *filename, char *dri_name)

 /* Map pci device */
 int
-pci_map_device(struct rte_pci_device *dev)
+rte_eal_pci_map_device(struct rte_pci_device *dev)
 {
int ret = -1;

diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index cbe175f..7b12282 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -138,3 +138,9 @@ DPDK_2.2 {
rte_xen_dom0_supported;

 } DPDK_2.1;
+
+DPDK_2.3 {
+   global:
+
+   rte_eal_pci_map_device;
+} DPDK_2.2;
-- 
1.9.0

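The patch above lets a PMD decide for itself whether to map the device. A minimal sketch of that call pattern follows, assuming the PMD tries the mapping and falls back to another access path when it fails, instead of being skipped by EAL. `fake_pci_dev`, `fake_map_device()` and `probe_resources()` are hypothetical stand-ins; real code would call `rte_eal_pci_map_device()` on a `struct rte_pci_device`.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct rte_pci_device. */
struct fake_pci_dev {
	bool uio_bound;   /* true if a UIO-style kernel driver is bound */
	bool mapped;      /* set once the BARs have been mapped */
};

/* Hypothetical stand-in for rte_eal_pci_map_device():
 * mapping only succeeds when a UIO-style driver is bound. */
static int fake_map_device(struct fake_pci_dev *dev)
{
	if (!dev->uio_bound)
		return -1;
	dev->mapped = true;
	return 0;
}

enum access_mode { ACCESS_MAPPED, ACCESS_IOPORT };

/*
 * Instead of setting a "needs mapping" flag (which makes EAL skip the
 * driver when mapping fails), the PMD attempts the mapping itself and
 * falls back to I/O port access when it is not available.
 */
static enum access_mode probe_resources(struct fake_pci_dev *dev)
{
	if (fake_map_device(dev) == 0)
		return ACCESS_MAPPED;
	return ACCESS_IOPORT;
}
```

The design point is that a mapping failure becomes a recoverable condition inside the driver rather than a reason for the whole driver to be skipped.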


[dpdk-dev] [PATCH v2 5/7] virtio: retrieve hdr_size from hw->vtnet_hdr_size

2016-01-12 Thread Yuanhan Liu
The mergeable virtio net hdr format has been the standard and the
only virtio net hdr format since virtio 1.0. Therefore, we can no
longer hardcode hdr_size to "sizeof(struct virtio_net_hdr)" in
virtio_recv_pkts(); otherwise, the hdr size used by
rte_vhost_enqueue_burst() and by virtio_recv_pkts() would mismatch,
leading to packet corruption.

Instead, we should retrieve it from hw->vtnet_hdr_size, which will be
set properly in eth_virtio_dev_init().

Signed-off-by: Yuanhan Liu 
---
 drivers/net/virtio/virtio_rxtx.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index b7267c0..41a1366 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -560,7 +560,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
struct rte_mbuf *rcv_pkts[VIRTIO_MBUF_BURST_SZ];
int error;
uint32_t i, nb_enqueued;
-   const uint32_t hdr_size = sizeof(struct virtio_net_hdr);
+   uint32_t hdr_size;

nb_used = VIRTQUEUE_NUSED(rxvq);

@@ -580,6 +580,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
hw = rxvq->hw;
nb_rx = 0;
nb_enqueued = 0;
+   hdr_size = hw->vtnet_hdr_size;

for (i = 0; i < num ; i++) {
rxm = rcv_pkts[i];
@@ -664,7 +665,7 @@ virtio_recv_mergeable_pkts(void *rx_queue,
uint32_t seg_num;
uint16_t extra_idx;
uint32_t seg_res;
-   const uint32_t hdr_size = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+   uint32_t hdr_size;

nb_used = VIRTQUEUE_NUSED(rxvq);

@@ -682,6 +683,7 @@ virtio_recv_mergeable_pkts(void *rx_queue,
seg_num = 0;
extra_idx = 0;
seg_res = 0;
+   hdr_size = hw->vtnet_hdr_size;

while (i < nb_used) {
struct virtio_net_hdr_mrg_rxbuf *header;
-- 
1.9.0

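To illustrate the mismatch the patch above fixes, here is a small sketch with local copies of the two header layouts from the virtio spec (`net_hdr` and `net_hdr_mrg_rxbuf` are illustrative names, not DPDK's own definitions) and a size selector that mirrors the `vtnet_hdr_size` rule this series adds to eth_virtio_dev_init().

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the plain virtio-net header layout. */
struct net_hdr {
	uint8_t  flags;
	uint8_t  gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
};

/* Local mirror of the mergeable-rxbuf header: plain header + count. */
struct net_hdr_mrg_rxbuf {
	struct net_hdr hdr;
	uint16_t num_buffers;   /* number of merged rx buffers */
};

#define F_MRG_RXBUF 15   /* VIRTIO_NET_F_MRG_RXBUF bit */
#define F_VERSION_1 32   /* VIRTIO_F_VERSION_1 bit */

static int has_feature(uint64_t features, unsigned int bit)
{
	return (features & (1ULL << bit)) != 0;
}

/* The mergeable header is used when MRG_RXBUF is negotiated,
 * and always under virtio 1.0. */
static size_t vtnet_hdr_size(uint64_t features)
{
	if (has_feature(features, F_MRG_RXBUF) ||
	    has_feature(features, F_VERSION_1))
		return sizeof(struct net_hdr_mrg_rxbuf);
	return sizeof(struct net_hdr);
}
```

With these layouts the plain header is 10 bytes and the mergeable one is 12, so a receiver that hardcodes 10 while the device writes 12 would treat `num_buffers` as the first 2 bytes of the packet.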


[dpdk-dev] [PATCH v2 4/7] virtio: switch to 64 bit features

2016-01-12 Thread Yuanhan Liu
Switch to 64 bit features, which virtio 1.0 supports.

Since legacy virtio only supports 32 bit features, it complains loudly
and bails out when asked to set features above bit 31.

Signed-off-by: Yuanhan Liu 
---
 drivers/net/virtio/virtio_ethdev.c |  8 
 drivers/net/virtio/virtio_pci.c| 15 ++-
 drivers/net/virtio/virtio_pci.h| 12 ++--
 3 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index b57224d..94e0c4a 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -930,16 +930,16 @@ virtio_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
 static void
 virtio_negotiate_features(struct virtio_hw *hw)
 {
-   uint32_t host_features;
+   uint64_t host_features;

/* Prepare guest_features: feature that driver wants to support */
hw->guest_features = VIRTIO_PMD_GUEST_FEATURES;
-   PMD_INIT_LOG(DEBUG, "guest_features before negotiate = %x",
+   PMD_INIT_LOG(DEBUG, "guest_features before negotiate = %"PRIx64,
hw->guest_features);

/* Read device(host) feature bits */
host_features = hw->vtpci_ops->get_features(hw);
-   PMD_INIT_LOG(DEBUG, "host_features before negotiate = %x",
+   PMD_INIT_LOG(DEBUG, "host_features before negotiate = %"PRIx64,
host_features);

/*
@@ -947,7 +947,7 @@ virtio_negotiate_features(struct virtio_hw *hw)
 * guest feature bits.
 */
hw->guest_features = vtpci_negotiate_features(hw, host_features);
-   PMD_INIT_LOG(DEBUG, "features after negotiate = %x",
+   PMD_INIT_LOG(DEBUG, "features after negotiate = %"PRIx64,
hw->guest_features);
 }

diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c
index 03d623b..5eed57e 100644
--- a/drivers/net/virtio/virtio_pci.c
+++ b/drivers/net/virtio/virtio_pci.c
@@ -87,15 +87,20 @@ legacy_write_dev_config(struct virtio_hw *hw, uint64_t offset,
}
 }

-static uint32_t
+static uint64_t
 legacy_get_features(struct virtio_hw *hw)
 {
return VIRTIO_READ_REG_4(hw, VIRTIO_PCI_HOST_FEATURES);
 }

 static void
-legacy_set_features(struct virtio_hw *hw, uint32_t features)
+legacy_set_features(struct virtio_hw *hw, uint64_t features)
 {
+   if ((features >> 32) != 0) {
+   PMD_DRV_LOG(ERR,
+   "only 32 bit features are allowed for legacy virtio!");
+   return;
+   }
VIRTIO_WRITE_REG_4(hw, VIRTIO_PCI_GUEST_FEATURES, features);
 }

@@ -451,10 +456,10 @@ vtpci_write_dev_config(struct virtio_hw *hw, uint64_t offset,
hw->vtpci_ops->write_dev_cfg(hw, offset, src, length);
 }

-uint32_t
-vtpci_negotiate_features(struct virtio_hw *hw, uint32_t host_features)
+uint64_t
+vtpci_negotiate_features(struct virtio_hw *hw, uint64_t host_features)
 {
-   uint32_t features;
+   uint64_t features;

/*
 * Limit negotiated features to what the driver, virtqueue, and
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index ee7d265..3fd86f6 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -175,8 +175,8 @@ struct virtio_pci_ops {
uint8_t (*get_status)(struct virtio_hw *hw);
void(*set_status)(struct virtio_hw *hw, uint8_t status);

-   uint32_t (*get_features)(struct virtio_hw *hw);
-   void (*set_features)(struct virtio_hw *hw, uint32_t features);
+   uint64_t (*get_features)(struct virtio_hw *hw);
+   void (*set_features)(struct virtio_hw *hw, uint64_t features);

uint8_t (*get_isr)(struct virtio_hw *hw);

@@ -191,7 +191,7 @@ struct virtio_pci_ops {
 struct virtio_hw {
struct virtqueue *cvq;
uint32_t    io_base;
-   uint32_t    guest_features;
+   uint64_t    guest_features;
uint32_t    max_tx_queues;
uint32_t    max_rx_queues;
uint16_t    vtnet_hdr_size;
@@ -271,9 +271,9 @@ outl_p(unsigned int data, unsigned int port)
outl_p((unsigned int)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg

 static inline int
-vtpci_with_feature(struct virtio_hw *hw, uint32_t bit)
+vtpci_with_feature(struct virtio_hw *hw, uint64_t bit)
 {
-   return (hw->guest_features & (1u << bit)) != 0;
+   return (hw->guest_features & (1ULL << bit)) != 0;
 }

 /*
@@ -286,7 +286,7 @@ void vtpci_reinit_complete(struct virtio_hw *);

 void vtpci_set_status(struct virtio_hw *, uint8_t);

-uint32_t vtpci_negotiate_features(struct virtio_hw *, uint32_t);
+uint64_t vtpci_negotiate_features(struct virtio_hw *, uint64_t);

 void vtpci_write_dev_config(struct virtio_hw *, uint64_t, void *, int);

-- 
1.9.0

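A short sketch of the two behaviours this patch relies on: testing a feature bit above 31 needs a 64-bit shift, and the legacy path must reject any feature bit it cannot write. The function names are hypothetical; the bit test follows the new vtpci_with_feature(), and the guard follows the check added to legacy_set_features().

```c
#include <stdbool.h>
#include <stdint.h>

/* Test a feature bit (0..63) in a 64-bit mask. Shifting 1ULL keeps
 * bits 32..63 well defined; a 32-bit "1u << 32" would be undefined
 * behaviour, which is why the old check could not see VERSION_1. */
static bool with_feature(uint64_t features, unsigned int bit)
{
	return (features & (1ULL << bit)) != 0;
}

/* The legacy guest-features register is only 32 bits wide,
 * so any requested bit above 31 must be refused. */
static bool legacy_features_ok(uint64_t features)
{
	return (features >> 32) == 0;
}
```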


[dpdk-dev] [PATCH v2 3/7] virtio: move left pci stuff to virtio_pci.c

2016-01-12 Thread Yuanhan Liu
virtio_pci.c is a more appropriate place for the pci stuff than virtio_ethdev.c is.

Signed-off-by: Yuanhan Liu 
---
 drivers/net/virtio/virtio_ethdev.c | 265 +---
 drivers/net/virtio/virtio_pci.c| 270 -
 2 files changed, 270 insertions(+), 265 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 6c1d3a0..b57224d 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -36,10 +36,6 @@
 #include 
 #include 
 #include 
-#ifdef RTE_EXEC_ENV_LINUXAPP
-#include 
-#include 
-#endif

 #include 
 #include 
@@ -955,260 +951,6 @@ virtio_negotiate_features(struct virtio_hw *hw)
hw->guest_features);
 }

-#ifdef RTE_EXEC_ENV_LINUXAPP
-static int
-parse_sysfs_value(const char *filename, unsigned long *val)
-{
-   FILE *f;
-   char buf[BUFSIZ];
-   char *end = NULL;
-
-   f = fopen(filename, "r");
-   if (f == NULL) {
-   PMD_INIT_LOG(ERR, "%s(): cannot open sysfs value %s",
-__func__, filename);
-   return -1;
-   }
-
-   if (fgets(buf, sizeof(buf), f) == NULL) {
-   PMD_INIT_LOG(ERR, "%s(): cannot read sysfs value %s",
-__func__, filename);
-   fclose(f);
-   return -1;
-   }
-   *val = strtoul(buf, , 0);
-   if ((buf[0] == '\0') || (end == NULL) || (*end != '\n')) {
-   PMD_INIT_LOG(ERR, "%s(): cannot parse sysfs value %s",
-__func__, filename);
-   fclose(f);
-   return -1;
-   }
-   fclose(f);
-   return 0;
-}
-
-static int get_uio_dev(struct rte_pci_addr *loc, char *buf, unsigned int buflen,
-   unsigned int *uio_num)
-{
-   struct dirent *e;
-   DIR *dir;
-   char dirname[PATH_MAX];
-
-   /* depending on kernel version, uio can be located in uio/uioX
-* or uio:uioX */
-   snprintf(dirname, sizeof(dirname),
-SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/uio",
-loc->domain, loc->bus, loc->devid, loc->function);
-   dir = opendir(dirname);
-   if (dir == NULL) {
-   /* retry with the parent directory */
-   snprintf(dirname, sizeof(dirname),
-SYSFS_PCI_DEVICES "/" PCI_PRI_FMT,
-loc->domain, loc->bus, loc->devid, loc->function);
-   dir = opendir(dirname);
-
-   if (dir == NULL) {
-   PMD_INIT_LOG(ERR, "Cannot opendir %s", dirname);
-   return -1;
-   }
-   }
-
-   /* take the first file starting with "uio" */
-   while ((e = readdir(dir)) != NULL) {
-   /* format could be uio%d ...*/
-   int shortprefix_len = sizeof("uio") - 1;
-   /* ... or uio:uio%d */
-   int longprefix_len = sizeof("uio:uio") - 1;
-   char *endptr;
-
-   if (strncmp(e->d_name, "uio", 3) != 0)
-   continue;
-
-   /* first try uio%d */
-   errno = 0;
-   *uio_num = strtoull(e->d_name + shortprefix_len, , 10);
-   if (errno == 0 && endptr != (e->d_name + shortprefix_len)) {
-   snprintf(buf, buflen, "%s/uio%u", dirname, *uio_num);
-   break;
-   }
-
-   /* then try uio:uio%d */
-   errno = 0;
-   *uio_num = strtoull(e->d_name + longprefix_len, , 10);
-   if (errno == 0 && endptr != (e->d_name + longprefix_len)) {
-   snprintf(buf, buflen, "%s/uio:uio%u", dirname,
-*uio_num);
-   break;
-   }
-   }
-   closedir(dir);
-
-   /* No uio resource found */
-   if (e == NULL) {
-   PMD_INIT_LOG(ERR, "Could not find uio resource");
-   return -1;
-   }
-
-   return 0;
-}
-
-static int
-virtio_has_msix(const struct rte_pci_addr *loc)
-{
-   DIR *d;
-   char dirname[PATH_MAX];
-
-   snprintf(dirname, sizeof(dirname),
-SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/msi_irqs",
-loc->domain, loc->bus, loc->devid, loc->function);
-
-   d = opendir(dirname);
-   if (d)
-   closedir(d);
-
-   return (d != NULL);
-}
-
-/* Extract I/O port numbers from sysfs */
-static int virtio_resource_init_by_uio(struct rte_pci_device *pci_dev)
-{
-   char dirname[PATH_MAX];
-   char filename[PATH_MAX];
-   unsigned long start, size;
-   unsigned int uio_num;
-
-   if (get_uio_dev(_dev->addr, dirname, sizeof(dirname), _num) < 0)
-   return -1;
-
-   /* get portio size */
-   snprintf(filename, sizeof(filename),
-"%s/portio/port0/size", dirname);
-   

[dpdk-dev] [PATCH v2 2/7] virtio: introduce struct virtio_pci_ops

2016-01-12 Thread Yuanhan Liu
Introduce struct virtio_pci_ops, to let legacy virtio (v0.95) and
modern virtio (1.0) have different implementations of a specific
pci action, such as reading the host status.

With that, this patch reimplements all exported pci functions in
the following way:

void
vtpci_foo_bar(struct virtio_hw *hw)
{
	hw->vtpci_ops->foo_bar(hw);
}

This way, only those pci related functions need attention when
adding virtio 1.0 support.

This patch introduces a new vtpci function, vtpci_init(), to do the
proper virtio pci setup. It's pretty simple so far: it just sets
hw->vtpci_ops to legacy_ops, as we don't support 1.0 yet.

Signed-off-by: Yuanhan Liu 
---

v2: remove an extra whitespace line, and add a comment on "reading status
after reset".

rename the badly chosen op name "set_irq" to "set_config_irq".
---
 drivers/net/virtio/virtio_ethdev.c |  22 ++
 drivers/net/virtio/virtio_pci.c| 158 ++---
 drivers/net/virtio/virtio_pci.h|  27 +++
 drivers/net/virtio/virtqueue.h |   2 +-
 4 files changed, 166 insertions(+), 43 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index d928339..6c1d3a0 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -272,9 +272,7 @@ virtio_dev_queue_release(struct virtqueue *vq) {

if (vq) {
hw = vq->hw;
-   /* Select and deactivate the queue */
-   VIRTIO_WRITE_REG_2(hw, VIRTIO_PCI_QUEUE_SEL, vq->vq_queue_index);
-   VIRTIO_WRITE_REG_4(hw, VIRTIO_PCI_QUEUE_PFN, 0);
+   hw->vtpci_ops->del_queue(hw, vq);

rte_free(vq->sw_ring);
rte_free(vq);
@@ -295,15 +293,13 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
struct virtio_hw *hw = dev->data->dev_private;
struct virtqueue *vq = NULL;

-   /* Write the virtqueue index to the Queue Select Field */
-   VIRTIO_WRITE_REG_2(hw, VIRTIO_PCI_QUEUE_SEL, vtpci_queue_idx);
-   PMD_INIT_LOG(DEBUG, "selecting queue: %u", vtpci_queue_idx);
+   PMD_INIT_LOG(DEBUG, "setting up queue: %u", vtpci_queue_idx);

/*
 * Read the virtqueue size from the Queue Size field
 * Always power of 2 and if 0 virtqueue does not exist
 */
-   vq_size = VIRTIO_READ_REG_2(hw, VIRTIO_PCI_QUEUE_NUM);
+   vq_size = hw->vtpci_ops->get_queue_num(hw, vtpci_queue_idx);
PMD_INIT_LOG(DEBUG, "vq_size: %u nb_desc:%u", vq_size, nb_desc);
if (vq_size == 0) {
PMD_INIT_LOG(ERR, "%s: virtqueue does not exist", __func__);
@@ -436,12 +432,8 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
memset(vq->virtio_net_hdr_mz->addr, 0, PAGE_SIZE);
}

-   /*
-* Set guest physical address of the virtqueue
-* in VIRTIO_PCI_QUEUE_PFN config register of device
-*/
-   VIRTIO_WRITE_REG_4(hw, VIRTIO_PCI_QUEUE_PFN,
-   mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
+   hw->vtpci_ops->setup_queue(hw, vq);
+
*pvq = vq;
return 0;
 }
@@ -950,7 +942,7 @@ virtio_negotiate_features(struct virtio_hw *hw)
hw->guest_features);

/* Read device(host) feature bits */
-   host_features = VIRTIO_READ_REG_4(hw, VIRTIO_PCI_HOST_FEATURES);
+   host_features = hw->vtpci_ops->get_features(hw);
PMD_INIT_LOG(DEBUG, "host_features before negotiate = %x",
host_features);

@@ -1287,6 +1279,8 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)

pci_dev = eth_dev->pci_dev;

+   vtpci_init(pci_dev, hw);
+
if (virtio_resource_init(pci_dev) < 0)
return -1;

diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c
index 2245bec..9930efa 100644
--- a/drivers/net/virtio/virtio_pci.c
+++ b/drivers/net/virtio/virtio_pci.c
@@ -34,12 +34,11 @@

 #include "virtio_pci.h"
 #include "virtio_logs.h"
+#include "virtqueue.h"

-static uint8_t vtpci_get_status(struct virtio_hw *);
-
-void
-vtpci_read_dev_config(struct virtio_hw *hw, uint64_t offset,
-   void *dst, int length)
+static void
+legacy_read_dev_config(struct virtio_hw *hw, uint64_t offset,
+  void *dst, int length)
 {
uint64_t off;
uint8_t *d;
@@ -60,9 +59,9 @@ vtpci_read_dev_config(struct virtio_hw *hw, uint64_t offset,
}
 }

-void
-vtpci_write_dev_config(struct virtio_hw *hw, uint64_t offset,
-   void *src, int length)
+static void
+legacy_write_dev_config(struct virtio_hw *hw, uint64_t offset,
+   void *src, int length)
 {
uint64_t off;
uint8_t *s;
@@ -83,30 +82,133 @@ vtpci_write_dev_config(struct virtio_hw *hw, uint64_t offset,
}
 }

+static uint32_t
+legacy_get_features(struct virtio_hw *hw)
+{
+   return VIRTIO_READ_REG_4(hw, VIRTIO_PCI_HOST_FEATURES);
+}
+
+static void

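The ops-table indirection described in patch 2/7 can be sketched as follows. This is a simplified, self-contained model: `struct hw`, `struct pci_ops`, `hw_init()` and the fake register fields are hypothetical, the real structures are `struct virtio_hw` and `struct virtio_pci_ops`, and the real vtpci_set_status() does more than a plain dispatch.

```c
#include <stdint.h>

struct hw;

/* Per-version operation table, modelled on struct virtio_pci_ops. */
struct pci_ops {
	uint8_t  (*get_status)(struct hw *hw);
	void     (*set_status)(struct hw *hw, uint8_t status);
	uint64_t (*get_features)(struct hw *hw);
};

struct hw {
	const struct pci_ops *ops;
	/* Fake "registers" so the sketch runs without a device. */
	uint8_t  status_reg;
	uint32_t host_features_reg;
};

static uint8_t legacy_get_status(struct hw *hw) { return hw->status_reg; }

static void legacy_set_status(struct hw *hw, uint8_t s) { hw->status_reg = s; }

static uint64_t legacy_get_features(struct hw *hw)
{
	return hw->host_features_reg;   /* legacy register is 32 bits */
}

static const struct pci_ops legacy_ops = {
	.get_status   = legacy_get_status,
	.set_status   = legacy_set_status,
	.get_features = legacy_get_features,
};

/* Like vtpci_init() in the patch: only the legacy table exists so far. */
static void hw_init(struct hw *hw)
{
	hw->ops = &legacy_ops;
}

/* Exported wrappers keep their signatures and just dispatch. */
static uint8_t hw_get_status(struct hw *hw) { return hw->ops->get_status(hw); }

static void hw_set_status(struct hw *hw, uint8_t s) { hw->ops->set_status(hw, s); }
```

Adding virtio 1.0 then amounts to providing a second table and having the init function pick it; callers of the wrappers do not change.
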
[dpdk-dev] [PATCH v2 1/7] virtio: don't set vring address again at queue startup

2016-01-12 Thread Yuanhan Liu
The vring address has already been set up in virtio_dev_queue_setup(),
and a vq restart does not reset it, so there is no need to set it again.

Signed-off-by: Yuanhan Liu 
---
 drivers/net/virtio/virtio_rxtx.c | 15 ---
 1 file changed, 15 deletions(-)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 74b39ef..b7267c0 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -339,11 +339,6 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
vq_update_avail_idx(vq);

PMD_INIT_LOG(DEBUG, "Allocated %d bufs", nbufs);
-
-   VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_SEL,
-   vq->vq_queue_index);
-   VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
-   vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
} else if (queue_type == VTNET_TQ) {
if (use_simple_rxtx) {
int mid_idx  = vq->vq_nentries >> 1;
@@ -362,16 +357,6 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
for (i = mid_idx; i < vq->vq_nentries; i++)
vq->vq_ring.avail->ring[i] = i;
}
-
-   VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_SEL,
-   vq->vq_queue_index);
-   VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
-   vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
-   } else {
-   VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_SEL,
-   vq->vq_queue_index);
-   VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
-   vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
}
 }

-- 
1.9.0



[dpdk-dev] [PATCH v2 0/7] virtio 1.0 enabling for virtio pmd driver

2016-01-12 Thread Yuanhan Liu
v2: - fix a data corruption reported by Qian, due to a hdr size mismatch.
  check details in patch 5.

- add config_irq and isr reading support, which was missing in v1.

- address comments on v1.

Almost all of the differences in virtio 1.0 come from the PCI layout change:
the major configuration structures are stored in BAR space, and their
locations are described by the corresponding PCI capability structures.
Reading and parsing them is a major part of patch 7.

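As a rough illustration of the capability parsing described above, the sketch below walks the standard PCI capability list in a raw config-space image and looks for a vendor-specific capability (ID 0x09) with a given `cfg_type`, using the `struct virtio_pci_cap` layout from the virtio 1.0 spec. The buffer-based interface, the names and the loop bound are assumptions for illustration; the actual patch 7 code may read config space differently.

```c
#include <stddef.h>
#include <stdint.h>

#define PCI_STATUS          0x06   /* status register (low byte) */
#define PCI_STATUS_CAP_LIST 0x10   /* capability list present */
#define PCI_CAPABILITY_LIST 0x34   /* pointer to first capability */
#define PCI_CAP_ID_VNDR     0x09   /* vendor-specific capability */

/* cfg_type values of struct virtio_pci_cap (virtio 1.0). */
enum { CFG_COMMON = 1, CFG_NOTIFY = 2, CFG_ISR = 3, CFG_DEVICE = 4 };

struct vcap { uint8_t bar; uint32_t offset; uint32_t length; };

/* Little-endian 32-bit read from the config-space image. */
static uint32_t rd32(const uint8_t *cfg, size_t off)
{
	return (uint32_t)cfg[off] | (uint32_t)cfg[off + 1] << 8 |
	       (uint32_t)cfg[off + 2] << 16 | (uint32_t)cfg[off + 3] << 24;
}

/* virtio_pci_cap: vndr(0) next(1) len(2) cfg_type(3) bar(4) pad(5..7)
 * offset(8..11) length(12..15). Returns 0 if found, -1 otherwise. */
static int find_virtio_cap(const uint8_t cfg[256], uint8_t type,
			   struct vcap *out)
{
	int guard = 48;   /* bound the walk in case the list loops */
	uint8_t pos;

	if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
		return -1;
	pos = cfg[PCI_CAPABILITY_LIST] & 0xFC;
	while (pos != 0 && pos <= 256 - 16 && guard-- > 0) {
		if (cfg[pos] == PCI_CAP_ID_VNDR && cfg[pos + 3] == type) {
			out->bar = cfg[pos + 4];
			out->offset = rd32(cfg, pos + 8);
			out->length = rd32(cfg, pos + 12);
			return 0;
		}
		pos = cfg[pos + 1] & 0xFC;
	}
	return -1;
}
```

Once each structure's BAR, offset and length are known, the driver maps that BAR and accesses the common, notify, ISR and device config areas through it.
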
To let virtio v1.0 and v0.95 handling co-exist well, this patch set
introduces a virtio_pci_ops structure, adding another layer so that
we can keep the existing vtpci_foo_bar "APIs". With that, adding
virtio 1.0 support requires only minimal changes.


---
Yuanhan Liu (7):
  virtio: don't set vring address again at queue startup
  virtio: introduce struct virtio_pci_ops
  virtio: move left pci stuff to virtio_pci.c
  virtio: switch to 64 bit features
  virtio: retrieve hdr_size from hw->vtnet_hdr_size
  eal: pci: export pci_map_device
  virtio: add 1.0 support

 doc/guides/rel_notes/release_2_3.rst|   3 +
 drivers/net/virtio/virtio_ethdev.c  | 301 +-
 drivers/net/virtio/virtio_ethdev.h  |   3 +-
 drivers/net/virtio/virtio_pci.c | 768 +++-
 drivers/net/virtio/virtio_pci.h | 102 +++-
 drivers/net/virtio/virtio_rxtx.c|  21 +-
 drivers/net/virtio/virtqueue.h  |   4 +-
 lib/librte_eal/bsdapp/eal/eal_pci.c |   2 +-
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |   6 +
 lib/librte_eal/common/eal_common_pci.c  |   2 +-
 lib/librte_eal/common/eal_private.h |  11 -
 lib/librte_eal/common/include/rte_pci.h |  11 +
 lib/librte_eal/linuxapp/eal/eal_pci.c   |   2 +-
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |   6 +
 14 files changed, 899 insertions(+), 343 deletions(-)

-- 
1.9.0



[dpdk-dev] [PATCH v2 1/3] cmdline: increase command line buffer

2016-01-12 Thread Panu Matilainen
On 01/12/2016 12:49 PM, Nelio Laranjeiro wrote:
> Allow long command lines in testpmd (like flow director with IPv6, ...).
>
> Signed-off-by: John McNamara 
> Signed-off-by: Nelio Laranjeiro 
> ---
>   doc/guides/rel_notes/deprecation.rst | 5 -
>   lib/librte_cmdline/cmdline_rdline.h  | 2 +-
>   2 files changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index e94d4a2..9cb288c 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -44,8 +44,3 @@ Deprecation Notices
> and table action handlers will be updated:
> the pipeline parameter will be added, the packets mask parameter will be
> either removed (for input port action handler) or made input-only.
> -
> -* ABI changes are planned in cmdline buffer size to allow the use of long
> -  commands (such as RETA update in testpmd).  This should impact
> -  CMDLINE_PARSE_RESULT_BUFSIZE, STR_TOKEN_SIZE and RDLINE_BUF_SIZE.
> -  It should be integrated in release 2.3.
> diff --git a/lib/librte_cmdline/cmdline_rdline.h b/lib/librte_cmdline/cmdline_rdline.h
> index b9aad9b..72e2dad 100644
> --- a/lib/librte_cmdline/cmdline_rdline.h
> +++ b/lib/librte_cmdline/cmdline_rdline.h
> @@ -93,7 +93,7 @@ extern "C" {
>   #endif
>
>   /* configuration */
> -#define RDLINE_BUF_SIZE 256
> +#define RDLINE_BUF_SIZE 512
>   #define RDLINE_PROMPT_SIZE  32
>   #define RDLINE_VT100_BUF_SIZE  8
>   #define RDLINE_HISTORY_BUF_SIZE BUFSIZ

Having to break a library ABI for a change like this is a bit ridiculous.

I haven't tried it, so I could be wrong, but based on a quick look, struct
rdline could easily be made opaque to consumers just by adding functions
for allocating and freeing it.

- Panu -

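The suggestion above is the usual opaque-pointer pattern. A minimal sketch follows with hypothetical names (`rdline_sketch`, `rdline_sketch_new()`, `rdline_sketch_free()`, `rdline_sketch_buf_size()`) that are not part of librte_cmdline; it assumes callers only ever hold a pointer obtained from the allocator and never embed the struct or take its size.

```c
#include <stddef.h>
#include <stdlib.h>

/* ---- would live in the public header ---- */
struct rdline_sketch;   /* opaque: callers cannot see its size */
struct rdline_sketch *rdline_sketch_new(void);
void rdline_sketch_free(struct rdline_sketch *rdl);
size_t rdline_sketch_buf_size(const struct rdline_sketch *rdl);

/* ---- would live in the .c file; can change without an ABI break ---- */
#define SKETCH_BUF_SIZE 512

struct rdline_sketch {
	char buf[SKETCH_BUF_SIZE];
	unsigned int len;
};

struct rdline_sketch *rdline_sketch_new(void)
{
	/* zeroed allocation, so len starts at 0 */
	return calloc(1, sizeof(struct rdline_sketch));
}

void rdline_sketch_free(struct rdline_sketch *rdl)
{
	free(rdl);
}

size_t rdline_sketch_buf_size(const struct rdline_sketch *rdl)
{
	return sizeof(rdl->buf);
}
```

Because the struct size is only known inside the library, raising the buffer size changes nothing that applications were compiled against. Existing users that embed the struct directly would still need one ABI change to move to such an interface.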


[dpdk-dev] [PATCH 2/4] mem: add API to obtain memory-backed file info

2016-01-12 Thread Pavel Fedin
 Hello!

> So are you suggesting to not introduce --single-file option but instead
> --shared-mem?
> AFAIK --single-file was trying to workaround the limitation of just
> being able to map 8 fds.

 Heh, yes, you're right... Indeed, sorry, I was not patient enough; I see it 
uses hpi->hugedir instead of /dev/shm... I was confused by the code 
path. It seemed that --single-file is an alias for --no-hugepages.
 And the patch still changes the mmap() mode to MAP_SHARED unconditionally, 
which is not good in terms of backwards compatibility (and this is explicitly 
noted in the cover letter).

 So, let's try to sort this out:
 a) By default we should still use MAP_PRIVATE.
 b) Let's say that we need --shared-mem in order to make it MAP_SHARED. This 
can be combined with --no-hugepages if necessary (this is what I tried to 
implement based on the old RFC).
 c) Let's say that --single-file uses hugetlbfs but maps everything via a 
single file. This can still be combined with --shared-mem.

 Wouldn't this be clearer, more straightforward and free of hidden implications?

 And if we agree on that, we could now try to reduce the number of options:
 a) We could imply MAP_SHARED if cvio is used, because shared memory is 
mandatory in this case.
 b) (c) above again raises a question: doesn't it make 
CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS obsolete? Or maybe we could use that one 
instead of --single-file (however, I'm not a fan of compile-time configuration 
like this)?

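The proposal above reduces to a small flag-selection rule for the hugepage mapping. A hypothetical sketch (`struct mem_opts` and `hugepage_map_flags()` are illustrative names, not EAL code; the option names come from the mail):

```c
#include <stdbool.h>
#include <sys/mman.h>

/* Hypothetical option set mirroring the proposal. */
struct mem_opts {
	bool shared_mem;   /* --shared-mem given */
	bool use_cvio;     /* an eth_cvio vdev was requested */
};

static int hugepage_map_flags(const struct mem_opts *o)
{
	/* cvio hands memory to a vhost backend, so sharing is mandatory */
	if (o->shared_mem || o->use_cvio)
		return MAP_SHARED;
	/* the default keeps the existing, backwards-compatible behaviour */
	return MAP_PRIVATE;
}
```

--single-file stays orthogonal here: it would only decide how many files back the memory, not which of these flags is used.
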
Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-12 Thread Tetsuya Mukawa
On 2016/01/10 20:42, Jianfeng Tan wrote:
> This patchset provides a high performance networking interface (virtio)
> for container-based DPDK applications. How to start DPDK apps in
> containers with exclusive ownership of NIC devices is beyond its scope.
> The basic idea here is to present a new virtual device (named eth_cvio),
> which can be discovered and initialized in container-based DPDK apps using
> rte_eal_init(). To minimize the change, we reuse the existing virtio
> frontend driver code (drivers/net/virtio/).
>  
> Compared to the QEMU/VM case, the virtio device framework (which translates
> I/O port r/w operations into the unix socket/cuse protocol, and is
> originally provided by QEMU) is integrated into the virtio frontend driver.
> So this converged driver plays both the role of the original frontend
> driver and the role of the QEMU device framework.
>  
> The major difference lies in how to calculate relative addresses for vhost.
> The principle of virtio is that, based on one or multiple shared memory
> segments, vhost maintains a reference system with the base address and
> length of each segment, so that an address coming from the VM (usually a
> GPA, Guest Physical Address) can be translated into a vhost-recognizable
> address (named VVA, Vhost Virtual Address). To decrease the overhead of
> address translation, we should maintain as few segments as possible. In
> the VM case, GPA is always locally contiguous. In the container case, the
> CVA (Container Virtual Address) can be used. Specifically:
> a. in set_base_addr, the CVA is used;
> b. when preparing RX descriptors, the CVA is used;
> c. when transmitting packets, the CVA is filled into TX descriptors;
> d. in the TX and CQ headers, the CVA is used.
>  
> How to share memory? In the VM case, QEMU always shares the whole physical
> layout with the backend. But it's not feasible for a container, as a
> process, to share all of its virtual memory regions with the backend. So
> only specified virtual memory regions (of shared type) are sent to the
> backend. A limitation is that only addresses in these areas can be used
> to transmit or receive packets.
>
> Known issues
>
> a. When used with vhost-net, root privilege is required to create the tap
> device inside.
> b. Control queue and multi-queue are not supported yet.
> c. When the --single-file option is used, the socket_id of the memory may
> be wrong. (Use "numactl -N x -m x" to work around this for now.)
>  
> How to use?
>
> a. Apply this patchset.
>
> b. To compile container apps:
> $: make config RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
> $: make install RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
> $: make -C examples/l2fwd RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
> $: make -C examples/vhost RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
>
> c. To build a docker image using the Dockerfile below.
> $: cat ./Dockerfile
> FROM ubuntu:latest
> WORKDIR /usr/src/dpdk
> COPY . /usr/src/dpdk
> ENV PATH "$PATH:/usr/src/dpdk/examples/l2fwd/build/"
> $: docker build -t dpdk-app-l2fwd .
>
> d. Used with vhost-user
> $: ./examples/vhost/build/vhost-switch -c 3 -n 4 \
>   --socket-mem 1024,1024 -- -p 0x1 --stats 1
> $: docker run -i -t -v :/var/run/usvhost \
>   -v /dev/hugepages:/dev/hugepages \
>   dpdk-app-l2fwd l2fwd -c 0x4 -n 4 -m 1024 --no-pci \
>   --vdev=eth_cvio0,path=/var/run/usvhost -- -p 0x1
>
> e. Used with vhost-net
> $: modprobe vhost
> $: modprobe vhost-net
> $: docker run -i -t --privileged \
>   -v /dev/vhost-net:/dev/vhost-net \
>   -v /dev/net/tun:/dev/net/tun \
>   -v /dev/hugepages:/dev/hugepages \
>   dpdk-app-l2fwd l2fwd -c 0x4 -n 4 -m 1024 --no-pci \
>   --vdev=eth_cvio0,path=/dev/vhost-net -- -p 0x1
>
> By the way, it's not necessary to run in a container.
>
> Signed-off-by: Huawei Xie 
> Signed-off-by: Jianfeng Tan 
>
> Jianfeng Tan (4):
>   mem: add --single-file to create single mem-backed file
>   mem: add API to obtain memory-backed file info
>   virtio/vdev: add ways to interact with vhost
>   virtio/vdev: add a new vdev named eth_cvio
>
>  config/common_linuxapp |   5 +
>  drivers/net/virtio/Makefile|   4 +
>  drivers/net/virtio/vhost.c | 734 +
>  drivers/net/virtio/vhost.h | 192 
>  drivers/net/virtio/virtio_ethdev.c | 338 ++---
>  drivers/net/virtio/virtio_ethdev.h |   4 +
>  drivers/net/virtio/virtio_pci.h|  52 +-
>  drivers/net/virtio/virtio_rxtx.c   |  11 +-
>  drivers/net/virtio/virtio_rxtx_simple.c|  14 +-
>  drivers/net/virtio/virtqueue.h |  13 +-
>  lib/librte_eal/common/eal_common_options.c |  17 +
>  lib/librte_eal/common/eal_internal_cfg.h   |   1 +
>  lib/librte_eal/common/eal_options.h|   2 +
>  lib/librte_eal/common/include/rte_memory.h |  16 +
>  lib/librte_eal/linuxapp/eal/eal_memory.c   |  82 +++-
>  15 files changed, 1392 insertions(+), 93 deletions(-)
>  create mode 100644 

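The segment-based translation described in the quoted cover letter can be sketched as a table lookup. `struct mem_region` and `to_vva()` are hypothetical names, and a linear search is assumed for simplicity. The sketch also shows the stated limitation: an address outside every shared region cannot be translated.

```c
#include <stddef.h>
#include <stdint.h>

/* One shared memory region as announced to the backend. */
struct mem_region {
	uint64_t guest_addr;   /* GPA in the VM case, CVA for a container */
	uint64_t size;         /* region length in bytes */
	uint64_t host_addr;    /* where the backend mapped it (VVA) */
};

/* Translate a frontend address into a backend address; returns 0 if
 * it lies outside every shared region. Fewer regions means a shorter
 * search, which is one reason to keep the segment count low. */
static uint64_t to_vva(const struct mem_region *r, size_t n, uint64_t addr)
{
	for (size_t i = 0; i < n; i++) {
		if (addr >= r[i].guest_addr &&
		    addr - r[i].guest_addr < r[i].size)
			return r[i].host_addr + (addr - r[i].guest_addr);
	}
	return 0;
}
```
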
[dpdk-dev] [PATCH] i40e: fix VLAN bitmasks for hash/fdir input sets for tunnels

2016-01-12 Thread Andrey Chilikin
This patch adds the missing VLAN bitmask for the inner frame in case of
tunneling, and fixes the VLAN tag bitmasks for a single frame, or for the
outer frame in case of tunneling.

Signed-off-by: Andrey Chilikin 
---
 drivers/net/i40e/i40e_ethdev.c |   12 +++-
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index bf6220d..453276f 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -206,10 +206,12 @@
 #define I40E_REG_INSET_L2_DMAC   0xE000ULL
 /* Source MAC address */
 #define I40E_REG_INSET_L2_SMAC   0x1C00ULL
-/* VLAN tag in the outer L2 header */
-#define I40E_REG_INSET_L2_OUTER_VLAN 0x0080ULL
-/* VLAN tag in the inner L2 header */
-#define I40E_REG_INSET_L2_INNER_VLAN 0x0100ULL
+/* Outer (S-Tag) VLAN tag in the outer L2 header */
+#define I40E_REG_INSET_L2_OUTER_VLAN 0x0200ULL
+/* Inner (C-Tag) or single VLAN tag in the outer L2 header */
+#define I40E_REG_INSET_L2_INNER_VLAN 0x0080ULL
+/* Single VLAN tag in the inner L2 header */
+#define I40E_REG_INSET_TUNNEL_VLAN   0x0100ULL
 /* Source IPv4 address */
 #define I40E_REG_INSET_L3_SRC_IP4   0x00018000ULL
 /* Destination IPv4 address */
@@ -6777,7 +6779,7 @@ i40e_translate_input_set_reg(uint64_t input)
{I40E_INSET_SRC_PORT, I40E_REG_INSET_L4_SRC_PORT},
{I40E_INSET_DST_PORT, I40E_REG_INSET_L4_DST_PORT},
{I40E_INSET_SCTP_VT, I40E_REG_INSET_L4_SCTP_VERIFICATION_TAG},
-   {I40E_INSET_TUNNEL_ID, I40E_REG_INSET_TUNNEL_ID},
+   {I40E_INSET_VLAN_TUNNEL, I40E_REG_INSET_TUNNEL_VLAN},
{I40E_INSET_TUNNEL_DMAC,
I40E_REG_INSET_TUNNEL_L2_INNER_DST_MAC},
{I40E_INSET_TUNNEL_IPV4_DST, I40E_REG_INSET_TUNNEL_L3_DST_IP4},
-- 
1.7.4.1
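The second hunk above changes one entry of the pair table that i40e_translate_input_set_reg() appears to walk in order to turn generic input-set flags into register bits. Below is a minimal, self-contained sketch of that table-driven pattern. All flag names and bit values are hypothetical placeholders, not the real I40E_INSET_* / I40E_REG_INSET_* definitions (which this archive shows only partially).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generic input-set flags (NOT the real I40E_INSET_* values). */
#define INSET_SRC_PORT    (1ULL << 0)
#define INSET_DST_PORT    (1ULL << 1)
#define INSET_VLAN_TUNNEL (1ULL << 2)

/* Hypothetical register bits (NOT the real I40E_REG_INSET_* values). */
#define REG_INSET_L4_SRC_PORT (1ULL << 40)
#define REG_INSET_L4_DST_PORT (1ULL << 41)
#define REG_INSET_TUNNEL_VLAN (1ULL << 56)

struct inset_map {
	uint64_t inset; /* generic input-set flag */
	uint64_t reg;   /* corresponding register bit(s) */
};

static const struct inset_map inset_table[] = {
	{INSET_SRC_PORT, REG_INSET_L4_SRC_PORT},
	{INSET_DST_PORT, REG_INSET_L4_DST_PORT},
	{INSET_VLAN_TUNNEL, REG_INSET_TUNNEL_VLAN},
};

/* OR together the register bits of every generic flag set in 'input'. */
static uint64_t translate_input_set_reg(uint64_t input)
{
	uint64_t val = 0;
	size_t i;

	for (i = 0; i < sizeof(inset_table) / sizeof(inset_table[0]); i++) {
		if (input & inset_table[i].inset)
			val |= inset_table[i].reg;
	}
	return val;
}
```

With a table like this, fixing a wrong mask only means editing one #define or one table row, which is the kind of change the patch makes.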



[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-12 Thread Yuanhan Liu
On Tue, Jan 12, 2016 at 03:26:47PM +0900, Tetsuya Mukawa wrote:
> On 2016/01/12 15:14, Yuanhan Liu wrote:
> > On Tue, Jan 12, 2016 at 03:01:01PM +0900, Tetsuya Mukawa wrote:
> >> On 2016/01/12 14:46, Tan, Jianfeng wrote:
> >>> Hi Tetsuya,
> >>>  
> >>>
>  Hi Jianfeng and Xie,
> 
>  I guess my implementation and yours have a lot of common code, so I will
>  try to rebase my patch on yours.
> >>> We also think so. And before you rebase your code, I think we can rely
> >>> on Yuanhan's
> >>> struct virtio_pci_ops to make the code structure brief and clear, as
> >>> discussed in your
> >>> patch's thread, i.e., we both rebase our code according to Yuanhan's
> >>> code. Is that OK?
> >>>
> >> Yes, I agree with it.
> > I will send v2 out today, and hopefully someone will ACK and test it
> > soon.  After that, I'm also hoping Thomas can do a quick merge.
> >
> > --yliu
> >
> 
> Hi Yuanhan,
> 
> Thanks, I will review and test it also.

Appreciate that!

--yliu


[dpdk-dev] [PATCH 2/4] mem: add API to obstain memory-backed file info

2016-01-12 Thread Pavel Fedin
 Hello!

> Oh, I get it and recognize the problem here. The actual problem lies in
> the API rte_eal_get_backfile_info():
> backfiles[i].size = hugepage_files[i].size;
> It should use statfs or hugepage_files[i].size * hugepage_files[i].repeated
> to calculate the total size.

 .repeated depends on CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS. By the way, it 
looks like it does the same thing as you are trying to do with --single-file, 
but with hugepages, doesn't it? I see it's currently used by ivshmem (which is, 
AFAIK, very immature and half-abandoned).
 Or should we just move .repeated out of the #ifdef?

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [RFC PATCH 3/3] doc: add introduction for fm10k FTAG based forwarding

2016-01-12 Thread Mcnamara, John
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wang Xiao W
> Sent: Tuesday, January 5, 2016 12:32 PM
> To: Chen, Jing D; Richardson, Bruce
> Cc: dev at dpdk.org
> Subject: [dpdk-dev] [RFC PATCH 3/3] doc: add introduction for fm10k FTAG
> based forwarding
> 
> Add a brief introduction on FTAG, describes what's FTAG and how it works
> in forwarding, introduction on how to run fm10k with FTAG is also
> included.
> 
> Signed-off-by: Wang Xiao W 
> ---
>  doc/guides/nics/fm10k.rst | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/doc/guides/nics/fm10k.rst b/doc/guides/nics/fm10k.rst index
> 4206b7f..d82bf41 100644
> --- a/doc/guides/nics/fm10k.rst
> +++ b/doc/guides/nics/fm10k.rst
> @@ -34,6 +34,19 @@ FM10K Poll Mode Driver  The FM10K poll mode driver
> library provides support for the Intel FM1
>  (FM10K) family of 40GbE/100GbE adapters.
> 

Hi,

Some very minor comments.


> +FTAG Based Forwarding of FM10K
> +--

The Documentation Guidelines say to put a newline after section headers.

> +FTAG Based Forwarding is a unique feature of FM10K. The FM10K family of
> +NICs support the addition of a Fabric Tag (FTAG) to carry special
> information.
> +The FTAG is placed at the beginning of the frame, it contains
> +information such as where the packet comes from and goes, the vlan tag.

s/the vlan tag/and the vlan tag


> +In FTAG based forwarding mode, the switch logic forwards packets
> +according to glort (global resource tag) information, other than the

s/other/rather


> +mac and vlan table. Now this feature works only on PF.

s/Now/Currently


> +
> +To enable this feature, turn CONFIG_RTE_LIBRTE_FM10K_FTAG_FWD to y in

In general, variable and config names should be in fixed-width quotes:

``CONFIG_RTE_LIBRTE_FM10K_FTAG_FWD``


> +the configuration file. A unit test case fm10k_ftag_autotest is for

s/for/provided for

John.
-- 




[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-12 Thread Yuanhan Liu
On Tue, Jan 12, 2016 at 03:01:01PM +0900, Tetsuya Mukawa wrote:
> On 2016/01/12 14:46, Tan, Jianfeng wrote:
> >
> > Hi Tetsuya,
> >  
> >
> >> Hi Jianfeng and Xie,
> >>
> >> I guess my implementation and yours have a lot of common code, so I will
> >> try to rebase my patch on yours.
> >
> > We also think so. And before you rebase your code, I think we can rely
> > on Yuanhan's
> > struct virtio_pci_ops to make the code structure brief and clear, as
> > discussed in your
> > patch's thread, i.e., we both rebase our code according to Yuanhan's
> > code. Is that OK?
> >
> 
> Yes, I agree with it.

I will send v2 out today, and hopefully someone will ACK and test it
soon.  After that, I'm also hoping Thomas can do a quick merge.

--yliu



[dpdk-dev] [PATCH 2/4] mem: add API to obstain memory-backed file info

2016-01-12 Thread Sergio Gonzalez Monroy
On 12/01/2016 13:57, Pavel Fedin wrote:
>   Hello!
>
>> I might be missing something obvious here but, aside from having memory
>> SHARED which most DPDK apps using hugepages will have anyway, what is
>> the backward compatibility issues that you see here?
>   Heh, sorry once again for the confusion. Indeed, with hugepages we always get 
> MAP_SHARED. I missed that. So, we indeed need
> --shared-mem only in addition to --no-huge.
>
>   Backwards compatibility issue is stated in the description of PATCH 1/4:
> --- cut ---
> b. possible ABI break, originally, --no-huge uses anonymous memory
> instead of file-backed way to create memory.
> --- cut ---
>   The patch unconditionally changes that to SHARED. That's all.

I should read more carefully!
Sorry about that, I thought you were the one with the ABI concerns.

Regarding ABI, I don't think there is any ABI issue with the change: we 
just have our memory file-backed and SHARED, but we already do that when 
using hugepages, so I don't think it would be a huge issue.
But if folks have concerns about it, we could always keep the old behavior 
by default and, as you suggest, introduce another option for changing 
the flag.

Sergio
> Kind regards,
> Pavel Fedin
> Senior Engineer
> Samsung Electronics Research center Russia
>
>



[dpdk-dev] [PATCH 09/11] doc: refresh headers list

2016-01-12 Thread Mcnamara, John
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of David Marchand
> Sent: Sunday, January 10, 2016 12:51 PM
> To: dev at dpdk.org
> Cc: thomas.monjalon at dpdk.org
> Subject: [dpdk-dev] [PATCH 09/11] doc: refresh headers list
> 
> Since we are going to remove a header in next commit, let's first refresh
> documentation.

Hi,

I don't like these parts of the docs that list files since they
go out of date quite easily and, in general, the same information
can be conveyed by just listing the directories. (That isn't
future-proof either but it should be less subject to change.)

In this case you could just remove everything in the console section
after the output from "ls x86_64-native-linuxapp-gcc" like this:


Each build directory contains include files, libraries, and applications like the following::

    $ ls
    app                           tools
    config                        MAINTAINERS
    Makefile                      GNUmakefile
    drivers                       mk
    examples                      pkg
    doc                           README
    lib                           scripts
    LICENSE.GPL                   LICENSE.LGPL
    i686-native-linuxapp-gcc      x86_64-native-linuxapp-gcc
    i686-native-linuxapp-icc      x86_64-native-linuxapp-icc

    $ ls x86_64-native-linuxapp-gcc
    app  build  include  kmod  lib  Makefile


John.
-- 




[dpdk-dev] [PATCH 2/4] mem: add API to obstain memory-backed file info

2016-01-12 Thread Pavel Fedin
 Hello!

> >   BTW, i'm still unhappy about ABI breakage here. I think we could easily 
> > add --shared-mem
> option, which would simply change mapping mode to SHARED. So, we could use it 
> with both
> hugepages (default) and plain mmap (with --no-hugepages).
> 
> You mean, use "--no-hugepages --shared-mem" together, right?

 Yes. This would be perfectly backwards-compatible because.

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-12 Thread Pavel Fedin
 Hello!

> Your guess makes sense because the current implementation does not support
> multi-queues.
> 
> From your log, only 0 and 1 are "ready for processing"; the others are "not
> ready for processing".

 Yes, and if we study it even more carefully, we see that we initialize all tx 
queues but only a single rx queue (#0).
 After some more code browsing and comparing the two patchsets, I figured out 
that the problem is caused by an inappropriate VIRTIO_NET_F_CTRL_VQ flag. In your 
RFC you used a different capability set, while in v1 you seem to have forgotten 
about this.
 I suggest temporarily moving the hw->guest_features assignment out of 
virtio_negotiate_features() into the caller, where we have eth_dev->dev_type 
and can choose the right set depending on it.
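Pavel's suggestion of choosing the guest feature set in the caller according to the device type could look roughly like the sketch below. The enum, the helper names and the decision to mask only VIRTIO_NET_F_CTRL_VQ for the container device are assumptions for illustration. Only the bit number of VIRTIO_NET_F_CTRL_VQ (17) comes from the virtio specification.

```c
#include <stdint.h>

/* Feature bit number from the virtio-net specification. */
#define VIRTIO_NET_F_CTRL_VQ 17

/* Hypothetical device types: PCI virtio device vs. the eth_cvio vdev. */
enum virtio_dev_type {
	VIRTIO_DEV_PCI,
	VIRTIO_DEV_CVIO,
};

/* Hypothetical helper: features the guest driver accepts for this device type. */
static uint64_t select_guest_features(enum virtio_dev_type type,
				      uint64_t driver_supported)
{
	uint64_t features = driver_supported;

	/* Assumption: the container path does not set up a control queue. */
	if (type == VIRTIO_DEV_CVIO)
		features &= ~(1ULL << VIRTIO_NET_F_CTRL_VQ);
	return features;
}

/* Negotiated set = what the host offers AND what the guest accepts. */
static uint64_t negotiate_features(uint64_t host_features,
				   enum virtio_dev_type type,
				   uint64_t driver_supported)
{
	return host_features & select_guest_features(type, driver_supported);
}
```

Keeping the per-type choice in the caller leaves the negotiation step itself generic, which matches where the thread says eth_dev->dev_type is available.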

 With all the fixes mentioned, I've got ping running.
 Tested-by: Pavel Fedin 

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-12 Thread Tan, Jianfeng

Hi Tetsuya,


> Hi Jianfeng and Xie,
>
> I guess my implementation and yours have a lot of common code, so I will
> try to rebase my patch on yours.

We also think so. And before you rebase your code, I think we can rely 
on Yuanhan's struct virtio_pci_ops to make the code structure brief and 
clear, as discussed in your patch's thread, i.e., we both rebase our code 
onto Yuanhan's code. Is that OK?


>
> BTW, one thing I need to change in your memory allocation is that the
> mmapped address should be below 44 bits (32 + PAGE_SHIFT) to work with my patch.
> This is because the VIRTIO_PCI_QUEUE_PFN register only accepts such addresses.
> (I may need to add one more EAL parameter like "--mmap-under ")

It makes sense.
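Tetsuya's 44-bit limit follows from the legacy virtio PCI layout. The queue address is written to the 32-bit VIRTIO_PCI_QUEUE_PFN register as a page frame number, that is, shifted right by VIRTIO_PCI_QUEUE_ADDR_SHIFT (12). So only addresses below 2^(32+12) = 2^44 can be expressed. Below is a minimal check, assuming that legacy layout; the helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Legacy virtio PCI: queue PFN = address >> 12, written to a 32-bit register. */
#define VIRTIO_PCI_QUEUE_ADDR_SHIFT 12

/* Hypothetical helper: can 'addr' be programmed into VIRTIO_PCI_QUEUE_PFN? */
static bool fits_queue_pfn(uint64_t addr)
{
	uint64_t pfn = addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT;

	/* The PFN must fit in 32 bits, i.e. addr < 2^(32 + 12) = 2^44. */
	return pfn <= UINT32_MAX;
}
```

Typical x86_64 user-space mmap() results (0x7f... addresses) are well above 2^44. That is why the allocation would have to be steered below that limit, for example by the EAL parameter Tetsuya mentions.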

Thanks,
Jianfeng

>
> Thanks,
> Tetsuya



[dpdk-dev] [PATCH 4/4] doc: update release note for VxLAN & NVGRE checksum off-load support

2016-01-12 Thread Mcnamara, John
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wenzhuo Lu
> Sent: Monday, January 11, 2016 7:07 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH 4/4] doc: update release note for VxLAN & NVGRE
> checksum off-load support


> +* **Support VxLAN & NVGRE checksum off-load on X550**
> +
> +  * VxLAN & NVGRE RX/TX checksum off-load is supported on X550.
> +Provide RX/TX checksum off-load on both inner and outer IP
> +header and TCP header.
> +  * Support VxLAN port configuration. Although the default VxLAN
> +port number is 4789, it can be changed. We should make it
> +configable to meet the change.

Hi Wenzhuo,

The release note text should be in the past tense. Something like this would be
better:

* **Added support for VxLAN and NVGRE checksum off-load on X550.**

  * Added support for VxLAN and NVGRE RX/TX checksum off-load on
    X550. RX/TX checksum off-load is provided on both the inner and
    outer IP headers and the TCP header.

  * Added functions to support VxLAN port configuration. The
    default VxLAN port number is 4789 but this can be updated
    programmatically.



[dpdk-dev] [PATCH 2/4] mem: add API to obstain memory-backed file info

2016-01-12 Thread Sergio Gonzalez Monroy
On 12/01/2016 12:01, Pavel Fedin wrote:
>   Hello!
>
>>>.repeated depends on CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS. By the way, 
>>> looks like it does
>> the same thing as you are trying to do with --single-file, but with 
>> hugepages, doesn't it? I
>> see it's currently used by ivshmem (which is AFAIK very immature and 
>> half-abandoned).
>>
>> Similar but not the same.
>> --single-file: a single file for all mapped hugepages.
>> SINGLE_FILE_SEGMENTS: a file per set of physically contiguous mapped
>> hugepages (what DPDK calls memseg , memory segment). So there could be
>> more than one file.
>   Thank you for the explanation.
>
>   By now, I've done more testing. The current patchset breaks --no-huge. I 
> did not study why:
> --- cut ---
> Program received signal SIGBUS, Bus error.
> malloc_elem_init (elem=elem@entry=0x7fffe51e6000, heap=0x77fe5a1c, ms=ms@entry=0x77fb301c, size=size@entry=268435392) at /home/p.fedin/dpdk/lib/librte_eal/common/malloc_elem.c:62
> 62	/home/p.fedin/dpdk/lib/librte_eal/common/malloc_elem.c: No such file or directory.
> Missing separate debuginfos, use: dnf debuginfo-install keyutils-libs-1.5.9-7.fc23.x86_64 krb5-libs-1.13.2-11.fc23.x86_64 libcap-ng-0.7.7-2.fc23.x86_64 libcom_err-1.42.13-3.fc23.x86_64 libselinux-2.4-4.fc23.x86_64 openssl-libs-1.0.2d-2.fc23.x86_64 pcre-8.37-4.fc23.x86_64 zlib-1.2.8-9.fc23.x86_64
> (gdb) where
> #0  malloc_elem_init (elem=elem@entry=0x7fffe51e6000, heap=0x77fe5a1c, ms=ms@entry=0x77fb301c, size=size@entry=268435392)
>     at /home/p.fedin/dpdk/lib/librte_eal/common/malloc_elem.c:62
> #1  0x004a50b5 in malloc_heap_add_memseg (ms=0x77fb301c, heap=) at /home/p.fedin/dpdk/lib/librte_eal/common/malloc_heap.c:109
> #2  rte_eal_malloc_heap_init () at /home/p.fedin/dpdk/lib/librte_eal/common/malloc_heap.c:232
> #3  0x004be896 in rte_eal_memzone_init () at /home/p.fedin/dpdk/lib/librte_eal/common/eal_common_memzone.c:427
> #4  0x0042ab02 in rte_eal_init (argc=argc@entry=11, argv=argv@entry=0x7fffeb80) at /home/p.fedin/dpdk/lib/librte_eal/linuxapp/eal/eal.c:799
> #5  0x0066dfb9 in dpdk_init (argc=11, argv=0x7fffeb80) at lib/netdev-dpdk.c:2192
> #6  0x0040ddd9 in main (argc=12, argv=0x7fffeb78) at vswitchd/ovs-vswitchd.c:74
> --- cut ---
>
>   And now I tend to think that we do not need --single-file at all, because:
> a) It's just a temporary workaround for "more than 8 regions" problem.
> b) It's not compatible with physical hardware anyway.

That's a good summary.
I think --single-file was mostly solving the vhost limitation of mapping 
only 8 fds. We end up with a single memseg, as we do with --no-huge, except 
that it is backed by hugepages (and, in this patch, also mapped shared 
instead of private).
Also, it would be compatible with physical hardware if using an IOMMU and VFIO.
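To see why one file per hugepage collides with the 8-fd limit, count how many backing files (and so descriptors) the vhost-user memory table would need. The sketch below assumes the limit of 8 regions per VHOST_USER_SET_MEM_TABLE message (VHOST_MEMORY_MAX_NREGIONS in DPDK's vhost library). It ignores physical contiguity, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* vhost-user limit on memory regions (and therefore fds) per SET_MEM_TABLE. */
#define VHOST_MEMORY_MAX_NREGIONS 8

/*
 * Hypothetical helper: backing files needed for 'mem_size' bytes when every
 * hugepage gets its own file, versus a single file (--single-file).
 */
static uint64_t files_needed(uint64_t mem_size, uint64_t page_size,
			     bool single_file)
{
	if (single_file)
		return 1;
	/* Round up: a partial page still needs a whole hugepage file. */
	return (mem_size + page_size - 1) / page_size;
}

/* Can this many files be described in one vhost-user memory table? */
static bool fits_vhost_mem_table(uint64_t nr_files)
{
	return nr_files <= VHOST_MEMORY_MAX_NREGIONS;
}
```

For example, 1 GiB of 2 MiB pages needs 512 files, far above the limit, while a single file always fits.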

Sergio

>   So i think that we could easily use "--no-huge --shared-mem" combination. 
> We could address hugepages compatibility problem later.
>
> Kind regards,
> Pavel Fedin
> Senior Engineer
> Samsung Electronics Research center Russia
>
>



[dpdk-dev] [PATCH 1/4] ixgbe: support UDP tunnel add/del

2016-01-12 Thread Thomas Monjalon
Hi,

2016-01-11 08:28, Lu, Wenzhuo:
> [Wenzhuo] The udp_tunnel_add and udp_tunnel_del have already existed. I just 
> use them. Honestly I agree with you they are not accurate name. Better change 
> them to udp_tunnel_port_add and udp_tunnel_port_del. But it should be a ABI 
> change if I?m not wrong. I think we can announce it this release and change 
> them in the next release. Would you agree?  Thanks.

You can introduce the new name and keep the old one for backward compat
while announcing its deprecation.
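One common way to do this in C is to add the new function and turn the old name into a thin wrapper marked with the GCC/Clang `deprecated` attribute. Existing callers still compile and link but get a warning. The sketch below is self-contained; the function names, the simplified struct and the stub body are hypothetical and are not the actual ethdev API.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for a UDP tunnel description. */
struct udp_tunnel_cfg {
	uint16_t udp_port;
	uint8_t prot_type;
};

static uint16_t configured_port; /* stub "hardware" state for the sketch */

/* New, more accurately named function. */
int udp_tunnel_port_add(uint8_t port_id, const struct udp_tunnel_cfg *cfg)
{
	(void)port_id;
	if (cfg == NULL)
		return -1;
	configured_port = cfg->udp_port;
	return 0;
}

/* Old name kept for backward compatibility; callers get a warning. */
__attribute__((deprecated("use udp_tunnel_port_add() instead")))
int udp_tunnel_add(uint8_t port_id, const struct udp_tunnel_cfg *cfg);

int udp_tunnel_add(uint8_t port_id, const struct udp_tunnel_cfg *cfg)
{
	return udp_tunnel_port_add(port_id, cfg);
}
```

In DPDK the deprecation would also be announced in doc/guides/rel_notes/deprecation.rst, the file the RETA and cmdline patches in this digest edit.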
Thanks


[dpdk-dev] [PATCH 2/2] fm10k: update doc for Atwood Channel

2016-01-12 Thread Mcnamara, John
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Michael Qiu
> Sent: Monday, January 11, 2016 7:28 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH 2/2] fm10k: update doc for Atwood Channel
> 
> Atwood Channel is a 20GbE NIC and belongs to the Intel FM10K family; update the
> doc for it.
> 
> Signed-off-by: Michael Qiu 

Acked-by: John McNamara 



[dpdk-dev] [PATCH 1/2] fm10k: Add Atwood Channel Support

2016-01-12 Thread Mcnamara, John
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Michael Qiu
> Sent: Monday, January 11, 2016 7:28 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH 1/2] fm10k: Add Atwood Channel Support
> 
> Atwood Channel is an Intel 25G NIC, and this patch adds support for it in DPDK.
> 
> Signed-off-by: Michael Qiu

Acked-by: John McNamara 



[dpdk-dev] [PATCH 2/4] mem: add API to obstain memory-backed file info

2016-01-12 Thread Pavel Fedin
 Hello!

>> Should this be "hugepage->size = internal_config.memory"? Otherwise the 
>> vhost-user
>> memtable entry has a size of only 2MB.

> I don't think so. See the definition:

> struct hugepage_file {
> 	void *orig_va;      /**< virtual addr of first mmap() */
> 	void *final_va;     /**< virtual addr of 2nd mmap() */
> 	uint64_t physaddr;  /**< physical addr */
> 	size_t size;        /**< the page size */
> 	int socket_id;      /**< NUMA socket ID */
> 	int file_id;        /**< the '%d' in HUGEFILE_FMT */
> 	int memseg_id;      /**< the memory segment to which page belongs */
> #ifdef RTE_EAL_SINGLE_FILE_SEGMENTS
> 	int repeated;       /**< number of times the page size is repeated */
> #endif
> 	char filepath[MAX_HUGEPAGE_PATH]; /**< path to backing file on filesystem */
> };

> size stands for the page size instead of total size.
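The consequence can be illustrated with a qva_to_vva()-style lookup. The backend translates a frontend virtual address by finding the memory-table region that contains it. A region advertised with the page size (2 MiB) instead of the full mapping size therefore leaves every address past the first page untranslatable. The struct and functions below are a simplified, hypothetical version, not DPDK's actual vhost code. The total size is computed as page size times repeat count, as suggested earlier in the thread.

```c
#include <stdint.h>

/* Simplified, hypothetical vhost-user memory region. */
struct mem_region {
	uint64_t userspace_addr; /* frontend virtual address of the region */
	uint64_t size;           /* length of the region in bytes */
	uint64_t host_addr;      /* where the backend mmap()ed the region */
};

/* Total length of a backing file made of 'repeated' pages of 'page_size'. */
static uint64_t region_size(uint64_t page_size, uint64_t repeated)
{
	return page_size * repeated;
}

/* Translate a frontend virtual address; returns 0 if no region covers it. */
static uint64_t qva_to_vva(const struct mem_region *regs, int nregs,
			   uint64_t qva)
{
	int i;

	for (i = 0; i < nregs; i++) {
		const struct mem_region *r = &regs[i];

		if (qva >= r->userspace_addr &&
		    qva - r->userspace_addr < r->size)
			return r->host_addr + (qva - r->userspace_addr);
	}
	return 0;
}
```

With `size` set to 2 MiB, an address 6 MiB into a 512 MiB single-file mapping falls outside every region and the lookup fails. With the total size it resolves normally.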

 But in this case the host gets this page size as the total region size, so 
qva_to_vva() fails.
 I haven't worked with hugepages, but I guess that with real hugepages we get 
one file per page, so page size == mapping size. With the newly introduced 
--single-file option we now have something that pretends to be a single 
"uber-huge-page", so we need to specify the total size of the mapping here.

 BTW, I'm still unhappy about the ABI breakage here. I think we could easily add 
a --shared-mem option, which would simply change the mapping mode to SHARED. We 
could then use it with both hugepages (default) and plain mmap (with --no-hugepages).
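Why the mapping mode matters here: a vhost-user backend mmap()s the same file descriptors in another process. It only observes the frontend's writes if the frontend's own mapping is MAP_SHARED. The POSIX-only sketch below shows the difference using two mappings of one temporary file in a single process, standing in for the frontend and the backend. The helper name is hypothetical, and the proposed --shared-mem option itself is not modelled.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one file twice: 'writer' with the given flags (MAP_SHARED or
 * MAP_PRIVATE) and 'peer' always MAP_SHARED, as a vhost backend would.
 * Returns 1 if a write through 'writer' is visible through 'peer',
 * 0 if not, -1 on error.
 */
static int peer_sees_write(int writer_flags)
{
	const size_t len = 4096;
	FILE *f = tmpfile();
	int fd, ret = -1;
	char *writer = MAP_FAILED, *peer = MAP_FAILED;

	if (f == NULL)
		return -1;
	fd = fileno(f);
	if (ftruncate(fd, (off_t)len) != 0)
		goto out;

	writer = mmap(NULL, len, PROT_READ | PROT_WRITE, writer_flags, fd, 0);
	peer = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (writer == MAP_FAILED || peer == MAP_FAILED)
		goto out;

	/* With MAP_PRIVATE this write lands in a private copy-on-write page. */
	strcpy(writer, "ping");
	ret = strcmp(peer, "ping") == 0;
out:
	if (writer != MAP_FAILED)
		munmap(writer, len);
	if (peer != MAP_FAILED)
		munmap(peer, len);
	fclose(f);
	return ret;
}
```

peer_sees_write(MAP_SHARED) should report 1 and peer_sees_write(MAP_PRIVATE) 0. This is why a file-backed frontend mapping must be shared for vhost-user to work.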

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH 2/4] mem: add API to obstain memory-backed file info

2016-01-12 Thread Sergio Gonzalez Monroy
On 12/01/2016 11:37, Pavel Fedin wrote:
>   Hello!
>
>> So are you suggesting to not introduce --single-file option but instead
>> --shared-mem?
>> AFAIK --single-file was trying to workaround the limitation of just
>> being able to map 8 fds.
>   Heh, yes, you're right... Indeed, sorry, I was not patient enough; I see it 
> uses hpi->hugedir instead of using /dev/shm... I was confused by the code 
> path... It seemed that --single-file is an alias for --no-hugepages.
>   And the patch still changes mmap() mode to SHARED unconditionally, which is 
> not good in terms of backwards compatibility (and this is explicitly noted in 
> the cover letter).

I might be missing something obvious here, but aside from having memory 
SHARED, which most DPDK apps using hugepages will have anyway, what are 
the backward compatibility issues that you see here?

>
>   So, let's try to sort out...
>   a) By default we should still have MAP_PRIVATE
>   b) Let's say that we need --shared-mem in order to make it MAP_SHARED. This 
> can be combined with --no-hugepages if necessary (this is what i tried to 
> implement based on the old RFC).

--shared-mem would only have meaning with --no-huge, right?

>   c) Let's say that --single-file uses hugetlbfs but maps everything via 
> single file. This still can be combined with --shared-mem.

By default, when using hugepages, all mappings are SHARED for the 
multi-process model.
IMHO, if you really want the ability to have private memory instead, 
because you are not considering that model, then it might be more 
appropriate to have a --private-mem or --no-shared-mem option instead.

Sergio
>   wouldn't this be more clear, more straightforward and implication-free?
>
>   And if we agree on that, we could now try to decrease number of options:
>   a) We could imply MAP_SHARED if cvio is used, because shared memory is 
> mandatory in this case.
>   b) (c) above again raises a question: doesn't it make 
> CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS obsolete? Or maybe we could use that one 
> instead of --single-file (however, I'm not a fan of compile-time configuration 
> like this)?
>
> Kind regards,
> Pavel Fedin
> Senior Engineer
> Samsung Electronics Research center Russia
>
>



[dpdk-dev] [PATCH v2 3/3] mlx5: increase RETA table size

2016-01-12 Thread Nelio Laranjeiro
ConnectX-4 NICs can handle at most 512 entries in the RETA table.

Signed-off-by: Nelio Laranjeiro 
---
 drivers/net/mlx5/mlx5_defs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index bb82c9a..ae5eda9 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -47,7 +47,7 @@
 #define MLX5_PMD_TX_PER_COMP_REQ 64

 /* RSS Indirection table size. */
-#define RSS_INDIRECTION_TABLE_SIZE 128
+#define RSS_INDIRECTION_TABLE_SIZE 512

 /* Maximum number of Scatter/Gather Elements per Work Request. */
 #ifndef MLX5_PMD_SGE_WR_N
-- 
2.1.4



[dpdk-dev] [PATCH v2 2/3] ethdev: change RETA type in rte_eth_rss_reta_entry64

2016-01-12 Thread Nelio Laranjeiro
Several NICs can handle 512 entries/queues in their RETA table; an 8-bit field
is not large enough for them.
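With 512 RETA entries, the queue index stored in an entry can exceed 255, which is why `reta[]` is widened to `uint16_t`. The sketch below fills a 512-entry table round-robin across the queues. It uses the group layout of `struct rte_eth_rss_reta_entry64`: 64 entries per group plus a mask, addressed as idx = i / 64, shift = i % 64. The struct is a local copy of the patched definition so the example compiles without DPDK. In a real application you would include rte_ethdev.h and pass the array to rte_eth_dev_rss_reta_update().

```c
#include <stdint.h>
#include <string.h>

#define RTE_RETA_GROUP_SIZE 64  /* entries per rte_eth_rss_reta_entry64 */
#define RETA_SIZE 512           /* e.g. ConnectX-4 after the mlx5 patch */

/* Local copy of the patched ethdev structure (uint16_t entries). */
struct rte_eth_rss_reta_entry64 {
	uint64_t mask;                      /* which entries to update */
	uint16_t reta[RTE_RETA_GROUP_SIZE]; /* queue index per entry */
};

/* Spread RETA_SIZE entries round-robin over nb_queues (must be > 0). */
static void fill_reta(struct rte_eth_rss_reta_entry64 *conf, uint16_t nb_queues)
{
	uint16_t i, idx, shift;

	memset(conf, 0, sizeof(*conf) * (RETA_SIZE / RTE_RETA_GROUP_SIZE));
	for (i = 0; i < RETA_SIZE; i++) {
		idx = i / RTE_RETA_GROUP_SIZE;   /* which group */
		shift = i % RTE_RETA_GROUP_SIZE; /* position within the group */
		conf[idx].mask |= 1ULL << shift;
		conf[idx].reta[shift] = (uint16_t)(i % nb_queues);
	}
}
```

With 512 queues the last entry holds queue 511, a value that a `uint8_t` field would have truncated to 255.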

Signed-off-by: Nelio Laranjeiro 
---
 app/test-pmd/cmdline.c   | 4 ++--
 doc/guides/rel_notes/deprecation.rst | 5 -
 lib/librte_ether/rte_ethdev.c| 2 +-
 lib/librte_ether/rte_ethdev.h| 2 +-
 4 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 73298c9..9c7cda0 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -1767,7 +1767,7 @@ parse_reta_config(const char *str,
int i;
unsigned size;
uint16_t hash_index, idx, shift;
-   uint8_t nb_queue;
+   uint16_t nb_queue;
char s[256];
const char *p, *p0 = str;
char *end;
@@ -1800,7 +1800,7 @@ parse_reta_config(const char *str,
}

hash_index = (uint16_t)int_fld[FLD_HASH_INDEX];
-   nb_queue = (uint8_t)int_fld[FLD_QUEUE];
+   nb_queue = (uint16_t)int_fld[FLD_QUEUE];

if (hash_index >= nb_entries) {
printf("Invalid RETA hash index=%d\n", hash_index);
diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index 9cb288c..9930b5a 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -15,11 +15,6 @@ Deprecation Notices
 * The ethdev structures rte_eth_link, rte_eth_dev_info and rte_eth_conf
   must be updated to support 100G link and to have a cleaner link speed API.

-* ABI changes is planned for the reta field in struct rte_eth_rss_reta_entry64
-  which handles at most 256 queues (8 bits) while newer NICs support larger
-  tables (512 queues).
-  It should be integrated in release 2.3.
-
 * ABI changes are planned for struct rte_eth_fdir_flow in order to support
   extend flow director's input set. The release 2.2 does not contain these ABI
   changes, but release 2.3 will, and no backwards compatibility is planned.
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index ed971b4..b0aa94d 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1857,7 +1857,7 @@ rte_eth_check_reta_mask(struct rte_eth_rss_reta_entry64 
*reta_conf,
 static int
 rte_eth_check_reta_entry(struct rte_eth_rss_reta_entry64 *reta_conf,
 uint16_t reta_size,
-uint8_t max_rxq)
+uint16_t max_rxq)
 {
uint16_t i, idx, shift;

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index bada8ad..8302a2d 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -520,7 +520,7 @@ struct rte_eth_mirror_conf {
 struct rte_eth_rss_reta_entry64 {
uint64_t mask;
/**< Mask bits indicate which entries need to be updated/queried. */
-   uint8_t reta[RTE_RETA_GROUP_SIZE];
+   uint16_t reta[RTE_RETA_GROUP_SIZE];
/**< Group of 64 redirection table entries. */
 };

-- 
2.1.4



[dpdk-dev] [PATCH v2 1/3] cmdline: increase command line buffer

2016-01-12 Thread Nelio Laranjeiro
Allow long command lines in testpmd (like flow director with IPv6, ...).

Signed-off-by: John McNamara 
Signed-off-by: Nelio Laranjeiro 
---
 doc/guides/rel_notes/deprecation.rst | 5 -
 lib/librte_cmdline/cmdline_rdline.h  | 2 +-
 2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index e94d4a2..9cb288c 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -44,8 +44,3 @@ Deprecation Notices
   and table action handlers will be updated:
   the pipeline parameter will be added, the packets mask parameter will be
   either removed (for input port action handler) or made input-only.
-
-* ABI changes are planned in cmdline buffer size to allow the use of long
-  commands (such as RETA update in testpmd).  This should impact
-  CMDLINE_PARSE_RESULT_BUFSIZE, STR_TOKEN_SIZE and RDLINE_BUF_SIZE.
-  It should be integrated in release 2.3.
diff --git a/lib/librte_cmdline/cmdline_rdline.h 
b/lib/librte_cmdline/cmdline_rdline.h
index b9aad9b..72e2dad 100644
--- a/lib/librte_cmdline/cmdline_rdline.h
+++ b/lib/librte_cmdline/cmdline_rdline.h
@@ -93,7 +93,7 @@ extern "C" {
 #endif

 /* configuration */
-#define RDLINE_BUF_SIZE 256
+#define RDLINE_BUF_SIZE 512
 #define RDLINE_PROMPT_SIZE  32
 #define RDLINE_VT100_BUF_SIZE  8
 #define RDLINE_HISTORY_BUF_SIZE BUFSIZ
-- 
2.1.4



[dpdk-dev] [PATCH v2 0/3] ABI change for RETA, cmdline

2016-01-12 Thread Nelio Laranjeiro
The previous version of the commit
"cmdline: increase command line buffer" had side effects and broke
some commands.

In this version, I only applied John McNamara's solution, which consists of
increasing only the RDLINE_BUF_SIZE define from 256 to 512 bytes [1].

[1] http://dpdk.org/ml/archives/dev/2015-November/027643.html

Nelio Laranjeiro (3):
  cmdline: increase command line buffer
  ethdev: change RETA type in rte_eth_rss_reta_entry64
  mlx5: increase RETA table size

 app/test-pmd/cmdline.c   |  4 ++--
 doc/guides/rel_notes/deprecation.rst | 10 --
 drivers/net/mlx5/mlx5_defs.h |  2 +-
 lib/librte_cmdline/cmdline_rdline.h  |  2 +-
 lib/librte_ether/rte_ethdev.c|  2 +-
 lib/librte_ether/rte_ethdev.h|  2 +-
 6 files changed, 6 insertions(+), 16 deletions(-)

-- 
2.1.4



[dpdk-dev] [PATCH 2/4] mem: add API to obstain memory-backed file info

2016-01-12 Thread Sergio Gonzalez Monroy
On 12/01/2016 11:07, Sergio Gonzalez Monroy wrote:
> Hi Pavel,
>
> On 12/01/2016 11:00, Pavel Fedin wrote:
>>   Hello!
>>
BTW, i'm still unhappy about ABI breakage here. I think we could 
 easily add --shared-mem

Could you elaborate a bit more on your concerns regarding ABI breakage?

>>> option, which would simply change mapping mode to SHARED. So, we 
>>> could use it with both
>>> hugepages (default) and plain mmap (with --no-hugepages).
>>>
>>> You mean, use "--no-hugepages --shared-mem" together, right?
>>   Yes. This would be perfectly backwards-compatible because.
>
> So are you suggesting to not introduce --single-file option but 
> instead --shared-mem?
> AFAIK --single-file was trying to workaround the limitation of just 
> being able to map 8 fds.
>

My bad, I misread the posts.
Jianfeng pointed out that you are suggesting a --shared-mem option that
provides the same functionality with or without hugepages.

Sergio

> Sergio
>> Kind regards,
>> Pavel Fedin
>> Senior Engineer
>> Samsung Electronics Research center Russia
>>
>>
>



[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-12 Thread Pavel Fedin
 Hello!

> See my reply to "mem: add API to obstain memory-backed file info" for a 
> workaround. With fixes for that and the TUNSETVNETHDRSZ issue I was able to
> get traffic running over vhost-user.

 With OVS or test apps? I still have problems with OVS after this. Packets go 
from the host to the container, but not back. Here is the host-side log (I also 
added a GPA display in order to debug the problem you pointed out):
--- cut ---
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: new virtio connection is 38
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: new device, handle is 0
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message VHOST_USER_SET_OWNER
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message VHOST_USER_GET_FEATURES
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message VHOST_USER_SET_FEATURES
Jan 12 11:23:32 nfv_test_x86_64 kernel: device ovs-netdev entered promiscuous mode
Jan 12 11:23:32 nfv_test_x86_64 kernel: device ovs0 entered promiscuous mode
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message VHOST_USER_SET_MEM_TABLE
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: mapped region 0 fd:39 to:0x7f0ddea0 sz:0x2000 off:0x0 GPA:0x7f715900 align:0x20
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: vring call idx:0 file:49
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: vring kick idx:0 file:50
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: virtio is not ready for processing.
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: vring call idx:1 file:51
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: vring kick idx:1 file:52
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: virtio is now ready for processing.
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message VHOST_USER_SET_FEATURES
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message VHOST_USER_SET_MEM_TABLE
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: mapped region 0 fd:53 to:0x7f0ddea0 sz:0x2000 off:0x0 GPA:0x7f715900 align:0x20
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: vring call idx:0 file:39
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: vring kick idx:0 file:49
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: virtio is now ready for processing.
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: vring call idx:1 file:50
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: vring kick idx:1 file:51
Jan 12 11:23:32 nfv_test_x86_64 ovs-vswitchd[3461]: VHOST_CONFIG: virtio is now ready for processing.
Jan 12 

[dpdk-dev] [PATCH 2/4] mem: add API to obstain memory-backed file info

2016-01-12 Thread Sergio Gonzalez Monroy
On 12/01/2016 11:22, Pavel Fedin wrote:
>   Hello!
>
>> Oh, I get it and recognize the problem here. The actual problem lies in
>> the API rte_eal_get_backfile_info():
>> backfiles[i].size = hugepage_files[i].size;
>> should use statfs or hugepage_files[i].size * hugepage_files[i].repeated
>> to calculate the total size.
>   .repeated depends on CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS. By the way, it looks
> like it does the same thing as you are trying to do with --single-file, but
> with hugepages, doesn't it? I see it's currently used by ivshmem (which is
> AFAIK very immature and half-abandoned).

Similar but not the same.
--single-file: a single file for all mapped hugepages.
SINGLE_FILE_SEGMENTS: a file per set of physically contiguous mapped
hugepages (what DPDK calls a memseg, or memory segment). So there could be
more than one file.

Sergio
>   Or should we just move .repeated out of the #ifdef?
>
> Kind regards,
> Pavel Fedin
> Senior Engineer
> Samsung Electronics Research center Russia
>
>
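To make the two points above concrete, here is a minimal C sketch. It assumes a hypothetical struct backfile_rec holding only the two fields named in the thread (size and repeated); it is not the EAL's hugepage_file structure. backfile_total_size() shows why reporting only size describes a single hugepage rather than the whole file, and the equally hypothetical count_memsegs() counts physically contiguous runs of hugepages, i.e. how many files the SINGLE_FILE_SEGMENTS layout would produce, whereas --single-file always yields exactly one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record modeled on the two fields discussed in the thread
 * (size, repeated); not the actual EAL structure. */
struct backfile_rec {
	uint64_t size;     /* size of one hugepage, in bytes */
	unsigned repeated; /* number of hugepages backed by the same file */
};

/* Total bytes backed by one file: per-page size times page count.
 * Returning r->size alone would describe a single hugepage. */
static uint64_t backfile_total_size(const struct backfile_rec *r)
{
	assert(r != NULL);
	return r->size * (uint64_t)r->repeated;
}

/* Count physically contiguous runs in an ascending list of hugepage
 * physical addresses. With SINGLE_FILE_SEGMENTS each run (memseg) gets
 * its own file; with --single-file there is always one file. */
static size_t count_memsegs(const uint64_t *pa, size_t n, uint64_t pgsz)
{
	size_t segs = 0;

	for (size_t i = 0; i < n; i++) {
		/* a new segment starts unless this page directly follows the previous one */
		if (i == 0 || pa[i] != pa[i - 1] + pgsz)
			segs++;
	}
	return segs;
}
```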



[dpdk-dev] [PATCH 2/4] mem: add API to obstain memory-backed file info

2016-01-12 Thread Sergio Gonzalez Monroy
Hi Pavel,

On 12/01/2016 11:00, Pavel Fedin wrote:
>   Hello!
>
>>> BTW, I'm still unhappy about ABI breakage here. I think we could easily
>>> add a --shared-mem option, which would simply change the mapping mode to
>>> SHARED. So, we could use it with both hugepages (default) and plain mmap
>>> (with --no-hugepages).
>>
>> You mean, use "--no-hugepages --shared-mem" together, right?
>   Yes. This would be perfectly backwards-compatible because.

So are you suggesting not to introduce the --single-file option but instead
--shared-mem?
AFAIK --single-file was trying to work around the limitation of only
being able to map 8 fds.

Sergio
> Kind regards,
> Pavel Fedin
> Senior Engineer
> Samsung Electronics Research center Russia
>
>
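Pavel's proposed --shared-mem option (only a proposal in this thread) would essentially switch the mapping mode. The sketch below, assuming plain non-hugepage memory backed by an unlinked temporary file, shows the MAP_SHARED vs MAP_PRIVATE difference that matters here: only a shared mapping lets another mapping of the same fd (for example a vhost backend that received it) see the writes. The helpers make_backing_file() and map_backing() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: create an unlinked temporary file of len bytes to
 * stand in for plain (non-hugepage) backing memory. Returns fd or -1. */
static int make_backing_file(size_t len)
{
	char path[] = "/tmp/shmem-demoXXXXXX";
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path); /* the file lives on as long as the fd or a mapping does */
	if (ftruncate(fd, (off_t)len) != 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Hypothetical helper: map len bytes of fd either MAP_SHARED (writes are
 * visible to every other shared mapping of the file) or MAP_PRIVATE
 * (copy-on-write: writes stay local to this mapping). */
static void *map_backing(int fd, size_t len, int shared)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       shared ? MAP_SHARED : MAP_PRIVATE, fd, 0);

	return p == MAP_FAILED ? NULL : p;
}
```

Writes through a private mapping never reach the file, which is why a backend mapping the same fd could not see them.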



[dpdk-dev] [PATCH] doc: add a further ACL example

2016-01-12 Thread Mcnamara, John
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Antonio Fischetti
> Sent: Monday, January 11, 2016 5:45 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] doc: add a further ACL example
> 
> Add a further ACL example where the elements of the search key do not
> fit entirely into the 4 consecutive bytes of all input fields.
> 
> Signed-off-by: Antonio Fischetti 

Acked-by: John McNamara 



[dpdk-dev] [PATCH 4/4] virtio/vdev: add a new vdev named eth_cvio

2016-01-12 Thread Pavel Fedin
 Hello!

 See inline

> -Original Message-
> From: Jianfeng Tan [mailto:jianfeng.tan at intel.com]
> Sent: Sunday, January 10, 2016 2:43 PM
> To: dev at dpdk.org
> Cc: rich.lane at bigswitch.com; yuanhan.liu at linux.intel.com; mst at redhat.com;
> nakajima.yoshihiro at lab.ntt.co.jp; huawei.xie at intel.com; mukawa at igel.co.jp;
> p.fedin at samsung.com; michael.qiu at intel.com; ann.zhuangyanying at huawei.com; Jianfeng Tan
> Subject: [PATCH 4/4] virtio/vdev: add a new vdev named eth_cvio
> 
> Add a new virtual device named eth_cvio; it can be used just like
> eth_ring, eth_null, etc.
> 
> Configured parameters include:
> - rx (optional, 1 by default): number of rx queues; only 1 is
>  allowed for now.
> - tx (optional, 1 by default): number of tx queues; only 1 is
>  allowed for now.
> - cq (optional, 0 by default): whether the control queue is enabled;
>  not supported for now.
> - mac (optional): MAC address; a random value is used if not
>  specified.
> - queue_num (optional, 256 by default): size of the virtqueue.
> - path (mandatory): path of the vhost backend, which depends on the
>  file type: vhost-user is used if the given path points to
>  a unix socket; vhost-net is used if the given path points to
>  a char device.
> 
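As a rough illustration of the parameter list above, the following plain-C sketch parses the key=value part of an eth_cvio devargs string with the documented defaults and picks the backend from the file type of path. struct cvio_args, cvio_parse() and cvio_backend_of() are hypothetical names; the PMD's own parsing code is not shown in this excerpt, and a DPDK vdev driver would more typically use the rte_kvargs helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical container for the eth_cvio parameters, with the defaults
 * listed in the commit message. */
struct cvio_args {
	unsigned rx, tx, cq, queue_num;
	char mac[18];   /* "" = not given: the PMD would pick a random MAC */
	char path[256]; /* mandatory */
};

enum cvio_backend { BACKEND_UNKNOWN, BACKEND_VHOST_USER, BACKEND_VHOST_NET };

/* Parse "key=value,key=value" (the part after "eth_cvio0,").
 * Returns 0 on success, -1 on a malformed or unknown key or missing path. */
static int cvio_parse(const char *s, struct cvio_args *a)
{
	char buf[512], *save = NULL, *tok;

	assert(s != NULL && a != NULL);
	*a = (struct cvio_args){ .rx = 1, .tx = 1, .cq = 0, .queue_num = 256 };
	snprintf(buf, sizeof(buf), "%s", s);
	for (tok = strtok_r(buf, ",", &save); tok != NULL;
	     tok = strtok_r(NULL, ",", &save)) {
		char *val = strchr(tok, '=');

		if (val == NULL)
			return -1;
		*val++ = '\0'; /* split "key=value" in place */
		if (strcmp(tok, "rx") == 0)
			a->rx = (unsigned)strtoul(val, NULL, 0);
		else if (strcmp(tok, "tx") == 0)
			a->tx = (unsigned)strtoul(val, NULL, 0);
		else if (strcmp(tok, "cq") == 0)
			a->cq = (unsigned)strtoul(val, NULL, 0);
		else if (strcmp(tok, "queue_num") == 0)
			a->queue_num = (unsigned)strtoul(val, NULL, 0);
		else if (strcmp(tok, "mac") == 0)
			snprintf(a->mac, sizeof(a->mac), "%s", val);
		else if (strcmp(tok, "path") == 0)
			snprintf(a->path, sizeof(a->path), "%s", val);
		else
			return -1; /* unknown key */
	}
	return a->path[0] != '\0' ? 0 : -1; /* path is mandatory */
}

/* Pick the backend from the file type: unix socket -> vhost-user,
 * char device -> vhost-net. */
static enum cvio_backend cvio_backend_of(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return BACKEND_UNKNOWN;
	if (S_ISSOCK(st.st_mode))
		return BACKEND_VHOST_USER;
	if (S_ISCHR(st.st_mode))
		return BACKEND_VHOST_NET;
	return BACKEND_UNKNOWN;
}
```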
> The major difference from the original virtio for VMs is that here we
> use virtual addresses instead of physical addresses for vhost to
> calculate relative addresses.
> 
> When CONFIG_RTE_VIRTIO_VDEV is enabled (it is by default), the compiled
> library can be used in both VM and container environments.
> 
> Examples:
> a. Use vhost-net as a backend
> sudo numactl -N 1 -m 1 ./examples/l2fwd/build/l2fwd -c 0x10 -n 4 \
> -m 1024 --no-pci --single-file --file-prefix=l2fwd \
> --vdev=eth_cvio0,mac=00:01:02:03:04:05,path=/dev/vhost-net \
> -- -p 0x1
> 
> b. Use vhost-user as a backend
> numactl -N 1 -m 1 ./examples/l2fwd/build/l2fwd -c 0x10 -n 4 -m 1024 \
> --no-pci --single-file --file-prefix=l2fwd \
> --vdev=eth_cvio0,mac=00:01:02:03:04:05,path= \
> -- -p 0x1
> 
> Signed-off-by: Huawei Xie 
> Signed-off-by: Jianfeng Tan 
> ---
>  drivers/net/virtio/virtio_ethdev.c  | 338 +---
>  drivers/net/virtio/virtio_ethdev.h  |   1 +
>  drivers/net/virtio/virtio_pci.h |  24 +--
>  drivers/net/virtio/virtio_rxtx.c|  11 +-
>  drivers/net/virtio/virtio_rxtx_simple.c |  14 +-
>  drivers/net/virtio/virtqueue.h  |  13 +-
>  6 files changed, 302 insertions(+), 99 deletions(-)
> 
> diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
> index d928339..6e46060 100644
> --- a/drivers/net/virtio/virtio_ethdev.c
> +++ b/drivers/net/virtio/virtio_ethdev.c
> @@ -56,6 +56,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include "virtio_ethdev.h"
>  #include "virtio_pci.h"
> @@ -174,14 +175,14 @@ virtio_send_command(struct virtqueue *vq, struct virtio_pmd_ctrl *ctrl,
>* One RX packet for ACK.
>*/
>   vq->vq_ring.desc[head].flags = VRING_DESC_F_NEXT;
> - vq->vq_ring.desc[head].addr = vq->virtio_net_hdr_mz->phys_addr;
> + vq->vq_ring.desc[head].addr = vq->virtio_net_hdr_mem;
>   vq->vq_ring.desc[head].len = sizeof(struct virtio_net_ctrl_hdr);
>   vq->vq_free_cnt--;
>   i = vq->vq_ring.desc[head].next;
> 
>   for (k = 0; k < pkt_num; k++) {
>   vq->vq_ring.desc[i].flags = VRING_DESC_F_NEXT;
> - vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mz->phys_addr
> + vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mem
>   + sizeof(struct virtio_net_ctrl_hdr)
>   + sizeof(ctrl->status) + sizeof(uint8_t)*sum;
>   vq->vq_ring.desc[i].len = dlen[k];
> @@ -191,7 +192,7 @@ virtio_send_command(struct virtqueue *vq, struct virtio_pmd_ctrl *ctrl,
>   }
> 
>   vq->vq_ring.desc[i].flags = VRING_DESC_F_WRITE;
> - vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mz->phys_addr
> + vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mem
>   + sizeof(struct virtio_net_ctrl_hdr);
>   vq->vq_ring.desc[i].len = sizeof(ctrl->status);
>   vq->vq_free_cnt--;
> @@ -374,68 +375,85 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
>   }
>   }
> 
> - /*
> -  * Virtio PCI device VIRTIO_PCI_QUEUE_PF register is 32bit,
> -  * and only accepts 32 bit page frame number.
> -  * Check if the allocated physical memory exceeds 16TB.
> -  */
> - if ((mz->phys_addr + vq->vq_ring_size - 1) >> (VIRTIO_PCI_QUEUE_ADDR_SHIFT + 32)) {
> - PMD_INIT_LOG(ERR, "vring address shouldn't be above 16TB!");
> - rte_free(vq);
> - return -ENOMEM;
> - }
> -
>   memset(mz->addr, 0, sizeof(mz->len));
>   
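The hunk above drops the 16TB check from virtio_dev_queue_setup(); the quoted portion does not say where, if anywhere, it moves. For reference, the arithmetic behind the removed condition can be reproduced standalone: the legacy queue PFN register holds a 32-bit page frame number of 4 KiB pages (VIRTIO_PCI_QUEUE_ADDR_SHIFT is 12), so the whole ring must end below 2^(12+32) = 2^44 bytes = 16 TiB. The function name vring_fits_legacy_pfn() is hypothetical, and the shift is defined locally so the sketch compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Defined locally for a self-contained sketch: the legacy virtio PCI
 * queue register takes (address >> 12), i.e. a 4 KiB page frame number. */
#define VIRTIO_PCI_QUEUE_ADDR_SHIFT 12

/* Same condition as the removed check, inverted: returns 1 if the ring
 * [addr, addr + size) ends below 2^(12 + 32) bytes = 16 TiB, i.e. its
 * last byte still has a page frame number that fits in 32 bits. */
static int vring_fits_legacy_pfn(uint64_t addr, uint64_t size)
{
	assert(size > 0);
	return ((addr + size - 1) >> (VIRTIO_PCI_QUEUE_ADDR_SHIFT + 32)) == 0;
}
```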

[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-12 Thread Pavel Fedin
 Hello!

> > b) With --single-file, OVS runs, but doesn't get any packets at all. When
> > I try to ping the container from the host side, it counts drops on the
> > vhost-user port.
> Can you check whether OVS on the host side prints the message "virtio
> is now ready for processing"?

 No, I get errors:
--- cut ---
Jan 12 10:27:43 nfv_test_x86_64 ovs-vswitchd[18858]: VHOST_CONFIG: new virtio connection is 38
Jan 12 10:27:43 nfv_test_x86_64 ovs-vswitchd[18858]: VHOST_CONFIG: new device, handle is 0
Jan 12 10:27:43 nfv_test_x86_64 ovs-vswitchd[18858]: VHOST_CONFIG: read message VHOST_USER_SET_OWNER
Jan 12 10:27:43 nfv_test_x86_64 ovs-vswitchd[18858]: VHOST_CONFIG: read message VHOST_USER_GET_FEATURES
Jan 12 10:27:43 nfv_test_x86_64 ovs-vswitchd[18858]: VHOST_CONFIG: read message VHOST_USER_SET_FEATURES
Jan 12 10:27:43 nfv_test_x86_64 kernel: device ovs-netdev entered promiscuous mode
Jan 12 10:27:43 nfv_test_x86_64 kernel: device ovs0 entered promiscuous mode
Jan 12 10:27:43 nfv_test_x86_64 ovs-vswitchd[18858]: VHOST_CONFIG: read message VHOST_USER_SET_MEM_TABLE
Jan 12 10:27:43 nfv_test_x86_64 ovs-vswitchd[18858]: VHOST_CONFIG: mapped region 0 fd:39 to:0x7f079c60 sz:0x20 off:0x0 align:0x20
Jan 12 10:27:43 nfv_test_x86_64 ovs-vswitchd[18858]: VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
Jan 12 10:27:43 nfv_test_x86_64 ovs-vswitchd[18858]: VHOST_CONFIG: vring call idx:0 file:49
Jan 12 10:27:43 nfv_test_x86_64 ovs-vswitchd[18858]: VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
Jan 12 10:27:43 nfv_test_x86_64 ovs-vswitchd[18858]: VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
Jan 12 10:27:43 nfv_test_x86_64 ovs-vswitchd[18858]: VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
Jan 12 10:27:43 nfv_test_x86_64 ovs-vswitchd[18858]: VHOST_CONFIG: (0) Failed to find desc ring address.
--- cut ---
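One plausible reading of the "Failed to find desc ring address" error, consistent with the rte_eal_get_backfile_info() discussion in the "mem: add API to obstain memory-backed file info" thread: the backend translates the ring address received in VHOST_USER_SET_VRING_ADDR through the regions announced in VHOST_USER_SET_MEM_TABLE, and fails if no region covers it, for instance when a region's reported size covers only one hugepage. The sketch below models only that lookup; struct mem_region and front_to_local() are hypothetical, not the vhost library's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one region from VHOST_USER_SET_MEM_TABLE. */
struct mem_region {
	uint64_t front_addr; /* region start as seen by the frontend (here a virtual address) */
	uint64_t size;       /* bytes covered by the region */
	uint64_t local_addr; /* where the backend mmap()ed the region's fd */
};

/* Translate a frontend address (e.g. the desc ring address) into a
 * backend-local address. Returns 0 when no region covers it, which is
 * the situation logged as "Failed to find desc ring address". */
static uint64_t front_to_local(const struct mem_region *r, size_t n,
			       uint64_t addr)
{
	assert(r != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* subtraction form avoids overflow in front_addr + size */
		if (addr >= r[i].front_addr && addr - r[i].front_addr < r[i].size)
			return r[i].local_addr + (addr - r[i].front_addr);
	}
	return 0;
}
```

With a region that under-reports its size, a ring placed beyond the first hugepage falls outside every region even though the memory is really mapped.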

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH v2 0/4] fix the issue that DPDK takes over virtio device blindly

2016-01-12 Thread Santosh Shukla
On Tue, Jan 12, 2016 at 8:32 AM, Xie, Huawei  wrote:
>
> On 1/5/2016 1:25 AM, Stephen Hemminger wrote:
> > On Mon,  4 Jan 2016 01:56:09 +0800
> > Huawei Xie  wrote:
> >
> >> v2 changes:
> >>  Remove unnecessary assignment of NULL to dev->data->mac_addrs
> >>  Adjust one comment's position
> >>  Change LOG level from ERR to INFO
> >>
> >> The virtio PMD doesn't set RTE_PCI_DRV_NEED_MAPPING in drv_flags of its
> >> eth_driver. It will try igb_uio and PORT IO in turn to configure the
> >> virtio device. Even if the user in the guest VM doesn't want to use virtio
> >> for DPDK, the virtio PMD will take over the device blindly.
> >>
> >> The more serious problem is that the kernel driver is still manipulating
> >> the device, which causes a driver conflict.
> >>
> >> This patch checks if there is any kernel driver manipulating the
> >> virtio device before virtio PMD uses port IO to configure the device.
> >>
> >> Huawei Xie (4):
> >>   eal: make the comment more accurate
> >>   eal: set kdrv to RTE_KDRV_NONE if kernel driver isn't manipulating the device.
> >>   virtio: return 1 to tell the kernel we don't take over this device
> >>   virtio: check if any kernel driver is manipulating the virtio device
> >>
> >>  drivers/net/virtio/virtio_ethdev.c | 16 ++--
> >>  lib/librte_eal/common/eal_common_pci.c |  8 
> >>  lib/librte_eal/linuxapp/eal/eal_pci.c  |  2 +-
> >>  3 files changed, 19 insertions(+), 7 deletions(-)
> >>
> > Overall looks good, thanks for addressing this.
> >
> > It would be good to note that VFIO no-IOMMU mode should work for this
> > as well.
>
> It isn't implemented yet in virtio PMD. I could add a note in the commit
> message. Do you plan to implement this?
>

I can send vfio-noiommu patches for this one, as I am looking at
vfio-noiommu for virtio for my ARM v4 patch series. Stephen, let me
know if you have already started working on this.

Also, for some reason I can't find the [3/4] patch; could you point me to
the patch link? Thanks.
>
> >
>
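The check described in the cover letter can be sketched for Linux via sysfs, where the driver bound to a PCI device appears as the symlink /sys/bus/pci/devices/<BDF>/driver. This is a minimal illustration under that assumption, not the patch itself: enum kdrv_kind, kdrv_classify(), pci_bound_driver() and should_skip_device() are hypothetical, and returning 1 stands for "do not take over this device".

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical classification, loosely mirroring the idea of
 * RTE_KDRV_NONE from the patch title; not DPDK's enum. */
enum kdrv_kind { KDRV_NONE, KDRV_UIO_OR_VFIO, KDRV_OTHER };

/* Classify a driver name as read from sysfs ("" = nothing bound). */
static enum kdrv_kind kdrv_classify(const char *name)
{
	if (name == NULL || name[0] == '\0')
		return KDRV_NONE;
	if (strcmp(name, "igb_uio") == 0 || strcmp(name, "uio_pci_generic") == 0 ||
	    strcmp(name, "vfio-pci") == 0)
		return KDRV_UIO_OR_VFIO;
	return KDRV_OTHER; /* e.g. virtio-pci: the kernel owns the device */
}

/* Read the name of the driver bound to PCI device bdf (e.g.
 * "0000:00:04.0") into out; out is "" if no driver symlink exists. */
static void pci_bound_driver(const char *bdf, char *out, size_t outlen)
{
	char link[256], target[512];
	const char *base;
	ssize_t n;

	assert(outlen > 0);
	out[0] = '\0';
	snprintf(link, sizeof(link), "/sys/bus/pci/devices/%s/driver", bdf);
	n = readlink(link, target, sizeof(target) - 1);
	if (n < 0)
		return; /* no symlink: no kernel driver bound */
	target[n] = '\0'; /* readlink() does not NUL-terminate */
	base = strrchr(target, '/');
	snprintf(out, outlen, "%s", base != NULL ? base + 1 : target);
}

/* Skip (return 1) if a kernel driver other than a UIO/VFIO stub owns
 * the device, instead of configuring it through port I/O. */
static int should_skip_device(const char *driver_name)
{
	return kdrv_classify(driver_name) == KDRV_OTHER ? 1 : 0;
}
```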


[dpdk-dev] [PATCH 03/11] i40e: move pci device ids to driver

2016-01-12 Thread David Marchand
On Sun, Jan 10, 2016 at 9:02 PM, Stephen Hemminger <
stephen at networkplumber.org> wrote:

> On Sun, 10 Jan 2016 13:50:46 +0100
> David Marchand  wrote:
>
> > +{ RTE_PCI_DEVICE(I40E_INTEL_VENDOR_ID, I40E_DEV_ID_SFP_XL710) },
> > +{ RTE_PCI_DEVICE(I40E_INTEL_VENDOR_ID, I40E_DEV_ID_QEMU) },
> > +{ RTE_PCI_DEVICE(I40E_INTEL_VENDOR_ID, I40E_DEV_ID_KX_A) },
> > +{ RTE_PCI_DEVICE(I40E_INTEL_VENDOR_ID, I40E_DEV_ID_KX_B) }
>
> You should indent the initializers.
>

OK, will do, but I don't like having to break lines because of the
80-column limit (which happens in some drivers).

-- 
David Marchand


[dpdk-dev] [PATCH v2 6/7] eal: pci: export pci_map_device

2016-01-12 Thread David Marchand
On Tue, Jan 12, 2016 at 7:59 AM, Yuanhan Liu 
wrote:

> Normally we could set the RTE_PCI_DRV_NEED_MAPPING flag so that EAL will
> invoke pci_map_device internally for us. From that point of view, there
> is no need to export pci_map_device.
>
> However, for the virtio PMD, which is designed to work without
> binding to UIO (or something similar) first, pci_map_device() will fail,
> which ends up with the virtio PMD being skipped. Therefore, we cannot
> set RTE_PCI_DRV_NEED_MAPPING blindly in the virtio PMD.
>
> So this patch exports pci_map_device, and lets the virtio PMD
> call it when necessary.
>

Well, if you introduce a map function, I suppose you would also need an
unmap function for hotplug.

[snip]

> diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
> index 334c12e..e9e1725 100644
> --- a/lib/librte_eal/common/include/rte_pci.h
> +++ b/lib/librte_eal/common/include/rte_pci.h
> @@ -485,6 +485,17 @@ int rte_eal_pci_read_config(const struct rte_pci_device *device,
>   */
>  int rte_eal_pci_write_config(const struct rte_pci_device *device,
>  const void *buf, size_t len, off_t offset);
> +/**
> + * Map this device
> + *
> + * This function is private to EAL.
> + *
> + * @return
> + *   0 on success, negative on error and positive if no driver
> + *   is found for the device.
> + */
> +int rte_eal_pci_map_device(struct rte_pci_device *dev);
> +
>

If you export it, then it can no longer be marked as private.
The description could be better (I agree it was not that great before).
Also add a short comment on when to call it: a driver that wants to use
it should not set the RTE_PCI_DRV_NEED_MAPPING flag.

The rest looks good to me.

-- 
David Marchand


[dpdk-dev] [PATCH v2 0/4] fix the issue that DPDK takes over virtio device blindly

2016-01-12 Thread Xie, Huawei
On 1/12/2016 12:24 PM, Santosh Shukla wrote:
> On Tue, Jan 12, 2016 at 8:32 AM, Xie, Huawei  wrote:
>> On 1/5/2016 1:25 AM, Stephen Hemminger wrote:
>>> On Mon,  4 Jan 2016 01:56:09 +0800
>>> Huawei Xie  wrote:
>>>
>>>> v2 changes:
>>>>  Remove unnecessary assignment of NULL to dev->data->mac_addrs
>>>>  Adjust one comment's position
>>>>  Change LOG level from ERR to INFO
>>>>
>>>> The virtio PMD doesn't set RTE_PCI_DRV_NEED_MAPPING in drv_flags of its
>>>> eth_driver. It will try igb_uio and PORT IO in turn to configure the
>>>> virtio device. Even if the user in the guest VM doesn't want to use virtio
>>>> for DPDK, the virtio PMD will take over the device blindly.
>>>>
>>>> The more serious problem is that the kernel driver is still manipulating
>>>> the device, which causes a driver conflict.
>>>>
>>>> This patch checks if there is any kernel driver manipulating the
>>>> virtio device before virtio PMD uses port IO to configure the device.
>>>>
>>>> Huawei Xie (4):
>>>>   eal: make the comment more accurate
>>>>   eal: set kdrv to RTE_KDRV_NONE if kernel driver isn't manipulating the device.
>>>>   virtio: return 1 to tell the kernel we don't take over this device
>>>>   virtio: check if any kernel driver is manipulating the virtio device
>>>>
>>>>  drivers/net/virtio/virtio_ethdev.c | 16 ++--
>>>>  lib/librte_eal/common/eal_common_pci.c |  8 
>>>>  lib/librte_eal/linuxapp/eal/eal_pci.c  |  2 +-
>>>>  3 files changed, 19 insertions(+), 7 deletions(-)
>>>>
>>> Overall looks good, thanks for addressing this.
>>>
>>> It would be good to note that VFIO no-IOMMU mode should work for this
>>> as well.
>> It isn't implemented yet in virtio PMD. I could add a note in the commit
>> message. Do you plan to implement this?
>>
> I can send vfio-noiommu patches for this one, as I am looking at
> vfio-noiommu for virtio for my ARM v4 patch series. Stephen, let me
> know if you have already started working on this.
>
> Also, for some reason I can't find the [3/4] patch; could you point me to
> the patch link? Thanks.
Thanks. Here is the patch: http://www.dpdk.org/dev/patchwork/patch/9720/



[dpdk-dev] [PATCH v2 0/4] fix the issue that DPDK takes over virtio device blindly

2016-01-12 Thread Xie, Huawei
On 1/5/2016 1:25 AM, Stephen Hemminger wrote:
> On Mon,  4 Jan 2016 01:56:09 +0800
> Huawei Xie  wrote:
>
>> v2 changes:
>>  Remove unnecessary assignment of NULL to dev->data->mac_addrs
>>  Adjust one comment's position
>>  Change LOG level from ERR to INFO
>>
>> The virtio PMD doesn't set RTE_PCI_DRV_NEED_MAPPING in drv_flags of its
>> eth_driver. It will try igb_uio and PORT IO in turn to configure the
>> virtio device. Even if the user in the guest VM doesn't want to use virtio
>> for DPDK, the virtio PMD will take over the device blindly.
>>
>> The more serious problem is that the kernel driver is still manipulating
>> the device, which causes a driver conflict.
>>
>> This patch checks if there is any kernel driver manipulating the
>> virtio device before virtio PMD uses port IO to configure the device.
>>
>> Huawei Xie (4):
>>   eal: make the comment more accurate
>>   eal: set kdrv to RTE_KDRV_NONE if kernel driver isn't manipulating the device.
>>   virtio: return 1 to tell the kernel we don't take over this device
>>   virtio: check if any kernel driver is manipulating the virtio device
>>
>>  drivers/net/virtio/virtio_ethdev.c | 16 ++--
>>  lib/librte_eal/common/eal_common_pci.c |  8 
>>  lib/librte_eal/linuxapp/eal/eal_pci.c  |  2 +-
>>  3 files changed, 19 insertions(+), 7 deletions(-)
>>
> Overall looks good, thanks for addressing this.
>
> It would be good to note that VFIO no-IOMMU mode should work for this
> as well.

It isn't implemented yet in virtio PMD. I could add a note in the commit
message. Do you plan to implement this?

>



[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-12 Thread Rich Lane
See my reply to "mem: add API to obstain memory-backed file info" for a
workaround. With fixes for that and the TUNSETVNETHDRSZ issue I was able to
get traffic running over vhost-user.
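The thread does not spell out what the TUNSETVNETHDRSZ issue was. For context only: with the vhost-net backend, the tap device's virtio-net header length is set with the TUNSETVNETHDRSZ ioctl, and it needs to match the header layout the frontend uses (10 bytes for struct virtio_net_hdr, 12 bytes for struct virtio_net_hdr_mrg_rxbuf when mergeable rx buffers are in use). tap_set_vnet_hdr_len() is a hypothetical helper, and opening a real tap fd is left out.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <linux/if_tun.h>
#include <linux/virtio_net.h>

/* Hypothetical helper: ask the tap device behind vhost-net to use a
 * virtio-net header of hdr_len bytes. Returns 0 on success, or -1 with
 * errno set (e.g. EBADF for an invalid fd). */
static int tap_set_vnet_hdr_len(int tapfd, int hdr_len)
{
	assert(hdr_len > 0);
	return ioctl(tapfd, TUNSETVNETHDRSZ, &hdr_len);
}
```

A caller would typically pass sizeof(struct virtio_net_hdr_mrg_rxbuf) when VIRTIO_NET_F_MRG_RXBUF has been negotiated, and sizeof(struct virtio_net_hdr) otherwise.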