Re: [PATCH v3] can: Fix kernel panic at security_sock_rcv_skb

2017-02-10 Thread Oliver Hartkopp

Hello Dave, Greg,

On 01/30/2017 12:34 AM, David Miller wrote:

From: Eric Dumazet 
Date: Fri, 27 Jan 2017 08:11:44 -0800


From: Eric Dumazet 

Zhang Yanmin reported crashes [1] and provided a patch adding a
synchronize_rcu() call in can_rx_unregister().

The main problem seems to be that the sockets themselves are not RCU
protected.

If CAN uses RCU for delivery, then sockets should be freed only after
one RCU grace period.

Recent kernels could use sock_set_flag(sk, SOCK_RCU_FREE), but let's
ease stable backports with the following fix instead.

 ...

Reported-by: Zhang Yanmin 
Signed-off-by: Eric Dumazet 


Applied and queued up for -stable, thanks Eric.



Can you please check whether this upstream commit

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f1712c73714088a7252d276a57126d56c7d37e64

really was queued up for -stable?

This commit

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a06393ed03167771246c4c43192d9c264bc48412

was posted later and already got into the 4.4 and 4.9 stable trees.

Best regards,
Oliver


[PATCH 1/1] net: ethernet: intel: e1000: msleep() is unreliable for anything <20ms

2017-02-10 Thread Saber Rezvani
Fix the checkpatch.pl issue:
WARNING: msleep < 20ms can sleep for up to 20ms; see
Documentation/timers/timers-howto.txt

Signed-off-by: Saber Rezvani 
---
 drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
index 975eeb8..fa47eab 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
@@ -207,7 +207,7 @@ static int e1000_set_settings(struct net_device *netdev,
}
 
while (test_and_set_bit(__E1000_RESETTING, &adapter->flags))
-   msleep(1);
+   usleep_range(1000, 5000);
 
if (ecmd->autoneg == AUTONEG_ENABLE) {
hw->autoneg = 1;
@@ -294,7 +294,7 @@ static int e1000_set_pauseparam(struct net_device *netdev,
adapter->fc_autoneg = pause->autoneg;
 
while (test_and_set_bit(__E1000_RESETTING, &adapter->flags))
-   msleep(1);
+   usleep_range(1000, 5000);
 
if (pause->rx_pause && pause->tx_pause)
hw->fc = E1000_FC_FULL;
@@ -592,7 +592,7 @@ static int e1000_set_ringparam(struct net_device *netdev,
return -EINVAL;
 
while (test_and_set_bit(__E1000_RESETTING, &adapter->flags))
-   msleep(1);
+   usleep_range(1000, 5000);
 
if (netif_running(adapter->netdev))
e1000_down(adapter);
@@ -869,7 +869,7 @@ static int e1000_intr_test(struct e1000_adapter *adapter, u64 *data)
/* Disable all the interrupts */
ew32(IMC, 0x);
E1000_WRITE_FLUSH();
-   msleep(10);
+   usleep_range(1, 11000);
 
/* Test each interrupt */
for (; i < 10; i++) {
@@ -887,7 +887,7 @@ static int e1000_intr_test(struct e1000_adapter *adapter, u64 *data)
ew32(IMC, mask);
ew32(ICS, mask);
E1000_WRITE_FLUSH();
-   msleep(10);
+   usleep_range(1, 11000);
 
if (adapter->test_icr & mask) {
*data = 3;
@@ -905,7 +905,7 @@ static int e1000_intr_test(struct e1000_adapter *adapter, u64 *data)
ew32(IMS, mask);
ew32(ICS, mask);
E1000_WRITE_FLUSH();
-   msleep(10);
+   usleep_range(1, 11000);
 
if (!(adapter->test_icr & mask)) {
*data = 4;
@@ -923,7 +923,7 @@ static int e1000_intr_test(struct e1000_adapter *adapter, u64 *data)
ew32(IMC, ~mask & 0x7FFF);
ew32(ICS, ~mask & 0x7FFF);
E1000_WRITE_FLUSH();
-   msleep(10);
+   usleep_range(1, 11000);
 
if (adapter->test_icr) {
*data = 5;
@@ -935,7 +935,7 @@ static int e1000_intr_test(struct e1000_adapter *adapter, u64 *data)
/* Disable all the interrupts */
ew32(IMC, 0x);
E1000_WRITE_FLUSH();
-   msleep(10);
+   usleep_range(1, 11000);
 
/* Unhook test interrupt handler */
free_irq(irq, netdev);
-- 
2.7.4
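
The checkpatch warning quoted in this patch follows from how a jiffies-based sleep rounds. A minimal sketch in plain userspace C, assuming msleep() converts the request to whole timer ticks (rounding up) and then adds one extra tick; the function names and the `hz` parameter are illustrative stand-ins, not kernel APIs:

```c
#include <assert.h>

/* Model of a ms-to-ticks conversion that rounds up to whole ticks. */
static unsigned long model_msecs_to_jiffies(unsigned int ms, unsigned int hz)
{
	assert(hz > 0);
	return ((unsigned long)ms * hz + 999) / 1000;
}

/*
 * Worst-case duration, in ms, of a jiffies-based sleep that waits for
 * the requested number of ticks plus one (assumption stated above).
 */
static unsigned long model_msleep_worst_case_ms(unsigned int ms, unsigned int hz)
{
	unsigned long ticks = model_msecs_to_jiffies(ms, hz) + 1;

	return ticks * 1000 / hz;
}
```

Under this model a 100 Hz tick gives a 20 ms bound for both msleep(1) and msleep(10), which matches the warning text, while a 1000 Hz tick bounds msleep(1) at about 2 ms. usleep_range() avoids the tick granularity by using high-resolution timers with an explicit lower and upper bound.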




Re: [PATCH net-next 6/8] net: ethernet: annapurna: add wol helpers to the Alpine driver

2017-02-10 Thread Antoine Tenart
Hi!

On Mon, Feb 06, 2017 at 11:35:49AM +, David Laight wrote:
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On 
> Behalf Of Sergei Shtylyov
> > Sent: 03 February 2017 18:22
> > On 02/03/2017 09:12 PM, Antoine Tenart wrote:
> > 
> > > + if ((adapter) && (adapter->phy_exist) && (adapter->mdio_bus)) {
> > 
> > Now that's somewhat stupid looking... does the whole driver use this 
> > "style"?
> 
> Not only that, in one of the two functions it is followed by:
> 
> + device_set_wakeup_enable(&adapter->pdev->dev, adapter->wol);
> 
> Which assumes that 'adapter' is not NULL.
> Some verifiers will detect that as a possible NULL pointer dereference.
> 
> Pointers should only be checked for NULL if there are valid reasons
> why they can be NULL in that code path.
> Getting there with a NULL pointer due to some race condition isn't one of
> them.

Totally agree, both on the NULL checks and on the useless parentheses. I'll
try to catch other examples of this in the driver.

Thanks!

Antoine

-- 
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
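
The review point in this thread can be sketched in standalone C. The struct and function names below are hypothetical stand-ins, not the Alpine driver's code. The assumption is that the caller guarantees a valid adapter pointer on this path, so the condition tests only the fields that can legitimately be unset, and drops the redundant parentheses:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for a driver's adapter structure. */
struct example_adapter {
	bool phy_exist;
	void *mdio_bus;
	int wol;
};

/*
 * 'adapter' is never NULL here by contract, so it is dereferenced
 * without a NULL test; only optional members are checked.
 */
static bool example_wol_supported(const struct example_adapter *adapter)
{
	return adapter->phy_exist && adapter->mdio_bus;
}
```

Keeping the NULL test out of the condition also keeps the function consistent with any later unconditional dereference of the same pointer, which is what static checkers flag.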



Re: net: hix5hd2_gmac uninitialized net_device

2017-02-10 Thread Dongpo Li


On 2017/2/10 15:45, Marty Plummer wrote:
> On Fri, Feb 10, 2017 at 01:41:18AM -0600, Marty Plummer wrote:
>> Greetings.
>>
>> I think I may have found a bug with the hix5hd2_gmac driver; unless I'm
>> missing something, it appears that somehow the net_device struct is not
>> being initialized properly in the hix5hd2_dev_probe function.
>>
>> Having set up my devicetree properly (I hope, still new to this), I first
>> received an error when inserting the module:
>> "(unnamed net_device) (uninitialized): No irq resource"
>> while I very clearly have the interrupts property defined within this node.
>>
I think the "No irq resource" error happened for some other reason and is not
related to the "(unnamed net_device) (uninitialized):" prefix.
You can add more debug output to track down the bug.

>> Removing the phy-handle node for testing purposes, I get a similar message:
>> "(unnamed net_device) (uninitialized): not find phy-handle"
>>
>> So, it seems to my (admittedly inexperienced) mind that the ndev pointer is
>> not being initialized properly, or that the error checking at line 
>> is not functioning properly either, for it to have gotten so far along
>> into the function, only to fail at the attempt to access the ndev pointer.
>>
Yes, I agree with you that the ndev has not been initialized completely,
because the function "register_netdev" has not been called yet.
It's better to use "dev_err" instead of "netdev_err" here.

>> If you require more information from me, please let me know.
>>
>> Marty
> 
> Sorry, forgot the subject. Still getting the hang of mutt.
> 
> 

Regards,
Dongpo
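
Why the prefix reads "(unnamed net_device) (uninitialized):" can be illustrated with a small userspace model. This is a simplified sketch, assuming the netdev logging helpers print a placeholder when the name is empty or still contains a '%' template (such as "eth%d") and append a state suffix before registration; all `model_` names are hypothetical, not kernel symbols:

```c
#include <stdio.h>
#include <string.h>

enum model_reg_state { MODEL_UNINITIALIZED, MODEL_REGISTERED };

/* Reduced stand-in for the fields that influence the log prefix. */
struct model_netdev {
	char name[16];
	enum model_reg_state reg_state;
};

/* A name that is empty or still a template counts as "unnamed". */
static const char *model_netdev_name(const struct model_netdev *dev)
{
	if (!dev->name[0] || strchr(dev->name, '%'))
		return "(unnamed net_device)";
	return dev->name;
}

/* Build the prefix a netdev-style log helper would emit. */
static int model_netdev_prefix(char *buf, size_t len,
			       const struct model_netdev *dev)
{
	return snprintf(buf, len, "%s%s: ", model_netdev_name(dev),
			dev->reg_state == MODEL_UNINITIALIZED ?
			" (uninitialized)" : "");
}
```

In this model the prefix says nothing about a corrupted pointer: it only reflects that the device has not been registered yet, which is why logging against the underlying platform device is the clearer choice in a probe function.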

.



[PATCH V5 for-next 15/21] RDMA/bnxt_re: Support post_recv

2017-02-10 Thread Selvin Xavier
Enables the fastpath verb ib_post_recv.

v3: Fixes sparse warnings
v5: Code cleanup to avoid goto statement in post_recv
routines, as per Leon's suggestions.

Signed-off-by: Eddie Wai 
Signed-off-by: Devesh Sharma 
Signed-off-by: Somnath Kotur 
Signed-off-by: Sriharsha Basavapatna 
Signed-off-by: Selvin Xavier 
---
 drivers/infiniband/hw/bnxt_re/ib_verbs.c | 123 +++
 drivers/infiniband/hw/bnxt_re/ib_verbs.h |   2 +
 drivers/infiniband/hw/bnxt_re/main.c |   2 +
 drivers/infiniband/hw/bnxt_re/qplib_fp.c | 100 +
 drivers/infiniband/hw/bnxt_re/qplib_fp.h |   8 ++
 5 files changed, 235 insertions(+)

diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index 9401717..54d85bc 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -1635,6 +1635,52 @@ static int bnxt_re_build_qp1_send_v2(struct bnxt_re_qp *qp,
return rc;
 }
 
+/* For the MAD layer, it only provides the recv SGE the size of
+ * ib_grh + MAD datagram.  No Ethernet headers, Ethertype, BTH, DETH,
+ * nor RoCE iCRC.  The Cu+ solution must provide buffer for the entire
+ * receive packet (334 bytes) with no VLAN and then copy the GRH
+ * and the MAD datagram out to the provided SGE.
+ */
+static int bnxt_re_build_qp1_shadow_qp_recv(struct bnxt_re_qp *qp,
+   struct ib_recv_wr *wr,
+   struct bnxt_qplib_swqe *wqe,
+   int payload_size)
+{
+   struct bnxt_qplib_sge ref, sge;
+   u32 rq_prod_index;
+   struct bnxt_re_sqp_entries *sqp_entry;
+
+   rq_prod_index = bnxt_qplib_get_rq_prod_index(&qp->qplib_qp);
+
+   if (!bnxt_qplib_get_qp1_rq_buf(&qp->qplib_qp, &sge))
+   return -ENOMEM;
+
+   /* Create 1 SGE to receive the entire
+* ethernet packet
+*/
+   /* Save the reference from ULP */
+   ref.addr = wqe->sg_list[0].addr;
+   ref.lkey = wqe->sg_list[0].lkey;
+   ref.size = wqe->sg_list[0].size;
+
+   sqp_entry = &qp->rdev->sqp_tbl[rq_prod_index];
+
+   /* SGE 1 */
+   wqe->sg_list[0].addr = sge.addr;
+   wqe->sg_list[0].lkey = sge.lkey;
+   wqe->sg_list[0].size = BNXT_QPLIB_MAX_QP1_RQ_HDR_SIZE_V2;
+   sge.size -= wqe->sg_list[0].size;
+
+   sqp_entry->sge.addr = ref.addr;
+   sqp_entry->sge.lkey = ref.lkey;
+   sqp_entry->sge.size = ref.size;
+   /* Store the wrid for reporting completion */
+   sqp_entry->wrid = wqe->wr_id;
+   /* change the wqe->wrid to table index */
+   wqe->wr_id = rq_prod_index;
+   return 0;
+}
+
 static int is_ud_qp(struct bnxt_re_qp *qp)
 {
return qp->qplib_qp.type == CMDQ_CREATE_QP_TYPE_UD;
@@ -1981,6 +2027,83 @@ int bnxt_re_post_send(struct ib_qp *ib_qp, struct ib_send_wr *wr,
return rc;
 }
 
+static int bnxt_re_post_recv_shadow_qp(struct bnxt_re_dev *rdev,
+  struct bnxt_re_qp *qp,
+  struct ib_recv_wr *wr)
+{
+   struct bnxt_qplib_swqe wqe;
+   int rc = 0, payload_sz = 0;
+
+   memset(&wqe, 0, sizeof(wqe));
+   while (wr) {
+   /* House keeping */
+   memset(&wqe, 0, sizeof(wqe));
+
+   /* Common */
+   wqe.num_sge = wr->num_sge;
+   if (wr->num_sge > qp->qplib_qp.rq.max_sge) {
+   dev_err(rdev_to_dev(rdev),
+   "Limit exceeded for Receive SGEs");
+   rc = -EINVAL;
+   break;
+   }
+   payload_sz = bnxt_re_build_sgl(wr->sg_list, wqe.sg_list,
+  wr->num_sge);
+   wqe.wr_id = wr->wr_id;
+   wqe.type = BNXT_QPLIB_SWQE_TYPE_RECV;
+
+   rc = bnxt_qplib_post_recv(&qp->qplib_qp, &wqe);
+   if (rc)
+   break;
+
+   wr = wr->next;
+   }
+   if (!rc)
+   bnxt_qplib_post_recv_db(&qp->qplib_qp);
+   return rc;
+}
+
+int bnxt_re_post_recv(struct ib_qp *ib_qp, struct ib_recv_wr *wr,
+ struct ib_recv_wr **bad_wr)
+{
+   struct bnxt_re_qp *qp = container_of(ib_qp, struct bnxt_re_qp, ib_qp);
+   struct bnxt_qplib_swqe wqe;
+   int rc = 0, payload_sz = 0;
+
+   while (wr) {
+   /* House keeping */
+   memset(&wqe, 0, sizeof(wqe));
+
+   /* Common */
+   wqe.num_sge = wr->num_sge;
+   if (wr->num_sge > qp->qplib_qp.rq.max_sge) {
+   dev_err(rdev_to_dev(qp->rdev),
+   "Limit exceeded for Receive SGEs");
+   rc = -EINVAL;
+   *bad_wr = wr;
+   break;
+   }
+
+   payload_sz = bnxt_re_build_sgl(wr->sg_list, w

[PATCH V5 for-next 17/21] RDMA/bnxt_re: Handling dispatching of events to IB stack

2017-02-10 Thread Selvin Xavier
This patch implements events dispatching to the IB stack
based on NETDEV events received.

v2: Removed cleanup of the resources during driver unload since
we are calling unregister_netdevice_notifier first in the exit.

v3: Fixes cocci warnings and some sparse warnings
v5: Removes qp_wait parameter from bnxt_re_dev_stop() as it was never used

Signed-off-by: Eddie Wai 
Signed-off-by: Devesh Sharma 
Signed-off-by: Somnath Kotur 
Signed-off-by: Sriharsha Basavapatna 
Signed-off-by: Selvin Xavier 
---
 drivers/infiniband/hw/bnxt_re/main.c | 59 
 1 file changed, 59 insertions(+)

diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index 95dbbbe..c84691a 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -712,6 +712,51 @@ static int bnxt_re_alloc_res(struct bnxt_re_dev *rdev)
return rc;
 }
 
+static void bnxt_re_dispatch_event(struct ib_device *ibdev, struct ib_qp *qp,
+  u8 port_num, enum ib_event_type event)
+{
+   struct ib_event ib_event;
+
+   ib_event.device = ibdev;
+   if (qp)
+   ib_event.element.qp = qp;
+   else
+   ib_event.element.port_num = port_num;
+   ib_event.event = event;
+   ib_dispatch_event(&ib_event);
+}
+
+static bool bnxt_re_is_qp1_or_shadow_qp(struct bnxt_re_dev *rdev,
+   struct bnxt_re_qp *qp)
+{
+   return (qp->ib_qp.qp_type == IB_QPT_GSI) || (qp == rdev->qp1_sqp);
+}
+
+static void bnxt_re_dev_stop(struct bnxt_re_dev *rdev)
+{
+   int mask = IB_QP_STATE;
+   struct ib_qp_attr qp_attr;
+   struct bnxt_re_qp *qp;
+
+   qp_attr.qp_state = IB_QPS_ERR;
+   mutex_lock(&rdev->qp_lock);
+   list_for_each_entry(qp, &rdev->qp_list, list) {
+   /* Modify the state of all QPs except QP1/Shadow QP */
+   if (!bnxt_re_is_qp1_or_shadow_qp(rdev, qp)) {
+   if (qp->qplib_qp.state !=
+   CMDQ_MODIFY_QP_NEW_STATE_RESET &&
+   qp->qplib_qp.state !=
+   CMDQ_MODIFY_QP_NEW_STATE_ERR) {
+   bnxt_re_dispatch_event(&rdev->ibdev, &qp->ib_qp,
+  1, IB_EVENT_QP_FATAL);
+   bnxt_re_modify_qp(&qp->ib_qp, &qp_attr, mask,
+ NULL);
+   }
+   }
+   }
+   mutex_unlock(&rdev->qp_lock);
+}
+
 static void bnxt_re_ib_unreg(struct bnxt_re_dev *rdev, bool lock_wait)
 {
int i, rc;
@@ -871,6 +916,9 @@ static int bnxt_re_ib_reg(struct bnxt_re_dev *rdev)
}
}
set_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags);
+   bnxt_re_dispatch_event(&rdev->ibdev, NULL, 1, IB_EVENT_PORT_ACTIVE);
+   bnxt_re_dispatch_event(&rdev->ibdev, NULL, 1, IB_EVENT_GID_CHANGE);
+
return 0;
 free_sctx:
bnxt_re_net_stats_ctx_free(rdev, rdev->qplib_ctx.stats.fw_id, true);
@@ -950,8 +998,19 @@ static void bnxt_re_task(struct work_struct *work)
"Failed to register with IB: %#x", rc);
break;
case NETDEV_UP:
+   bnxt_re_dispatch_event(&rdev->ibdev, NULL, 1,
+  IB_EVENT_PORT_ACTIVE);
+   break;
case NETDEV_DOWN:
+   bnxt_re_dev_stop(rdev);
+   break;
case NETDEV_CHANGE:
+   if (!netif_carrier_ok(rdev->netdev))
+   bnxt_re_dev_stop(rdev);
+   else if (netif_carrier_ok(rdev->netdev))
+   bnxt_re_dispatch_event(&rdev->ibdev, NULL, 1,
+  IB_EVENT_PORT_ACTIVE);
+   break;
default:
break;
}
-- 
2.5.5



[PATCH V5 for-next 16/21] RDMA/bnxt_re: Support poll_cq verb

2017-02-10 Thread Selvin Xavier
Enables the fastpath ib_poll_cq verb.

v2: Fixed sparse warnings
v3: Fixes endianness related warnings reported by sparse. Also, fixes
smatch and checkpatch warnings
v5: Uses ETH_P_IBOE macro for RoCE ethertype

Signed-off-by: Eddie Wai 
Signed-off-by: Devesh Sharma 
Signed-off-by: Somnath Kotur 
Signed-off-by: Sriharsha Basavapatna 
Signed-off-by: Selvin Xavier 
---
 drivers/infiniband/hw/bnxt_re/ib_verbs.c | 522 
 drivers/infiniband/hw/bnxt_re/ib_verbs.h |   1 +
 drivers/infiniband/hw/bnxt_re/main.c |  22 +-
 drivers/infiniband/hw/bnxt_re/qplib_fp.c | 560 ++-
 drivers/infiniband/hw/bnxt_re/qplib_fp.h |   7 +-
 5 files changed, 1107 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index 54d85bc..33af2e3 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -2230,6 +2230,528 @@ struct ib_cq *bnxt_re_create_cq(struct ib_device *ibdev,
return ERR_PTR(rc);
 }
 
+static u8 __req_to_ib_wc_status(u8 qstatus)
+{
+   switch (qstatus) {
+   case CQ_REQ_STATUS_OK:
+   return IB_WC_SUCCESS;
+   case CQ_REQ_STATUS_BAD_RESPONSE_ERR:
+   return IB_WC_BAD_RESP_ERR;
+   case CQ_REQ_STATUS_LOCAL_LENGTH_ERR:
+   return IB_WC_LOC_LEN_ERR;
+   case CQ_REQ_STATUS_LOCAL_QP_OPERATION_ERR:
+   return IB_WC_LOC_QP_OP_ERR;
+   case CQ_REQ_STATUS_LOCAL_PROTECTION_ERR:
+   return IB_WC_LOC_PROT_ERR;
+   case CQ_REQ_STATUS_MEMORY_MGT_OPERATION_ERR:
+   return IB_WC_GENERAL_ERR;
+   case CQ_REQ_STATUS_REMOTE_INVALID_REQUEST_ERR:
+   return IB_WC_REM_INV_REQ_ERR;
+   case CQ_REQ_STATUS_REMOTE_ACCESS_ERR:
+   return IB_WC_REM_ACCESS_ERR;
+   case CQ_REQ_STATUS_REMOTE_OPERATION_ERR:
+   return IB_WC_REM_OP_ERR;
+   case CQ_REQ_STATUS_RNR_NAK_RETRY_CNT_ERR:
+   return IB_WC_RNR_RETRY_EXC_ERR;
+   case CQ_REQ_STATUS_TRANSPORT_RETRY_CNT_ERR:
+   return IB_WC_RETRY_EXC_ERR;
+   case CQ_REQ_STATUS_WORK_REQUEST_FLUSHED_ERR:
+   return IB_WC_WR_FLUSH_ERR;
+   default:
+   return IB_WC_GENERAL_ERR;
+   }
+   return 0;
+}
+
+static u8 __rawqp1_to_ib_wc_status(u8 qstatus)
+{
+   switch (qstatus) {
+   case CQ_RES_RAWETH_QP1_STATUS_OK:
+   return IB_WC_SUCCESS;
+   case CQ_RES_RAWETH_QP1_STATUS_LOCAL_ACCESS_ERROR:
+   return IB_WC_LOC_ACCESS_ERR;
+   case CQ_RES_RAWETH_QP1_STATUS_HW_LOCAL_LENGTH_ERR:
+   return IB_WC_LOC_LEN_ERR;
+   case CQ_RES_RAWETH_QP1_STATUS_LOCAL_PROTECTION_ERR:
+   return IB_WC_LOC_PROT_ERR;
+   case CQ_RES_RAWETH_QP1_STATUS_LOCAL_QP_OPERATION_ERR:
+   return IB_WC_LOC_QP_OP_ERR;
+   case CQ_RES_RAWETH_QP1_STATUS_MEMORY_MGT_OPERATION_ERR:
+   return IB_WC_GENERAL_ERR;
+   case CQ_RES_RAWETH_QP1_STATUS_WORK_REQUEST_FLUSHED_ERR:
+   return IB_WC_WR_FLUSH_ERR;
+   case CQ_RES_RAWETH_QP1_STATUS_HW_FLUSH_ERR:
+   return IB_WC_WR_FLUSH_ERR;
+   default:
+   return IB_WC_GENERAL_ERR;
+   }
+}
+
+static u8 __rc_to_ib_wc_status(u8 qstatus)
+{
+   switch (qstatus) {
+   case CQ_RES_RC_STATUS_OK:
+   return IB_WC_SUCCESS;
+   case CQ_RES_RC_STATUS_LOCAL_ACCESS_ERROR:
+   return IB_WC_LOC_ACCESS_ERR;
+   case CQ_RES_RC_STATUS_LOCAL_LENGTH_ERR:
+   return IB_WC_LOC_LEN_ERR;
+   case CQ_RES_RC_STATUS_LOCAL_PROTECTION_ERR:
+   return IB_WC_LOC_PROT_ERR;
+   case CQ_RES_RC_STATUS_LOCAL_QP_OPERATION_ERR:
+   return IB_WC_LOC_QP_OP_ERR;
+   case CQ_RES_RC_STATUS_MEMORY_MGT_OPERATION_ERR:
+   return IB_WC_GENERAL_ERR;
+   case CQ_RES_RC_STATUS_REMOTE_INVALID_REQUEST_ERR:
+   return IB_WC_REM_INV_REQ_ERR;
+   case CQ_RES_RC_STATUS_WORK_REQUEST_FLUSHED_ERR:
+   return IB_WC_WR_FLUSH_ERR;
+   case CQ_RES_RC_STATUS_HW_FLUSH_ERR:
+   return IB_WC_WR_FLUSH_ERR;
+   default:
+   return IB_WC_GENERAL_ERR;
+   }
+}
+
+static void bnxt_re_process_req_wc(struct ib_wc *wc, struct bnxt_qplib_cqe *cqe)
+{
+   switch (cqe->type) {
+   case BNXT_QPLIB_SWQE_TYPE_SEND:
+   wc->opcode = IB_WC_SEND;
+   break;
+   case BNXT_QPLIB_SWQE_TYPE_SEND_WITH_IMM:
+   wc->opcode = IB_WC_SEND;
+   wc->wc_flags |= IB_WC_WITH_IMM;
+   break;
+   case BNXT_QPLIB_SWQE_TYPE_SEND_WITH_INV:
+   wc->opcode = IB_WC_SEND;
+   wc->wc_flags |= IB_WC_WITH_INVALIDATE;
+   break;
+   case BNXT_QPLIB_SWQE_TYPE_RDMA_WRITE:
+   wc->opcode = IB_WC_RDMA_WRITE;
+   break;
+   case BNXT_QPLIB_SWQE_TYPE_RDMA_WRITE_WITH_IMM:
+  

[PATCH iproute2 2/2] actions: Add support for user cookies

2017-02-10 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

Make use of 128b user cookies

Introduce optional 128-bit action cookie.
Like all other cookie schemes in the networking world (e.g. in protocols
like http or existing kernel fib protocol field, etc) the idea is to save
user state that when retrieved serves as a correlator. The kernel
_should not_ interpret it.  The user can store whatever they wish in the
128 bits.

Sample exercise (showing variable length use of cookie)

.. create an accept action with cookie a1b2c3d4
sudo $TC actions add action ok index 1 cookie a1b2c3d4

.. dump all gact actions..
sudo $TC -s actions ls action gact

action order 0: gact action pass
 random type none pass val 0
 index 1 ref 1 bind 0 installed 5 sec used 5 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
cookie a1b2c3d4

.. bind the accept action to a filter..
sudo $TC filter add dev lo parent : protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1

... send some traffic..
$ ping 127.0.0.1 -c 3
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms

Signed-off-by: Jamal Hadi Salim 
---
 include/linux/pkt_cls.h |  2 +-
 tc/m_action.c   | 44 ++--
 2 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/include/linux/pkt_cls.h b/include/linux/pkt_cls.h
index fef68c4..af17f3c 100644
--- a/include/linux/pkt_cls.h
+++ b/include/linux/pkt_cls.h
@@ -343,7 +343,7 @@ enum {
TCA_BPF_NAME,
TCA_BPF_FLAGS,
TCA_BPF_FLAGS_GEN,
-   TCA_BPF_DIGEST,
+   TCA_BPF_TAG,
__TCA_BPF_MAX,
 };
 
diff --git a/tc/m_action.c b/tc/m_action.c
index bb19df8..00bc219 100644
--- a/tc/m_action.c
+++ b/tc/m_action.c
@@ -150,18 +150,19 @@ new_cmd(char **argv)
 
 }
 
-int
-parse_action(int *argc_p, char ***argv_p, int tca_id, struct nlmsghdr *n)
+int parse_action(int *argc_p, char ***argv_p, int tca_id, struct nlmsghdr *n)
 {
int argc = *argc_p;
char **argv = *argv_p;
struct rtattr *tail, *tail2;
char k[16];
+   int act_ck_len = 0;
int ok = 0;
int eap = 0; /* expect action parameters */
 
int ret = 0;
int prio = 0;
+   unsigned char act_ck[TC_COOKIE_MAX_SIZE];
 
if (argc <= 0)
return -1;
@@ -215,16 +216,39 @@ done0:
addattr_l(n, MAX_MSG, ++prio, NULL, 0);
addattr_l(n, MAX_MSG, TCA_ACT_KIND, k, strlen(k) + 1);
 
-   ret = a->parse_aopt(a, &argc, &argv, TCA_ACT_OPTIONS, n);
+   ret = a->parse_aopt(a, &argc, &argv, TCA_ACT_OPTIONS,
+   n);
 
if (ret < 0) {
fprintf(stderr, "bad action parsing\n");
goto bad_val;
}
+
+   if (*argv && strcmp(*argv, "cookie") == 0) {
+   int slen;
+
+   NEXT_ARG();
+   slen = strlen(*argv);
+   if (slen > (TC_COOKIE_MAX_SIZE*2))
+   invarg("cookie cannot exceed %d\n",
+  *argv);
+
+   if (hex2mem(*argv, act_ck, slen/2) < 0)
+   invarg("cookie must be a hex string\n",
+  *argv);
+
+   act_ck_len = slen;
+   argc--;
+   argv++;
+   }
+
+   if (act_ck_len)
+   addattr_l(n, MAX_MSG, TCA_ACT_COOKIE,
+ (const void *)&act_ck, act_ck_len);
+
tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
ok++;
}
-
}
 
if (eap > 0) {
@@ -245,8 +269,7 @@ bad_val:
return -1;
 }
 
-static int
-tc_print_one_action(FILE *f, struct rtattr *arg)
+static int tc_print_one_action(FILE *f, struct rtattr *arg)
 {
 
struct rtattr *tb[TCA_ACT_MAX + 1];
@@ -274,8 +297,17 @@ tc_print_one_action(FILE *f, struct rtattr *arg)
return err;
 
if (show_stats && tb[TCA_ACT_STATS]) {
+
fprintf(f, "\tAction statistics:\n");
print_tcstats2_attr(f, tb[TCA_ACT_STATS], "\t", NULL);
+   if (tb[TCA_ACT_COOKIE]) {
+   int strsz = RTA_PAYLOAD(tb[TCA_ACT_COOKIE]);
+   char b1[strsz+1];
+
+   fprintf(f, "\n\tcookie len %d %s ", strsz,
+   hexstring_n2a(RTA_DATA(tb[TCA

[PATCH V5 for-next 04/21] RDMA/bnxt_re: Enabling RoCE control path

2017-02-10 Thread Selvin Xavier
This patch covers the basic initialization of the HW interface.
Implements some of the slow path FW commands required for the
HW initialization. It also handles registration with the IB stack.

v2: Fix some of the sparse warnings
v3: Removed some more smatch and sparse warnings related to endianness.
Fixes cross compilation failure warnings. Also, fixes the retry logic
to avoid system hangs in case of delay or no response for the FW commands.
v4: Changes header file names
v5: Updating the PF FW communication channel offset as per the latest FW changes

Signed-off-by: Eddie Wai 
Signed-off-by: Devesh Sharma 
Signed-off-by: Somnath Kotur 
Signed-off-by: Sriharsha Basavapatna 
Signed-off-by: Selvin Xavier 
---
 drivers/infiniband/hw/bnxt_re/bnxt_re.h|  15 +
 drivers/infiniband/hw/bnxt_re/main.c   | 424 -
 drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 608 
 drivers/infiniband/hw/bnxt_re/qplib_rcfw.h | 189 
 drivers/infiniband/hw/bnxt_re/qplib_res.c  | 738 +
 drivers/infiniband/hw/bnxt_re/qplib_res.h  | 165 +++
 drivers/infiniband/hw/bnxt_re/qplib_sp.c   | 163 +++
 drivers/infiniband/hw/bnxt_re/qplib_sp.h   |  42 ++
 8 files changed, 2343 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/bnxt_re/bnxt_re.h b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
index 8ff2787..cbc2fb2 100644
--- a/drivers/infiniband/hw/bnxt_re/bnxt_re.h
+++ b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
@@ -44,6 +44,11 @@
 
 #define BNXT_RE_DESC   "Broadcom NetXtreme-C/E RoCE Driver"
 
+#define BNXT_RE_MAX_QPC_COUNT  (64 * 1024)
+#define BNXT_RE_MAX_MRW_COUNT  (64 * 1024)
+#define BNXT_RE_MAX_SRQC_COUNT (64 * 1024)
+#define BNXT_RE_MAX_CQ_COUNT   (64 * 1024)
+
 struct bnxt_re_work {
struct work_struct  work;
unsigned long   event;
@@ -53,6 +58,7 @@ struct bnxt_re_work {
 
 #define BNXT_RE_MIN_MSIX   2
 #define BNXT_RE_MAX_MSIX   16
+#define BNXT_RE_AEQ_IDX0
 struct bnxt_re_dev {
struct ib_deviceibdev;
struct list_headlist;
@@ -70,6 +76,15 @@ struct bnxt_re_dev {
 
int id;
 
+   /* RCFW Channel */
+   struct bnxt_qplib_rcfw  rcfw;
+
+   /* Device Resources */
+   struct bnxt_qplib_dev_attr  dev_attr;
+   struct bnxt_qplib_ctx   qplib_ctx;
+   struct bnxt_qplib_res   qplib_res;
+   struct bnxt_qplib_dpi   dpi_privileged;
+
atomic_tqp_count;
struct mutexqp_lock;/* protect qp list */
struct list_headqp_list;
diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index eb3dc81..6fdf726 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -54,6 +54,10 @@
 
 #include "bnxt_ulp.h"
 #include "roce_hsi.h"
+#include "qplib_res.h"
+#include "qplib_sp.h"
+#include "qplib_fp.h"
+#include "qplib_rcfw.h"
 #include "bnxt_re.h"
 #include "bnxt.h"
 static char version[] =
@@ -181,6 +185,160 @@ static int bnxt_re_request_msix(struct bnxt_re_dev *rdev)
return rc;
 }
 
+static void bnxt_re_init_hwrm_hdr(struct bnxt_re_dev *rdev, struct input *hdr,
+ u16 opcd, u16 crid, u16 trid)
+{
+   hdr->req_type = cpu_to_le16(opcd);
+   hdr->cmpl_ring = cpu_to_le16(crid);
+   hdr->target_id = cpu_to_le16(trid);
+}
+
+static void bnxt_re_fill_fw_msg(struct bnxt_fw_msg *fw_msg, void *msg,
+   int msg_len, void *resp, int resp_max_len,
+   int timeout)
+{
+   fw_msg->msg = msg;
+   fw_msg->msg_len = msg_len;
+   fw_msg->resp = resp;
+   fw_msg->resp_max_len = resp_max_len;
+   fw_msg->timeout = timeout;
+}
+
+static int bnxt_re_net_ring_free(struct bnxt_re_dev *rdev, u16 fw_ring_id,
+bool lock_wait)
+{
+   struct bnxt_en_dev *en_dev = rdev->en_dev;
+   struct hwrm_ring_free_input req = {0};
+   struct hwrm_ring_free_output resp;
+   struct bnxt_fw_msg fw_msg;
+   bool do_unlock = false;
+   int rc = -EINVAL;
+
+   if (!en_dev)
+   return rc;
+
+   memset(&fw_msg, 0, sizeof(fw_msg));
+   if (lock_wait) {
+   rtnl_lock();
+   do_unlock = true;
+   }
+
+   bnxt_re_init_hwrm_hdr(rdev, (void *)&req, HWRM_RING_FREE, -1, -1);
+   req.ring_type = RING_ALLOC_REQ_RING_TYPE_CMPL;
+   req.ring_id = cpu_to_le16(fw_ring_id);
+   bnxt_re_fill_fw_msg(&fw_msg, (void *)&req, sizeof(req), (void *)&resp,
+   sizeof(resp), DFLT_HWRM_CMD_TIMEOUT);
+   rc = en_dev->en_ops->bnxt_send_fw_msg(en_dev, BNXT_ROCE_ULP, &fw_msg);
+   if (rc)
+   dev_err(rdev_to_dev(rdev),
+   

[PATCH V5 for-next 20/21] RDMA/bnxt_re: Add QP event handling

2017-02-10 Thread Selvin Xavier
Implements callback handler for processing Async events related to a QP.
This patch also implements the control path command completion handling.

v3: Removes unwanted braces
v5: Added a debug print for QP error notification

Signed-off-by: Eddie Wai 
Signed-off-by: Devesh Sharma 
Signed-off-by: Somnath Kotur 
Signed-off-by: Sriharsha Basavapatna 
Signed-off-by: Selvin Xavier 
---
 drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 49 ++
 1 file changed, 49 insertions(+)

diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
index 6040786..23fb726 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
@@ -257,6 +257,46 @@ static int bnxt_qplib_process_func_event(struct bnxt_qplib_rcfw *rcfw,
return 0;
 }
 
+static int bnxt_qplib_process_qp_event(struct bnxt_qplib_rcfw *rcfw,
+  struct creq_qp_event *qp_event)
+{
+   struct bnxt_qplib_crsq *crsq = &rcfw->crsq;
+   struct bnxt_qplib_hwq *cmdq = &rcfw->cmdq;
+   struct bnxt_qplib_crsqe *crsqe;
+   u16 cbit, cookie, blocked = 0;
+   unsigned long flags;
+   u32 sw_cons;
+
+   switch (qp_event->event) {
+   case CREQ_QP_EVENT_EVENT_QP_ERROR_NOTIFICATION:
+   dev_dbg(&rcfw->pdev->dev,
+   "QPLIB: Received QP error notification");
+   break;
+   default:
+   /* Command Response */
+   spin_lock_irqsave(&cmdq->lock, flags);
+   sw_cons = HWQ_CMP(crsq->cons, crsq);
+   crsqe = &crsq->crsq[sw_cons];
+   crsq->cons++;
+   memcpy(&crsqe->qp_event, qp_event, sizeof(crsqe->qp_event));
+
+   cookie = le16_to_cpu(crsqe->qp_event.cookie);
+   blocked = cookie & RCFW_CMD_IS_BLOCKING;
+   cookie &= RCFW_MAX_COOKIE_VALUE;
+   cbit = cookie % RCFW_MAX_OUTSTANDING_CMD;
+   if (!test_and_clear_bit(cbit, rcfw->cmdq_bitmap))
+   dev_warn(&rcfw->pdev->dev,
+"QPLIB: CMD bit %d was not requested", cbit);
+
+   cmdq->cons += crsqe->req_size;
+   spin_unlock_irqrestore(&cmdq->lock, flags);
+   if (!blocked)
+   wake_up(&rcfw->waitq);
+   break;
+   }
+   return 0;
+}
+
 /* SP - CREQ Completion handlers */
 static void bnxt_qplib_service_creq(unsigned long data)
 {
@@ -280,6 +320,15 @@ static void bnxt_qplib_service_creq(unsigned long data)
type = creqe->type & CREQ_BASE_TYPE_MASK;
switch (type) {
case CREQ_BASE_TYPE_QP_EVENT:
+   if (!bnxt_qplib_process_qp_event
+   (rcfw, (struct creq_qp_event *)creqe))
+   rcfw->creq_qp_event_processed++;
+   else {
+   dev_warn(&rcfw->pdev->dev, "QPLIB: crsqe with");
+   dev_warn(&rcfw->pdev->dev,
+"QPLIB: type = 0x%x not handled",
+type);
+   }
break;
case CREQ_BASE_TYPE_FUNC_EVENT:
if (!bnxt_qplib_process_func_event
-- 
2.5.5
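
The cookie handling in bnxt_qplib_process_qp_event() above can be sketched in isolation as follows. This is only an illustration: the EX_* constants are hypothetical stand-ins (the driver's real RCFW_CMD_IS_BLOCKING, RCFW_MAX_COOKIE_VALUE and RCFW_MAX_OUTSTANDING_CMD values are not shown in this thread), and ex_decode_cookie() is a made-up helper name.

```c
#include <stdint.h>

/* Hypothetical values for illustration only; the driver's real
 * RCFW_* definitions are not part of this message.
 */
#define EX_CMD_IS_BLOCKING	0x8000u	/* flag carried in the top cookie bit */
#define EX_MAX_COOKIE_VALUE	0x7FFFu	/* mask that strips the flag bit */
#define EX_MAX_OUTSTANDING_CMD	256u	/* size of the command bitmap */

struct ex_cookie {
	uint16_t blocked;	/* non-zero when the blocking flag is set */
	uint16_t cookie;	/* cookie with the flag bit removed */
	uint16_t cbit;		/* index into the outstanding-command bitmap */
};

/* Split a raw (already CPU-endian) cookie the same way the handler does:
 * test the flag, mask it off, then fold the cookie into a bitmap index.
 */
static struct ex_cookie ex_decode_cookie(uint16_t raw)
{
	struct ex_cookie c;

	c.blocked = raw & EX_CMD_IS_BLOCKING;
	c.cookie = raw & EX_MAX_COOKIE_VALUE;
	c.cbit = c.cookie % EX_MAX_OUTSTANDING_CMD;
	return c;
}
```

With these assumed values, a raw cookie of 0x8105 decodes to a set blocking flag, a cookie of 0x0105 and bitmap index 5. In the patch, a set flag means the handler skips wake_up() for that response.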



[PATCH V5 for-next 00/21] Broadcom RoCE Driver (bnxt_re)

2017-02-10 Thread Selvin Xavier
This series introduces the RoCE driver for the Broadcom
NetXtreme-E 10/25/40/50G RoCE HCAs.
This driver is dependent on the bnxt_en NIC driver and is
based on the for-4.11 branch in linux-rdma repository.
bnxt_en changes required for this patch series are already
available in the aforementioned branch.

These changes are available for your reference in
the bnxt_re_v5 branch of the following repository:
https://github.com/Broadcom/linux-rdma-nxt/

Doug,
This patchset addresses review comments from you and Leon.
This series also includes some changes required for the latest FW.
Please consider applying this to the linux-rdma tree.

Thanks,
Selvin Xavier

v4->v5:
  * Removes rdev ref_count as this is not necessary.
  * Adds a check to see if the device supports RoCE
  * Updates the PF FW communication channel offset as per
the latest FW changes
  * Uses the min_t macro to calculate the number of CQ and QP entries
  * Adds ib_udata parameter to create_ah verb
  * Uses ETH_P_IBOE macro for RoCE ethertype
  * Code refactoring based on the review comments from Leon

v3->v4:
  * Changes the driver folder name to bnxt_re and removes the bnxt_re/bnxt
prefix from the individual files inside the driver folder.
  * Changes the file name bnxtre-abi.h to bnxt_re-abi.h to align with
the modulename-abi.h convention
  * Updates the Makefile and Kconfig file with new format change.

v2->v3:
  * Fix 0day build breakage
  * Fix cocci, kbuild robot, sparse, smatch and checkpatch warnings
  * Changed the filename bnxt_re_uverbs_abi.h to bnxtre-abi.h
  * Removed the __packed qualifier from the uverbs structure and adjusted
the structure alignment to 64bits.
  * Added retry count to bail out in case of delayed or no response
to FW commands
  * Removed the debugfs support from this patch series
  * Changed some of the defines to inline functions based on Jason's comment
  * Split two functions to get rid of switch within switch construct
  * Removed bnxt_re_copy_to_udata as it is just a wrapper for ib_copy_to_udata
  * Added maintainers information to MAINTAINERS file

v1->v2:
  * The license text in each file updated to reflect Dual license.
  * Makefile and Kconfig changes are pushed to the last patch
  * Moved bnxt_re_uverbs_abi.h to include/uapi/rdma folder
  * Remove duplicate structure definitions from bnxt_re_hsi.h as
it is available in the corresponding bnxt_en header file (bnxt_hsi.h)
  * Removed some unused code reported during code review.
  * Fixed a few sparse warnings

Selvin Xavier (21):
  RDMA/bnxt_re: Add bnxt_re RoCE driver files
  RDMA/bnxt_re: Introducing autogenerated Host Software Interface(hsi)
file
  RDMA/bnxt_re: register with the NIC driver
  RDMA/bnxt_re: Enabling RoCE control path
  RDMA/bnxt_re: Adding Notification Queue support
  RDMA/bnxt_re: Support for PD, ucontext and mmap verbs
  RDMA/bnxt_re: Support for query and modify device verbs
  RDMA/bnxt_re: Adding support for port related verbs
  RDMA/bnxt_re: Support for GID related verbs
  RDMA/bnxt_re: Support for CQ verbs
  RDMA/bnxt_re: Support for AH verbs
  RDMA/bnxt_re: Support memory registration verbs
  RDMA/bnxt_re: Support QP verbs
  RDMA/bnxt_re: Support post_send verb
  RDMA/bnxt_re: Support post_recv
  RDMA/bnxt_re: Support poll_cq verb
  RDMA/bnxt_re: Handling dispatching of events to IB stack
  RDMA/bnxt_re: Support for DCB
  RDMA/bnxt_re: Set uverbs command mask
  RDMA/bnxt_re: Add QP event handling
  RDMA/bnxt_re: Add bnxt_re driver build support

 MAINTAINERS|   11 +
 drivers/infiniband/Kconfig |2 +
 drivers/infiniband/hw/Makefile |1 +
 drivers/infiniband/hw/bnxt_re/Kconfig  |9 +
 drivers/infiniband/hw/bnxt_re/Makefile |6 +
 drivers/infiniband/hw/bnxt_re/bnxt_re.h|  146 ++
 drivers/infiniband/hw/bnxt_re/ib_verbs.c   | 3202 
 drivers/infiniband/hw/bnxt_re/ib_verbs.h   |  197 ++
 drivers/infiniband/hw/bnxt_re/main.c   | 1315 
 drivers/infiniband/hw/bnxt_re/qplib_fp.c   | 2167 +++
 drivers/infiniband/hw/bnxt_re/qplib_fp.h   |  439 
 drivers/infiniband/hw/bnxt_re/qplib_rcfw.c |  694 ++
 drivers/infiniband/hw/bnxt_re/qplib_rcfw.h |  231 ++
 drivers/infiniband/hw/bnxt_re/qplib_res.c  |  825 +++
 drivers/infiniband/hw/bnxt_re/qplib_res.h  |  223 ++
 drivers/infiniband/hw/bnxt_re/qplib_sp.c   |  838 
 drivers/infiniband/hw/bnxt_re/qplib_sp.h   |  160 ++
 drivers/infiniband/hw/bnxt_re/roce_hsi.h   | 2821 
 include/uapi/rdma/bnxt_re-abi.h|   89 +
 19 files changed, 13376 insertions(+)
 create mode 100644 drivers/infiniband/hw/bnxt_re/Kconfig
 create mode 100644 drivers/infiniband/hw/bnxt_re/Makefile
 create mode 100644 drivers/infiniband/hw/bnxt_re/bnxt_re.h
 create mode 100644 drivers/infiniband/hw/bnxt_re/ib_verbs.c
 create mode 100644 drivers/infiniband/hw/bnxt_re/ib_verbs.h
 create mode 100644 drivers/infiniband/hw/bnxt_re/main.c
 create mode 10064

[PATCH V5 for-next 10/21] RDMA/bnxt_re: Support for CQ verbs

2017-02-10 Thread Selvin Xavier
Implements support for create_cq, destroy_cq and req_notify_cq
verbs.

v3: Code cleanup based on errors reported by sparse on endianness check.
Removes unwanted macros.
v5: Use min_t macro to calculate the CQ entries

Signed-off-by: Eddie Wai 
Signed-off-by: Devesh Sharma 
Signed-off-by: Somnath Kotur 
Signed-off-by: Sriharsha Basavapatna 
Signed-off-by: Selvin Xavier 
---
 drivers/infiniband/hw/bnxt_re/ib_verbs.c | 144 
 drivers/infiniband/hw/bnxt_re/ib_verbs.h |  19 
 drivers/infiniband/hw/bnxt_re/main.c |   4 +
 drivers/infiniband/hw/bnxt_re/qplib_fp.c | 181 +++
 drivers/infiniband/hw/bnxt_re/qplib_fp.h |  50 +
 include/uapi/rdma/bnxt_re-abi.h  |  12 ++
 6 files changed, 410 insertions(+)

diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index 1c9e1f4..f85d4c4 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -492,6 +492,150 @@ struct ib_pd *bnxt_re_alloc_pd(struct ib_device *ibdev,
return ERR_PTR(rc);
 }
 
+/* Completion Queues */
+int bnxt_re_destroy_cq(struct ib_cq *ib_cq)
+{
+   struct bnxt_re_cq *cq = container_of(ib_cq, struct bnxt_re_cq, ib_cq);
+   struct bnxt_re_dev *rdev = cq->rdev;
+   int rc;
+
+   rc = bnxt_qplib_destroy_cq(&rdev->qplib_res, &cq->qplib_cq);
+   if (rc) {
+   dev_err(rdev_to_dev(rdev), "Failed to destroy HW CQ");
+   return rc;
+   }
+   if (cq->umem && !IS_ERR(cq->umem))
+   ib_umem_release(cq->umem);
+
+   if (cq) {
+   kfree(cq->cql);
+   kfree(cq);
+   }
+   atomic_dec(&rdev->cq_count);
+   rdev->nq.budget--;
+   return 0;
+}
+
+struct ib_cq *bnxt_re_create_cq(struct ib_device *ibdev,
+   const struct ib_cq_init_attr *attr,
+   struct ib_ucontext *context,
+   struct ib_udata *udata)
+{
+   struct bnxt_re_dev *rdev = to_bnxt_re_dev(ibdev, ibdev);
+   struct bnxt_qplib_dev_attr *dev_attr = &rdev->dev_attr;
+   struct bnxt_re_cq *cq = NULL;
+   int rc, entries;
+   int cqe = attr->cqe;
+
+   /* Validate CQ fields */
+   if (cqe < 1 || cqe > dev_attr->max_cq_wqes) {
+   dev_err(rdev_to_dev(rdev), "Failed to create CQ -max exceeded");
+   return ERR_PTR(-EINVAL);
+   }
+   cq = kzalloc(sizeof(*cq), GFP_KERNEL);
+   if (!cq)
+   return ERR_PTR(-ENOMEM);
+
+   cq->rdev = rdev;
+   cq->qplib_cq.cq_handle = (u64)(unsigned long)(&cq->qplib_cq);
+
+   entries = roundup_pow_of_two(cqe + 1);
+   if (entries > dev_attr->max_cq_wqes + 1)
+   entries = dev_attr->max_cq_wqes + 1;
+
+   if (context) {
+   struct bnxt_re_cq_req req;
+   struct bnxt_re_ucontext *uctx = container_of
+   (context,
+struct bnxt_re_ucontext,
+ib_uctx);
+   if (ib_copy_from_udata(&req, udata, sizeof(req))) {
+   rc = -EFAULT;
+   goto fail;
+   }
+
+   cq->umem = ib_umem_get(context, req.cq_va,
+  entries * sizeof(struct cq_base),
+  IB_ACCESS_LOCAL_WRITE, 1);
+   if (IS_ERR(cq->umem)) {
+   rc = PTR_ERR(cq->umem);
+   goto fail;
+   }
+   cq->qplib_cq.sghead = cq->umem->sg_head.sgl;
+   cq->qplib_cq.nmap = cq->umem->nmap;
+   cq->qplib_cq.dpi = uctx->dpi;
+   } else {
+   cq->max_cql = min_t(u32, entries, MAX_CQL_PER_POLL);
+   cq->cql = kcalloc(cq->max_cql, sizeof(struct bnxt_qplib_cqe),
+ GFP_KERNEL);
+   if (!cq->cql) {
+   rc = -ENOMEM;
+   goto fail;
+   }
+
+   cq->qplib_cq.dpi = &rdev->dpi_privileged;
+   cq->qplib_cq.sghead = NULL;
+   cq->qplib_cq.nmap = 0;
+   }
+   cq->qplib_cq.max_wqe = entries;
+   cq->qplib_cq.cnq_hw_ring_id = rdev->nq.ring_id;
+
+   rc = bnxt_qplib_create_cq(&rdev->qplib_res, &cq->qplib_cq);
+   if (rc) {
+   dev_err(rdev_to_dev(rdev), "Failed to create HW CQ");
+   goto fail;
+   }
+
+   cq->ib_cq.cqe = entries;
+   cq->cq_period = cq->qplib_cq.period;
+   rdev->nq.budget++;
+
+   atomic_inc(&rdev->cq_count);
+
+   if (context) {
+   struct bnxt_re_cq_resp resp;
+
+   resp.cqid = cq->qplib_cq.id;
+   resp.tail = cq->qplib_cq.hwq.cons;
+   resp.phase = cq->qplib_cq.period;
+   resp.rsvd = 0;
+   rc 
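
The CQ sizing done in bnxt_re_create_cq() above (validate the requested depth, round cqe + 1 up to a power of two, then clamp to the device limit plus one) can be sketched on its own as below. This is a user-space illustration under stated assumptions: ex_roundup_pow_of_two() is a simple stand-in for the kernel's roundup_pow_of_two(), ex_cq_entries() is a hypothetical helper, and a return value of 0 stands in for the -EINVAL path.

```c
#include <stdint.h>

/* Stand-in for the kernel's roundup_pow_of_two(): smallest power of two
 * that is >= n. Assumes 1 <= n <= 2^31 so the shift cannot overflow.
 */
static uint32_t ex_roundup_pow_of_two(uint32_t n)
{
	uint32_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Hypothetical helper mirroring the sizing steps in the patch.
 * Returns 0 when the requested depth is out of range.
 * Assumes max_cq_wqes < UINT32_MAX so max_cq_wqes + 1 does not wrap.
 */
static uint32_t ex_cq_entries(int cqe, uint32_t max_cq_wqes)
{
	uint32_t entries;

	/* Validate the requested depth against the device limit */
	if (cqe < 1 || (uint32_t)cqe > max_cq_wqes)
		return 0;

	/* One extra slot, rounded up to a power of two */
	entries = ex_roundup_pow_of_two((uint32_t)cqe + 1);

	/* Rounding up may overshoot the device limit, so clamp */
	if (entries > max_cq_wqes + 1)
		entries = max_cq_wqes + 1;
	return entries;
}
```

Note that the clamp can yield a depth that is not a power of two: with an assumed limit of 1000, a request for 1000 entries rounds up to 1024 and is then clamped to 1001.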

[PATCH V5 for-next 06/21] RDMA/bnxt_re: Support for PD, ucontext and mmap verbs

2017-02-10 Thread Selvin Xavier
This patch includes the uverbs ABI header file to enable user verbs.
Also, adds support for the Protection Domain, User Context and mmap
verbs.

v2: Moved the bnxt_re_uverbs_abi.h file to include/uapi/rdma folder.
Also fixed one sparse warning.
v3: Fixes some cross compile and sparse warnings. Changes the
bnxt_re_uverbs_abi.h file name to bnxtre-abi.h. Aligns the data
structure to 64 bit and eliminates __packed qualifier from
bnxtre-abi.h
v4: Changes the bnxtre-abi.h file name to bnxt_re-abi.h to follow
modulename-abi.h format. Also, modifies the include file name.
v5: Changes the year in copyright text

Signed-off-by: Eddie Wai 
Signed-off-by: Devesh Sharma 
Signed-off-by: Somnath Kotur 
Signed-off-by: Sriharsha Basavapatna 
Signed-off-by: Selvin Xavier 
---
 drivers/infiniband/hw/bnxt_re/ib_verbs.c  | 218 ++
 drivers/infiniband/hw/bnxt_re/ib_verbs.h  |  23 
 drivers/infiniband/hw/bnxt_re/main.c  |   7 +
 drivers/infiniband/hw/bnxt_re/qplib_res.c |  28 
 drivers/infiniband/hw/bnxt_re/qplib_res.h |   6 +
 drivers/infiniband/hw/bnxt_re/qplib_sp.h  |   4 +
 include/uapi/rdma/bnxt_re-abi.h   |  59 
 7 files changed, 345 insertions(+)
 create mode 100644 include/uapi/rdma/bnxt_re-abi.h

diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index 279c353..6589a41 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -35,3 +35,221 @@
  *
  * Description: IB Verbs interpreter
  */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "bnxt_ulp.h"
+
+#include "roce_hsi.h"
+#include "qplib_res.h"
+#include "qplib_sp.h"
+#include "qplib_fp.h"
+#include "qplib_rcfw.h"
+
+#include "bnxt_re.h"
+#include "ib_verbs.h"
+#include 
+
+/* Protection Domains */
+int bnxt_re_dealloc_pd(struct ib_pd *ib_pd)
+{
+   struct bnxt_re_pd *pd = container_of(ib_pd, struct bnxt_re_pd, ib_pd);
+   struct bnxt_re_dev *rdev = pd->rdev;
+   int rc;
+
+   if (ib_pd->uobject && pd->dpi.dbr) {
+   struct ib_ucontext *ib_uctx = ib_pd->uobject->context;
+   struct bnxt_re_ucontext *ucntx;
+
+   /* Free DPI only if this is the first PD allocated by the
+* application and mark the context dpi as NULL
+*/
+   ucntx = container_of(ib_uctx, struct bnxt_re_ucontext, ib_uctx);
+
+   rc = bnxt_qplib_dealloc_dpi(&rdev->qplib_res,
+   &rdev->qplib_res.dpi_tbl,
+   &pd->dpi);
+   if (rc)
+   dev_err(rdev_to_dev(rdev), "Failed to deallocate HW DPI");
+   /* Don't fail, continue */
+   ucntx->dpi = NULL;
+   }
+
+   rc = bnxt_qplib_dealloc_pd(&rdev->qplib_res,
+  &rdev->qplib_res.pd_tbl,
+  &pd->qplib_pd);
+   if (rc) {
+   dev_err(rdev_to_dev(rdev), "Failed to deallocate HW PD");
+   return rc;
+   }
+
+   kfree(pd);
+   return 0;
+}
+
+struct ib_pd *bnxt_re_alloc_pd(struct ib_device *ibdev,
+  struct ib_ucontext *ucontext,
+  struct ib_udata *udata)
+{
+   struct bnxt_re_dev *rdev = to_bnxt_re_dev(ibdev, ibdev);
+   struct bnxt_re_ucontext *ucntx = container_of(ucontext,
+ struct bnxt_re_ucontext,
+ ib_uctx);
+   struct bnxt_re_pd *pd;
+   int rc;
+
+   pd = kzalloc(sizeof(*pd), GFP_KERNEL);
+   if (!pd)
+   return ERR_PTR(-ENOMEM);
+
+   pd->rdev = rdev;
+   if (bnxt_qplib_alloc_pd(&rdev->qplib_res.pd_tbl, &pd->qplib_pd)) {
+   dev_err(rdev_to_dev(rdev), "Failed to allocate HW PD");
+   rc = -ENOMEM;
+   goto fail;
+   }
+
+   if (udata) {
+   struct bnxt_re_pd_resp resp;
+
+   if (!ucntx->dpi) {
+   /* Allocate DPI in alloc_pd to avoid failing of
+* ibv_devinfo and family of application when DPIs
+* are depleted.
+*/
+   if (bnxt_qplib_alloc_dpi(&rdev->qplib_res.dpi_tbl,
+&pd->dpi, ucntx)) {
+   rc = -ENOMEM;
+   goto dbfail;
+   }
+   ucntx->dpi = &pd->dpi;
+   }
+
+   resp.pdid = pd->qplib_pd.id;
+   /* Still allow mapping this DBR to the new user PD. */
+   resp.dpi = ucntx->dpi->dpi;
+   resp.dbr = (u64)ucntx->dpi->umdbr;
+
+   rc = ib_copy_to_udata(udata, &resp, siz

[PATCH V5 for-next 19/21] RDMA/bnxt_re: Set uverbs command mask

2017-02-10 Thread Selvin Xavier
This patch exports the available uverbs command mask to the IB stack.
It also populates some of the missing parameters in the ibdev structure
used for registration.

v4: Changes the include file names

Signed-off-by: Eddie Wai 
Signed-off-by: Devesh Sharma 
Signed-off-by: Somnath Kotur 
Signed-off-by: Sriharsha Basavapatna 
Signed-off-by: Selvin Xavier 
---
 drivers/infiniband/hw/bnxt_re/main.c | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index 92a217b..6b9f117 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -62,6 +62,7 @@
 #include "qplib_rcfw.h"
 #include "bnxt_re.h"
 #include "ib_verbs.h"
+#include 
 #include "bnxt.h"
 static char version[] =
BNXT_RE_DESC " v" ROCE_DRV_MODULE_VERSION "\n";
@@ -432,8 +433,42 @@ static int bnxt_re_register_ib(struct bnxt_re_dev *rdev)
strlen(BNXT_RE_DESC) + 5);
ibdev->phys_port_cnt = 1;
 
+   bnxt_qplib_get_guid(rdev->netdev->dev_addr, (u8 *)&ibdev->node_guid);
+
ibdev->num_comp_vectors = 1;
ibdev->dma_device = &rdev->en_dev->pdev->dev;
+   ibdev->local_dma_lkey = BNXT_QPLIB_RSVD_LKEY;
+
+   /* User space */
+   ibdev->uverbs_abi_ver = BNXT_RE_ABI_VERSION;
+   ibdev->uverbs_cmd_mask =
+   (1ull << IB_USER_VERBS_CMD_GET_CONTEXT) |
+   (1ull << IB_USER_VERBS_CMD_QUERY_DEVICE)|
+   (1ull << IB_USER_VERBS_CMD_QUERY_PORT)  |
+   (1ull << IB_USER_VERBS_CMD_ALLOC_PD)|
+   (1ull << IB_USER_VERBS_CMD_DEALLOC_PD)  |
+   (1ull << IB_USER_VERBS_CMD_REG_MR)  |
+   (1ull << IB_USER_VERBS_CMD_REREG_MR)|
+   (1ull << IB_USER_VERBS_CMD_DEREG_MR)|
+   (1ull << IB_USER_VERBS_CMD_CREATE_COMP_CHANNEL) |
+   (1ull << IB_USER_VERBS_CMD_CREATE_CQ)   |
+   (1ull << IB_USER_VERBS_CMD_RESIZE_CQ)   |
+   (1ull << IB_USER_VERBS_CMD_DESTROY_CQ)  |
+   (1ull << IB_USER_VERBS_CMD_CREATE_QP)   |
+   (1ull << IB_USER_VERBS_CMD_MODIFY_QP)   |
+   (1ull << IB_USER_VERBS_CMD_QUERY_QP)|
+   (1ull << IB_USER_VERBS_CMD_DESTROY_QP)  |
+   (1ull << IB_USER_VERBS_CMD_CREATE_SRQ)  |
+   (1ull << IB_USER_VERBS_CMD_MODIFY_SRQ)  |
+   (1ull << IB_USER_VERBS_CMD_QUERY_SRQ)   |
+   (1ull << IB_USER_VERBS_CMD_DESTROY_SRQ) |
+   (1ull << IB_USER_VERBS_CMD_CREATE_AH)   |
+   (1ull << IB_USER_VERBS_CMD_MODIFY_AH)   |
+   (1ull << IB_USER_VERBS_CMD_QUERY_AH)|
+   (1ull << IB_USER_VERBS_CMD_DESTROY_AH);
+   /* POLL_CQ and REQ_NOTIFY_CQ are directly handled in libbnxt_re */
+
+   /* Kernel verbs */
ibdev->query_device = bnxt_re_query_device;
ibdev->modify_device= bnxt_re_modify_device;
 
-- 
2.5.5
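
The uverbs_cmd_mask assignment above is a 64-bit bitmap with one bit per supported command. A minimal sketch of building and querying such a mask follows. The EX_CMD_* numbers are hypothetical (the real IB_USER_VERBS_CMD_* values come from the kernel uapi headers), and both helper functions are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command numbers, chosen only to exercise low and high bits */
enum ex_cmd {
	EX_CMD_GET_CONTEXT = 0,
	EX_CMD_QUERY_DEVICE = 1,
	EX_CMD_CREATE_CQ = 18,
	EX_CMD_POLL_CQ = 20,
	EX_CMD_HIGH = 40,	/* above bit 31, needs a 64-bit shift */
};

/* OR together one bit per command. The 1ull literal matters: shifting a
 * plain int by 32 or more is undefined behaviour in C.
 */
static uint64_t ex_build_mask(const enum ex_cmd *cmds, size_t n)
{
	uint64_t mask = 0;
	size_t i;

	for (i = 0; i < n; i++)
		mask |= 1ull << cmds[i];
	return mask;
}

/* Check whether a command's bit is set in the mask */
static bool ex_cmd_supported(uint64_t mask, enum ex_cmd cmd)
{
	return (mask >> cmd) & 1;
}
```

A command whose bit is left out of the mask, such as EX_CMD_POLL_CQ here, reports as unsupported. This mirrors how the patch leaves POLL_CQ and REQ_NOTIFY_CQ out because the user library handles them directly.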



[PATCH V5 for-next 12/21] RDMA/bnxt_re: Support memory registration verbs

2017-02-10 Thread Selvin Xavier
This patch implements the kernel and user memory region registration
supported by the bnxt_re driver.
This includes the user MR, FRMR, FMR and DMA MR support.

v3: Moves bnxt_qplib_map_tc2cos to DCB patch as the functionality is used
by DCB patch. Also, removed some unwanted macros.

Signed-off-by: Eddie Wai 
Signed-off-by: Devesh Sharma 
Signed-off-by: Somnath Kotur 
Signed-off-by: Sriharsha Basavapatna 
Signed-off-by: Selvin Xavier 
---
 drivers/infiniband/hw/bnxt_re/ib_verbs.c | 376 +++
 drivers/infiniband/hw/bnxt_re/ib_verbs.h |  44 
 drivers/infiniband/hw/bnxt_re/main.c |  11 +
 drivers/infiniband/hw/bnxt_re/qplib_sp.c | 289 
 drivers/infiniband/hw/bnxt_re/qplib_sp.h |  41 
 5 files changed, 761 insertions(+)

diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index 18580e7..f43469e 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -640,6 +640,48 @@ int bnxt_re_query_ah(struct ib_ah *ib_ah, struct ib_ah_attr *ah_attr)
return 0;
 }
 
+static int __from_ib_access_flags(int iflags)
+{
+   int qflags = 0;
+
+   if (iflags & IB_ACCESS_LOCAL_WRITE)
+   qflags |= BNXT_QPLIB_ACCESS_LOCAL_WRITE;
+   if (iflags & IB_ACCESS_REMOTE_READ)
+   qflags |= BNXT_QPLIB_ACCESS_REMOTE_READ;
+   if (iflags & IB_ACCESS_REMOTE_WRITE)
+   qflags |= BNXT_QPLIB_ACCESS_REMOTE_WRITE;
+   if (iflags & IB_ACCESS_REMOTE_ATOMIC)
+   qflags |= BNXT_QPLIB_ACCESS_REMOTE_ATOMIC;
+   if (iflags & IB_ACCESS_MW_BIND)
+   qflags |= BNXT_QPLIB_ACCESS_MW_BIND;
+   if (iflags & IB_ZERO_BASED)
+   qflags |= BNXT_QPLIB_ACCESS_ZERO_BASED;
+   if (iflags & IB_ACCESS_ON_DEMAND)
+   qflags |= BNXT_QPLIB_ACCESS_ON_DEMAND;
+   return qflags;
+}
+
+static enum ib_access_flags __to_ib_access_flags(int qflags)
+{
+   enum ib_access_flags iflags = 0;
+
+   if (qflags & BNXT_QPLIB_ACCESS_LOCAL_WRITE)
+   iflags |= IB_ACCESS_LOCAL_WRITE;
+   if (qflags & BNXT_QPLIB_ACCESS_REMOTE_WRITE)
+   iflags |= IB_ACCESS_REMOTE_WRITE;
+   if (qflags & BNXT_QPLIB_ACCESS_REMOTE_READ)
+   iflags |= IB_ACCESS_REMOTE_READ;
+   if (qflags & BNXT_QPLIB_ACCESS_REMOTE_ATOMIC)
+   iflags |= IB_ACCESS_REMOTE_ATOMIC;
+   if (qflags & BNXT_QPLIB_ACCESS_MW_BIND)
+   iflags |= IB_ACCESS_MW_BIND;
+   if (qflags & BNXT_QPLIB_ACCESS_ZERO_BASED)
+   iflags |= IB_ZERO_BASED;
+   if (qflags & BNXT_QPLIB_ACCESS_ON_DEMAND)
+   iflags |= IB_ACCESS_ON_DEMAND;
+   return iflags;
+}
+
 /* Completion Queues */
 int bnxt_re_destroy_cq(struct ib_cq *ib_cq)
 {
@@ -784,6 +826,340 @@ int bnxt_re_req_notify_cq(struct ib_cq *ib_cq,
return 0;
 }
 
+/* Memory Regions */
+struct ib_mr *bnxt_re_get_dma_mr(struct ib_pd *ib_pd, int mr_access_flags)
+{
+   struct bnxt_re_pd *pd = container_of(ib_pd, struct bnxt_re_pd, ib_pd);
+   struct bnxt_re_dev *rdev = pd->rdev;
+   struct bnxt_re_mr *mr;
+   u64 pbl = 0;
+   int rc;
+
+   mr = kzalloc(sizeof(*mr), GFP_KERNEL);
+   if (!mr)
+   return ERR_PTR(-ENOMEM);
+
+   mr->rdev = rdev;
+   mr->qplib_mr.pd = &pd->qplib_pd;
+   mr->qplib_mr.flags = __from_ib_access_flags(mr_access_flags);
+   mr->qplib_mr.type = CMDQ_ALLOCATE_MRW_MRW_FLAGS_PMR;
+
+   /* Allocate and register 0 as the address */
+   rc = bnxt_qplib_alloc_mrw(&rdev->qplib_res, &mr->qplib_mr);
+   if (rc)
+   goto fail;
+
+   mr->qplib_mr.hwq.level = PBL_LVL_MAX;
+   mr->qplib_mr.total_size = -1; /* Infinite length */
+   rc = bnxt_qplib_reg_mr(&rdev->qplib_res, &mr->qplib_mr, &pbl, 0, false);
+   if (rc)
+   goto fail_mr;
+
+   mr->ib_mr.lkey = mr->qplib_mr.lkey;
+   if (mr_access_flags & (IB_ACCESS_REMOTE_WRITE | IB_ACCESS_REMOTE_READ |
+  IB_ACCESS_REMOTE_ATOMIC))
+   mr->ib_mr.rkey = mr->ib_mr.lkey;
+   atomic_inc(&rdev->mr_count);
+
+   return &mr->ib_mr;
+
+fail_mr:
+   bnxt_qplib_free_mrw(&rdev->qplib_res, &mr->qplib_mr);
+fail:
+   kfree(mr);
+   return ERR_PTR(rc);
+}
+
+int bnxt_re_dereg_mr(struct ib_mr *ib_mr)
+{
+   struct bnxt_re_mr *mr = container_of(ib_mr, struct bnxt_re_mr, ib_mr);
+   struct bnxt_re_dev *rdev = mr->rdev;
+   int rc = 0;
+
+   if (mr->npages && mr->pages) {
+   rc = bnxt_qplib_free_fast_reg_page_list(&rdev->qplib_res,
+   &mr->qplib_frpl);
+   kfree(mr->pages);
+   mr->npages = 0;
+   mr->pages = NULL;
+   }
+   rc = bnxt_qplib_free_mrw(&rdev->qplib_res, &mr->qplib_mr);
+
+   if (!IS_ERR(mr->ib_umem) && mr->ib_umem)
+   ib_umem_release(mr->ib_um
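
The two flag translators in this patch, __from_ib_access_flags() and __to_ib_access_flags(), map each access bit between the IB core encoding and the qplib encoding with a chain of if statements. One possible alternative, shown here purely as a sketch and not as what the patch does, is a single lookup table shared by both directions. All EX_* flag values below are hypothetical and deliberately use different bit positions on each side.

```c
#include <stddef.h>

/* Hypothetical IB-side flag bits */
#define EX_IB_LOCAL_WRITE	(1 << 0)
#define EX_IB_REMOTE_WRITE	(1 << 1)
#define EX_IB_REMOTE_READ	(1 << 2)
#define EX_IB_REMOTE_ATOMIC	(1 << 3)

/* Hypothetical hardware-side flag bits, ordered differently on purpose */
#define EX_HW_LOCAL_WRITE	(1 << 0)
#define EX_HW_REMOTE_READ	(1 << 1)
#define EX_HW_REMOTE_WRITE	(1 << 2)
#define EX_HW_REMOTE_ATOMIC	(1 << 3)

struct ex_flag_map {
	int ib;
	int hw;
};

/* One table drives both directions, so the mappings cannot drift apart */
static const struct ex_flag_map ex_map[] = {
	{ EX_IB_LOCAL_WRITE,	EX_HW_LOCAL_WRITE },
	{ EX_IB_REMOTE_READ,	EX_HW_REMOTE_READ },
	{ EX_IB_REMOTE_WRITE,	EX_HW_REMOTE_WRITE },
	{ EX_IB_REMOTE_ATOMIC,	EX_HW_REMOTE_ATOMIC },
};

#define EX_MAP_LEN (sizeof(ex_map) / sizeof(ex_map[0]))

/* IB flags to hardware flags; bits with no table entry are dropped */
static int ex_from_ib(int iflags)
{
	int qflags = 0;
	size_t i;

	for (i = 0; i < EX_MAP_LEN; i++)
		if (iflags & ex_map[i].ib)
			qflags |= ex_map[i].hw;
	return qflags;
}

/* Hardware flags back to IB flags */
static int ex_to_ib(int qflags)
{
	int iflags = 0;
	size_t i;

	for (i = 0; i < EX_MAP_LEN; i++)
		if (qflags & ex_map[i].hw)
			iflags |= ex_map[i].ib;
	return iflags;
}
```

For any combination of bits present in the table, ex_to_ib(ex_from_ib(x)) returns x, which is the round-trip property the two functions in the patch are also expected to have.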

[PATCH V5 for-next 13/21] RDMA/bnxt_re: Support QP verbs

2017-02-10 Thread Selvin Xavier
This patch implements create_qp, destroy_qp, query_qp and modify_qp verbs.

v2: Fixed sparse warnings
v3: Splits __filter_modify_flags function to avoid nested switch cases.
Removes unwanted macros. Also, fix the endianness related issues
reported by sparse.
v5: Use kernel macros to find min value

Signed-off-by: Eddie Wai 
Signed-off-by: Devesh Sharma 
Signed-off-by: Somnath Kotur 
Signed-off-by: Sriharsha Basavapatna 
Signed-off-by: Selvin Xavier 
---
 drivers/infiniband/hw/bnxt_re/bnxt_re.h  |  14 +
 drivers/infiniband/hw/bnxt_re/ib_verbs.c | 760 +++
 drivers/infiniband/hw/bnxt_re/ib_verbs.h |  21 +
 drivers/infiniband/hw/bnxt_re/main.c |   6 +
 drivers/infiniband/hw/bnxt_re/qplib_fp.c | 862 +++
 drivers/infiniband/hw/bnxt_re/qplib_fp.h | 272 ++
 include/uapi/rdma/bnxt_re-abi.h  |  11 +
 7 files changed, 1946 insertions(+)

diff --git a/drivers/infiniband/hw/bnxt_re/bnxt_re.h b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
index 51ad6c2..9c7a53c 100644
--- a/drivers/infiniband/hw/bnxt_re/bnxt_re.h
+++ b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
@@ -63,6 +63,14 @@ struct bnxt_re_work {
struct net_device   *vlan_dev;
 };
 
+struct bnxt_re_sqp_entries {
+   struct bnxt_qplib_sge sge;
+   u64 wrid;
+   /* For storing the actual qp1 cqe */
+   struct bnxt_qplib_cqe cqe;
+   struct bnxt_re_qp *qp1_qp;
+};
+
 #define BNXT_RE_MIN_MSIX   2
 #define BNXT_RE_MAX_MSIX   16
 #define BNXT_RE_AEQ_IDX0
@@ -110,6 +118,12 @@ struct bnxt_re_dev {
atomic_tmw_count;
/* Max of 2 lossless traffic class supported per port */
u16 cosq[2];
+
+   /* QP for handling QP1 packets */
+   u32 sqp_id;
+   struct bnxt_re_qp   *qp1_sqp;
+   struct bnxt_re_ah   *sqp_ah;
+   struct bnxt_re_sqp_entries sqp_tbl[1024];
 };
 
 #define to_bnxt_re_dev(ptr, member)\
diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index f43469e..d4d50e5 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -640,6 +640,482 @@ int bnxt_re_query_ah(struct ib_ah *ib_ah, struct ib_ah_attr *ah_attr)
return 0;
 }
 
+/* Queue Pairs */
+int bnxt_re_destroy_qp(struct ib_qp *ib_qp)
+{
+   struct bnxt_re_qp *qp = container_of(ib_qp, struct bnxt_re_qp, ib_qp);
+   struct bnxt_re_dev *rdev = qp->rdev;
+   int rc;
+
+   rc = bnxt_qplib_destroy_qp(&rdev->qplib_res, &qp->qplib_qp);
+   if (rc) {
+   dev_err(rdev_to_dev(rdev), "Failed to destroy HW QP");
+   return rc;
+   }
+   if (ib_qp->qp_type == IB_QPT_GSI && rdev->qp1_sqp) {
+   rc = bnxt_qplib_destroy_ah(&rdev->qplib_res,
+  &rdev->sqp_ah->qplib_ah);
+   if (rc) {
+   dev_err(rdev_to_dev(rdev),
+   "Failed to destroy HW AH for shadow QP");
+   return rc;
+   }
+
+   rc = bnxt_qplib_destroy_qp(&rdev->qplib_res,
+  &rdev->qp1_sqp->qplib_qp);
+   if (rc) {
+   dev_err(rdev_to_dev(rdev),
+   "Failed to destroy Shadow QP");
+   return rc;
+   }
+   mutex_lock(&rdev->qp_lock);
+   list_del(&rdev->qp1_sqp->list);
+   atomic_dec(&rdev->qp_count);
+   mutex_unlock(&rdev->qp_lock);
+
+   kfree(rdev->sqp_ah);
+   kfree(rdev->qp1_sqp);
+   }
+
+   if (qp->rumem && !IS_ERR(qp->rumem))
+   ib_umem_release(qp->rumem);
+   if (qp->sumem && !IS_ERR(qp->sumem))
+   ib_umem_release(qp->sumem);
+
+   mutex_lock(&rdev->qp_lock);
+   list_del(&qp->list);
+   atomic_dec(&rdev->qp_count);
+   mutex_unlock(&rdev->qp_lock);
+   kfree(qp);
+   return 0;
+}
+
+static u8 __from_ib_qp_type(enum ib_qp_type type)
+{
+   switch (type) {
+   case IB_QPT_GSI:
+   return CMDQ_CREATE_QP1_TYPE_GSI;
+   case IB_QPT_RC:
+   return CMDQ_CREATE_QP_TYPE_RC;
+   case IB_QPT_UD:
+   return CMDQ_CREATE_QP_TYPE_UD;
+   case IB_QPT_RAW_ETHERTYPE:
+   return CMDQ_CREATE_QP_TYPE_RAW_ETHERTYPE;
+   default:
+   return IB_QPT_MAX;
+   }
+}
+
+static int bnxt_re_init_user_qp(struct bnxt_re_dev *rdev, struct bnxt_re_pd *pd,
+   struct bnxt_re_qp *qp, struct ib_udata *udata)
+{
+   struct bnxt_re_qp_req ureq;
+   struct bnxt_qplib_qp *qplib_qp = &qp->qplib_qp;
+   struct ib_umem *umem;
+   int bytes = 0;
+   struct ib_ucontext *context = pd->ib_pd.uobject->context;
+ 

[PATCH V5 for-next 05/21] RDMA/bnxt_re: Adding Notification Queue support

2017-02-10 Thread Selvin Xavier
Completion notifications are handled by the Notification Queue (NQ). This
patch configures the NQs. It also configures the doorbell page mapping.

v3: Fixes some sparse warnings related to endianness checks
v4: Change include file names

Signed-off-by: Eddie Wai 
Signed-off-by: Devesh Sharma 
Signed-off-by: Somnath Kotur 
Signed-off-by: Sriharsha Basavapatna 
Signed-off-by: Selvin Xavier 
---
 drivers/infiniband/hw/bnxt_re/bnxt_re.h   |   8 ++
 drivers/infiniband/hw/bnxt_re/main.c  |  52 +-
 drivers/infiniband/hw/bnxt_re/qplib_fp.c  | 161 ++
 drivers/infiniband/hw/bnxt_re/qplib_fp.h  |  60 +++
 drivers/infiniband/hw/bnxt_re/qplib_res.h |   6 ++
 5 files changed, 286 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/bnxt_re/bnxt_re.h b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
index cbc2fb2..cac4096 100644
--- a/drivers/infiniband/hw/bnxt_re/bnxt_re.h
+++ b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
@@ -59,6 +59,8 @@ struct bnxt_re_work {
 #define BNXT_RE_MIN_MSIX   2
 #define BNXT_RE_MAX_MSIX   16
 #define BNXT_RE_AEQ_IDX0
+#define BNXT_RE_NQ_IDX 1
+
 struct bnxt_re_dev {
struct ib_deviceibdev;
struct list_headlist;
@@ -76,9 +78,15 @@ struct bnxt_re_dev {
 
int id;
 
+   /* FP Notification Queue (CQ & SRQ) */
+   struct tasklet_struct   nq_task;
+
/* RCFW Channel */
struct bnxt_qplib_rcfw  rcfw;
 
+   /* NQ */
+   struct bnxt_qplib_nqnq;
+
/* Device Resources */
struct bnxt_qplib_dev_attr  dev_attr;
struct bnxt_qplib_ctx   qplib_ctx;
diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index 6fdf726..9091caf 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -551,6 +551,9 @@ static int bnxt_re_aeq_handler(struct bnxt_qplib_rcfw *rcfw,
 
 static void bnxt_re_cleanup_res(struct bnxt_re_dev *rdev)
 {
+   if (rdev->nq.hwq.max_elements)
+   bnxt_qplib_disable_nq(&rdev->nq);
+
if (rdev->qplib_res.rcfw)
bnxt_qplib_cleanup_res(&rdev->qplib_res);
 }
@@ -561,11 +564,32 @@ static int bnxt_re_init_res(struct bnxt_re_dev *rdev)
 
bnxt_qplib_init_res(&rdev->qplib_res);
 
+   if (rdev->msix_entries[BNXT_RE_NQ_IDX].vector <= 0)
+   return -EINVAL;
+
+   rc = bnxt_qplib_enable_nq(rdev->en_dev->pdev, &rdev->nq,
+ rdev->msix_entries[BNXT_RE_NQ_IDX].vector,
+ rdev->msix_entries[BNXT_RE_NQ_IDX].db_offset,
+ NULL,
+ NULL);
+
+   if (rc)
+   dev_err(rdev_to_dev(rdev), "Failed to enable NQ: %#x", rc);
+
return rc;
 }
 
 static void bnxt_re_free_res(struct bnxt_re_dev *rdev, bool lock_wait)
 {
+   if (rdev->nq.hwq.max_elements) {
+   bnxt_re_net_ring_free(rdev, rdev->nq.ring_id, lock_wait);
+   bnxt_qplib_free_nq(&rdev->nq);
+   }
+   if (rdev->qplib_res.dpi_tbl.max) {
+   bnxt_qplib_dealloc_dpi(&rdev->qplib_res,
+  &rdev->qplib_res.dpi_tbl,
+  &rdev->dpi_privileged);
+   }
if (rdev->qplib_res.rcfw) {
bnxt_qplib_free_res(&rdev->qplib_res);
rdev->qplib_res.rcfw = NULL;
@@ -587,8 +611,34 @@ static int bnxt_re_alloc_res(struct bnxt_re_dev *rdev)
if (rc)
goto fail;
 
-   return 0;
+   rc = bnxt_qplib_alloc_dpi(&rdev->qplib_res.dpi_tbl,
+ &rdev->dpi_privileged,
+ rdev);
+   if (rc)
+   goto fail;
 
+   rdev->nq.hwq.max_elements = BNXT_RE_MAX_CQ_COUNT +
+   BNXT_RE_MAX_SRQC_COUNT + 2;
+   rc = bnxt_qplib_alloc_nq(rdev->en_dev->pdev, &rdev->nq);
+   if (rc) {
+   dev_err(rdev_to_dev(rdev),
+   "Failed to allocate NQ memory: %#x", rc);
+   goto fail;
+   }
+   rc = bnxt_re_net_ring_alloc
+   (rdev, rdev->nq.hwq.pbl[PBL_LVL_0].pg_map_arr,
+rdev->nq.hwq.pbl[rdev->nq.hwq.level].pg_count,
+HWRM_RING_ALLOC_CMPL, BNXT_QPLIB_NQE_MAX_CNT - 1,
+rdev->msix_entries[BNXT_RE_NQ_IDX].ring_idx,
+&rdev->nq.ring_id);
+   if (rc) {
+   dev_err(rdev_to_dev(rdev),
+   "Failed to allocate NQ ring: %#x", rc);
+   goto free_nq;
+   }
+   return 0;
+free_nq:
+   bnxt_qplib_free_nq(&rdev->nq);
 fail:
rdev->qplib_res.rcfw = NULL;
return rc;
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_fp.c 
b/drivers/infiniband/hw/bnxt_r

[PATCH V5 for-next 01/21] RDMA/bnxt_re: Add bnxt_re RoCE driver files

2017-02-10 Thread Selvin Xavier
This patch adds the required skeletal files for the Broadcom NetXtreme-E RoCE
driver. It also adds the load/unload routines for the bnxt_re driver.

v2: Modified the license text to include Dual License
v4: Modifies the directory and file names as per the review comment
v5: Changes the year in copyright text

Signed-off-by: Eddie Wai 
Signed-off-by: Devesh Sharma 
Signed-off-by: Somnath Kotur 
Signed-off-by: Sriharsha Basavapatna 
Signed-off-by: Selvin Xavier 
---
 drivers/infiniband/hw/bnxt_re/bnxt_re.h|  46 
 drivers/infiniband/hw/bnxt_re/ib_verbs.c   |  37 +
 drivers/infiniband/hw/bnxt_re/ib_verbs.h   |  42 +++
 drivers/infiniband/hw/bnxt_re/main.c   | 116 +
 drivers/infiniband/hw/bnxt_re/qplib_fp.c   |  37 +
 drivers/infiniband/hw/bnxt_re/qplib_fp.h   |  42 +++
 drivers/infiniband/hw/bnxt_re/qplib_rcfw.c |  37 +
 drivers/infiniband/hw/bnxt_re/qplib_rcfw.h |  42 +++
 drivers/infiniband/hw/bnxt_re/qplib_res.c  |  37 +
 drivers/infiniband/hw/bnxt_re/qplib_res.h  |  42 +++
 drivers/infiniband/hw/bnxt_re/qplib_sp.c   |  37 +
 drivers/infiniband/hw/bnxt_re/qplib_sp.h   |  43 +++
 drivers/infiniband/hw/bnxt_re/roce_hsi.h   |  42 +++
 13 files changed, 600 insertions(+)
 create mode 100644 drivers/infiniband/hw/bnxt_re/bnxt_re.h
 create mode 100644 drivers/infiniband/hw/bnxt_re/ib_verbs.c
 create mode 100644 drivers/infiniband/hw/bnxt_re/ib_verbs.h
 create mode 100644 drivers/infiniband/hw/bnxt_re/main.c
 create mode 100644 drivers/infiniband/hw/bnxt_re/qplib_fp.c
 create mode 100644 drivers/infiniband/hw/bnxt_re/qplib_fp.h
 create mode 100644 drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
 create mode 100644 drivers/infiniband/hw/bnxt_re/qplib_rcfw.h
 create mode 100644 drivers/infiniband/hw/bnxt_re/qplib_res.c
 create mode 100644 drivers/infiniband/hw/bnxt_re/qplib_res.h
 create mode 100644 drivers/infiniband/hw/bnxt_re/qplib_sp.c
 create mode 100644 drivers/infiniband/hw/bnxt_re/qplib_sp.h
 create mode 100644 drivers/infiniband/hw/bnxt_re/roce_hsi.h

diff --git a/drivers/infiniband/hw/bnxt_re/bnxt_re.h b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
new file mode 100644
index 000..6ba013d
--- /dev/null
+++ b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
@@ -0,0 +1,46 @@
+/*
+ * Broadcom NetXtreme-E RoCE driver.
+ *
+ * Copyright (c) 2016 - 2017, Broadcom. All rights reserved.  The term
+ * Broadcom refers to Broadcom Limited and/or its subsidiaries.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in
+ *the documentation and/or other materials provided with the
+ *distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS''
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+ * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS
+ * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+ * OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
+ * IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Description: Slow Path Operators (header)
+ *
+ */
+
+#ifndef __BNXT_RE_H__
+#define __BNXT_RE_H__
+#define ROCE_DRV_MODULE_NAME   "bnxt_re"
+#define ROCE_DRV_MODULE_VERSION"1.0.0"
+
+#define BNXT_RE_DESC   "Broadcom NetXtreme-C/E RoCE Driver"
+#endif
diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c 
b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
new file mode 100644
index 000..279c353
--- /dev/null
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -0,0 +1,37 @@
+/*
+ * Broadcom NetXtreme-E RoCE driver.
+ *
+ * Copyright (c) 2016 - 2017, Broadcom. All rights reserved.  The term
+ * Broadcom refers to Broadcom Limited and/or its subsidiaries.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, avai

[PATCH V5 for-next 09/21] RDMA/bnxt_re: Support for GID related verbs

2017-02-10 Thread Selvin Xavier
Implements add GID, del GID, get_netdev and pkey related verbs.

v3: Fixes some sparse warnings related to endianness checks. Removes
macros which are just wrapper for standard defines.

Signed-off-by: Eddie Wai 
Signed-off-by: Devesh Sharma 
Signed-off-by: Somnath Kotur 
Signed-off-by: Sriharsha Basavapatna 
Signed-off-by: Selvin Xavier 
---
 drivers/infiniband/hw/bnxt_re/ib_verbs.c  | 123 +
 drivers/infiniband/hw/bnxt_re/ib_verbs.h  |  18 +++
 drivers/infiniband/hw/bnxt_re/main.c  |   7 +
 drivers/infiniband/hw/bnxt_re/qplib_res.c |   5 +
 drivers/infiniband/hw/bnxt_re/qplib_res.h |   3 +
 drivers/infiniband/hw/bnxt_re/qplib_sp.c  | 218 ++
 drivers/infiniband/hw/bnxt_re/qplib_sp.h  |  11 ++
 7 files changed, 385 insertions(+)

diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c 
b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index 41d9534..1c9e1f4 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -60,6 +60,22 @@
 #include "ib_verbs.h"
 #include 
 
+/* Device */
+struct net_device *bnxt_re_get_netdev(struct ib_device *ibdev, u8 port_num)
+{
+   struct bnxt_re_dev *rdev = to_bnxt_re_dev(ibdev, ibdev);
+   struct net_device *netdev = NULL;
+
+   rcu_read_lock();
+   if (rdev)
+   netdev = rdev->netdev;
+   if (netdev)
+   dev_hold(netdev);
+
+   rcu_read_unlock();
+   return netdev;
+}
+
 int bnxt_re_query_device(struct ib_device *ibdev,
 struct ib_device_attr *ib_attr,
 struct ib_udata *udata)
@@ -272,6 +288,113 @@ int bnxt_re_get_port_immutable(struct ib_device *ibdev, 
u8 port_num,
immutable->max_mad_size = IB_MGMT_MAD_SIZE;
return 0;
 }
+
+int bnxt_re_query_pkey(struct ib_device *ibdev, u8 port_num,
+  u16 index, u16 *pkey)
+{
+   struct bnxt_re_dev *rdev = to_bnxt_re_dev(ibdev, ibdev);
+
+   /* Ignore port_num */
+
+   memset(pkey, 0, sizeof(*pkey));
+   return bnxt_qplib_get_pkey(&rdev->qplib_res,
+  &rdev->qplib_res.pkey_tbl, index, pkey);
+}
+
+int bnxt_re_query_gid(struct ib_device *ibdev, u8 port_num,
+ int index, union ib_gid *gid)
+{
+   struct bnxt_re_dev *rdev = to_bnxt_re_dev(ibdev, ibdev);
+   int rc = 0;
+
+   /* Ignore port_num */
+   memset(gid, 0, sizeof(*gid));
+   rc = bnxt_qplib_get_sgid(&rdev->qplib_res,
+&rdev->qplib_res.sgid_tbl, index,
+(struct bnxt_qplib_gid *)gid);
+   return rc;
+}
+
+int bnxt_re_del_gid(struct ib_device *ibdev, u8 port_num,
+   unsigned int index, void **context)
+{
+   int rc = 0;
+   struct bnxt_re_gid_ctx *ctx, **ctx_tbl;
+   struct bnxt_re_dev *rdev = to_bnxt_re_dev(ibdev, ibdev);
+   struct bnxt_qplib_sgid_tbl *sgid_tbl = &rdev->qplib_res.sgid_tbl;
+
+   /* Delete the entry from the hardware */
+   ctx = *context;
+   if (!ctx)
+   return -EINVAL;
+
+   if (sgid_tbl && sgid_tbl->active) {
+   if (ctx->idx >= sgid_tbl->max)
+   return -EINVAL;
+   ctx->refcnt--;
+   if (!ctx->refcnt) {
+   rc = bnxt_qplib_del_sgid
+   (sgid_tbl,
+&sgid_tbl->tbl[ctx->idx], true);
+   if (rc)
+   dev_err(rdev_to_dev(rdev),
+   "Failed to remove GID: %#x", rc);
+   ctx_tbl = sgid_tbl->ctx;
+   ctx_tbl[ctx->idx] = NULL;
+   kfree(ctx);
+   }
+   } else {
+   return -EINVAL;
+   }
+   return rc;
+}
+
+int bnxt_re_add_gid(struct ib_device *ibdev, u8 port_num,
+   unsigned int index, const union ib_gid *gid,
+   const struct ib_gid_attr *attr, void **context)
+{
+   int rc;
+   u32 tbl_idx = 0;
+   u16 vlan_id = 0x;
+   struct bnxt_re_gid_ctx *ctx, **ctx_tbl;
+   struct bnxt_re_dev *rdev = to_bnxt_re_dev(ibdev, ibdev);
+   struct bnxt_qplib_sgid_tbl *sgid_tbl = &rdev->qplib_res.sgid_tbl;
+
+   if ((attr->ndev) && is_vlan_dev(attr->ndev))
+   vlan_id = vlan_dev_vlan_id(attr->ndev);
+
+   rc = bnxt_qplib_add_sgid(sgid_tbl, (struct bnxt_qplib_gid *)gid,
+rdev->qplib_res.netdev->dev_addr,
+vlan_id, true, &tbl_idx);
+   if (rc == -EALREADY) {
+   ctx_tbl = sgid_tbl->ctx;
+   ctx_tbl[tbl_idx]->refcnt++;
+   *context = ctx_tbl[tbl_idx];
+   return 0;
+   }
+
+   if (rc < 0) {
+   dev_err(rdev_to_dev(rdev), "Failed to add GID: %#x", rc);
+   return rc;
+   }
+
+   ctx = kmalloc(sizeof

[PATCH V5 for-next 18/21] RDMA/bnxt_re: Support for DCB

2017-02-10 Thread Selvin Xavier
This patch queries the configured RoCE APP Priority on the host
using the dcbnl API and programs the RoCE FW with the corresponding
Traffic Class(es) for the priority.

v2: Fixed some sparse warnings and cleaned up function
bnxt_re_query_hwrm_pri2cos

v3: Adds bnxt_qplib_map_tc2cos as a part of this patch. Uses
ROCE_V2_UDP_DPORT instead of BNXT_RE_ROCE_V2_PORT_NO.

v5: Uses ETH_P_IBOE macro for RoCE ethertype

Signed-off-by: Eddie Wai 
Signed-off-by: Devesh Sharma 
Signed-off-by: Somnath Kotur 
Signed-off-by: Sriharsha Basavapatna 
Signed-off-by: Selvin Xavier 
---
 drivers/infiniband/hw/bnxt_re/bnxt_re.h  |   3 +
 drivers/infiniband/hw/bnxt_re/main.c | 141 +++
 drivers/infiniband/hw/bnxt_re/qplib_sp.c |  37 
 drivers/infiniband/hw/bnxt_re/qplib_sp.h |   1 +
 4 files changed, 182 insertions(+)

diff --git a/drivers/infiniband/hw/bnxt_re/bnxt_re.h 
b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
index 5032ca1..ebf7be8 100644
--- a/drivers/infiniband/hw/bnxt_re/bnxt_re.h
+++ b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
@@ -93,6 +93,9 @@ struct bnxt_re_dev {
 
int id;
 
+   struct delayed_work worker;
+   u8  cur_prio_map;
+
/* FP Notification Queue (CQ & SRQ) */
struct tasklet_struct   nq_task;
 
diff --git a/drivers/infiniband/hw/bnxt_re/main.c 
b/drivers/infiniband/hw/bnxt_re/main.c
index c84691a..92a217b 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -44,8 +44,10 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -726,6 +728,50 @@ static void bnxt_re_dispatch_event(struct ib_device 
*ibdev, struct ib_qp *qp,
ib_dispatch_event(&ib_event);
 }
 
+#define HWRM_QUEUE_PRI2COS_QCFG_INPUT_FLAGS_IVLAN  0x02
+static int bnxt_re_query_hwrm_pri2cos(struct bnxt_re_dev *rdev, u8 dir,
+ u64 *cid_map)
+{
+   struct hwrm_queue_pri2cos_qcfg_input req = {0};
+   struct bnxt *bp = netdev_priv(rdev->netdev);
+   struct hwrm_queue_pri2cos_qcfg_output resp;
+   struct bnxt_en_dev *en_dev = rdev->en_dev;
+   struct bnxt_fw_msg fw_msg;
+   u32 flags = 0;
+   u8 *qcfgmap, *tmp_map;
+   int rc = 0, i;
+
+   if (!cid_map)
+   return -EINVAL;
+
+   memset(&fw_msg, 0, sizeof(fw_msg));
+   bnxt_re_init_hwrm_hdr(rdev, (void *)&req,
+ HWRM_QUEUE_PRI2COS_QCFG, -1, -1);
+   flags |= (dir & 0x01);
+   flags |= HWRM_QUEUE_PRI2COS_QCFG_INPUT_FLAGS_IVLAN;
+   req.flags = cpu_to_le32(flags);
+   req.port_id = bp->pf.port_id;
+
+   bnxt_re_fill_fw_msg(&fw_msg, (void *)&req, sizeof(req), (void *)&resp,
+   sizeof(resp), DFLT_HWRM_CMD_TIMEOUT);
+   rc = en_dev->en_ops->bnxt_send_fw_msg(en_dev, BNXT_ROCE_ULP, &fw_msg);
+   if (rc)
+   return rc;
+
+   if (resp.queue_cfg_info) {
+   dev_warn(rdev_to_dev(rdev),
+"Asymmetric cos queue configuration detected");
+   dev_warn(rdev_to_dev(rdev),
+" on device, QoS may not be fully functional\n");
+   }
+   qcfgmap = &resp.pri0_cos_queue_id;
+   tmp_map = (u8 *)cid_map;
+   for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++)
+   tmp_map[i] = qcfgmap[i];
+
+   return rc;
+}
+
 static bool bnxt_re_is_qp1_or_shadow_qp(struct bnxt_re_dev *rdev,
struct bnxt_re_qp *qp)
 {
@@ -757,6 +803,80 @@ static void bnxt_re_dev_stop(struct bnxt_re_dev *rdev)
mutex_unlock(&rdev->qp_lock);
 }
 
+static u32 bnxt_re_get_priority_mask(struct bnxt_re_dev *rdev)
+{
+   u32 prio_map = 0, tmp_map = 0;
+   struct net_device *netdev;
+   struct dcb_app app;
+
+   netdev = rdev->netdev;
+
+   memset(&app, 0, sizeof(app));
+   app.selector = IEEE_8021QAZ_APP_SEL_ETHERTYPE;
+   app.protocol = ETH_P_IBOE;
+   tmp_map = dcb_ieee_getapp_mask(netdev, &app);
+   prio_map = tmp_map;
+
+   app.selector = IEEE_8021QAZ_APP_SEL_DGRAM;
+   app.protocol = ROCE_V2_UDP_DPORT;
+   tmp_map = dcb_ieee_getapp_mask(netdev, &app);
+   prio_map |= tmp_map;
+
+   if (!prio_map)
+   prio_map = -EFAULT;
+   return prio_map;
+}
+
+static void bnxt_re_parse_cid_map(u8 prio_map, u8 *cid_map, u16 *cosq)
+{
+   u16 prio;
+   u8 id;
+
+   for (prio = 0, id = 0; prio < 8; prio++) {
+   if (prio_map & (1 << prio)) {
+   cosq[id] = cid_map[prio];
+   id++;
+   if (id == 2) /* Max 2 tcs supported */
+   break;
+   }
+   }
+}
+
+static int bnxt_re_setup_qos(struct bnxt_re_dev *rdev)
+{
+   u8 prio_map = 0;
+   u64 cid_map;
+   int rc;
+
+   /* Get priority for roce */
+   rc = b
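
Note on bnxt_re_parse_cid_map() in the patch above: it walks the eight
priority bits of prio_map and records the CoS queue ID of the first two
priorities that are set. A minimal user-space sketch of that loop follows.
The name parse_cid_map_sketch, the MAX_ROCE_TCS macro and the returned entry
count are assumptions added for illustration; the function in the patch
returns void and hard-codes the limit of 2.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ROCE_TCS 2	/* hypothetical name; the patch hard-codes 2 */

/*
 * For each priority bit set in prio_map (bit 0 = priority 0), copy the
 * matching queue ID from cid_map[] into cosq[], stopping after
 * MAX_ROCE_TCS entries. Returns the number of entries written, which is
 * an addition for this sketch only.
 */
static int parse_cid_map_sketch(uint8_t prio_map, const uint8_t *cid_map,
				uint16_t *cosq)
{
	int prio, id = 0;

	assert(cid_map && cosq);

	for (prio = 0; prio < 8 && id < MAX_ROCE_TCS; prio++) {
		if (prio_map & (1u << prio))
			cosq[id++] = cid_map[prio];
	}
	return id;
}
```

With prio_map = 0x28 (priorities 3 and 5 set), cosq[0] receives cid_map[3]
and cosq[1] receives cid_map[5]; any further set bits are ignored.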

[PATCH net-next] gtp: add MAINTAINERS

2017-02-10 Thread Pablo Neira Ayuso
Add maintainers for this tunnel driver. Include the main osmocom.org
mailing list too.

Signed-off-by: Pablo Neira Ayuso 
---
 MAINTAINERS | 8 
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5864bbd99f8f..ce5dde23bd00 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5638,6 +5638,14 @@ T:   git git://linuxtv.org/media_tree.git
 S: Odd Fixes
 F: drivers/media/usb/gspca/
 
+GTP (GPRS Tunneling Protocol)
+M: Pablo Neira Ayuso 
+M: Harald Welte 
+L: open...@lists.osmocom.org
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/pablo/gtp.git
+S: Maintained
+F: drivers/net/gtp.c
+
 GUID PARTITION TABLE (GPT)
 M: Davidlohr Bueso 
 L: linux-...@vger.kernel.org
-- 
2.1.4



[PATCH V5 for-next 02/21] RDMA/bnxt_re: Introducing autogenerated Host Software Interface(hsi) file

2017-02-10 Thread Selvin Xavier
This patch introduces all the structures used by the driver for
communicating with the Hardware. This file is the equivalent of
the bnxt_hsi.h used by bnxt_en driver.

v2: Remove duplicate structure definitions from bnxt_en HSI file and
include bnxt_hsi.h from bnxt_en driver
v3: Remove the checkpatch warnings for columns more than 80 characters

Signed-off-by: Eddie Wai 
Signed-off-by: Devesh Sharma 
Signed-off-by: Somnath Kotur 
Signed-off-by: Sriharsha Basavapatna 
Signed-off-by: Selvin Xavier 
---
 drivers/infiniband/hw/bnxt_re/roce_hsi.h | 2779 ++
 1 file changed, 2779 insertions(+)

diff --git a/drivers/infiniband/hw/bnxt_re/roce_hsi.h 
b/drivers/infiniband/hw/bnxt_re/roce_hsi.h
index 4e54a1d..fc23477 100644
--- a/drivers/infiniband/hw/bnxt_re/roce_hsi.h
+++ b/drivers/infiniband/hw/bnxt_re/roce_hsi.h
@@ -39,4 +39,2783 @@
 #ifndef __BNXT_RE_HSI_H__
 #define __BNXT_RE_HSI_H__
 
+/* include bnxt_hsi.h from bnxt_en driver */
+#include "bnxt_hsi.h"
+
+/* CMP Door Bell Format (4 bytes) */
+struct cmpl_doorbell {
+   __le32 key_mask_valid_idx;
+   #define CMPL_DOORBELL_IDX_MASK  0xffUL
+   #define CMPL_DOORBELL_IDX_SFT   0
+   #define CMPL_DOORBELL_RESERVED_MASK 0x300UL
+   #define CMPL_DOORBELL_RESERVED_SFT  24
+   #define CMPL_DOORBELL_IDX_VALID 0x400UL
+   #define CMPL_DOORBELL_MASK  0x800UL
+   #define CMPL_DOORBELL_KEY_MASK  0xf000UL
+   #define CMPL_DOORBELL_KEY_SFT   28
+   #define CMPL_DOORBELL_KEY_CMPL (0x2UL << 28)
+};
+
+/* Status Door Bell Format (4 bytes) */
+struct status_doorbell {
+   __le32 key_idx;
+   #define STATUS_DOORBELL_IDX_MASK0xffUL
+   #define STATUS_DOORBELL_IDX_SFT 0
+   #define STATUS_DOORBELL_RESERVED_MASK   0xf00UL
+   #define STATUS_DOORBELL_RESERVED_SFT24
+   #define STATUS_DOORBELL_KEY_MASK0xf000UL
+   #define STATUS_DOORBELL_KEY_SFT 28
+   #define STATUS_DOORBELL_KEY_STAT   (0x3UL << 28)
+};
+
+/* RoCE Host Structures */
+
+/* Doorbell Structures */
+/* 64b Doorbell Format (8 bytes) */
+struct dbr_dbr {
+   __le32 index;
+   #define DBR_DBR_INDEX_MASK  0xfUL
+   #define DBR_DBR_INDEX_SFT   0
+   #define DBR_DBR_RESERVED12_MASK 0xfff0UL
+   #define DBR_DBR_RESERVED12_SFT  20
+   __le32 type_xid;
+   #define DBR_DBR_XID_MASK0xfUL
+   #define DBR_DBR_XID_SFT 0
+   #define DBR_DBR_RESERVED8_MASK  0xff0UL
+   #define DBR_DBR_RESERVED8_SFT   20
+   #define DBR_DBR_TYPE_MASK   0xf000UL
+   #define DBR_DBR_TYPE_SFT28
+   #define DBR_DBR_TYPE_SQ(0x0UL << 28)
+   #define DBR_DBR_TYPE_RQ(0x1UL << 28)
+   #define DBR_DBR_TYPE_SRQ   (0x2UL << 28)
+   #define DBR_DBR_TYPE_SRQ_ARM   (0x3UL << 28)
+   #define DBR_DBR_TYPE_CQ(0x4UL << 28)
+   #define DBR_DBR_TYPE_CQ_ARMSE  (0x5UL << 28)
+   #define DBR_DBR_TYPE_CQ_ARMALL (0x6UL << 28)
+   #define DBR_DBR_TYPE_CQ_ARMENA (0x7UL << 28)
+   #define DBR_DBR_TYPE_SRQ_ARMENA(0x8UL << 28)
+   #define DBR_DBR_TYPE_CQ_CUTOFF_ACK (0x9UL << 28)
+   #define DBR_DBR_TYPE_NULL  (0xfUL << 28)
+};
+
+/* 32b Doorbell Format (4 bytes) */
+struct dbr_dbr32 {
+   __le32 type_abs_incr_xid;
+   #define DBR_DBR32_XID_MASK  0xfUL
+   #define DBR_DBR32_XID_SFT   0
+   #define DBR_DBR32_RESERVED4_MASK0xf0UL
+   #define DBR_DBR32_RESERVED4_SFT 20
+   #define DBR_DBR32_INCR_MASK 0xf00UL
+   #define DBR_DBR32_INCR_SFT  24
+   #define DBR_DBR32_ABS   0x1000UL
+   #define DBR_DBR32_TYPE_MASK 0xe000UL
+   #define DBR_DBR32_TYPE_SFT  29
+   #define DBR_DBR32_TYPE_SQ  (0x0UL <

[PATCH V5 for-next 14/21] RDMA/bnxt_re: Support post_send verb

2017-02-10 Thread Selvin Xavier
Enables the ib_post_send fastpath verb for posting Send work
requests on QPs.

v2: Fixed some sparse warnings
v3: Fixes endianness related warnings reported by sparse. Changes
some of the macros to inline functions.
v5: Uses ETH_P_IBOE macro for RoCE ethertype

Signed-off-by: Eddie Wai 
Signed-off-by: Devesh Sharma 
Signed-off-by: Somnath Kotur 
Signed-off-by: Sriharsha Basavapatna 
---
 drivers/infiniband/hw/bnxt_re/bnxt_re.h  |   4 +
 drivers/infiniband/hw/bnxt_re/ib_verbs.c | 545 ++-
 drivers/infiniband/hw/bnxt_re/ib_verbs.h |   2 +
 drivers/infiniband/hw/bnxt_re/main.c |   1 +
 drivers/infiniband/hw/bnxt_re/qplib_fp.c | 268 +++
 drivers/infiniband/hw/bnxt_re/qplib_fp.h |   6 +
 6 files changed, 823 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/bnxt_re.h 
b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
index 9c7a53c..5032ca1 100644
--- a/drivers/infiniband/hw/bnxt_re/bnxt_re.h
+++ b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
@@ -129,6 +129,10 @@ struct bnxt_re_dev {
 #define to_bnxt_re_dev(ptr, member)\
container_of((ptr), struct bnxt_re_dev, member)
 
+#define BNXT_RE_ROCE_V1_PACKET 0
+#define BNXT_RE_ROCEV2_IPV4_PACKET 2
+#define BNXT_RE_ROCEV2_IPV6_PACKET 3
+
 static inline struct device *rdev_to_dev(struct bnxt_re_dev *rdev)
 {
if (rdev)
diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c 
b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index d4d50e5..9401717 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -60,6 +61,20 @@
 #include "ib_verbs.h"
 #include 
 
+static int bnxt_re_build_sgl(struct ib_sge *ib_sg_list,
+struct bnxt_qplib_sge *sg_list, int num)
+{
+   int i, total = 0;
+
+   for (i = 0; i < num; i++) {
+   sg_list[i].addr = ib_sg_list[i].addr;
+   sg_list[i].lkey = ib_sg_list[i].lkey;
+   sg_list[i].size = ib_sg_list[i].length;
+   total += sg_list[i].size;
+   }
+   return total;
+}
+
 /* Device */
 struct net_device *bnxt_re_get_netdev(struct ib_device *ibdev, u8 port_num)
 {
@@ -699,8 +714,6 @@ static u8 __from_ib_qp_type(enum ib_qp_type type)
return CMDQ_CREATE_QP_TYPE_RC;
case IB_QPT_UD:
return CMDQ_CREATE_QP_TYPE_UD;
-   case IB_QPT_RAW_ETHERTYPE:
-   return CMDQ_CREATE_QP_TYPE_RAW_ETHERTYPE;
default:
return IB_QPT_MAX;
}
@@ -874,7 +887,6 @@ struct ib_qp *bnxt_re_create_qp(struct ib_pd *ib_pd,
struct bnxt_re_dev *rdev = pd->rdev;
struct bnxt_qplib_dev_attr *dev_attr = &rdev->dev_attr;
struct bnxt_re_qp *qp;
-   struct bnxt_re_srq *srq;
struct bnxt_re_cq *cq;
int rc, entries;
 
@@ -1442,6 +1454,533 @@ int bnxt_re_query_qp(struct ib_qp *ib_qp, struct 
ib_qp_attr *qp_attr,
return 0;
 }
 
+/* Routine for sending QP1 packets for RoCE V1 and V2
+ */
+static int bnxt_re_build_qp1_send_v2(struct bnxt_re_qp *qp,
+struct ib_send_wr *wr,
+struct bnxt_qplib_swqe *wqe,
+int payload_size)
+{
+   struct ib_device *ibdev = &qp->rdev->ibdev;
+   struct bnxt_re_ah *ah = container_of(ud_wr(wr)->ah, struct bnxt_re_ah,
+ib_ah);
+   struct bnxt_qplib_ah *qplib_ah = &ah->qplib_ah;
+   struct bnxt_qplib_sge sge;
+   union ib_gid sgid;
+   u8 nw_type;
+   u16 ether_type;
+   struct ib_gid_attr sgid_attr;
+   union ib_gid dgid;
+   bool is_eth = false;
+   bool is_vlan = false;
+   bool is_grh = false;
+   bool is_udp = false;
+   u8 ip_version = 0;
+   u16 vlan_id = 0x;
+   void *buf;
+   int i, rc = 0, size;
+
+   memset(&qp->qp1_hdr, 0, sizeof(qp->qp1_hdr));
+
+   rc = ib_get_cached_gid(ibdev, 1,
+  qplib_ah->host_sgid_index, &sgid,
+  &sgid_attr);
+   if (rc) {
+   dev_err(rdev_to_dev(qp->rdev),
+   "Failed to query gid at index %d",
+   qplib_ah->host_sgid_index);
+   return rc;
+   }
+   if (sgid_attr.ndev) {
+   if (is_vlan_dev(sgid_attr.ndev))
+   vlan_id = vlan_dev_vlan_id(sgid_attr.ndev);
+   dev_put(sgid_attr.ndev);
+   }
+   /* Get network header type for this GID */
+   nw_type = ib_gid_to_network_type(sgid_attr.gid_type, &sgid);
+   switch (nw_type) {
+   case RDMA_NETWORK_IPV4:
+   nw_type = BNXT_RE_ROCEV2_IPV4_PACKET;
+   break;
+   case RDMA_NETWORK_IPV6:
+   nw_type = BNXT_RE_ROCEV2_IPV6_PACKET;
+   break;
+   default:
+   nw_type = BNXT_RE_
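
Note on bnxt_re_build_sgl() in the patch above: it copies each
scatter/gather entry into the driver's own format and returns the summed
length. A minimal user-space sketch of the same loop follows. struct sge_in
and struct sge_out are hypothetical stand-ins for struct ib_sge and struct
bnxt_qplib_sge, and build_sgl_sketch is an illustrative name, not part of
the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct ib_sge and struct bnxt_qplib_sge. */
struct sge_in {
	uint64_t addr;
	uint32_t length;
	uint32_t lkey;
};

struct sge_out {
	uint64_t addr;
	uint32_t lkey;
	uint32_t size;
};

/* Copy num entries from in[] to out[] and return the total byte count. */
static int build_sgl_sketch(const struct sge_in *in, struct sge_out *out,
			    int num)
{
	int i, total = 0;

	assert(num >= 0);

	for (i = 0; i < num; i++) {
		out[i].addr = in[i].addr;
		out[i].lkey = in[i].lkey;
		out[i].size = in[i].length;
		total += (int)out[i].size;
	}
	return total;
}
```

As in the patch, the total is accumulated in an int, so this sketch assumes
the summed lengths fit in that type.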

[PATCH V5 for-next 07/21] RDMA/bnxt_re: Support for query and modify device verbs

2017-02-10 Thread Selvin Xavier
Implements the query device and modify device verbs

v3: Fix sparse warnings related to endianness checks

Signed-off-by: Eddie Wai 
Signed-off-by: Devesh Sharma 
Signed-off-by: Somnath Kotur 
Signed-off-by: Sriharsha Basavapatna 
Signed-off-by: Selvin Xavier 
---
 drivers/infiniband/hw/bnxt_re/bnxt_re.h   |  7 +++
 drivers/infiniband/hw/bnxt_re/ib_verbs.c  | 90 +++
 drivers/infiniband/hw/bnxt_re/ib_verbs.h  |  6 +++
 drivers/infiniband/hw/bnxt_re/main.c  |  2 +
 drivers/infiniband/hw/bnxt_re/qplib_res.c | 17 ++
 drivers/infiniband/hw/bnxt_re/qplib_res.h |  1 +
 6 files changed, 123 insertions(+)

diff --git a/drivers/infiniband/hw/bnxt_re/bnxt_re.h 
b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
index cac4096..51ad6c2 100644
--- a/drivers/infiniband/hw/bnxt_re/bnxt_re.h
+++ b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
@@ -44,6 +44,13 @@
 
 #define BNXT_RE_DESC   "Broadcom NetXtreme-C/E RoCE Driver"
 
+#define BNXT_RE_PAGE_SIZE_4K   BIT(12)
+#define BNXT_RE_PAGE_SIZE_8K   BIT(13)
+#define BNXT_RE_PAGE_SIZE_64K  BIT(16)
+#define BNXT_RE_PAGE_SIZE_2M   BIT(21)
+#define BNXT_RE_PAGE_SIZE_8M   BIT(23)
+#define BNXT_RE_PAGE_SIZE_1G   BIT(30)
+
 #define BNXT_RE_MAX_QPC_COUNT  (64 * 1024)
 #define BNXT_RE_MAX_MRW_COUNT  (64 * 1024)
 #define BNXT_RE_MAX_SRQC_COUNT (64 * 1024)
diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c 
b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index 6589a41..5dae826 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -60,6 +60,96 @@
 #include "ib_verbs.h"
 #include 
 
+int bnxt_re_query_device(struct ib_device *ibdev,
+struct ib_device_attr *ib_attr,
+struct ib_udata *udata)
+{
+   struct bnxt_re_dev *rdev = to_bnxt_re_dev(ibdev, ibdev);
+   struct bnxt_qplib_dev_attr *dev_attr = &rdev->dev_attr;
+
+   memset(ib_attr, 0, sizeof(*ib_attr));
+
+   ib_attr->fw_ver = (u64)(unsigned long)(dev_attr->fw_ver);
+   bnxt_qplib_get_guid(rdev->netdev->dev_addr,
+   (u8 *)&ib_attr->sys_image_guid);
+   ib_attr->max_mr_size = ~0ull;
+   ib_attr->page_size_cap = BNXT_RE_PAGE_SIZE_4K | BNXT_RE_PAGE_SIZE_8K |
+BNXT_RE_PAGE_SIZE_64K | BNXT_RE_PAGE_SIZE_2M |
+BNXT_RE_PAGE_SIZE_8M | BNXT_RE_PAGE_SIZE_1G;
+
+   ib_attr->vendor_id = rdev->en_dev->pdev->vendor;
+   ib_attr->vendor_part_id = rdev->en_dev->pdev->device;
+   ib_attr->hw_ver = rdev->en_dev->pdev->subsystem_device;
+   ib_attr->max_qp = dev_attr->max_qp;
+   ib_attr->max_qp_wr = dev_attr->max_qp_wqes;
+   ib_attr->device_cap_flags =
+   IB_DEVICE_CURR_QP_STATE_MOD
+   | IB_DEVICE_RC_RNR_NAK_GEN
+   | IB_DEVICE_SHUTDOWN_PORT
+   | IB_DEVICE_SYS_IMAGE_GUID
+   | IB_DEVICE_LOCAL_DMA_LKEY
+   | IB_DEVICE_RESIZE_MAX_WR
+   | IB_DEVICE_PORT_ACTIVE_EVENT
+   | IB_DEVICE_N_NOTIFY_CQ
+   | IB_DEVICE_MEM_WINDOW
+   | IB_DEVICE_MEM_WINDOW_TYPE_2B
+   | IB_DEVICE_MEM_MGT_EXTENSIONS;
+   ib_attr->max_sge = dev_attr->max_qp_sges;
+   ib_attr->max_sge_rd = dev_attr->max_qp_sges;
+   ib_attr->max_cq = dev_attr->max_cq;
+   ib_attr->max_cqe = dev_attr->max_cq_wqes;
+   ib_attr->max_mr = dev_attr->max_mr;
+   ib_attr->max_pd = dev_attr->max_pd;
+   ib_attr->max_qp_rd_atom = dev_attr->max_qp_rd_atom;
+   ib_attr->max_qp_init_rd_atom = dev_attr->max_qp_rd_atom;
+   ib_attr->atomic_cap = IB_ATOMIC_HCA;
+   ib_attr->masked_atomic_cap = IB_ATOMIC_HCA;
+
+   ib_attr->max_ee_rd_atom = 0;
+   ib_attr->max_res_rd_atom = 0;
+   ib_attr->max_ee_init_rd_atom = 0;
+   ib_attr->max_ee = 0;
+   ib_attr->max_rdd = 0;
+   ib_attr->max_mw = dev_attr->max_mw;
+   ib_attr->max_raw_ipv6_qp = 0;
+   ib_attr->max_raw_ethy_qp = dev_attr->max_raw_ethy_qp;
+   ib_attr->max_mcast_grp = 0;
+   ib_attr->max_mcast_qp_attach = 0;
+   ib_attr->max_total_mcast_qp_attach = 0;
+   ib_attr->max_ah = dev_attr->max_ah;
+
+   ib_attr->max_fmr = dev_attr->max_fmr;
+   ib_attr->max_map_per_fmr = 1;   /* ? */
+
+   ib_attr->max_srq = dev_attr->max_srq;
+   ib_attr->max_srq_wr = dev_attr->max_srq_wqes;
+   ib_attr->max_srq_sge = dev_attr->max_srq_sges;
+
+   ib_attr->max_fast_reg_page_list_len = MAX_PBL_LVL_1_PGS;
+
+   ib_attr->max_pkeys = 1;
+   ib_attr->local_ca_ack_delay = 0;
+   return 0;
+}
+
+int bnxt_re_modify_device(struct ib_device *ibdev,
+ int device_modify_mask,
+  
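
Note on page_size_cap in bnxt_re_query_device() above: the field is a
bitmask in which bit n set means that a page size of 2^n bytes is
supported, so the six BNXT_RE_PAGE_SIZE_* values advertise 4K, 8K, 64K, 2M,
8M and 1G. A minimal user-space sketch of testing a size against such a
mask follows. PAGE_SIZE_BIT, page_size_cap_sketch and page_size_supported
are names made up for this illustration; PAGE_SIZE_BIT stands in for the
kernel's BIT() macro.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's BIT() macro, 64 bits wide for this sketch. */
#define PAGE_SIZE_BIT(n) (UINT64_C(1) << (n))

/* The six sizes advertised by the patch: 4K, 8K, 64K, 2M, 8M and 1G. */
static const uint64_t page_size_cap_sketch =
	PAGE_SIZE_BIT(12) | PAGE_SIZE_BIT(13) | PAGE_SIZE_BIT(16) |
	PAGE_SIZE_BIT(21) | PAGE_SIZE_BIT(23) | PAGE_SIZE_BIT(30);

/* Return 1 if size is a power of two whose bit is set in cap, else 0. */
static int page_size_supported(uint64_t cap, uint64_t size)
{
	if (size == 0 || (size & (size - 1)) != 0)
		return 0;
	return (cap & size) != 0;
}
```

For example, 4096 and 2 MiB are reported as supported by this mask, while
16 KiB and 4 MiB are not.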

[PATCH V5 for-next 11/21] RDMA/bnxt_re: Support for AH verbs

2017-02-10 Thread Selvin Xavier
This patch implements support for create_ah, destroy_ah, query_ah
and modify_ah verbs.

v3: Removes unwanted macros and does cleanup listed by checkpatch
v5: Adds ib_udata parameter to create_ah verb

Signed-off-by: Eddie Wai 
Signed-off-by: Devesh Sharma 
Signed-off-by: Somnath Kotur 
Signed-off-by: Sriharsha Basavapatna 
Signed-off-by: Selvin Xavier 
---
 drivers/infiniband/hw/bnxt_re/ib_verbs.c | 148 +++
 drivers/infiniband/hw/bnxt_re/ib_verbs.h |  12 +++
 drivers/infiniband/hw/bnxt_re/main.c |   4 +
 drivers/infiniband/hw/bnxt_re/qplib_sp.c |  94 
 drivers/infiniband/hw/bnxt_re/qplib_sp.h |  18 
 include/uapi/rdma/bnxt_re-abi.h  |   7 ++
 6 files changed, 283 insertions(+)

diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c 
b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index f85d4c4..18580e7 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -492,6 +492,154 @@ struct ib_pd *bnxt_re_alloc_pd(struct ib_device *ibdev,
return ERR_PTR(rc);
 }
 
+/* Address Handles */
+int bnxt_re_destroy_ah(struct ib_ah *ib_ah)
+{
+   struct bnxt_re_ah *ah = container_of(ib_ah, struct bnxt_re_ah, ib_ah);
+   struct bnxt_re_dev *rdev = ah->rdev;
+   int rc;
+
+   rc = bnxt_qplib_destroy_ah(&rdev->qplib_res, &ah->qplib_ah);
+   if (rc) {
+   dev_err(rdev_to_dev(rdev), "Failed to destroy HW AH");
+   return rc;
+   }
+   kfree(ah);
+   return 0;
+}
+
+struct ib_ah *bnxt_re_create_ah(struct ib_pd *ib_pd,
+   struct ib_ah_attr *ah_attr,
+   struct ib_udata *udata)
+{
+   struct bnxt_re_pd *pd = container_of(ib_pd, struct bnxt_re_pd, ib_pd);
+   struct bnxt_re_dev *rdev = pd->rdev;
+   struct bnxt_re_ah *ah;
+   int rc;
+   u16 vlan_tag;
+   u8 nw_type;
+
+   struct ib_gid_attr sgid_attr;
+
+   if (!(ah_attr->ah_flags & IB_AH_GRH)) {
+   dev_err(rdev_to_dev(rdev), "Failed to alloc AH: GRH not set");
+   return ERR_PTR(-EINVAL);
+   }
+   ah = kzalloc(sizeof(*ah), GFP_ATOMIC);
+   if (!ah)
+   return ERR_PTR(-ENOMEM);
+
+   ah->rdev = rdev;
+   ah->qplib_ah.pd = &pd->qplib_pd;
+
+   /* Supply the configuration for the HW */
+   memcpy(ah->qplib_ah.dgid.data, ah_attr->grh.dgid.raw,
+  sizeof(union ib_gid));
+   /*
+* If RoCE V2 is enabled, stack will have two entries for
+* each GID entry. Avoiding this duplicate entry in HW. Dividing
+* the GID index by 2 for RoCE V2
+*/
+   ah->qplib_ah.sgid_index = ah_attr->grh.sgid_index / 2;
+   ah->qplib_ah.host_sgid_index = ah_attr->grh.sgid_index;
+   ah->qplib_ah.traffic_class = ah_attr->grh.traffic_class;
+   ah->qplib_ah.flow_label = ah_attr->grh.flow_label;
+   ah->qplib_ah.hop_limit = ah_attr->grh.hop_limit;
+   ah->qplib_ah.sl = ah_attr->sl;
+   if (ib_pd->uobject &&
+   !rdma_is_multicast_addr((struct in6_addr *)
+   ah_attr->grh.dgid.raw) &&
+   !rdma_link_local_addr((struct in6_addr *)
+ ah_attr->grh.dgid.raw)) {
+   union ib_gid sgid;
+
+   rc = ib_get_cached_gid(&rdev->ibdev, 1,
+  ah_attr->grh.sgid_index, &sgid,
+  &sgid_attr);
+   if (rc) {
+   dev_err(rdev_to_dev(rdev),
+   "Failed to query gid at index %d",
+   ah_attr->grh.sgid_index);
+   goto fail;
+   }
+   if (sgid_attr.ndev) {
+   if (is_vlan_dev(sgid_attr.ndev))
+   vlan_tag = vlan_dev_vlan_id(sgid_attr.ndev);
+   dev_put(sgid_attr.ndev);
+   }
+   /* Get network header type for this GID */
+   nw_type = ib_gid_to_network_type(sgid_attr.gid_type, &sgid);
+   switch (nw_type) {
+   case RDMA_NETWORK_IPV4:
+   ah->qplib_ah.nw_type = CMDQ_CREATE_AH_TYPE_V2IPV4;
+   break;
+   case RDMA_NETWORK_IPV6:
+   ah->qplib_ah.nw_type = CMDQ_CREATE_AH_TYPE_V2IPV6;
+   break;
+   default:
+   ah->qplib_ah.nw_type = CMDQ_CREATE_AH_TYPE_V1;
+   break;
+   }
+   rc = rdma_addr_find_l2_eth_by_grh(&sgid, &ah_attr->grh.dgid,
+ ah_attr->dmac, &vlan_tag,
+ &sgid_attr.ndev->ifindex,
+ NULL);
+   if (rc) {
+   dev_err(rdev_to_dev(rdev), "Failed to get dmac\n");
+   

[PATCH V5 for-next 08/21] RDMA/bnxt_re: Adding support for port related verbs

2017-02-10 Thread Selvin Xavier
Implements query_port, modify_port and port_immutable verbs

Signed-off-by: Eddie Wai 
Signed-off-by: Devesh Sharma 
Signed-off-by: Somnath Kotur 
Signed-off-by: Sriharsha Basavapatna 
Signed-off-by: Selvin Xavier 
---
 drivers/infiniband/hw/bnxt_re/ib_verbs.c | 122 +++
 drivers/infiniband/hw/bnxt_re/ib_verbs.h |   7 ++
 drivers/infiniband/hw/bnxt_re/main.c |   4 +
 3 files changed, 133 insertions(+)

diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c 
b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index 5dae826..41d9534 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -150,6 +150,128 @@ int bnxt_re_modify_device(struct ib_device *ibdev,
return 0;
 }
 
+static void __to_ib_speed_width(struct net_device *netdev, u8 *speed, u8 
*width)
+{
+   struct ethtool_link_ksettings lksettings;
+   u32 espeed;
+
+   if (netdev->ethtool_ops && netdev->ethtool_ops->get_link_ksettings) {
+   memset(&lksettings, 0, sizeof(lksettings));
+   rtnl_lock();
+   netdev->ethtool_ops->get_link_ksettings(netdev, &lksettings);
+   rtnl_unlock();
+   espeed = lksettings.base.speed;
+   } else {
+   espeed = SPEED_UNKNOWN;
+   }
+   switch (espeed) {
+   case SPEED_1000:
+   *speed = IB_SPEED_SDR;
+   *width = IB_WIDTH_1X;
+   break;
+   case SPEED_1:
+   *speed = IB_SPEED_QDR;
+   *width = IB_WIDTH_1X;
+   break;
+   case SPEED_2:
+   *speed = IB_SPEED_DDR;
+   *width = IB_WIDTH_4X;
+   break;
+   case SPEED_25000:
+   *speed = IB_SPEED_EDR;
+   *width = IB_WIDTH_1X;
+   break;
+   case SPEED_4:
+   *speed = IB_SPEED_QDR;
+   *width = IB_WIDTH_4X;
+   break;
+   case SPEED_5:
+   break;
+   default:
+   *speed = IB_SPEED_SDR;
+   *width = IB_WIDTH_1X;
+   break;
+   }
+}
+
+/* Port */
+int bnxt_re_query_port(struct ib_device *ibdev, u8 port_num,
+  struct ib_port_attr *port_attr)
+{
+   struct bnxt_re_dev *rdev = to_bnxt_re_dev(ibdev, ibdev);
+   struct bnxt_qplib_dev_attr *dev_attr = &rdev->dev_attr;
+
+   memset(port_attr, 0, sizeof(*port_attr));
+
+   if (netif_running(rdev->netdev) && netif_carrier_ok(rdev->netdev)) {
+   port_attr->state = IB_PORT_ACTIVE;
+   port_attr->phys_state = 5;
+   } else {
+   port_attr->state = IB_PORT_DOWN;
+   port_attr->phys_state = 3;
+   }
+   port_attr->max_mtu = IB_MTU_4096;
+   port_attr->active_mtu = iboe_get_mtu(rdev->netdev->mtu);
+   port_attr->gid_tbl_len = dev_attr->max_sgid;
+   port_attr->port_cap_flags = IB_PORT_CM_SUP | IB_PORT_REINIT_SUP |
+   IB_PORT_DEVICE_MGMT_SUP |
+   IB_PORT_VENDOR_CLASS_SUP |
+   IB_PORT_IP_BASED_GIDS;
+
+   /* Max MSG size set to 2G for now */
+   port_attr->max_msg_sz = 0x8000;
+   port_attr->bad_pkey_cntr = 0;
+   port_attr->qkey_viol_cntr = 0;
+   port_attr->pkey_tbl_len = dev_attr->max_pkey;
+   port_attr->lid = 0;
+   port_attr->sm_lid = 0;
+   port_attr->lmc = 0;
+   port_attr->max_vl_num = 4;
+   port_attr->sm_sl = 0;
+   port_attr->subnet_timeout = 0;
+   port_attr->init_type_reply = 0;
+   /* call the underlying netdev's ethtool hooks to query speed settings
+* for which we acquire rtnl_lock _only_ if it's registered with
+* IB stack to avoid race in the NETDEV_UNREG path
+*/
+   if (test_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags))
+   __to_ib_speed_width(rdev->netdev, &port_attr->active_speed,
+   &port_attr->active_width);
+   return 0;
+}
+
+int bnxt_re_modify_port(struct ib_device *ibdev, u8 port_num,
+   int port_modify_mask,
+   struct ib_port_modify *port_modify)
+{
+   switch (port_modify_mask) {
+   case IB_PORT_SHUTDOWN:
+   break;
+   case IB_PORT_INIT_TYPE:
+   break;
+   case IB_PORT_RESET_QKEY_CNTR:
+   break;
+   default:
+   break;
+   }
+   return 0;
+}
+
+int bnxt_re_get_port_immutable(struct ib_device *ibdev, u8 port_num,
+  struct ib_port_immutable *immutable)
+{
+   struct ib_port_attr port_attr;
+
+   if (bnxt_re_query_port(ibdev, port_num, &port_attr))
+   return -EINVAL;
+
+   immutable->pkey_tbl_len = port_attr.pkey_tbl_len;
+   immutable->gid_tbl_len = port_attr.gid_tbl_len;
+   immutable->core_cap_flags = RDMA_CORE_PORT_IBA_ROCE;
+   immutable->core_cap_f

[PATCH v2 iproute2 1/2] utils: make hex2mem available to all users

2017-02-10 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

hex2mem() api is useful for parsing hexstrings which are then packed in
a stream of chars.

Signed-off-by: Jamal Hadi Salim 
---
 include/utils.h |  1 +
 ip/ipl2tp.c | 25 -
 lib/utils.c | 25 +
 3 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/include/utils.h b/include/utils.h
index dc1d6b9..22369e0 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -118,6 +118,7 @@ int get_be32(__be32 *val, const char *arg, int base);
 int get_be16(__be16 *val, const char *arg, int base);
 int get_addr64(__u64 *ap, const char *cp);
 
+int hex2mem(const char *buf, uint8_t *mem, int count);
 char *hexstring_n2a(const __u8 *str, int len, char *buf, int blen);
 __u8 *hexstring_a2n(const char *str, __u8 *buf, int blen, unsigned int *len);
 #define ADDR64_BUF_SIZE sizeof(":::")
diff --git a/ip/ipl2tp.c b/ip/ipl2tp.c
index 0f91aeb..88664c9 100644
--- a/ip/ipl2tp.c
+++ b/ip/ipl2tp.c
@@ -485,31 +485,6 @@ static int get_tunnel(struct l2tp_data *p)
  * Command parser
  */
 
-static int hex2mem(const char *buf, uint8_t *mem, int count)
-{
-   int i, j;
-   int c;
-
-   for (i = 0, j = 0; i < count; i++, j += 2) {
-   c = get_hex(buf[j]);
-   if (c < 0)
-   goto err;
-
-   mem[i] = c << 4;
-
-   c = get_hex(buf[j + 1]);
-   if (c < 0)
-   goto err;
-
-   mem[i] |= c;
-   }
-
-   return 0;
-
-err:
-   return -1;
-}
-
 static void usage(void) __attribute__((noreturn));
 
 static void usage(void)
diff --git a/lib/utils.c b/lib/utils.c
index 83c9d09..870c4f1 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -962,6 +962,31 @@ __u8 *hexstring_a2n(const char *str, __u8 *buf, int blen, unsigned int *len)
return buf;
 }
 
+int hex2mem(const char *buf, uint8_t *mem, int count)
+{
+   int i, j;
+   int c;
+
+   for (i = 0, j = 0; i < count; i++, j += 2) {
+   c = get_hex(buf[j]);
+   if (c < 0)
+   goto err;
+
+   mem[i] = c << 4;
+
+   c = get_hex(buf[j + 1]);
+   if (c < 0)
+   goto err;
+
+   mem[i] |= c;
+   }
+
+   return 0;
+
+err:
+   return -1;
+}
+
 int addr64_n2a(__u64 addr, char *buff, size_t len)
 {
__u16 *words = (__u16 *)&addr;
-- 
1.9.1
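
For readers following the hex2mem() move above, here is a minimal user-space
sketch of the same conversion. It is not part of the patch: hex_digit() is a
stand-in for iproute2's get_hex(), and hex_string_to_bytes() is a hypothetical
wrapper that derives the byte count from the string length, so the loop is
never asked for more digits than the string holds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for iproute2's get_hex(): value of one hex digit, or -1. */
static int hex_digit(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Same loop shape as hex2mem() in the patch: two hex digits per byte. */
static int hex2mem_sketch(const char *buf, uint8_t *mem, int count)
{
	int i, j;

	assert(buf != NULL && mem != NULL);
	for (i = 0, j = 0; i < count; i++, j += 2) {
		int hi = hex_digit(buf[j]);
		int lo;

		if (hi < 0)
			return -1;
		lo = hex_digit(buf[j + 1]);
		if (lo < 0)
			return -1;
		mem[i] = (uint8_t)((hi << 4) | lo);
	}
	return 0;
}

/*
 * Hypothetical wrapper (an assumption, not in the patch): reject odd-length
 * strings and strings that do not fit in 'cap' bytes, then return the number
 * of bytes written, or -1 on error.
 */
static int hex_string_to_bytes(const char *str, uint8_t *mem, size_t cap)
{
	size_t len = strlen(str);

	if (len % 2 != 0 || len / 2 > cap)
		return -1;
	if (hex2mem_sketch(str, mem, (int)(len / 2)) < 0)
		return -1;
	return (int)(len / 2);
}
```

With a 4-byte buffer, hex_string_to_bytes("a1b2c3d4", buf, 4) fills buf with
a1 b2 c3 d4 and returns 4; an odd-length or non-hex string returns -1.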



[PATCH v2 iproute2 2/2] actions: Add support for user cookies

2017-02-10 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

Make use of 128b user cookies

Introduce optional 128-bit action cookie.
Like all other cookie schemes in the networking world (eg in protocols
like http or existing kernel fib protocol field, etc) the idea is to
save user state that when retrieved serves as a correlator. The kernel
_should not_ interpret it. The user can store whatever they wish in the
128 bits.

Sample exercise (showing variable-length use of the cookie)

.. create an accept action with cookie a1b2c3d4
sudo $TC actions add action ok index 1 cookie a1b2c3d4

.. dump all gact actions..
sudo $TC -s actions ls action gact

action order 0: gact action pass
 random type none pass val 0
 index 1 ref 1 bind 0 installed 5 sec used 5 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
cookie a1b2c3d4

.. bind the accept action to a filter..
sudo $TC filter add dev lo parent : protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1

... send some traffic..
$ ping 127.0.0.1 -c 3
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms

Signed-off-by: Jamal Hadi Salim 
---
 tc/m_action.c   | 44 ++--
 1 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/tc/m_action.c b/tc/m_action.c
index bb19df8..00bc219 100644
--- a/tc/m_action.c
+++ b/tc/m_action.c
@@ -150,18 +150,19 @@ new_cmd(char **argv)
 
 }
 
-int
-parse_action(int *argc_p, char ***argv_p, int tca_id, struct nlmsghdr *n)
+int parse_action(int *argc_p, char ***argv_p, int tca_id, struct nlmsghdr *n)
 {
int argc = *argc_p;
char **argv = *argv_p;
struct rtattr *tail, *tail2;
char k[16];
+   int act_ck_len = 0;
int ok = 0;
int eap = 0; /* expect action parameters */
 
int ret = 0;
int prio = 0;
+   unsigned char act_ck[TC_COOKIE_MAX_SIZE];
 
if (argc <= 0)
return -1;
@@ -215,16 +216,39 @@ done0:
addattr_l(n, MAX_MSG, ++prio, NULL, 0);
addattr_l(n, MAX_MSG, TCA_ACT_KIND, k, strlen(k) + 1);
 
-   ret = a->parse_aopt(a, &argc, &argv, TCA_ACT_OPTIONS, n);
+   ret = a->parse_aopt(a, &argc, &argv, TCA_ACT_OPTIONS,
+   n);
 
if (ret < 0) {
fprintf(stderr, "bad action parsing\n");
goto bad_val;
}
+
+   if (*argv && strcmp(*argv, "cookie") == 0) {
+   int slen;
+
+   NEXT_ARG();
+   slen = strlen(*argv);
+   if (slen > (TC_COOKIE_MAX_SIZE*2))
+   invarg("cookie cannot exceed %d\n",
+  *argv);
+
+   if (hex2mem(*argv, act_ck, slen/2) < 0)
+   invarg("cookie must be a hex string\n",
+  *argv);
+
+   act_ck_len = slen / 2;
+   argc--;
+   argv++;
+   }
+
+   if (act_ck_len)
+   addattr_l(n, MAX_MSG, TCA_ACT_COOKIE,
+ (const void *)&act_ck, act_ck_len);
+
tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
ok++;
}
-
}
 
if (eap > 0) {
@@ -245,8 +269,7 @@ bad_val:
return -1;
 }
 
-static int
-tc_print_one_action(FILE *f, struct rtattr *arg)
+static int tc_print_one_action(FILE *f, struct rtattr *arg)
 {
 
struct rtattr *tb[TCA_ACT_MAX + 1];
@@ -274,8 +297,17 @@ tc_print_one_action(FILE *f, struct rtattr *arg)
return err;
 
if (show_stats && tb[TCA_ACT_STATS]) {
+
fprintf(f, "\tAction statistics:\n");
print_tcstats2_attr(f, tb[TCA_ACT_STATS], "\t", NULL);
+   if (tb[TCA_ACT_COOKIE]) {
+   int strsz = RTA_PAYLOAD(tb[TCA_ACT_COOKIE]);
+   char b1[strsz+1];
+
+   fprintf(f, "\n\tcookie len %d %s ", strsz,
+   hexstring_n2a(RTA_DATA(tb[TCA_ACT_COOKIE]),
+ strsz, b1, sizeof(b1)));
+   }
fprintf(f, "\n");
}
 
-- 
1.9.1
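
The cookie handling in the patch above boils down to: check the hex string
length, decode two digits per byte, and remember how many bytes were decoded.
A minimal stand-alone sketch of that idea follows. It is an illustration
under assumptions, not the tc code: COOKIE_MAX_SIZE is set to 16 bytes from
the "128 bits" in the commit message, and parse_cookie() and nibble() are
hypothetical names. Note that it returns the number of decoded bytes, which is
half the string length.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* 128 bits; assumed here to correspond to TC_COOKIE_MAX_SIZE. */
#define COOKIE_MAX_SIZE 16

/* Value of one hex digit, or -1 if 'c' is not a hex digit. */
static int nibble(char c)
{
	unsigned char uc = (unsigned char)c;

	if (!isxdigit(uc))
		return -1;
	if (isdigit(uc))
		return uc - '0';
	return tolower(uc) - 'a' + 10;
}

/*
 * Hypothetical helper: decode 'arg' into 'cookie'.
 * Returns the cookie length in bytes, or -1 if the string is empty, has an
 * odd number of digits, is longer than 2 * COOKIE_MAX_SIZE digits, or
 * contains a non-hex character.
 */
static int parse_cookie(const char *arg, unsigned char cookie[COOKIE_MAX_SIZE])
{
	size_t slen = strlen(arg);
	size_t i;

	assert(cookie != NULL);
	if (slen == 0 || slen % 2 != 0 || slen > COOKIE_MAX_SIZE * 2)
		return -1;

	for (i = 0; i < slen / 2; i++) {
		int hi = nibble(arg[2 * i]);
		int lo = nibble(arg[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		cookie[i] = (unsigned char)((hi << 4) | lo);
	}
	return (int)(slen / 2);
}
```

For the cookie used in the sample exercise, parse_cookie("a1b2c3d4", ck)
returns 4, so a caller would pass 4 (not 8) as the attribute payload length.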



Re: [PATCH iproute2 2/2] actions: Add support for user cookies

2017-02-10 Thread Jamal Hadi Salim

On 17-02-10 06:18 AM, Jamal Hadi Salim wrote:

From: Jamal Hadi Salim 

Make use of 128b user cookies

Introduce optional 128-bit action cookie.
Like all other cookie schemes in the networking world (eg in protocols
like http or existing kernel fib protocol field, etc) the idea is to save
user state that when retrieved serves as a correlator. The kernel
_should not_ interpret it.  The user can store whatever they wish in the
128 bits.

Sample exercise (showing variable-length use of the cookie)

.. create an accept action with cookie a1b2c3d4
sudo $TC actions add action ok index 1 cookie a1b2c3d4

.. dump all gact actions..
sudo $TC -s actions ls action gact

action order 0: gact action pass
 random type none pass val 0
 index 1 ref 1 bind 0 installed 5 sec used 5 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
cookie a1b2c3d4

.. bind the accept action to a filter..
sudo $TC filter add dev lo parent : protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1

... send some traffic..
$ ping 127.0.0.1 -c 3
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms

Signed-off-by: Jamal Hadi Salim 
---
 include/linux/pkt_cls.h |  2 +-
 tc/m_action.c   | 44 ++--
 2 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/include/linux/pkt_cls.h b/include/linux/pkt_cls.h
index fef68c4..af17f3c 100644
--- a/include/linux/pkt_cls.h
+++ b/include/linux/pkt_cls.h
@@ -343,7 +343,7 @@ enum {
TCA_BPF_NAME,
TCA_BPF_FLAGS,
TCA_BPF_FLAGS_GEN,
-   TCA_BPF_DIGEST,
+   TCA_BPF_TAG,
__TCA_BPF_MAX,
 };




Sorry - above header wasnt supposed to be in the patch. Ignore.
I will send v2.

cheers,
jamal



Re: [PATCH for bnxt_re V4 03/21] RDMA/bnxt_re: register with the NIC driver

2017-02-10 Thread Selvin Xavier
On Tue, Feb 7, 2017 at 1:56 AM, Doug Ledford  wrote:
>> +static void bnxt_re_dev_remove(struct bnxt_re_dev *rdev)
>> +{
>> +   int i = BNXT_RE_REF_WAIT_COUNT;
>> +
>> +   /* Wait for rdev refcount to come down */
>> +   while ((atomic_read(&rdev->ref_count) > 1) && i--)
>> +   msleep(100);
>> +
>> +   if (atomic_read(&rdev->ref_count) > 1)
>> +   dev_err(rdev_to_dev(rdev),
>> +   "Failed waiting for ref count to deplete %d",
>> +   atomic_read(&rdev->ref_count));
>> +
>> +   atomic_set(&rdev->ref_count, 0);
>> +   dev_put(rdev->netdev);
>> +   rdev->netdev = NULL;
>> +
>> +   mutex_lock(&bnxt_re_dev_lock);
>> +   list_del_rcu(&rdev->list);
>> +   mutex_unlock(&bnxt_re_dev_lock);
>> +
>> +   synchronize_rcu();
>> +   flush_workqueue(bnxt_re_wq);
>> +
>> +   ib_dealloc_device(&rdev->ibdev);
>> +   /* rdev is gone */
>> +}
>
> This looks bad.  Either your ref counting is right, and your ref count
> should go to 1, or you have an issue that won't be helped by forcibly
> removing the device.  In its current form, this looks like an oopser
> waiting to happen.
>
> If you know your ref counting is right, then simply wait until it goes
> to 1, don't have this bailout logic.  If you truly need to bailout for
> some reason, then you need to not free things.  Better to shutdown then
> leak and live than release and have a use after free data corrupter.

Thanks for your comment.
I reviewed the usage of ref_count again and it is not really required
in this patch series. It is incremented and decremented from the netdev
notifier and will always be 1 in bnxt_re_dev_remove. bnxt_re_dev_remove is
also invoked from the netdev notifier, so there is no need to wait in this
function. So, I have removed the ref_count variable and cleaned up this
function in the v5 patch set.

Thanks,
Selvin


[RFC PATCH v2] net: ethtool: add support for forward error correction modes

2017-02-10 Thread Vidya Sagar Ravipati
From: Vidya Sagar Ravipati 

Forward Error Correction (FEC) modes, i.e. Base-R and Reed-Solomon, are
introduced in the 25G/40G/100G standards to provide good BER at high speeds.
Various networking devices that support 25G/40G/100G provide the ability
to manage supported FEC modes, and the lack of FEC encoding control and
reporting today is a source of interoperability issues for many vendors.
FEC capability, as well as a specific FEC mode (Base-R or RS), can be
requested or advertised through bits D44:47 of the base link codeword.

This patch set intends to provide an option under ethtool to manage and report
FEC encoding settings for networking devices as per the IEEE 802.3 bj, bm and
by specs.

The set-fec/show-fec option(s) are designed to control and report
the FEC encoding on the link.

SET FEC option:
root@tor: ethtool --set-fec  swp1 encoding [off | RS | BaseR | auto]

Encoding: Types of encoding
Off:  Turning off any encoding
RS :  enforcing RS-FEC encoding on supported speeds
BaseR  :  enforcing Base R encoding on supported speeds
Auto   :  IEEE defaults for the speed/medium combination

Here are a few examples of what we would expect if encoding=auto:
- if autoneg is on, we expect FEC to be negotiated as on or off
  as long as the protocol supports it
- if the hardware is capable of detecting the FEC encoding on its
  receiver, it will reconfigure its encoder to match
- in the absence of the above, the configuration would be set to IEEE
  defaults.

From our understanding, this is essentially what most hardware/driver
combinations are doing today in the absence of a way for users to
control the behavior.

SHOW FEC option:
root@tor: ethtool --show-fec  swp1
FEC parameters for swp1:
Active FEC encodings: RS
Configured FEC encodings:  RS | BaseR

ETHTOOL DEVNAME output modification:

ethtool devname output:
root@tor:~# ethtool swp1
Settings for swp1:
root@hpe-7712-03:~# ethtool swp18
Settings for swp18:
Supported ports: [ FIBRE ]
Supported link modes:   4baseCR4/Full
4baseSR4/Full
4baseLR4/Full
10baseSR4/Full
10baseCR4/Full
10baseLR4_ER4/Full
Supported pause frame use: No
Supports auto-negotiation: Yes
Supported FEC modes: [RS | BaseR | None | Not reported]
Advertised link modes:  Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: [RS | BaseR | None | Not reported]
 One or more FEC modes
Speed: 10Mb/s
Duplex: Full
Port: FIBRE
PHYAD: 106
Transceiver: internal
Auto-negotiation: off
Link detected: yes

This patch includes the following changes:
a) New ETHTOOL_GFECPARAM/SFECPARAM API, handled by
  the new get_fecparam/set_fecparam callbacks, provides support
  for configuration of forward error correction modes.
b) Link mode bits for FEC modes, i.e. None (no FEC mode), RS and BaseR/FC,
  are defined so that users can configure these FEC modes for the supported
  and advertising fields as part of link autonegotiation.

Signed-off-by: Vidya Sagar Ravipati 
---
 include/linux/ethtool.h  |  4 
 include/uapi/linux/ethtool.h | 48 +++-
 net/core/ethtool.c   | 34 +++
 3 files changed, 85 insertions(+), 1 deletion(-)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 9ded8c6..79a0bab 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -372,5 +372,9 @@ struct ethtool_ops {
  struct ethtool_link_ksettings *);
int (*set_link_ksettings)(struct net_device *,
  const struct ethtool_link_ksettings *);
+   int (*get_fecparam)(struct net_device *,
+ struct ethtool_fecparam *);
+   int (*set_fecparam)(struct net_device *,
+ struct ethtool_fecparam *);
 };
 #endif /* _LINUX_ETHTOOL_H */
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 3dc91a4..a7ab49e 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -1238,6 +1238,47 @@ struct ethtool_per_queue_op {
char data[];
 };
 
+/**
+ * struct ethtool_fecparam - Ethernet forward error correction(fec) parameters
+ * @cmd: Command number = %ETHTOOL_GFECPARAM or %ETHTOOL_SFECPARAM
+ * @active_fec: FEC mode which is active on the port
+ * @fec: Bitmask of supported/configured FEC modes
+ * @reserved: Reserved for future extensions, e.g. the FEC bypass feature.
+ *
+ * Drivers should reject a non-zero setting of @autoneg when
+ * autonegotiation is disabled (or not supported) for the link.
+ *
+ */
+struct ethtool_fecparam {
+   __u32   cmd;
+   /* bitmask of FEC modes */
+   __u32   active_fec;
+   __u32   fec;
+   __u32   reserved;
+};
+
+/*

RE: [PATCH] netlink: move nla_put_{u8,u16,u32} out of line

2017-02-10 Thread David Laight
From: David Miller
> Sent: 09 February 2017 21:31
> From: Arnd Bergmann 
> Date: Wed,  8 Feb 2017 22:18:26 +0100
> 
> > When CONFIG_KASAN is enabled, the "--param asan-stack=1" causes rather large
> > stack frames in some functions. This goes unnoticed normally because
> > CONFIG_FRAME_WARN is disabled with CONFIG_KASAN by default as of commit
> > 3f181b4d8652 ("lib/Kconfig.debug: disable -Wframe-larger-than warnings with
> > KASAN=y").
> >
> > The kernelci.org build bot however has the warning enabled and that led
> > me to investigate it a little further, as every build produces these warnings:
> >
> > net/wireless/nl80211.c:4389:1: warning: the frame size of 2240 bytes is larger than 2048 bytes [-
...
> >
> > It turns out that there is a relatively simple workaround for the netlink
> > users that currently use a local variable in order to do the type conversion:
> > Moving the three functions (for each of the typical sizes) to lib/nlattr.c
> > avoids using local variables in the caller, which drastically reduces the
> > stack usage for nl80211 and br_netlink.
> >
> > It would be good if we could enable the frame size check after that again,
> > but that should be a separate patch and it requires some more testing
> > to see which the largest acceptable frame size should be.
...
> You should only extern these things when KASAN is enabled.
> 
> The reason is that uninlining these routines makes attribute emission
> more expensive and for some applications performance of this matters.

If performance of nla_put() matters, then adding 1, 2 and 4 byte attributes
ought to be doable without writing the values to memory and later doing
(I presume) a memcpy().

I also can't help feeling that the gcc KASAN stuff needs some way to
annotate an 'extern' to say that a value passed by reference isn't
treated as an array.
Otherwise I suspect you get a lot of bloat all over the place.

David



[RFC PATCH net-next v2] net: ethtool: add support for forward error correction modes

2017-02-10 Thread Vidya Sagar Ravipati
From: Vidya Sagar Ravipati 

Forward Error Correction (FEC) modes, i.e. Base-R and Reed-Solomon, are
introduced in the 25G/40G/100G standards to provide good BER at high speeds.
Various networking devices that support 25G/40G/100G provide the ability
to manage supported FEC modes, and the lack of FEC encoding control and
reporting today is a source of interoperability issues for many vendors.
FEC capability, as well as a specific FEC mode (Base-R or RS), can be
requested or advertised through bits D44:47 of the base link codeword.

This patch set intends to provide an option under ethtool to manage and report
FEC encoding settings for networking devices as per the IEEE 802.3 bj, bm and
by specs.

The set-fec/show-fec option(s) are designed to control and report
the FEC encoding on the link.

SET FEC option:
root@tor: ethtool --set-fec  swp1 encoding [off | RS | BaseR | auto]

Encoding: Types of encoding
Off:  Turning off any encoding
RS :  enforcing RS-FEC encoding on supported speeds
BaseR  :  enforcing Base R encoding on supported speeds
Auto   :  IEEE defaults for the speed/medium combination

Here are a few examples of what we would expect if encoding=auto:
- if autoneg is on, we expect FEC to be negotiated as on or off
  as long as the protocol supports it
- if the hardware is capable of detecting the FEC encoding on its
  receiver, it will reconfigure its encoder to match
- in the absence of the above, the configuration would be set to IEEE
  defaults.

From our understanding, this is essentially what most hardware/driver
combinations are doing today in the absence of a way for users to
control the behavior.

SHOW FEC option:
root@tor: ethtool --show-fec  swp1
FEC parameters for swp1:
Active FEC encodings: RS
Configured FEC encodings:  RS | BaseR

ETHTOOL DEVNAME output modification:

ethtool devname output:
root@tor:~# ethtool swp1
Settings for swp1:
root@hpe-7712-03:~# ethtool swp18
Settings for swp18:
Supported ports: [ FIBRE ]
Supported link modes:   4baseCR4/Full
4baseSR4/Full
4baseLR4/Full
10baseSR4/Full
10baseCR4/Full
10baseLR4_ER4/Full
Supported pause frame use: No
Supports auto-negotiation: Yes
Supported FEC modes: [RS | BaseR | None | Not reported]
Advertised link modes:  Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: [RS | BaseR | None | Not reported]
 One or more FEC modes
Speed: 10Mb/s
Duplex: Full
Port: FIBRE
PHYAD: 106
Transceiver: internal
Auto-negotiation: off
Link detected: yes

This patch includes the following changes:
a) New ETHTOOL_GFECPARAM/SFECPARAM API, handled by
  the new get_fecparam/set_fecparam callbacks, provides support
  for configuration of forward error correction modes.
b) Link mode bits for FEC modes, i.e. None (no FEC mode), RS and BaseR/FC,
  are defined so that users can configure these FEC modes for the supported
  and advertising fields as part of link autonegotiation.

Signed-off-by: Vidya Sagar Ravipati 
---
 include/linux/ethtool.h  |  4 
 include/uapi/linux/ethtool.h | 48 +++-
 net/core/ethtool.c   | 34 +++
 3 files changed, 85 insertions(+), 1 deletion(-)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 9ded8c6..79a0bab 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -372,5 +372,9 @@ struct ethtool_ops {
  struct ethtool_link_ksettings *);
int (*set_link_ksettings)(struct net_device *,
  const struct ethtool_link_ksettings *);
+   int (*get_fecparam)(struct net_device *,
+ struct ethtool_fecparam *);
+   int (*set_fecparam)(struct net_device *,
+ struct ethtool_fecparam *);
 };
 #endif /* _LINUX_ETHTOOL_H */
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 3dc91a4..38dbfeb 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -1238,6 +1238,47 @@ struct ethtool_per_queue_op {
char data[];
 };
 
+/**
+ * struct ethtool_fecparam - Ethernet forward error correction(fec) parameters
+ * @cmd: Command number = %ETHTOOL_GFECPARAM or %ETHTOOL_SFECPARAM
+ * @active_fec: FEC mode which is active on the port
+ * @fec: Bitmask of supported/configured FEC modes
+ * @reserved: Reserved for future extensions, e.g. the FEC bypass feature.
+ *
+ * Drivers should reject a non-zero setting of @autoneg when
+ * autonegotiation is disabled (or not supported) for the link.
+ *
+ */
+struct ethtool_fecparam {
+   __u32   cmd;
+   /* bitmask of FEC modes */
+   __u32   active_fec;
+   __u32   fec;
+   __u32   reserved;
+};
+
+/*
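
The patch text is cut off before its FEC bit definitions, so the following is
only a sketch of how a bitmask such as the "fec" field described above could
be turned into the "RS | BaseR" style of output shown in the commit message.
The SKETCH_FEC_* bit values and sketch_fec_to_str() are hypothetical and are
not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bit values, for illustration only. */
#define SKETCH_FEC_NONE   (1u << 0)
#define SKETCH_FEC_RS     (1u << 1)
#define SKETCH_FEC_BASER  (1u << 2)

static const struct {
	uint32_t bit;
	const char *name;
} sketch_fec_names[] = {
	{ SKETCH_FEC_RS,    "RS" },
	{ SKETCH_FEC_BASER, "BaseR" },
	{ SKETCH_FEC_NONE,  "None" },
};

/*
 * Format the modes set in 'mask' as "A | B" into 'buf'.
 * An empty mask is rendered as "Not reported".
 */
static const char *sketch_fec_to_str(uint32_t mask, char *buf, size_t len)
{
	size_t i, used = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	if (mask == 0) {
		snprintf(buf, len, "Not reported");
		return buf;
	}

	for (i = 0; i < sizeof(sketch_fec_names) / sizeof(sketch_fec_names[0]); i++) {
		int n;

		if (!(mask & sketch_fec_names[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " | " : "", sketch_fec_names[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output truncated; stop appending */
		used += (size_t)n;
	}
	return buf;
}
```

A mask with the RS and BaseR bits set is formatted as "RS | BaseR", matching
the "Configured FEC encodings" line in the example output.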

[PATCH] xen-netback: vif counters from int/long to u64

2017-02-10 Thread Mart van Santen
This patch fixes an issue where the type of counters in the queue(s)
and interface are not in sync (queue counters are int, interface
counters are long), causing incorrect reporting of tx/rx values
of the vif interface and unclear counter overflows.
This patch sets both counters to the u64 type.

Signed-off-by: Mart van Santen 
---
 drivers/net/xen-netback/common.h| 8 
 drivers/net/xen-netback/interface.c | 8 
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 3ce1f7d..530586b 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -113,10 +113,10 @@ struct xenvif_stats {
 * A subset of struct net_device_stats that contains only the
 * fields that are updated in netback.c for each queue.
 */
-   unsigned int rx_bytes;
-   unsigned int rx_packets;
-   unsigned int tx_bytes;
-   unsigned int tx_packets;
+   u64 rx_bytes;
+   u64 rx_packets;
+   u64 tx_bytes;
+   u64 tx_packets;
 
/* Additional stats used by xenvif */
unsigned long rx_gso_checksum_fixup;
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 5795213..50fa169 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -221,10 +221,10 @@ static struct net_device_stats *xenvif_get_stats(struct net_device *dev)
 {
struct xenvif *vif = netdev_priv(dev);
struct xenvif_queue *queue = NULL;
-   unsigned long rx_bytes = 0;
-   unsigned long rx_packets = 0;
-   unsigned long tx_bytes = 0;
-   unsigned long tx_packets = 0;
+   u64 rx_bytes = 0;
+   u64 rx_packets = 0;
+   u64 tx_bytes = 0;
+   u64 tx_packets = 0;
unsigned int index;
 
spin_lock(&vif->lock);
-- 
2.1.4



[PATCH V5 for-next 03/21] RDMA/bnxt_re: register with the NIC driver

2017-02-10 Thread Selvin Xavier
This patch handles the registration with the bnxt_en driver.
The bnxt_re driver first registers with netdev notifier chain and upon
receiving the NETDEV_REGISTER event, it registers with bnxt_en driver.

1. bnxt_en's ulp_probe function returns a structure that contains
   information about the device and additional entry points.
2. bnxt_en driver returns 'struct bnxt_eth_dev' that contains a set
   of operation vectors that the bnxt_re driver invokes later.
3. bnxt_request_msix() allows the bnxt_re driver to specify the
   number of MSI-X vectors that are needed.
4. bnxt_send_fw_msg() is used to send messages to the FW.
5. bnxt_register_async_events() is used to register for async
   event callbacks.

v2: Remove some sparse warning. Also, remove some unused code from unreg
path.
v3: Removed condition checks for rdev reported during static code analysis.
Check the return value of try_module_get while getting bnxt_en
reference.

v5: rdev->ref_count is not used for any check. So removing the code that
operates on this ref_count. This will take care of the comments from
Doug and Leon. Code is refactored to avoid using 'goto' in the middle
of a switch/case construct. This patch also adds a check for RoCE
support in the device before proceeding with initialization.

Signed-off-by: Eddie Wai 
Signed-off-by: Devesh Sharma 
Signed-off-by: Somnath Kotur 
Signed-off-by: Sriharsha Basavapatna 
Signed-off-by: Selvin Xavier 
---
 drivers/infiniband/hw/bnxt_re/bnxt_re.h |  49 
 drivers/infiniband/hw/bnxt_re/main.c| 424 
 2 files changed, 473 insertions(+)

diff --git a/drivers/infiniband/hw/bnxt_re/bnxt_re.h b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
index 6ba013d..8ff2787 100644
--- a/drivers/infiniband/hw/bnxt_re/bnxt_re.h
+++ b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
@@ -43,4 +43,53 @@
 #define ROCE_DRV_MODULE_VERSION"1.0.0"
 
 #define BNXT_RE_DESC   "Broadcom NetXtreme-C/E RoCE Driver"
+
+struct bnxt_re_work {
+   struct work_struct  work;
+   unsigned long   event;
+   struct bnxt_re_dev  *rdev;
+   struct net_device   *vlan_dev;
+};
+
+#define BNXT_RE_MIN_MSIX   2
+#define BNXT_RE_MAX_MSIX   16
+struct bnxt_re_dev {
+   struct ib_device        ibdev;
+   struct list_head        list;
+   unsigned long           flags;
+#define BNXT_RE_FLAG_NETDEV_REGISTERED 0
+#define BNXT_RE_FLAG_IBDEV_REGISTERED  1
+#define BNXT_RE_FLAG_GOT_MSIX  2
+#define BNXT_RE_FLAG_RCFW_CHANNEL_EN   8
+#define BNXT_RE_FLAG_QOS_WORK_REG  16
+   struct net_device       *netdev;
+   unsigned int            version, major, minor;
+   struct bnxt_en_dev      *en_dev;
+   struct bnxt_msix_entry  msix_entries[BNXT_RE_MAX_MSIX];
+   int                     num_msix;
+
+   int                     id;
+
+   atomic_t                qp_count;
+   struct mutex            qp_lock;    /* protect qp list */
+   struct list_head        qp_list;
+
+   atomic_t                cq_count;
+   atomic_t                srq_count;
+   atomic_t                mr_count;
+   atomic_t                mw_count;
+   /* Max of 2 lossless traffic class supported per port */
+   u16                     cosq[2];
+};
+
+#define to_bnxt_re_dev(ptr, member)\
+   container_of((ptr), struct bnxt_re_dev, member)
+
+static inline struct device *rdev_to_dev(struct bnxt_re_dev *rdev)
+{
+   if (rdev)
+   return  &rdev->ibdev.dev;
+   return NULL;
+}
+
 #endif
diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index dbe642f..eb3dc81 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -38,10 +38,24 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#include "bnxt_ulp.h"
+#include "roce_hsi.h"
 #include "bnxt_re.h"
+#include "bnxt.h"
 static char version[] =
BNXT_RE_DESC " v" ROCE_DRV_MODULE_VERSION "\n";
 
@@ -55,6 +69,360 @@ static struct list_head bnxt_re_dev_list = LIST_HEAD_INIT(bnxt_re_dev_list);
 /* Mutex to protect the list of bnxt_re devices added */
 static DEFINE_MUTEX(bnxt_re_dev_lock);
 static struct workqueue_struct *bnxt_re_wq;
+
+/* for handling bnxt_en callbacks later */
+static void bnxt_re_stop(void *p)
+{
+}
+
+static void bnxt_re_start(void *p)
+{
+}
+
+static void bnxt_re_sriov_config(void *p, int num_vfs)
+{
+}
+
+static struct bnxt_ulp_ops bnxt_re_ulp_ops = {
+   .ulp_async_notifier = NULL,
+   .ulp_stop = bnxt_re_stop,
+   .ulp_start = bnxt_re_start,
+   .ulp_sriov_config = bnxt_re_sri

[PATCH net-next,v2] gtp: add MAINTAINERS

2017-02-10 Thread Pablo Neira Ayuso
From: Pablo Neira 

Add maintainers for this tunnel driver. Include the main osmocom.org mailing
list too.

Signed-off-by: Pablo Neira Ayuso 
---
v2: Harald suggests osmocom-net-g...@lists.osmocom.org is better ML for this.

 MAINTAINERS | 8 
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5864bbd99f8f..e2dfc4377f4b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5638,6 +5638,14 @@ T:   git git://linuxtv.org/media_tree.git
 S: Odd Fixes
 F: drivers/media/usb/gspca/
 
+GTP (GPRS Tunneling Protocol)
+M: Pablo Neira Ayuso 
+M: Harald Welte 
+L: osmocom-net-g...@lists.osmocom.org
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/pablo/gtp.git
+S: Maintained
+F: drivers/net/gtp.c
+
 GUID PARTITION TABLE (GPT)
 M: Davidlohr Bueso 
 L: linux-...@vger.kernel.org
-- 
2.1.4



Re: [PATCH] [net-next] ARM: orion: fix PHYLIB dependency

2017-02-10 Thread Arnd Bergmann
On Thu, Feb 9, 2017 at 7:22 PM, Florian Fainelli  wrote:
> On 02/09/2017 07:08 AM, Arnd Bergmann wrote:
> I disabled CONFIG_NETDEVICES to force CONFIG_PHY not to be set here, and
> I was not able to reproduce this, what am I missing?

In the ARMv5 allmodconfig build, this fails because CONFIG_PHY=m, and
we can't call into it. You could use IS_BUILTIN instead of IS_ENABLED in
the header as a oneline workaround, but I think that would be more confusing
to real users that try to use CONFIG_PHY=m without realizing why they lose
access to their switch.

Arnd
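
The difference between IS_ENABLED and IS_BUILTIN discussed above can be tried
out in user space. The sketch below is a simplified imitation of the
preprocessor trick used by the kernel's kconfig helpers, not a copy of the
kernel header: the KC_* helper macros are hypothetical names, IS_ENABLED is
written as a plain "||", and CONFIG_PHY_MODULE is defined by hand to mimic
what a CONFIG_PHY=m build would provide.

```c
#include <assert.h>

/* Pretend Kconfig output for CONFIG_PHY=m: only the _MODULE symbol exists. */
#define CONFIG_PHY_MODULE 1

/*
 * If 'val' expands to 1, KC_PLACEHOLDER_1 inserts an extra "0," argument,
 * which shifts the literal 1 into the second position. Otherwise the junk
 * token stays glued to the first argument and the second argument is 0.
 */
#define KC_PLACEHOLDER_1 0,
#define KC_TAKE_SECOND(ignored, val, ...) val
#define KC_IS_DEFINED(x)       KC_IS_DEFINED_(x)
#define KC_IS_DEFINED_(val)    KC_IS_DEFINED__(KC_PLACEHOLDER_##val)
#define KC_IS_DEFINED__(junk)  KC_TAKE_SECOND(junk 1, 0, 0)

#define IS_BUILTIN(option)  KC_IS_DEFINED(option)
#define IS_MODULE(option)   KC_IS_DEFINED(option##_MODULE)
#define IS_ENABLED(option)  (IS_BUILTIN(option) || IS_MODULE(option))

/* True for both =y and =m. */
static int sketch_phy_enabled(void)
{
	return IS_ENABLED(CONFIG_PHY);
}

/* True only for =y, i.e. when built-in code may call into the subsystem. */
static int sketch_phy_callable_from_builtin(void)
{
	assert(IS_BUILTIN(CONFIG_PHY) == 0 || IS_BUILTIN(CONFIG_PHY) == 1);
	return IS_BUILTIN(CONFIG_PHY);
}
```

With only CONFIG_PHY_MODULE defined, sketch_phy_enabled() returns 1 while
sketch_phy_callable_from_builtin() returns 0, which is the situation where
built-in code guarded by IS_ENABLED fails to link against a modular provider.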


[PATCH net] packet: Do not call fanout_release from atomic contexts

2017-02-10 Thread Anoob Soman
Commit 6664498280cf ("packet: call fanout_release, while UNREGISTERING a
netdev"), unfortunately, introduced the following issues.

1. calling mutex_lock(&fanout_mutex) (fanout_release()) from inside
rcu_read-side critical section. rcu_read_lock disables preemption, most often,
which prohibits calling sleeping functions.

[  ] include/linux/rcupdate.h:560 Illegal context switch in RCU read-side critical section!
[  ]
[  ] rcu_scheduler_active = 1, debug_locks = 0
[  ] 4 locks held by ovs-vswitchd/1969:
[  ]  #0:  (cb_lock){++}, at: [] genl_rcv+0x19/0x40
[  ]  #1:  (ovs_mutex){+.+.+.}, at: [] ovs_vport_cmd_del+0x4a/0x100 [openvswitch]
[  ]  #2:  (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x17/0x20
[  ]  #3:  (rcu_read_lock){..}, at: [] packet_notifier+0x5/0x3f0
[  ]
[  ] Call Trace:
[  ]  [] dump_stack+0x85/0xc4
[  ]  [] lockdep_rcu_suspicious+0x107/0x110
[  ]  [] ___might_sleep+0x57/0x210
[  ]  [] __might_sleep+0x70/0x90
[  ]  [] mutex_lock_nested+0x3c/0x3a0
[  ]  [] ? vprintk_default+0x1f/0x30
[  ]  [] ? printk+0x4d/0x4f
[  ]  [] fanout_release+0x1d/0xe0
[  ]  [] packet_notifier+0x2f9/0x3f0

2. calling mutex_lock(&fanout_mutex) inside spin_lock(&po->bind_lock).
"sleeping function called from invalid context"

[  ] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
[  ] in_atomic(): 1, irqs_disabled(): 0, pid: 1969, name: ovs-vswitchd
[  ] INFO: lockdep is turned off.
[  ] Call Trace:
[  ]  [] dump_stack+0x85/0xc4
[  ]  [] ___might_sleep+0x202/0x210
[  ]  [] __might_sleep+0x70/0x90
[  ]  [] mutex_lock_nested+0x3c/0x3a0
[  ]  [] fanout_release+0x1d/0xe0
[  ]  [] packet_notifier+0x2f9/0x3f0

3. calling dev_remove_pack(&fanout->prot_hook), from inside
spin_lock(&po->bind_lock) or rcu_read-side critical-section. dev_remove_pack()
-> synchronize_net(), which might sleep.

[  ] BUG: scheduling while atomic: ovs-vswitchd/1969/0x0002
[  ] INFO: lockdep is turned off.
[  ] Call Trace:
[  ]  [] dump_stack+0x85/0xc4
[  ]  [] __schedule_bug+0x64/0x73
[  ]  [] __schedule+0x6b/0xd10
[  ]  [] schedule+0x6b/0x80
[  ]  [] schedule_timeout+0x38d/0x410
[  ]  [] synchronize_sched_expedited+0x53d/0x810
[  ]  [] synchronize_rcu_expedited+0xe/0x10
[  ]  [] synchronize_net+0x35/0x50
[  ]  [] dev_remove_pack+0x13/0x20
[  ]  [] fanout_release+0xbe/0xe0
[  ]  [] packet_notifier+0x2f9/0x3f0

4. fanout_release() races with calls from different CPUs.

To fix the above problems, remove the call to fanout_release() under
rcu_read_lock(). Instead, call __dev_remove_pack(&fanout->prot_hook) and
netdev_run_todo will be happy that &dev->ptype_specific list is empty. In order
to achieve this, I moved dev_{add,remove}_pack() out of fanout_{add,release} to
__fanout_{link,unlink}. So, call to {,__}unregister_prot_hook() will make sure
fanout->prot_hook is removed as well.

Signed-off-by: Anoob Soman 
---
 net/packet/af_packet.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index d56ee46..0eb7230 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1497,6 +1497,8 @@ static void __fanout_link(struct sock *sk, struct packet_sock *po)
f->arr[f->num_members] = sk;
smp_wmb();
f->num_members++;
+   if (f->num_members == 1)
+   dev_add_pack(&f->prot_hook);
spin_unlock(&f->lock);
 }
 
@@ -1513,6 +1515,8 @@ static void __fanout_unlink(struct sock *sk, struct packet_sock *po)
BUG_ON(i >= f->num_members);
f->arr[i] = f->arr[f->num_members - 1];
f->num_members--;
+   if (f->num_members == 0)
+   __dev_remove_pack(&f->prot_hook);
spin_unlock(&f->lock);
 }
 
@@ -1687,7 +1691,6 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
match->prot_hook.func = packet_rcv_fanout;
match->prot_hook.af_packet_priv = match;
match->prot_hook.id_match = match_fanout_group;
-   dev_add_pack(&match->prot_hook);
list_add(&match->list, &fanout_list);
}
err = -EINVAL;
@@ -1726,7 +1729,6 @@ static void fanout_release(struct sock *sk)
 
if (atomic_dec_and_test(&f->sk_ref)) {
list_del(&f->list);
-   dev_remove_pack(&f->prot_hook);
fanout_release_data(f);
kfree(f);
}
@@ -3900,7 +3902,6 @@ static int packet_notifier(struct notifier_block *this,
}
if (msg == NETDEV_UNREGISTER) {
packet_cached_dev_reset(po);
-   fanout_release(sk);
po->ifindex = -1;
if (po->prot_hook.dev)
dev_put(po->prot_hook.dev);
-- 
2.7.4
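
The approach described in the commit message above (register the shared hook when the first socket joins the fanout group, drop it when the last one leaves) can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct group, group_link() and group_unlink() are hypothetical names, and the hook_registered flag stands in for the dev_add_pack() / __dev_remove_pack() calls; locking and RCU are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define GROUP_MAX 8

/* Hypothetical stand-in for struct packet_fanout. */
struct group {
	const void *members[GROUP_MAX];
	size_t num_members;
	bool hook_registered;	/* models dev_add_pack()/__dev_remove_pack() */
};

/* Add a member; register the shared hook on the 0 -> 1 transition. */
static void group_link(struct group *g, const void *member)
{
	assert(g->num_members < GROUP_MAX);
	g->members[g->num_members] = member;
	g->num_members++;
	if (g->num_members == 1)
		g->hook_registered = true;
}

/* Remove a member; drop the shared hook on the 1 -> 0 transition. */
static void group_unlink(struct group *g, const void *member)
{
	size_t i;

	for (i = 0; i < g->num_members; i++)
		if (g->members[i] == member)
			break;
	assert(i < g->num_members);

	/* Fill the hole with the last entry, as __fanout_unlink() does. */
	g->members[i] = g->members[g->num_members - 1];
	g->num_members--;
	if (g->num_members == 0)
		g->hook_registered = false;
}
```

Tying the hook to membership rather than to the group's lifetime means that unlinking the last socket is enough to get the hook off the device list.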



Re: linux-next: build failure after merge of the selinux tree

2017-02-10 Thread Paul Moore
On Thu, Feb 9, 2017 at 9:50 PM, Stephen Rothwell  wrote:
> Hi all,
>
> On Tue, 10 Jan 2017 12:27:03 +1100 Stephen Rothwell wrote:
>>
>> After merging the selinux tree, today's linux-next build (x86_64
>> allmodconfig) failed like this:
>>
>> In file included from /home/sfr/next/next/security/selinux/avc.c:35:0:
>> /home/sfr/next/next/security/selinux/include/classmap.h:242:2: error: #error New address family defined, please update secclass_map.
>>  #error New address family defined, please update secclass_map.
>>   ^
>> /home/sfr/next/next/security/selinux/hooks.c: In function 'socket_type_to_security_class':
>> /home/sfr/next/next/security/selinux/hooks.c:1409:2: error: #error New address family defined, please update this function.
>>
>> Caused by commit
>>
>>   da69a5306ab9 ("selinux: support distinctions among all network address families")
>>
>> interacting with commit
>>
>>   ac7138746e14 ("smc: establish new socket family")
>>
>> from the net-next tree.
>>
>> I added the following merge fix patch:
>>
>> From: Stephen Rothwell 
>> Date: Tue, 10 Jan 2017 12:22:21 +1100
>> Subject: [PATCH] selinux: merge fix for "smc: establish new socket family"
>>
>> Signed-off-by: Stephen Rothwell 
>> ---
>>  security/selinux/hooks.c| 4 +++-
>>  security/selinux/include/classmap.h | 4 +++-
>>  2 files changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
>> index bada3cd42b9c..712fd0e7c91d 100644
>> --- a/security/selinux/hooks.c
>> +++ b/security/selinux/hooks.c
>> @@ -1405,7 +1405,9 @@ static inline u16 socket_type_to_security_class(int family, int type, int protoc
>>   return SECCLASS_KCM_SOCKET;
>>   case PF_QIPCRTR:
>>   return SECCLASS_QIPCRTR_SOCKET;
>> -#if PF_MAX > 43
>> + case PF_SMC:
>> + return SECCLASS_SMC_SOCKET;
>> +#if PF_MAX > 44
>>  #error New address family defined, please update this function.
>>  #endif
>>   }
>> diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
>> index 0dfd26d0b8d8..40f1d4f8bc2a 100644
>> --- a/security/selinux/include/classmap.h
>> +++ b/security/selinux/include/classmap.h
>> @@ -235,9 +235,11 @@ struct security_class_mapping secclass_map[] = {
>> { COMMON_SOCK_PERMS, NULL } },
>>   { "qipcrtr_socket",
>> { COMMON_SOCK_PERMS, NULL } },
>> + { "smc_socket",
>> +   { COMMON_SOCK_PERMS, NULL } },
>>   { NULL }
>>};
>>
>> -#if PF_MAX > 43
>> +#if PF_MAX > 44
>>  #error New address family defined, please update secclass_map.
>>  #endif
>> --
>> 2.10.2
>
> This now applies when I merge the security tree (as it merged the
> selinux tree, presumably).

Yes, James just pulled the SELinux tree yesterday.

-- 
paul moore
www.paul-moore.com
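
The build failure quoted above comes from a deliberate compile-time guard: when PF_MAX grows, the #error fires until the mapping is updated. A minimal sketch of that pattern, assuming made-up families (FAM_INET, FAM_CAN, FAM_SMC, FAM_MAX) and classes that are not the real SELinux definitions:

```c
#include <assert.h>

/* Hypothetical address families; FAM_MAX plays the role of PF_MAX. */
#define FAM_INET 0
#define FAM_CAN  1
#define FAM_SMC  2
#define FAM_MAX  3

/* Hypothetical security classes. */
enum sec_class { CLASS_NONE, CLASS_INET, CLASS_CAN, CLASS_SMC };

static enum sec_class family_to_class(int family)
{
	switch (family) {
	case FAM_INET:
		return CLASS_INET;
	case FAM_CAN:
		return CLASS_CAN;
	case FAM_SMC:
		return CLASS_SMC;
/* Bumping FAM_MAX without adding a case above breaks the build here. */
#if FAM_MAX > 3
#error New address family defined, please update this function.
#endif
	}
	return CLASS_NONE;
}
```

Adding a fourth family means adding a case and raising the guard in the same change, which is the shape of the merge fix in the quoted patch.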


[PATCH net-next] mlx4: do not fire tasklet unless necessary

2017-02-10 Thread Eric Dumazet
From: Eric Dumazet 

All rx and tx netdev interrupts are handled respectively by
mlx4_en_rx_irq() and mlx4_en_tx_irq(), which simply schedule a NAPI.

But mlx4_eq_int() also fires a tasklet to service all items that were
queued via mlx4_add_cq_to_tasklet(), but this handler was not called
unless user cqe was handled.

This is very confusing, as "mpstat -I SCPU ..." shows a huge number of
tasklet invocations.

This patch saves this overhead, by carefully firing the tasklet directly
from mlx4_add_cq_to_tasklet(), removing four atomic operations per IRQ.

Signed-off-by: Eric Dumazet 
Cc: Tariq Toukan 
Cc: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx4/cq.c |6 +-
 drivers/net/ethernet/mellanox/mlx4/eq.c |9 +
 2 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/cq.c b/drivers/net/ethernet/mellanox/mlx4/cq.c
index 6b8635378f1fcb2aae4e8ac390bcd09d552c2256..fa6d2354a0e910ee160863e3cbe21a512d77bf03 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cq.c
@@ -81,8 +81,9 @@ void mlx4_cq_tasklet_cb(unsigned long data)
 
 static void mlx4_add_cq_to_tasklet(struct mlx4_cq *cq)
 {
-   unsigned long flags;
struct mlx4_eq_tasklet *tasklet_ctx = cq->tasklet_ctx.priv;
+   unsigned long flags;
+   bool kick;
 
spin_lock_irqsave(&tasklet_ctx->lock, flags);
/* When migrating CQs between EQs will be implemented, please note
@@ -92,7 +93,10 @@ static void mlx4_add_cq_to_tasklet(struct mlx4_cq *cq)
 */
if (list_empty_careful(&cq->tasklet_ctx.list)) {
atomic_inc(&cq->refcount);
+   kick = list_empty(&tasklet_ctx->list);
list_add_tail(&cq->tasklet_ctx.list, &tasklet_ctx->list);
+   if (kick)
+   tasklet_schedule(&tasklet_ctx->task);
}
spin_unlock_irqrestore(&tasklet_ctx->lock, flags);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c b/drivers/net/ethernet/mellanox/mlx4/eq.c
index 0509996957d9664b612358dd805359f4bc67b8dc..39232b6a974f4b4b961d3b0b8634f04e6b9d0caa 100644
--- a/drivers/net/ethernet/mellanox/mlx4/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/eq.c
@@ -494,7 +494,7 @@ static int mlx4_eq_int(struct mlx4_dev *dev, struct mlx4_eq *eq)
 {
struct mlx4_priv *priv = mlx4_priv(dev);
struct mlx4_eqe *eqe;
-   int cqn = -1;
+   int cqn;
int eqes_found = 0;
int set_ci = 0;
int port;
@@ -840,13 +840,6 @@ static int mlx4_eq_int(struct mlx4_dev *dev, struct mlx4_eq *eq)
 
eq_set_ci(eq, 1);
 
-   /* cqn is 24bit wide but is initialized such that its higher bits
-* are ones too. Thus, if we got any event, cqn's high bits should be off
-* and we need to schedule the tasklet.
-*/
-   if (!(cqn & ~0xff))
-   tasklet_schedule(&eq->tasklet_ctx.task);
-
return eqes_found;
 }
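
The core idea of the patch above is to schedule the tasklet only when the pending list goes from empty to non-empty, instead of on every interrupt. A userspace sketch of that idea, with hypothetical names (struct work_ctx, work_add(), work_run()); the kicks counter stands in for tasklet_schedule() calls, and the spinlock that the real code holds is omitted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work context; 'kicks' counts simulated tasklet_schedule() calls. */
struct work_ctx {
	size_t pending;		/* number of queued items */
	unsigned int kicks;
};

/* Queue one item; kick the worker only on the empty -> non-empty transition. */
static void work_add(struct work_ctx *ctx)
{
	bool kick = (ctx->pending == 0);

	ctx->pending++;
	if (kick)
		ctx->kicks++;
}

/* Worker body: drain everything that was queued and report how many. */
static size_t work_run(struct work_ctx *ctx)
{
	size_t done = ctx->pending;

	ctx->pending = 0;
	return done;
}
```

Three work_add() calls before the worker runs produce a single kick in this model, because later additions find the list already non-empty.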
 




Re: [PATCH net-next v3 06/10] net: dsa: Migrate to device_find_class()

2017-02-10 Thread Greg KH
On Thu, Jan 19, 2017 at 04:51:55PM +, Russell King - ARM Linux wrote:
> (This is mainly for Greg's benefit to help him understand the issue.)
> 
> I think the diagram you gave initially made this confusing, as it
> talks about a CPU(sic) producing the "RGMII" and "MII-MGMT".
> 
> Let's instead show a better representation that hopefully helps Greg
> understand networking. :)
> 
> 
>   CPU
> System <-B->  Ethernet controller <-P-> } PHY <---> network cable
> } - - - - - - - or - - - - - - -
>   MDIO bus ---M---> } Switch <-P-> PHYs <--> network
>   `M^cables
> 
> 'B' can be an on-SoC bus or something like PCI.
> 
> 'P' are the high-speed connectivity between the ethernet controller and
> PHY which carries the packet data.  It has no addressing, it's a point
> to point link.  RGMII is just one wiring example, there are many
> different interfaces there (SGMII, AUI, XAUI, XGMII to name a few.)
> 
> 'M' are the MDIO bus, which is the bus by which ethernet PHYs and
> switches can be identified and controlled.
> 
> The MDIO bus has a bus_type, has host drivers which are sometimes
> part of the ethernet controller, but can also be stand-alone devices
> shared between multiple ethernet controllers.
> 
> PHYs are a kind of MDIO device which are members of the MDIO bus
> type.  Each PHY (and switch) has a numerical address, and identifying
> numbers within its register set which identifies the manufacturer
> and device type.  We have device_driver objects for these.
> 
> Expanding the above diagram to make it (hopefully) even clearer,
> we can have this classic setup:
> 
>   CPU
> System <-B-> Ethernet controller <-P-> PHY <---> network cable
>  MDIO bus ---M--^
> 
> Or, in the case of two DSA switches attached to an Ethernet controller:
> 
>  ||
> System <-B-> Ethernet controller <-P-> Switch <-P-> PHY1 <--> network cable
>  MDIO bus +--M--->   1<-P-> PHY2 <--> network cable
>   |  |...|
>   |  |<-P-> PHYn <--> network cable
>   |  |^...|  |
>   |   |  `---M---'
>   |   P
>   |   |
>   |  |v~~~|
>   `--> Switch <-P-> PHY1 <--> network cable
>  |   2...|
>  |<-P-> PHYn <--> network cable
>  ||  |
>  `---M---'
> 
> The problem that the DSA guys are trying to deal with is how to
> represent the link between the DSA switches (which are devices
> sitting off their controlling bus - the MDIO bus) and the ethernet
> controller associated with that collection of devices, be it a
> switch or PHY.

Why do they have to represent that link?  This is a driver that somehow
binds the two together in some sort of "control plane"?

> Merely changing the parent/child relationships to try and solve
> one issue just creates exactly the same problem elsewhere.

Fair enough.

> So, I hope with these diagrams, you can see that trying to make
> the ethernet controller a child device of the DSA switches
> means that (eg) it's no longer a PCI device, which is rather
> absurd, especially when considering that what happens to the
> right of the ethernet controller in the diagrams above is
> normally external chips to the SoC or ethernet device.

Ok, thanks for the long explanations and diagrams.

_BUT_ my original point remains.  These new functions you all are trying
to get into the driver core, do NOT do what they say they are doing.
They are mucking around with a "known topology" and just happen to work
because the device you are trying to find shares a common parent with
yourself.

That is not what the function says it does, and as such, I do not want
that function in the driver core at all.

If you wish to keep it in your own subsystem, that's fine, but call it
what it really is:
hack_to_find_peer_device_on_random_bus()
and pass in a _real_ pointer to a bus type.  Not some random string
please.

Or better yet, have the DSA code accept pointers to the two devices in
the first place, so it "knows" what to do here in a much better way.
Right now it is a bad hack.  You all can not argue that is not true.

thanks,

greg k-h


[PATCH V5 for-next 21/21] RDMA/bnxt_re: Add bnxt_re driver build support

2017-02-10 Thread Selvin Xavier
Makefile and Kconfig changes for enabling bnxt_re compilation

v3: Adds list of MAINTAINERS of bnxt_re driver. Removes bnxt_re_debugfs.c
from Makefile as this file is no longer present

Signed-off-by: Devesh Sharma 
Signed-off-by: Somnath Kotur 
Signed-off-by: Sriharsha Basavapatna 
Signed-off-by: Selvin Xavier 
---
 MAINTAINERS| 11 +++
 drivers/infiniband/Kconfig |  2 ++
 drivers/infiniband/hw/Makefile |  1 +
 drivers/infiniband/hw/bnxt_re/Kconfig  |  9 +
 drivers/infiniband/hw/bnxt_re/Makefile |  6 ++
 5 files changed, 29 insertions(+)
 create mode 100644 drivers/infiniband/hw/bnxt_re/Kconfig
 create mode 100644 drivers/infiniband/hw/bnxt_re/Makefile

diff --git a/MAINTAINERS b/MAINTAINERS
index 5f0420a..468d2e8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2808,6 +2808,17 @@ L:   linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
 S: Maintained
 F: arch/arm64/boot/dts/broadcom/vulcan*
 
+BROADCOM NETXTREME-E ROCE DRIVER
+M: Selvin Xavier 
+M: Devesh Sharma 
+M: Somnath Kotur 
+M: Sriharsha Basavapatna 
+L: linux-r...@vger.kernel.org
+W: http://www.broadcom.com
+S: Supported
+F: drivers/infiniband/hw/bnxt_re/
+F: include/uapi/rdma/bnxt_re-abi.h
+
 BROCADE BFA FC SCSI DRIVER
 M: Anil Gurumurthy 
 M: Sudarsana Kalluru 
diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index 6709173..66f8602 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -92,4 +92,6 @@ source "drivers/infiniband/hw/hfi1/Kconfig"
 
 source "drivers/infiniband/hw/qedr/Kconfig"
 
+source "drivers/infiniband/hw/bnxt_re/Kconfig"
+
 endif # INFINIBAND
diff --git a/drivers/infiniband/hw/Makefile b/drivers/infiniband/hw/Makefile
index ed553de..34c93ab 100644
--- a/drivers/infiniband/hw/Makefile
+++ b/drivers/infiniband/hw/Makefile
@@ -12,3 +12,4 @@ obj-$(CONFIG_INFINIBAND_USNIC)+= usnic/
 obj-$(CONFIG_INFINIBAND_HFI1)  += hfi1/
 obj-$(CONFIG_INFINIBAND_HNS)   += hns/
 obj-$(CONFIG_INFINIBAND_QEDR)  += qedr/
+obj-$(CONFIG_INFINIBAND_BNXT_RE)   += bnxt_re/
diff --git a/drivers/infiniband/hw/bnxt_re/Kconfig b/drivers/infiniband/hw/bnxt_re/Kconfig
new file mode 100644
index 000..cd0175e
--- /dev/null
+++ b/drivers/infiniband/hw/bnxt_re/Kconfig
@@ -0,0 +1,9 @@
+config INFINIBAND_BNXT_RE
+tristate "Broadcom Netxtreme HCA support"
+depends on ETHERNET && NETDEVICES && PCI && INET
+select NET_VENDOR_BROADCOM
+select BNXT
+---help---
+ This driver supports Broadcom NetXtreme-E 10/25/40/50 gigabit
+ RoCE HCAs.  To compile this driver as a module, choose M here:
+ the module will be called bnxt_re.
diff --git a/drivers/infiniband/hw/bnxt_re/Makefile b/drivers/infiniband/hw/bnxt_re/Makefile
new file mode 100644
index 000..036f84e
--- /dev/null
+++ b/drivers/infiniband/hw/bnxt_re/Makefile
@@ -0,0 +1,6 @@
+
+ccflags-y := -Idrivers/net/ethernet/broadcom/bnxt
+obj-$(CONFIG_INFINIBAND_BNXT_RE) += bnxt_re.o
+bnxt_re-y := main.o ib_verbs.o \
+qplib_res.o qplib_rcfw.o   \
+qplib_sp.o qplib_fp.o
-- 
2.5.5



[PATCH] orinoco: Use net_device_stats from struct net_device

2017-02-10 Thread Tobias Klauser
Instead of using a private copy of struct net_device_stats in
struct orinoco_private, use stats from struct net_device. Also remove
the now unnecessary .ndo_get_stats function.

Signed-off-by: Tobias Klauser 
---
 drivers/net/wireless/intersil/orinoco/main.c   | 27 ++
 drivers/net/wireless/intersil/orinoco/orinoco.h|  2 --
 .../net/wireless/intersil/orinoco/orinoco_usb.c|  6 ++---
 3 files changed, 9 insertions(+), 26 deletions(-)

diff --git a/drivers/net/wireless/intersil/orinoco/main.c b/drivers/net/wireless/intersil/orinoco/main.c
index 9d96b7c928f7..28cf97489001 100644
--- a/drivers/net/wireless/intersil/orinoco/main.c
+++ b/drivers/net/wireless/intersil/orinoco/main.c
@@ -294,14 +294,6 @@ int orinoco_stop(struct net_device *dev)
 }
 EXPORT_SYMBOL(orinoco_stop);
 
-struct net_device_stats *orinoco_get_stats(struct net_device *dev)
-{
-   struct orinoco_private *priv = ndev_priv(dev);
-
-   return &priv->stats;
-}
-EXPORT_SYMBOL(orinoco_get_stats);
-
 void orinoco_set_multicast_list(struct net_device *dev)
 {
struct orinoco_private *priv = ndev_priv(dev);
@@ -433,7 +425,7 @@ EXPORT_SYMBOL(orinoco_process_xmit_skb);
 static netdev_tx_t orinoco_xmit(struct sk_buff *skb, struct net_device *dev)
 {
struct orinoco_private *priv = ndev_priv(dev);
-   struct net_device_stats *stats = &priv->stats;
+   struct net_device_stats *stats = &dev->stats;
struct hermes *hw = &priv->hw;
int err = 0;
u16 txfid = priv->txfid;
@@ -593,10 +585,7 @@ static void __orinoco_ev_alloc(struct net_device *dev, struct hermes *hw)
 
 static void __orinoco_ev_tx(struct net_device *dev, struct hermes *hw)
 {
-   struct orinoco_private *priv = ndev_priv(dev);
-   struct net_device_stats *stats = &priv->stats;
-
-   stats->tx_packets++;
+   dev->stats.tx_packets++;
 
netif_wake_queue(dev);
 
@@ -605,8 +594,7 @@ static void __orinoco_ev_tx(struct net_device *dev, struct hermes *hw)
 
 static void __orinoco_ev_txexc(struct net_device *dev, struct hermes *hw)
 {
-   struct orinoco_private *priv = ndev_priv(dev);
-   struct net_device_stats *stats = &priv->stats;
+   struct net_device_stats *stats = &dev->stats;
u16 fid = hermes_read_regn(hw, TXCOMPLFID);
u16 status;
struct hermes_txexc_data hdr;
@@ -662,7 +650,7 @@ static void __orinoco_ev_txexc(struct net_device *dev, struct hermes *hw)
 void orinoco_tx_timeout(struct net_device *dev)
 {
struct orinoco_private *priv = ndev_priv(dev);
-   struct net_device_stats *stats = &priv->stats;
+   struct net_device_stats *stats = &dev->stats;
struct hermes *hw = &priv->hw;
 
printk(KERN_WARNING "%s: Tx timeout! "
@@ -749,7 +737,7 @@ static void orinoco_rx_monitor(struct net_device *dev, u16 rxfid,
int len;
struct sk_buff *skb;
struct orinoco_private *priv = ndev_priv(dev);
-   struct net_device_stats *stats = &priv->stats;
+   struct net_device_stats *stats = &dev->stats;
struct hermes *hw = &priv->hw;
 
len = le16_to_cpu(desc->data_len);
@@ -840,7 +828,7 @@ static void orinoco_rx_monitor(struct net_device *dev, u16 rxfid,
 void __orinoco_ev_rx(struct net_device *dev, struct hermes *hw)
 {
struct orinoco_private *priv = ndev_priv(dev);
-   struct net_device_stats *stats = &priv->stats;
+   struct net_device_stats *stats = &dev->stats;
struct iw_statistics *wstats = &priv->wstats;
struct sk_buff *skb = NULL;
u16 rxfid, status;
@@ -959,7 +947,7 @@ static void orinoco_rx(struct net_device *dev,
   struct sk_buff *skb)
 {
struct orinoco_private *priv = ndev_priv(dev);
-   struct net_device_stats *stats = &priv->stats;
+   struct net_device_stats *stats = &dev->stats;
u16 status, fc;
int length;
struct ethhdr *hdr;
@@ -2137,7 +2125,6 @@ static const struct net_device_ops orinoco_netdev_ops = {
.ndo_set_mac_address= eth_mac_addr,
.ndo_validate_addr  = eth_validate_addr,
.ndo_tx_timeout = orinoco_tx_timeout,
-   .ndo_get_stats  = orinoco_get_stats,
 };
 
 /* Allocate private data.
diff --git a/drivers/net/wireless/intersil/orinoco/orinoco.h b/drivers/net/wireless/intersil/orinoco/orinoco.h
index 5fa1c3e3713f..430862a6a24b 100644
--- a/drivers/net/wireless/intersil/orinoco/orinoco.h
+++ b/drivers/net/wireless/intersil/orinoco/orinoco.h
@@ -84,7 +84,6 @@ struct orinoco_private {
 
/* Net device stuff */
struct net_device *ndev;
-   struct net_device_stats stats;
struct iw_statistics wstats;
 
/* Hardware control variables */
@@ -206,7 +205,6 @@ int orinoco_process_xmit_skb(struct sk_buff *skb,
 /* Common ndo functions exported for reuse by orinoco_usb */
 int orinoco_open(struct net_device *dev);
 int orinoco_stop(struct net_device *dev);
-struct net_device_stats *orinoco_get_stats(struct net

Re: [PATCH] netlink: move nla_put_{u8,u16,u32} out of line

2017-02-10 Thread Arnd Bergmann
On Thu, Feb 9, 2017 at 6:00 PM, Arnd Bergmann  wrote:
> To reduce this risk, -fsanitize-address-use-after-scope is now split out
> into a separate Kconfig option, which cannot be selected at the same time
> as CONFIG_KASAN_INLINE, leading to stack frames that are smaller than 2
> kilobytes most of the time on x86_64. Now we can turn on the warning again
> that was disabled in commit 3f181b4 ("lib/Kconfig.debug: disable
> -Wframe-larger-than warnings with KASAN=y").
>
> The hope is that we can fix all code that still produces warnings, so far
> I have found four areas that are still affected (netlink, hisi-hns,
> dvb and tty/keyboard), and I have patches for all of them.

scratch that, my randconfig tests found too many remaining problems
with asan-stack=1 even when only one of CONFIG_KASAN_INLINE
and -fsanitize-address-use-after-scope is set.

I actually get results as bad as
fs/direct-io.c: In function 'do_direct_IO':
fs/direct-io.c:1057:1: error: the frame size of 7240 bytes is larger
than 2048 bytes [-Werror=frame-larger-than=]

with KASAN_OUTLINE=y and KASAN_EXTRA=n.

I need to investigate further to see if I can narrow it down to some
other configuration options.

   Arnd


[PATCH net-next] net: busy-poll: remove LL_FLUSH_FAILED and LL_FLUSH_BUSY

2017-02-10 Thread Eric Dumazet
From: Eric Dumazet 

Commit 79e7fff47b7b ("net: remove support for per driver
ndo_busy_poll()") made them obsolete.

Signed-off-by: Eric Dumazet 
---
 include/net/busy_poll.h |4 
 net/core/dev.c  |3 ---
 2 files changed, 7 deletions(-)

diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index d73b849e29a6869e282103f329c3a02f4e1a6882..b8d637225a07ddd2c0183b75a42cd5c9c5a69851 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -33,10 +33,6 @@ struct napi_struct;
 extern unsigned int sysctl_net_busy_read __read_mostly;
 extern unsigned int sysctl_net_busy_poll __read_mostly;
 
-/* return values from ndo_ll_poll */
-#define LL_FLUSH_FAILED-1
-#define LL_FLUSH_BUSY  -2
-
 static inline bool net_busy_loop_on(void)
 {
return sysctl_net_busy_poll;
diff --git a/net/core/dev.c b/net/core/dev.c
index 0921609dfa81b70a13e0b4ca7852fde6ded7ed82..a6d9b1f071b330c3428c2a0a2d4eb13584911939 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4998,9 +4998,6 @@ bool sk_busy_loop(struct sock *sk, int nonblock)
LINUX_MIB_BUSYPOLLRXPACKETS, rc);
local_bh_enable();
 
-   if (rc == LL_FLUSH_FAILED)
-   break; /* permanent failure */
-
if (nonblock || !skb_queue_empty(&sk->sk_receive_queue) ||
busy_loop_timeout(end_time))
break;




Re: [PATCH net] packet: Do not call fanout_release from atomic contexts

2017-02-10 Thread Eric Dumazet
On Fri, 2017-02-10 at 12:39 +, Anoob Soman wrote:
> Commit 6664498280cf ("packet: call fanout_release, while UNREGISTERING a
> netdev"), unfortunately, introduced the following issues.
> 
> 1. calling mutex_lock(&fanout_mutex) (fanout_release()) from inside
> rcu_read-side critical section. rcu_read_lock disables preemption, most often,
> which prohibits calling sleeping functions.
> 
> [  ] include/linux/rcupdate.h:560 Illegal context switch in RCU read-side 
> critical section!
> [  ]
> [  ] rcu_scheduler_active = 1, debug_locks = 0
> [  ] 4 locks held by ovs-vswitchd/1969:
> [  ]  #0:  (cb_lock){++}, at: [] genl_rcv+0x19/0x40
> [  ]  #1:  (ovs_mutex){+.+.+.}, at: [] 
> ovs_vport_cmd_del+0x4a/0x100 [openvswitch]
> [  ]  #2:  (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x17/0x20
> [  ]  #3:  (rcu_read_lock){..}, at: [] 
> packet_notifier+0x5/0x3f0
> [  ]
> [  ] Call Trace:
> [  ]  [] dump_stack+0x85/0xc4
> [  ]  [] lockdep_rcu_suspicious+0x107/0x110
> [  ]  [] ___might_sleep+0x57/0x210
> [  ]  [] __might_sleep+0x70/0x90
> [  ]  [] mutex_lock_nested+0x3c/0x3a0
> [  ]  [] ? vprintk_default+0x1f/0x30
> [  ]  [] ? printk+0x4d/0x4f
> [  ]  [] fanout_release+0x1d/0xe0
> [  ]  [] packet_notifier+0x2f9/0x3f0
> 
> 2. calling mutex_lock(&fanout_mutex) inside spin_lock(&po->bind_lock).
> "sleeping function called from invalid context"
> 
> [  ] BUG: sleeping function called from invalid context at 
> kernel/locking/mutex.c:620
> [  ] in_atomic(): 1, irqs_disabled(): 0, pid: 1969, name: ovs-vswitchd
> [  ] INFO: lockdep is turned off.
> [  ] Call Trace:
> [  ]  [] dump_stack+0x85/0xc4
> [  ]  [] ___might_sleep+0x202/0x210
> [  ]  [] __might_sleep+0x70/0x90
> [  ]  [] mutex_lock_nested+0x3c/0x3a0
> [  ]  [] fanout_release+0x1d/0xe0
> [  ]  [] packet_notifier+0x2f9/0x3f0
> 
> 3. calling dev_remove_pack(&fanout->prot_hook), from inside
> spin_lock(&po->bind_lock) or rcu_read-side critical-section. dev_remove_pack()
> -> synchronize_net(), which might sleep.
> 
> [  ] BUG: scheduling while atomic: ovs-vswitchd/1969/0x0002
> [  ] INFO: lockdep is turned off.
> [  ] Call Trace:
> [  ]  [] dump_stack+0x85/0xc4
> [  ]  [] __schedule_bug+0x64/0x73
> [  ]  [] __schedule+0x6b/0xd10
> [  ]  [] schedule+0x6b/0x80
> [  ]  [] schedule_timeout+0x38d/0x410
> [  ]  [] synchronize_sched_expedited+0x53d/0x810
> [  ]  [] synchronize_rcu_expedited+0xe/0x10
> [  ]  [] synchronize_net+0x35/0x50
> [  ]  [] dev_remove_pack+0x13/0x20
> [  ]  [] fanout_release+0xbe/0xe0
> [  ]  [] packet_notifier+0x2f9/0x3f0
> 
> 4. fanout_release() races with calls from different CPU.
> 
> To fix the above problems, remove the call to fanout_release() under
> rcu_read_lock(). Instead, call __dev_remove_pack(&fanout->prot_hook) and
> netdev_run_todo will be happy that &dev->ptype_specific list is empty. In order
> to achieve this, I moved dev_{add,remove}_pack() out of fanout_{add,release} to
> __fanout_{link,unlink}. So, call to {,__}unregister_prot_hook() will make sure
> fanout->prot_hook is removed as well.
> 
> Signed-off-by: Anoob Soman 
> ---

Thanks for this work Anoob

For next submission please add these tags :

Fixes: 6664498280cf ("packet: call fanout_release, while UNREGISTERING a netdev")
Reported-by: Eric Dumazet 

Please read my comments below :

>  net/packet/af_packet.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index d56ee46..0eb7230 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -1497,6 +1497,8 @@ static void __fanout_link(struct sock *sk, struct packet_sock *po)
>   f->arr[f->num_members] = sk;
>   smp_wmb();
>   f->num_members++;
> + if (f->num_members == 1)
> + dev_add_pack(&f->prot_hook);
>   spin_unlock(&f->lock);
>  }
>  
> @@ -1513,6 +1515,8 @@ static void __fanout_unlink(struct sock *sk, struct packet_sock *po)
>   BUG_ON(i >= f->num_members);
>   f->arr[i] = f->arr[f->num_members - 1];
>   f->num_members--;
> + if (f->num_members == 0)
> + __dev_remove_pack(&f->prot_hook);

Note that __dev_remove_pack(&f->prot_hook) won't respect one RCU grace
period.

>   spin_unlock(&f->lock);
>  }
>  
> @@ -1687,7 +1691,6 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
>   match->prot_hook.func = packet_rcv_fanout;
>   match->prot_hook.af_packet_priv = match;
>   match->prot_hook.id_match = match_fanout_group;
> - dev_add_pack(&match->prot_hook);
>   list_add(&match->list, &fanout_list);
>   }
>   err = -EINVAL;
> @@ -1726,7 +1729,6 @@ static void fanout_release(struct sock *sk)
>  
>   if (atomic_dec_and_test(&f->sk_ref)) {
>   list_del(&f->list);
> - dev_remove_pack(&f->prot_hook);

But here, a grace period was respected, before fanout_release_data() and
the problematic kfree(f)

You need to postpone these after one rcu grace perio

Re: [PATCH/RFC v3 net] ravb: unmap descriptors when freeing rings

2017-02-10 Thread Geert Uytterhoeven
On Wed, Jan 25, 2017 at 5:18 PM, Sergei Shtylyov
 wrote:
> On 01/24/2017 09:21 PM, Simon Horman wrote:
>
>> From: Kazuya Mizuguchi 
>>
>> "swiotlb buffer is full" errors occur after repeated initialisation of a
>> device - f.e. suspend/resume or ip link set up/down. This is because
>> memory
>> mapped using dma_map_single() in ravb_ring_format() and ravb_start_xmit()
>> is not released.  Resolve this problem by unmapping descriptors when
>> freeing rings.
>
>
>Could you look into the sh_eth driver which seems to have the same issue?

Indeed, after a few suspend/resume cycles on r8a7791/koelsch:

WARNING: CPU: 1 PID: 1699 at lib/dma-debug.c:517 add_dma_entry+0xfc/0x148
DMA-API: exceeded 7 overlapping mappings of cacheline 0x01a827e3

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: RE: [PATCH net] net: hns: Fix the device being used for dma mapping during TX

2017-02-10 Thread Lino Sanfilippo


> Sent: Thursday, 09 February 2017 at 12:30
> From: "Salil Mehta" 
> To: "Lino Sanfilippo" 
> Cc: "da...@davemloft.net" , "Zhuangyuzeng (Yisen)" 
> , "mehta.salil@gmail.com" 
> , "netdev@vger.kernel.org" 
> , "linux-ker...@vger.kernel.org" 
> , Linuxarm , "Yankejian 
> (Hackim Yim)" 
> Subject: RE: [PATCH net] net: hns: Fix the device being used for dma mapping during TX
>
> > -Original Message-
> > From: Lino Sanfilippo [mailto:linosanfili...@gmx.de]
> > Sent: Thursday, February 09, 2017 10:25 AM
> > To: Salil Mehta
> > Cc: da...@davemloft.net; Salil Mehta; Zhuangyuzeng (Yisen);
> > mehta.salil@gmail.com; netdev@vger.kernel.org; linux-
> > ker...@vger.kernel.org; Linuxarm; Yankejian (Hackim Yim)
> > Subject: Re: [PATCH net] net: hns: Fix the device being used for dma
> > mapping during TX
> > 
> > Hi,
> > 
> > > From: Kejian Yan 
> > >
> > > This patch fixes the device being used to DMA map skb->data.
> > > Erroneous device assignment causes the crash when SMMU is enabled.
> > > This happens during TX since buffer gets DMA mapped with device
> > > corresponding to net_device and gets unmapped using the device
> > > related to DSAF.
> > >
> > > Signed-off-by: Kejian Yan 
> > > Reviewed-by: Yisen Zhuang 
> > > Signed-off-by: Salil Mehta 
> > > ---
> > >  drivers/net/ethernet/hisilicon/hns/hns_enet.c |2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
> > b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
> > > index 672b646..2b52a12 100644
> > > --- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
> > > +++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
> > > @@ -305,7 +305,7 @@ int hns_nic_net_xmit_hw(struct net_device *ndev,
> > >   struct hns_nic_ring_data *ring_data)
> > >  {
> > >   struct hns_nic_priv *priv = netdev_priv(ndev);
> > > - struct device *dev = priv->dev;
> > > + struct device *dev = ring_to_dev(ring_data->ring);
> > >   struct hnae_ring *ring = ring_data->ring;
> > >   struct netdev_queue *dev_queue;
> > >   struct skb_frag_struct *frag;
> > > --
> > 
> > I would say it should be the other way around: Use priv->dev for
> > mapping and
> > unmapping instead of ring_to_dev().
> Yes, you got it right. Ideally, it should be per-port and for
> legacy reasons we have it this way. In the current design, we have
> SMMU node per-dsaf and I guess we will not land in the right
> dma-ops if we use per-netdev platform-device/device right now.
> 

Ok, but how can it work if we set the DMA mask of the device object of
the platform_device (via dma_set_mask_and_coherent in the probe() function)
and do the actual mapping with a different device object? I don't know much about
the low level dma handling but I can imagine that the mask is required to
do the mapping correctly.

Regards,
Lino


RE: [Xen-devel] [PATCH] xen-netback: vif counters from int/long to u64

2017-02-10 Thread Paul Durrant
> -Original Message-
> From: Xen-devel [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of
> Mart van Santen
> Sent: 10 February 2017 12:02
> To: Wei Liu ; Paul Durrant ;
> xen-de...@lists.xenproject.org; netdev@vger.kernel.org
> Cc: Mart van Santen 
> Subject: [Xen-devel] [PATCH] xen-netback: vif counters from int/long to u64
> 
> This patch fixes an issue where the type of counters in the queue(s)
> and interface are not in sync (queue counters are int, interface
> counters are long), causing incorrect reporting of tx/rx values
> of the vif interface and unclear counter overflows.
> This patch sets both counters to the u64 type.
> 
> Signed-off-by: Mart van Santen 

Looks sensible to me.

Reviewed-by: Paul Durrant 

> ---
>  drivers/net/xen-netback/common.h| 8 
>  drivers/net/xen-netback/interface.c | 8 
>  2 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-
> netback/common.h
> index 3ce1f7d..530586b 100644
> --- a/drivers/net/xen-netback/common.h
> +++ b/drivers/net/xen-netback/common.h
> @@ -113,10 +113,10 @@ struct xenvif_stats {
>* A subset of struct net_device_stats that contains only the
>* fields that are updated in netback.c for each queue.
>*/
> - unsigned int rx_bytes;
> - unsigned int rx_packets;
> - unsigned int tx_bytes;
> - unsigned int tx_packets;
> + u64 rx_bytes;
> + u64 rx_packets;
> + u64 tx_bytes;
> + u64 tx_packets;
> 
>   /* Additional stats used by xenvif */
>   unsigned long rx_gso_checksum_fixup;
> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-
> netback/interface.c
> index 5795213..50fa169 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -221,10 +221,10 @@ static struct net_device_stats
> *xenvif_get_stats(struct net_device *dev)
>  {
>   struct xenvif *vif = netdev_priv(dev);
>   struct xenvif_queue *queue = NULL;
> - unsigned long rx_bytes = 0;
> - unsigned long rx_packets = 0;
> - unsigned long tx_bytes = 0;
> - unsigned long tx_packets = 0;
> + u64 rx_bytes = 0;
> + u64 rx_packets = 0;
> + u64 tx_bytes = 0;
> + u64 tx_packets = 0;
>   unsigned int index;
> 
>   spin_lock(&vif->lock);
> --
> 2.1.4
> 
> 
> ___
> Xen-devel mailing list
> xen-de...@lists.xen.org
> https://lists.xen.org/xen-devel
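
The overflow problem described in the patch above can be reproduced in isolation. The sketch below is an assumption-laden illustration, not xen-netback code: narrow_stats models a 32-bit per-queue counter (what unsigned int is on common ABIs) and wide_stats models the u64 replacement.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-queue counter as a 32-bit value. */
struct narrow_stats {
	uint32_t rx_bytes;
};

/* Hypothetical per-queue counter widened to 64 bits. */
struct wide_stats {
	uint64_t rx_bytes;
};

/* Account one packet; the 32-bit counter silently wraps at 2^32. */
static void narrow_account(struct narrow_stats *s, uint32_t len)
{
	s->rx_bytes += len;
}

/* Account one packet; the 64-bit counter keeps counting past 4 GiB. */
static void wide_account(struct wide_stats *s, uint32_t len)
{
	s->rx_bytes += len;
}
```

Starting both counters 100 bytes short of 2^32 and accounting one 1500-byte packet leaves the narrow counter at 1400 while the wide one holds 2^32 + 1400, so a sum over queues built from narrow counters looks like it went backwards.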


[PATCH iproute2 1/2] utils: make hex2mem available to all users

2017-02-10 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

hex2mem() api is useful for parsing hexstrings which are then packed in
a stream of chars.

Signed-off-by: Jamal Hadi Salim 
---
 include/utils.h |  1 +
 ip/ipl2tp.c | 25 -
 lib/utils.c | 25 +
 3 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/include/utils.h b/include/utils.h
index dc1d6b9..22369e0 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -118,6 +118,7 @@ int get_be32(__be32 *val, const char *arg, int base);
 int get_be16(__be16 *val, const char *arg, int base);
 int get_addr64(__u64 *ap, const char *cp);
 
+int hex2mem(const char *buf, uint8_t *mem, int count);
 char *hexstring_n2a(const __u8 *str, int len, char *buf, int blen);
 __u8 *hexstring_a2n(const char *str, __u8 *buf, int blen, unsigned int *len);
 #define ADDR64_BUF_SIZE sizeof("xxxx:xxxx:xxxx:xxxx")
diff --git a/ip/ipl2tp.c b/ip/ipl2tp.c
index 0f91aeb..88664c9 100644
--- a/ip/ipl2tp.c
+++ b/ip/ipl2tp.c
@@ -485,31 +485,6 @@ static int get_tunnel(struct l2tp_data *p)
  * Command parser
  */
 
-static int hex2mem(const char *buf, uint8_t *mem, int count)
-{
-   int i, j;
-   int c;
-
-   for (i = 0, j = 0; i < count; i++, j += 2) {
-   c = get_hex(buf[j]);
-   if (c < 0)
-   goto err;
-
-   mem[i] = c << 4;
-
-   c = get_hex(buf[j + 1]);
-   if (c < 0)
-   goto err;
-
-   mem[i] |= c;
-   }
-
-   return 0;
-
-err:
-   return -1;
-}
-
 static void usage(void) __attribute__((noreturn));
 
 static void usage(void)
diff --git a/lib/utils.c b/lib/utils.c
index 83c9d09..870c4f1 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -962,6 +962,31 @@ __u8 *hexstring_a2n(const char *str, __u8 *buf, int blen, 
unsigned int *len)
return buf;
 }
 
+int hex2mem(const char *buf, uint8_t *mem, int count)
+{
+   int i, j;
+   int c;
+
+   for (i = 0, j = 0; i < count; i++, j += 2) {
+   c = get_hex(buf[j]);
+   if (c < 0)
+   goto err;
+
+   mem[i] = c << 4;
+
+   c = get_hex(buf[j + 1]);
+   if (c < 0)
+   goto err;
+
+   mem[i] |= c;
+   }
+
+   return 0;
+
+err:
+   return -1;
+}
+
 int addr64_n2a(__u64 addr, char *buff, size_t len)
 {
__u16 *words = (__u16 *)&addr;
-- 
1.9.1
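
As a rough illustration of what the now-shared helper does, here is a minimal userspace sketch. It assumes a local stand-in, get_hex_sketch(), for iproute2's get_hex(), and the name hex2mem_sketch() is hypothetical so that it does not clash with the real function; the parsing loop follows the one in the patch.

```c
#include <stdint.h>

/* Stand-in (assumption) for iproute2's get_hex(): 0..15, or -1 if c is not a hex digit. */
static int get_hex_sketch(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Same logic as hex2mem() in the patch: parse 2 * count hex digits into count bytes. */
static int hex2mem_sketch(const char *buf, uint8_t *mem, int count)
{
	int i, j, c;

	for (i = 0, j = 0; i < count; i++, j += 2) {
		/* high nibble */
		c = get_hex_sketch(buf[j]);
		if (c < 0)
			return -1;
		mem[i] = c << 4;

		/* low nibble */
		c = get_hex_sketch(buf[j + 1]);
		if (c < 0)
			return -1;
		mem[i] |= c;
	}
	return 0;
}
```

A string shorter than 2 * count digits fails cleanly in this sketch, because the terminating NUL is not a hex digit and the loop returns before reading past it.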



Re: [PATCH v3] can: Fix kernel panic at security_sock_rcv_skb

2017-02-10 Thread David Miller
From: Oliver Hartkopp 
Date: Fri, 10 Feb 2017 09:28:57 +0100

> can you please check whether this upstream commit
> 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f1712c73714088a7252d276a57126d56c7d37e64
> 
> really was queued up for -stable?

You never need to ask me this question, it is presented always, here:


http://patchwork.ozlabs.org/bundle/davem/stable/?submitter=&state=*&q=&archive=

And it is indeed there.


Re: [PATCH iproute] tc: matchall: Print skip flags when dumping a filter

2017-02-10 Thread Simon Horman
On Thu, Feb 09, 2017 at 03:10:14PM +0200, Or Gerlitz wrote:
> Print the skip flags when we dump a filter.
> 
> Signed-off-by: Or Gerlitz 
> Acked-by: Yotam Gigi 

This appears to be consistent with other classifiers that support these
flags.

Reviewed-by: Simon Horman 

> ---
>  tc/f_matchall.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/tc/f_matchall.c b/tc/f_matchall.c
> index 04e524e..ac48630 100644
> --- a/tc/f_matchall.c
> +++ b/tc/f_matchall.c
> @@ -130,6 +130,15 @@ static int matchall_print_opt(struct filter_util *qu, 
> FILE *f,
>   
> sprint_tc_classid(rta_getattr_u32(tb[TCA_MATCHALL_CLASSID]), b1));
>   }
>  
> + if (tb[TCA_MATCHALL_FLAGS]) {
> + __u32 flags = rta_getattr_u32(tb[TCA_MATCHALL_FLAGS]);
> +
> + if (flags & TCA_CLS_FLAGS_SKIP_HW)
> + fprintf(f, "\n  skip_hw");
> + if (flags & TCA_CLS_FLAGS_SKIP_SW)
> + fprintf(f, "\n  skip_sw");
> + }
> +
>   if (tb[TCA_MATCHALL_ACT])
>   tc_print_action(f, tb[TCA_MATCHALL_ACT]);
>  
> -- 
> 2.3.7
> 


[PATCH net-next] net: make net_device members garp_port and mrp_port conditional

2017-02-10 Thread Tobias Klauser
garp_port is only used in net/802/garp.c which is only compiled with
CONFIG_GARP enabled. Same goes for mrp_port which is only used in
net/802/mrp.c with CONFIG_MRP enabled.

Only include the two members in struct net_device if their respective
CONFIG_* is enabled. This saves a few bytes in struct net_device in case
CONFIG_GARP or CONFIG_MRP are not enabled.

Signed-off-by: Tobias Klauser 
---
 include/linux/netdevice.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 58afbd1cc659..7bb38f2c65c2 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1863,8 +1863,12 @@ struct net_device {
struct pcpu_vstats __percpu *vstats;
};
 
+#if IS_ENABLED(CONFIG_GARP)
struct garp_port __rcu  *garp_port;
+#endif
+#if IS_ENABLED(CONFIG_MRP)
struct mrp_port __rcu   *mrp_port;
+#endif
 
struct device   dev;
const struct attribute_group *sysfs_groups[4];
-- 
2.11.0




Re: [PATCH] Make EN2 pin optional in the TRF7970A driver

2017-02-10 Thread Rob Herring
On Tue, Feb 07, 2017 at 06:22:04AM +0100, Heiko Schocher wrote:
> From: Guan Ben 
> 
> Make the EN2 pin optional. This is useful for boards
> that have this pin hard-wired, for example to ground.
> 
> Signed-off-by: Guan Ben 
> Signed-off-by: Mark Jonas 
> Signed-off-by: Heiko Schocher 
> 
> ---
> 
>  .../devicetree/bindings/net/nfc/trf7970a.txt   |  4 ++--
>  drivers/nfc/trf7970a.c | 26 
> --
>  2 files changed, 16 insertions(+), 14 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt 
> b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
> index 32b35a0..5889a3d 100644
> --- a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
> +++ b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
> @@ -5,8 +5,8 @@ Required properties:
>  - spi-max-frequency: Maximum SPI frequency (<= 200).
>  - interrupt-parent: phandle of parent interrupt handler.
>  - interrupts: A single interrupt specifier.
> -- ti,enable-gpios: Two GPIO entries used for 'EN' and 'EN2' pins on the
> -  TRF7970A.
> +- ti,enable-gpios: One or two GPIO entries used for 'EN' and 'EN2' pins on 
> the
> +  TRF7970A. EN2 is optional.

Could EN ever be optional/fixed? If so, perhaps deprecate this property 
and do 2 properties, one for each pin.

Rob


net/llc: bug in llc_pdu_init_as_xid_cmd/skb_over_panic

2017-02-10 Thread Andrey Konovalov
Hi,

I've got the following error report while fuzzing the kernel with syzkaller.

On commit 926af6273fc683cd98cd0ce7bf0d04a02eed6742.

A reproducer and .config are attached

kernel BUG at net/core/skbuff.c:105!
invalid opcode:  [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 2 PID: 6558 Comm: syz-executor4 Not tainted 4.10.0-rc7+ #126
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 88003c49c480 task.stack: 88003a5c
RIP: 0010:skb_panic+0x16f/0x200 net/core/skbuff.c:101
RSP: 0018:88003a5c77d0 EFLAGS: 00010286
RAX: 0082 RBX: 88006be991c0 RCX: 
RDX: 0082 RSI: 814567fc RDI: ed00074b8eec
RBP: 88003a5c7838 R08: 0001 R09: 
R10: 0002 R11: 0001 R12: 85231ee0
R13: 834a6722 R14: 0003 R15: 88006c81c580
FS:  7f89298c7700() GS:88006de0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 20ee5000 CR3: 58697000 CR4: 06e0
Call Trace:
 skb_over_panic net/core/skbuff.c:110 [inline]
 skb_put+0x18d/0x1d0 net/core/skbuff.c:1437
 llc_pdu_init_as_xid_cmd include/net/llc_pdu.h:377 [inline]
 llc_sap_action_send_xid_c+0x2a2/0x3b0 net/llc/llc_s_ac.c:82
 llc_exec_sap_trans_actions net/llc/llc_sap.c:152 [inline]
 llc_sap_next_state net/llc/llc_sap.c:181 [inline]
 llc_sap_state_process+0x26b/0x4e0 net/llc/llc_sap.c:212
 llc_build_and_send_xid_pkt+0x19f/0x200 net/llc/llc_sap.c:276
 llc_ui_sendmsg+0xad9/0x1430 net/llc/af_llc.c:938
 sock_sendmsg_nosec net/socket.c:635 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:645
 ___sys_sendmsg+0x9d2/0xae0 net/socket.c:1985
 __sys_sendmsg+0x138/0x320 net/socket.c:2019
 SYSC_sendmsg net/socket.c:2030 [inline]
 SyS_sendmsg+0x2d/0x50 net/socket.c:2026
 entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x4458b9
RSP: 002b:7f89298c6b58 EFLAGS: 0286 ORIG_RAX: 002e
RAX: ffda RBX: 0005 RCX: 004458b9
RDX: 00040085 RSI: 20001fc8 RDI: 0005
RBP: 006e1ae0 R08:  R09: 
R10:  R11: 0286 R12: 00708000
R13:  R14: c0206434 R15: 201fcfe0
Code: 00 00 00 48 89 54 24 10 48 c7 c7 60 19 23 85 48 89 74 24 08 4c
89 04 24 4c 89 ea 4c 89 7c 24 18 45 89 f0 4c 89 e6 e8 1e c0 38 fe <0f>
0b 4c 89 4d b8 4c 89 45 c0 48 89 75 c8 48 89 55 d0 e8 6a 5e
RIP: skb_panic+0x16f/0x200 net/core/skbuff.c:101 RSP: 88003a5c77d0
---[ end trace 89f0ca2ea5bc3ead ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..


// autogenerated by syzkaller (http://github.com/google/syzkaller)

#ifndef __NR_mmap
#define __NR_mmap 9
#endif
#ifndef __NR_socket
#define __NR_socket 41
#endif
#ifndef __NR_sendmsg
#define __NR_sendmsg 46
#endif

#define _GNU_SOURCE

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#include 
#include 
#include 
#include 
#include 
#include 

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

const int kFailStatus = 67;
const int kErrorStatus = 68;
const int kRetryStatus = 69;

__attribute__((noreturn)) void doexit(int status)
{
  volatile unsigned i;
  syscall(__NR_exit_group, status);
  for (i = 0;; i++) {
  }
}

__attribute__((noreturn)) void fail(const char* msg, ...)
{
  int e = errno;
  fflush(stdout);
  va_list args;
  va_start(args, msg);
  vfprintf(stderr, msg, args);
  va_end(args);
  fprintf(stderr, " (errno %d)\n", e);
  doexit(e == ENOMEM ? kRetryStatus : kFailStatus);
}

__attribute__((noreturn)) void exitf(const char* msg, ...)
{
  int e = errno;
  fflush(stdout);
  va_list args;
  va_start(args, msg);
  vfprintf(stderr, msg, args);
  va_end(args);
  fprintf(stderr, " (errno %d)\n", e);
  doexit(kRetryStatus);
}

static int flag_debug;

void debug(const char* msg, ...)
{
  if (!flag_debug)
return;
  va_list args;
  va_start(args, msg);
  vfprintf(stdout, msg, args);
  va_end(args);
  fflush(stdout);
}

__thread int skip_segv;
__thread jmp_buf segv_env;

static void segv_handler(int sig, siginfo_t* info, void* uctx)
{
  uintptr_t addr = (uintptr_t)info->si_addr;
  const uintptr_t prog_start = 1 << 20;
  const uintptr_t prog_end = 100 << 20;
  if (__atomic_load_n(&skip_segv, __ATOMIC_RELAXED) &&
  (addr < prog_start || addr > prog_end)) {
debug("SIGSEGV on %p, skipping\n", addr);
_longjmp(segv_env, 1);
  }
  debug("SIGSEGV on %p, exiting\n", addr);
  doexit(sig);
  for (;;) {
  }
}

static void install_segv_handler()
{
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_sigaction = segv_handler;
  sa.sa_flags = SA

[PATCH] net: remove member 'max' of struct scm_fp_list

2017-02-10 Thread yuan linyu
From: yuan linyu 

SCM_MAX_FD can fully replace it.

Signed-off-by: yuan linyu 
---
 include/net/scm.h |  3 +--
 net/core/scm.c| 20 +---
 2 files changed, 6 insertions(+), 17 deletions(-)

diff --git a/include/net/scm.h b/include/net/scm.h
index 59fa93c..1301227 100644
--- a/include/net/scm.h
+++ b/include/net/scm.h
@@ -19,8 +19,7 @@ struct scm_creds {
 };
 
 struct scm_fp_list {
-   short   count;
-   short   max;
+   unsigned intcount;
struct user_struct  *user;
struct file *fp[SCM_MAX_FD];
 };
diff --git a/net/core/scm.c b/net/core/scm.c
index b6d8368..53679517 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -69,15 +69,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct 
scm_fp_list **fplp)
int *fdp = (int*)CMSG_DATA(cmsg);
struct scm_fp_list *fpl = *fplp;
struct file **fpp;
-   int i, num;
-
-   num = (cmsg->cmsg_len - sizeof(struct cmsghdr))/sizeof(int);
-
-   if (num <= 0)
-   return 0;
-
-   if (num > SCM_MAX_FD)
-   return -EINVAL;
+   unsigned int i, num;
 
if (!fpl)
{
@@ -86,18 +78,17 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct 
scm_fp_list **fplp)
return -ENOMEM;
*fplp = fpl;
fpl->count = 0;
-   fpl->max = SCM_MAX_FD;
fpl->user = NULL;
}
-   fpp = &fpl->fp[fpl->count];
 
-   if (fpl->count + num > fpl->max)
+   num = (cmsg->cmsg_len - sizeof(struct cmsghdr))/sizeof(int);
+   if (fpl->count + num > SCM_MAX_FD)
return -EINVAL;
 
/*
 *  Verify the descriptors and increment the usage count.
 */
-
+   fpp = &fpl->fp[fpl->count];
for (i=0; i< num; i++)
{
int fd = fdp[i];
@@ -112,7 +103,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct 
scm_fp_list **fplp)
if (!fpl->user)
fpl->user = get_uid(current_user());
 
-   return num;
+   return 0;
 }
 
 void __scm_destroy(struct scm_cookie *scm)
@@ -341,7 +332,6 @@ struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl)
if (new_fpl) {
for (i = 0; i < fpl->count; i++)
get_file(fpl->fp[i]);
-   new_fpl->max = new_fpl->count;
new_fpl->user = get_uid(fpl->user);
}
return new_fpl;
-- 
2.7.4




re: sfc: process RX event inner checksum flags

2017-02-10 Thread Colin Ian King
Hi there,


not sure if this is a bug, or intentional, but CoverityScan picked up a
mismatch in arguments when calling efx_ef10_handle_rx_event_errors() with
commit "sfc: process RX event inner checksum flags" that landed in
linux-next:

  CID 1402067 (#1 of 1): Arguments in wrong order
(SWAPPED_ARGUMENTS)swapped_arguments: The positions of arguments in the
call to efx_ef10_handle_rx_event_errors do not match the ordering of the
parameters:

rx_l3_class is passed to rx_encap_hdr
rx_l4_class is passed to rx_l3_class
rx_encap_hdr is passed to rx_l4_class


The function in question has the prototype:

static u16 efx_ef10_handle_rx_event_errors(struct efx_channel *channel,
  unsigned int n_packets,
  unsigned int rx_encap_hdr,
  unsigned int rx_l3_class,
  unsigned int rx_l4_class,
  const efx_qword_t *event)

...whereas it is being called using:

flags |= efx_ef10_handle_rx_event_errors(channel, n_packets,
rx_l3_class, rx_l4_class, rx_encap_hdr, event);

Is this a bug or intentional?

Colin
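
To make the report concrete: all three class parameters share the type unsigned int, so the compiler accepts them in any order. The sketch below uses a hypothetical stand-in, pack_classes(), that is not the sfc code; it only makes the effect of argument order visible.

```c
/* Hypothetical stand-in, not the real driver function: packs the three
 * values so that a change in argument order changes the result. */
static unsigned int pack_classes(unsigned int rx_encap_hdr,
				 unsigned int rx_l3_class,
				 unsigned int rx_l4_class)
{
	return (rx_encap_hdr << 16) | (rx_l3_class << 8) | rx_l4_class;
}
```

With encap = 1, l3 = 2, l4 = 3, pack_classes(encap, l3, l4) and pack_classes(l3, l4, encap) both compile without a warning but give different results. Going by the prototype quoted above, the driver call would presumably need rx_encap_hdr, rx_l3_class, rx_l4_class in that order.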


Re: [PATCH] net: remove member 'max' of struct scm_fp_list

2017-02-10 Thread David Miller
From: yuan linyu 
Date: Fri, 10 Feb 2017 20:11:13 +0800

> From: yuan linyu 
> 
> SCM_MAX_FD can fully replace it.
> 
> Signed-off-by: yuan linyu 

I don't think so:

> @@ -341,7 +332,6 @@ struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl)
>   if (new_fpl) {
>   for (i = 0; i < fpl->count; i++)
>   get_file(fpl->fp[i]);
> - new_fpl->max = new_fpl->count;
>   new_fpl->user = get_uid(fpl->user);

It's not set to SCM_MAX_FD here, it's set to whatever fpl->count is.

In other words, your patch breaks things.
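
A small sketch of the difference being pointed out, under stated assumptions: the struct is reduced to the two fields that matter, the helper names are hypothetical, and SCM_MAX_FD_SKETCH is set to 253, which I believe matches the kernel's SCM_MAX_FD but is not stated in this thread.

```c
#define SCM_MAX_FD_SKETCH 253	/* assumed value, see note above */

/* Reduced, hypothetical version of struct scm_fp_list. */
struct fp_list_sketch {
	short count;
	short max;
};

/* Capacity check as in scm_fp_copy() before the patch: bounded by fpl->max. */
static int can_append_orig(const struct fp_list_sketch *fpl, int num)
{
	return fpl->count + num <= fpl->max;
}

/* Capacity check as in the patch: bounded by the constant. */
static int can_append_patched(const struct fp_list_sketch *fpl, int num)
{
	return fpl->count + num <= SCM_MAX_FD_SKETCH;
}
```

For a freshly allocated list (count = 0, max = 253) the two checks agree. For a list produced by scm_fp_dup(), where max equals count, the original check refuses any further descriptor while the patched one allows it. As far as I recall, scm_fp_dup() only copies the used part of the fp[] array, which would make that difference matter, but that detail is an assumption here.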


Re: [PATCH RFC net] net/mlx5e: Add preemption enable/disable around TC statistics upcall

2017-02-10 Thread Or Gerlitz
On Fri, Feb 10, 2017 at 3:34 AM, Jakub Kicinski
 wrote:
> On Thu,  9 Feb 2017 17:38:43 +0200, Or Gerlitz wrote:
>> Running with CONFIG_PREEMPT set, I get a
>>
>> BUG: using smp_processor_id() in preemptible [] code: tc/3793
>>
>> assertion from the TC action (mirred) stats_update callback, when they do
>>
>>   _bstats_cpu_update(this_cpu_ptr(a->cpu_bstats), bytes, packets)
>>
>> As done by commit 66860be "nfp: bpf: allow offloaded filters to update 
>> stats",
>> disabling/enabling preemption around the TC upcall solves that.
>>
>> Fixes: aad7e08d39bd ('net/mlx5e: Hardware offloaded flower filter statistics 
>> support')
>> Signed-off-by: Or Gerlitz 
>> ---
>>
>> I marked it as RFC, since I wasn't fully sure on the nature of the
>> problem, nor if this is the direction we should take to the fix.

> I think it's the right fix

Do you understand the problem? What's wrong with the call done in the TC
action code w.r.t. preemption?

does it make sense to do this (say) 100K times/sec?

> for net-next we could perhaps redo the
> tcf_action_stats_update() helper so that it takes care of preemption and
> the iteration so more people don't trip over this?

maybe, let's first understand that more deeply, hopefully you can assist..


Re: [PATCH net-next 4/4] net/sched: cls_bpf: Use skip flags to reflect HW offload status

2017-02-10 Thread Or Gerlitz
On Fri, Feb 10, 2017 at 3:22 AM, Jakub Kicinski  wrote:
> On Thu,  9 Feb 2017 16:18:08 +0200, Or Gerlitz wrote:
>> Currently there is no way of querying whether a filter is
>> offloaded to HW or not when using both policy (no flag).
>>
>> Reuse the skip flags to show the insertion status by setting
>> the skip_hw flag in case the filter wasn't offloaded.
>>
>> Signed-off-by: Or Gerlitz 
>> ---
>>  net/sched/cls_bpf.c | 17 +
>>  1 file changed, 13 insertions(+), 4 deletions(-)
>>
>> diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
>> index d9c9701..91ba90d 100644
>> --- a/net/sched/cls_bpf.c
>> +++ b/net/sched/cls_bpf.c
>> @@ -185,14 +185,23 @@ static int cls_bpf_offload(struct tcf_proto *tp, 
>> struct cls_bpf_prog *prog,
>>   return -EINVAL;
>>   }
>>   } else {
>> - if (!tc_should_offload(dev, tp, prog->gen_flags))
>> - return skip_sw ? -EINVAL : 0;
>> + if (!tc_should_offload(dev, tp, prog->gen_flags)) {
>> + if (tc_skip_sw(prog->gen_flags))
>> + return -EINVAL;
>> + prog->gen_flags |= TCA_CLS_FLAGS_SKIP_HW;
>> + return 0;
>> + }
>>   cmd = TC_CLSBPF_ADD;
>>   }
>>
>>   ret = cls_bpf_offload_cmd(tp, obj, cmd);
>> - if (ret)
>> - return skip_sw ? ret : 0;
>> +
>> + if (ret) {
>> + if (skip_sw)
>> + return ret;
>> + prog->gen_flags |= TCA_CLS_FLAGS_SKIP_HW;
>> + return 0;
>> + }
>>
>>   obj->offloaded = true;
>
> In cls_bpf we do store information about whether program is offloaded or
> not already (see the @offloaded member).  Could we simplify the code
> thanks to this?

yeah, I felt like I don't fully understand the role of the offloaded
member. As I wrote, this patch is compile tested only; I will be happy
if you can test it and post a better version here. I don't think we need
to add/change the flags semantics, see next

> I'm obviously all for reporting whether tc objects are offloaded or not
> but let me ask perhaps the silly question of why reuse the SKIP_HW flag?
> We don't have to worry about flag bits running out, could it be clearer
> to users to report whether object is present in HW using a new flag?  Or
> even two flags for present/non-present so user doesn't have to ponder
> what no flag means (old kernel or not offloaded?). I don't really mind
> either way I'm just wondering what the motivation was and maybe how
> others feel.

yeah, the flags are a bit confusing to some people, but it's all about
polarity..

when the flags were introduced a few of us were in favor of "positive"
polarity, that is with possibly three values: "sw only" "hw only" and
"both" but the JJJ (Jiri/John/Jamal) consensus was to pick a
"negative" polarity of "skip sw" "skip hw" and "default" which means
the filter is in SW and possibly in HW. I think we can live with those
semantics and this small series just helps for the default case, allowing
user-space to know if the filter was offloaded using the existing
fields.

I am not in favor of making this more complex...

thanks for the feedback and review

Or.
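
The "negative polarity" reading described above can be sketched as a small decoder for dumped flags. This assumes the series is applied, so that a dumped skip_hw also covers "offload was attempted but did not happen". The SKETCH_* constants are local stand-ins for TCA_CLS_FLAGS_SKIP_HW / TCA_CLS_FLAGS_SKIP_SW with assumed bit values, and the enum and function names are hypothetical.

```c
/* Local stand-ins; the bit positions are assumptions for this sketch. */
#define SKETCH_SKIP_HW (1u << 0)
#define SKETCH_SKIP_SW (1u << 1)

enum placement {
	IN_SW_ONLY,	/* skip_hw: requested by the user, or set after a failed offload */
	IN_HW_ONLY,	/* skip_sw: the filter must be in hardware */
	IN_SW_AND_HW,	/* no flag: in software and, with this series, also offloaded */
};

/* Interpret the flags as user space would see them in a filter dump.
 * Both flags set at once is not a meaningful combination and is not handled. */
static enum placement placement_from_dump(unsigned int flags)
{
	if (flags & SKETCH_SKIP_SW)
		return IN_HW_ONLY;
	if (flags & SKETCH_SKIP_HW)
		return IN_SW_ONLY;
	return IN_SW_AND_HW;
}
```

Without the series, the "no flag" case would only mean "in SW and possibly in HW", which is the ambiguity the patch is trying to remove.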


Re: [patch net-next 0/7] mlxsw: Identical routes handling

2017-02-10 Thread David Miller
From: Jiri Pirko 
Date: Thu,  9 Feb 2017 10:28:37 +0100

> From: Jiri Pirko 
> 
> Ido says:
> 
> The kernel can store several FIB aliases that share the same prefix and
> length. These aliases can differ in other parameters such as TOS and
> metric, which are taken into account during lookup.
> 
> Offloading devices might not have the same flexibility, allowing only a
> single route with the same prefix and length to be reflected. mlxsw is
> one such device.
> 
> This patchset aims to correctly handle this situation in the mlxsw
> driver. The first four patches introduce small changes in the IPv4 FIB
> code, so that listeners of the FIB notification chain will be able to
> correctly handle identical routes.
> 
> The last three patches build on top of previous work and introduce the
> necessary changes in the mlxsw driver. The biggest change is the
> introduction of a FIB node, where identical routes are chained, instead
> of a primitive reference counting. This is explained in detail in the
> fifth patch.

Looks good, series applied, thanks Jiri and Ido.

I think you took care of this properly, but just always make sure that
if a delete event is emitted the object is not in the table any longer
and cannot be discovered by a parallel thread of execution at that
point.

Likewise a good rule of thumb is to make sure the object is
discoverable when you emit the add event.

Thanks again.


[PATCH net-next 2/2] afs: Use core kernel UUID generation

2017-02-10 Thread David Howells
From: Arnd Bergmann 

AFS uses a time based UUID to identify the host itself.  This requires
getting a timestamp which is currently done through the getnstimeofday()
interface that we want to eventually get rid of.

Instead of replacing it with a ktime-based interface, simply remove the
entire function and use generate_random_uuid() instead, which has a v4
("completely random") UUID instead of the time-based one.

Signed-off-by: Arnd Bergmann 
Signed-off-by: David Howells 
---

 fs/afs/internal.h   |   11 +--
 fs/afs/main.c   |   48 +---
 fs/afs/netdevices.c |   21 -
 3 files changed, 6 insertions(+), 74 deletions(-)

diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 79061fa17168..8acf3670e756 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -561,6 +561,11 @@ extern int afs_mntpt_check_symlink(struct afs_vnode *, 
struct key *);
 extern void afs_mntpt_kill_timer(void);
 
 /*
+ * netdevices.c
+ */
+extern int afs_get_ipv4_interfaces(struct afs_interface *, size_t, bool);
+
+/*
  * proc.c
  */
 extern int afs_proc_init(void);
@@ -624,12 +629,6 @@ extern int afs_fs_init(void);
 extern void afs_fs_exit(void);
 
 /*
- * use-rtnetlink.c
- */
-extern int afs_get_ipv4_interfaces(struct afs_interface *, size_t, bool);
-extern int afs_get_MAC_address(u8 *, size_t);
-
-/*
  * vlclient.c
  */
 extern int afs_vl_get_entry_by_name(struct in_addr *, struct key *,
diff --git a/fs/afs/main.c b/fs/afs/main.c
index a07c14df3fd1..51d7d17bca57 100644
--- a/fs/afs/main.c
+++ b/fs/afs/main.c
@@ -35,50 +35,6 @@ struct uuid_v1 afs_uuid;
 struct workqueue_struct *afs_wq;
 
 /*
- * get a client UUID
- */
-static int __init afs_get_client_UUID(void)
-{
-   struct timespec ts;
-   u64 uuidtime;
-   u16 clockseq, hi_v;
-   int ret;
-
-   /* read the MAC address of one of the external interfaces and construct
-* a UUID from it */
-   ret = afs_get_MAC_address(afs_uuid.node, sizeof(afs_uuid.node));
-   if (ret < 0)
-   return ret;
-
-   getnstimeofday(&ts);
-   uuidtime = (u64) ts.tv_sec * 1000 * 1000 * 10;
-   uuidtime += ts.tv_nsec / 100;
-   uuidtime += UUID_TO_UNIX_TIME;
-   afs_uuid.time_low = htonl(uuidtime);
-   afs_uuid.time_mid = htons(uuidtime >> 32);
-   hi_v = (uuidtime >> 48) & UUID_TIMEHI_MASK;
-   hi_v |= UUID_VERSION_TIME;
-   afs_uuid.time_hi_and_version = htons(hi_v);
-
-   get_random_bytes(&clockseq, 2);
-   afs_uuid.clock_seq_low = clockseq;
-   afs_uuid.clock_seq_hi_and_reserved =
-   (clockseq >> 8) & UUID_CLOCKHI_MASK;
-   afs_uuid.clock_seq_hi_and_reserved |= UUID_VARIANT_STD;
-
-   _debug("AFS UUID: %08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x",
-  ntohl(afs_uuid.time_low),
-  ntohs(afs_uuid.time_mid),
-  ntohs(afs_uuid.time_hi_and_version),
-  afs_uuid.clock_seq_hi_and_reserved,
-  afs_uuid.clock_seq_low,
-  afs_uuid.node[0], afs_uuid.node[1], afs_uuid.node[2],
-  afs_uuid.node[3], afs_uuid.node[4], afs_uuid.node[5]);
-
-   return 0;
-}
-
-/*
  * initialise the AFS client FS module
  */
 static int __init afs_init(void)
@@ -87,9 +43,7 @@ static int __init afs_init(void)
 
printk(KERN_INFO "kAFS: Red Hat AFS client v0.1 registering.\n");
 
-   ret = afs_get_client_UUID();
-   if (ret < 0)
-   return ret;
+   generate_random_uuid((unsigned char *)&afs_uuid);
 
/* create workqueue */
ret = -ENOMEM;
diff --git a/fs/afs/netdevices.c b/fs/afs/netdevices.c
index 7ad36506c256..40b2bab3e401 100644
--- a/fs/afs/netdevices.c
+++ b/fs/afs/netdevices.c
@@ -12,27 +12,6 @@
 #include "internal.h"
 
 /*
- * get a MAC address from a random ethernet interface that has a real one
- * - the buffer will normally be 6 bytes in size
- */
-int afs_get_MAC_address(u8 *mac, size_t maclen)
-{
-   struct net_device *dev;
-   int ret = -ENODEV;
-
-   BUG_ON(maclen != ETH_ALEN);
-
-   rtnl_lock();
-   dev = __dev_getfirstbyhwtype(&init_net, ARPHRD_ETHER);
-   if (dev) {
-   memcpy(mac, dev->dev_addr, maclen);
-   ret = 0;
-   }
-   rtnl_unlock();
-   return ret;
-}
-
-/*
  * get a list of this system's interface IPv4 addresses, netmasks and MTUs
  * - maxbufs must be at least 1
  * - returns the number of interface records in the buffer
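
For reference, a version 4 UUID is 16 random bytes with the version and variant bits forced. The userspace sketch below shows that step only; the function name is hypothetical, the random bytes are supplied by the caller, and the bit layout follows RFC 4122 rather than anything quoted in this patch.

```c
#include <stdint.h>
#include <string.h>

/* Turn 16 caller-supplied random bytes into an RFC 4122 version 4 UUID. */
static void make_uuid_v4_sketch(const uint8_t random_bytes[16], uint8_t uuid[16])
{
	memcpy(uuid, random_bytes, 16);
	uuid[6] = (uuid[6] & 0x0f) | 0x40;	/* version 4 in the high nibble of octet 6 */
	uuid[8] = (uuid[8] & 0x3f) | 0x80;	/* variant bits 10 at the top of octet 8 */
}
```

Unlike the removed time-based code, nothing here depends on a MAC address or a timestamp, which is why the getnstimeofday() call and afs_get_MAC_address() can go away.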



[PATCH net-next 0/2] afs: Use system UUID generation

2017-02-10 Thread David Howells

There is now a general function for generating a UUID and AFS should make
use of it.  It's also been recommended to me that I switch to using random
rather than time plus MAC address-based UUIDs which this function does.

The patches can be found here also:


http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-rewrite

Tagged thusly:

git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
rxrpc-rewrite-20170210

David
---
Arnd Bergmann (1):
  afs: Use core kernel UUID generation

David Howells (1):
  afs: Move UUID struct to linux/uuid.h


 fs/afs/cmservice.c   |   28 ++--
 fs/afs/internal.h|   38 +++---
 fs/afs/main.c|   49 ++---
 fs/afs/netdevices.c  |   21 -
 include/linux/uuid.h |   24 
 5 files changed, 47 insertions(+), 113 deletions(-)



Re: sfc: process RX event inner checksum flags

2017-02-10 Thread Edward Cree
On 10/02/17 16:14, Colin Ian King wrote:
> Hi there,
>
>
> not sure if this is a bug, or intentional, but CoverityScan picked up a
> mismatch in arguments when calling efx_ef10_handle_rx_event_errors() with
> commit "sfc: process RX event inner checksum flags"

It's a bug, thanks for the catch (and thanks Coverity as well).

Will send a patch shortly.

-Ed



Re: [patch net-next 0/6] sched: cls_api: small cleanup

2017-02-10 Thread David Miller
From: Jiri Pirko 
Date: Thu,  9 Feb 2017 14:38:54 +0100

> This patchset makes a couple of things in cls_api code a bit nicer and easier
> for reader to digest.

Series applied, thanks Jiri.


[PATCH net-next 1/2] afs: Move UUID struct to linux/uuid.h

2017-02-10 Thread David Howells
Move the afs_uuid struct to linux/uuid.h, rename it to uuid_v1 and change
the u16/u32 fields to __be16/__be32 instead so that the structure can be
cast to a 16-octet network-order buffer.

Signed-off-by: David Howells 
Reviewed-by: Arnd Bergmann 

-   call->request = kmalloc(sizeof(struct afs_uuid), GFP_KERNEL);
+   call->request = kmalloc(sizeof(struct uuid_v1), GFP_KERNEL);
if (!call->request)
return -ENOMEM;
 
b = call->buffer;
r = call->request;
-   r->time_low = ntohl(b[0]);
-   r->time_mid = ntohl(b[1]);
-   r->time_hi_and_version  = ntohl(b[2]);
+   r->time_low = b[0];
+   r->time_mid = htons(ntohl(b[1]));
+   r->time_hi_and_version  = htons(ntohl(b[2]));
r->clock_seq_hi_and_reserved= ntohl(b[3]);
r->clock_seq_low= ntohl(b[4]);
 
@@ -454,7 +454,7 @@ static int afs_deliver_cb_probe(struct afs_call *call)
 static void SRXAFSCB_ProbeUuid(struct work_struct *work)
 {
struct afs_call *call = container_of(work, struct afs_call, work);
-   struct afs_uuid *r = call->request;
+   struct uuid_v1 *r = call->request;
 
struct {
__be32  match;
@@ -477,7 +477,7 @@ static void SRXAFSCB_ProbeUuid(struct work_struct *work)
  */
 static int afs_deliver_cb_probe_uuid(struct afs_call *call)
 {
-   struct afs_uuid *r;
+   struct uuid_v1 *r;
unsigned loop;
__be32 *b;
int ret;
@@ -503,15 +503,15 @@ static int afs_deliver_cb_probe_uuid(struct afs_call 
*call)
}
 
_debug("unmarshall UUID");
-   call->request = kmalloc(sizeof(struct afs_uuid), GFP_KERNEL);
+   call->request = kmalloc(sizeof(struct uuid_v1), GFP_KERNEL);
if (!call->request)
return -ENOMEM;
 
b = call->buffer;
r = call->request;
-   r->time_low = ntohl(b[0]);
-   r->time_mid = ntohl(b[1]);
-   r->time_hi_and_version  = ntohl(b[2]);
+   r->time_low = b[0];
+   r->time_mid = htons(ntohl(b[1]));
+   r->time_hi_and_version  = htons(ntohl(b[2]));
r->clock_seq_hi_and_reserved= ntohl(b[3]);
r->clock_seq_low= ntohl(b[4]);
 
@@ -569,9 +569,9 @@ static void SRXAFSCB_TellMeAboutYourself(struct work_struct 
*work)
memset(&reply, 0, sizeof(reply));
reply.ia.nifs = htonl(nifs);
 
-   reply.ia.uuid[0] = htonl(afs_uuid.time_low);
-   reply.ia.uuid[1] = htonl(afs_uuid.time_mid);
-   reply.ia.uuid[2] = htonl(afs_uuid.time_hi_and_version);
+   reply.ia.uuid[0] = afs_uuid.time_low;
+   reply.ia.uuid[1] = htonl(ntohs(afs_uuid.time_mid));
+   reply.ia.uuid[2] = htonl(ntohs(afs_uuid.time_hi_and_version));
reply.ia.uuid[3] = htonl((s8) afs_uuid.clock_seq_hi_and_reserved);
reply.ia.uuid[4] = htonl((s8) afs_uuid.clock_seq_low);
for (loop = 0; loop < 6; loop++)
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 65504e218d35..79061fa17168 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "afs.h"
@@ -407,30 +408,6 @@ struct afs_interface {
unsignedmtu;/* MTU of interface */
 };
 
-/*
- * UUID definition [internet draft]
- * - the timestamp is a 60-bit value, split 32/16/12, and goes in 100ns
- *   increments since midnight 15th October 1582
- *   - add AFS_UUID_TO_UNIX_TIME to convert unix time in 100ns units to UUID
- * time
- * - the clock sequence is a 14-bit counter to avoid duplicate times
- */
-struct afs_uuid {
-   u32 time_low;   /* low part of 
timestamp */
-   u16 time_mid;   /* mid part of 
timestamp */
-   u16 time_hi_and_version;/* high part of 
timestamp and version  */
-#define AFS_UUID_TO_UNIX_TIME  0x01b21dd213814000ULL
-#define AFS_UUID_TIMEHI_MASK   0x0fff
-#define AFS_UUID_VERSION_TIME  0x1000  /* time-based UUID */
-#define AFS_UUID_VERSION_NAME  0x3000  /* name-based UUID */
-#define AFS_UUID_VERSION_RANDOM0x4000  /* (pseudo-)random generated 
UUID */
-   u8  clock_seq_hi_and_reserved;  /* clock seq hi and 
variant */
-#define AFS_UUID_CLOCKHI_MASK  0x3f
-#define AFS_UUID_VARIANT_STD   0x80
-   u8  clock_seq_low;  /* clock seq low */
-   u8  node[6];/* spatially unique 
node ID (MAC addr) */
-};
-
 /*/
 /*
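
The htons(ntohl(b[n])) pattern in the cmservice.c hunks can be read as follows: on the wire each UUID field occupies a 32-bit big-endian word, while the __be16 struct members hold the same value as a 16-bit big-endian quantity. A minimal userspace sketch, with a hypothetical helper name and assuming a POSIX system for <arpa/inet.h>:

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Convert a 16-bit value carried in a 32-bit big-endian wire word into a
 * 16-bit big-endian field, as the patch does with htons(ntohl(b[1])). */
static uint16_t wire32_to_be16(uint32_t wire_word)
{
	return htons((uint16_t)ntohl(wire_word));
}
```

The 32-bit time_low field needs no conversion at all, which is why the patch assigns b[0] directly.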
  

[PATCH net] at803x: insure minimum delay for SGMII link AN completion ckeck

2017-02-10 Thread Claudiu Manoil
Commit: f62265b "at803x: double check SGMII side autoneg"
introduced a regression for the p1010rdb board which has
two of the ethernet controllers (eTSEC) connected through
SGMII links to external Atheros SGMII AR8033 PHYs.
The issue consists in a dead link for these ports, and is
100% reproducible on kernel 4.9 (and later):

root@p1010rdb-pb:~# ifconfig eth2 172.16.1.1
[  203.274263] IPv6: ADDRCONF(NETDEV_UP): eth2: link is not ready
root@p1010rdb-pb:~# [  206.408255] 803x_aneg_done: SGMII link is not ok

root@p1010rdb-pb:~# ethtool eth2
Settings for eth2:
Supported ports: [ MII ]
Supported link modes:   10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
Advertised link modes:  10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Link partner advertised link modes:  10baseT/Half 10baseT/Full
 100baseT/Half 100baseT/Full
 1000baseT/Half 1000baseT/Full
Link partner advertised pause frame use: Symmetric Receive-only
Link partner advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: MII
PHYAD: 2
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: g
Wake-on: d
Current message level: 0x003f (63)
   drv probe link timer ifdown ifup
Link detected: no

Ensuring up to 100 usecs for the SGMII link side AN to complete
proves to be enough to have a working SGMII link, for this board.
The need for a delay for the SGMII link side may be explained by
the fact that there are two levels of auto-negotiation (AN) for a
SGMII link.  First the PHY autonegotiates the link parameters w/
its link partner over the copper link. In the second stage, the
AN results are then passed to the eTSEC MAC over the SGMII link
using the Clause 37 auto-negotiation functionality.  While the
aneg_done() hook is called by the phylib state machine to check
for the completion of the 1st stage AN of the external PHY,
there's no mechanism to insure proper AN completion of the internal
SGMII link (which is actually handled on the eTSEC side by a
"internal PHY", called TBI).

Fixes: f62265b "at803x: double check SGMII side autoneg"

Signed-off-by: Claudiu Manoil 
---
 drivers/net/phy/at803x.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/at803x.c b/drivers/net/phy/at803x.c
index a52b560..55fa7c4 100644
--- a/drivers/net/phy/at803x.c
+++ b/drivers/net/phy/at803x.c
@@ -366,6 +366,7 @@ static void at803x_link_change_notify(struct phy_device 
*phydev)
 static int at803x_aneg_done(struct phy_device *phydev)
 {
int ccr;
+   int timeout = 100; /* usecs */
 
int aneg_done = genphy_aneg_done(phydev);
if (aneg_done != BMSR_ANEGCOMPLETE)
@@ -383,7 +384,13 @@ static int at803x_aneg_done(struct phy_device *phydev)
phy_write(phydev, AT803X_REG_CHIP_CONFIG, ccr & ~AT803X_BT_BX_REG_SEL);
 
/* check if the SGMII link is OK. */
-   if (!(phy_read(phydev, AT803X_PSSR) & AT803X_PSSR_MR_AN_COMPLETE)) {
+   do {
+   if (phy_read(phydev, AT803X_PSSR) & AT803X_PSSR_MR_AN_COMPLETE)
+   break;
+   udelay(1);
+   } while (--timeout);
+
+   if (!timeout) {
pr_warn("803x_aneg_done: SGMII link is not ok\n");
aneg_done = 0;
}
-- 
1.7.11.7
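
The bounded polling loop added by the patch above can be tried out in
userspace.  This is only a sketch: fake_phy, fake_phy_read() and
AN_COMPLETE_BIT are hypothetical stand-ins for the PHY, phy_read() and
AT803X_PSSR_MR_AN_COMPLETE, and the udelay(1) between reads is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status bit, not the real AT803X register layout. */
#define AN_COMPLETE_BIT 0x0002u

/* Hypothetical PHY model: the status bit appears after a number of reads. */
struct fake_phy {
	unsigned int reads_until_ready;	/* reads that still report "not done" */
	unsigned int reads_done;	/* total reads issued so far */
};

static uint16_t fake_phy_read(struct fake_phy *phy)
{
	phy->reads_done++;
	if (phy->reads_until_ready > 0) {
		phy->reads_until_ready--;
		return 0;
	}
	return AN_COMPLETE_BIT;
}

/*
 * Poll the status bit at most `timeout` times.  Returns true if the bit
 * was seen and false if the budget ran out.  The driver would call
 * udelay(1) after each failed read; the delay is omitted here.
 */
static bool wait_an_complete(struct fake_phy *phy, int timeout)
{
	assert(timeout > 0);

	do {
		if (fake_phy_read(phy) & AN_COMPLETE_BIT)
			break;
	} while (--timeout);

	/* timeout only reaches 0 when every read failed */
	return timeout != 0;
}
```

Note that the loop shape means a success on the very last permitted read
still counts as success, because the break leaves the counter non-zero.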



Re: [PATCH RFC v2 1/8] xdp: Infrastructure to generalize XDP

2017-02-10 Thread David Miller
From: Tom Herbert 
Date: Thu, 9 Feb 2017 20:55:34 -0800

> On Thu, Feb 9, 2017 at 7:33 PM, David Miller  wrote:
>> From: Tom Herbert 
>> Date: Thu, 9 Feb 2017 18:29:54 -0800
>>
>>> So we have thousands or LOC coming into drivers every day anyway with
>>> all those properties anyway, so this "restricted" environment solves
>>> at best 1% of the problem.
>>
>> What you must understand is that no matter what someone outside of
>> upstream writes into an eBPF program, it's safe, and we can absolutely
>> prove this with the verifier and the invariants of the execution
>> environment.
>>
> This is the exact same argument the userspace stack proponents will
> use-- put your stack in userspace and you can't crash the host.

Sounds like we can therefore meet that requirement and keep them in
the kernel networking path, which supports all of our values and goals
precisely.


Re: [PATCH 1/3] ath10k: remove ath10k_vif_to_arvif()

2017-02-10 Thread Ben Greear

On 02/09/2017 11:03 PM, Valo, Kalle wrote:

Ben Greear  writes:


On 02/07/2017 01:14 AM, Valo, Kalle wrote:

Adrian Chadd  writes:


Removing this method makes the diff to FreeBSD larger, as "vif" in
FreeBSD is a different pointer.

(Yes, I have ath10k on freebsd working and I'd like to find a way to
reduce the diff moving forward.)


I don't like this "(void *) vif->drv_priv" style that much either but
apparently it's commonly used in Linux wireless code and already parts
of ath10k. So this patch just unifies the coding style.
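
For readers unfamiliar with the two styles being debated, here is a
minimal userspace sketch.  All names (fake_vif, fake_arvif,
vif_to_arvif, fake_vif_alloc) are hypothetical and only mimic the shape
of the mac80211/ath10k structures; the point is that the accessor and
the open-coded cast yield the same pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical core-owned object with a driver-private tail.  The
 * pointer member in front keeps the tail suitably aligned for this
 * sketch. */
struct fake_vif {
	void *owner;
	unsigned char drv_priv[];
};

/* Hypothetical driver-private state stored in the tail. */
struct fake_arvif {
	int vdev_id;
};

/* Accessor style: the cast lives in exactly one helper. */
static inline struct fake_arvif *vif_to_arvif(struct fake_vif *vif)
{
	return (struct fake_arvif *)vif->drv_priv;
}

/* Allocate the core object together with room for the private tail. */
static struct fake_vif *fake_vif_alloc(void)
{
	return calloc(1, sizeof(struct fake_vif) + sizeof(struct fake_arvif));
}
```

The open-coded form is `(void *)vif->drv_priv` at each call site.  Both
compile to the same address computation; the difference is only in how
much a port has to touch when "vif" means something else.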


Surely the code compiles to the same thing, so why add a patch that
makes it more difficult for Adrian and makes the code no easier to read
for the rest of us?


Because that's the coding style used already in Linux. It's great to see
that parts of ath10k can be used also in other systems but in principle
I'm not very fond of the idea starting to reject valid upstream patches
because of driver forks.


There are lots of people trying to maintain out-of-tree or backported patches
to ath10k, and every time there is a meaningless style change, that just makes
us waste more time on useless work instead of having time to work on more
important matters.

Thanks,
Ben


I think backports project is doing it right, it's not limiting upstream
development in any way and handles all the API changes internally. Maybe
FreeBSD could do something similar?




--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



[PATCH v3 net-next 5/9] sunvnet: add memory barrier before check for tx enable

2017-02-10 Thread Shannon Nelson
In order to allow the underlying LDC and outstanding memory operations
to potentially catch up with the driver's Tx requests, add a memory
barrier before checking again for available tx descriptors.

Signed-off-by: Shannon Nelson 
---
 drivers/net/ethernet/sun/sunvnet_common.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/sun/sunvnet_common.c 
b/drivers/net/ethernet/sun/sunvnet_common.c
index 624ad65..05fe85f 100644
--- a/drivers/net/ethernet/sun/sunvnet_common.c
+++ b/drivers/net/ethernet/sun/sunvnet_common.c
@@ -1429,6 +1429,7 @@ int sunvnet_start_xmit_common(struct sk_buff *skb, struct 
net_device *dev,
dr->prod = (dr->prod + 1) & (VNET_TX_RING_SIZE - 1);
if (unlikely(vnet_tx_dring_avail(dr) < 1)) {
netif_tx_stop_queue(txq);
+   smp_rmb();
if (vnet_tx_dring_avail(dr) > VNET_TX_WAKEUP_THRESH(dr))
netif_tx_wake_queue(txq);
}
-- 
1.7.1
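
The stop-then-recheck pattern touched by the patch above can be sketched
with C11 atomics.  This is an illustration under assumptions: tx_ring,
RING_SIZE and WAKE_THRESH are hypothetical, and the sketch uses a
sequentially consistent fence, which is stronger than the smp_rmb() in
the patch; that substitution is mine, not the author's.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE   8u	/* hypothetical ring size, must be a power of two */
#define WAKE_THRESH 2u	/* hypothetical wakeup threshold */

struct tx_ring {
	atomic_uint prod;	/* advanced by the transmitting side */
	atomic_uint cons;	/* advanced by the completion side */
	atomic_bool stopped;	/* stand-in for the netdev queue state */
};

/* Free descriptors; one slot stays unused to tell full from empty. */
static unsigned int ring_avail(struct tx_ring *r)
{
	unsigned int prod = atomic_load_explicit(&r->prod, memory_order_relaxed);
	unsigned int cons = atomic_load_explicit(&r->cons, memory_order_relaxed);

	return RING_SIZE - 1 - ((prod - cons) & (RING_SIZE - 1));
}

/* Queue one descriptor and stop the queue if the ring is now full. */
static void ring_xmit(struct tx_ring *r)
{
	assert(ring_avail(r) >= 1);
	atomic_fetch_add_explicit(&r->prod, 1, memory_order_relaxed);

	if (ring_avail(r) < 1) {
		atomic_store_explicit(&r->stopped, true, memory_order_relaxed);
		/* Make sure the re-check below observes completions that
		 * raced with the stop, so the queue is not left stopped
		 * with free descriptors available. */
		atomic_thread_fence(memory_order_seq_cst);
		if (ring_avail(r) > WAKE_THRESH)
			atomic_store_explicit(&r->stopped, false,
					      memory_order_relaxed);
	}
}
```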



[PATCH v3 net-next 7/9] sunvnet: remove extra rcu_read_unlocks

2017-02-10 Thread Shannon Nelson
The RCU read lock is grabbed first thing in sunvnet_start_xmit_common()
so it always needs to be released.  This removes the conditional release
in the dropped packet error path and removes a couple of superfluous
calls in the middle of the code.

Reported-by: Bijan Mottahedeh 
Signed-off-by: Shannon Nelson 
---
 drivers/net/ethernet/sun/sunvnet_common.c |8 ++--
 1 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/sun/sunvnet_common.c 
b/drivers/net/ethernet/sun/sunvnet_common.c
index e54bc95..64671f0 100644
--- a/drivers/net/ethernet/sun/sunvnet_common.c
+++ b/drivers/net/ethernet/sun/sunvnet_common.c
@@ -1255,10 +1255,8 @@ int sunvnet_start_xmit_common(struct sk_buff *skb, 
struct net_device *dev,
 
rcu_read_lock();
port = vnet_tx_port(skb, dev);
-   if (unlikely(!port)) {
-   rcu_read_unlock();
+   if (unlikely(!port))
goto out_dropped;
-   }
 
if (skb_is_gso(skb) && skb->len > port->tsolen) {
err = vnet_handle_offloads(port, skb, vnet_tx_port);
@@ -1283,7 +1281,6 @@ int sunvnet_start_xmit_common(struct sk_buff *skb, struct 
net_device *dev,
fl4.saddr = ip_hdr(skb)->saddr;
 
rt = ip_route_output_key(dev_net(dev), &fl4);
-   rcu_read_unlock();
if (!IS_ERR(rt)) {
skb_dst_set(skb, &rt->dst);
icmp_send(skb, ICMP_DEST_UNREACH,
@@ -1443,8 +1440,7 @@ int sunvnet_start_xmit_common(struct sk_buff *skb, struct 
net_device *dev,
jiffies + VNET_CLEAN_TIMEOUT);
else if (port)
del_timer(&port->clean_timer);
-   if (port)
-   rcu_read_unlock();
+   rcu_read_unlock();
if (skb)
dev_kfree_skb(skb);
vnet_free_skbs(freeskbs);
-- 
1.7.1
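
The idea behind the patch above, taking the read lock once on entry and
releasing it at one place on the way out, can be sketched as follows.
The lock is modelled by a plain counter and every name here
(fake_read_lock, fake_port, fake_xmit) is hypothetical; the real
function has more exit paths than this simplified version.

```c
#include <assert.h>
#include <stddef.h>

static int read_lock_depth;	/* stand-in for RCU read-side nesting */

static void fake_read_lock(void)
{
	read_lock_depth++;
}

static void fake_read_unlock(void)
{
	assert(read_lock_depth > 0);	/* catches an unbalanced unlock */
	read_lock_depth--;
}

struct fake_port {
	int usable;
};

/* Returns 0 if the packet was "sent" and -1 if it was dropped. */
static int fake_xmit(struct fake_port *port)
{
	int ret = 0;

	fake_read_lock();

	/* Error paths only jump; none of them unlocks on its own. */
	if (!port || !port->usable) {
		ret = -1;
		goto out;
	}

	/* ... transmit work would happen here, still under the lock ... */

out:
	fake_read_unlock();	/* the single release point */
	return ret;
}
```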



[PATCH v3 net-next 2/9] sunvnet: remove unused variable in maybe_tx_wakeup

2017-02-10 Thread Shannon Nelson
From: Sowmini Varadhan 

The vio_dring_state *dr variable is unused in maybe_tx_wakeup().
As the comments indicate, we call maybe_tx_wakeup() whenever we
get a STOPPED LDC message on the port. If the queue is stopped,
we want to wake it up so that we will send another START message
at the next TX and trigger the consumer to drain the dring.

Signed-off-by: Sowmini Varadhan 
Signed-off-by: Shannon Nelson 
---
 drivers/net/ethernet/sun/sunvnet_common.c |6 +-
 1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/sun/sunvnet_common.c 
b/drivers/net/ethernet/sun/sunvnet_common.c
index c71f000..0f940f0 100644
--- a/drivers/net/ethernet/sun/sunvnet_common.c
+++ b/drivers/net/ethernet/sun/sunvnet_common.c
@@ -719,12 +719,8 @@ static void maybe_tx_wakeup(struct vnet_port *port)
txq = netdev_get_tx_queue(VNET_PORT_TO_NET_DEVICE(port),
  port->q_index);
__netif_tx_lock(txq, smp_processor_id());
-   if (likely(netif_tx_queue_stopped(txq))) {
-   struct vio_dring_state *dr;
-
-   dr = &port->vio.drings[VIO_DRIVER_TX_RING];
+   if (likely(netif_tx_queue_stopped(txq)))
netif_tx_wake_queue(txq);
-   }
__netif_tx_unlock(txq);
 }
 
-- 
1.7.1



[PATCH v3 net-next 8/9] ldmvsw: update and simplify version string

2017-02-10 Thread Shannon Nelson
Bump the version and simplify the version-printing code.

Signed-off-by: Shannon Nelson 
---
 drivers/net/ethernet/sun/ldmvsw.c |   14 --
 1 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/sun/ldmvsw.c 
b/drivers/net/ethernet/sun/ldmvsw.c
index 3999fb7..3ef5c08 100644
--- a/drivers/net/ethernet/sun/ldmvsw.c
+++ b/drivers/net/ethernet/sun/ldmvsw.c
@@ -41,11 +41,11 @@
 static u8 vsw_port_hwaddr[ETH_ALEN] = {0xFE, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF};
 
 #define DRV_MODULE_NAME    "ldmvsw"
-#define DRV_MODULE_VERSION "1.0"
-#define DRV_MODULE_RELDATE "Jan 15, 2016"
+#define DRV_MODULE_VERSION "1.1"
+#define DRV_MODULE_RELDATE "February 3, 2017"
 
 static char version[] =
-   DRV_MODULE_NAME ".c:v" DRV_MODULE_VERSION " (" DRV_MODULE_RELDATE ")\n";
+   DRV_MODULE_NAME " " DRV_MODULE_VERSION " (" DRV_MODULE_RELDATE ")";
 MODULE_AUTHOR("Oracle");
 MODULE_DESCRIPTION("Sun4v LDOM Virtual Switch Driver");
 MODULE_LICENSE("GPL");
@@ -322,11 +322,6 @@ static void vsw_poll_controller(struct net_device *dev)
.handshake_complete = sunvnet_handshake_complete_common,
 };
 
-static void print_version(void)
-{
-   printk_once(KERN_INFO "%s", version);
-}
-
 static const char *remote_macaddr_prop = "remote-mac-address";
 static const char *id_prop = "id";
 
@@ -342,8 +337,6 @@ static int vsw_port_probe(struct vio_dev *vdev, const 
struct vio_device_id *id)
const u64 *port_id;
u64 handle;
 
-   print_version();
-
hp = mdesc_grab();
 
rmac = mdesc_get_property(hp, vdev->mp, remote_macaddr_prop, &len);
@@ -520,6 +513,7 @@ static void vsw_cleanup(void)
 
 static int __init vsw_init(void)
 {
+   pr_info("%s\n", version);
return vio_register_driver(&vsw_port_driver);
 }
 
-- 
1.7.1
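
The new version string in the patch above relies on compile-time
concatenation of adjacent string literals.  A userspace sketch, with
printf() standing in for pr_info() and fake_init() as a hypothetical
stand-in for the module init hook:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DRV_MODULE_NAME    "ldmvsw"
#define DRV_MODULE_VERSION "1.1"
#define DRV_MODULE_RELDATE "February 3, 2017"

/* Adjacent string literals are joined by the compiler into one string. */
static const char version[] =
	DRV_MODULE_NAME " " DRV_MODULE_VERSION " (" DRV_MODULE_RELDATE ")";

/*
 * Stand-in for the init hook: the version is reported once at load time
 * and the newline is supplied by the format string, not by the version
 * string itself.  Returns the number of characters printed.
 */
static int fake_init(void)
{
	int n = printf("%s\n", version);

	assert(n > 0);
	return n;
}
```

Keeping the newline out of the string lets the same text be reused in
other contexts without trimming.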



[PATCH v3 net-next 4/9] sunvnet: add driver stats for ethtool support

2017-02-10 Thread Shannon Nelson
Since we're collecting some stats in the driver code, let's support use
of the ethtool driver stats facility in both sunvnet and ldmvsw.

Signed-off-by: Shannon Nelson 
---
 drivers/net/ethernet/sun/ldmvsw.c |   63 +
 drivers/net/ethernet/sun/sunvnet.c|   63 +
 drivers/net/ethernet/sun/sunvnet_common.c |2 +
 3 files changed, 128 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/sun/ldmvsw.c 
b/drivers/net/ethernet/sun/ldmvsw.c
index 335b876..3999fb7 100644
--- a/drivers/net/ethernet/sun/ldmvsw.c
+++ b/drivers/net/ethernet/sun/ldmvsw.c
@@ -80,11 +80,74 @@ static void vsw_set_msglevel(struct net_device *dev, u32 
value)
port->vp->msg_enable = value;
 }
 
+static const struct {
+   const char string[ETH_GSTRING_LEN];
+} ethtool_stats_keys[] = {
+   { "rx_packets" },
+   { "tx_packets" },
+   { "rx_bytes" },
+   { "tx_bytes" },
+   { "rx_errors" },
+   { "tx_errors" },
+   { "rx_dropped" },
+   { "tx_dropped" },
+   { "multicast" },
+   { "rx_length_errors" },
+   { "rx_frame_errors" },
+   { "rx_missed_errors" },
+   { "tx_carrier_errors" },
+};
+
+static int vsw_get_sset_count(struct net_device *dev, int sset)
+{
+   switch (sset) {
+   case ETH_SS_STATS:
+   return ARRAY_SIZE(ethtool_stats_keys);
+   default:
+   return -EOPNOTSUPP;
+   }
+}
+
+static void vsw_get_strings(struct net_device *dev, u32 stringset, u8 *buf)
+{
+   switch (stringset) {
+   case ETH_SS_STATS:
+   memcpy(buf, &ethtool_stats_keys, sizeof(ethtool_stats_keys));
+   break;
+   default:
+   WARN_ON(1);
+   break;
+   }
+}
+
+static void vsw_get_ethtool_stats(struct net_device *dev,
+ struct ethtool_stats *estats, u64 *data)
+{
+   int i = 0;
+
+   data[i++] = dev->stats.rx_packets;
+   data[i++] = dev->stats.tx_packets;
+   data[i++] = dev->stats.rx_bytes;
+   data[i++] = dev->stats.tx_bytes;
+   data[i++] = dev->stats.rx_errors;
+   data[i++] = dev->stats.tx_errors;
+   data[i++] = dev->stats.rx_dropped;
+   data[i++] = dev->stats.tx_dropped;
+   data[i++] = dev->stats.multicast;
+   data[i++] = dev->stats.rx_length_errors;
+   data[i++] = dev->stats.rx_frame_errors;
+   data[i++] = dev->stats.rx_missed_errors;
+   data[i++] = dev->stats.tx_carrier_errors;
+}
+
 static const struct ethtool_ops vsw_ethtool_ops = {
.get_drvinfo= vsw_get_drvinfo,
.get_msglevel   = vsw_get_msglevel,
.set_msglevel   = vsw_set_msglevel,
.get_link   = ethtool_op_get_link,
+   .get_sset_count = vsw_get_sset_count,
+   .get_strings= vsw_get_strings,
+   .get_ethtool_stats  = vsw_get_ethtool_stats,
 };
 
 static LIST_HEAD(vnet_list);
diff --git a/drivers/net/ethernet/sun/sunvnet.c 
b/drivers/net/ethernet/sun/sunvnet.c
index 4cc2571..e225b27 100644
--- a/drivers/net/ethernet/sun/sunvnet.c
+++ b/drivers/net/ethernet/sun/sunvnet.c
@@ -77,11 +77,74 @@ static void vnet_set_msglevel(struct net_device *dev, u32 
value)
vp->msg_enable = value;
 }
 
+static const struct {
+   const char string[ETH_GSTRING_LEN];
+} ethtool_stats_keys[] = {
+   { "rx_packets" },
+   { "tx_packets" },
+   { "rx_bytes" },
+   { "tx_bytes" },
+   { "rx_errors" },
+   { "tx_errors" },
+   { "rx_dropped" },
+   { "tx_dropped" },
+   { "multicast" },
+   { "rx_length_errors" },
+   { "rx_frame_errors" },
+   { "rx_missed_errors" },
+   { "tx_carrier_errors" },
+};
+
+static int vnet_get_sset_count(struct net_device *dev, int sset)
+{
+   switch (sset) {
+   case ETH_SS_STATS:
+   return ARRAY_SIZE(ethtool_stats_keys);
+   default:
+   return -EOPNOTSUPP;
+   }
+}
+
+static void vnet_get_strings(struct net_device *dev, u32 stringset, u8 *buf)
+{
+   switch (stringset) {
+   case ETH_SS_STATS:
+   memcpy(buf, &ethtool_stats_keys, sizeof(ethtool_stats_keys));
+   break;
+   default:
+   WARN_ON(1);
+   break;
+   }
+}
+
+static void vnet_get_ethtool_stats(struct net_device *dev,
+  struct ethtool_stats *estats, u64 *data)
+{
+   int i = 0;
+
+   data[i++] = dev->stats.rx_packets;
+   data[i++] = dev->stats.tx_packets;
+   data[i++] = dev->stats.rx_bytes;
+   data[i++] = dev->stats.tx_bytes;
+   data[i++] = dev->stats.rx_errors;
+   data[i++] = dev->stats.tx_errors;
+   data[i++] = dev->stats.rx_dropped;
+   data[i++] = dev->stats.tx_dropped;
+   data[i++] = dev->stats.multicast;
+   data[i++] = dev->stats.rx_length_errors;
+   data[i++] = dev->stats.rx_frame_errors;
+   data[i++] = dev->stats.rx_missed_errors;
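
The three hooks in the patch above must agree on both the number and
the order of the statistics.  A reduced userspace sketch of that
contract, with a hypothetical three-counter fake_stats structure and
GSTRING_LEN mirroring the kernel's ETH_GSTRING_LEN of 32:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GSTRING_LEN 32	/* same size as ETH_GSTRING_LEN */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical subset of the per-device counters. */
struct fake_stats {
	uint64_t rx_packets;
	uint64_t tx_packets;
	uint64_t rx_errors;
};

/* Fixed-width names, one slot per statistic. */
static const struct {
	const char string[GSTRING_LEN];
} stats_keys[] = {
	{ "rx_packets" },
	{ "tx_packets" },
	{ "rx_errors" },
};

/* How many statistics the other two hooks will produce. */
static int get_sset_count(void)
{
	return (int)ARRAY_SIZE(stats_keys);
}

/* Copy the whole name table; buf must hold count * GSTRING_LEN bytes. */
static void get_strings(uint8_t *buf)
{
	memcpy(buf, stats_keys, sizeof(stats_keys));
}

/* Fill the values in exactly the order of the name table. */
static void get_stats(const struct fake_stats *s, uint64_t *data)
{
	int i = 0;

	data[i++] = s->rx_packets;
	data[i++] = s->tx_packets;
	data[i++] = s->rx_errors;
	assert(i == get_sset_count());	/* names and values stay in step */
}
```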

[PATCH v3 net-next 3/9] sunvnet: update version and version printing

2017-02-10 Thread Shannon Nelson
There have been several changes since the first version of this code, so
we bump the version number.  While we're at it, we can simplify the
version printing a bit and drop a couple lines of code.

Signed-off-by: Shannon Nelson 
---
 drivers/net/ethernet/sun/sunvnet.c |   14 --
 1 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/sun/sunvnet.c 
b/drivers/net/ethernet/sun/sunvnet.c
index 5356a70..4cc2571 100644
--- a/drivers/net/ethernet/sun/sunvnet.c
+++ b/drivers/net/ethernet/sun/sunvnet.c
@@ -38,11 +38,11 @@
 #define VNET_TX_TIMEOUT    (5 * HZ)
 
 #define DRV_MODULE_NAME    "sunvnet"
-#define DRV_MODULE_VERSION "1.0"
-#define DRV_MODULE_RELDATE "June 25, 2007"
+#define DRV_MODULE_VERSION "2.0"
+#define DRV_MODULE_RELDATE "February 3, 2017"
 
 static char version[] =
-   DRV_MODULE_NAME ".c:v" DRV_MODULE_VERSION " (" DRV_MODULE_RELDATE ")\n";
+   DRV_MODULE_NAME " " DRV_MODULE_VERSION " (" DRV_MODULE_RELDATE ")";
 MODULE_AUTHOR("David S. Miller (da...@davemloft.net)");
 MODULE_DESCRIPTION("Sun LDOM virtual network driver");
 MODULE_LICENSE("GPL");
@@ -303,11 +303,6 @@ static void vnet_cleanup(void)
.handshake_complete = sunvnet_handshake_complete_common,
 };
 
-static void print_version(void)
-{
-   printk_once(KERN_INFO "%s", version);
-}
-
 const char *remote_macaddr_prop = "remote-mac-address";
 
 static int vnet_port_probe(struct vio_dev *vdev, const struct vio_device_id 
*id)
@@ -319,8 +314,6 @@ static int vnet_port_probe(struct vio_dev *vdev, const 
struct vio_device_id *id)
const u64 *rmac;
int len, i, err, switch_port;
 
-   print_version();
-
hp = mdesc_grab();
 
vp = vnet_find_parent(hp, vdev->mp, vdev);
@@ -446,6 +439,7 @@ static int vnet_port_remove(struct vio_dev *vdev)
 
 static int __init vnet_init(void)
 {
+   pr_info("%s\n", version);
return vio_register_driver(&vnet_port_driver);
 }
 
-- 
1.7.1



[PATCH v3 net-next 0/9] sunvnet driver updates

2017-02-10 Thread Shannon Nelson
The sunvnet ldom virtual network driver was due for some updates and
a bugfix or two.  These patches address a few items left over from
last year's make-over.

v2:
 - changed memory barrier fix to use smp_wmb
 - put NETIF_F_SG back into the advertised ldmvsw hw_features

v3:
 - the sunvnet_common module doesn't need module_init or _exit

Shannon Nelson (8):
  sunvnet: make sunvnet common code dynamically loadable
  sunvnet: update version and version printing
  sunvnet: add driver stats for ethtool support
  sunvnet: add memory barrier before check for tx enable
  sunvnet: straighten up message event handling logic
  sunvnet: remove extra rcu_read_unlocks
  ldmvsw: update and simplify version string
  ldmvsw: disable tso and gso for bridge operations

Sowmini Varadhan (1):
  sunvnet: remove unused variable in maybe_tx_wakeup

 drivers/net/ethernet/sun/Kconfig  |8 ++-
 drivers/net/ethernet/sun/ldmvsw.c |   82 +---
 drivers/net/ethernet/sun/sunvnet.c|   77 ---
 drivers/net/ethernet/sun/sunvnet_common.c |  119 ++---
 4 files changed, 200 insertions(+), 86 deletions(-)



[PATCH v3 net-next 9/9] ldmvsw: disable tso and gso for bridge operations

2017-02-10 Thread Shannon Nelson
The ldmvsw driver is specifically for supporting the ldom virtual
networking by running in the primary ldom and using the LDC to connect
the remaining ldoms to the outside world via a bridge.  With TSO and GSO
supported while connected to the bridge, things tend to misbehave, as seen
in our case by delayed packets, enough to begin triggering retransmits
and affecting overall throughput.  By turning off advertised support for
TSO and GSO we restore stable traffic flow through the bridge.

Orabug: 23293104

Signed-off-by: Shannon Nelson 
---
 drivers/net/ethernet/sun/ldmvsw.c |5 ++---
 drivers/net/ethernet/sun/sunvnet_common.c |3 ++-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/sun/ldmvsw.c 
b/drivers/net/ethernet/sun/ldmvsw.c
index 3ef5c08..8e1ecfb 100644
--- a/drivers/net/ethernet/sun/ldmvsw.c
+++ b/drivers/net/ethernet/sun/ldmvsw.c
@@ -297,8 +297,7 @@ static void vsw_poll_controller(struct net_device *dev)
dev->ethtool_ops = &vsw_ethtool_ops;
dev->watchdog_timeo = VSW_TX_TIMEOUT;
 
-   dev->hw_features = NETIF_F_TSO | NETIF_F_GSO | NETIF_F_GSO_SOFTWARE |
-  NETIF_F_HW_CSUM | NETIF_F_SG;
+   dev->hw_features = NETIF_F_HW_CSUM | NETIF_F_SG;
dev->features = dev->hw_features;
 
/* MTU range: 68 - 65535 */
@@ -383,7 +382,7 @@ static int vsw_port_probe(struct vio_dev *vdev, const 
struct vio_device_id *id)
port->vp = vp;
port->dev = dev;
port->switch_port = 1;
-   port->tso = true;
+   port->tso = false; /* no tso in vsw, misbehaves in bridge */
port->tsolen = 0;
 
/* Mark the port as belonging to ldmvsw which directs the
diff --git a/drivers/net/ethernet/sun/sunvnet_common.c 
b/drivers/net/ethernet/sun/sunvnet_common.c
index 64671f0..19b8d29 100644
--- a/drivers/net/ethernet/sun/sunvnet_common.c
+++ b/drivers/net/ethernet/sun/sunvnet_common.c
@@ -186,6 +186,7 @@ static int handle_attr_info(struct vio_driver_state *vio,
} else {
pkt->cflags &= ~VNET_LSO_IPV4_CAPAB;
pkt->ipv4_lso_maxlen = 0;
+   port->tsolen = 0;
}
 
/* for version >= 1.6, ACK packet mode we support */
@@ -1637,7 +1638,7 @@ static void vnet_port_reset(struct vnet_port *port)
del_timer(&port->clean_timer);
sunvnet_port_free_tx_bufs_common(port);
port->rmtu = 0;
-   port->tso = true;
+   port->tso = (port->vsw == 0);  /* no tso in vsw, misbehaves in bridge */
port->tsolen = 0;
 }
 
-- 
1.7.1
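
What the patch above does to the feature word and the per-port tso flag
can be sketched in userspace.  The F_* values are hypothetical (the
kernel's NETIF_F_* bits differ), and it is an assumption of this sketch
that a non-switch port keeps all four features.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for illustration only. */
#define F_SG      (1u << 0)
#define F_HW_CSUM (1u << 1)
#define F_TSO     (1u << 2)
#define F_GSO     (1u << 3)

struct fake_port {
	bool vsw;	/* true when the port belongs to the virtual switch */
	bool tso;	/* whether segmentation offload is negotiated */
};

/* Feature word to advertise; switch ports drop the segmentation bits. */
static uint32_t advertised_features(bool is_vsw)
{
	uint32_t features = F_SG | F_HW_CSUM | F_TSO | F_GSO;

	if (is_vsw)
		features &= ~(F_TSO | F_GSO);
	return features;
}

/* On reset, only non-switch ports go back to tso enabled. */
static void port_reset(struct fake_port *port)
{
	assert(port);
	port->tso = !port->vsw;
}
```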



[PATCH v3 net-next 6/9] sunvnet: straighten up message event handling logic

2017-02-10 Thread Shannon Nelson
The use of gotos for handling the incoming events made this code
harder to read and support than it should be.  This patch straightens
out and clears up the logic.

Signed-off-by: Shannon Nelson 
---
 drivers/net/ethernet/sun/sunvnet_common.c |   94 ++---
 1 files changed, 45 insertions(+), 49 deletions(-)

diff --git a/drivers/net/ethernet/sun/sunvnet_common.c 
b/drivers/net/ethernet/sun/sunvnet_common.c
index 05fe85f..e54bc95 100644
--- a/drivers/net/ethernet/sun/sunvnet_common.c
+++ b/drivers/net/ethernet/sun/sunvnet_common.c
@@ -740,41 +740,37 @@ static int vnet_event_napi(struct vnet_port *port, int 
budget)
struct vio_driver_state *vio = &port->vio;
int tx_wakeup, err;
int npkts = 0;
-   int event = (port->rx_event & LDC_EVENT_RESET);
-
-ldc_ctrl:
-   if (unlikely(event == LDC_EVENT_RESET ||
-event == LDC_EVENT_UP)) {
-   vio_link_state_change(vio, event);
-
-   if (event == LDC_EVENT_RESET) {
-   vnet_port_reset(port);
-   vio_port_up(vio);
-
-   /* If the device is running but its tx queue was
-* stopped (due to flow control), restart it.
-* This is necessary since vnet_port_reset()
-* clears the tx drings and thus we may never get
-* back a VIO_TYPE_DATA ACK packet - which is
-* the normal mechanism to restart the tx queue.
-*/
-   if (netif_running(dev))
-   maybe_tx_wakeup(port);
-   }
+
+   /* we don't expect any other bits */
+   BUG_ON(port->rx_event & ~(LDC_EVENT_DATA_READY |
+ LDC_EVENT_RESET |
+ LDC_EVENT_UP));
+
+   /* RESET takes precedence over any other event */
+   if (port->rx_event & LDC_EVENT_RESET) {
+   vio_link_state_change(vio, LDC_EVENT_RESET);
+   vnet_port_reset(port);
+   vio_port_up(vio);
+
+   /* If the device is running but its tx queue was
+* stopped (due to flow control), restart it.
+* This is necessary since vnet_port_reset()
+* clears the tx drings and thus we may never get
+* back a VIO_TYPE_DATA ACK packet - which is
+* the normal mechanism to restart the tx queue.
+*/
+   if (netif_running(dev))
+   maybe_tx_wakeup(port);
+
port->rx_event = 0;
return 0;
}
-   /* We may have multiple LDC events in rx_event. Unroll send_events() */
-   event = (port->rx_event & LDC_EVENT_UP);
-   port->rx_event &= ~(LDC_EVENT_RESET | LDC_EVENT_UP);
-   if (event == LDC_EVENT_UP)
-   goto ldc_ctrl;
-   event = port->rx_event;
-   if (!(event & LDC_EVENT_DATA_READY))
-   return 0;
 
-   /* we dont expect any other bits than RESET, UP, DATA_READY */
-   BUG_ON(event != LDC_EVENT_DATA_READY);
+   if (port->rx_event & LDC_EVENT_UP) {
+   vio_link_state_change(vio, LDC_EVENT_UP);
+   port->rx_event = 0;
+   return 0;
+   }
 
err = 0;
tx_wakeup = 0;
@@ -797,25 +793,25 @@ static int vnet_event_napi(struct vnet_port *port, int 
budget)
pkt->start_idx = vio_dring_next(dr,
port->napi_stop_idx);
pkt->end_idx = -1;
-   goto napi_resume;
-   }
-   err = ldc_read(vio->lp, &msgbuf, sizeof(msgbuf));
-   if (unlikely(err < 0)) {
-   if (err == -ECONNRESET)
-   vio_conn_reset(vio);
-   break;
+   } else {
+   err = ldc_read(vio->lp, &msgbuf, sizeof(msgbuf));
+   if (unlikely(err < 0)) {
+   if (err == -ECONNRESET)
+   vio_conn_reset(vio);
+   break;
+   }
+   if (err == 0)
+   break;
+   viodbg(DATA, "TAG [%02x:%02x:%04x:%08x]\n",
+  msgbuf.tag.type,
+  msgbuf.tag.stype,
+  msgbuf.tag.stype_env,
+  msgbuf.tag.sid);
+   err = vio_validate_sid(vio, &msgbuf.tag);
+   if (err < 0)
+   break;
}
-   if (err == 0)
-   break;
-   viodbg(DATA, "TAG [%02x:%02x:%04x:%08x]\n",
-  msgbuf.tag.type,
-  msgbuf.tag.stype,
-
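
The goto-free ordering introduced by the patch above (RESET first, then
UP, then data) can be sketched as a small classifier.  The EV_* values
and the action enum are hypothetical, and how the real function treats
an empty event word is not visible in the excerpt, so ACT_NONE is an
assumption of this sketch.

```c
#include <assert.h>

/* Hypothetical bit values; the kernel's LDC_EVENT_* constants may differ. */
#define EV_RESET      0x01u
#define EV_UP         0x02u
#define EV_DATA_READY 0x04u

enum action { ACT_NONE, ACT_RESET, ACT_LINK_UP, ACT_READ_DATA };

/*
 * Pick the action for the pending event bits.  RESET wins over
 * everything, then UP; both consume all pending bits.  Data stays
 * pending because reading may take several passes.
 */
static enum action classify_events(unsigned int *pending)
{
	/* no bits other than these three are expected */
	assert(!(*pending & ~(EV_RESET | EV_UP | EV_DATA_READY)));

	if (*pending & EV_RESET) {
		*pending = 0;
		return ACT_RESET;
	}
	if (*pending & EV_UP) {
		*pending = 0;
		return ACT_LINK_UP;
	}
	if (*pending & EV_DATA_READY)
		return ACT_READ_DATA;

	return ACT_NONE;
}
```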

[PATCH v3 net-next 1/9] sunvnet: make sunvnet common code dynamically loadable

2017-02-10 Thread Shannon Nelson
When the sunvnet_common code was split out for use by both sunvnet
and the newer ldmvsw, it was made into a static kernel library, which
limits the usefulness of sunvnet and ldmvsw as loadables, since most
of the real work is being done in the shared code.  Also, this is
simply dead code in kernels that aren't running the LDoms.

This patch makes the sunvnet_common into a dynamically loadable
module and makes sunvnet and ldmvsw dependent on sunvnet_common.

Signed-off-by: Shannon Nelson 
---
 drivers/net/ethernet/sun/Kconfig  |8 ++--
 drivers/net/ethernet/sun/sunvnet_common.c |5 +
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/sun/Kconfig b/drivers/net/ethernet/sun/Kconfig
index a4b40e3..a7d91da 100644
--- a/drivers/net/ethernet/sun/Kconfig
+++ b/drivers/net/ethernet/sun/Kconfig
@@ -70,19 +70,23 @@ config CASSINI
  
.
 
 config SUNVNET_COMMON
-   bool
+   tristate "Common routines to support Sun Virtual Networking"
depends on SUN_LDOMS
-   default y if SUN_LDOMS
+   default m if SUN_LDOMS
 
 config SUNVNET
tristate "Sun Virtual Network support"
+   default m
depends on SUN_LDOMS
+   depends on SUNVNET_COMMON
---help---
  Support for virtual network devices under Sun Logical Domains.
 
 config LDMVSW
tristate "Sun4v LDoms Virtual Switch support"
+   default m
depends on SUN_LDOMS
+   depends on SUNVNET_COMMON
---help---
  Support for virtual switch devices under Sun4v Logical Domains.
  This driver adds a network interface for every vsw-port node
diff --git a/drivers/net/ethernet/sun/sunvnet_common.c 
b/drivers/net/ethernet/sun/sunvnet_common.c
index 191c8ad..c71f000 100644
--- a/drivers/net/ethernet/sun/sunvnet_common.c
+++ b/drivers/net/ethernet/sun/sunvnet_common.c
@@ -37,6 +37,11 @@
  */
 #define VNET_MAX_RETRIES    10
 
+MODULE_AUTHOR("David S. Miller (da...@davemloft.net)");
+MODULE_DESCRIPTION("Sun LDOM virtual network support library");
+MODULE_LICENSE("GPL");
+MODULE_VERSION("1.1");
+
 static int __vnet_tx_trigger(struct vnet_port *port, u32 start);
 static void vnet_port_reset(struct vnet_port *port);
 
-- 
1.7.1



cafe8df8b9bc clashes with DSA

2017-02-10 Thread Vivien Didelot
Hi,

With latest net-next/master, both my ZII Rev B and Rev C boards crash at
boot. Bisecting found the bad guy: cafe8df8b9bc ("net: phy: Fix lack of
reference count on PHY driver"). Below is the stack trace at boot:


libphy: mdio_mux: probed
mdio_bus 0.1:00: mdio_device_register
mv88e6085 0.1:00: switch 0x352 detected: Marvell 88E6352, revision 1
libphy: /mdio-mux/mdio@1/switch0@0: probed
libphy: mdio_mux: probed
mdio_bus 0.2:00: mdio_device_register
mv88e6085 0.2:00: switch 0x352 detected: Marvell 88E6352, revision 1
mmc0: host does not support reading read-only switch, assuming 
write-enable
random: fast init done
mmc0: new high speed SDHC card at address 0001
mmcblk0: mmc0:0001 L1BN2 3.86 GiB 
 mmcblk0: p1
libphy: /mdio-mux/mdio@2/switch1@0: probed
libphy: mdio_mux: probed
mdio_bus 0.4:00: mdio_device_register
mv88e6085 0.4:00: switch 0x1a7 detected: Marvell 88E6185, revision 2
libphy: /mdio-mux/mdio@4/switch2@0: probed
DSA: switch 0 0 parsed
DSA: switch 0 1 parsed
DSA: switch 0 2 parsed
DSA: tree 0 parsed
Marvell 88E1540 !mdio-mux!mdio@1:00: attached PHY driver [Marvell 
88E1540] (mii_bus:phy_addr=!mdio-mux!mdio@1:00, irq=212)
Marvell 88E1540 !mdio-mux!mdio@1:01: attached PHY driver [Marvell 
88E1540] (mii_bus:phy_addr=!mdio-mux!mdio@1:01, irq=213)
Marvell 88E1540 !mdio-mux!mdio@1:02: attached PHY driver [Marvell 
88E1540] (mii_bus:phy_addr=!mdio-mux!mdio@1:02, irq=214)
Marvell 88E1540 !mdio-mux!mdio@2:00: attached PHY driver [Marvell 
88E1540] (mii_bus:phy_addr=!mdio-mux!mdio@2:00, irq=237)
Marvell 88E1540 !mdio-mux!mdio@2:01: attached PHY driver [Marvell 
88E1540] (mii_bus:phy_addr=!mdio-mux!mdio@2:01, irq=238)
Marvell 88E1540 !mdio-mux!mdio@2:02: attached PHY driver [Marvell 
88E1540] (mii_bus:phy_addr=!mdio-mux!mdio@2:02, irq=239)
Unable to handle kernel NULL pointer dereference at virtual address 
0008
pgd = 80004000
[0008] *pgd=
Internal error: Oops: 17 [#1] ARM
Modules linked in:
CPU: 0 PID: 687 Comm: kworker/0:2 Not tainted 4.10.0-rc6 #115
Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
Workqueue: events deferred_probe_work_func
task: 9ecda000 task.stack: 9eca2000
PC is at phy_attach_direct+0x50/0x1a8
LR is at phy_connect_direct+0x24/0x5c
pc : [<8046d688>]lr : [<8046d8e0>]psr: 600a0013
sp : 9eca3ab8  ip : 9eca3ae8  fp : 9eca3ae4
r10:   r9 : 0002  r8 : 9ed02c00
r7 :   r6 : 9ed32800  r5 :   r4 : 9ed03000
r3 :   r2 :   r1 : 9ec62e00  r0 : 
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c53c7d  Table: 9ed28059  DAC: 0051
Process kworker/0:2 (pid: 687, stack limit = 0x9eca2208)
Stack: (0x9eca3ab8 to 0x9eca4000)
3aa0:   0002 
9ed03000
3ac0:  80634064  9fff7724   9eca3b04 
9eca3ae8
3ae0: 8046d8e0 8046d644 0002 9ed32c60  9ed32800 9eca3b24 
9eca3b08
3b00: 80634174 8046d8c8  9ed32800 9f6b9000 9ec70210 9eca3b64 
9eca3b28
3b20: 806345c0 80634120 0001 0001 9ed32c60 9ec62e00 9f646e00 
9fff7724
3b40: 9f646e00 9ec70210  9f646f08 9ec70210 9ec70210 9eca3bbc 
9eca3b68
3b60: 806352a8 80634354  801f1dc8 9ec70210 9f646f04 9f646f08 
9f646f08
3b80: 9ec7001c  0002 0009 9eca3bbc 9ec62e00 9ec70010 
9ec7001c
3ba0:  9fff7450  000c 9eca3bf4 9eca3bc0 8047dabc 
80634b48
3bc0:  8123d2e4 9eca3be4 9eca3bd8 9ec62e00 81299770 81258d50 

3be0: 8123d2e4  9eca3c04 9eca3bf8 8046e9ec 8047d6fc 9eca3c3c 
9eca3c08
3c00: 8040c2bc 8046e9d0 9eca3c24 9eca3c18 80521d10 0001 8123d2e4 
9ec62e00
3c20: 9eca3c88 8129974c   9eca3c5c 9eca3c40 8040c6d4 
8040c04c
3c40:  9eca3c88 8040c628 0001 9eca3c84 9eca3c60 8040a0fc 
8040c634
3c60: 9f5e57dc 9f73ce34 8041659c 9ec62e00 9ec62e34 8123a150 9eca3cac 
9eca3c88
3c80: 8040be90 8040a094 9ec62e00 0001 9ec62e08 9ec62e00 8123a150 

3ca0: 9eca3cbc 9eca3cb0 8040c76c 8040bddc 9eca3cdc 9eca3cc0 8040b254 
8040c75c
3cc0: 81252e00 9ec62e08 9ed02840 9ec62e00 9eca3d1c 9eca3ce0 80409010 
8040b1cc
3ce0: 9eca3d10 9eca3cf0 80407800 8040749c 9eca3d14 9fff7450 9ec62e00 

3d00: 9fff730c 9ed02840 9ec62e00  9eca3d34 9eca3d20 8046eb48 
80408bfc
3d20: 9fff7450 9ed02800 9eca3d74 9eca3d38 8052647c 8046eb18 9ec9a890 
8046f210
3d40: 80803490 8125e3a8 9eca3d6c 9fff730c  9f59ac10 9ec9a890 
8046f210
3d60: 8

Re: cafe8df8b9bc clashes with DSA

2017-02-10 Thread Florian Fainelli
On 02/10/2017 09:46 AM, Vivien Didelot wrote:
> Hi,
> 
> With latest net-next/master, both my ZII Rev B and Rev C boards crash at
> boot. Bisecting found the bad guy: cafe8df8b9bc ("net: phy: Fix lack of
> reference count on PHY driver"). Below is the stack trace at boot:

Fixed in the "net" tree with:

6d9f66ac7fec2a6ccd649e5909806dfe36f1fc25 ("net: phy: Fix PHY module
checks and NULL deref in phy_attach_direct()"), applies fine to net-next
as well.
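
For context, a fault at a small address such as the one in the quoted
oops is the usual signature of reading a member through a NULL struct
pointer.  Purely as a generic illustration, and not the code of the
commit referenced above, guarding a pointer chain looks like this; all
types and names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature device model. */
struct fake_module {
	int refcount;
};

struct fake_driver {
	struct fake_module *owner;
};

struct fake_device {
	struct fake_device *parent;
	struct fake_driver *driver;
};

/*
 * Return the module that owns the parent device's driver, or NULL if
 * any link in the chain is missing.  Each pointer is checked before it
 * is followed.
 */
static struct fake_module *parent_owner(const struct fake_device *dev)
{
	if (!dev || !dev->parent || !dev->parent->driver)
		return NULL;
	return dev->parent->driver->owner;
}
```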

> 
>   
>   libphy: mdio_mux: probed
>   mdio_bus 0.1:00: mdio_device_register
>   mv88e6085 0.1:00: switch 0x352 detected: Marvell 88E6352, revision 1
>   libphy: /mdio-mux/mdio@1/switch0@0: probed
>   libphy: mdio_mux: probed
>   mdio_bus 0.2:00: mdio_device_register
>   mv88e6085 0.2:00: switch 0x352 detected: Marvell 88E6352, revision 1
>   mmc0: host does not support reading read-only switch, assuming 
> write-enable
>   random: fast init done
>   mmc0: new high speed SDHC card at address 0001
>   mmcblk0: mmc0:0001 L1BN2 3.86 GiB 
>mmcblk0: p1
>   libphy: /mdio-mux/mdio@2/switch1@0: probed
>   libphy: mdio_mux: probed
>   mdio_bus 0.4:00: mdio_device_register
>   mv88e6085 0.4:00: switch 0x1a7 detected: Marvell 88E6185, revision 2
>   libphy: /mdio-mux/mdio@4/switch2@0: probed
>   DSA: switch 0 0 parsed
>   DSA: switch 0 1 parsed
>   DSA: switch 0 2 parsed
>   DSA: tree 0 parsed
>   Marvell 88E1540 !mdio-mux!mdio@1:00: attached PHY driver [Marvell 
> 88E1540] (mii_bus:phy_addr=!mdio-mux!mdio@1:00, irq=212)
>   Marvell 88E1540 !mdio-mux!mdio@1:01: attached PHY driver [Marvell 
> 88E1540] (mii_bus:phy_addr=!mdio-mux!mdio@1:01, irq=213)
>   Marvell 88E1540 !mdio-mux!mdio@1:02: attached PHY driver [Marvell 
> 88E1540] (mii_bus:phy_addr=!mdio-mux!mdio@1:02, irq=214)
>   Marvell 88E1540 !mdio-mux!mdio@2:00: attached PHY driver [Marvell 
> 88E1540] (mii_bus:phy_addr=!mdio-mux!mdio@2:00, irq=237)
>   Marvell 88E1540 !mdio-mux!mdio@2:01: attached PHY driver [Marvell 
> 88E1540] (mii_bus:phy_addr=!mdio-mux!mdio@2:01, irq=238)
>   Marvell 88E1540 !mdio-mux!mdio@2:02: attached PHY driver [Marvell 
> 88E1540] (mii_bus:phy_addr=!mdio-mux!mdio@2:02, irq=239)
>   Unable to handle kernel NULL pointer dereference at virtual address 
> 0008
>   pgd = 80004000
>   [0008] *pgd=
>   Internal error: Oops: 17 [#1] ARM
>   Modules linked in:
>   CPU: 0 PID: 687 Comm: kworker/0:2 Not tainted 4.10.0-rc6 #115
>   Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
>   Workqueue: events deferred_probe_work_func
>   task: 9ecda000 task.stack: 9eca2000
>   PC is at phy_attach_direct+0x50/0x1a8
>   LR is at phy_connect_direct+0x24/0x5c
>   pc : [<8046d688>]lr : [<8046d8e0>]psr: 600a0013
>   sp : 9eca3ab8  ip : 9eca3ae8  fp : 9eca3ae4
>   r10:   r9 : 0002  r8 : 9ed02c00
>   r7 :   r6 : 9ed32800  r5 :   r4 : 9ed03000
>   r3 :   r2 :   r1 : 9ec62e00  r0 : 
>   Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
>   Control: 10c53c7d  Table: 9ed28059  DAC: 0051
>   Process kworker/0:2 (pid: 687, stack limit = 0x9eca2208)
>   Stack: (0x9eca3ab8 to 0x9eca4000)
>   3aa0:   0002 
> 9ed03000
>   3ac0:  80634064  9fff7724   9eca3b04 
> 9eca3ae8
>   3ae0: 8046d8e0 8046d644 0002 9ed32c60  9ed32800 9eca3b24 
> 9eca3b08
>   3b00: 80634174 8046d8c8  9ed32800 9f6b9000 9ec70210 9eca3b64 
> 9eca3b28
>   3b20: 806345c0 80634120 0001 0001 9ed32c60 9ec62e00 9f646e00 
> 9fff7724
>   3b40: 9f646e00 9ec70210  9f646f08 9ec70210 9ec70210 9eca3bbc 
> 9eca3b68
>   3b60: 806352a8 80634354  801f1dc8 9ec70210 9f646f04 9f646f08 
> 9f646f08
>   3b80: 9ec7001c  0002 0009 9eca3bbc 9ec62e00 9ec70010 
> 9ec7001c
>   3ba0:  9fff7450  000c 9eca3bf4 9eca3bc0 8047dabc 
> 80634b48
>   3bc0:  8123d2e4 9eca3be4 9eca3bd8 9ec62e00 81299770 81258d50 
> 
>   3be0: 8123d2e4  9eca3c04 9eca3bf8 8046e9ec 8047d6fc 9eca3c3c 
> 9eca3c08
>   3c00: 8040c2bc 8046e9d0 9eca3c24 9eca3c18 80521d10 0001 8123d2e4 
> 9ec62e00
>   3c20: 9eca3c88 8129974c   9eca3c5c 9eca3c40 8040c6d4 
> 8040c04c
>   3c40:  9eca3c88 8040c628 0001 9eca3c84 9eca3c60 8040a0fc 
> 8040c634
>   3c60: 9f5e57dc 9f73ce34 8041659c 9ec62e00 9ec62e34 8123a150 9eca3cac 
> 9eca3c88
>   3c80: 8040be90 8040a094 9ec62e00 0001 9ec62e08 9ec62e00 8123a150 
> 
>   3ca0: 9eca3cbc 9eca3cb0 8040c76c 8040bddc 9eca3cdc 9eca3cc0 8040b254 
> 8040c75c
>   3cc0: 81252e00 9ec62e08 9ed02840 9ec62e00 9eca3d1c 9eca3ce0 80409010 
> 8040b1cc
>   3ce0: 9eca3d10 9eca3cf0 80407800 8040749c 9eca3d14 9fff7450 9e

Re: [PATCH net-next 4/4] net/sched: cls_bpf: Use skip flags to reflect HW offload status

2017-02-10 Thread Jakub Kicinski
On Fri, 10 Feb 2017 18:33:13 +0200, Or Gerlitz wrote:
> On Fri, Feb 10, 2017 at 3:22 AM, Jakub Kicinski  wrote:
> > On Thu,  9 Feb 2017 16:18:08 +0200, Or Gerlitz wrote:  
> >> Currently there is no way of querying whether a filter is
> >> offloaded to HW or not when using both policy (no flag).
> >>
> >> Reuse the skip flags to show the insertion status by setting
> >> the skip_hw flag in case the filter wasn't offloaded.
> >>
> >> Signed-off-by: Or Gerlitz 
> >> ---
> >>  net/sched/cls_bpf.c | 17 +
> >>  1 file changed, 13 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
> >> index d9c9701..91ba90d 100644
> >> --- a/net/sched/cls_bpf.c
> >> +++ b/net/sched/cls_bpf.c
> >> @@ -185,14 +185,23 @@ static int cls_bpf_offload(struct tcf_proto *tp, struct cls_bpf_prog *prog,
> >>  			return -EINVAL;
> >>  		}
> >>  	} else {
> >> -		if (!tc_should_offload(dev, tp, prog->gen_flags))
> >> -			return skip_sw ? -EINVAL : 0;
> >> +		if (!tc_should_offload(dev, tp, prog->gen_flags)) {
> >> +			if (tc_skip_sw(prog->gen_flags))
> >> +				return -EINVAL;
> >> +			prog->gen_flags |= TCA_CLS_FLAGS_SKIP_HW;
> >> +			return 0;
> >> +		}
> >>  		cmd = TC_CLSBPF_ADD;
> >>  	}
> >>
> >>  	ret = cls_bpf_offload_cmd(tp, obj, cmd);
> >> -	if (ret)
> >> -		return skip_sw ? ret : 0;
> >> +
> >> +	if (ret) {
> >> +		if (skip_sw)
> >> +			return ret;
> >> +		prog->gen_flags |= TCA_CLS_FLAGS_SKIP_HW;
> >> +		return 0;
> >> +	}
> >>
> >>  	obj->offloaded = true;
> >
> > In cls_bpf we do store information about whether program is offloaded or
> > not already (see the @offloaded member).  Could we simplify the code
> > thanks to this?  
> 
> yeah, I felt like I don't fully understand the role of the offloaded
> member. As I wrote, this patch is compile tested only, I will be happy
> if you can test it and post a better version here. I don't think we need
> to add/change the flags semantics, see next

The @offloaded member just tells us whether the program is offloaded.
We need it because unlike u32 and flower (I think) we have explicit
ADD and REPLACE.  Other filters just always do REPLACE.  I assume the
driver keeps track if there is already an associated rule in that case?

> > I'm obviously all for reporting whether tc objects are offloaded or not
> > but let me ask perhaps the silly question of why reuse the SKIP_HW flag?
> > We don't have to worry about flag bits running out, could it be clearer
> > to users to report whether object is present in HW using a new flag?  Or
> > even two flags for present/non-present so user doesn't have to ponder
> > what no flag means (old kernel or not offloaded?). I don't really mind
> > either way I'm just wondering what the motivation was and maybe how
> > others feel.  
> 
> yeah, the flags are a bit confusing to some people, but it's all about
> polarity..
> 
> when the flags were introduced, a few of us were in favor of "positive"
> polarity, that is with possibly three values: "sw only", "hw only" and
> "both", but the JJJ (Jiri/John/Jamal) consensus was to pick a
> "negative" polarity of "skip sw", "skip hw" and "default", which means
> the filter is in SW and possibly in HW. I think we can live with those
> semantics, and this small series just helps for the default case by
> allowing user-space to know if the filter was offloaded using the
> existing fields.
> 
> I am not in favor of making this more complex...

I'm 100% with you.  To restate, my proposal was to leave the SKIP_* flags
with all their existing semantics and complexity, and to add new flags for
reporting to user space whether something got offloaded.  My opinion
is that it would make things simpler, but I'm happy with your version
if no one else thinks this way.

To spell it out, with this patchset we would get the following semantics
of flags in dumps:
 - no flags - offloaded || old kernel;
 - skip_hw - not offloaded (either on user request || no flag
   was set but offload could not happen);
 - skip_sw - offload only on explicit request.

What we could do if we want to add flags would be:
 - no flags - old kernel;
 - no_offload - not offloaded;
 - skip_hw | no_offload - not offloaded on explicit user request;
 - offload - offloaded opportunistically;
 - skip_sw | offload - offloaded on explicit request. 

Because of distributors' kernels and various other backports, checking
whether a kernel is old is a pain in practice :|
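
The two flag schemes listed above can be compared with a small user-space sketch. This is only a model of how a dump reader might interpret the flags, not kernel code: the TCA_CLS_FLAGS_SKIP_* values mirror the existing uapi bits, while HYP_FLAG_OFFLOAD, HYP_FLAG_NO_OFFLOAD and both helper functions are hypothetical names made up for this illustration.

```c
#include <stdint.h>

/* Existing uapi flag bits. */
#define TCA_CLS_FLAGS_SKIP_HW (1u << 0)
#define TCA_CLS_FLAGS_SKIP_SW (1u << 1)

/* Hypothetical reporting flags; names and bit positions are assumptions. */
#define HYP_FLAG_OFFLOAD      (1u << 2)
#define HYP_FLAG_NO_OFFLOAD   (1u << 3)

enum offload_state {
	OFFLOAD_UNKNOWN,	/* cannot tell from the dump alone */
	OFFLOAD_NO,
	OFFLOAD_YES,
};

/* Scheme 1: skip_hw is reused to report "not offloaded". */
enum offload_state state_with_reused_skip_hw(uint32_t flags)
{
	if (flags & TCA_CLS_FLAGS_SKIP_HW)
		return OFFLOAD_NO;	/* user request or failed offload */
	if (flags & TCA_CLS_FLAGS_SKIP_SW)
		return OFFLOAD_YES;	/* offload only, on explicit request */
	return OFFLOAD_UNKNOWN;		/* offloaded || old kernel */
}

/* Scheme 2: dedicated reporting flags, SKIP_* semantics untouched. */
enum offload_state state_with_dedicated_flags(uint32_t flags)
{
	if (flags & HYP_FLAG_OFFLOAD)
		return OFFLOAD_YES;
	if (flags & HYP_FLAG_NO_OFFLOAD)
		return OFFLOAD_NO;
	return OFFLOAD_UNKNOWN;		/* no reporting flag: old kernel */
}
```

In this model the only ambiguous case under scheme 2 is a dump from an old kernel, whereas under scheme 1 an empty flag word on a new kernel is indistinguishable from one on an old kernel.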

> thanks for the feedback and review

I will try to test and provide a simplified version if possible soon
(sorry for taking long, I got relocated and I'm still sorting out my
setup).


Re: [PATCH v3] can: Fix kernel panic at security_sock_rcv_skb

2017-02-10 Thread Oliver Hartkopp

On 02/10/2017 04:16 PM, David Miller wrote:

> From: Oliver Hartkopp
> Date: Fri, 10 Feb 2017 09:28:57 +0100
>
>> can you please check whether this upstream commit
>>
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f1712c73714088a7252d276a57126d56c7d37e64
>>
>> really was queued up for -stable?
>
> You never need to ask me this question, it is presented always, here:
>
> http://patchwork.ozlabs.org/bundle/davem/stable/?submitter=&state=*&q=&archive=
>
> And it is indeed there.



Actually I was only aware of

http://patchwork.ozlabs.org/project/netdev/list/?state=*

which I checked before asking (of course). But that list gives no way to
figure out whether a patch is queued for -stable.


I'll put your stable queue URL into my bookmarks.

Thanks,
Oliver



Re: cafe8df8b9bc clashes with DSA

2017-02-10 Thread Vivien Didelot
Hi Florian,

Florian Fainelli  writes:

> Fixed in the "net" tree with:
>
> 6d9f66ac7fec2a6ccd649e5909806dfe36f1fc25 ("net: phy: Fix PHY module
> checks and NULL deref in phy_attach_direct()"), applies fine to net-next
> as well.

Correct, this fixes my setup. Shouldn't this be submitted to net-next as
well then?

Thanks,

Vivien


Re: net/packet: use-after-free in packet_rcv_fanout

2017-02-10 Thread Cong Wang
On Thu, Feb 9, 2017 at 7:23 PM, Eric Dumazet  wrote:
> On Thu, 2017-02-09 at 19:19 -0800, Eric Dumazet wrote:
>
>> More likely the bug is in fanout_add(), with a buggy sequence in error
>> case, and not correct locking.
>>
>> kfree(po->rollover);
>> po->rollover = NULL;
>>
>> Two cpus entering fanout_add() (using the same af_packet socket,
>> syzkaller courtesy...) might both see po->fanout being NULL.
>>
>> Then they grab the mutex.  Too late...
>
> Patch could be :
>

For me, clearly the data structure that use-after-free'd is struct sock
rather than struct packet_rollover.


Re: net/packet: use-after-free in packet_rcv_fanout

2017-02-10 Thread Eric Dumazet
On Fri, 2017-02-10 at 09:49 -0800, Cong Wang wrote:
> On Thu, Feb 9, 2017 at 7:23 PM, Eric Dumazet  wrote:
> > On Thu, 2017-02-09 at 19:19 -0800, Eric Dumazet wrote:
> >
> >> More likely the bug is in fanout_add(), with a buggy sequence in error
> >> case, and not correct locking.
> >>
> >> kfree(po->rollover);
> >> po->rollover = NULL;
> >>
> >> Two cpus entering fanout_add() (using the same af_packet socket,
> >> syzkaller courtesy...) might both see po->fanout being NULL.
> >>
> >> Then they grab the mutex.  Too late...
> >
> > Patch could be :
> >
> 
> For me, clearly the data structure that use-after-free'd is struct sock
> rather than struct packet_rollover.

Fine. But your patch makes absolutely no sense.




Re: cafe8df8b9bc clashes with DSA

2017-02-10 Thread Florian Fainelli
On 02/10/2017 09:55 AM, Vivien Didelot wrote:
> Hi Florian,
> 
> Florian Fainelli  writes:
> 
>> Fixed in the "net" tree with:
>>
>> 6d9f66ac7fec2a6ccd649e5909806dfe36f1fc25 ("net: phy: Fix PHY module
>> checks and NULL deref in phy_attach_direct()"), applies fine to net-next
>> as well.
> 
> Correct, this fixes my setup. Shouldn't this be submitted to net-next as
> well then?

I indicated in the patch that this was also applicable to "net-next"
after David merged "net" into "net-next" [1]. We are getting close to
releasing 4.10 now, so I am assuming that David will close net-next
soon, and that usually means merging "net" back into "net-next", which
will get us the fix. David, is that correct?

[1]: http://patchwork.ozlabs.org/patch/725923/

> 
> Thanks,
> 
> Vivien
> 


-- 
Florian


Re: [PATCH v4 0/3] Miscellaneous fixes for BPF (perf tree)

2017-02-10 Thread Arnaldo Carvalho de Melo
On Wed, Feb 08, 2017 at 09:27:41PM +0100, Mickaël Salaün wrote:
> This series brings some fixes and small improvements to the BPF samples.
> 
> This is intended for the perf tree and applies on 7a5980f9c006 ("tools lib bpf:
> Add missing header to the library").

Wang, are you ok with this series? Joe?

- Arnaldo
 
> Changes since v3:
> * remove applied patch 1/5
> * remove patch 2/5 on bpf_load_program() as requested by Wang Nan
> 
> Changes since v2:
> * add this cover letter
> 
> Changes since v1:
> * exclude patches not intended for the perf tree
> 
> Regards,
> 
> Mickaël Salaün (3):
>   samples/bpf: Ignore already processed ELF sections
>   samples/bpf: Reset global variables
>   samples/bpf: Add missing header
> 
>  samples/bpf/bpf_load.c | 7 +++
>  samples/bpf/tracex5_kern.c | 1 +
>  2 files changed, 8 insertions(+)
> 
> -- 
> 2.11.0


Re: net/packet: use-after-free in packet_rcv_fanout

2017-02-10 Thread Cong Wang
On Thu, Feb 9, 2017 at 7:33 PM, Sowmini Varadhan
 wrote:
> On (02/09/17 19:19), Eric Dumazet wrote:
>>
>> More likely the bug is in fanout_add(), with a buggy sequence in error
>> case, and not correct locking.
>>
>> kfree(po->rollover);
>> po->rollover = NULL;
>>
>> Two cpus entering fanout_add() (using the same af_packet socket,
>> syzkaller courtesy...) might both see po->fanout being NULL.
>>
>> Then they grab the mutex.  Too late...
>
> I'm not sure I follow- aiui the panic was in accessing the
> sk_receive_queue.lock in a socket that had been closed earlier. I think
> the assumption is that rcu_read_lock_bh in __dev_queue_xmit (and
> rcu_read_lock in dev_queue_xmit_nit?) should make sure that the nit
> packet delivery can be done safely, and the synchronize_net in
> packet_release() makes sure that the Tx paths are quiesced before freeing
> the socket.  What is the race-hole here? Does it have to do with the
> _bh and softirq context, somehow?

My understanding of the race here is that packet_release() doesn't
wait for in-flight packets correctly, so an in-flight packet can still
refer to the struct sock that is being released.

This could happen because struct packet_fanout is refcounted: it is
still there when this is not the last sock referring to it, and therefore
the callback packet_rcv_fanout() is not removed yet. When packet_release()
tries to remove the pointer to struct sock from f->arr[i] in
__fanout_unlink(), an in-flight packet could race on f->arr[i]:

po = pkt_sk(f->arr[idx]);

Of course, the fix may not be as easy as just adding a synchronize_net(),
perhaps we need the spinlock too in fanout_demux_rollover().

At least I believe this explains the crash Dmitry reported.
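
The window described above can be modelled in user space. This is only a sketch of the hypothesis, under the assumption that removal works by moving the last array entry into the vacated slot. All names here (sock_model, fanout_model and the three helpers) are hypothetical stand-ins, and the receive path is split in two purely so that an unlink can be placed between picking the index and dereferencing it. It does not model RCU or synchronize_net(), so it says nothing about whether the existing grace period already closes the window.

```c
#include <stddef.h>

#define FANOUT_MODEL_MAX 4

/* Stand-in for struct sock; 'released' marks what would be freed memory. */
struct sock_model {
	int id;
	int released;
};

/* Stand-in for struct packet_fanout: member array plus member count. */
struct fanout_model {
	struct sock_model *arr[FANOUT_MODEL_MAX];
	unsigned int num_members;
};

/* Receive path, step 1: choose a slot from the current member count. */
unsigned int demux_pick_index(const struct fanout_model *f, unsigned int hash)
{
	return hash % f->num_members;
}

/* Receive path, step 2: dereference the slot chosen earlier. */
struct sock_model *demux_deref(const struct fanout_model *f, unsigned int idx)
{
	return f->arr[idx];
}

/*
 * Release path: move the last entry into the vacated slot and shrink the
 * count. The old tail slot is not cleared, so it may keep a stale pointer.
 */
void fanout_unlink_model(struct fanout_model *f, struct sock_model *sk)
{
	unsigned int i;

	for (i = 0; i < f->num_members; i++)
		if (f->arr[i] == sk)
			break;
	if (i == f->num_members)
		return;			/* not a member, nothing to do */

	f->arr[i] = f->arr[f->num_members - 1];
	f->num_members--;
	sk->released = 1;		/* stands in for freeing the socket */
}
```

If a reader picks the index of the last member, the release path then unlinks that member, and the reader only afterwards dereferences the slot, the reader ends up holding the released entry. That is the interleaving the hypothesis relies on.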


Re: [PATCH] [net-next] ARM: orion: fix PHYLIB dependency

2017-02-10 Thread Florian Fainelli
On 02/10/2017 12:20 AM, Arnd Bergmann wrote:
> On Thu, Feb 9, 2017 at 7:22 PM, Florian Fainelli  wrote:
>> On 02/09/2017 07:08 AM, Arnd Bergmann wrote:
>> I disabled CONFIG_NETDEVICES to force CONFIG_PHY not to be set here, and
>> I was not able to reproduce this, what am I missing?
> 
> In the ARMv5 allmodconfig build, this fails because CONFIG_PHY=m, and
> we can't call into it. You could use IS_BUILTIN instead of IS_ENABLED in
> the header as a oneline workaround, but I think that would be more confusing
> to real users that try to use CONFIG_PHY=m without realizing why they lose
> access to their switch.

I see, this patch should also help fix this:

http://patchwork.ozlabs.org/patch/726381/
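
To make the IS_BUILTIN versus IS_ENABLED distinction from the quoted text concrete, here is a user-space re-creation of the kconfig test macros. It is a sketch, not the kernel header: the helper macro names are invented for this example, and CONFIG_PHY_MODULE is defined by hand to mimic what Kconfig generates for an option set to =m.

```c
/* What Kconfig would emit for CONFIG_PHY=m (defined by hand here). */
#define CONFIG_PHY_MODULE 1

/*
 * Helper trick: if 'option' expands to 1, the pasted placeholder turns
 * into "0," and shifts a 1 into the second argument position; otherwise
 * the junk token stays in the first argument and the second one is 0.
 */
#define KC_ARG_PLACEHOLDER_1 0,
#define KC_TAKE_SECOND_ARG(ignored, val, ...) val
#define KC_IS_DEFINED(x) KC_IS_DEFINED_(x)
#define KC_IS_DEFINED_(val) KC_IS_DEFINED__(KC_ARG_PLACEHOLDER_##val)
#define KC_IS_DEFINED__(arg1_or_junk) KC_TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)

#define IS_BUILTIN(option) KC_IS_DEFINED(option)
#define IS_MODULE(option)  KC_IS_DEFINED(option##_MODULE)
#define IS_ENABLED(option) (IS_BUILTIN(option) || IS_MODULE(option))

/* Header guard that is also true for =m: built-in callers would try to link. */
int guard_with_is_enabled(void)
{
	return IS_ENABLED(CONFIG_PHY);
}

/* Header guard that is false for =m: built-in callers would get the stubs. */
int guard_with_is_builtin(void)
{
	return IS_BUILTIN(CONFIG_PHY);
}
```

With the option modular, the IS_ENABLED guard evaluates to true and the IS_BUILTIN guard to false, which is why the one-line workaround would build but silently fall back to stubs.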

-- 
Florian


Re: net/packet: use-after-free in packet_rcv_fanout

2017-02-10 Thread Cong Wang
On Fri, Feb 10, 2017 at 9:59 AM, Eric Dumazet  wrote:
> On Fri, 2017-02-10 at 09:49 -0800, Cong Wang wrote:
>> On Thu, Feb 9, 2017 at 7:23 PM, Eric Dumazet  wrote:
>> > On Thu, 2017-02-09 at 19:19 -0800, Eric Dumazet wrote:
>> >
>> >> More likely the bug is in fanout_add(), with a buggy sequence in error
>> >> case, and not correct locking.
>> >>
>> >> kfree(po->rollover);
>> >> po->rollover = NULL;
>> >>
>> >> Two cpus entering fanout_add() (using the same af_packet socket,
>> >> syzkaller courtesy...) might both see po->fanout being NULL.
>> >>
>> >> Then they grab the mutex.  Too late...
>> >
>> > Patch could be :
>> >
>>
>> For me, clearly the data structure that use-after-free'd is struct sock
>> rather than struct packet_rollover.
>
> Fine. But your patch makes absolutely no sense.

I don't have to give a 100% correct patch to prove my explanation
of the crash. At least it makes more sense than yours...


Re: net/packet: use-after-free in packet_rcv_fanout

2017-02-10 Thread Eric Dumazet
On Fri, 2017-02-10 at 09:59 -0800, Eric Dumazet wrote:
> On Fri, 2017-02-10 at 09:49 -0800, Cong Wang wrote:
> > On Thu, Feb 9, 2017 at 7:23 PM, Eric Dumazet  wrote:
> > > On Thu, 2017-02-09 at 19:19 -0800, Eric Dumazet wrote:
> > >
> > >> More likely the bug is in fanout_add(), with a buggy sequence in error
> > >> case, and not correct locking.
> > >>
> > >> kfree(po->rollover);
> > >> po->rollover = NULL;
> > >>
> > >> Two cpus entering fanout_add() (using the same af_packet socket,
> > >> syzkaller courtesy...) might both see po->fanout being NULL.
> > >>
> > >> Then they grab the mutex.  Too late...
> > >
> > > Patch could be :
> > >
> > 
> > For me, clearly the data structure that use-after-free'd is struct sock
> > rather than struct packet_rollover.
> 
> Fine. But your patch makes absolutely no sense.

At least Anoob's patch is a step in the right direction ;)

https://patchwork.ozlabs.org/patch/726532/





Re: [PATCH RFC net] net/mlx5e: Add preemption enable/disable around TC statistics upcall

2017-02-10 Thread Jakub Kicinski
On Fri, 10 Feb 2017 18:21:25 +0200, Or Gerlitz wrote:
> On Fri, Feb 10, 2017 at 3:34 AM, Jakub Kicinski wrote:
> > On Thu,  9 Feb 2017 17:38:43 +0200, Or Gerlitz wrote:  
> >> Running with CONFIG_PREEMPT set, I get a
> >>
> >> BUG: using smp_processor_id() in preemptible [] code: tc/3793
> >>
> >> assertion from the TC action (mirred) stats_update callback, when doing
> >>
> >>   _bstats_cpu_update(this_cpu_ptr(a->cpu_bstats), bytes, packets)
> >>
> >> As done by commit 66860be "nfp: bpf: allow offloaded filters to update stats",
> >> disabling/enabling preemption around the TC upcall solves that.
> >>
> >> Fixes: aad7e08d39bd ('net/mlx5e: Hardware offloaded flower filter statistics support')
> >> Signed-off-by: Or Gerlitz 
> >> ---
> >>
> >> I marked it as RFC, since I wasn't fully sure on the nature of the
> >> problem, nor if this is the direction we should take to the fix.  
> 
> > I think it's the right fix  
> 
> Do you understand the problem? what's wrong with the call done in the TC
> action code w.r.t preemption?
> 
> does it make sense to do this (say) 100K times/sec?

TC actions have per-cpu stats, referencing them has to be done with
preemption disabled.  Let's CC Jamal and Cong - maybe there are some
more clever things we could do here?  The situation in a nutshell is
that the offload drivers read the stats from HW and want to write them
back to the TC action stats.  The writeback happens in process context
when user requests stats dump (potentially for multiple actions but we
currently would just iterate over all actions in driver code).


Re: [PATCH v4] net: ethernet: faraday: To support device tree usage.

2017-02-10 Thread Rob Herring
On Wed, Feb 8, 2017 at 5:59 AM, Greentime Hu  wrote:
> On Sat, Jan 28, 2017 at 6:17 AM, Rob Herring  wrote:
>>
>> On Wed, Jan 25, 2017 at 10:09:20PM +0100, Arnd Bergmann wrote:
>> > On Wed, Jan 25, 2017 at 6:34 PM, David Miller  wrote:
>> > > From: Greentime Hu 
>> > > Date: Tue, 24 Jan 2017 16:46:14 +0800
>> > >> We also use the same binding document to describe the same faraday
>> > >> ethernet controller and add faraday to vendor-prefixes.txt.
>> > >
>> > > Why are you renaming the MOXA binding file instead of adding a
>> > > completely new one for faraday?  The MOXA one should stick around,
>> > > I don't see a justification for removing it.
>> >
>> > This was my suggestion, basically fixing the name of the existing
>> > binding, which was accidentally named after one of the users rather
>> > than the company that did the hardware.
>> >
>> > We can't change the compatible string, but I'd much prefer having only
>> > one binding file for this device rather than two separate ones that
>> > could possibly become incompatible in case we add new properties to
>> > them. If there is only one of them, naming it according to the
>> > hardware design is the general policy.
>> >
>> > Note that we currently have two separate device drivers, but that is
>> > more a historic artifact, and if we ever get around to merging them
>> > into one driver, that should not impact the binding.
>>
>> The change is fine with me, but the subject and commit message need some
>> work.
>
> Hi, Rob:
>
> Would you please advise me of the proper subject and commit messages?

Split the binding to a separate commit and summarize the email
discussion here. For a subject, something like this:

"dt-bindings: net: generalize moxart-mac to support all faraday based ftmac IP"

Rob


Re: cafe8df8b9bc clashes with DSA

2017-02-10 Thread Florian Fainelli
On 02/10/2017 10:15 AM, Vivien Didelot wrote:
> Hi Andrew,
> 
> Andrew Lunn  writes:
> 
>> David will at some point merge net into net-next.
> 
> Yes I know that, I just wasn't sure if having such a crash in net-next was
> tolerated or not. Cherry-picking 6d9f66ac7fec does the job on my side.
> 
>> Until then, you can work around the issue by enabling the PHY drivers
>> for your hardware. You are also likely to gain a few nice features,
>> like PHY interrupts rather than polling, maybe some temperature
>> sensors, PHY statistics, etc...
> 
> Hum I have CONFIG_MARVELL_PHY enabled, am I missing something?

If you have fixed PHYs they'll use Generic PHY, suddenly the dungeon
collapses, you die.
-- 
Florian


Re: [patch net-next 00/10] mlxsw: Offload MC flood for unregister MC

2017-02-10 Thread David Miller
From: Jiri Pirko 
Date: Thu,  9 Feb 2017 14:54:39 +0100

> From: Jiri Pirko 
> 
> Nogah says:
> 
> When multicast is enabled, the Linux bridge floods unregistered multicast
> packets only to ports connected to a multicast router. Devices capable of
> offloading the Linux bridge need to be made aware of such ports, for
> proper flooding behavior.
> On the other hand, when multicast is disabled, such packets should be
> flooded to all ports. This patchset aims to fix that, by offloading
> the multicast state and the list of multicast router ports.
> 
> The first 3 patches add switchdev attributes to offload this data.
> The rest of the patchset adds the implementation for handling this data in
> the mlxsw driver.
> 
> The effects this data has on the MDB (namely, when the multicast is
> disabled the MDB should be considered as invalid, and when it is enabled, a
> packet that is flooded by it should also be flooded to the multicast
> routers ports) is subject of future work.
> 
> Testing of this patchset included:
> Sending 3 mc packet streams, LL, registered and unregistered, and checking
> that they reached only the ports that should have received them.
> The configs were:
> mc disabled, mc without mc router ports and mc with fixed router port.
> It was checked for vlan aware bridge, vlan unaware bridge and vlan unaware
> bridge with another vlan unaware bridge on the same machine.

Series applied, thanks.


Re: [RFC PATCH net] bpf: introduce BPF_F_ALLOW_OVERRIDE flag

2017-02-10 Thread Alexei Starovoitov
On Thu, Feb 09, 2017 at 10:59:23AM -0800, Alexei Starovoitov wrote:
> Andy,
> does it all make sense?

Andy, ping.



Re: cafe8df8b9bc clashes with DSA

2017-02-10 Thread David Miller
From: Vivien Didelot 
Date: Fri, 10 Feb 2017 12:55:44 -0500

> Hi Florian,
> 
> Florian Fainelli  writes:
> 
>> Fixed in the "net" tree with:
>>
>> 6d9f66ac7fec2a6ccd649e5909806dfe36f1fc25 ("net: phy: Fix PHY module
>> checks and NULL deref in phy_attach_direct()"), applies fine to net-next
>> as well.
> 
> Correct, this fixes my setup. Shouldn't this be submitted to net-next as
> well then?

It will propagate there the next time I merge to Linus and then merge
net into net-next.

