[PATCH 1/1] ixgbe: force to synchronize reporting "link on" and getting speed and duplex

2015-12-22 Thread zyjzyj2000
From: Zhu Yanjun 

On the X540 NIC, there is a delay between reporting "link on" and
obtaining the speed and duplex. For a bonding driver in 802.3ad mode,
a long enough delay causes trouble: the bonding driver changes the
state of the slave device to up while the speed and duplex of the slave
device cannot yet be read. Afterwards the bonding driver has no chance
to get the speed and duplex of the slave device, both of which are
important to a bonding driver in 802.3ad mode.

On the 82599_SFP NIC and other kinds of NICs, this problem does
not exist. As such, it is necessary for the X540 to report "link on"
only when the link speed is not IXGBE_LINK_SPEED_UNKNOWN.
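
As a rough illustration of the rule described above (not the driver's
code), the decision can be written as a small predicate. The enum names
below are hypothetical stand-ins for the driver's MAC types and link
speeds:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's MAC types and link speeds. */
enum mac_type { MAC_82599, MAC_X540 };
enum link_speed { LINK_SPEED_UNKNOWN = 0, LINK_SPEED_1GB, LINK_SPEED_10GB };

/*
 * Return true when carrier may be reported as up. For X540 this waits
 * until the link speed is known; other MAC types report immediately.
 */
static bool may_report_carrier_on(enum mac_type mac, enum link_speed speed)
{
	if (mac == MAC_X540)
		return speed != LINK_SPEED_UNKNOWN;
	return true;
}
```

With a predicate of this shape, a consumer such as bonding would never
see "link on" from an X540 port while the speed is still unknown.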

Signed-off-by: Zhu Yanjun 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index aed8d02..cb9d310 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -6479,7 +6479,21 @@ static void ixgbe_watchdog_link_is_up(struct ixgbe_adapter *adapter)
   (flow_rx ? "RX" :
   (flow_tx ? "TX" : "None"))));
 
-   netif_carrier_on(netdev);
+   /*
+* In X540 NIC, there is a time span between reporting "link on"
+* and getting the speed and duplex. To a bonding driver in 802.3ad
+* mode, this time span will make it not work well if the time span
+* is big enough. To 82599_SFP NIC and other kinds of NICs, this
+* problem does not exist. As such, it is better for X540 to report
+* "link on" when the link speed is not IXGBE_LINK_SPEED_UNKNOWN.
+*/
+   if ((hw->mac.type == ixgbe_mac_X540) &&
+   (link_speed != IXGBE_LINK_SPEED_UNKNOWN)) {
+   netif_carrier_on(netdev);
+   } else {
+   netif_carrier_on(netdev);
+   }
+
ixgbe_check_vf_rate_limit(adapter);
 
/* enable transmits */
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2 net-next 4/4] cxgb4: Use napi_complete_done() api in napi handler

2015-12-22 Thread Hariprasad Shenai
Signed-off-by: Hariprasad Shenai 
---
 drivers/net/ethernet/chelsio/cxgb4/sge.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index 138be46..5e3ffa7 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -2288,7 +2288,7 @@ static int napi_rx_handler(struct napi_struct *napi, int budget)
if (likely(work_done < budget)) {
int timer_index;
 
-   napi_complete(napi);
+   napi_complete_done(napi, work_done);
timer_index = QINTR_TIMER_IDX_G(q->next_intr_params);
 
if (q->adaptive_rx) {
-- 
2.3.4



[PATCHv2 net-next 3/4] cxgb4: Use the node info to alloc_ring() for RX queues

2015-12-22 Thread Hariprasad Shenai
Signed-off-by: Hariprasad Shenai 
---
 drivers/net/ethernet/chelsio/cxgb4/sge.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index 0333435..138be46 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -2555,7 +2555,8 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct sge_rspq *iq, bool fwevtq,
iq->size = roundup(iq->size, 16);
 
iq->desc = alloc_ring(adap->pdev_dev, iq->size, iq->iqe_len, 0,
- &iq->phys_addr, NULL, 0, NUMA_NO_NODE);
+ &iq->phys_addr, NULL, 0,
+ dev_to_node(adap->pdev_dev));
if (!iq->desc)
return -ENOMEM;
 
@@ -2595,7 +2596,8 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct sge_rspq *iq, bool fwevtq,
fl->size = roundup(fl->size, 8);
fl->desc = alloc_ring(adap->pdev_dev, fl->size, sizeof(__be64),
  sizeof(struct rx_sw_desc), &fl->addr,
- &fl->sdesc, s->stat_len, NUMA_NO_NODE);
+ &fl->sdesc, s->stat_len,
+ dev_to_node(adap->pdev_dev));
if (!fl->desc)
goto fl_nomem;
 
-- 
2.3.4



[PATCHv2 net-next 0/4] Trivial enhancements for cxgb4

2015-12-22 Thread Hariprasad Shenai
Hi

This series adds a debug message if the adapter isn't inserted in the right
PCI slot, changes the naming convention for iSCSI rx queues, uses node info
while allocating rx queues, and uses the napi_complete_done() API in the
napi handler.

This patch series has been created against the net-next tree and includes
patches for the cxgb4 driver.

We have included all the maintainers of the respective drivers. Kindly review
the changes and let us know if you have any review comments.

Thanks

V2: Dropped the 'dcb_info' debug entry patch, since the same can be achieved
using the lldp tool, based on review comments by Or Gerlitz and
David Miller.

Hariprasad Shenai (4):
  cxgb4: Warn if device doesn't have enough PCI bandwidth
  cxgb4: get naming correct for iscsi queues
  cxgb4: Use the node info to alloc_ring() for RX queues
  cxgb4: Use napi_complete_done() api in napi handler

 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h |  13 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c |   8 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c| 133 -
 drivers/net/ethernet/chelsio/cxgb4/sge.c   |  10 +-
 4 files changed, 121 insertions(+), 43 deletions(-)

-- 
2.3.4



[PATCHv2 net-next 2/4] cxgb4: get naming correct for iscsi queues

2015-12-22 Thread Hariprasad Shenai
All the upper-level protocols, such as RDMA and iSCSI, have their own offload
rx queues, so name them specifically instead of using the generic naming
convention. This improves code readability.

Signed-off-by: Hariprasad Shenai 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 13 +++---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c |  8 ++--
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c| 53 +++---
 drivers/net/ethernet/chelsio/cxgb4/sge.c   |  2 +-
 4 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index e01e722..3b59bc4 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -398,11 +398,10 @@ struct link_config {
 
 enum {
MAX_ETH_QSETS = 32,   /* # of Ethernet Tx/Rx queue sets */
-   MAX_OFLD_QSETS = 16,  /* # of offload Tx/Rx queue sets */
+   MAX_OFLD_QSETS = 16,  /* # of offload Tx, iscsi Rx queue sets */
MAX_CTRL_QUEUES = NCHAN,  /* # of control Tx queues */
MAX_RDMA_QUEUES = NCHAN,  /* # of streaming RDMA Rx queues */
MAX_RDMA_CIQS = 32,/* # of  RDMA concentrator IQs */
-   MAX_ISCSI_QUEUES = NCHAN, /* # of streaming iSCSI Rx queues */
 };
 
 enum {
@@ -420,7 +419,7 @@ enum {
INGQ_EXTRAS = 2,/* firmware event queue and */
/*   forwarded interrupts */
MAX_INGQ = MAX_ETH_QSETS + MAX_OFLD_QSETS + MAX_RDMA_QUEUES
-  + MAX_RDMA_CIQS + MAX_ISCSI_QUEUES + INGQ_EXTRAS,
+  + MAX_RDMA_CIQS + INGQ_EXTRAS,
 };
 
 struct adapter;
@@ -639,7 +638,7 @@ struct sge {
struct sge_ctrl_txq ctrlq[MAX_CTRL_QUEUES];
 
struct sge_eth_rxq ethrxq[MAX_ETH_QSETS];
-   struct sge_ofld_rxq ofldrxq[MAX_OFLD_QSETS];
+   struct sge_ofld_rxq iscsirxq[MAX_OFLD_QSETS];
struct sge_ofld_rxq rdmarxq[MAX_RDMA_QUEUES];
struct sge_ofld_rxq rdmaciq[MAX_RDMA_CIQS];
struct sge_rspq fw_evtq cacheline_aligned_in_smp;
@@ -650,10 +649,10 @@ struct sge {
u16 max_ethqsets;   /* # of available Ethernet queue sets */
u16 ethqsets;   /* # of active Ethernet queue sets */
u16 ethtxq_rover;   /* Tx queue to clean up next */
-   u16 ofldqsets;  /* # of active offload queue sets */
+   u16 iscsiqsets;  /* # of active iSCSI queue sets */
u16 rdmaqs; /* # of available RDMA Rx queues */
u16 rdmaciqs;   /* # of available RDMA concentrator IQs */
-   u16 ofld_rxq[MAX_OFLD_QSETS];
+   u16 iscsi_rxq[MAX_OFLD_QSETS];
u16 rdma_rxq[MAX_RDMA_QUEUES];
u16 rdma_ciq[MAX_RDMA_CIQS];
u16 timer_val[SGE_NTIMERS];
@@ -679,7 +678,7 @@ struct sge {
 };
 
 #define for_each_ethrxq(sge, i) for (i = 0; i < (sge)->ethqsets; i++)
-#define for_each_ofldrxq(sge, i) for (i = 0; i < (sge)->ofldqsets; i++)
+#define for_each_iscsirxq(sge, i) for (i = 0; i < (sge)->iscsiqsets; i++)
 #define for_each_rdmarxq(sge, i) for (i = 0; i < (sge)->rdmaqs; i++)
 #define for_each_rdmaciq(sge, i) for (i = 0; i < (sge)->rdmaciqs; i++)
 
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
index 0d579b1..62a343f 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
@@ -2245,7 +2245,7 @@ static int sge_qinfo_show(struct seq_file *seq, void *v)
 {
struct adapter *adap = seq->private;
int eth_entries = DIV_ROUND_UP(adap->sge.ethqsets, 4);
-   int iscsi_entries = DIV_ROUND_UP(adap->sge.ofldqsets, 4);
+   int iscsi_entries = DIV_ROUND_UP(adap->sge.iscsiqsets, 4);
int rdma_entries = DIV_ROUND_UP(adap->sge.rdmaqs, 4);
int ciq_entries = DIV_ROUND_UP(adap->sge.rdmaciqs, 4);
int ctrl_entries = DIV_ROUND_UP(MAX_CTRL_QUEUES, 4);
@@ -2331,10 +2331,10 @@ do { \
 
} else if (iscsi_idx < iscsi_entries) {
const struct sge_ofld_rxq *rx =
-   &adap->sge.ofldrxq[iscsi_idx * 4];
+   &adap->sge.iscsirxq[iscsi_idx * 4];
const struct sge_ofld_txq *tx =
&adap->sge.ofldtxq[iscsi_idx * 4];
-   int n = min(4, adap->sge.ofldqsets - 4 * iscsi_idx);
+   int n = min(4, adap->sge.iscsiqsets - 4 * iscsi_idx);
 
S("QType:", "iSCSI");
T("TxQ ID:", q.cntxt_id);
@@ -2454,7 +2454,7 @@ do { \
 static int sge_queue_entries(const struct adapter *adap)
 {
return DIV_ROUND_UP(adap->sge.ethqsets, 4) +
-  DIV_ROUND_UP(adap->sge.ofldqsets, 4) +
+  DIV_ROUND_UP(adap->sge.iscsiqsets, 4) +
   DIV_ROUND_UP(adap->sge.rdmaqs, 4) +
   DIV_ROUND_UP(adap->sge.rdmaciqs, 4) +
   DIV_ROUND

[PATCHv2 net-next 1/4] cxgb4: Warn if device doesn't have enough PCI bandwidth

2015-12-22 Thread Hariprasad Shenai
Check whether the device gets enough bandwidth from the entire PCI chain to
satisfy its capabilities. This patch determines the PCIe device's
bandwidth capabilities by reading its PCIe Link Capabilities registers
and then calls the pcie_get_minimum_link function to ensure that the
adapter is hooked into a slot which is capable of providing the
necessary bandwidth.
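
For readers unfamiliar with the register layout, here is a minimal
standalone sketch of the decode-and-compare step. It assumes the
Supported Link Speeds field sits in bits 3:0 and the Maximum Link Width
field in bits 9:4 of Link Capabilities, and it only handles the pre-r3.0
speed encoding. The helper names are hypothetical and are not part of
the patch:

```c
#include <stdint.h>

/* Field layout of the PCIe Link Capabilities register. */
#define LNKCAP_SLS_MASK  0x0000000fu /* Supported Link Speeds, bits 3:0 */
#define LNKCAP_MLW_MASK  0x000003f0u /* Maximum Link Width, bits 9:4 */
#define LNKCAP_MLW_SHIFT 4

/* Maximum link width in lanes (1, 2, 4, 8, ...). */
static unsigned int lnkcap_max_width(uint32_t lnkcap)
{
	return (lnkcap & LNKCAP_MLW_MASK) >> LNKCAP_MLW_SHIFT;
}

/* Maximum speed in units of 0.1 GT/s (pre-r3.0 encoding), 0 if unknown. */
static unsigned int lnkcap_max_speed_x10(uint32_t lnkcap)
{
	switch (lnkcap & LNKCAP_SLS_MASK) {
	case 1:
		return 25; /* 2.5 GT/s */
	case 2:
		return 50; /* 5.0 GT/s */
	default:
		return 0;
	}
}

/* Nonzero when the negotiated link is below what the device supports. */
static int link_is_degraded(unsigned int speed, unsigned int speed_cap,
			    unsigned int width, unsigned int width_cap)
{
	return speed < speed_cap || width < width_cap;
}
```

A register value of 0x82, for example, decodes to a x8 device capable
of 5.0 GT/s; if the chain only negotiates x4, link_is_degraded() is
nonzero and a hint about the slot would be printed.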

Signed-off-by: Hariprasad Shenai 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 80 -
 1 file changed, 78 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 8490c84..8326a776 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -4532,6 +4532,79 @@ static int init_rss(struct adapter *adap)
return 0;
 }
 
+static int cxgb4_get_pcie_dev_link_caps(struct adapter *adap,
+   enum pci_bus_speed *speed,
+   enum pcie_link_width *width)
+{
+   u32 lnkcap1, lnkcap2;
+   int err1, err2;
+
+#define  PCIE_MLW_CAP_SHIFT 4   /* start of MLW mask in link capabilities */
+
+   *speed = PCI_SPEED_UNKNOWN;
+   *width = PCIE_LNK_WIDTH_UNKNOWN;
+
+   err1 = pcie_capability_read_dword(adap->pdev, PCI_EXP_LNKCAP,
+ &lnkcap1);
+   err2 = pcie_capability_read_dword(adap->pdev, PCI_EXP_LNKCAP2,
+ &lnkcap2);
+   if (!err2 && lnkcap2) { /* PCIe r3.0-compliant */
+   if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_8_0GB)
+   *speed = PCIE_SPEED_8_0GT;
+   else if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_5_0GB)
+   *speed = PCIE_SPEED_5_0GT;
+   else if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_2_5GB)
+   *speed = PCIE_SPEED_2_5GT;
+   }
+   if (!err1) {
+   *width = (lnkcap1 & PCI_EXP_LNKCAP_MLW) >> PCIE_MLW_CAP_SHIFT;
+   if (!lnkcap2) { /* pre-r3.0 */
+   if (lnkcap1 & PCI_EXP_LNKCAP_SLS_5_0GB)
+   *speed = PCIE_SPEED_5_0GT;
+   else if (lnkcap1 & PCI_EXP_LNKCAP_SLS_2_5GB)
+   *speed = PCIE_SPEED_2_5GT;
+   }
+   }
+
+   if (*speed == PCI_SPEED_UNKNOWN || *width == PCIE_LNK_WIDTH_UNKNOWN)
+   return err1 ? err1 : err2 ? err2 : -EINVAL;
+   return 0;
+}
+
+static void cxgb4_check_pcie_caps(struct adapter *adap)
+{
+   enum pcie_link_width width, width_cap;
+   enum pci_bus_speed speed, speed_cap;
+
+#define PCIE_SPEED_STR(speed) \
+   (speed == PCIE_SPEED_8_0GT ? "8.0GT/s" : \
+speed == PCIE_SPEED_5_0GT ? "5.0GT/s" : \
+speed == PCIE_SPEED_2_5GT ? "2.5GT/s" : \
+"Unknown")
+
+   if (cxgb4_get_pcie_dev_link_caps(adap, &speed_cap, &width_cap)) {
+   dev_warn(adap->pdev_dev,
+"Unable to determine PCIe device BW capabilities\n");
+   return;
+   }
+
+   if (pcie_get_minimum_link(adap->pdev, &speed, &width) ||
+   speed == PCI_SPEED_UNKNOWN || width == PCIE_LNK_WIDTH_UNKNOWN) {
+   dev_warn(adap->pdev_dev,
+"Unable to determine PCI Express bandwidth.\n");
+   return;
+   }
+
+   dev_info(adap->pdev_dev, "PCIe link speed is %s, device supports %s\n",
+PCIE_SPEED_STR(speed), PCIE_SPEED_STR(speed_cap));
+   dev_info(adap->pdev_dev, "PCIe link width is x%d, device supports x%d\n",
+width, width_cap);
+   if (speed < speed_cap || width < width_cap)
+   dev_info(adap->pdev_dev,
+"A slot with more lanes and/or higher speed is "
+"suggested for optimal performance.\n");
+}
+
 static void print_port_info(const struct net_device *dev)
 {
char buf[80];
@@ -4559,10 +4632,10 @@ static void print_port_info(const struct net_device *dev)
--bufp;
sprintf(bufp, "BASE-%s", t4_get_port_type_description(pi->port_type));
 
-   netdev_info(dev, "Chelsio %s rev %d %s %sNIC PCIe x%d%s%s\n",
+   netdev_info(dev, "Chelsio %s rev %d %s %sNIC %s\n",
adap->params.vpd.id,
CHELSIO_CHIP_RELEASE(adap->params.chip), buf,
-   is_offload(adap) ? "R" : "", adap->params.pci.width, spd,
+   is_offload(adap) ? "R" : "",
(adap->flags & USING_MSIX) ? " MSI-X" :
(adap->flags & USING_MSI) ? " MSI" : "");
netdev_info(dev, "S/N: %s, P/N: %s\n",
@@ -4908,6 +4981,9 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
else if (msi > 0 && pci_enable_msi(pdev) == 0)
adapter->flags |= USING_MSI;
 
+   /* check for PCI Express bandwidth capabilities */
+   c

Re: [PATCH net-next 3/5] cxgb4: add dcb info node in debugfs

2015-12-22 Thread Hariprasad Shenai
On Tue, Dec 22, 2015 at 15:45:52 -0500, David Miller wrote:
> From: Or Gerlitz 
> Date: Mon, 21 Dec 2015 09:33:22 +0200
> 
> > On Mon, Dec 21, 2015 at 9:16 AM, Hariprasad Shenai
> >  wrote:
> >> Add new /sys/kernel/debug/cxgb4/*/dcb_info node to dump out
> >> various Data Center Bridging information.
> > 
> > why? what's wrong with using the lldp tool for that purpose?
> 
> Agreed, and I don't like your explanation.
> 
> Even if you are using firmware managed DCB, the lldp tool should be
> usable for querying.
> 
> People need to stop putting so much crap into debugfs, it's a serious
> pet peeve of mine.
> 
> Every piece of driver unique interface crap you put into debugfs is a
> _HARDSHIP_ for the user.  Because they have to learn a unique way to
> do X in every driver that tries to export the same kind of
> functionality.

Will drop this from the series for now and send V2 for the same.
Will send a separate series for adding lldptool support for firmware
managed DCB.



RE: [PATCH v2] r8152: fix lockup when runtime PM is enabled

2015-12-22 Thread Hayes Wang
Oliver Neukum [mailto:oneu...@suse.com]
[...]
> It is clear to me that you cannot get away with using the same operation
> for resume() and reset_resume() in your driver. It is fundamentally
> impossible. Firmware cannot fix it.

I will think about how to fix it.

> Sorry for the length of the explanation.

Thanks for your response. I have some questions. What is the flow when
a system resume follows a system suspend which follows an autosuspend?
Is it as follows?

1. suspend() with PMSG_IS_AUTO for autosuspend.
2. suspend() for system suspend.
3. resume() for system resume.

And should the device exit autosuspend before (2)?

Best Regards,
Hayes



[PATCH 3/3] drivers: net: cpsw: use of_phy_connect() in fixed-link case

2015-12-22 Thread David Rivshin (Allworx)
From: David Rivshin 

If a fixed-link DT subnode is used, the phy_device was looked up so
that a PHY ID string could be constructed and passed to phy_connect().
This is not necessary, as the device_node can be passed directly to
of_phy_connect() instead. This reuses the same codepath as if the
phy-handle DT property was used.

Signed-off-by: David Rivshin 
---
 drivers/net/ethernet/ti/cpsw.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index f9029e7..94b818c 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2038,29 +2038,21 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
"phy-handle", 0);
parp = of_get_property(slave_node, "phy_id", &lenp);
if (slave_data->phy_node) {
dev_dbg(&pdev->dev,
"slave[%d] using phy-handle=\"%s\"\n",
i, slave_data->phy_node->full_name);
} else if (of_phy_is_fixed_link(slave_node)) {
-   struct device_node *phy_node;
-   struct phy_device *phy_dev;
-
/* In the case of a fixed PHY, the DT node associated
 * to the PHY is the Ethernet MAC DT node.
 */
ret = of_phy_register_fixed_link(slave_node);
if (ret)
return ret;
-   phy_node = of_node_get(slave_node);
-   phy_dev = of_phy_find_device(phy_node);
-   if (!phy_dev)
-   return -ENODEV;
-   snprintf(slave_data->phy_id, sizeof(slave_data->phy_id),
-PHY_ID_FMT, phy_dev->bus->id, phy_dev->addr);
+   slave_data->phy_node = of_node_get(slave_node);
} else if (parp) {
u32 phyid;
struct device_node *mdio_node;
struct platform_device *mdio;
 
if (lenp != (sizeof(__be32) * 2)) {
dev_err(&pdev->dev, "Invalid slave[%d] phy_id property\n", i);
-- 
2.5.0



[PATCH 1/3] drivers: net: cpsw: fix parsing of phy-handle DT property in dual_emac config

2015-12-22 Thread David Rivshin (Allworx)
From: David Rivshin 

Commit 9e42f715264ff158478fa30eaed847f6e131366b ("drivers: net: cpsw: add
phy-handle parsing") saved the "phy-handle" phandle into a new cpsw_priv
field. However, phy connections are per-slave, so the phy_node field should
be in cpsw_slave_data rather than cpsw_priv.

This would go unnoticed in a single emac configuration. But in dual_emac
mode, the last "phy-handle" property parsed for either slave would be used
by both of them, causing them both to refer to the same phy_device.
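
A stripped-down illustration of why a single shared field breaks with
two slaves. The structures and function names here are hypothetical
simplifications, not the driver's definitions, and a string stands in
for the device_node pointer:

```c
#include <stddef.h>

#define NUM_SLAVES 2

/* Hypothetical, simplified stand-ins for the driver structures. */
struct slave_data {
	const char *phy_node;	/* fixed layout: one handle per slave */
};

struct priv {
	const char *phy_node;	/* old layout: one handle for all slaves */
	struct slave_data slaves[NUM_SLAVES];
};

/* Old behaviour: every parsed slave overwrites the shared field. */
static void parse_shared(struct priv *p,
			 const char *const handles[NUM_SLAVES])
{
	for (size_t i = 0; i < NUM_SLAVES; i++)
		p->phy_node = handles[i];
}

/* Fixed behaviour: each slave keeps its own handle. */
static void parse_per_slave(struct priv *p,
			    const char *const handles[NUM_SLAVES])
{
	for (size_t i = 0; i < NUM_SLAVES; i++)
		p->slaves[i].phy_node = handles[i];
}
```

After parse_shared() only the last handle survives, so both slaves
would connect to the same PHY; after parse_per_slave() each slave
still refers to its own.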

Fixes: 9e42f715264f ("drivers: net: cpsw: add phy-handle parsing")
Signed-off-by: David Rivshin 
---
You may want to consider this for 4.3-stable. It manages to apply
on top of v4.3.3 with 'git am -C1', or I can produce a separate
patch against v4.3.3 if preferred.

 drivers/net/ethernet/ti/cpsw.c | 13 ++---
 drivers/net/ethernet/ti/cpsw.h |  1 +
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 3b489ca..8ad0ed8 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -363,15 +363,14 @@ static inline void slave_write(struct cpsw_slave *slave, u32 val, u32 offset)
__raw_writel(val, slave->regs + offset);
 }
 
 struct cpsw_priv {
spinlock_t  lock;
struct platform_device  *pdev;
struct net_device   *ndev;
-   struct device_node  *phy_node;
struct napi_struct  napi_rx;
struct napi_struct  napi_tx;
struct device   *dev;
struct cpsw_platform_data   data;
struct cpsw_ss_regs __iomem *regs;
struct cpsw_wr_regs __iomem *wr_regs;
u8 __iomem  *hw_stats;
@@ -1144,16 +1143,16 @@ static void cpsw_slave_open(struct cpsw_slave *slave, struct cpsw_priv *priv)
 
if (priv->data.dual_emac)
cpsw_add_dual_emac_def_ale_entries(priv, slave, slave_port);
else
cpsw_ale_add_mcast(priv->ale, priv->ndev->broadcast,
   1 << slave_port, 0, 0, ALE_MCAST_FWD_2);
 
-   if (priv->phy_node)
-   slave->phy = of_phy_connect(priv->ndev, priv->phy_node,
+   if (slave->data->phy_node)
+   slave->phy = of_phy_connect(priv->ndev, slave->data->phy_node,
 &cpsw_adjust_link, 0, slave->data->phy_if);
else
slave->phy = phy_connect(priv->ndev, slave->data->phy_id,
 &cpsw_adjust_link, slave->data->phy_if);
if (IS_ERR(slave->phy)) {
dev_err(priv->dev, "phy %s not found on slave %d\n",
slave->data->phy_id, slave->slave_num);
@@ -1936,20 +1935,19 @@ static void cpsw_slave_init(struct cpsw_slave *slave, struct cpsw_priv *priv,
 
slave->data = data;
slave->regs = regs + slave_reg_ofs;
slave->sliver   = regs + sliver_reg_ofs;
slave->port_vlan = data->dual_emac_res_vlan;
 }
 
-static int cpsw_probe_dt(struct cpsw_priv *priv,
+static int cpsw_probe_dt(struct cpsw_platform_data *data,
 struct platform_device *pdev)
 {
struct device_node *node = pdev->dev.of_node;
struct device_node *slave_node;
-   struct cpsw_platform_data *data = &priv->data;
int i = 0, ret;
u32 prop;
 
if (!node)
return -EINVAL;
 
if (of_property_read_u32(node, "slaves", &prop)) {
@@ -2029,15 +2027,16 @@ static int cpsw_probe_dt(struct cpsw_priv *priv,
int lenp;
const __be32 *parp;
 
/* This is no slave child node, continue */
if (strcmp(slave_node->name, "slave"))
continue;
 
-   priv->phy_node = of_parse_phandle(slave_node, "phy-handle", 0);
+   slave_data->phy_node = of_parse_phandle(slave_node,
+   "phy-handle", 0);
parp = of_get_property(slave_node, "phy_id", &lenp);
if (of_phy_is_fixed_link(slave_node)) {
struct device_node *phy_node;
struct phy_device *phy_dev;
 
/* In the case of a fixed PHY, the DT node associated
 * to the PHY is the Ethernet MAC DT node.
@@ -2270,15 +2269,15 @@ static int cpsw_probe(struct platform_device *pdev)
 * This may be required here for child devices.
 */
pm_runtime_enable(&pdev->dev);
 
/* Select default pin state */
pinctrl_pm_select_default_state(&pdev->dev);
 
-   if (cpsw_probe_dt(priv, pdev)) {
+   if (cpsw_probe_dt(&priv->data, pdev)) {
dev_err(&pdev->dev, "cpsw: platform data missing\n");
ret = -ENODEV;
goto clean_runtime_disable_ret;
}
data = &priv->da

[PATCH 2/3] drivers: net: cpsw: fix error messages when using phy-handle DT property

2015-12-22 Thread David Rivshin (Allworx)
From: David Rivshin 

The phy-handle, phy_id, and fixed-link properties are mutually exclusive,
and only one need be specified. However if phy-handle was specified, an
error message would complain about the lack of phy_id or fixed-link.

Also, if phy-handle was specified and the subsequent of_phy_connect()
failed, the error message still referenced slave->data->phy_id, which
would be empty. Instead, use the name of the device_node as a useful
identifier.
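
The selection of the identifier can be sketched in isolation as
follows. The struct is a hypothetical simplification of the per-slave
data, with a plain string in place of the device_node's full name:

```c
#include <stddef.h>

/* Hypothetical, simplified view of the per-slave data. */
struct slave_ident {
	const char *node_full_name;	/* non-NULL when phy-handle was used */
	char phy_id[32];		/* filled in when there is no node */
};

/* Pick the name to print: prefer the device node, else fall back to phy_id. */
static const char *phy_ident(const struct slave_ident *s)
{
	return s->node_full_name ? s->node_full_name : s->phy_id;
}
```

This way the error message never prints an empty string when the PHY
was referenced through phy-handle.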

Fixes: 9e42f715264f ("drivers: net: cpsw: add phy-handle parsing")
Signed-off-by: David Rivshin 
---
This would require some adjustments to backport to 4.3-stable due to
other changes in this area. Let me know if you want a version of this
against v4.3.3.

 Documentation/devicetree/bindings/net/cpsw.txt |  4 ++--
 drivers/net/ethernet/ti/cpsw.c | 17 +
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/cpsw.txt b/Documentation/devicetree/bindings/net/cpsw.txt
index 28a4781..3033c0f 100644
--- a/Documentation/devicetree/bindings/net/cpsw.txt
+++ b/Documentation/devicetree/bindings/net/cpsw.txt
@@ -46,16 +46,16 @@ Optional properties:
 - dual_emac_res_vlan   : Specifies VID to be used to segregate the ports
 - mac-address  : See ethernet.txt file in the same directory
 - phy_id   : Specifies slave phy id
 - phy-handle   : See ethernet.txt file in the same directory
 
 Slave sub-nodes:
 - fixed-link   : See fixed-link.txt file in the same directory
- Either the property phy_id, or the sub-node
- fixed-link can be specified
+
+Note: Exactly one of phy_id, phy-handle, or fixed-link must be specified.
 
 Note: "ti,hwmods" field is used to fetch the base address and irq
 resources from TI, omap hwmod data base during device registration.
 Future plan is to migrate hwmod data base contents into device tree
 blob so that, all the required data will be used from device tree dts
 file.
 
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 8ad0ed8..f9029e7 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1150,16 +1150,19 @@ static void cpsw_slave_open(struct cpsw_slave *slave, struct cpsw_priv *priv)
if (slave->data->phy_node)
slave->phy = of_phy_connect(priv->ndev, slave->data->phy_node,
 &cpsw_adjust_link, 0, slave->data->phy_if);
else
slave->phy = phy_connect(priv->ndev, slave->data->phy_id,
 &cpsw_adjust_link, slave->data->phy_if);
if (IS_ERR(slave->phy)) {
-   dev_err(priv->dev, "phy %s not found on slave %d\n",
-   slave->data->phy_id, slave->slave_num);
+   dev_err(priv->dev, "phy \"%s\" not found on slave %d\n",
+   slave->data->phy_node ?
+   slave->data->phy_node->full_name :
+   slave->data->phy_id,
+   slave->slave_num);
slave->phy = NULL;
} else {
dev_info(priv->dev, "phy found : id is : 0x%x\n",
 slave->phy->phy_id);
phy_start(slave->phy);
 
/* Configure GMII_SEL register */
@@ -2030,15 +2033,19 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
/* This is no slave child node, continue */
if (strcmp(slave_node->name, "slave"))
continue;
 
slave_data->phy_node = of_parse_phandle(slave_node,
"phy-handle", 0);
parp = of_get_property(slave_node, "phy_id", &lenp);
-   if (of_phy_is_fixed_link(slave_node)) {
+   if (slave_data->phy_node) {
+   dev_dbg(&pdev->dev,
+   "slave[%d] using phy-handle=\"%s\"\n",
+   i, slave_data->phy_node->full_name);
+   } else if (of_phy_is_fixed_link(slave_node)) {
struct device_node *phy_node;
struct phy_device *phy_dev;
 
/* In the case of a fixed PHY, the DT node associated
 * to the PHY is the Ethernet MAC DT node.
 */
ret = of_phy_register_fixed_link(slave_node);
@@ -2066,15 +2073,17 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
if (!mdio) {
dev_err(&pdev->dev, "Missing mdio platform device\n");
return -EINVAL;
}
snprintf(slave_data->phy_id, sizeof(slave_data->phy_id),
 PHY_ID_FMT, mdio->name, phyid);
} else {
-   dev_err(

[PATCH 0/3] drivers: net: cpsw: phy-handle fixes

2015-12-22 Thread David Rivshin (Allworx)
From: David Rivshin 

This series is based on the tip of the net tree.

The first patch fixes a bug that makes dual_emac mode break if
either slave uses the phy-handle property in the devicetree.

The second patch fixes some cosmetic problems with error messages,
and also makes the binding documentation more explicit.

The third patch cleans up the fixed-link case to work like
the now-fixed phy-handle case.

I have tested on the following hardware configurations:
 - (EVMSK) dual emac, phy_id property in both slaves
 - (BeagleBoneBlack) single emac, phy_id property
 - (custom) single emac, fixed-link subnode
Note that I don't have a board which uses a phy-handle property,
though I have used hacked devicetrees to exercise the code paths.
Testing by anyone who has real hardware using phy-handle or dual_emac
with fixed-link would be appreciated.

David Rivshin (3):
  drivers: net: cpsw: fix parsing of phy-handle DT property in dual_emac
config
  drivers: net: cpsw: fix error messages when using phy-handle DT
property
  drivers: net: cpsw: use of_phy_connect() in fixed-link case

 Documentation/devicetree/bindings/net/cpsw.txt |  4 +--
 drivers/net/ethernet/ti/cpsw.c | 40 +-
 drivers/net/ethernet/ti/cpsw.h |  1 +
 3 files changed, 23 insertions(+), 22 deletions(-)

-- 
2.5.0



Re: [PATCH net-next 1/4] soreuseport: define reuseport groups

2015-12-22 Thread Craig Gallek
On Tue, Dec 22, 2015 at 5:11 PM, kbuild test robot  wrote:
> Hi Craig,
>
> [auto build test ERROR on net-next/master]
>
> url:
> https://github.com/0day-ci/linux/commits/Craig-Gallek/Faster-SO_REUSEPORT/20151223-040911
> config: arm-mvebu_v7_defconfig (attached as .config)
> reproduce:
> wget 
> https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
>  -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> make.cross ARCH=arm
>
> Note: the linux-review/Craig-Gallek/Faster-SO_REUSEPORT/20151223-040911 HEAD 
> 660edb1fa7e1a2bf71ef9cbf4555cda4af95ea58 builds fine.
>   It only hurts bisectibility.
>
> All errors (new ones prefixed by >>):
>
>In file included from include/linux/spinlock_types.h:13:0,
> from include/net/sock_reuseport.h:4,
> from net/core/sock_reuseport.c:7:
>>> arch/arm/include/asm/spinlock_types.h:12:3: error: unknown type name 'u32'
>   u32 slock;
>   ^
>>> arch/arm/include/asm/spinlock_types.h:18:4: error: unknown type name 'u16'
>u16 owner;
>^
>arch/arm/include/asm/spinlock_types.h:19:4: error: unknown type name 'u16'
>u16 next;
>^
>arch/arm/include/asm/spinlock_types.h:28:2: error: unknown type name 'u32'
>  u32 lock;
>  ^
>
> vim +/u32 +12 arch/arm/include/asm/spinlock_types.h
>
> fb1c8f93 include/asm-arm/spinlock_types.h  Ingo Molnar 2005-09-10   6  
> #endif
> fb1c8f93 include/asm-arm/spinlock_types.h  Ingo Molnar 2005-09-10   7
> 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06   8  
> #define TICKET_SHIFT 16
> 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06   9
> fb1c8f93 include/asm-arm/spinlock_types.h  Ingo Molnar 2005-09-10  10  
> typedef struct {
> 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06  11 
>   union {
> 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06 @12 
>   u32 slock;
> 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06  13 
>   struct __raw_tickets {
> 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06  14  
> #ifdef __ARMEB__
> 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06  15 
>   u16 next;
> 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06  16 
>   u16 owner;
> 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06  17  
> #else
> 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06 @18 
>   u16 owner;
> 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06  19 
>   u16 next;
> 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06  20  
> #endif
> 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06  21 
>   } tickets;
>
> :: The code at line 12 was first introduced by commit
> :: 546c2896a42202dbc7d02f7c6ec9948ac1bf511b ARM: 7446/1: spinlock: use 
> ticket algorithm for ARMv6+ locking implementation
>
> :: TO: Will Deacon 
> :: CC: Russell King 
>
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation

ACK, I don't even need spinlock_types for this file anymore (though it
may be worth fixing the arm version to include linux/types.h).  Will
remove for v2


Re: [RFC PATCH 16/17] calipso: Add validation of CALIPSO option.

2015-12-22 Thread Huw Davies
On Tue, Dec 22, 2015 at 10:47:43PM +0100, Hannes Frederic Sowa wrote:
> On 22.12.2015 17:59, Huw Davies wrote:
> > I'm confused about this one.  AFAICS, this will drop packets that we
> > can't process.  We don't send the icmp error, but I can certainly add
> > that.  Is that what you mean?
> 
> Actually, the implementation of calipso_validate will accept the packets
> because it defaults to return true if we don't compile the module. At
> least we should drop the packet if it is not loaded. I am in favor of
> adding the parameter problem icmp error. So, yes, I think it should be
> added.

Yet the option value is 0x07, i.e. the two highest bits are both zero
which according to:
https://tools.ietf.org/html/rfc2460#section-4.2
means we should just skip it.

https://tools.ietf.org/html/rfc5570#section-5.1.1
reaffirms that.
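
For illustration, a minimal sketch of how the two highest-order bits of an
option type select the action for an unrecognised option, following RFC 2460
section 4.2. The enum and function names are made up for this example and
are not kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Action encoded in the two highest-order bits of an IPv6 option type
 * (RFC 2460, section 4.2), applied when the option is not recognised.
 * All names here are hypothetical, chosen for this sketch only. */
enum opt_action {
	OPT_SKIP = 0,            /* 00: skip the option, keep processing */
	OPT_DISCARD = 1,         /* 01: discard the packet silently */
	OPT_DISCARD_ICMP = 2,    /* 10: discard, send ICMP Parameter Problem */
	OPT_DISCARD_ICMP_UC = 3  /* 11: as 10, unless destination is multicast */
};

enum opt_action opt_unrecognised_action(uint8_t opt_type)
{
	return (enum opt_action)(opt_type >> 6);
}

/* Usage: the CALIPSO option type 0x07 has both high bits clear. */
void opt_action_demo(void)
{
	assert(opt_unrecognised_action(0x07) == OPT_SKIP);
}
```

So a node that does not understand option 0x07 is expected to step over it
rather than drop the packet.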

In terms of sending an icmp on error while validating:
https://tools.ietf.org/html/rfc5570#section-6.2.2
is pretty conservative in that case too.  Most errors
should just be silently dropped.

Huw.


Re: [PATCH 1/1] phy: micrel: Fix finding PHY properties in MAC node for KSZ9031.

2015-12-22 Thread David Miller
From: Andrew Lunn 
Date: Tue, 22 Dec 2015 12:06:34 +0100

> On Tue, Dec 22, 2015 at 11:58:40AM +0100, Henri Roosen wrote:
>> Commit 651df2183543 ("phy: micrel: Fix finding PHY properties in MAC
>>  node.") only fixes finding PHY properties in MAC node for KSZ9021. This
>> commit applies the same fix for KSZ9031.
>> 
>> Signed-off-by: Henri Roosen 
> 
> Fixes: 8b63ec1837fa ("phylib: Make PHYs children of their MDIO bus, not the bus' parent.")
> 
> Acked-by: Andrew Lunn 

This does not apply cleanly to any of my trees.


Re: [PATCH net-next 1/4] soreuseport: define reuseport groups

2015-12-22 Thread kbuild test robot
Hi Craig,

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Craig-Gallek/Faster-SO_REUSEPORT/20151223-040911
config: arm-mvebu_v7_defconfig (attached as .config)
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm

Note: the linux-review/Craig-Gallek/Faster-SO_REUSEPORT/20151223-040911 HEAD 660edb1fa7e1a2bf71ef9cbf4555cda4af95ea58 builds fine.
      It only hurts bisectability.

All errors (new ones prefixed by >>):

   In file included from include/linux/spinlock_types.h:13:0,
from include/net/sock_reuseport.h:4,
from net/core/sock_reuseport.c:7:
>> arch/arm/include/asm/spinlock_types.h:12:3: error: unknown type name 'u32'
  u32 slock;
  ^
>> arch/arm/include/asm/spinlock_types.h:18:4: error: unknown type name 'u16'
   u16 owner;
   ^
   arch/arm/include/asm/spinlock_types.h:19:4: error: unknown type name 'u16'
   u16 next;
   ^
   arch/arm/include/asm/spinlock_types.h:28:2: error: unknown type name 'u32'
 u32 lock;
 ^

vim +/u32 +12 arch/arm/include/asm/spinlock_types.h

fb1c8f93 include/asm-arm/spinlock_types.h      Ingo Molnar 2005-09-10   6  #endif
fb1c8f93 include/asm-arm/spinlock_types.h      Ingo Molnar 2005-09-10   7
546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06   8  #define TICKET_SHIFT 16
546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06   9
fb1c8f93 include/asm-arm/spinlock_types.h      Ingo Molnar 2005-09-10  10  typedef struct {
546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06  11          union {
546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06 @12                  u32 slock;
546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06  13                  struct __raw_tickets {
546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06  14  #ifdef __ARMEB__
546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06  15                          u16 next;
546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06  16                          u16 owner;
546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06  17  #else
546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06 @18                          u16 owner;
546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06  19                          u16 next;
546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06  20  #endif
546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06  21                  } tickets;

:: The code at line 12 was first introduced by commit
:: 546c2896a42202dbc7d02f7c6ec9948ac1bf511b ARM: 7446/1: spinlock: use 
ticket algorithm for ARMv6+ locking implementation

:: TO: Will Deacon 
:: CC: Russell King 

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


Re: [PATCH net-next 1/4] soreuseport: define reuseport groups

2015-12-22 Thread David Miller
From: Craig Gallek 
Date: Tue, 22 Dec 2015 16:58:11 -0500

> On Tue, Dec 22, 2015 at 4:40 PM, David Miller  wrote:
>> From: Craig Gallek 
>> Date: Tue, 22 Dec 2015 15:05:07 -0500
>>
>>> + for (i = 0; i < reuse->num_socks; i++) {
>>> + if (reuse->socks[i] == sk) {
>>> + reuse->socks[i] = reuse->socks[reuse->num_socks - 1];
>>> + reuse->num_socks--;
>>> + if (reuse->num_socks == 0)
>>> + kfree_rcu(reuse, rcu);
>>> + break;
>>> + }
>>> + }
>>
>> Don't you need to memmove() the entire rest of the array down one slot
>> when you hit the matching 'sk' in there?  I can't see how it can work
>> to only move one entry down.
> It moves the last element in the list into the slot that was just
> emptied.  You could argue that this may cause unexpected changes in
> the index -> socket mapping observed by the user, but I'm not sure
> making many socket indexes change (by sliding everything down one)
> when one is removed is a desirable behavior either.  I don't have a
> strong opinion either way though...

Thanks for explaining, I misread the code.


Re: [PATCH v2 -next 0/3] tcp: honour SO_BINDTODEVICE for TW_RST case too

2015-12-22 Thread David Miller
From: Florian Westphal 
Date: Mon, 21 Dec 2015 21:29:23 +0100

> This is V2, this time as a small series since I followed Eric's advice
> to split this into smaller chunks, I hope this makes it easier to
> review.
> 
> First patch adds inet_sk_transparent helper.
> Second patch contains an if/else swap that I split from the
> original TW_RST v1 one.
> Third patch is the actual change without the superfluous sock_net change.

Looks good, series applied, thanks Florian!


Re: [Patch net] addrconf: always initialize sysctl table data

2015-12-22 Thread David Miller
From: Cong Wang 
Date: Mon, 21 Dec 2015 10:55:45 -0800

> When sysctl performs restrict writes, it allows to write from
> a middle position of a sysctl file, which requires us to initialize
> the table data before calling proc_dostring() for the write case.
> 
> Fixes: 3d1bec99320d ("ipv6: introduce secret_stable to ipv6_devconf")
> Reported-by: Sasha Levin 
> Acked-by: Hannes Frederic Sowa 
> Tested-by: Sasha Levin 
> Signed-off-by: Cong Wang 

Applied and queued up for -stable, thanks.


Re: [PATCH net-next 1/4] soreuseport: define reuseport groups

2015-12-22 Thread Craig Gallek
On Tue, Dec 22, 2015 at 4:40 PM, David Miller  wrote:
> From: Craig Gallek 
> Date: Tue, 22 Dec 2015 15:05:07 -0500
>
>> + for (i = 0; i < reuse->num_socks; i++) {
>> + if (reuse->socks[i] == sk) {
>> + reuse->socks[i] = reuse->socks[reuse->num_socks - 1];
>> + reuse->num_socks--;
>> + if (reuse->num_socks == 0)
>> + kfree_rcu(reuse, rcu);
>> + break;
>> + }
>> + }
>
> Don't you need to memmove() the entire rest of the array down one slot
> when you hit the matching 'sk' in there?  I can't see how it can work
> to only move one entry down.
It moves the last element in the list into the slot that was just
emptied.  You could argue that this may cause unexpected changes in
the index -> socket mapping observed by the user, but I'm not sure
making many socket indexes change (by sliding everything down one)
when one is removed is a desirable behavior either.  I don't have a
strong opinion either way though...
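
For illustration, a minimal userspace sketch of the same "move the last
element into the hole" removal. The struct, the fixed capacity and the
function name are stand-ins invented for this example; the real code works
on the socks[] pointer array and frees the group via RCU when it empties.

```c
#include <assert.h>
#include <stddef.h>

#define GROUP_MAX 8 /* hypothetical fixed capacity for this sketch */

struct group {
	int members[GROUP_MAX]; /* stand-in for the socks[] pointer array */
	size_t num;             /* number of valid entries */
};

/* Remove 'val' by overwriting its slot with the last element.
 * Returns 1 if an entry was removed, 0 if 'val' was not found.
 * Runs in O(n) for the search and O(1) for the removal itself;
 * the relative order of the remaining entries is not preserved. */
int group_remove(struct group *g, int val)
{
	size_t i;

	assert(g != NULL);
	for (i = 0; i < g->num; i++) {
		if (g->members[i] == val) {
			g->members[i] = g->members[g->num - 1];
			g->num--;
			return 1;
		}
	}
	return 0;
}
```

Removing 20 from {10, 20, 30, 40} leaves {10, 40, 30}: only the index of
the former last element changes, whereas a memmove() would shift every
entry after the hole.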


Re: suspicious RCU usage (netlink/rhashtable)

2015-12-22 Thread Dave Jones
On Tue, Dec 22, 2015 at 04:50:20PM -0500, David Miller wrote:
 
 > >  > > Simple fix is below.  Though, I don't understand the history of the
 > >  > > multiple locks in this structure to be sure it's correct.  I'll send
 > >  > > it as a formal patch.  Please reject if it's not the right approach.
 > >  > > 
 > >  > > diff --git a/lib/rhashtable.c b/lib/rhashtable.c
 > >  > > index 1c149e9..cc80870 100644
 > >  > > --- a/lib/rhashtable.c
 > >  > > +++ b/lib/rhashtable.c
 > >  > > @@ -516,7 +516,8 @@ int rhashtable_walk_init(struct rhashtable *ht,
 > >  > > struct rhashtable_iter *iter)
 > >  > > return -ENOMEM;
 > >  > > 
 > >  > > spin_lock(&ht->lock);
 > >  > > -   iter->walker->tbl = rht_dereference(ht->tbl, ht);
 > >  > > +   iter->walker->tbl =
 > >  > > +   rcu_dereference_protected(ht->tbl, lockdep_is_held(&ht->lock));
 > >  > > list_add(&iter->walker->list, &iter->walker->tbl->walkers);
 > >  > > spin_unlock(&ht->lock);
 > >  > 
 > >  > How can this be the "fix"?  That's exactly what's in the tree.
 > > 
 > > I should have made clear, this is Linus' tree I'm hitting this on,
 > > which matches what Craig posted.
 > 
 > Ok, so this should be fixed in my 'net' tree and I'll send that to Linus
 > soon.

Great, thanks Dave.  Sorry for the fire-alarm :)

Dave


[PATCH v2] net-next: Add tcindex to conntrack and add netfilter target/matches

2015-12-22 Thread Luuk Paulussen
This patch implements support for setting/matching the skb->tc_index
field from Xtables, as well as allowing it to be saved/restored using
connection tracking.

This provides 16 extra bits of mark space that can be saved/restored from
the connection (for performance benefits) when the marking is being done
for tc purposes.

Currently the tc_index field can be set by a number of ingress schedulers,
but if these are not being used, then there is no reason why this field
couldn't also be marked from netfilter.

Once the tc_index field has been set, it can be matched with the existing
tcindex filter in the scheduling code.

Benefits:
1. Marking for tc purposes can be done in this field, alleviating space
   restrictions in generic packet mark.
2. Doesn't increase sk_buff size.
3. tc_index can be saved/restored from connection so that if a flow has
   already been classified, it doesn't have to be done again.
4. save/restore can be done with a mark so that separate marking can be
   done for the two directions of the flow.
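
A rough sketch of what a masked save/restore between packet and connection
could look like, assuming item 4 refers to a bit mask in the way connmark
uses one. The struct and function names below are hypothetical and are not
taken from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the per-packet and per-connection fields. */
struct pkt  { uint16_t tc_index; };
struct conn { uint16_t tc_index; };

/* Save: copy only the masked bits of the packet value into the connection,
 * leaving the connection's other bits untouched. */
void tcindex_save(struct conn *ct, const struct pkt *p, uint16_t mask)
{
	ct->tc_index = (uint16_t)((ct->tc_index & ~mask) | (p->tc_index & mask));
}

/* Restore: copy only the masked bits of the connection value into the
 * packet, leaving the packet's other bits untouched. */
void tcindex_restore(struct pkt *p, const struct conn *ct, uint16_t mask)
{
	p->tc_index = (uint16_t)((p->tc_index & ~mask) | (ct->tc_index & mask));
}

/* Usage: one direction stores its class in the low byte. */
void tcindex_demo(void)
{
	struct conn ct = { 0 };
	struct pkt p = { 0x0012 };

	tcindex_save(&ct, &p, 0x00ff);
	assert(ct.tc_index == 0x0012);
}
```

With disjoint masks (for example 0x00ff and 0xff00), each direction of a
flow can keep its own value in the single 16-bit connection field.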

Reviewed-by: Matt Bennett 
Reviewed-by: Kyeong Yoo 
Signed-off-by: Luuk Paulussen 
---
 include/net/netfilter/nf_conntrack.h   |   6 +-
 include/uapi/linux/netfilter/Kbuild|   4 +
 include/uapi/linux/netfilter/nf_conntrack_common.h |   1 +
 include/uapi/linux/netfilter/nfnetlink_conntrack.h |   1 +
 include/uapi/linux/netfilter/xt_CONNTCINDEX.h  |   6 +
 include/uapi/linux/netfilter/xt_TCINDEX.h  |   6 +
 include/uapi/linux/netfilter/xt_conntcindex.h  |  31 
 include/uapi/linux/netfilter/xt_tcindex.h  |  15 ++
 net/netfilter/Kconfig  |  30 
 net/netfilter/Makefile |   2 +
 net/netfilter/nf_conntrack_netlink.c   |  38 -
 net/netfilter/xt_conntcindex.c | 165 +
 net/netfilter/xt_tcindex.c |  84 +++
 13 files changed, 385 insertions(+), 4 deletions(-)
 create mode 100644 include/uapi/linux/netfilter/xt_CONNTCINDEX.h
 create mode 100644 include/uapi/linux/netfilter/xt_TCINDEX.h
 create mode 100644 include/uapi/linux/netfilter/xt_conntcindex.h
 create mode 100644 include/uapi/linux/netfilter/xt_tcindex.h
 create mode 100644 net/netfilter/xt_conntcindex.c
 create mode 100644 net/netfilter/xt_tcindex.c

diff --git a/include/net/netfilter/nf_conntrack.h 
b/include/net/netfilter/nf_conntrack.h
index fde4068..9b0ab48 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -105,7 +105,11 @@ struct nf_conn {
 
 #if defined(CONFIG_NF_CONNTRACK_MARK)
u_int32_t mark;
-#endif
+
+#ifdef CONFIG_NET_SCHED
+   u_int16_t tc_index;
+#endif /* CONFIG_NET_SCHED */
+#endif /* CONFIG_NF_CONNTRACK_MARK */
 
 #ifdef CONFIG_NF_CONNTRACK_SECMARK
u_int32_t secmark;
diff --git a/include/uapi/linux/netfilter/Kbuild 
b/include/uapi/linux/netfilter/Kbuild
index 1d973d2..fedaaab 100644
--- a/include/uapi/linux/netfilter/Kbuild
+++ b/include/uapi/linux/netfilter/Kbuild
@@ -22,6 +22,7 @@ header-y += xt_CHECKSUM.h
 header-y += xt_CLASSIFY.h
 header-y += xt_CONNMARK.h
 header-y += xt_CONNSECMARK.h
+header-y += xt_CONNTCINDEX.h
 header-y += xt_CT.h
 header-y += xt_DSCP.h
 header-y += xt_HMARK.h
@@ -33,6 +34,7 @@ header-y += xt_NFLOG.h
 header-y += xt_NFQUEUE.h
 header-y += xt_RATEEST.h
 header-y += xt_SECMARK.h
+header-y += xt_TCINDEX.h
 header-y += xt_TCPMSS.h
 header-y += xt_TCPOPTSTRIP.h
 header-y += xt_TEE.h
@@ -46,6 +48,7 @@ header-y += xt_connbytes.h
 header-y += xt_connlabel.h
 header-y += xt_connlimit.h
 header-y += xt_connmark.h
+header-y += xt_conntcindex.h
 header-y += xt_conntrack.h
 header-y += xt_cpu.h
 header-y += xt_dccp.h
@@ -81,6 +84,7 @@ header-y += xt_socket.h
 header-y += xt_state.h
 header-y += xt_statistic.h
 header-y += xt_string.h
+header-y += xt_tcindex.h
 header-y += xt_tcpmss.h
 header-y += xt_tcpudp.h
 header-y += xt_time.h
diff --git a/include/uapi/linux/netfilter/nf_conntrack_common.h 
b/include/uapi/linux/netfilter/nf_conntrack_common.h
index 319f471..b211bb8 100644
--- a/include/uapi/linux/netfilter/nf_conntrack_common.h
+++ b/include/uapi/linux/netfilter/nf_conntrack_common.h
@@ -107,6 +107,7 @@ enum ip_conntrack_events {
IPCT_NATSEQADJ = IPCT_SEQADJ,
IPCT_SECMARK,   /* new security mark has been set */
IPCT_LABEL, /* new connlabel has been set */
+   IPCT_TCINDEX,   /* new tc_index has been set */
 };
 
 enum ip_conntrack_expect_events {
diff --git a/include/uapi/linux/netfilter/nfnetlink_conntrack.h 
b/include/uapi/linux/netfilter/nfnetlink_conntrack.h
index c1a4e144..cfdd15f 100644
--- a/include/uapi/linux/netfilter/nfnetlink_conntrack.h
+++ b/include/uapi/linux/netfilter/nfnetlink_conntrack.h
@@ -53,6 +53,7 @@ enum ctattr_type {
CTA_MARK_MASK,
CTA_LABELS,
CTA_LABELS_MASK,
+   CTA_TC_INDEX,
__CTA_MAX
 };
 #define CTA_MAX (__CTA_MAX - 1)
diff --git

[no subject]

2015-12-22 Thread Luuk Paulussen

Sorry for the resend.  I forgot to include relevant netfilter maintainers
in CC list

Changes from v1 are to add dependency on NF_CONNTRACK to Kconfig to resolve
the build issue and some style fixups from checkpatch.


Re: suspicious RCU usage (netlink/rhashtable)

2015-12-22 Thread David Miller
From: Dave Jones 
Date: Tue, 22 Dec 2015 16:47:34 -0500

> On Tue, Dec 22, 2015 at 04:42:25PM -0500, David Miller wrote:
>  > From: Craig Gallek 
>  > Date: Tue, 22 Dec 2015 16:38:32 -0500
>  > 
>  > > On Tue, Dec 22, 2015 at 4:28 PM, David Miller  wrote:
>  > >> From: Craig Gallek 
>  > >> Date: Tue, 22 Dec 2015 15:51:19 -0500
>  > >>
>  > >>> I was actually just looking at this as well (though a slightly
>  > >>> different stack).  The issue is with: c6ff5268293e rhashtable: Fix
>  > >>> walker list corruption
>  > >>>
>  > >>> It changed the lock acquired in rhashtable_walk_init to use the new
>  > >>> spinlock, but the rht_dereference macro expects the mutex.  I was
>  > >>> still trying to track down which repository this change came in
>  > >>> through, though...
>  > >>
>  > >> Both came via my networking tree.
>  > > Simple fix is below.  Though, I don't understand the history of the
>  > > multiple locks in this structure to be sure it's correct.  I'll send
>  > > it as a formal patch.  Please reject if it's not the right approach.
>  > > 
>  > > diff --git a/lib/rhashtable.c b/lib/rhashtable.c
>  > > index 1c149e9..cc80870 100644
>  > > --- a/lib/rhashtable.c
>  > > +++ b/lib/rhashtable.c
>  > > @@ -516,7 +516,8 @@ int rhashtable_walk_init(struct rhashtable *ht,
>  > > struct rhashtable_iter *iter)
>  > > return -ENOMEM;
>  > > 
>  > > spin_lock(&ht->lock);
>  > > -   iter->walker->tbl = rht_dereference(ht->tbl, ht);
>  > > +   iter->walker->tbl =
>  > > +   rcu_dereference_protected(ht->tbl, lockdep_is_held(&ht->lock));
>  > > list_add(&iter->walker->list, &iter->walker->tbl->walkers);
>  > > spin_unlock(&ht->lock);
>  > 
>  > How can this be the "fix"?  That's exactly what's in the tree.
> 
> I should have made clear, this is Linus' tree I'm hitting this on,
> which matches what Craig posted.

Ok, so this should be fixed in my 'net' tree and I'll send that to Linus
soon.

Thanks.


Re: IPv6 route to gateway on fe80::1%eth0 when I have fe80::1%br0 locally

2015-12-22 Thread Hannes Frederic Sowa
Hi Marc,

On 22.12.2015 22:28, Marc Haber wrote:
> Hi Hannes,
> 
> thanks for your mail.
> 
> On Tue, Dec 22, 2015 at 04:15:14PM +0100, Hannes Frederic Sowa wrote:
>> On 12.12.2015 20:58, Marc Haber wrote:
>>> Any hints would be appreciated.
>>
>> This sysctl should help:
>>
>> accept_ra_from_local - BOOLEAN
>> Accept RA with source-address that is found on local machine
>> if the RA is otherwise proper and able to be accepted.
>> Default is to NOT accept these as it may be an un-intended
>> network loop.
>>
>> Functional default:
>>enabled if accept_ra_from_local is enabled
>>on a specific interface.
>>disabled if accept_ra_from_local is disabled
>>on a specific interface.
>>
>> Anyway, this has to be fixed up in a clean way and should work by default.
> 
> The clean way would be:
> 
> accept_ra_from_local=0: never accept RA with source-address that is
>   found on local machine
> accept_ra_from_local=1: always accept RA with source-address that is
>   found on local machine. Dangerous.
> accept_ra_from_local=2: only accept RA with link local source-address
>   that is found on local machine, and not if received RA points to an
>   address that is locally configured on the same interface. Default.
> 
> Shall I file a bug for this in bugzilla?

Thanks but no need to do that, I already cooked a patch and will submit
tomorrow after some testing. We don't need to enhance the sysctl,
default should be to simply check the interface too if a route with
link-local address is received.

Bye,
Hannes



Re: [RFC PATCH 16/17] calipso: Add validation of CALIPSO option.

2015-12-22 Thread Hannes Frederic Sowa
On 22.12.2015 17:59, Huw Davies wrote:
> On Tue, Dec 22, 2015 at 02:50:20PM +0100, Hannes Frederic Sowa wrote:
>> On 22.12.2015 12:46, Huw Davies wrote:
>>>  
>>> +/* CALIPSO RFC 5570 */
>>> +
>>> +static bool ipv6_hop_calipso(struct sk_buff *skb, int optoff)
>>> +{
>>> +   const unsigned char *nh = skb_network_header(skb);
>>> +
>>> +   if (nh[optoff + 1] < 8)
>>> +   goto drop;
>>> +
>>> +   if (nh[optoff + 6] * 4 + 8 > nh[optoff + 1])
>>> +   goto drop;
>>> +
>>> +   if (!calipso_validate(skb, nh + optoff))
>>> +   goto drop;
>>> +
>>> +   return true;
>>> +
>>> +drop:
>>> +   kfree_skb(skb);
>>> +   return false;
>>> +}
>>> +
>>
>> Formally, if an extension header could not be processed, the packet
>> should be discarded and an icmp error parameter extension should be
>> sent. I think we shouldn't let those packets pass here.
> 
> Thanks for your comments Hannes, I'm looking into your other
> suggestions.
> 
> I'm confused about this one.  AFAICS, this will drop packets that we
> can't process.  We don't send the icmp error, but I can certainly add
> that.  Is that what you mean?

Actually, the implementation of calipso_validate will accept the packets
because it defaults to return true if we don't compile the module. At
least we should drop the packet if it is not loaded. I am in favor of
adding the parameter problem icmp error. So, yes, I think it should be
added.
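
To make the concern concrete, a small userspace sketch: the two length
checks mirror the quoted ipv6_hop_calipso(), while the validator is a stub
that always reports success, as happens when the real one is not built.
CALIPSO_ENABLED and both function names are hypothetical, used only here.

```c
#include <assert.h>
#include <stdbool.h>

/* CALIPSO_ENABLED is a hypothetical switch standing in for the kernel
 * config option; when it is undefined, only the stub below exists. */
#ifdef CALIPSO_ENABLED
bool calipso_validate_sketch(const unsigned char *opt);
#else
/* Stub: without real support, every option is reported as valid. */
static inline bool calipso_validate_sketch(const unsigned char *opt)
{
	(void)opt;
	return true;
}
#endif

/* 'opt' points at the option type byte: opt[1] is the option length and
 * opt[6] the compartment length in 32-bit words, as in the quoted code. */
bool calipso_opt_accept(const unsigned char *opt)
{
	if (opt[1] < 8)
		return false;
	if (opt[6] * 4 + 8 > opt[1])
		return false;
	return calipso_validate_sketch(opt);
}

/* Usage: a well-sized option is accepted although nothing validated it. */
void calipso_demo(void)
{
	const unsigned char opt[8] = { 0x07, 8, 0, 0, 0, 1, 0, 0 };

	assert(calipso_opt_accept(opt));
}
```

With the stub in place, any option that passes the two length checks is
let through, which is the behaviour being questioned here.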

Bye,
Hannes


Re: suspicious RCU usage (netlink/rhashtable)

2015-12-22 Thread Dave Jones
On Tue, Dec 22, 2015 at 04:42:25PM -0500, David Miller wrote:
 > From: Craig Gallek 
 > Date: Tue, 22 Dec 2015 16:38:32 -0500
 > 
 > > On Tue, Dec 22, 2015 at 4:28 PM, David Miller  wrote:
 > >> From: Craig Gallek 
 > >> Date: Tue, 22 Dec 2015 15:51:19 -0500
 > >>
 > >>> I was actually just looking at this as well (though a slightly
 > >>> different stack).  The issue is with: c6ff5268293e rhashtable: Fix
 > >>> walker list corruption
 > >>>
 > >>> It changed the lock acquired in rhashtable_walk_init to use the new
 > >>> spinlock, but the rht_dereference macro expects the mutex.  I was
 > >>> still trying to track down which repository this change came in
 > >>> through, though...
 > >>
 > >> Both came via my networking tree.
 > > Simple fix is below.  Though, I don't understand the history of the
 > > multiple locks in this structure to be sure it's correct.  I'll send
 > > it as a formal patch.  Please reject if it's not the right approach.
 > > 
 > > diff --git a/lib/rhashtable.c b/lib/rhashtable.c
 > > index 1c149e9..cc80870 100644
 > > --- a/lib/rhashtable.c
 > > +++ b/lib/rhashtable.c
 > > @@ -516,7 +516,8 @@ int rhashtable_walk_init(struct rhashtable *ht,
 > > struct rhashtable_iter *iter)
 > > return -ENOMEM;
 > > 
 > > spin_lock(&ht->lock);
 > > -   iter->walker->tbl = rht_dereference(ht->tbl, ht);
 > > +   iter->walker->tbl =
 > > +   rcu_dereference_protected(ht->tbl, lockdep_is_held(&ht->lock));
 > > list_add(&iter->walker->list, &iter->walker->tbl->walkers);
 > > spin_unlock(&ht->lock);
 > 
 > How can this be the "fix"?  That's exactly what's in the tree.

I should have made clear, this is Linus' tree I'm hitting this on,
which matches what Craig posted.

Dave


Re: suspicious RCU usage (netlink/rhashtable)

2015-12-22 Thread Craig Gallek
On Tue, Dec 22, 2015 at 4:42 PM, David Miller  wrote:
> From: Craig Gallek 
> Date: Tue, 22 Dec 2015 16:38:32 -0500
>
>> On Tue, Dec 22, 2015 at 4:28 PM, David Miller  wrote:
>>> From: Craig Gallek 
>>> Date: Tue, 22 Dec 2015 15:51:19 -0500
>>>
>>>> I was actually just looking at this as well (though a slightly
>>>> different stack).  The issue is with: c6ff5268293e rhashtable: Fix
>>>> walker list corruption
>>>>
>>>> It changed the lock acquired in rhashtable_walk_init to use the new
>>>> spinlock, but the rht_dereference macro expects the mutex.  I was
>>>> still trying to track down which repository this change came in
>>>> through, though...
>>>
>>> Both came via my networking tree.
>> Simple fix is below.  Though, I don't understand the history of the
>> multiple locks in this structure to be sure it's correct.  I'll send
>> it as a formal patch.  Please reject if it's not the right approach.
>>
>> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
>> index 1c149e9..cc80870 100644
>> --- a/lib/rhashtable.c
>> +++ b/lib/rhashtable.c
>> @@ -516,7 +516,8 @@ int rhashtable_walk_init(struct rhashtable *ht,
>> struct rhashtable_iter *iter)
>> return -ENOMEM;
>>
>> spin_lock(&ht->lock);
>> -   iter->walker->tbl = rht_dereference(ht->tbl, ht);
>> +   iter->walker->tbl =
>> +   rcu_dereference_protected(ht->tbl, lockdep_is_held(&ht->lock));
>> list_add(&iter->walker->list, &iter->walker->tbl->walkers);
>> spin_unlock(&ht->lock);
>
> How can this be the "fix"?  That's exactly what's in the tree.
Ah, you're right, this fix was submitted to next in 179ccc0a7364 but
hasn't made it into net-next yet.


Re: [PATCH -next 2/3] tcp: send_reset: test for non-NULL sk first

2015-12-22 Thread Hannes Frederic Sowa
On 21.12.2015 21:29, Florian Westphal wrote:
> tcp_md5_do_lookup requires a full socket, so once we extend
> _send_reset() to also accept timewait socket we would have to change
> 
> if (!sk && hash_location)
> 
> to something like
> 
> if ((!sk || !sk_fullsock(sk)) && hash_location) {
>   ...
> } else {
>   (sk && sk_fullsock(sk)) tcp_md5_do_lookup()
> }
> 
> Switch the two branches: check if we have a socket first, then
> fall back to a listener lookup if we saw a md5 option (hash_location).
> 
> Signed-off-by: Florian Westphal 

Acked-by: Hannes Frederic Sowa 




Re: [PATCH -next 1/3] net: add inet_sk_transparent() helper

2015-12-22 Thread Hannes Frederic Sowa
On 21.12.2015 21:29, Florian Westphal wrote:
> Avoids cluttering tcp_v4_send_reset when followup patch extends
> it to deal with timewait sockets.
> 
> Suggested-by: Eric Dumazet 
> Signed-off-by: Florian Westphal 

Acked-by: Hannes Frederic Sowa 



Re: [PATCH v2 -next 3/3] tcp: honour SO_BINDTODEVICE for TW_RST case too

2015-12-22 Thread Hannes Frederic Sowa
On 21.12.2015 21:29, Florian Westphal wrote:
> Hannes points out that when we generate tcp reset for timewait sockets we
> pretend we found no socket and pass NULL sk to tcp_vX_send_reset().
> 
> Make it cope with inet tw sockets and then provide tw sk.
> 
> This makes RSTs appear on correct interface when SO_BINDTODEVICE is used.
> 
> Packetdrill test case:
> // want default route to be used, we rely on BINDTODEVICE
> `ip route del 192.0.2.0/24 via 192.168.0.2 dev tun0`
> 
> 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> // test case still works due to BINDTODEVICE
> 0.001 setsockopt(3, SOL_SOCKET, SO_BINDTODEVICE, "tun0", 4) = 0
> 0.100...0.200 connect(3, ..., ...) = 0
> 
> 0.100 > S 0:0(0) 
> 0.200 < S. 0:0(0) ack 1 win 32792 
> 0.200 > . 1:1(0) ack 1
> 
> 0.210 close(3) = 0
> 
> 0.210 > F. 1:1(0) ack 1 win 29200
> 0.300 < . 1:1(0) ack 2 win 46
> 
> // more data while in FIN_WAIT2, expect RST
> 1.300 < P. 1:1001(1000) ack 1 win 46
> 
> // fails without this change -- default route is used
> 1.301 > R 1:1(0) win 0
> 
> Reported-by: Hannes Frederic Sowa 
> Signed-off-by: Florian Westphal 

Acked-by: Hannes Frederic Sowa 

Tested and works fine, thanks Florian and Eric!


Re: suspicious RCU usage (netlink/rhashtable)

2015-12-22 Thread David Miller
From: Craig Gallek 
Date: Tue, 22 Dec 2015 16:38:32 -0500

> On Tue, Dec 22, 2015 at 4:28 PM, David Miller  wrote:
>> From: Craig Gallek 
>> Date: Tue, 22 Dec 2015 15:51:19 -0500
>>
>>> I was actually just looking at this as well (though a slightly
>>> different stack).  The issue is with: c6ff5268293e rhashtable: Fix
>>> walker list corruption
>>>
>>> It changed the lock acquired in rhashtable_walk_init to use the new
>>> spinlock, but the rht_dereference macro expects the mutex.  I was
>>> still trying to track down which repository this change came in
>>> through, though...
>>
>> Both came via my networking tree.
> Simple fix is below.  Though, I don't understand the history of the
> multiple locks in this structure to be sure it's correct.  I'll send
> it as a formal patch.  Please reject if it's not the right approach.
> 
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index 1c149e9..cc80870 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -516,7 +516,8 @@ int rhashtable_walk_init(struct rhashtable *ht,
> struct rhashtable_iter *iter)
> return -ENOMEM;
> 
> spin_lock(&ht->lock);
> -   iter->walker->tbl = rht_dereference(ht->tbl, ht);
> +   iter->walker->tbl =
> +   rcu_dereference_protected(ht->tbl, lockdep_is_held(&ht->lock));
> list_add(&iter->walker->list, &iter->walker->tbl->walkers);
> spin_unlock(&ht->lock);

How can this be the "fix"?  That's exactly what's in the tree.


Re: [PATCH net-next 1/4] soreuseport: define reuseport groups

2015-12-22 Thread David Miller
From: Craig Gallek 
Date: Tue, 22 Dec 2015 15:05:07 -0500

> + for (i = 0; i < reuse->num_socks; i++) {
> + if (reuse->socks[i] == sk) {
> + reuse->socks[i] = reuse->socks[reuse->num_socks - 1];
> + reuse->num_socks--;
> + if (reuse->num_socks == 0)
> + kfree_rcu(reuse, rcu);
> + break;
> + }
> + }

Don't you need to memmove() the entire rest of the array down one slot
when you hit the matching 'sk' in there?  I can't see how it can work
to only move one entry down.


Re: suspicious RCU usage (netlink/rhashtable)

2015-12-22 Thread Craig Gallek
On Tue, Dec 22, 2015 at 4:28 PM, David Miller  wrote:
> From: Craig Gallek 
> Date: Tue, 22 Dec 2015 15:51:19 -0500
>
>> I was actually just looking at this as well (though a slightly
>> different stack).  The issue is with: c6ff5268293e rhashtable: Fix
>> walker list corruption
>>
>> It changed the lock acquired in rhashtable_walk_init to use the new
>> spinlock, but the rht_dereference macro expects the mutex.  I was
>> still trying to track down which repository this change came in
>> through, though...
>
> Both came via my networking tree.
Simple fix is below.  Though, I don't understand the history of the
multiple locks in this structure to be sure it's correct.  I'll send
it as a formal patch.  Please reject if it's not the right approach.

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 1c149e9..cc80870 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -516,7 +516,8 @@ int rhashtable_walk_init(struct rhashtable *ht,
struct rhashtable_iter *iter)
return -ENOMEM;

spin_lock(&ht->lock);
-   iter->walker->tbl = rht_dereference(ht->tbl, ht);
+   iter->walker->tbl =
+   rcu_dereference_protected(ht->tbl, lockdep_is_held(&ht->lock));
list_add(&iter->walker->list, &iter->walker->tbl->walkers);
spin_unlock(&ht->lock);


Re: suspicious RCU usage (netlink/rhashtable)

2015-12-22 Thread David Miller
From: Craig Gallek 
Date: Tue, 22 Dec 2015 15:51:19 -0500

> I was actually just looking at this as well (though a slightly
> different stack).  The issue is with: c6ff5268293e rhashtable: Fix
> walker list corruption
> 
> It changed the lock acquired in rhashtable_walk_init to use the new
> spinlock, but the rht_dereference macro expects the mutex.  I was
> still trying to track down which repository this change came in
> through, though...

Both came via my networking tree.


Re: IPv6 route to gateway on fe80::1%eth0 when I have fe80::1%br0 locally

2015-12-22 Thread Marc Haber
Hi Hannes,

thanks for your mail.

On Tue, Dec 22, 2015 at 04:15:14PM +0100, Hannes Frederic Sowa wrote:
> On 12.12.2015 20:58, Marc Haber wrote:
> > Any hints would be appreciated.
> 
> This sysctl should help:
> 
> accept_ra_from_local - BOOLEAN
> Accept RA with source-address that is found on local machine
> if the RA is otherwise proper and able to be accepted.
> Default is to NOT accept these as it may be an un-intended
> network loop.
> 
> Functional default:
>enabled if accept_ra_from_local is enabled
>on a specific interface.
>disabled if accept_ra_from_local is disabled
>on a specific interface.
> 
> Anyway, this has to be fixed up in a clean way and should work by default.

The clean way would be:

accept_ra_from_local=0: never accept RA with source-address that is
  found on local machine
accept_ra_from_local=1: always accept RA with source-address that is
  found on local machine. Dangerous.
accept_ra_from_local=2: only accept RA with link local source-address
  that is found on local machine, and not if received RA points to an
  address that is locally configured on the same interface. Default.
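
The three proposed values could be sketched as a small decision function. This is only an illustration of the semantics listed above, not existing kernel code; the enum, the function name accept_local_ra() and both boolean inputs are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of the proposed accept_ra_from_local values. */
enum ra_local_mode {
	RA_LOCAL_NEVER = 0,		/* never accept */
	RA_LOCAL_ALWAYS = 1,		/* always accept (dangerous) */
	RA_LOCAL_LINKLOCAL_OTHER_IF = 2	/* proposed default */
};

/* Decide whether to accept an RA whose source address is also configured
 * somewhere on the local machine.
 *   src_is_link_local: the RA source address is link-local (fe80::/10)
 *   src_on_rx_iface:   that address is configured on the interface the
 *                      RA was received on
 */
static bool accept_local_ra(enum ra_local_mode mode, bool src_is_link_local,
			    bool src_on_rx_iface)
{
	assert(mode >= RA_LOCAL_NEVER && mode <= RA_LOCAL_LINKLOCAL_OTHER_IF);

	switch (mode) {
	case RA_LOCAL_ALWAYS:
		return true;
	case RA_LOCAL_LINKLOCAL_OTHER_IF:
		return src_is_link_local && !src_on_rx_iface;
	case RA_LOCAL_NEVER:
	default:
		return false;
	}
}
```

With mode 2, the fe80::1%eth0 versus fe80::1%br0 case from the subject line would be accepted, while an RA that loops back to the interface owning the address would still be dropped.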

Shall I file a bug for this in bugzilla?

Greetings
Marc

-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: pull request (net): ipsec 2015-12-22

2015-12-22 Thread David Miller
From: Steffen Klassert 
Date: Tue, 22 Dec 2015 10:35:18 +0100

> Just one patch to fix dst_entries_init with multiple namespaces.
> From Dan Streetman.
> 
> Please pull or let me know if there are problems.

Pulled, thanks Steffen.


Re: [patch net-next v2] mlxsw: core: Use devm_kzalloc to allocate mlxsw_hwmon structure

2015-12-22 Thread David Miller
From: Jiri Pirko 
Date: Tue, 22 Dec 2015 09:43:07 +0100

> From: Jiri Pirko 
> 
> KASan reported use-after-free for the hwmon structure. So fix this by
> using devm_kzalloc and let the core take care of freeing the memory
> during device detach.
> 
> Reported-by: Ido Schimmel 
> Fixes: 89309da39 ("mlxsw: core: Implement temperature hwmon interface")
> Signed-off-by: Jiri Pirko 

Applied, thanks Jiri.


Re: net: user-controllable kmalloc size in __sctp_setsockopt_connectx

2015-12-22 Thread Marcelo Ricardo Leitner
Hi,

On Tue, Dec 22, 2015 at 09:13:54PM +0100, Dmitry Vyukov wrote:
> Hello,
...
> 
>  [] __sctp_setsockopt_connectx+0xc6/0x150
> net/sctp/socket.c:1318
>  [< inline >] sctp_getsockopt_connectx3 net/sctp/socket.c:1410
>  [] sctp_getsockopt+0x25ee/0x3e00 net/sctp/socket.c:6007
>  [] sock_common_getsockopt+0x95/0xd0 net/core/sock.c:2601
>  [< inline >] SYSC_getsockopt net/socket.c:1782
>  [] SyS_getsockopt+0x142/0x230 net/socket.c:1764
>  [] entry_SYSCALL_64_fastpath+0x16/0x7a
> arch/x86/entry/entry_64.S:185

This is similar to that other one. I'll send a patch for it tomorrow.

Thanks,
Marcelo



Re: [PATCH] net: tcp: deal with listen sockets properly in tcp_abort.

2015-12-22 Thread David Miller
From: Lorenzo Colitti 
Date: Tue, 22 Dec 2015 00:03:44 +0900

> When closing a listen socket, tcp_abort currently calls
> tcp_done without clearing the request queue. If the socket has a
> child socket that is established but not yet accepted, the child
> socket is then left without a parent, causing a leak.
> 
> Fix this by setting the socket state to TCP_CLOSE and calling
> inet_csk_listen_stop with the socket lock held, like tcp_close
> does.
> 
> Tested using net_test. With this patch, calling SOCK_DESTROY on a
> listen socket that has an established but not yet accepted child
> socket results in the parent and the child being closed, such
> that they no longer appear in sock_diag dumps.
> 
> Reported-by: Eric Dumazet 
> Signed-off-by: Lorenzo Colitti 

Applied to net-next, which I assume is the intended target tree for
this patch.

Please make that explicit, always, in future submissions.

Thanks.


Re: [patch net-next] mlxsw: core: Allow to reset temperature history via hwmon interface

2015-12-22 Thread David Miller
From: Jiri Pirko 
Date: Mon, 21 Dec 2015 11:14:21 +0100

> From: Jiri Pirko 
> 
> Add another sysfs hwmon attribute to expose possibility to reset
> temperature sensors history.
> 
> Signed-off-by: Jiri Pirko 

Applied, thanks Jiri.


Re: [PATCH] ipv6/addrlabel: fix ip6addrlbl_get()

2015-12-22 Thread David Miller
From: Andrey Ryabinin 
Date: Mon, 21 Dec 2015 12:54:45 +0300

> ip6addrlbl_get() has never worked. If ip6addrlbl_hold() succeeded,
> ip6addrlbl_get() will exit with '-ESRCH'. If ip6addrlbl_hold() failed,
> ip6addrlbl_get() will use an ip6addrlbl_entry pointer that is about to be freed.
> 
> Fix this by inverting ip6addrlbl_hold() check.
> 
> Fixes: 2a8cc6c89039 ("[IPV6] ADDRCONF: Support RFC3484 configurable address 
> selection policy table.")
> Signed-off-by: Andrey Ryabinin 

Applied and queued up for -stable, thanks.
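
A userspace sketch of the inverted check described in the commit message, assuming the hold helper returns true when it manages to take a reference. The struct and the functions entry_hold(), lookup_buggy() and lookup_fixed() are hypothetical stand-ins, and the refcount here is a plain int rather than an atomic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct entry {
	int refcnt;	/* 0 means the entry is already being freed */
};

/* Take a reference unless the count has already dropped to zero.
 * Returns true on success.
 */
static bool entry_hold(struct entry *e)
{
	assert(e != NULL);
	if (e->refcnt == 0)
		return false;
	e->refcnt++;
	return true;
}

/* Buggy variant: throws the entry away when the hold SUCCEEDS and keeps
 * a dying entry when the hold fails.
 */
static struct entry *lookup_buggy(struct entry *p)
{
	if (p && entry_hold(p))
		p = NULL;
	return p;
}

/* Fixed variant: only discard the entry when the hold fails. */
static struct entry *lookup_fixed(struct entry *p)
{
	if (p && !entry_hold(p))
		p = NULL;
	return p;
}
```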


Re: [patch net] switchdev: bridge: Pass ageing time as clock_t instead of jiffies

2015-12-22 Thread David Miller
From: Jiri Pirko 
Date: Mon, 21 Dec 2015 09:56:01 +0100

> From: Ido Schimmel 
> 
> The bridge's ageing time is offloaded to hardware when:
>   1) A port joins a bridge
>   2) The ageing time of the bridge is changed
> 
> In the first case the ageing time is offloaded as jiffies, but in the
> second case it's offloaded as clock_t, which is what existing switchdev
> drivers expect to receive.
> 
> Fixes: 6ac311ae8bfb ("Adding switchdev ageing notification on port bridged")
> Signed-off-by: Ido Schimmel 
> Signed-off-by: Jiri Pirko 

Applied and queued up for -stable, thanks!
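
To illustrate why the unit mismatch matters, here is a small userspace sketch. The tick rates (250 for jiffies, 100 for clock_t) are assumed example values and the helper names are hypothetical; the real kernel conversion helpers handle more cases than this simplified arithmetic.

```c
#include <assert.h>

#define SKETCH_HZ	250UL	/* assumed jiffies per second */
#define SKETCH_USER_HZ	100UL	/* assumed clock_t ticks per second */

/* Simplified jiffies -> clock_t conversion for the assumed rates. */
static unsigned long sketch_jiffies_to_clock_t(unsigned long j)
{
	return j * SKETCH_USER_HZ / SKETCH_HZ;
}

/* What a driver expecting clock_t would compute as seconds. */
static unsigned long sketch_clock_t_to_secs(unsigned long c)
{
	assert(SKETCH_USER_HZ != 0);
	return c / SKETCH_USER_HZ;
}
```

With these rates, a 300 second ageing time is 75000 jiffies. Converted first, the driver sees 300 seconds; passed raw, the same number is read as 750 seconds.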


Re: suspicious RCU usage (netlink/rhashtable)

2015-12-22 Thread Craig Gallek
On Tue, Dec 22, 2015 at 3:45 PM, Dave Jones  wrote:
> ===
> [ INFO: suspicious RCU usage. ]
> 4.4.0-rc6-think+ #1 Not tainted
> ---
> lib/rhashtable.c:522 suspicious rcu_dereference_protected() usage!
>
> other info that might help us debug this:
>
>
> rcu_scheduler_active = 1, debug_locks = 0
> 2 locks held by trinity-c1/3652:
>  #0:  (&p->lock){+.+.+.}, at: [] seq_read+0xd7/0x900
>  #1:  (&(&ht->lock)->rlock){+.+...}, at: [] 
> rhashtable_walk_init+0x9d/0x170
>
> stack backtrace:
> CPU: 0 PID: 3652 Comm: trinity-c1 Not tainted 4.4.0-rc6-think+ #1
>  9af6ac60 3fc014d4 8800cff779e0 9a548da1
>  880459b8b700 8800cff77a10 9a131068 8800cdd32c48
>  880464af8000 8800cdd32c58 880464af8160 8800cff77a48
> Call Trace:
>  [] dump_stack+0x4e/0x7d
>  [] lockdep_rcu_suspicious+0xf8/0x110
>  [] rhashtable_walk_init+0x163/0x170
>  [] netlink_walk_start+0x49/0x90
>  [] netlink_seq_start+0x40/0x90
>  [] seq_read+0x1bf/0x900
>  [] ? seq_lseek+0x1b0/0x1b0
>  [] ? __might_fault+0xe0/0xf0
>  [] ? __might_fault+0x87/0xf0
>  [] ? rw_copy_check_uvector+0x139/0x170
>  [] proc_reg_read+0x7f/0xc0
>  [] do_loop_readv_writev+0xe0/0x110
>  [] ? proc_reg_write+0xc0/0xc0
>  [] do_readv_writev+0x38b/0x3c0
>  [] ? proc_reg_write+0xc0/0xc0
>  [] ? vfs_write+0x260/0x260
>  [] ? __lock_is_held+0x25/0xd0
>  [] ? mark_held_locks+0x23/0xc0
>  [] ? context_tracking_exit.part.5+0x2a/0x50
>  [] ? trace_hardirqs_on_caller+0x186/0x280
>  [] ? trace_hardirqs_on+0xd/0x10
>  [] vfs_readv+0x56/0x70
>  [] SyS_preadv+0x15d/0x180
>  [] ? SyS_writev+0x1a0/0x1a0
>  [] ? trace_hardirqs_on_thunk+0x17/0x19
>  [] entry_SYSCALL_64_fastpath+0x12/0x6b

I was actually just looking at this as well (though a slightly
different stack).  The issue is with: c6ff5268293e rhashtable: Fix
walker list corruption

It changed the lock acquired in rhashtable_walk_init to use the new
spinlock, but the rht_dereference macro expects the mutex.  I was
still trying to track down which repository this change came in
through, though...


Re: suspicious RCU usage (netlink/rhashtable)

2015-12-22 Thread David Miller
From: Dave Jones 
Date: Tue, 22 Dec 2015 15:45:39 -0500

> ===
> [ INFO: suspicious RCU usage. ]
> 4.4.0-rc6-think+ #1 Not tainted
> ---
> lib/rhashtable.c:522 suspicious rcu_dereference_protected() usage!
> 
> other info that might help us debug this:
> 
> 
> rcu_scheduler_active = 1, debug_locks = 0
> 2 locks held by trinity-c1/3652:
>  #0:  (&p->lock){+.+.+.}, at: [] seq_read+0xd7/0x900
>  #1:  (&(&ht->lock)->rlock){+.+...}, at: [] 
> rhashtable_walk_init+0x9d/0x170

I'm so confused, the code reads:

spin_lock(&ht->lock);
iter->walker->tbl =
rcu_dereference_protected(ht->tbl, lockdep_is_held(&ht->lock));

?!?!?!


suspicious RCU usage (netlink/rhashtable)

2015-12-22 Thread Dave Jones
===
[ INFO: suspicious RCU usage. ]
4.4.0-rc6-think+ #1 Not tainted
---
lib/rhashtable.c:522 suspicious rcu_dereference_protected() usage!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 0
2 locks held by trinity-c1/3652:
 #0:  (&p->lock){+.+.+.}, at: [] seq_read+0xd7/0x900
 #1:  (&(&ht->lock)->rlock){+.+...}, at: [] 
rhashtable_walk_init+0x9d/0x170

stack backtrace:
CPU: 0 PID: 3652 Comm: trinity-c1 Not tainted 4.4.0-rc6-think+ #1
 9af6ac60 3fc014d4 8800cff779e0 9a548da1
 880459b8b700 8800cff77a10 9a131068 8800cdd32c48
 880464af8000 8800cdd32c58 880464af8160 8800cff77a48
Call Trace:
 [] dump_stack+0x4e/0x7d
 [] lockdep_rcu_suspicious+0xf8/0x110
 [] rhashtable_walk_init+0x163/0x170
 [] netlink_walk_start+0x49/0x90
 [] netlink_seq_start+0x40/0x90
 [] seq_read+0x1bf/0x900
 [] ? seq_lseek+0x1b0/0x1b0
 [] ? __might_fault+0xe0/0xf0
 [] ? __might_fault+0x87/0xf0
 [] ? rw_copy_check_uvector+0x139/0x170
 [] proc_reg_read+0x7f/0xc0
 [] do_loop_readv_writev+0xe0/0x110
 [] ? proc_reg_write+0xc0/0xc0
 [] do_readv_writev+0x38b/0x3c0
 [] ? proc_reg_write+0xc0/0xc0
 [] ? vfs_write+0x260/0x260
 [] ? __lock_is_held+0x25/0xd0
 [] ? mark_held_locks+0x23/0xc0
 [] ? context_tracking_exit.part.5+0x2a/0x50
 [] ? trace_hardirqs_on_caller+0x186/0x280
 [] ? trace_hardirqs_on+0xd/0x10
 [] vfs_readv+0x56/0x70
 [] SyS_preadv+0x15d/0x180
 [] ? SyS_writev+0x1a0/0x1a0
 [] ? trace_hardirqs_on_thunk+0x17/0x19
 [] entry_SYSCALL_64_fastpath+0x12/0x6b



Re: [PATCH net-next 3/5] cxgb4: add dcb info node in debugfs

2015-12-22 Thread David Miller
From: Or Gerlitz 
Date: Mon, 21 Dec 2015 09:33:22 +0200

> On Mon, Dec 21, 2015 at 9:16 AM, Hariprasad Shenai
>  wrote:
>> Add new /sys/kernel/debug/cxgb4/*/dcb_info node to dump out
>> various Data Center Bridging information.
> 
> why? what's wrong with using the lldp tool for that purpose?

Agreed, and I don't like your explanation.

Even if you are using firmware managed DCB, the lldp tool should be
usable for querying.

People need to stop putting so much crap into debugfs, it's a serious
pet peeve of mine.

Every piece of driver unique interface crap you put into debugfs is a
_HARDSHIP_ for the user.  Because they have to learn a unique way to
do X in every driver that tries to export the same kind of
functionality.


Re: [PATCH] dccp: fix use-after-free after cloning struct dccp_sock

2015-12-22 Thread David Miller
From: Vegard Nossum 
Date: Sun, 20 Dec 2015 21:53:27 +0100

> @@ -115,6 +115,10 @@ struct sock *dccp_create_openreq_child(const struct sock 
> *sk,
>   newdp->dccps_isr = dreq->dreq_isr;
>   newdp->dccps_gsr = dreq->dreq_gsr;
>  
> + newdp->dccps_hc_rx_ackvec = NULL;
> + newdp->dccps_hc_rx_ccid = NULL;
> + newdp->dccps_hc_tx_ccid = NULL;

->dccps_hc_rx_ackvec is set to NULL several lines above this, so you don't
need to add that case here.

WRT the ccid pointers, I don't think we can just NULL them out.

If the parent socket has these CCID features enabled, we have to
clone them into the child somehow.


Re: [PATCH] sh_eth: fix 16-bit descriptor field access endianness too

2015-12-22 Thread David Miller
From: Sergei Shtylyov 
Date: Sun, 20 Dec 2015 01:48:04 +0300

> Commit 1299653affa4 ("sh_eth: fix descriptor access endianness") only
> addressed the 32-bit buffer address field byte-swapping  but the driver
> still accesses 16-bit frame/buffer length descriptor fields without the
> necessary byte-swapping -- which should affect the big-endian kernels.
> In order to be able to use {cpu|edmac}_to_{edmac|cpu}(), we need to declare
> the RX/TX descriptor word 1 as a 32-bit field and use shifts/masking to
> access the 16-bit subfields (which gets rid of the ugly #ifdef'ery too)...
> 
> Signed-off-by: Sergei Shtylyov 

Applied, thanks Sergei.
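
The approach in the commit message (one 32-bit descriptor word, byte-swapped as a whole, with shifts and masks for the 16-bit subfields) can be sketched in userspace as follows. The layout (buffer length in the upper half, frame length in the lower half), the big-endian device order and all names are assumptions made for illustration, not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

#define DESC_FRAME_LEN_MASK	0x0000ffffu	/* assumed: low 16 bits */
#define DESC_BUF_LEN_SHIFT	16		/* assumed: high 16 bits */

/* CPU order -> big-endian device word, independent of host endianness. */
static uint32_t cpu_to_dev32(uint32_t v)
{
	uint32_t out;
	uint8_t *b = (uint8_t *)&out;

	b[0] = (uint8_t)(v >> 24);
	b[1] = (uint8_t)(v >> 16);
	b[2] = (uint8_t)(v >> 8);
	b[3] = (uint8_t)v;
	return out;
}

/* Big-endian device word -> CPU order. */
static uint32_t dev_to_cpu32(uint32_t v)
{
	const uint8_t *b = (const uint8_t *)&v;

	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Build descriptor word 1 from its two 16-bit subfields. */
static uint32_t desc_pack(uint16_t buf_len, uint16_t frame_len)
{
	return cpu_to_dev32(((uint32_t)buf_len << DESC_BUF_LEN_SHIFT) |
			    frame_len);
}

static uint16_t desc_frame_len(uint32_t word)
{
	return (uint16_t)(dev_to_cpu32(word) & DESC_FRAME_LEN_MASK);
}

static uint16_t desc_buf_len(uint32_t word)
{
	return (uint16_t)(dev_to_cpu32(word) >> DESC_BUF_LEN_SHIFT);
}
```

Because the swap is applied to the whole word before any masking, the same accessors work on little-endian and big-endian hosts without #ifdefs.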


Re: [PATCH v2] RDS: don't pretend to use cpu notifiers

2015-12-22 Thread David Miller
From: Santosh Shilimkar 
Date: Sat, 19 Dec 2015 12:55:43 -0800

> From: Sebastian Andrzej Siewior 
> 
> It looks like an attempt to use CPU notifier here which was never
> completed. Nobody tried to wire it up completely since 2k9. So I unwind
> this code and get rid of everything not required. Oh look! 19 lines were
> removed while code still does the same thing.
> 
> Acked-by: Santosh Shilimkar 
> Tested-by: Santosh Shilimkar 
> Signed-off-by: Sebastian Andrzej Siewior 

Applied to net-next, thanks.


Re: [PATCH] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.

2015-12-22 Thread David Miller
From: Vijay Pandurangan 
Date: Fri, 18 Dec 2015 14:34:59 -0500

> Packets that arrive from real hardware devices have ip_summed ==
> CHECKSUM_UNNECESSARY if the hardware verified the checksums, or
> CHECKSUM_NONE if the packet is bad or it was unable to verify it. The
> current version of veth will replace CHECKSUM_NONE with
> CHECKSUM_UNNECESSARY, which causes corrupt packets routed from hardware to
> a veth device to be delivered to the application. This caused applications
> at Twitter to receive corrupt data when network hardware was corrupting
> packets.
> 
> We believe this was added as an optimization to skip computing and
> verifying checksums for communication between containers. However, locally
> generated packets have ip_summed == CHECKSUM_PARTIAL, so the code as
> written does nothing for them. As far as we can tell, after removing this
> code, these packets are transmitted from one stack to another unmodified
> (tcpdump shows invalid checksums on both sides, as expected), and they are
> delivered correctly to applications. We didn’t test every possible network
> configuration, but we tried a few common ones such as bridging containers,
> using NAT between the host and a container, and routing from hardware
> devices to containers. We have effectively deployed this in production at
> Twitter (by disabling RX checksum offloading on veth devices).
> 
> This code dates back to the first version of the driver, commit
>  ("[NET]: Virtual ethernet device driver"), so I
> suspect this bug occurred mostly because the driver API has evolved
> significantly since then. Commit <0b7967503dc97864f283a> ("net/veth: Fix
> packet checksumming") (in December 2010) fixed this for packets that get
> created locally and sent to hardware devices, by not changing
> CHECKSUM_PARTIAL. However, the same issue still occurs for packets coming
> in from hardware devices.
> 
> Co-authored-by: Evan Jones 
> Signed-off-by: Evan Jones 
> Cc: Nicolas Dichtel 
> Cc: Phil Sutter 
> Cc: Toshiaki Makita 
> Cc: netdev@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> Signed-off-by: Vijay Pandurangan 

Applied and queued up for -stable, thanks.
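
A userspace model of the behaviour described in the commit message, for readers who want the state transitions spelled out. The enum values are stand-ins for the kernel's CHECKSUM_* constants, the function names are hypothetical, and the rx_csum_offload condition is an assumption inferred from the workaround mentioned above (disabling RX checksum offloading on veth).

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for CHECKSUM_NONE, CHECKSUM_UNNECESSARY, CHECKSUM_PARTIAL. */
enum csum_state {
	CSUM_NONE,		/* not verified: the stack must check it */
	CSUM_UNNECESSARY,	/* already verified by hardware */
	CSUM_PARTIAL		/* locally generated, checksum not filled in */
};

/* Behaviour reported as the bug: an unverified packet is relabelled as
 * verified when it crosses the virtual link.
 */
static enum csum_state forward_old(enum csum_state in, bool rx_csum_offload)
{
	if (in == CSUM_NONE && rx_csum_offload)
		return CSUM_UNNECESSARY;
	return in;
}

/* Behaviour after the patch: the state is passed through untouched. */
static enum csum_state forward_new(enum csum_state in)
{
	return in;
}

/* Simplified receiver: it only verifies packets marked CSUM_NONE. */
static bool receiver_verifies(enum csum_state s)
{
	assert(s == CSUM_NONE || s == CSUM_UNNECESSARY || s == CSUM_PARTIAL);
	return s == CSUM_NONE;
}
```

In this model a corrupt packet that arrived as CSUM_NONE skips verification under forward_old() but is still checked under forward_new(), while CSUM_PARTIAL packets are unaffected by either variant.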

net: user-controllable kmalloc size in __sctp_setsockopt_connectx

2015-12-22 Thread Dmitry Vyukov
Hello,

The following program triggers WARNING in kmalloc:

// autogenerated by syzkaller (http://github.com/google/syzkaller)
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

int main()
{
long r0 = syscall(SYS_mmap, 0x2000ul, 0x4000ul, 0x3ul,
0x32ul, 0xul, 0x0ul);
long r1 = syscall(SYS_socket, 0x2ul, 0x80801ul, 0x84ul, 0, 0, 0);
*(uint32_t*)0x20002fb0 = (uint32_t)0x5fb;
*(uint32_t*)0x20002fb4 = (uint32_t)0x;
*(uint32_t*)0x20002fb8 = (uint32_t)0x0;
*(uint32_t*)0x20002fbc = (uint32_t)0x0;
*(uint32_t*)0x20002fc0 = (uint32_t)0x;
*(uint16_t*)0x20002fc4 = (uint16_t)0x7;
*(uint16_t*)0x20002fc6 = (uint16_t)0x8;
*(uint64_t*)0x20002fc8 = (uint64_t)0xa1d;
*(uint64_t*)0x20002fd0 = (uint64_t)0xd775;
*(uint64_t*)0x20002fd8 = (uint64_t)0x9;
*(uint64_t*)0x20002fe0 = (uint64_t)0x26;
*(uint64_t*)0x20002fe8 = (uint64_t)0x2;
*(uint64_t*)0x20002ff0 = (uint64_t)0x997;
*(uint32_t*)0x20002ff8 = (uint32_t)0x0;
*(uint32_t*)0x20002ffc = (uint32_t)0x;
long r17 = syscall(SYS_msgctl, 0xul, 0xbul,
0x20002fb0ul, 0, 0, 0);
memcpy((void*)0x20001000,
"\xc4\xcb\x30\xad\x58\x07\xa7\x93\x4f\xba\x75\x75\x33\x9a\x9b\x14\x36\x28\x6d\xc6\x57\x57\xc0\x17\x3b\x03\x6e\xe8\xbd\x31\x99\x17\x1b\x18\xcb\x05\x31\x3b\xc5\x39\xda\xdf\x1f\x9f\x1f\xd0\x1c\xd4\xce\x04\x1c\x00\xa0\x0b\xf8\x13\xd6\x93\xbd\x43\x33\xcb\x6d\x18\x8f\xab\x15\x59\x79\x63\x0d\x3d\x8d\x11\xd4\xd5\x07",
77);
long r19 = syscall(SYS_msgsnd, 0x0ul, 0x20001000ul, 0x800ul, 0, 0, 0);
long r20 = syscall(SYS_getsockopt, r1, 0x84ul, 0x6ful,
0x20002fcbul, 0x2ffful, 0);
return 0;
}

[ cut here ]
WARNING: CPU: 3 PID: 6724 at mm/page_alloc.c:2989
__alloc_pages_nodemask+0x771/0x15f0()
Modules linked in:
CPU: 3 PID: 6724 Comm: a.out Not tainted 4.4.0-rc6+ #173
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  880069a276d0 82899ffd 
 880064124680 85deedc0 880069a27710 812ebbb9
 815f8f81 85deedc0 0bad 880069a27998
Call Trace:
 [< inline >] __dump_stack lib/dump_stack.c:15
 [] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
 [] warn_slowpath_common+0xd9/0x140 kernel/panic.c:460
 [] warn_slowpath_null+0x29/0x30 kernel/panic.c:493
 [< inline >] __alloc_pages_slowpath mm/page_alloc.c:2989
 [] __alloc_pages_nodemask+0x771/0x15f0 mm/page_alloc.c:3235
 [] alloc_pages_current+0xee/0x340 mm/mempolicy.c:2055
 [< inline >] alloc_pages include/linux/gfp.h:451
 [] alloc_kmem_pages+0x16/0xf0 mm/page_alloc.c:3414
 [] kmalloc_order+0x1f/0x80 mm/slab_common.c:1007
 [] kmalloc_order_trace+0x1f/0x140 mm/slab_common.c:1018
 [< inline >] kmalloc_large include/linux/slab.h:390
 [] __kmalloc+0x2de/0x330 mm/slub.c:3555
 [< inline >] kmalloc include/linux/slab.h:463
 [] __sctp_setsockopt_connectx+0xc6/0x150
net/sctp/socket.c:1318
 [< inline >] sctp_getsockopt_connectx3 net/sctp/socket.c:1410
 [] sctp_getsockopt+0x25ee/0x3e00 net/sctp/socket.c:6007
 [] sock_common_getsockopt+0x95/0xd0 net/core/sock.c:2601
 [< inline >] SYSC_getsockopt net/socket.c:1782
 [] SyS_getsockopt+0x142/0x230 net/socket.c:1764
 [] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185
---[ end trace 142fd9e8ed8bda1f ]---

On commit 4ef7675344d687a0ef5b0d7c0cee12da005870c0 (Dec 20).


Re: [REGRESSION] tcp/ipv4: kernel panic because of (possible) division by zero

2015-12-22 Thread Oleksandr Natalenko
That is correct, I have net.ipv4.tcp_ecn set to 1.

I've recompiled the kernel with the proposed patch and am now waiting for the
issue to be triggered.

Could I manually simulate the erroneous TCP ECN behavior to speed up the 
debugging?

On Monday, 21 December 2015 18:10:32 EET Yuchung Cheng wrote:
> On Mon, Dec 21, 2015 at 12:25 PM, Oleksandr Natalenko
> 
>  wrote:
> > Commit 3759824da87b30ce7a35b4873b62b0ba38905ef5 (tcp: PRR uses CRB mode by
> > default and SS mode conditionally) introduced changes to
> > net/ipv4/tcp_input.c tcp_cwnd_reduction() that, possibly, cause division
> > by zero, and therefore, kernel panic in interrupt handler [1].
> > 
> > Reverting 3759824da87b30ce7a35b4873b62b0ba38905ef5 seems to fix the issue.
> > 
> > I'm able to reproduce the issue on 4.3.0–4.3.3 only occasionally, about
> > once every several days.
> > 
> > What could be done to help in debugging this issue?
> 
> Do you have ECN enabled (i.e. sysctl net.ipv4.tcp_ecn > 0)?
> 
> If so I suspect an ACK carrying ECE during CA_Loss causes entering CWR
> state w/o calling tcp_init_cwnd_reduct() to set tp->prior_cwnd. Can
> you try this debug / quick-fix patch and send me the error message if
> any?
> 
> > Regards,
> > 
> >   Oleksandr.
> > 
> > [1] http://i.piccy.info/i9/6f5cb187c4ff282d189f78c63f95af43/1450729403/283985/951663/panic.jpg
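
The suspected failure mode in the quoted text (a proportional reduction that divides by a prior_cwnd which was never initialised) can be modelled in userspace like this. The struct, the function name and the -1 sentinel are hypothetical; the arithmetic is a simplified rendering of a proportional rate reduction step and is not claimed to match tcp_cwnd_reduction() line for line.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the per-connection state involved. */
struct prr_state {
	uint32_t snd_ssthresh;
	uint32_t prr_delivered;
	uint32_t prr_out;
	uint32_t prior_cwnd;	/* stays 0 if the init step was skipped */
};

/* Proportional part of the reduction:
 *   ceil(ssthresh * delivered / prior_cwnd) - prr_out
 * Returns -1 instead of dividing when prior_cwnd is zero.
 */
static int64_t prr_sndcnt(const struct prr_state *s)
{
	uint64_t dividend;

	assert(s != 0);
	if (s->prior_cwnd == 0)
		return -1;

	dividend = (uint64_t)s->snd_ssthresh * s->prr_delivered +
		   s->prior_cwnd - 1;
	return (int64_t)(dividend / s->prior_cwnd) - s->prr_out;
}
```

A guard like this only hides the symptom; whether the state should be initialised on that path is the question raised in the quoted message.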




Re: [PATCH net-next 2/5] sfc: Handle MCDI proxy authorisation

2015-12-22 Thread David Miller
From: Bert Kenward 
Date: Fri, 18 Dec 2015 17:09:04 +

> +#ifdef DEBUG
> + WARN_ON(1);
> +#endif

Don't do stuff like this.  Either the assertion is valid and belongs
here, or it doesn't.


[PATCH net-next 4/4] soreuseport: BPF selection functional test

2015-12-22 Thread Craig Gallek
From: Craig Gallek 

This program will build classic and extended BPF programs and
validate the socket selection logic when used with
SO_ATTACH_REUSEPORT_CBPF and SO_ATTACH_REUSEPORT_EBPF.

It also validates the re-programming flow and several edge cases.
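
As a plain C model of what the test's BPF programs are described as computing (first 32-bit word of the payload in network byte order, modulo the number of sockets), something like the following could serve as a reference when reading the test. The function name select_index() is hypothetical and is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Read the first four payload bytes as a big-endian (network order)
 * 32-bit word and reduce it modulo mod.  The result is the index of the
 * socket the packet is expected to land on.
 */
static uint32_t select_index(const uint8_t *pkt, uint32_t mod)
{
	uint32_t word;

	assert(pkt != 0);
	assert(mod != 0);
	word = ((uint32_t)pkt[0] << 24) | ((uint32_t)pkt[1] << 16) |
	       ((uint32_t)pkt[2] << 8) | (uint32_t)pkt[3];
	return word % mod;
}
```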

Signed-off-by: Craig Gallek 
---
 tools/testing/selftests/net/.gitignore  |   1 +
 tools/testing/selftests/net/Makefile|   2 +-
 tools/testing/selftests/net/reuseport_bpf.c | 467 
 3 files changed, 469 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/net/reuseport_bpf.c

diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore
index 0032662..6fb2336 100644
--- a/tools/testing/selftests/net/.gitignore
+++ b/tools/testing/selftests/net/.gitignore
@@ -1,3 +1,4 @@
 socket
 psock_fanout
 psock_tpacket
+reuseport_bpf
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index fac4782..41449b5 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -4,7 +4,7 @@ CFLAGS = -Wall -O2 -g
 
 CFLAGS += -I../../../../usr/include/
 
-NET_PROGS = socket psock_fanout psock_tpacket
+NET_PROGS = socket psock_fanout psock_tpacket reuseport_bpf
 
 all: $(NET_PROGS)
 %: %.c
diff --git a/tools/testing/selftests/net/reuseport_bpf.c b/tools/testing/selftests/net/reuseport_bpf.c
new file mode 100644
index 000..74ff099
--- /dev/null
+++ b/tools/testing/selftests/net/reuseport_bpf.c
@@ -0,0 +1,467 @@
+/*
+ * Test functionality of BPF filters for SO_REUSEPORT.  The tests below will use
+ * a BPF program (both classic and extended) to read the first word from an
+ * incoming packet (expected to be in network byte-order), calculate a modulus
+ * of that number, and then dispatch the packet to the Nth socket using the
+ * result.  These tests are run for each supported address family and protocol.
+ * Additionally, a few edge cases in the implementation are tested.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifndef ARRAY_SIZE
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
+#endif
+
+struct test_params {
+   int recv_family;
+   int send_family;
+   int protocol;
+   size_t recv_socks;
+   uint16_t recv_port;
+   uint16_t send_port_min;
+};
+
+static size_t sockaddr_size(void)
+{
+   return sizeof(struct sockaddr_storage);
+}
+
+static struct sockaddr *new_any_sockaddr(int family, uint16_t port)
+{
+   struct sockaddr_storage *addr;
+   struct sockaddr_in *addr4;
+   struct sockaddr_in6 *addr6;
+
+   addr = malloc(sizeof(struct sockaddr_storage));
+   memset(addr, 0, sizeof(struct sockaddr_storage));
+
+   switch (family) {
+   case AF_INET:
+   addr4 = (struct sockaddr_in *)addr;
+   addr4->sin_family = AF_INET;
+   addr4->sin_addr.s_addr = htonl(INADDR_ANY);
+   addr4->sin_port = htons(port);
+   break;
+   case AF_INET6:
+   addr6 = (struct sockaddr_in6 *)addr;
+   addr6->sin6_family = AF_INET6;
+   addr6->sin6_addr = in6addr_any;
+   addr6->sin6_port = htons(port);
+   break;
+   default:
+   error(1, 0, "Unsupported family %d", family);
+   }
+   return (struct sockaddr *)addr;
+}
+
+static struct sockaddr *new_loopback_sockaddr(int family, uint16_t port)
+{
+   struct sockaddr *addr = new_any_sockaddr(family, port);
+   struct sockaddr_in *addr4;
+   struct sockaddr_in6 *addr6;
+
+   switch (family) {
+   case AF_INET:
+   addr4 = (struct sockaddr_in *)addr;
+   addr4->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+   break;
+   case AF_INET6:
+   addr6 = (struct sockaddr_in6 *)addr;
+   addr6->sin6_addr = in6addr_loopback;
+   break;
+   default:
+   error(1, 0, "Unsupported family %d", family);
+   }
+   return addr;
+}
+
+static void attach_ebpf(int fd, uint16_t mod)
+{
+   static char bpf_log_buf[65536];
+   static const char bpf_license[] = "GPL";
+
+   int bpf_fd;
+   const struct bpf_insn prog[] = {
+   /* BPF_MOV64_REG(BPF_REG_6, BPF_REG_1) */
+   { BPF_ALU64 | BPF_MOV | BPF_X, BPF_REG_6, BPF_REG_1, 0, 0 },
+   /* BPF_LD_ABS(BPF_W, 0) R0 = (uint32_t)skb[0] */
+   { BPF_LD | BPF_ABS | BPF_W, 0, 0, 0, 0 },
+   /* BPF_ALU64_IMM(BPF_MOD, BPF_REG_0, mod) */
+   { BPF_ALU64 | BPF_MOD | BPF_K, BPF_REG_0, 0, 0, mod },
+   /* BPF_EXIT_INSN() */
+   { BPF_JMP | BPF_EXIT, 0, 0, 0, 0 }
+   };
+   union bpf_attr attr;
+
+   memset(&attr, 0, sizeof(attr));
+   attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
+   attr.insn_cnt = ARRAY_SIZE(prog);
+   

[PATCH net-next 1/4] soreuseport: define reuseport groups

2015-12-22 Thread Craig Gallek
From: Craig Gallek 

struct sock_reuseport is an optional shared structure referenced by each
socket belonging to a reuseport group.  When a socket is bound to an
address/port not yet in use and the reuseport flag has been set, the
structure will be allocated and attached to the newly bound socket.
When subsequent calls to bind are made for the same address/port, the
shared structure will be updated to include the new socket and the
newly bound socket will reference the group structure.

Usually, when an incoming packet was destined for a reuseport group,
all sockets in the same group needed to be considered before a
dispatching decision was made.  With this structure, an appropriate
socket can be found after looking up just one socket in the group.

This shared structure will also allow for more complicated decisions to
be made when selecting a socket (eg a BPF filter).
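
A userspace sketch of the idea, for illustration only: one shared group object holds the member array, so a selection needs only the group and a hash. All names here are hypothetical, ints stand in for socket pointers, and scaling the hash with a multiply and shift is an assumption about how an index might be chosen, not a description of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct group {
	uint16_t max_socks;	/* capacity of socks[] */
	uint16_t num_socks;	/* entries in use */
	int socks[];		/* ints stand in for socket pointers */
};

/* Allocate an empty group with room for max_socks members. */
static struct group *group_alloc(uint16_t max_socks)
{
	struct group *g = calloc(1, sizeof(*g) +
				    sizeof(g->socks[0]) * max_socks);

	if (g)
		g->max_socks = max_socks;
	return g;
}

/* Append a member; returns 0 on success, -1 if the group is full. */
static int group_add(struct group *g, int sock)
{
	assert(g != NULL);
	if (g->num_socks == g->max_socks)
		return -1;
	g->socks[g->num_socks++] = sock;
	return 0;
}

/* Map a 32-bit hash onto [0, num_socks) without a division and return
 * the member at that index.
 */
static int group_select(const struct group *g, uint32_t hash)
{
	assert(g != NULL && g->num_socks > 0);
	return g->socks[((uint64_t)hash * g->num_socks) >> 32];
}
```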

This work is based off a similar implementation written by
Ying Cai  for implementing policy-based reuseport
selection.

Signed-off-by: Craig Gallek 
---
 include/net/sock.h   |   2 +
 include/net/sock_reuseport.h |  21 ++
 net/core/Makefile|   2 +-
 net/core/sock_reuseport.c| 173 +++
 4 files changed, 197 insertions(+), 1 deletion(-)
 create mode 100644 include/net/sock_reuseport.h
 create mode 100644 net/core/sock_reuseport.c

diff --git a/include/net/sock.h b/include/net/sock.h
index 3794cdd..e830c10 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -318,6 +318,7 @@ struct cg_proto;
   *@sk_error_report: callback to indicate errors (e.g. %MSG_ERRQUEUE)
   *@sk_backlog_rcv: callback to process the backlog
   *@sk_destruct: called at sock freeing time, i.e. when all refcnt == 0
+  *@sk_reuseport_cb: reuseport group container
  */
 struct sock {
/*
@@ -453,6 +454,7 @@ struct sock {
int (*sk_backlog_rcv)(struct sock *sk,
  struct sk_buff *skb);
void(*sk_destruct)(struct sock *sk);
+   struct sock_reuseport __rcu *sk_reuseport_cb;
 };
 
 #define __sk_user_data(sk) ((*((void __rcu **)&(sk)->sk_user_data)))
diff --git a/include/net/sock_reuseport.h b/include/net/sock_reuseport.h
new file mode 100644
index 000..f17d190
--- /dev/null
+++ b/include/net/sock_reuseport.h
@@ -0,0 +1,21 @@
+#ifndef _SOCK_REUSEPORT_H
+#define _SOCK_REUSEPORT_H
+
+#include 
+#include 
+#include 
+
+struct sock_reuseport {
+   struct rcu_head rcu;
+
+   u16 max_socks;  /* length of socks */
+   u16 num_socks;  /* elements in socks */
+   struct sock *socks[0];  /* array of sock pointers */
+};
+
+extern int reuseport_alloc(struct sock *sk);
+extern int reuseport_add_sock(struct sock *sk, const struct sock *sk2);
+extern void reuseport_detach_sock(struct sock *sk);
+extern struct sock *reuseport_select_sock(struct sock *sk, u32 hash);
+
+#endif  /* _SOCK_REUSEPORT_H */
diff --git a/net/core/Makefile b/net/core/Makefile
index 086b01f..0b835de 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -9,7 +9,7 @@ obj-$(CONFIG_SYSCTL) += sysctl_net_core.o
 
 obj-y   += dev.o ethtool.o dev_addr_lists.o dst.o netevent.o \
neighbour.o rtnetlink.o utils.o link_watch.o filter.o \
-   sock_diag.o dev_ioctl.o tso.o
+   sock_diag.o dev_ioctl.o tso.o sock_reuseport.o
 
 obj-$(CONFIG_XFRM) += flow.o
 obj-y += net-sysfs.o
diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
new file mode 100644
index 0000000..963c8d5
--- /dev/null
+++ b/net/core/sock_reuseport.c
@@ -0,0 +1,173 @@
+/*
+ * To speed up listener socket lookup, create an array to store all sockets
+ * listening on the same port.  This allows a decision to be made after finding
+ * the first socket.
+ */
+
+#include 
+#include 
+
+#define INIT_SOCKS 128
+
+static DEFINE_SPINLOCK(reuseport_lock);
+
+static struct sock_reuseport *__reuseport_alloc(u16 max_socks)
+{
+   size_t size = sizeof(struct sock_reuseport) +
+ sizeof(struct sock *) * max_socks;
+   struct sock_reuseport *reuse = kzalloc(size, GFP_ATOMIC);
+
+   if (!reuse)
+   return NULL;
+
+   reuse->max_socks = max_socks;
+
+   return reuse;
+}
+
+int reuseport_alloc(struct sock *sk)
+{
+   struct sock_reuseport *reuse;
+
+   /* bh lock used since this function call may precede hlist lock in
+* soft irq of receive path or setsockopt from process context
+*/
+   spin_lock_bh(&reuseport_lock);
+   WARN_ONCE(rcu_dereference_protected(sk->sk_reuseport_cb,
+   lockdep_is_held(&reuseport_lock)),
+ "multiple allocations for the same socket");
+   reuse = __reuseport_alloc(INIT_SOCKS);
+   if (!reuse) {
+   spin_unlock_bh(&reuse

[PATCH net-next 3/4] soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF

2015-12-22 Thread Craig Gallek
From: Craig Gallek 

Expose socket options for setting a classic or extended BPF program
for use when selecting sockets in an SO_REUSEPORT group.  These options
can be used on the first socket to belong to a group before bind or
on any socket in the group after bind.
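
For illustration, attaching a classic BPF program from userspace might
look like the sketch below. It assumes the program's return value is
used as an index into the group, which this commit message does not
spell out. The helper names build_mod_prog() and
attach_reuseport_cbpf() are hypothetical, and the fallback value 51 is
the one this patch adds for most architectures (parisc differs).

```c
#include <linux/filter.h>
#include <stdint.h>
#include <sys/socket.h>

#ifndef SO_ATTACH_REUSEPORT_CBPF
#define SO_ATTACH_REUSEPORT_CBPF 51	/* value on most architectures in this patch */
#endif

#define REUSEPORT_PROG_LEN 3

/*
 * Fill 'code' with a classic BPF program that returns
 * (first 32-bit word of the packet data) % mod.
 */
static void build_mod_prog(struct sock_filter code[REUSEPORT_PROG_LEN],
			   uint32_t mod)
{
	/* A = first 32-bit word of the packet data */
	code[0] = (struct sock_filter){ BPF_LD | BPF_W | BPF_ABS, 0, 0, 0 };
	/* A = A % mod */
	code[1] = (struct sock_filter){ BPF_ALU | BPF_MOD | BPF_K, 0, 0, mod };
	/* return A */
	code[2] = (struct sock_filter){ BPF_RET | BPF_A, 0, 0, 0 };
}

/* Attach the program to 'fd'; returns 0 on success, -1 with errno set. */
static int attach_reuseport_cbpf(int fd, uint32_t mod)
{
	struct sock_filter code[REUSEPORT_PROG_LEN];
	struct sock_fprog prog = { .len = REUSEPORT_PROG_LEN, .filter = code };

	build_mod_prog(code, mod);
	return setsockopt(fd, SOL_SOCKET, SO_ATTACH_REUSEPORT_CBPF,
			  &prog, sizeof(prog));
}
```

A caller would pass the number of sockets in the group as mod, so that
the result always falls inside the group.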

This change includes refactoring of the existing sk_filter code to
allow reuse of the existing BPF filter validation checks.

Signed-off-by: Craig Gallek 
---
 arch/alpha/include/uapi/asm/socket.h   |   3 +
 arch/avr32/include/uapi/asm/socket.h   |   3 +
 arch/frv/include/uapi/asm/socket.h |   3 +
 arch/ia64/include/uapi/asm/socket.h|   3 +
 arch/m32r/include/uapi/asm/socket.h|   3 +
 arch/mips/include/uapi/asm/socket.h|   3 +
 arch/mn10300/include/uapi/asm/socket.h |   3 +
 arch/parisc/include/uapi/asm/socket.h  |   3 +
 arch/powerpc/include/uapi/asm/socket.h |   3 +
 arch/s390/include/uapi/asm/socket.h|   3 +
 arch/sparc/include/uapi/asm/socket.h   |   3 +
 arch/xtensa/include/uapi/asm/socket.h  |   3 +
 include/linux/filter.h |   2 +
 include/net/sock_reuseport.h   |  10 ++-
 include/net/udp.h  |   5 +-
 include/uapi/asm-generic/socket.h  |   3 +
 net/core/filter.c  | 120 +++--
 net/core/sock.c|  29 
 net/core/sock_reuseport.c  |  88 ++--
 net/ipv4/udp.c |  14 ++--
 net/ipv4/udp_diag.c|   4 +-
 net/ipv6/udp.c |  14 ++--
 22 files changed, 282 insertions(+), 43 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/socket.h 
b/arch/alpha/include/uapi/asm/socket.h
index 9a20821..c5fb9e6 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -92,4 +92,7 @@
 #define SO_ATTACH_BPF  50
 #define SO_DETACH_BPF  SO_DETACH_FILTER
 
+#define SO_ATTACH_REUSEPORT_CBPF   51
+#define SO_ATTACH_REUSEPORT_EBPF   52
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/avr32/include/uapi/asm/socket.h 
b/arch/avr32/include/uapi/asm/socket.h
index 2b65ed6..9de0796 100644
--- a/arch/avr32/include/uapi/asm/socket.h
+++ b/arch/avr32/include/uapi/asm/socket.h
@@ -85,4 +85,7 @@
 #define SO_ATTACH_BPF  50
 #define SO_DETACH_BPF  SO_DETACH_FILTER
 
+#define SO_ATTACH_REUSEPORT_CBPF   51
+#define SO_ATTACH_REUSEPORT_EBPF   52
+
 #endif /* _UAPI__ASM_AVR32_SOCKET_H */
diff --git a/arch/frv/include/uapi/asm/socket.h 
b/arch/frv/include/uapi/asm/socket.h
index 4823ad1..f02e484 100644
--- a/arch/frv/include/uapi/asm/socket.h
+++ b/arch/frv/include/uapi/asm/socket.h
@@ -85,5 +85,8 @@
 #define SO_ATTACH_BPF  50
 #define SO_DETACH_BPF  SO_DETACH_FILTER
 
+#define SO_ATTACH_REUSEPORT_CBPF   51
+#define SO_ATTACH_REUSEPORT_EBPF   52
+
 #endif /* _ASM_SOCKET_H */
 
diff --git a/arch/ia64/include/uapi/asm/socket.h 
b/arch/ia64/include/uapi/asm/socket.h
index 59be3d8..bce2916 100644
--- a/arch/ia64/include/uapi/asm/socket.h
+++ b/arch/ia64/include/uapi/asm/socket.h
@@ -94,4 +94,7 @@
 #define SO_ATTACH_BPF  50
 #define SO_DETACH_BPF  SO_DETACH_FILTER
 
+#define SO_ATTACH_REUSEPORT_CBPF   51
+#define SO_ATTACH_REUSEPORT_EBPF   52
+
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m32r/include/uapi/asm/socket.h 
b/arch/m32r/include/uapi/asm/socket.h
index 7bc4cb2..14aa4a6 100644
--- a/arch/m32r/include/uapi/asm/socket.h
+++ b/arch/m32r/include/uapi/asm/socket.h
@@ -85,4 +85,7 @@
 #define SO_ATTACH_BPF  50
 #define SO_DETACH_BPF  SO_DETACH_FILTER
 
+#define SO_ATTACH_REUSEPORT_CBPF   51
+#define SO_ATTACH_REUSEPORT_EBPF   52
+
 #endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h 
b/arch/mips/include/uapi/asm/socket.h
index dec3c85..5910fe2 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -103,4 +103,7 @@
 #define SO_ATTACH_BPF  50
 #define SO_DETACH_BPF  SO_DETACH_FILTER
 
+#define SO_ATTACH_REUSEPORT_CBPF   51
+#define SO_ATTACH_REUSEPORT_EBPF   52
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/mn10300/include/uapi/asm/socket.h 
b/arch/mn10300/include/uapi/asm/socket.h
index cab7d6d..58b1aa0 100644
--- a/arch/mn10300/include/uapi/asm/socket.h
+++ b/arch/mn10300/include/uapi/asm/socket.h
@@ -85,4 +85,7 @@
 #define SO_ATTACH_BPF  50
 #define SO_DETACH_BPF  SO_DETACH_FILTER
 
+#define SO_ATTACH_REUSEPORT_CBPF   51
+#define SO_ATTACH_REUSEPORT_EBPF   52
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/parisc/include/uapi/asm/socket.h 
b/arch/parisc/include/uapi/asm/socket.h
index a5cd40c..f9cf122 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -84,4 +84,7 @@
 #define SO_ATTACH_BPF  0x402B
 #define SO_DETACH_BPF  SO_DETACH_FILTER
 
+#define SO_ATTACH_REUSEPORT_CBPF   0x402C
+#define SO_ATTAC

[PATCH net-next 0/4] Faster SO_REUSEPORT

2015-12-22 Thread Craig Gallek
From: Craig Gallek 

This series contains two optimizations for the SO_REUSEPORT feature:
Faster lookup when selecting a socket for an incoming packet and
the ability to select the socket from the group using a BPF program.

This series only includes the UDP path.  I plan to submit a follow-up
including the TCP path if the implementation in this series is
acceptable.

Craig Gallek (4):
  soreuseport: define reuseport groups
  soreuseport: fast reuseport UDP socket selection
  soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF
  soreuseport: BPF selection functional test

 arch/alpha/include/uapi/asm/socket.h|   3 +
 arch/avr32/include/uapi/asm/socket.h|   3 +
 arch/frv/include/uapi/asm/socket.h  |   3 +
 arch/ia64/include/uapi/asm/socket.h |   3 +
 arch/m32r/include/uapi/asm/socket.h |   3 +
 arch/mips/include/uapi/asm/socket.h |   3 +
 arch/mn10300/include/uapi/asm/socket.h  |   3 +
 arch/parisc/include/uapi/asm/socket.h   |   3 +
 arch/powerpc/include/uapi/asm/socket.h  |   3 +
 arch/s390/include/uapi/asm/socket.h |   3 +
 arch/sparc/include/uapi/asm/socket.h|   3 +
 arch/xtensa/include/uapi/asm/socket.h   |   3 +
 include/linux/filter.h  |   2 +
 include/net/addrconf.h  |   3 +-
 include/net/sock.h  |   2 +
 include/net/sock_reuseport.h|  29 ++
 include/net/udp.h   |   7 +-
 include/uapi/asm-generic/socket.h   |   3 +
 net/core/Makefile   |   2 +-
 net/core/filter.c   | 120 +--
 net/core/sock.c |  29 ++
 net/core/sock_reuseport.c   | 251 +++
 net/ipv4/udp.c  | 127 ++--
 net/ipv4/udp_diag.c |   4 +-
 net/ipv6/inet6_connection_sock.c|   4 +-
 net/ipv6/udp.c  |  56 +++-
 tools/testing/selftests/net/.gitignore  |   1 +
 tools/testing/selftests/net/Makefile|   2 +-
 tools/testing/selftests/net/reuseport_bpf.c | 467 
 29 files changed, 1077 insertions(+), 68 deletions(-)
 create mode 100644 include/net/sock_reuseport.h
 create mode 100644 net/core/sock_reuseport.c
 create mode 100644 tools/testing/selftests/net/reuseport_bpf.c

-- 
2.6.0.rc2.230.g3dd15c0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next 2/4] soreuseport: fast reuseport UDP socket selection

2015-12-22 Thread Craig Gallek
From: Craig Gallek 

Include a struct sock_reuseport instance when a UDP socket binds to
a specific address for the first time with the reuseport flag set.
When selecting a socket for an incoming UDP packet, use the information
available in sock_reuseport if present.

This required adding an additional field to the UDP source address
equality function to differentiate between exact and wildcard matches.
The original use case allowed wildcard matches when checking for
existing port uses during bind.  The new use case of adding a socket
to a reuseport group requires exact address matching.
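
The two matching modes can be sketched as below. This is a simplified
userspace illustration for IPv4 only, assuming a wildcard bind is
represented by address 0; rcv_saddr_equal() and ADDR_ANY are
hypothetical names, not the kernel functions touched by this patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define ADDR_ANY 0u	/* wildcard address, like INADDR_ANY */

/*
 * Compare two bound IPv4 receive addresses.
 * Identical addresses always match.  If either side is the wildcard,
 * the result depends on whether the caller accepts wildcard matches.
 */
static bool rcv_saddr_equal(uint32_t addr1, uint32_t addr2, bool match_wildcard)
{
	if (addr1 == addr2)
		return true;
	if (addr1 == ADDR_ANY || addr2 == ADDR_ANY)
		return match_wildcard;
	return false;
}
```

A port-in-use check during bind would pass match_wildcard = true, while
joining a reuseport group would pass false so that a socket bound to a
specific address never lands in a wildcard socket's group.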

Performance test (using a machine with 2 CPU sockets and a total of
48 cores):  Create reuseport groups of varying size.  Use one socket
from this group per user thread (pinning each thread to a different
core) calling recvmmsg in a tight loop.  Record number of messages
received per second while saturating a 10G link.
  10 sockets: 18% increase (~2.8M -> 3.3M pkts/s)
  20 sockets: 14% increase (~2.9M -> 3.3M pkts/s)
  40 sockets: 13% increase (~3.0M -> 3.4M pkts/s)

This work is based off a similar implementation written by
Ying Cai  for implementing policy-based reuseport
selection.

Signed-off-by: Craig Gallek 
---
 include/net/addrconf.h   |   3 +-
 include/net/udp.h|   2 +-
 net/ipv4/udp.c   | 119 +++
 net/ipv6/inet6_connection_sock.c |   4 +-
 net/ipv6/udp.c   |  48 +---
 5 files changed, 141 insertions(+), 35 deletions(-)

diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 78003df..47f52d3 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -87,7 +87,8 @@ int __ipv6_get_lladdr(struct inet6_dev *idev, struct in6_addr 
*addr,
  u32 banned_flags);
 int ipv6_get_lladdr(struct net_device *dev, struct in6_addr *addr,
u32 banned_flags);
-int ipv6_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2);
+int ipv6_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2,
+bool match_wildcard);
 void addrconf_join_solict(struct net_device *dev, const struct in6_addr *addr);
 void addrconf_leave_solict(struct inet6_dev *idev, const struct in6_addr 
*addr);
 
diff --git a/include/net/udp.h b/include/net/udp.h
index 6d4ed18..3b5d7f9 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -191,7 +191,7 @@ static inline void udp_lib_close(struct sock *sk, long 
timeout)
 }
 
 int udp_lib_get_port(struct sock *sk, unsigned short snum,
-int (*)(const struct sock *, const struct sock *),
+int (*)(const struct sock *, const struct sock *, bool),
 unsigned int hash2_nulladdr);
 
 u32 udp_flow_hashrnd(void);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 8841e98..de2e1c0 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -113,6 +113,7 @@
 #include 
 #include 
 #include "udp_impl.h"
+#include 
 
 struct udp_table udp_table __read_mostly;
 EXPORT_SYMBOL(udp_table);
@@ -137,7 +138,8 @@ static int udp_lib_lport_inuse(struct net *net, __u16 num,
   unsigned long *bitmap,
   struct sock *sk,
   int (*saddr_comp)(const struct sock *sk1,
-const struct sock *sk2),
+const struct sock *sk2,
+bool match_wildcard),
   unsigned int log)
 {
struct sock *sk2;
@@ -152,8 +154,9 @@ static int udp_lib_lport_inuse(struct net *net, __u16 num,
(!sk2->sk_bound_dev_if || !sk->sk_bound_dev_if ||
 sk2->sk_bound_dev_if == sk->sk_bound_dev_if) &&
(!sk2->sk_reuseport || !sk->sk_reuseport ||
+rcu_access_pointer(sk->sk_reuseport_cb) ||
 !uid_eq(uid, sock_i_uid(sk2))) &&
-   saddr_comp(sk, sk2)) {
+   saddr_comp(sk, sk2, true)) {
if (!bitmap)
return 1;
__set_bit(udp_sk(sk2)->udp_port_hash >> log, bitmap);
@@ -170,7 +173,8 @@ static int udp_lib_lport_inuse2(struct net *net, __u16 num,
struct udp_hslot *hslot2,
struct sock *sk,
int (*saddr_comp)(const struct sock *sk1,
- const struct sock *sk2))
+ const struct sock *sk2,
+ bool match_wildcard))
 {
struct sock *sk2;
struct hlist_nulls_node *node;
@@ -186,8 +190,9 @@ static int udp_lib_lport_inuse2(struct net *net, __u16 num,
(!sk2->sk_bound_dev_if || !sk->sk_bound_dev_if ||
 sk2->sk_bound_d

Re: [PATCH] net-sysfs: use to_net_dev in net_namespace()

2015-12-22 Thread David Miller
From: Geliang Tang 
Date: Tue, 22 Dec 2015 23:11:49 +0800

> Use to_net_dev() instead of open-coding it.
> 
> Signed-off-by: Geliang Tang 

Applied to net-next, thanks.


Re: [PATCH 0/2] Netfilter fixes for net

2015-12-22 Thread David Miller
From: Pablo Neira Ayuso 
Date: Tue, 22 Dec 2015 18:53:15 +0100

> The following patchset contains two netfilter fixes:
> 
> 1) Oneliner to dump missing NFT_CT_L3PROTOCOL netlink
>    attribute, from Florian Westphal.
> 
> 2) Another oneliner for nf_tables to use skb->protocol from the new
>    netdev family, we can't assume ethernet there.
> 
> You can pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Pulled, thanks Pablo.


Re: [net-next 00/10][pull request] 100GbE Intel Wired LAN Driver Updates 2015-12-22

2015-12-22 Thread David Miller
From: Jeff Kirsher 
Date: Tue, 22 Dec 2015 06:02:02 -0800

> This series contains updates to fm10k only.

Pulled, thanks a lot Jeff.


Re: [PATCH V1 00/16] add Intel(R) X722 iWARP driver

2015-12-22 Thread Doug Ledford
On 12/21/2015 06:13 PM, Faisal Latif wrote:
> This (V1) series contains the addition of the i40iw.ko driver after
> incorporating the feedback from Christoph Hellwig and Joe Perches for
> initial series.
> 
> This driver provides iWARP RDMA functionality for the Intel(R) X722 Ethernet
> controller for PCI Physical Functions. It also has support for the Virtual
> Function driver (i40iwvf.ko), which will be part of a separate patch
> series.
> 
> It cooperates with the Intel(R) X722 base driver (i40e.ko) to allocate
> resources and program the controller.
> 
> This series includes 1 patch to i40e.ko to provide interface support to
> i40iw.ko. The interface provides a driver registration mechanism, resource
> allocations, and device reset coordination mechanisms.
> 
> This patch series is based on Doug Ledford's k.o/for-4.5.

Please use shallow threading on patch submissions like this.


-- 
Doug Ledford 
  GPG KeyID: 0E572FDD






Re: [PATCH v2 -next 3/3] tcp: honour SO_BINDTODEVICE for TW_RST case too

2015-12-22 Thread Eric Dumazet
On Mon, 2015-12-21 at 21:29 +0100, Florian Westphal wrote:
> Hannes points out that when we generate tcp reset for timewait sockets we
> pretend we found no socket and pass NULL sk to tcp_vX_send_reset().
> 
> Make it cope with inet tw sockets and then provide tw sk.
> 
> This makes RSTs appear on correct interface when SO_BINDTODEVICE is used.

Acked-by: Eric Dumazet 

Thanks, Florian!




Re: [PATCH -next 1/3] net: add inet_sk_transparent() helper

2015-12-22 Thread Eric Dumazet
On Mon, 2015-12-21 at 21:29 +0100, Florian Westphal wrote:
> Avoids cluttering tcp_v4_send_reset when followup patch extends
> it to deal with timewait sockets.
> 
> Suggested-by: Eric Dumazet 
> Signed-off-by: Florian Westphal 
> ---

Acked-by: Eric Dumazet 




Re: [PATCH -next 2/3] tcp: send_reset: test for non-NULL sk first

2015-12-22 Thread Eric Dumazet
On Mon, 2015-12-21 at 21:29 +0100, Florian Westphal wrote:
> tcp_md5_do_lookup requires a full socket, so once we extend
> _send_reset() to also accept timewait socket we would have to change
> 

Acked-by: Eric Dumazet 




Re: [PATCH 1/2] can: sja1000: add documentation for Technologic Systems version

2015-12-22 Thread Rob Herring
On Mon, Dec 21, 2015 at 9:09 AM, Damien Riegel
 wrote:
> On Sat, Dec 19, 2015 at 09:37:42PM -0600, Rob Herring wrote:
>> On Fri, Dec 18, 2015 at 03:17:24PM -0500, Damien Riegel wrote:
>> > This commit adds documentation for the Technologic Systems version of
>> > SJA1000. The difference with the NXP version is in the way the registers
>> > are accessed.
>> >
>> > Signed-off-by: Damien Riegel 
>> > ---
>> >  Documentation/devicetree/bindings/net/can/sja1000.txt | 3 ++-
>> >  1 file changed, 2 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/Documentation/devicetree/bindings/net/can/sja1000.txt 
>> > b/Documentation/devicetree/bindings/net/can/sja1000.txt
>> > index b4a6d53..7a158d5 100644
>> > --- a/Documentation/devicetree/bindings/net/can/sja1000.txt
>> > +++ b/Documentation/devicetree/bindings/net/can/sja1000.txt
>> > @@ -2,7 +2,7 @@ Memory mapped SJA1000 CAN controller from NXP (formerly 
>> > Philips)
>> >
>> >  Required properties:
>> >
>> > -- compatible : should be "nxp,sja1000".
>> > +- compatible : should be one of "nxp,sja1000", "technologic,sja1000".
>> >
>> >  - reg : should specify the chip select, address offset and size required
>> > to map the registers of the SJA1000. The size is usually 0x80.
>> > @@ -14,6 +14,7 @@ Optional properties:
>> >
>> >  - reg-io-width : Specify the size (in bytes) of the IO accesses that
>> > should be performed on the device.  Valid value is 1, 2 or 4.
>> > +   Must be set to 2 for technologic version.
>> > Default to 1 (8 bits).
>>
>> Really, this should default to 2 for technologic version and not be
>> required.
>
> Would something along the line of "This property is 
> for technologic version." be more appropriate?

Yes, exactly.

Rob


Re: [PATCH 3/8] cgroup: implement cgroup_get_from_path() and expose cgroup_put()

2015-12-22 Thread Tejun Heo
Hello, Serge.

On Mon, Dec 21, 2015 at 06:22:41PM -0600, Serge E. Hallyn wrote:
> I'm trying to figure out how to handle this in the cgroup ns patchset.
> Is this going to be purely used internally?  From the user I see in
> this patchset, it looks like I should leave it be (and have @path always
> be absolute).  Is that right?

Yeap, I don't think this function needs to worry about namespaces.

Thanks.

-- 
tejun


[PATCH 1/2] netfilter: nf_tables: use skb->protocol instead of assuming ethernet header

2015-12-22 Thread Pablo Neira Ayuso
Otherwise we may end up with incorrect network and transport header for
other protocols.

Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nf_tables_netdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nf_tables_netdev.c b/net/netfilter/nf_tables_netdev.c
index 7b9c053..edb3502f 100644
--- a/net/netfilter/nf_tables_netdev.c
+++ b/net/netfilter/nf_tables_netdev.c
@@ -94,7 +94,7 @@ nft_do_chain_netdev(void *priv, struct sk_buff *skb,
 {
struct nft_pktinfo pkt;
 
-   switch (eth_hdr(skb)->h_proto) {
+   switch (skb->protocol) {
case htons(ETH_P_IP):
nft_netdev_set_pktinfo_ipv4(&pkt, skb, state);
break;
-- 
2.1.4



[PATCH 2/2] netfilter: nft_ct: include direction when dumping NFT_CT_L3PROTOCOL key

2015-12-22 Thread Pablo Neira Ayuso
From: Florian Westphal 

one nft userspace test case fails with

'ct l3proto original ipv4' mismatches 'ct l3proto ipv4'

... because NFTA_CT_DIRECTION attr is missing.

Signed-off-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nft_ct.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c
index 8cbca34..9399215 100644
--- a/net/netfilter/nft_ct.c
+++ b/net/netfilter/nft_ct.c
@@ -366,6 +366,7 @@ static int nft_ct_get_dump(struct sk_buff *skb, const 
struct nft_expr *expr)
goto nla_put_failure;
 
switch (priv->key) {
+   case NFT_CT_L3PROTOCOL:
case NFT_CT_PROTOCOL:
case NFT_CT_SRC:
case NFT_CT_DST:
-- 
2.1.4



[PATCH 0/2] Netfilter fixes for net

2015-12-22 Thread Pablo Neira Ayuso
Hi David,

The following patchset contains two netfilter fixes:

1) Oneliner to dump missing NFT_CT_L3PROTOCOL netlink
   attribute, from Florian Westphal.

2) Another oneliner for nf_tables to use skb->protocol from the new
   netdev family, we can't assume ethernet there.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks!



The following changes since commit 73796d8bf27372e26c2b79881947304c14c2d353:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2015-12-17 
14:05:22 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD

for you to fetch changes up to d5f79b6e4d169039903cc869e16e59ad861dd479:

  netfilter: nft_ct: include direction when dumping NFT_CT_L3PROTOCOL key 
(2015-12-18 14:45:45 +0100)


Florian Westphal (1):
  netfilter: nft_ct: include direction when dumping NFT_CT_L3PROTOCOL key

Pablo Neira Ayuso (1):
  netfilter: nf_tables: use skb->protocol instead of assuming ethernet 
header

 net/netfilter/nf_tables_netdev.c | 2 +-
 net/netfilter/nft_ct.c   | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)


Re: [RFC PATCH 00/17] CALIPSO implementation

2015-12-22 Thread Huw Davies
On Tue, Dec 22, 2015 at 09:28:37AM -0800, Casey Schaufler wrote:
> On 12/22/2015 3:46 AM, Huw Davies wrote:
> > This patch series implements RFC 5570 - Common Architecture Label IPv6
> > Security Option (CALIPSO).  Its goal is to set MLS sensitivity labels
> > on IPv6 packets using a hop-by-hop option.  CALIPSO is very similar to
> > its IPv4 cousin CIPSO and much of this series is based on that code.
> 
> There's a one line change to the Smack code in 15/17 due to
> a change in the api, but I assume that there has been no
> attempt to verify that this works with Smack. It's not 100%
> clear that this won't break a Smack kernel, but I haven't
> tried it.

That's correct, I've not looked into Smack at all.  I'll
see if I can figure out what's needed.

Huw.


Re: [RFC PATCH 00/17] CALIPSO implementation

2015-12-22 Thread Casey Schaufler
On 12/22/2015 3:46 AM, Huw Davies wrote:
> This patch series implements RFC 5570 - Common Architecture Label IPv6
> Security Option (CALIPSO).  Its goal is to set MLS sensitivity labels
> on IPv6 packets using a hop-by-hop option.  CALIPSO is very similar to
> its IPv4 cousin CIPSO and much of this series is based on that code.

There's a one line change to the Smack code in 15/17 due to
a change in the api, but I assume that there has been no
attempt to verify that this works with Smack. It's not 100%
clear that this won't break a Smack kernel, but I haven't
tried it.

You'll need to provide sufficient information (or code!) so
that security modules other than SELinux can use this. If
you look at how Smack uses netlabel for IPv4 you will see
that it differs substantially from the way SELinux uses it.

Thank you for tackling RFC 5570. The lack of something like
this has put IPv6 at a real disadvantage.

>
> Most of this series involves adding support to NetLabel and adding a
> CALIPSO module within IPv6, and as such is fairly self-contained.
> There are however a few places where I've needed to add things to the
> core networking stack, so I'd be particularly interested in hearing
> comments about these:
>
> [PATCH 08/17] ipv6: Add ipv6_renew_options_kern() that accepts a kernel mem 
> pointer.
>   Hopefully not too controversial - adds a kernel memory version of
>   ipv6_renew_options()
>
> [PATCH 12/17] ipv6: Allow request socks to contain IPv6 options.
>   We need a way to set the IPv6 options on request sockets, just like
>   we do for IPv4.  This is so that the LSM can ensure the SYN-ACK is
>   correctly labelled.
>
> The series is based off v4.4-rc6.
>
> Thoughts about these and any of the other patches are most welcome.
>
> If anybody actually wants to play with this, then you'll need some patches
> to netlabel-tools that are currently available on the 'calipso' branch at:
> https://github.com/hdmdavies/netlabel_tools.git
>
> Thanks to Paul Moore for his guidance in getting this far.
>
> Huw.
>



Re: [PATCH net-next V2 3/4] net/mlx5e: Add HW timestamping (TS) support

2015-12-22 Thread Richard Cochran
On Tue, Dec 22, 2015 at 12:00:52PM +0200, Saeed Mahameed wrote:
> Ok, but what will happen if tstamp->overflow_period is somehow zero?
> The work will run too rapidly.
> Don't we need protection against such a case?

Why not return an error in that case?

Thanks,
Richard


Re: [RFC PATCH 16/17] calipso: Add validation of CALIPSO option.

2015-12-22 Thread Huw Davies
On Tue, Dec 22, 2015 at 02:50:20PM +0100, Hannes Frederic Sowa wrote:
> On 22.12.2015 12:46, Huw Davies wrote:
> >  
> > +/* CALIPSO RFC 5570 */
> > +
> > +static bool ipv6_hop_calipso(struct sk_buff *skb, int optoff)
> > +{
> > +   const unsigned char *nh = skb_network_header(skb);
> > +
> > +   if (nh[optoff + 1] < 8)
> > +   goto drop;
> > +
> > +   if (nh[optoff + 6] * 4 + 8 > nh[optoff + 1])
> > +   goto drop;
> > +
> > +   if (!calipso_validate(skb, nh + optoff))
> > +   goto drop;
> > +
> > +   return true;
> > +
> > +drop:
> > +   kfree_skb(skb);
> > +   return false;
> > +}
> > +
> 
> Formally, if an extension header could not be processed, the packet
> should be discarded and an icmp error parameter extension should be
> sent. I think we shouldn't let those packets pass here.

Thanks for your comments Hannes, I'm looking into your other
suggestions.

I'm confused about this one.  AFAICS, this will drop packets that we
can't process.  We don't send the icmp error, but I can certainly add
that.  Is that what you mean?

Thanks,
Huw.


Re: IPv6 route to gateway on fe80::1%eth0 when I have fe80::1%br0 locally

2015-12-22 Thread Hannes Frederic Sowa
On 12.12.2015 20:58, Marc Haber wrote:
> Any hints would be appreciated.

This sysctl should help:

accept_ra_from_local - BOOLEAN
Accept RA with source-address that is found on local machine
if the RA is otherwise proper and able to be accepted.
Default is to NOT accept these as it may be an un-intended
network loop.

Functional default:
   enabled if accept_ra_from_local is enabled
   on a specific interface.
   disabled if accept_ra_from_local is disabled
   on a specific interface.

Anyway, this has to be fixed up in a clean way and should work by default.
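
For reference, the sysctl can be set per interface through procfs. The
sketch below assumes the standard
/proc/sys/net/ipv6/conf/<ifname>/ layout and needs root to succeed.
The helper names are hypothetical, and which interface to set it on
(eth0 in this report) is an assumption.

```c
#include <stdio.h>

/*
 * Build the procfs path of the per-interface sysctl.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int ra_from_local_path(char *buf, size_t len, const char *ifname)
{
	int n = snprintf(buf, len,
			 "/proc/sys/net/ipv6/conf/%s/accept_ra_from_local",
			 ifname);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" or "0" to the sysctl; needs root.  Returns 0 on success. */
static int set_accept_ra_from_local(const char *ifname, int enable)
{
	char path[128];
	FILE *f;
	int ok;

	if (ra_from_local_path(path, sizeof(path), ifname))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(enable ? "1\n" : "0\n", f) >= 0;
	/* fclose() flushes the buffer, so a rejected write surfaces here */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Calling set_accept_ra_from_local("eth0", 1) has the same effect as
writing 1 to that file from a root shell.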

Thanks for the report,
Hannes



[PATCH] net-sysfs: use to_net_dev in net_namespace()

2015-12-22 Thread Geliang Tang
Use to_net_dev() instead of open-coding it.
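
As background, to_net_dev() is a thin wrapper around container_of().
The userspace sketch below shows the idea with simplified stand-in
structures; the layouts and netdev_name() are hypothetical, and this
container_of() omits the type checking done by the kernel macro.

```c
#include <stddef.h>

/* Simplified container_of(): recover the outer struct from a member pointer. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Userspace stand-ins for the kernel structures (hypothetical layouts). */
struct device {
	int id;
};

struct net_device {
	char name[16];
	struct device dev;	/* embedded, as in the kernel */
};

/* Same shape as the kernel helper: name the conversion once. */
#define to_net_dev(d) container_of(d, struct net_device, dev)

/* Example user: go from the embedded device back to its net_device. */
static const char *netdev_name(struct device *d)
{
	return to_net_dev(d)->name;
}
```

Using the named helper keeps the struct and member names in one place
rather than repeating them at every call site.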

Signed-off-by: Geliang Tang 
---
 net/core/net-sysfs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index bca8c35..b6c8a66 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1453,8 +1453,8 @@ static void netdev_release(struct device *d)
 
 static const void *net_namespace(struct device *d)
 {
-   struct net_device *dev;
-   dev = container_of(d, struct net_device, dev);
+   struct net_device *dev = to_net_dev(d);
+
return dev_net(dev);
 }
 
-- 
2.5.0




Re: [PATCH/RFC v2 net-next] ravb: Add dma queue interrupt support

2015-12-22 Thread Sergei Shtylyov

Hello.

On 12/20/2015 12:15 PM, Yoshihiro Kaneko wrote:


From: Kazuya Mizuguchi 

This patch supports the following interrupts.

- One interrupt for multiple (descriptor, error, management)
- One interrupt for emac
- Four interrupts for dma queue (best effort rx/tx, network control rx/tx)


   You still don't say why it's better than the current scheme...


Signed-off-by: Kazuya Mizuguchi 
Signed-off-by: Yoshihiro Kaneko 
---

This patch is based on the master branch of David Miller's next networking
tree.

v2 [Yoshihiro Kaneko]
* compile tested only
* As suggested by Sergei Shtylyov
   - add comment to CIE
   - remove comments from CIE bits
   - fix value of TIx_ALL
   - define each bits for CIE, GIE, GID, RIE0, RID0, RIE2, RID2, TIE, TID
   - reversed Christmas tree declaration ordered
   - rename _ravb_emac_interrupt() to ravb_emac_interrupt_unlocked()
   - remove unnecessary clearing of CIE
   - use a bit name corresponding to the target register, RIE0, RIE2, TIE,
 TID, RID2, GID, GIE

  drivers/net/ethernet/renesas/ravb.h  | 213 ++
  drivers/net/ethernet/renesas/ravb_main.c | 247 +++
  drivers/net/ethernet/renesas/ravb_ptp.c  |  45 --
  3 files changed, 464 insertions(+), 41 deletions(-)

diff --git a/drivers/net/ethernet/renesas/ravb.h b/drivers/net/ethernet/renesas/ravb.h
index 9fbe92a..71badd6d 100644
--- a/drivers/net/ethernet/renesas/ravb.h
+++ b/drivers/net/ethernet/renesas/ravb.h
@@ -157,6 +157,7 @@ enum ravb_reg {
TIC = 0x0378,
TIS = 0x037C,
ISS = 0x0380,
+   CIE = 0x0384,   /* R-Car Gen3 only */
GCCR= 0x0390,
GMTT= 0x0394,
GPTC= 0x0398,
@@ -170,6 +171,15 @@ enum ravb_reg {
GCT0= 0x03B8,
GCT1= 0x03BC,
GCT2= 0x03C0,
+   GIE = 0x03CC,
+   GID = 0x03D0,
+   DIL = 0x0440,
+   RIE0= 0x0460,
+   RID0= 0x0464,
+   RIE2= 0x0470,
+   RID2= 0x0474,
+   TIE = 0x0478,
+   TID = 0x047c,


   So you only commented on CIE and considered it done? :-)

[...]

@@ -411,14 +422,27 @@ static int ravb_dmac_init(struct net_device *ndev)
ravb_write(ndev, TCCR_TFEN, TCCR);

/* Interrupt init: */
-   /* Frame receive */
-   ravb_write(ndev, RIC0_FRE0 | RIC0_FRE1, RIC0);
-   /* Disable FIFO full warning */
-   ravb_write(ndev, 0, RIC1);
-   /* Receive FIFO full error, descriptor empty */
-   ravb_write(ndev, RIC2_QFE0 | RIC2_QFE1 | RIC2_RFFE, RIC2);
-   /* Frame transmitted, timestamp FIFO updated */
-   ravb_write(ndev, TIC_FTE0 | TIC_FTE1 | TIC_TFUE, TIC);
+   if (priv->chip_id == RCAR_GEN2) {
+   /* Frame receive */
+   ravb_write(ndev, RIC0_FRE0 | RIC0_FRE1, RIC0);
+   /* Disable FIFO full warning */
+   ravb_write(ndev, 0, RIC1);
+   /* Receive FIFO full error, descriptor empty */
+   ravb_write(ndev, RIC2_QFE0 | RIC2_QFE1 | RIC2_RFFE, RIC2);
+   /* Frame transmitted, timestamp FIFO updated */
+   ravb_write(ndev, TIC_FTE0 | TIC_FTE1 | TIC_TFUE, TIC);
+   } else {
+   /* Clear DIL.DPLx */
+   ravb_write(ndev, 0, DIL);
+   /* Set queue specific interrupt */
+   ravb_write(ndev, CIE_CRIE | CIE_CTIE | CIE_CL0M, CIE);
+   /* Frame receive */
+   ravb_write(ndev, RIE0_FRS0 | RIE0_FRS1, RIE0);
+   /* Receive FIFO full error, descriptor empty */
+   ravb_write(ndev, RIE2_QFS0 | RIE2_QFS1 | RIE2_RFFS, RIE2);
+   /* Frame transmitted, timestamp FIFO updated */
+   ravb_write(ndev, TIE_FTS0 | TIE_FTS1 | TIE_TFUS, TIE);
+   }


   So in this case for gen3 we enable the interrupts we need in addition to
those already enabled (by a boot loader perhaps)? I don't think you actually
want that...

[...]

@@ -690,7 +726,10 @@ static void ravb_error_interrupt(struct net_device *ndev)
ravb_write(ndev, ~EIS_QFS, EIS);
if (eis & EIS_QFS) {
ris2 = ravb_read(ndev, RIS2);
-   ravb_write(ndev, ~(RIS2_QFF0 | RIS2_RFFF), RIS2);
+   if (priv->chip_id == RCAR_GEN2)
+   ravb_write(ndev, ~(RIS2_QFF0 | RIS2_RFFF), RIS2);
+   else
+   ravb_write(ndev, RID2_QFD0 | RID2_RFFD, RID2);


   Err, aren't you doing 2 different things for gen2 and gen3 here? For gen2
you're clearing the QFF0/RFFF interrupts, for gen3 you're disabling them, no?
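
To make the distinction concrete, here is a toy model of the two kinds of register write. It assumes the semantics the comments above imply (a status register whose bits are cleared by writing 0, an "enable set" register that only adds bits, and a "disable" register whose written 1 bits turn enables off). It is not taken from the hardware manual, and all demo_ names are hypothetical:

```c
#include <stdint.h>

/* Toy register file: pending interrupt status and the enable mask. */
struct demo_regs {
	uint32_t status;
	uint32_t enable;
};

/* Status register, write-0-to-clear: bits written as 1 are left alone. */
static void demo_write_status(struct demo_regs *r, uint32_t val)
{
	r->status &= val;
}

/* Enable-set register: written 1 bits are added to the enable mask. */
static void demo_write_enable_set(struct demo_regs *r, uint32_t val)
{
	r->enable |= val;
}

/* Disable register: written 1 bits are removed from the enable mask. */
static void demo_write_disable(struct demo_regs *r, uint32_t val)
{
	r->enable &= ~val;
}
```

Under these assumptions, writing ~mask to the status register acknowledges a pending event but leaves it enabled, whereas writing mask to the disable register leaves the event pending and stops it from being delivered again. An enable-set write also never removes bits that something else enabled earlier.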


[...]

@@ -758,16 +797,43 @@ static irqreturn_t ravb_interrupt(int irq, void *dev_id)

[...]

+/* Descriptor IRQ/Error/Management interrupt handler */
+static irqreturn_t ravb_multi_interrupt(int irq, void *dev_id)
+{
+   struct net_device *ndev = dev_id;
+   struct ravb_private *priv = netdev_priv(ndev);
+   irqreturn_t result = IRQ_NONE;
+   u32 iss;
+
+   spin_lock(&priv->l

[net-next 00/10][pull request] 100GbE Intel Wired LAN Driver Updates 2015-12-22

2015-12-22 Thread Jeff Kirsher
This series contains updates to fm10k only.

Bruce cleans up the initialization of fm10k_workqueue at the global level,
which fixes a checkpatch.pl error.  He also makes several other cleanups to
the driver: making structures that do not change constant, removing unused
code, cleaning up code comments, and using the boolean values true/false
instead of an integer where a bool is all that is needed.

Jacob fixes the little endian TLV structures, which are copied with 4 byte
alignment, by adding __aligned(4) alongside __packed to ensure that these
structures are actually 4 byte aligned and packed correctly.  He also
updates the driver to use ether_addr_equal() instead of memcmp() to
compare MAC addresses.

Alex Duyck cleans up the exception handling so all of the paths result in
a similar state if we fail.  Specifically the driver will now unload the
mailbox interrupt, free the queue vectors and MSI-X, and then detach the
interface.

The following are changes since commit 076ef440708bc28d821cebb2dbca64e3c917ac73:
  ibmveth: consolidate kmalloc of array, memset 0 to kcalloc
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 100GbE

Alexander Duyck (1):
  fm10k: Cleanup exception handling for changing queues

Bruce Allan (7):
  fm10k: don't initialize fm10k_workqueue at global level
  fm10k: address operator not needed when declaring function pointers
  fm10k: constify fm10k_mac_ops, fm10k_iov_ops and fm10k_info structures
  fm10k: remove unused struct element
  fm10k: use true/false for boolean get_host_state
  fm10k: cleanup mailbox code comments etc
  fm10k: IS_ENABLED() is not appropriate for boolean kconfig option

Jacob Keller (2):
  fm10k: correctly pack TLV structures and explain reasoning
  fm10k: use ether_addr_equal instead of memcmp

 drivers/net/ethernet/intel/fm10k/fm10k_main.c   |  6 +--
 drivers/net/ethernet/intel/fm10k/fm10k_mbx.c| 50 +++
 drivers/net/ethernet/intel/fm10k/fm10k_mbx.h|  4 +-
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c | 28 ---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c| 61 +--
 drivers/net/ethernet/intel/fm10k/fm10k_pf.c | 66 -
 drivers/net/ethernet/intel/fm10k/fm10k_pf.h | 15 --
 drivers/net/ethernet/intel/fm10k/fm10k_tlv.c|  2 +-
 drivers/net/ethernet/intel/fm10k/fm10k_tlv.h|  4 +-
 drivers/net/ethernet/intel/fm10k/fm10k_type.h   |  9 ++--
 drivers/net/ethernet/intel/fm10k/fm10k_vf.c | 44 -
 drivers/net/ethernet/intel/fm10k/fm10k_vf.h |  2 +-
 12 files changed, 173 insertions(+), 118 deletions(-)

-- 
2.5.0



[net-next 01/10] fm10k: don't initialize fm10k_workqueue at global level

2015-12-22 Thread Jeff Kirsher
From: Bruce Allan 

Cleans up checkpatch GLOBAL_INITIALIZERS error

Signed-off-by: Bruce Allan 
Signed-off-by: Jacob Keller 
Tested-by: Krishneil Singh 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 75ff109..b243c3c 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -42,7 +42,7 @@ MODULE_LICENSE("GPL");
 MODULE_VERSION(DRV_VERSION);
 
 /* single workqueue for entire fm10k driver */
-struct workqueue_struct *fm10k_workqueue = NULL;
+struct workqueue_struct *fm10k_workqueue;
 
 /**
  * fm10k_init_module - Driver Registration Routine
@@ -56,8 +56,7 @@ static int __init fm10k_init_module(void)
pr_info("%s\n", fm10k_copyright);
 
/* create driver workqueue */
-   if (!fm10k_workqueue)
-   fm10k_workqueue = create_workqueue("fm10k");
+   fm10k_workqueue = create_workqueue("fm10k");
 
fm10k_dbg_init();
 
@@ -80,7 +79,6 @@ static void __exit fm10k_exit_module(void)
/* destroy driver workqueue */
flush_workqueue(fm10k_workqueue);
destroy_workqueue(fm10k_workqueue);
-   fm10k_workqueue = NULL;
 }
 module_exit(fm10k_exit_module);
 
-- 
2.5.0
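
For context on why dropping the "= NULL" is safe: C gives objects with static storage duration a zero (null) initial value when no initializer is written. A small sketch, using a hypothetical opaque demo_workqueue type in place of the kernel's workqueue_struct:

```c
#include <stddef.h>

struct demo_workqueue;	/* hypothetical opaque type, never defined here */

/* No initializer: static storage duration guarantees a null pointer. */
struct demo_workqueue *demo_wq_implicit;

/* Redundant explicit initializer of the kind checkpatch flags. */
struct demo_workqueue *demo_wq_explicit = NULL;

/* Returns 1 when both globals start out as null pointers. */
static int demo_both_null(void)
{
	return demo_wq_implicit == NULL && demo_wq_explicit == NULL;
}
```

The two declarations start out with the same value, so the explicit initializer adds nothing.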



[net-next 07/10] fm10k: remove unused struct element

2015-12-22 Thread Jeff Kirsher
From: Bruce Allan 

Signed-off-by: Bruce Allan 
Tested-by: Krishneil Singh 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/fm10k/fm10k_type.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_type.h b/drivers/net/ethernet/intel/fm10k/fm10k_type.h
index bc27c75..854ebb1 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_type.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_type.h
@@ -550,7 +550,6 @@ struct fm10k_mac_ops {
struct fm10k_dglort_cfg *);
void (*set_dma_mask)(struct fm10k_hw *, u64);
s32 (*get_fault)(struct fm10k_hw *, int, struct fm10k_fault *);
-   void (*request_lport_map)(struct fm10k_hw *);
s32 (*adjust_systime)(struct fm10k_hw *, s32 ppb);
u64 (*read_systime)(struct fm10k_hw *);
 };
-- 
2.5.0



[net-next 08/10] fm10k: use true/false for boolean get_host_state

2015-12-22 Thread Jeff Kirsher
From: Bruce Allan 

Signed-off-by: Bruce Allan 
Tested-by: Krishneil Singh 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 9c6ed88..4eb7a6f 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -912,7 +912,7 @@ static irqreturn_t fm10k_msix_mbx_vf(int __always_unused irq, void *data)
fm10k_mbx_unlock(interface);
}
 
-   hw->mac.get_host_state = 1;
+   hw->mac.get_host_state = true;
fm10k_service_event_schedule(interface);
 
return IRQ_HANDLED;
@@ -1128,7 +1128,7 @@ static irqreturn_t fm10k_msix_mbx_pf(int __always_unused irq, void *data)
}
 
/* we should validate host state after interrupt event */
-   hw->mac.get_host_state = 1;
+   hw->mac.get_host_state = true;
 
/* validate host state, and handle VF mailboxes in the service task */
fm10k_service_event_schedule(interface);
@@ -1635,7 +1635,7 @@ void fm10k_up(struct fm10k_intfc *interface)
netif_tx_start_all_queues(interface->netdev);
 
/* kick off the service timer now */
-   hw->mac.get_host_state = 1;
+   hw->mac.get_host_state = true;
mod_timer(&interface->service_timer, jiffies);
 }
 
-- 
2.5.0



[net-next 03/10] fm10k: Cleanup exception handling for changing queues

2015-12-22 Thread Jeff Kirsher
From: Alexander Duyck 

This patch is meant to clean up the exception handling for the paths where
we reset the interrupts and then reconfigure them.  In all of these paths
we had very different levels of exception handling.  I have updated the
driver so that all of the paths should result in a similar state if we
fail.

Specifically the driver will now unload the mailbox interrupt, free the
queue vectors and MSI-X, and then detach the interface.

In addition, for any of the PCIe related resets I have added a check with
the hw_ready function to make sure the registers are in a readable
state prior to reopening the interface.

Signed-off-by: Alexander Duyck 
Reviewed-by: Bruce Allan 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c | 22 --
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c| 53 +++--
 2 files changed, 59 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index 83ddf36..6fdb782 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -1153,6 +1153,7 @@ static struct rtnl_link_stats64 *fm10k_get_stats64(struct net_device *netdev,
 int fm10k_setup_tc(struct net_device *dev, u8 tc)
 {
struct fm10k_intfc *interface = netdev_priv(dev);
+   int err;
 
/* Currently only the PF supports priority classes */
if (tc && (interface->hw.mac.type != fm10k_mac_pf))
@@ -1177,17 +1178,30 @@ int fm10k_setup_tc(struct net_device *dev, u8 tc)
netdev_reset_tc(dev);
netdev_set_num_tc(dev, tc);
 
-   fm10k_init_queueing_scheme(interface);
+   err = fm10k_init_queueing_scheme(interface);
+   if (err)
+   goto err_queueing_scheme;
 
-   fm10k_mbx_request_irq(interface);
+   err = fm10k_mbx_request_irq(interface);
+   if (err)
+   goto err_mbx_irq;
 
-   if (netif_running(dev))
-   fm10k_open(dev);
+   err = netif_running(dev) ? fm10k_open(dev) : 0;
+   if (err)
+   goto err_open;
 
/* flag to indicate SWPRI has yet to be updated */
interface->flags |= FM10K_FLAG_SWPRI_CONFIG;
 
return 0;
+err_open:
+   fm10k_mbx_free_irq(interface);
+err_mbx_irq:
+   fm10k_clear_queueing_scheme(interface);
+err_queueing_scheme:
+   netif_device_detach(dev);
+
+   return err;
 }
 
 static int fm10k_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 020f6dc..202468f 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -186,7 +186,13 @@ static void fm10k_reinit(struct fm10k_intfc *interface)
}
 
/* reassociate interrupts */
-   fm10k_mbx_request_irq(interface);
+   err = fm10k_mbx_request_irq(interface);
+   if (err)
+   goto err_mbx_irq;
+
+   err = fm10k_hw_ready(interface);
+   if (err)
+   goto err_open;
 
/* update hardware address for VFs if perm_addr has changed */
if (hw->mac.type == fm10k_mac_vf) {
@@ -206,14 +212,23 @@ static void fm10k_reinit(struct fm10k_intfc *interface)
/* reset clock */
fm10k_ts_reset(interface);
 
-   if (netif_running(netdev))
-   fm10k_open(netdev);
+   err = netif_running(netdev) ? fm10k_open(netdev) : 0;
+   if (err)
+   goto err_open;
 
fm10k_iov_resume(interface->pdev);
 
+   rtnl_unlock();
+
+   clear_bit(__FM10K_RESETTING, &interface->state);
+
+   return;
+err_open:
+   fm10k_mbx_free_irq(interface);
+err_mbx_irq:
+   fm10k_clear_queueing_scheme(interface);
 reinit_err:
-   if (err)
-   netif_device_detach(netdev);
+   netif_device_detach(netdev);
 
rtnl_unlock();
 
@@ -2131,16 +2146,22 @@ static int fm10k_resume(struct pci_dev *pdev)
rtnl_lock();
 
err = fm10k_init_queueing_scheme(interface);
-   if (!err) {
-   fm10k_mbx_request_irq(interface);
-   if (netif_running(netdev))
-   err = fm10k_open(netdev);
-   }
+   if (err)
+   goto err_queueing_scheme;
 
-   rtnl_unlock();
+   err = fm10k_mbx_request_irq(interface);
+   if (err)
+   goto err_mbx_irq;
 
+   err = fm10k_hw_ready(interface);
if (err)
-   return err;
+   goto err_open;
+
+   err = netif_running(netdev) ? fm10k_open(netdev) : 0;
+   if (err)
+   goto err_open;
+
+   rtnl_unlock();
 
/* assume host is not ready, to prevent race with watchdog in case we
 * actually don't have connection to the switch
@@ -2158,6 +2179,14 @@ static int fm10k_resume(struct pci_dev *pdev)
netif_device_attach(netdev);
 
ret
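
The unwinding pattern used throughout this patch (acquire in order, release in reverse through a ladder of labels) can be sketched in isolation. This is a minimal model only: struct demo_intfc, demo_setup() and the fail_at knob are hypothetical, and the flags stand in for the real queueing scheme, mailbox IRQ and open state:

```c
/* Which step of the setup sequence should be forced to fail. */
enum demo_step {
	DEMO_FAIL_NONE,
	DEMO_FAIL_QUEUES,
	DEMO_FAIL_IRQ,
	DEMO_FAIL_OPEN,
};

/* Flags standing in for the resources the driver acquires. */
struct demo_intfc {
	int queues;
	int irq;
	int open;
	int attached;
};

static int demo_setup(struct demo_intfc *intfc, enum demo_step fail_at)
{
	int err;

	err = (fail_at == DEMO_FAIL_QUEUES) ? -1 : 0;
	if (err)
		goto err_queueing_scheme;
	intfc->queues = 1;

	err = (fail_at == DEMO_FAIL_IRQ) ? -1 : 0;
	if (err)
		goto err_mbx_irq;
	intfc->irq = 1;

	err = (fail_at == DEMO_FAIL_OPEN) ? -1 : 0;
	if (err)
		goto err_open;
	intfc->open = 1;

	return 0;

	/* Each label undoes the step before it, then falls through. */
err_open:
	intfc->irq = 0;
err_mbx_irq:
	intfc->queues = 0;
err_queueing_scheme:
	intfc->attached = 0;	/* models netif_device_detach() */

	return err;
}
```

Whichever step fails, the caller ends up in the same state: nothing allocated and the interface detached, which is the consistency the commit message asks for.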

[net-next 02/10] fm10k: correctly pack TLV structures and explain reasoning

2015-12-22 Thread Jeff Kirsher
From: Jacob Keller 

The TLV format for little endian structures is actually a 4 byte aligned
copy. To this end, we need to add an additional __aligned(4) marker
along with __packed to ensure that these structures are actually 4 byte
aligned and packed correctly. Using just __packed will not work, as this
results in 1 byte alignment, which is incorrect. Add a comment
explaining why these structures need the special treatment.

Signed-off-by: Jacob Keller 
Tested-by: Krishneil Singh 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/fm10k/fm10k_pf.h | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pf.h b/drivers/net/ethernet/intel/fm10k/fm10k_pf.h
index a8fc512..3378592 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pf.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pf.h
@@ -74,6 +74,11 @@ enum fm10k_pf_tlv_attr_id_v1 {
 #define FM10K_MSG_UPDATE_PVID_PVID_SHIFT   16
 #define FM10K_MSG_UPDATE_PVID_PVID_SIZE16
 
+/* The following data structures are overlayed directly onto TLV mailbox
+ * messages, and must not break 4 byte alignment. Ensure the structures line
+ * up correctly as per their TLV definition.
+ */
+
 struct fm10k_mac_update {
__le32  mac_lower;
__le16  mac_upper;
@@ -81,26 +86,26 @@ struct fm10k_mac_update {
__le16  glort;
u8  flags;
u8  action;
-} __packed;
+} __aligned(4) __packed;
 
 struct fm10k_global_table_data {
__le32  used;
__le32  avail;
-} __packed;
+} __aligned(4) __packed;
 
 struct fm10k_swapi_error {
__le32  status;
struct fm10k_global_table_data  mac;
struct fm10k_global_table_data  nexthop;
struct fm10k_global_table_data  ffu;
-} __packed;
+} __aligned(4) __packed;
 
 struct fm10k_swapi_1588_timestamp {
__le64 egress;
__le64 ingress;
__le16 dglort;
__le16 sglort;
-} __packed;
+} __aligned(4) __packed;
 
 s32 fm10k_msg_lport_map_pf(struct fm10k_hw *, u32 **, struct fm10k_mbx_info *);
 extern const struct fm10k_tlv_attr fm10k_lport_map_msg_attr[];
-- 
2.5.0
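
The effect of the two attributes can be checked in userspace with GCC or Clang. A small sketch, assuming the kernel's __packed and __aligned(4) expand to the corresponding __attribute__ forms; the demo_ structures are hypothetical and deliberately not multiples of 4 bytes so that the difference shows up in sizeof as well:

```c
#include <assert.h>
#include <stdint.h>

/* packed only: no internal padding, but alignment drops to 1 byte. */
struct demo_packed_only {
	uint32_t lower;
	uint16_t upper;
} __attribute__((packed));

/* packed + aligned(4): no internal padding, 4 byte overall alignment. */
struct demo_packed_aligned {
	uint32_t lower;
	uint16_t upper;
} __attribute__((aligned(4), packed));

/* Compile-time checks of the layout described above. */
static_assert(sizeof(struct demo_packed_only) == 6,
	      "packed only: 4 + 2 bytes, no tail padding");
static_assert(sizeof(struct demo_packed_aligned) == 8,
	      "packed + aligned(4): size rounded up to a multiple of 4");
```

With only the packed attribute the compiler may place the structure at any byte address; adding aligned(4) restores the 4 byte alignment that the commit message says the TLV copy relies on.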



[net-next 10/10] fm10k: IS_ENABLED() is not appropriate for boolean kconfig option

2015-12-22 Thread Jeff Kirsher
From: Bruce Allan 

Tri-states need '#if IS_ENABLED()'; booleans should use '#ifdef'.

Signed-off-by: Bruce Allan 
Tested-by: Krishneil Singh 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index 6fdb782..662569d 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -20,7 +20,7 @@
 
 #include "fm10k.h"
 #include 
-#if IS_ENABLED(CONFIG_FM10K_VXLAN)
+#ifdef CONFIG_FM10K_VXLAN
 #include 
 #endif /* CONFIG_FM10K_VXLAN */
 
@@ -556,11 +556,11 @@ int fm10k_open(struct net_device *netdev)
if (err)
goto err_set_queues;
 
-#if IS_ENABLED(CONFIG_FM10K_VXLAN)
+#ifdef CONFIG_FM10K_VXLAN
/* update VXLAN port configuration */
vxlan_get_rx_port(netdev);
-
 #endif
+
fm10k_up(interface);
 
return 0;
-- 
2.5.0
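
The difference between the two tests can be reproduced outside the kernel. A minimal sketch, assuming Kconfig defines CONFIG_FOO for a built-in option and CONFIG_FOO_MODULE for a modular one, and using a simplified re-creation of the preprocessor trick behind IS_ENABLED() (all DEMO_ and CONFIG_DEMO_ names are hypothetical; the real macros live in include/linux/kconfig.h and may differ in detail):

```c
/* Expands to "0," only when pasted onto the token 1. */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define DEMO_TAKE_SECOND(ignored, val, ...) val

/* 1 if x is a macro defined to 1, otherwise 0. */
#define DEMO_IS_DEFINED(x) DEMO_IS_DEFINED_(x)
#define DEMO_IS_DEFINED_(val) DEMO_IS_DEFINED__(DEMO_ARG_PLACEHOLDER_##val)
#define DEMO_IS_DEFINED__(arg1_or_junk) \
	DEMO_TAKE_SECOND(arg1_or_junk 1, 0, 0)

#define DEMO_IS_BUILTIN(option) DEMO_IS_DEFINED(option)
#define DEMO_IS_MODULE(option) DEMO_IS_DEFINED(option##_MODULE)
#define DEMO_IS_ENABLED(option) \
	(DEMO_IS_BUILTIN(option) || DEMO_IS_MODULE(option))

/* A bool option set to y, and a tristate option set to m. */
#define CONFIG_DEMO_BOOL 1
#define CONFIG_DEMO_TRI_MODULE 1

static int demo_bool_enabled(void)
{
	return DEMO_IS_ENABLED(CONFIG_DEMO_BOOL);
}

static int demo_tri_enabled(void)
{
	return DEMO_IS_ENABLED(CONFIG_DEMO_TRI);
}

/* A plain #ifdef misses the modular case: only ..._MODULE is defined. */
static int demo_tri_ifdef(void)
{
#ifdef CONFIG_DEMO_TRI
	return 1;
#else
	return 0;
#endif
}
```

A bool option can never be =m, so the extra _MODULE check buys nothing there and a plain #ifdef says the same thing more directly.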



[net-next 05/10] fm10k: address operator not needed when declaring function pointers

2015-12-22 Thread Jeff Kirsher
From: Bruce Allan 

Signed-off-by: Bruce Allan 
Tested-by: Krishneil Singh 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/fm10k/fm10k_pf.c | 58 ++---
 drivers/net/ethernet/intel/fm10k/fm10k_vf.c | 38 +--
 2 files changed, 48 insertions(+), 48 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
index 7dd7ca8..606c0f1 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
@@ -1867,38 +1867,38 @@ static const struct fm10k_msg_data fm10k_msg_data_pf[] = {
 };
 
 static struct fm10k_mac_ops mac_ops_pf = {
-   .get_bus_info   = &fm10k_get_bus_info_generic,
-   .reset_hw   = &fm10k_reset_hw_pf,
-   .init_hw= &fm10k_init_hw_pf,
-   .start_hw   = &fm10k_start_hw_generic,
-   .stop_hw= &fm10k_stop_hw_generic,
-   .update_vlan= &fm10k_update_vlan_pf,
-   .read_mac_addr  = &fm10k_read_mac_addr_pf,
-   .update_uc_addr = &fm10k_update_uc_addr_pf,
-   .update_mc_addr = &fm10k_update_mc_addr_pf,
-   .update_xcast_mode  = &fm10k_update_xcast_mode_pf,
-   .update_int_moderator   = &fm10k_update_int_moderator_pf,
-   .update_lport_state = &fm10k_update_lport_state_pf,
-   .update_hw_stats= &fm10k_update_hw_stats_pf,
-   .rebind_hw_stats= &fm10k_rebind_hw_stats_pf,
-   .configure_dglort_map   = &fm10k_configure_dglort_map_pf,
-   .set_dma_mask   = &fm10k_set_dma_mask_pf,
-   .get_fault  = &fm10k_get_fault_pf,
-   .get_host_state = &fm10k_get_host_state_pf,
-   .adjust_systime = &fm10k_adjust_systime_pf,
-   .read_systime   = &fm10k_read_systime_pf,
+   .get_bus_info   = fm10k_get_bus_info_generic,
+   .reset_hw   = fm10k_reset_hw_pf,
+   .init_hw= fm10k_init_hw_pf,
+   .start_hw   = fm10k_start_hw_generic,
+   .stop_hw= fm10k_stop_hw_generic,
+   .update_vlan= fm10k_update_vlan_pf,
+   .read_mac_addr  = fm10k_read_mac_addr_pf,
+   .update_uc_addr = fm10k_update_uc_addr_pf,
+   .update_mc_addr = fm10k_update_mc_addr_pf,
+   .update_xcast_mode  = fm10k_update_xcast_mode_pf,
+   .update_int_moderator   = fm10k_update_int_moderator_pf,
+   .update_lport_state = fm10k_update_lport_state_pf,
+   .update_hw_stats= fm10k_update_hw_stats_pf,
+   .rebind_hw_stats= fm10k_rebind_hw_stats_pf,
+   .configure_dglort_map   = fm10k_configure_dglort_map_pf,
+   .set_dma_mask   = fm10k_set_dma_mask_pf,
+   .get_fault  = fm10k_get_fault_pf,
+   .get_host_state = fm10k_get_host_state_pf,
+   .adjust_systime = fm10k_adjust_systime_pf,
+   .read_systime   = fm10k_read_systime_pf,
 };
 
 static struct fm10k_iov_ops iov_ops_pf = {
-   .assign_resources   = &fm10k_iov_assign_resources_pf,
-   .configure_tc   = &fm10k_iov_configure_tc_pf,
-   .assign_int_moderator   = &fm10k_iov_assign_int_moderator_pf,
+   .assign_resources   = fm10k_iov_assign_resources_pf,
+   .configure_tc   = fm10k_iov_configure_tc_pf,
+   .assign_int_moderator   = fm10k_iov_assign_int_moderator_pf,
.assign_default_mac_vlan= fm10k_iov_assign_default_mac_vlan_pf,
-   .reset_resources= &fm10k_iov_reset_resources_pf,
-   .set_lport  = &fm10k_iov_set_lport_pf,
-   .reset_lport= &fm10k_iov_reset_lport_pf,
-   .update_stats   = &fm10k_iov_update_stats_pf,
-   .report_timestamp   = &fm10k_iov_report_timestamp_pf,
+   .reset_resources= fm10k_iov_reset_resources_pf,
+   .set_lport  = fm10k_iov_set_lport_pf,
+   .reset_lport= fm10k_iov_reset_lport_pf,
+   .update_stats   = fm10k_iov_update_stats_pf,
+   .report_timestamp   = fm10k_iov_report_timestamp_pf,
 };
 
 static s32 fm10k_get_invariants_pf(struct fm10k_hw *hw)
@@ -1910,7 +1910,7 @@ static s32 fm10k_get_invariants_pf(struct fm10k_hw *hw)
 
 struct fm10k_info fm10k_pf_info = {
.mac= fm10k_mac_pf,
-   .get_invariants = &fm10k_get_invariants_pf,
+   .get_invariants = fm10k_get_invariants_pf,
.mac_ops= &mac_ops_pf,
.iov_ops= &iov_ops_pf,
 };
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_vf.c b/drivers/net/ethernet/intel/fm10k/fm10k_vf.c
index f1dc6e8..38219f5 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_vf.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_vf.c
@@ -563,24 +563,24 @@ static const str
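
Why the '&' can go: in C a function name used in an expression already converts to a pointer to that function, so both spellings initialize the member identically. A short sketch with a hypothetical ops table (demo_ names are not from the driver):

```c
/* Hypothetical ops table with a single callback. */
struct demo_ops {
	int (*add_one)(int);
};

static int demo_add_one(int x)
{
	return x + 1;
}

/* Explicit address-of, as in the lines the patch removes. */
static const struct demo_ops demo_ops_amp = {
	.add_one = &demo_add_one,
};

/* Bare function name, as in the lines the patch adds. */
static const struct demo_ops demo_ops_plain = {
	.add_one = demo_add_one,
};
```

The two tables hold the same pointer, so the change is purely stylistic.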

[net-next 04/10] fm10k: use ether_addr_equal instead of memcmp

2015-12-22 Thread Jeff Kirsher
From: Jacob Keller 

When comparing MAC addresses, use ether_addr_equal() instead of a memcmp()
over ETH_ALEN bytes. Found and replaced using the following sed:

 sed -e 's/memcmp\x28\(.*\), ETH_ALEN\x29/!ether_addr_equal\x28\1\x29/'

Reported-by: Bruce Allan 
Signed-off-by: Jacob Keller 
Reviewed-by: Bruce Allan 
Tested-by: Krishneil Singh 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 2 +-
 drivers/net/ethernet/intel/fm10k/fm10k_pf.c  | 2 +-
 drivers/net/ethernet/intel/fm10k/fm10k_tlv.c | 2 +-
 drivers/net/ethernet/intel/fm10k/fm10k_vf.c  | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 202468f..9c6ed88 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -1191,7 +1191,7 @@ static s32 fm10k_mbx_mac_addr(struct fm10k_hw *hw, u32 **results,
 
/* MAC was changed so we need reset */
if (is_valid_ether_addr(hw->mac.perm_addr) &&
-   memcmp(hw->mac.perm_addr, hw->mac.addr, ETH_ALEN))
+   !ether_addr_equal(hw->mac.perm_addr, hw->mac.addr))
interface->flags |= FM10K_FLAG_RESET_REQUESTED;
 
/* VLAN override was changed, or default VLAN changed */
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
index 808307e..7dd7ca8 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
@@ -1250,7 +1250,7 @@ s32 fm10k_iov_msg_mac_vlan_pf(struct fm10k_hw *hw, u32 **results,
 
/* block attempts to set MAC for a locked device */
if (is_valid_ether_addr(vf_info->mac) &&
-   memcmp(mac, vf_info->mac, ETH_ALEN))
+   !ether_addr_equal(mac, vf_info->mac))
return FM10K_ERR_PARAM;
 
set = !(vlan & FM10K_VLAN_CLEAR);
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_tlv.c b/drivers/net/ethernet/intel/fm10k/fm10k_tlv.c
index 95afb5c..ab01bb3 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_tlv.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_tlv.c
@@ -755,7 +755,7 @@ parse_nested:
err = fm10k_tlv_attr_get_mac_vlan(
results[FM10K_TEST_MSG_MAC_ADDR],
result_mac, &result_vlan);
-   if (!err && memcmp(test_mac, result_mac, ETH_ALEN))
+   if (!err && !ether_addr_equal(test_mac, result_mac))
err = FM10K_ERR_INVALID_VALUE;
if (!err && test_vlan != result_vlan)
err = FM10K_ERR_INVALID_VALUE;
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_vf.c b/drivers/net/ethernet/intel/fm10k/fm10k_vf.c
index 5445c0f..f1dc6e8 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_vf.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_vf.c
@@ -298,7 +298,7 @@ static s32 fm10k_update_uc_addr_vf(struct fm10k_hw *hw, u16 glort,
 
/* verify we are not locked down on the MAC address */
if (is_valid_ether_addr(hw->mac.perm_addr) &&
-   memcmp(hw->mac.perm_addr, mac, ETH_ALEN))
+   !ether_addr_equal(hw->mac.perm_addr, mac))
return FM10K_ERR_PARAM;
 
/* add bit to notify us if this is a set or clear operation */
-- 
2.5.0
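
The sed expression adds a '!' because the two functions have opposite senses: memcmp() is non-zero when the buffers differ, while ether_addr_equal() is true when they match. A userspace sketch of the equivalence, with a hypothetical demo_ether_addr_equal() standing in for the kernel helper (this stand-in is just a memcmp() wrapper and is not the kernel implementation):

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ETH_ALEN 6	/* length of an Ethernet MAC address in bytes */

/* Stand-in for ether_addr_equal(): true when the addresses match. */
static bool demo_ether_addr_equal(const uint8_t *a, const uint8_t *b)
{
	return memcmp(a, b, DEMO_ETH_ALEN) == 0;
}

/* Old style: memcmp() result is non-zero when the addresses differ. */
static bool demo_changed_old(const uint8_t *perm, const uint8_t *cur)
{
	return memcmp(perm, cur, DEMO_ETH_ALEN) != 0;
}

/* New style: the negation keeps the condition's meaning unchanged. */
static bool demo_changed_new(const uint8_t *perm, const uint8_t *cur)
{
	return !demo_ether_addr_equal(perm, cur);
}
```

Each converted condition in the diff therefore still fires when the two addresses differ, as it did before.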



[net-next 06/10] fm10k: constify fm10k_mac_ops, fm10k_iov_ops and fm10k_info structures

2015-12-22 Thread Jeff Kirsher
From: Bruce Allan 

These structures never change so declare them as const.

Signed-off-by: Bruce Allan 
Tested-by: Krishneil Singh 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/fm10k/fm10k_pf.c   | 6 +++---
 drivers/net/ethernet/intel/fm10k/fm10k_pf.h   | 2 +-
 drivers/net/ethernet/intel/fm10k/fm10k_type.h | 8 
 drivers/net/ethernet/intel/fm10k/fm10k_vf.c   | 4 ++--
 drivers/net/ethernet/intel/fm10k/fm10k_vf.h   | 2 +-
 5 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
index 606c0f1..62ccebc 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
@@ -1866,7 +1866,7 @@ static const struct fm10k_msg_data fm10k_msg_data_pf[] = {
FM10K_TLV_MSG_ERROR_HANDLER(fm10k_tlv_msg_error),
 };
 
-static struct fm10k_mac_ops mac_ops_pf = {
+static const struct fm10k_mac_ops mac_ops_pf = {
.get_bus_info   = fm10k_get_bus_info_generic,
.reset_hw   = fm10k_reset_hw_pf,
.init_hw= fm10k_init_hw_pf,
@@ -1889,7 +1889,7 @@ static struct fm10k_mac_ops mac_ops_pf = {
.read_systime   = fm10k_read_systime_pf,
 };
 
-static struct fm10k_iov_ops iov_ops_pf = {
+static const struct fm10k_iov_ops iov_ops_pf = {
.assign_resources   = fm10k_iov_assign_resources_pf,
.configure_tc   = fm10k_iov_configure_tc_pf,
.assign_int_moderator   = fm10k_iov_assign_int_moderator_pf,
@@ -1908,7 +1908,7 @@ static s32 fm10k_get_invariants_pf(struct fm10k_hw *hw)
return fm10k_sm_mbx_init(hw, &hw->mbx, fm10k_msg_data_pf);
 }
 
-struct fm10k_info fm10k_pf_info = {
+const struct fm10k_info fm10k_pf_info = {
.mac= fm10k_mac_pf,
.get_invariants = fm10k_get_invariants_pf,
.mac_ops= &mac_ops_pf,
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pf.h b/drivers/net/ethernet/intel/fm10k/fm10k_pf.h
index 3378592..b2d96b4 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pf.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pf.h
@@ -133,5 +133,5 @@ s32 fm10k_iov_msg_mac_vlan_pf(struct fm10k_hw *, u32 **,
 s32 fm10k_iov_msg_lport_state_pf(struct fm10k_hw *, u32 **,
 struct fm10k_mbx_info *);
 
-extern struct fm10k_info fm10k_pf_info;
+extern const struct fm10k_info fm10k_pf_info;
 #endif /* _FM10K_PF_H */
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_type.h b/drivers/net/ethernet/intel/fm10k/fm10k_type.h
index 098883d..bc27c75 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_type.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_type.h
@@ -660,10 +660,10 @@ enum fm10k_devices {
 };
 
 struct fm10k_info {
-   enum fm10k_mac_type mac;
-   s32 (*get_invariants)(struct fm10k_hw *);
-   struct fm10k_mac_ops*mac_ops;
-   struct fm10k_iov_ops*iov_ops;
+   enum fm10k_mac_type mac;
+   s32 (*get_invariants)(struct fm10k_hw *);
+   const struct fm10k_mac_ops  *mac_ops;
+   const struct fm10k_iov_ops  *iov_ops;
 };
 
 struct fm10k_hw {
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_vf.c b/drivers/net/ethernet/intel/fm10k/fm10k_vf.c
index 38219f5..91f8d73 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_vf.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_vf.c
@@ -562,7 +562,7 @@ static const struct fm10k_msg_data fm10k_msg_data_vf[] = {
FM10K_TLV_MSG_ERROR_HANDLER(fm10k_tlv_msg_error),
 };
 
-static struct fm10k_mac_ops mac_ops_vf = {
+static const struct fm10k_mac_ops mac_ops_vf = {
.get_bus_info   = fm10k_get_bus_info_generic,
.reset_hw   = fm10k_reset_hw_vf,
.init_hw= fm10k_init_hw_vf,
@@ -590,7 +590,7 @@ static s32 fm10k_get_invariants_vf(struct fm10k_hw *hw)
return fm10k_pfvf_mbx_init(hw, &hw->mbx, fm10k_msg_data_vf, 0);
 }
 
-struct fm10k_info fm10k_vf_info = {
+const struct fm10k_info fm10k_vf_info = {
.mac= fm10k_mac_vf,
.get_invariants = fm10k_get_invariants_vf,
.mac_ops= &mac_ops_vf,
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_vf.h b/drivers/net/ethernet/intel/fm10k/fm10k_vf.h
index 06a99d7..c4439f1 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_vf.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_vf.h
@@ -74,5 +74,5 @@ extern const struct fm10k_tlv_attr fm10k_1588_msg_attr[];
 #define FM10K_VF_MSG_1588_HANDLER(func) \
FM10K_MSG_HANDLER(FM10K_VF_MSG_ID_1588, fm10k_1588_msg_attr, func)
 
-extern struct fm10k_info fm10k_vf_info;
+extern const struct fm10k_info fm10k_vf_info;
 #endif /* _FM10K_VF_H */
-- 
2.5.0



[net-next 09/10] fm10k: cleanup mailbox code comments etc

2015-12-22 Thread Jeff Kirsher
From: Bruce Allan 

Clean up a number of issues with function header comments: lower-case
acronyms (e.g. FIFO, TLV), duplicate comments and a stubbed-out header
comment for fm10k_sm_mbx_init.

Signed-off-by: Bruce Allan 
Tested-by: Krishneil Singh 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/fm10k/fm10k_mbx.c | 50 +---
 drivers/net/ethernet/intel/fm10k/fm10k_mbx.h |  4 +--
 drivers/net/ethernet/intel/fm10k/fm10k_tlv.h |  4 +--
 3 files changed, 34 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_mbx.c b/drivers/net/ethernet/intel/fm10k/fm10k_mbx.c
index c7fea47..98202c3 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_mbx.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_mbx.c
@@ -57,7 +57,7 @@ static u16 fm10k_fifo_unused(struct fm10k_mbx_fifo *fifo)
 }
 
 /**
- *  fm10k_fifo_empty - Test to verify if fifo is empty
+ *  fm10k_fifo_empty - Test to verify if FIFO is empty
  *  @fifo: pointer to FIFO
  *
  *  This function returns true if the FIFO is empty, else false
@@ -72,7 +72,7 @@ static bool fm10k_fifo_empty(struct fm10k_mbx_fifo *fifo)
  *  @fifo: pointer to FIFO
  *  @offset: offset to add to head
  *
- *  This function returns the indices into the fifo based on head + offset
+ *  This function returns the indices into the FIFO based on head + offset
  **/
 static u16 fm10k_fifo_head_offset(struct fm10k_mbx_fifo *fifo, u16 offset)
 {
@@ -84,7 +84,7 @@ static u16 fm10k_fifo_head_offset(struct fm10k_mbx_fifo *fifo, u16 offset)
  *  @fifo: pointer to FIFO
  *  @offset: offset to add to tail
  *
- *  This function returns the indices into the fifo based on tail + offset
+ *  This function returns the indices into the FIFO based on tail + offset
  **/
 static u16 fm10k_fifo_tail_offset(struct fm10k_mbx_fifo *fifo, u16 offset)
 {
@@ -160,7 +160,7 @@ static u16 fm10k_mbx_index_len(struct fm10k_mbx_info *mbx, u16 head, u16 tail)
 /**
  *  fm10k_mbx_tail_add - Determine new tail value with added offset
  *  @mbx: pointer to mailbox
- *  @offset: length to add to head offset
+ *  @offset: length to add to tail offset
  *
  *  This function takes the local tail index and recomputes it for
  *  a given length added as an offset.
@@ -176,7 +176,7 @@ static u16 fm10k_mbx_tail_add(struct fm10k_mbx_info *mbx, u16 offset)
 /**
  *  fm10k_mbx_tail_sub - Determine new tail value with subtracted offset
  *  @mbx: pointer to mailbox
- *  @offset: length to add to head offset
+ *  @offset: length to add to tail offset
  *
  *  This function takes the local tail index and recomputes it for
  *  a given length added as an offset.
@@ -240,7 +240,7 @@ static u16 fm10k_mbx_pushed_tail_len(struct fm10k_mbx_info *mbx)
 }
 
 /**
- *  fm10k_fifo_write_copy - pulls data off of msg and places it in fifo
+ *  fm10k_fifo_write_copy - pulls data off of msg and places it in FIFO
  *  @fifo: pointer to FIFO
  *  @msg: message array to populate
  *  @tail_offset: additional offset to add to tail pointer
@@ -336,6 +336,7 @@ static u16 fm10k_mbx_validate_msg_size(struct fm10k_mbx_info *mbx, u16 len)
 
 /**
  *  fm10k_mbx_write_copy - pulls data off of Tx FIFO and places it in mbmem
+ *  @hw: pointer to hardware structure
  *  @mbx: pointer to mailbox
  *
  *  This function will take a section of the Tx FIFO and copy it into the
@@ -711,7 +712,7 @@ static bool fm10k_mbx_tx_complete(struct fm10k_mbx_info *mbx)
  *  @hw: pointer to hardware structure
  *  @mbx: pointer to mailbox
  *
- *  This function dequeues messages and hands them off to the tlv parser.
+ *  This function dequeues messages and hands them off to the TLV parser.
  *  It will return the number of messages processed when called.
  **/
 static u16 fm10k_mbx_dequeue_rx(struct fm10k_hw *hw,
@@ -924,7 +925,7 @@ static void fm10k_mbx_create_fake_disconnect_hdr(struct fm10k_mbx_info *mbx)
 }
 
 /**
- *  fm10k_mbx_create_error_msg - Generate a error message
+ *  fm10k_mbx_create_error_msg - Generate an error message
  *  @mbx: pointer to mailbox
  *  @err: local error encountered
  *
@@ -957,7 +958,6 @@ static void fm10k_mbx_create_error_msg(struct fm10k_mbx_info *mbx, s32 err)
 /**
  *  fm10k_mbx_validate_msg_hdr - Validate common fields in the message header
  *  @mbx: pointer to mailbox
- *  @msg: message array to read
  *
  *  This function will parse up the fields in the mailbox header and return
  *  an error if the header contains any of a number of invalid configurations
@@ -1021,11 +1021,12 @@ static s32 fm10k_mbx_validate_msg_hdr(struct fm10k_mbx_info *mbx)
 
 /**
  *  fm10k_mbx_create_reply - Generate reply based on state and remote head
+ *  @hw: pointer to hardware structure
  *  @mbx: pointer to mailbox
  *  @head: acknowledgement number
  *
  *  This function will generate an outgoing message based on the current
- *  mailbox state and the remote fifo head.  It will return the length
+ *  mailbox state and the remote FIFO head.  It will return the length
  *  of the outgoing message ex

Re: [RFC PATCH 16/17] calipso: Add validation of CALIPSO option.

2015-12-22 Thread Hannes Frederic Sowa
On 22.12.2015 12:46, Huw Davies wrote:
>  
> +/* CALIPSO RFC 5570 */
> +
> +static bool ipv6_hop_calipso(struct sk_buff *skb, int optoff)
> +{
> + const unsigned char *nh = skb_network_header(skb);
> +
> + if (nh[optoff + 1] < 8)
> + goto drop;
> +
> + if (nh[optoff + 6] * 4 + 8 > nh[optoff + 1])
> + goto drop;
> +
> + if (!calipso_validate(skb, nh + optoff))
> + goto drop;
> +
> + return true;
> +
> +drop:
> + kfree_skb(skb);
> + return false;
> +}
> +

Formally, if an extension header could not be processed, the packet
should be discarded and an ICMP error parameter extension should be
sent. I think we shouldn't let those packets pass here.

Thanks,
Hannes
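
For reference, the two length checks in the quoted hunk can be sketched in userspace C. This is only an illustration, assuming the RFC 5570 option layout (type, data length, 4-byte DOI, compartment length in 32-bit words, sensitivity level, 2-byte checksum, then the compartment bitmap). The name calipso_lengths_ok and the avail parameter are hypothetical and do not appear in the patch, and the checksum/DOI validation done by calipso_validate() is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Fixed part of the option data: DOI (4) + compartment length (1) +
 * sensitivity level (1) + checksum (2) = 8 bytes.
 */
#define CALIPSO_FIXED_DATA_LEN 8

/* opt points at the option type byte; opt[1] is the option data length
 * and opt[6] is the compartment bitmap length in 32-bit words.
 * avail is the number of readable bytes at opt (hypothetical parameter).
 */
static bool calipso_lengths_ok(const unsigned char *opt, size_t avail)
{
	assert(opt != NULL);

	if (avail < 2u + CALIPSO_FIXED_DATA_LEN)
		return false;	/* fixed fields are not all present */
	if (opt[1] < CALIPSO_FIXED_DATA_LEN)
		return false;	/* data length too small for the fixed fields */
	if (opt[6] * 4u + CALIPSO_FIXED_DATA_LEN > opt[1])
		return false;	/* bitmap would overrun the option */
	if ((size_t)opt[1] + 2u > avail)
		return false;	/* option would overrun the buffer */
	return true;
}
```

A caller would drop the packet whenever this returns false, which corresponds to the "goto drop" paths in the hunk above.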



Re: [RFC PATCH 13/17] calipso: Allow request sockets to be relabelled by the lsm.

2015-12-22 Thread Hannes Frederic Sowa
On 22.12.2015 12:46, Huw Davies wrote:
>   tot_len += sizeof(*opt2);
> - opt2 = sock_kmalloc(sk, tot_len, GFP_ATOMIC);
> + if (sk)
> + opt2 = sock_kmalloc(sk, tot_len, GFP_ATOMIC);
> + else
> + opt2 = kmalloc(tot_len, GFP_ATOMIC);
>   if (!opt2)
>   return ERR_PTR(-ENOBUFS);

This change looks dangerous to me in terms of control of memory
depletion from a remote host. Could you use sk_to_full_sk and account
options towards the listener socket?



Re: [RFC PATCH 08/17] ipv6: Add ipv6_renew_options_kern() that accepts a kernel mem pointer.

2015-12-22 Thread Hannes Frederic Sowa
On 22.12.2015 12:46, Huw Davies wrote:
> The functionality is equivalent to ipv6_renew_options() except
> that the newopt pointer is in kernel, not user, memory
> 
> The kernel memory implementation will be used by the CALIPSO network
> labelling engine, which needs to be able to set IPv6 hop-by-hop
> options.
> 
> Signed-off-by: Huw Davies 
> ---
>  include/net/ipv6.h |   6 +++
>  net/ipv6/exthdrs.c | 131 -
>  2 files changed, 125 insertions(+), 12 deletions(-)
> 
> diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> index 9a5c9f0..5a72ffd 100644
> --- a/include/net/ipv6.h
> +++ b/include/net/ipv6.h
> @@ -304,6 +304,12 @@ struct ipv6_txoptions *ipv6_renew_options(struct sock *sk,
> int newtype,
> struct ipv6_opt_hdr __user *newopt,
> int newoptlen);
> +struct ipv6_txoptions *
> +ipv6_renew_options_kern(struct sock *sk,
> + struct ipv6_txoptions *opt,
> + int newtype,
> + struct ipv6_opt_hdr *newopt,
> + int newoptlen);
>  struct ipv6_txoptions *ipv6_fixup_options(struct ipv6_txoptions *opt_space,
> struct ipv6_txoptions *opt);
>  
> diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
> index ea7c4d6..9426b26 100644
> --- a/net/ipv6/exthdrs.c
> +++ b/net/ipv6/exthdrs.c
> @@ -734,11 +734,16 @@ ipv6_dup_options(struct sock *sk, struct ipv6_txoptions *opt)
>  EXPORT_SYMBOL_GPL(ipv6_dup_options);
>  
>  static int ipv6_renew_option(void *ohdr,
> -  struct ipv6_opt_hdr __user *newopt, int newoptlen,
> +  struct ipv6_opt_hdr __user *newopt_user,
> +  struct ipv6_opt_hdr *newopt,
> +  int newoptlen,
>int inherit,
>struct ipv6_opt_hdr **hdr,
>char **p)


This looks quite ugly to me.

Wouldn't it be possible to do something like this:


ipv6_renew_option_kern(...)
{
	int ret;
	const mm_segment_t old_fs = get_fs();

	set_fs(KERNEL_DS);
	/* maybe you need to forcefully cast the __user away here */
	ret = ipv6_renew_option(...);
	set_fs(old_fs);
	return ret;
}

Bye,
Hannes



RE: [PATCH] net: phy: adds backplane driver for Freescale's PCS PHY

2015-12-22 Thread Shaohui Xie
> > I did miss the device tree binding documentation.
> > This driver expects a property "lane-instance" in the mdio bus node, and
> > "lane-handle" and "lane-range" properties in the phy node.
> >
> > The "lane-instance" indicates what the phy should be probed as,
> > 1000BASE-KX or 10GBASE-KR. The phy node seems a better place than the mdio
> > bus node to hold this property; maybe a better name, "phy-mode", should
> > be used?
> 
> Ideally you want all the properties in the phy node. It can get
> complicated, if you have an mdio mux in the chain. Extending phy-mode
> would make sense, rather than adding a new property.
[S.H] Yes, phy-mode is a better choice.

> 
> >
> > The "lane-handle" pointed to a serdes node which looks like below:
> > E.g. in arch/powerpc/boot/dts/fsl/t4240si-post.dtsi:
> >
> > serdes: serdes@ea000 {
> > compatible = "fsl,t4240-serdes";
> > reg= <0xea000 0x4000>;
> > };
> >
> > The "lane-handle" property would be: lane-handle = <&serdes>;
> 
> There does not appear to be a driver which uses fsl,t4240-serdes?  Is
> there a driver for it? Ah, this is the driver right? It maps the io space
> and uses the registers in that space.
[S.H] There is no driver for fsl,t4240-serdes or for any other serdes. Many
SoCs have a serdes node (grep -r serdes arch/powerpc/boot/dts/), but only some
SoCs support 1G-KX or 10G-KR, e.g. T4240, T2080, T1024, T1040.
The serdes may differ between SoCs.

The phy driver needs to configure the lane control registers at runtime
to tune the lane signal, but those registers are only a small part
of the serdes, which has many other registers for different functions.
So this is not a driver for the serdes.

> 
> So there is an architectural question. Should there be a separate serdes
> driver, or is it O.K. to include the serdes driver within the phy driver?
[S.H] A serdes driver might be too much; the phy driver only needs to use
some lane control registers.

> 
> Is the PHY embedded inside the Soc? Or is it discrete? Could the same phy
> be used with a different MAC/serdes interface?
[S.H] Yes, the PHY is embedded inside the SoC. Each MAC has an internal MDIO
controller to access the internal PCS PHY, so the same phy cannot be used with
a different MAC/serdes interface.

Thanks,
Shaohui


[RFC PATCH 02/17] netlabel: Add an address family to domain hash entries.

2015-12-22 Thread Huw Davies
The reason is to allow different labelling protocols for
different address families with the same domain.

This requires the addition of an address family attribute
in the netlink communication protocol.  It is used in several
messages:

NLBL_MGMT_C_ADD and NLBL_MGMT_C_ADDDEF take it as an optional
attribute for the unlabelled protocol.  It may be one of AF_INET,
AF_INET6 and AF_UNSPEC (to specify both address families).  If it
is missing, it defaults to AF_UNSPEC.

NLBL_MGMT_C_LISTALL and NLBL_MGMT_C_LISTDEF return it as part of
the enumeration of each item.  Additionally, it may be sent to
LISTDEF to specify which address family to return.

Signed-off-by: Huw Davies 
---
 net/netlabel/netlabel_domainhash.c | 173 -
 net/netlabel/netlabel_domainhash.h |   8 +-
 net/netlabel/netlabel_kapi.c   |   6 +-
 net/netlabel/netlabel_mgmt.c   |  29 ++-
 net/netlabel/netlabel_mgmt.h   |  24 +++--
 net/netlabel/netlabel_unlabeled.c  |   1 +
 6 files changed, 186 insertions(+), 55 deletions(-)

diff --git a/net/netlabel/netlabel_domainhash.c b/net/netlabel/netlabel_domainhash.c
index d4d6640..b14a1ed 100644
--- a/net/netlabel/netlabel_domainhash.c
+++ b/net/netlabel/netlabel_domainhash.c
@@ -56,7 +56,8 @@ static DEFINE_SPINLOCK(netlbl_domhsh_lock);
 #define netlbl_domhsh_rcu_deref(p) \
rcu_dereference_check(p, lockdep_is_held(&netlbl_domhsh_lock))
 static struct netlbl_domhsh_tbl __rcu *netlbl_domhsh;
-static struct netlbl_dom_map __rcu *netlbl_domhsh_def;
+static struct netlbl_dom_map __rcu *netlbl_domhsh_def_ipv4;
+static struct netlbl_dom_map __rcu *netlbl_domhsh_def_ipv6;
 
 /*
  * Domain Hash Table Helper Functions
@@ -126,18 +127,26 @@ static u32 netlbl_domhsh_hash(const char *key)
return val & (netlbl_domhsh_rcu_deref(netlbl_domhsh)->size - 1);
 }
 
+static bool netlbl_family_match(u16 f1, u16 f2)
+{
+   return (f1 == f2) || (f1 == AF_UNSPEC) || (f2 == AF_UNSPEC);
+}
+
 /**
  * netlbl_domhsh_search - Search for a domain entry
  * @domain: the domain
+ * @family: the address family
  *
  * Description:
  * Searches the domain hash table and returns a pointer to the hash table
- * entry if found, otherwise NULL is returned.  The caller is responsible for
+ * entry if found, otherwise NULL is returned.  @family may be %AF_UNSPEC
+ * which matches any address family entries.  The caller is responsible for
  * ensuring that the hash table is protected with either a RCU read lock or the
  * hash table lock.
  *
  */
-static struct netlbl_dom_map *netlbl_domhsh_search(const char *domain)
+static struct netlbl_dom_map *netlbl_domhsh_search(const char *domain,
+  u16 family)
 {
u32 bkt;
struct list_head *bkt_list;
@@ -147,7 +156,9 @@ static struct netlbl_dom_map *netlbl_domhsh_search(const char *domain)
bkt = netlbl_domhsh_hash(domain);
bkt_list = &netlbl_domhsh_rcu_deref(netlbl_domhsh)->tbl[bkt];
list_for_each_entry_rcu(iter, bkt_list, list)
-   if (iter->valid && strcmp(iter->domain, domain) == 0)
+   if (iter->valid &&
+   netlbl_family_match(iter->family, family) &&
+   strcmp(iter->domain, domain) == 0)
return iter;
}
 
@@ -157,25 +168,35 @@ static struct netlbl_dom_map *netlbl_domhsh_search(const char *domain)
 /**
  * netlbl_domhsh_search_def - Search for a domain entry
  * @domain: the domain
- * @def: return default if no match is found
+ * @family: the address family
  *
  * Description:
  * Searches the domain hash table and returns a pointer to the hash table
  * entry if an exact match is found, if an exact match is not present in the
  * hash table then the default entry is returned if valid otherwise NULL is
- * returned.  The caller is responsible ensuring that the hash table is
+ * returned.  @family may be %AF_UNSPEC which matches any address family
+ * entries.  The caller is responsible ensuring that the hash table is
  * protected with either a RCU read lock or the hash table lock.
  *
  */
-static struct netlbl_dom_map *netlbl_domhsh_search_def(const char *domain)
+static struct netlbl_dom_map *netlbl_domhsh_search_def(const char *domain,
+  u16 family)
 {
struct netlbl_dom_map *entry;
 
-   entry = netlbl_domhsh_search(domain);
+   entry = netlbl_domhsh_search(domain, family);
if (entry == NULL) {
-   entry = netlbl_domhsh_rcu_deref(netlbl_domhsh_def);
-   if (entry != NULL && !entry->valid)
-   entry = NULL;
+   if (family == AF_INET || family == AF_UNSPEC) {
+   entry = netlbl_domhsh_rcu_deref(netlbl_domhsh_def_ipv4);
+   if (entry != NULL && !entry->valid)
+   entry = NULL;
+   }
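
The matching rule this patch introduces (AF_UNSPEC on either side matches any family) can be tried out in userspace. The following is a minimal sketch, not the kernel code: struct dom_entry, family_match() and dom_search() are hypothetical stand-ins for struct netlbl_dom_map, netlbl_family_match() and netlbl_domhsh_search(), and a flat array replaces the RCU-protected hash table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>	/* AF_UNSPEC, AF_INET, AF_INET6 */

struct dom_entry {	/* hypothetical stand-in for struct netlbl_dom_map */
	const char *domain;
	unsigned short family;
	bool valid;
};

/* Two families match if they are equal or if either one is AF_UNSPEC. */
static bool family_match(unsigned short f1, unsigned short f2)
{
	return f1 == f2 || f1 == AF_UNSPEC || f2 == AF_UNSPEC;
}

/* Return the first valid entry matching both the family and the domain. */
static const struct dom_entry *dom_search(const struct dom_entry *tbl,
					  size_t n, const char *domain,
					  unsigned short family)
{
	assert(domain != NULL);

	for (size_t i = 0; i < n; i++)
		if (tbl[i].valid &&
		    family_match(tbl[i].family, family) &&
		    strcmp(tbl[i].domain, domain) == 0)
			return &tbl[i];
	return NULL;
}
```

With entries for the same domain under both AF_INET and AF_INET6, a lookup with AF_UNSPEC returns whichever entry comes first, which is the "matches any address family entries" behaviour described in the kernel-doc above.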

[RFC PATCH 05/17] netlabel: Add support for enumerating the CALIPSO DOI list.

2015-12-22 Thread Huw Davies
Enumerate the DOI list through the NLBL_CALIPSO_C_LISTALL command.
It takes no attributes.

Signed-off-by: Huw Davies 
---
 include/net/netlabel.h  |   4 ++
 net/ipv6/calipso.c  |  41 
 net/netlabel/netlabel_calipso.c | 106 
 net/netlabel/netlabel_calipso.h |   3 ++
 4 files changed, 154 insertions(+)

diff --git a/include/net/netlabel.h b/include/net/netlabel.h
index e072350..c6e1ce9 100644
--- a/include/net/netlabel.h
+++ b/include/net/netlabel.h
@@ -225,6 +225,7 @@ struct netlbl_lsm_secattr {
  * @doi_free: free a CALIPSO DOI
  * @doi_getdef: returns a reference to a DOI
  * @doi_putdef: releases a reference of a DOI
+ * @doi_walk: enumerate the DOI list
  *
  * Description:
  * This structure is filled out by the CALIPSO engine and passed
@@ -238,6 +239,9 @@ struct netlbl_calipso_ops {
void (*doi_free)(struct calipso_doi *doi_def);
struct calipso_doi *(*doi_getdef)(u32 doi);
void (*doi_putdef)(struct calipso_doi *doi_def);
+   int (*doi_walk)(u32 *skip_cnt,
+   int (*callback)(struct calipso_doi *doi_def, void *arg),
+   void *cb_arg);
 };
 
 /*
diff --git a/net/ipv6/calipso.c b/net/ipv6/calipso.c
index 128cc69..fa17c7a 100644
--- a/net/ipv6/calipso.c
+++ b/net/ipv6/calipso.c
@@ -210,11 +210,52 @@ static void calipso_doi_putdef(struct calipso_doi *doi_def)
call_rcu(&doi_def->rcu, calipso_doi_free_rcu);
 }
 
+/**
+ * calipso_doi_walk - Iterate through the DOI definitions
+ * @skip_cnt: skip past this number of DOI definitions, updated
+ * @callback: callback for each DOI definition
+ * @cb_arg: argument for the callback function
+ *
+ * Description:
+ * Iterate over the DOI definition list, skipping the first @skip_cnt entries.
+ * For each entry call @callback, if @callback returns a negative value stop
+ * 'walking' through the list and return.  Updates the value in @skip_cnt upon
+ * return.  Returns zero on success, negative values on failure.
+ *
+ */
+static int calipso_doi_walk(u32 *skip_cnt,
+   int (*callback)(struct calipso_doi *doi_def,
+   void *arg),
+   void *cb_arg)
+{
+   int ret_val = -ENOENT;
+   u32 doi_cnt = 0;
+   struct calipso_doi *iter_doi;
+
+   rcu_read_lock();
+   list_for_each_entry_rcu(iter_doi, &calipso_doi_list, list)
+   if (atomic_read(&iter_doi->refcount) > 0) {
+   if (doi_cnt++ < *skip_cnt)
+   continue;
+   ret_val = callback(iter_doi, cb_arg);
+   if (ret_val < 0) {
+   doi_cnt--;
+   goto doi_walk_return;
+   }
+   }
+
+doi_walk_return:
+   rcu_read_unlock();
+   *skip_cnt = doi_cnt;
+   return ret_val;
+}
+
 static const struct netlbl_calipso_ops ops = {
.doi_add  = calipso_doi_add,
.doi_free = calipso_doi_free,
.doi_getdef   = calipso_doi_getdef,
.doi_putdef   = calipso_doi_putdef,
+   .doi_walk = calipso_doi_walk,
 };
 
 /**
diff --git a/net/netlabel/netlabel_calipso.c b/net/netlabel/netlabel_calipso.c
index 1effc61..a014ec2 100644
--- a/net/netlabel/netlabel_calipso.c
+++ b/net/netlabel/netlabel_calipso.c
@@ -46,6 +46,13 @@
 #include "netlabel_mgmt.h"
 #include "netlabel_domainhash.h"
 
+/* Argument struct for calipso_doi_walk() */
+struct netlbl_calipso_doiwalk_arg {
+   struct netlink_callback *nl_cb;
+   struct sk_buff *skb;
+   u32 seq;
+};
+
 /* NetLabel Generic NETLINK CALIPSO family */
 static struct genl_family netlbl_calipso_gnl_family = {
.id = GENL_ID_GENERATE,
@@ -187,6 +194,73 @@ list_failure:
return ret_val;
 }
 
+/**
+ * netlbl_calipso_listall_cb - calipso_v4_doi_walk() callback for LISTALL
+ * @doi_def: the CALIPSO DOI definition
+ * @arg: the netlbl_calipso_doiwalk_arg structure
+ *
+ * Description:
+ * This function is designed to be used as a callback to the
+ * calipso_doi_walk() function for use in generating a response for a LISTALL
+ * message.  Returns the size of the message on success, negative values on
+ * failure.
+ *
+ */
+static int netlbl_calipso_listall_cb(struct calipso_doi *doi_def, void *arg)
+{
+   int ret_val = -ENOMEM;
+   struct netlbl_calipso_doiwalk_arg *cb_arg = arg;
+   void *data;
+
+   data = genlmsg_put(cb_arg->skb, NETLINK_CB(cb_arg->nl_cb->skb).portid,
+  cb_arg->seq, &netlbl_calipso_gnl_family,
+  NLM_F_MULTI, NLBL_CALIPSO_C_LISTALL);
+   if (!data)
+   goto listall_cb_failure;
+
+   ret_val = nla_put_u32(cb_arg->skb, NLBL_CALIPSO_A_DOI, doi_def->doi);
+   if (ret_val != 0)
+   goto listall_cb_failure;
+   ret_val = nla_put_u32(cb_arg->skb,
+  
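
The skip-count protocol of calipso_doi_walk() is easier to follow outside the kernel. Below is a minimal userspace sketch of the same loop under stated assumptions: struct doi_entry, doi_walk() and take_some() are hypothetical names, a plain array replaces the RCU list, and an int replaces the atomic refcount.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct doi_entry {	/* hypothetical stand-in for struct calipso_doi */
	unsigned int doi;
	int refcount;
};

typedef int (*doi_cb)(const struct doi_entry *doi, void *arg);

/* Visit live entries, skipping the first *skip_cnt of them.  On return
 * *skip_cnt holds the number of entries consumed so far, so that a later
 * call resumes where this one stopped.
 */
static int doi_walk(const struct doi_entry *list, size_t n,
		    unsigned int *skip_cnt, doi_cb callback, void *cb_arg)
{
	int ret_val = -ENOENT;
	unsigned int doi_cnt = 0;

	assert(skip_cnt != NULL && callback != NULL);

	for (size_t i = 0; i < n; i++) {
		if (list[i].refcount <= 0)
			continue;	/* entry is not live */
		if (doi_cnt++ < *skip_cnt)
			continue;	/* already handled by an earlier call */
		ret_val = callback(&list[i], cb_arg);
		if (ret_val < 0) {
			doi_cnt--;	/* this entry must be retried */
			break;
		}
	}
	*skip_cnt = doi_cnt;
	return ret_val;
}

/* Example callback: accept at most *room entries, then report "full". */
static int take_some(const struct doi_entry *doi, void *arg)
{
	int *room = arg;

	(void)doi;
	if (*room == 0)
		return -ENOSPC;
	(*room)--;
	return 0;
}
```

This mirrors how a netlink dump would use the walk: when the callback runs out of buffer space it returns a negative value, and the updated skip count lets the next dump call continue from the entry that did not fit.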

[RFC PATCH 11/17] netlabel: Prevent setsockopt() from changing the hop-by-hop option.

2015-12-22 Thread Huw Davies
If a socket has a netlabel in place then don't let setsockopt() alter
the socket's IPv6 hop-by-hop option.  This is in the same spirit as
the existing check for IPv4.

Signed-off-by: Huw Davies 
---
 security/selinux/netlabel.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/security/selinux/netlabel.c b/security/selinux/netlabel.c
index 5470f32..2477a75 100644
--- a/security/selinux/netlabel.c
+++ b/security/selinux/netlabel.c
@@ -410,6 +410,21 @@ int selinux_netlbl_sock_rcv_skb(struct sk_security_struct *sksec,
 }
 
 /**
+ * selinux_netlbl_option - Is this a NetLabel option
+ * @level: the socket level or protocol
+ * @optname: the socket option name
+ *
+ * Description:
+ * Returns true if @level and @optname refer to a NetLabel option.
+ * Helper for selinux_netlbl_socket_setsockopt().
+ */
+static inline int selinux_netlbl_option(int level, int optname)
+{
+   return (level == IPPROTO_IP && optname == IP_OPTIONS) ||
+   (level == IPPROTO_IPV6 && optname == IPV6_HOPOPTS);
+}
+
+/**
  * selinux_netlbl_socket_setsockopt - Do not allow users to remove a NetLabel
  * @sock: the socket
  * @level: the socket level or protocol
@@ -431,7 +446,7 @@ int selinux_netlbl_socket_setsockopt(struct socket *sock,
struct sk_security_struct *sksec = sk->sk_security;
struct netlbl_lsm_secattr secattr;
 
-   if (level == IPPROTO_IP && optname == IP_OPTIONS &&
+   if (selinux_netlbl_option(level, optname) &&
(sksec->nlbl_state == NLBL_LABELED ||
 sksec->nlbl_state == NLBL_CONNLABELED)) {
netlbl_secattr_init(&secattr);
-- 
1.8.0
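
The condition this patch extends can be sketched in userspace as follows. This is a simplified illustration, assuming Linux userspace headers for the socket option constants: enum label_state and setsockopt_needs_check() are hypothetical names standing in for sksec->nlbl_state and the test at the top of selinux_netlbl_socket_setsockopt(), and the follow-up lookup of the socket's label is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <netinet/in.h>	/* IPPROTO_IP, IPPROTO_IPV6, IP_OPTIONS, IPV6_HOPOPTS */

/* Hypothetical stand-in for the NLBL_* socket label states. */
enum label_state { UNLABELED, LABELED, CONNLABELED };

/* True if (level, optname) names an option that can carry a NetLabel. */
static bool is_netlabel_option(int level, int optname)
{
	return (level == IPPROTO_IP && optname == IP_OPTIONS) ||
	       (level == IPPROTO_IPV6 && optname == IPV6_HOPOPTS);
}

/* True if the setsockopt() call targets a label-carrying option on a
 * socket that currently has a label, i.e. the call needs further checks.
 */
static bool setsockopt_needs_check(enum label_state state, int level,
				   int optname)
{
	return is_netlabel_option(level, optname) &&
	       (state == LABELED || state == CONNLABELED);
}
```

Unrelated options such as IPV6_V6ONLY fall through untouched, as do all options on unlabelled sockets.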



[RFC PATCH 14/17] calipso: Allow the lsm to label the skbuff directly.

2015-12-22 Thread Huw Davies
In some cases, the lsm needs to add the label to the skbuff directly.
A NF_INET_LOCAL_OUT IPv6 hook is added to selinux to match the IPv4
behaviour.  This allows selinux to label the skbuffs that it requires.

Signed-off-by: Huw Davies 
---
 include/net/ipv6.h  |   2 +-
 include/net/netlabel.h  |  11 +++
 net/ipv6/calipso.c  | 180 
 net/ipv6/exthdrs_core.c |   2 +-
 net/netlabel/netlabel_calipso.c |  82 ++
 net/netlabel/netlabel_calipso.h |   7 ++
 net/netlabel/netlabel_kapi.c|  32 ++-
 security/selinux/hooks.c|  15 
 8 files changed, 325 insertions(+), 6 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 5f9c252..71b5045 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -920,7 +920,7 @@ enum {
 int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset, int target,
  unsigned short *fragoff, int *fragflg);
 
-int ipv6_find_tlv(struct sk_buff *skb, int offset, int type);
+int ipv6_find_tlv(const struct sk_buff *skb, int offset, int type);
 
 struct in6_addr *fl6_update_dst(struct flowi6 *fl6,
const struct ipv6_txoptions *opt,
diff --git a/include/net/netlabel.h b/include/net/netlabel.h
index 771a11c..d88bf4a 100644
--- a/include/net/netlabel.h
+++ b/include/net/netlabel.h
@@ -231,6 +231,10 @@ struct netlbl_lsm_secattr {
  * @sock_delattr: remove the socket's attr
  * @req_setattr: set the req socket's attr
  * @req_delattr: remove the req socket's attr
+ * @optptr: find option in packet
+ * @getattr: retrieve attr from memory block
+ * @skbuff_setattr: set the skbuff's attr
+ * @skbuff_delattr: remove the skbuff's attr
  *
  * Description:
  * This structure is filled out by the CALIPSO engine and passed
@@ -258,6 +262,13 @@ struct netlbl_calipso_ops {
   const struct calipso_doi *doi_def,
   const struct netlbl_lsm_secattr *secattr);
void (*req_delattr)(struct request_sock *req);
+   unsigned char *(*optptr)(const struct sk_buff *skb);
+   int (*getattr)(const unsigned char *calipso,
+  struct netlbl_lsm_secattr *secattr);
+   int (*skbuff_setattr)(struct sk_buff *skb,
+ const struct calipso_doi *doi_def,
+ const struct netlbl_lsm_secattr *secattr);
+   int (*skbuff_delattr)(struct sk_buff *skb);
 };
 
 /*
diff --git a/net/ipv6/calipso.c b/net/ipv6/calipso.c
index 5d72669..a91d71d 100644
--- a/net/ipv6/calipso.c
+++ b/net/ipv6/calipso.c
@@ -56,6 +56,12 @@
  */
 #define CALIPSO_HDR_LEN (2 + 8)
 
+ /* Maximum size of u32 aligned buffer required to hold calipso
+  * option.  Max of 3 initial pad bytes starting from buffer + 3.
+  * i.e. the worst case is when the previous tlv finishes on 4n + 3.
+  */
+#define CALIPSO_MAX_BUFFER (6 + CALIPSO_OPT_LEN_MAX)
+
 /* List of available DOI definitions */
 static DEFINE_SPINLOCK(calipso_doi_list_lock);
 static LIST_HEAD(calipso_doi_list);
@@ -962,6 +968,176 @@ static void calipso_req_delattr(struct request_sock *req)
kfree(new);
 }
 
+/* skbuff functions.
+ */
+
+/**
+ * calipso_optptr - Find the CALIPSO option in the packet
+ * @skb: the packet
+ *
+ * Description:
+ * Parse the packet's IP header looking for a CALIPSO option.  Returns a pointer
+ * to the start of the CALIPSO option on success, NULL if one is not found.
+ *
+ */
+static unsigned char *calipso_optptr(const struct sk_buff *skb)
+{
+   const struct ipv6hdr *ip6_hdr = ipv6_hdr(skb);
+   int offset;
+
+   if (ip6_hdr->nexthdr != NEXTHDR_HOP)
+   return NULL;
+
+   offset = ipv6_find_tlv(skb, sizeof(*ip6_hdr), IPV6_TLV_CALIPSO);
+   if (offset >= 0)
+   return (unsigned char *)ip6_hdr + offset;
+
+   return NULL;
+}
+
+/**
+ * calipso_skbuff_setattr - Set the CALIPSO option on a packet
+ * @skb: the packet
+ * @doi_def: the CALIPSO DOI to use
+ * @secattr: the security attributes
+ *
+ * Description:
+ * Set the CALIPSO option on the given packet based on the security attributes.
+ * Returns a pointer to the IP header on success and NULL on failure.
+ *
+ */
+static int calipso_skbuff_setattr(struct sk_buff *skb,
+ const struct calipso_doi *doi_def,
+ const struct netlbl_lsm_secattr *secattr)
+{
+   int ret_val;
+   struct ipv6hdr *ip6_hdr;
+   struct ipv6_opt_hdr *hop;
+   unsigned char buf[CALIPSO_MAX_BUFFER];
+   int len_delta;
+   unsigned int start, end, next_opt, pad;
+
+   ip6_hdr = ipv6_hdr(skb);
+   if (ip6_hdr->nexthdr == NEXTHDR_HOP) {
+   hop = (struct ipv6_opt_hdr *)(ip6_hdr + 1);
+   ret_val = calipso_opt_find(hop, &start, &end);
+   if (ret_val && ret_val != -ENOENT)
+   return ret_val;
+   if (end != ipv6_optlen(hop))
+   
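
The "6 +" in CALIPSO_MAX_BUFFER above follows from the option's 4n + 2 alignment requirement (RFC 5570). A small worked sketch of that arithmetic, with hypothetical helper names calipso_pad_len() and calipso_buf_needed() that are not part of the patch:

```c
#include <assert.h>

/* Pad bytes needed so that an option placed after 'offset' starts on a
 * 4n + 2 boundary.
 */
static unsigned int calipso_pad_len(unsigned int offset)
{
	return (2u - (offset & 3u)) & 3u;
}

/* Bytes of a u32-aligned scratch buffer used when the previous TLV ended
 * at 'prev_end' and an option of 'opt_len' bytes follows it.  Data starts
 * at buffer + (prev_end mod 4) so that alignment inside the buffer matches
 * alignment inside the packet.
 */
static unsigned int calipso_buf_needed(unsigned int prev_end,
				       unsigned int opt_len)
{
	unsigned int lead = prev_end & 3u;

	return lead + calipso_pad_len(prev_end) + opt_len;
}
```

The worst case is prev_end = 4n + 3: three leading bytes plus three pad bytes, giving opt_len + 6, which matches the comment's "Max of 3 initial pad bytes starting from buffer + 3".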

[RFC PATCH 12/17] ipv6: Allow request socks to contain IPv6 options.

2015-12-22 Thread Huw Davies
If set, these will take precedence over the parent's options during
both sending and child creation.  If they're not set, the parent's
options (if any) will be used.

This is to allow the security_inet_conn_request() hook to modify the
IPv6 options in just the same way that it already may do for IPv4.

Signed-off-by: Huw Davies 
---
 include/net/inet_sock.h |  7 ++-
 net/dccp/ipv6.c | 12 +---
 net/ipv4/tcp_input.c|  3 +++
 net/ipv6/tcp_ipv6.c | 12 +---
 4 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 625bdf9..39bbe8d 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -96,7 +96,12 @@ struct inet_request_sock {
u32 ir_mark;
union {
struct ip_options_rcu   *opt;
-   struct sk_buff  *pktopts;
+#if IS_ENABLED(CONFIG_IPV6)
+   struct {
+   struct ipv6_txoptions   *ipv6_opt;
+   struct sk_buff  *pktopts;
+   };
+#endif
};
 };
 
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 9c6d050..8bb1c3a 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -216,14 +216,17 @@ static int dccp_v6_send_response(const struct sock *sk, struct request_sock *req
skb = dccp_make_response(sk, dst, req);
if (skb != NULL) {
struct dccp_hdr *dh = dccp_hdr(skb);
+   struct ipv6_txoptions *opt;
 
dh->dccph_checksum = dccp_v6_csum_finish(skb,
 &ireq->ir_v6_loc_addr,
 &ireq->ir_v6_rmt_addr);
fl6.daddr = ireq->ir_v6_rmt_addr;
rcu_read_lock();
-   err = ip6_xmit(sk, skb, &fl6, rcu_dereference(np->opt),
-  np->tclass);
+   opt = ireq->ipv6_opt;
+   if (!opt)
+   opt = rcu_dereference(np->opt);
+   err = ip6_xmit(sk, skb, &fl6, opt, np->tclass);
rcu_read_unlock();
err = net_xmit_eval(err);
}
@@ -236,6 +239,7 @@ done:
 static void dccp_v6_reqsk_destructor(struct request_sock *req)
 {
dccp_feat_list_purge(&dccp_rsk(req)->dreq_featneg);
+   kfree(inet_rsk(req)->ipv6_opt);
kfree_skb(inet_rsk(req)->pktopts);
 }
 
@@ -494,7 +498,9 @@ static struct sock *dccp_v6_request_recv_sock(const struct sock *sk,
 * Yes, keeping reference count would be much more clever, but we make
 * one more one thing there: reattach optmem to newsk.
 */
-   opt = rcu_dereference(np->opt);
+   opt = ireq->ipv6_opt;
+   if (!opt)
+   opt = rcu_dereference(np->opt);
if (opt) {
opt = ipv6_dup_options(newsk, opt);
RCU_INIT_POINTER(newnp->opt, opt);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 2d656ee..af91a95 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6094,6 +6094,9 @@ struct request_sock *inet_reqsk_alloc(const struct request_sock_ops *ops,
 
kmemcheck_annotate_bitfield(ireq, flags);
ireq->opt = NULL;
+#if IS_ENABLED(CONFIG_IPV6)
+   ireq->pktopts = NULL;
+#endif
atomic64_set(&ireq->ir_cookie, 0);
ireq->ireq_state = TCP_NEW_SYN_RECV;
write_pnet(&ireq->ireq_net, sock_net(sk_listener));
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 6b8a8a9..a9041b2 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -443,6 +443,7 @@ static int tcp_v6_send_synack(const struct sock *sk, struct dst_entry *dst,
 {
struct inet_request_sock *ireq = inet_rsk(req);
struct ipv6_pinfo *np = inet6_sk(sk);
+   struct ipv6_txoptions *opt;
struct flowi6 *fl6 = &fl->u.ip6;
struct sk_buff *skb;
int err = -ENOMEM;
@@ -462,8 +463,10 @@ static int tcp_v6_send_synack(const struct sock *sk, struct dst_entry *dst,
if (np->repflow && ireq->pktopts)
fl6->flowlabel = ip6_flowlabel(ipv6_hdr(ireq->pktopts));
 
-   err = ip6_xmit(sk, skb, fl6, rcu_dereference(np->opt),
-  np->tclass);
+   opt = ireq->ipv6_opt;
+   if (!opt)
+   opt = rcu_dereference(np->opt);
+   err = ip6_xmit(sk, skb, fl6, opt, np->tclass);
err = net_xmit_eval(err);
}
 
@@ -474,6 +477,7 @@ done:
 
 static void tcp_v6_reqsk_destructor(struct request_sock *req)
 {
+   kfree(inet_rsk(req)->ipv6_opt);
kfree_skb(inet_rsk(req)->pktopts);
 }
 
@@ -1101,7 +1105,9 @@ static struct sock *tcp_v6_syn_recv_sock(const struct sock *sk, struct sk_buff *
   but we make one more one thing there: reattach optmem
   to newsk.
 */
-   opt = rcu_dereference(np->
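
The precedence rule applied in each hunk above (request sock options first, parent options as fallback) reduces to a single selection. A minimal sketch, with struct txoptions and effective_opt() as hypothetical names; the RCU dereference of the parent's options is not modelled:

```c
#include <assert.h>
#include <stddef.h>

struct txoptions {	/* hypothetical stand-in for struct ipv6_txoptions */
	int id;
};

/* Prefer the request sock's own options; fall back to the parent's.
 * Either argument may be NULL, and so may the result.
 */
static const struct txoptions *
effective_opt(const struct txoptions *req_opt,
	      const struct txoptions *parent_opt)
{
	return req_opt ? req_opt : parent_opt;
}
```

The same selection is used both when sending the response and when creating the child socket, so a label set on the request sock is carried through consistently.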

[RFC PATCH 09/17] netlabel: Move bitmap manipulation functions to the NetLabel core.

2015-12-22 Thread Huw Davies
This is to allow the CALIPSO labelling engine to use these.

Signed-off-by: Huw Davies 
---
 include/net/netlabel.h   |  6 +++
 net/ipv4/cipso_ipv4.c| 88 +---
 net/netlabel/netlabel_kapi.c | 70 +++
 3 files changed, 85 insertions(+), 79 deletions(-)

diff --git a/include/net/netlabel.h b/include/net/netlabel.h
index 0ffe32f..e6ac0da 100644
--- a/include/net/netlabel.h
+++ b/include/net/netlabel.h
@@ -434,6 +434,12 @@ int netlbl_catmap_setlong(struct netlbl_lsm_catmap **catmap,
  unsigned long bitmap,
  gfp_t flags);
 
+/* Bitmap functions
+ */
+int netlbl_bitmap_walk(const unsigned char *bitmap, u32 bitmap_len,
+  u32 offset, u8 state);
+void netlbl_bitmap_setbit(unsigned char *bitmap, u32 bit, u8 state);
+
 /*
  * LSM protocol operations (NetLabel LSM/kernel API)
  */
diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
index bdb2a07..d710d4e 100644
--- a/net/ipv4/cipso_ipv4.c
+++ b/net/ipv4/cipso_ipv4.c
@@ -135,76 +135,6 @@ int cipso_v4_rbm_strictvalid = 1;
  */
 
 /**
- * cipso_v4_bitmap_walk - Walk a bitmap looking for a bit
- * @bitmap: the bitmap
- * @bitmap_len: length in bits
- * @offset: starting offset
- * @state: if non-zero, look for a set (1) bit else look for a cleared (0) bit
- *
- * Description:
- * Starting at @offset, walk the bitmap from left to right until either the
- * desired bit is found or we reach the end.  Return the bit offset, -1 if
- * not found, or -2 if error.
- */
-static int cipso_v4_bitmap_walk(const unsigned char *bitmap,
-   u32 bitmap_len,
-   u32 offset,
-   u8 state)
-{
-   u32 bit_spot;
-   u32 byte_offset;
-   unsigned char bitmask;
-   unsigned char byte;
-
-   /* gcc always rounds to zero when doing integer division */
-   byte_offset = offset / 8;
-   byte = bitmap[byte_offset];
-   bit_spot = offset;
-   bitmask = 0x80 >> (offset % 8);
-
-   while (bit_spot < bitmap_len) {
-   if ((state && (byte & bitmask) == bitmask) ||
-   (state == 0 && (byte & bitmask) == 0))
-   return bit_spot;
-
-   bit_spot++;
-   bitmask >>= 1;
-   if (bitmask == 0) {
-   byte = bitmap[++byte_offset];
-   bitmask = 0x80;
-   }
-   }
-
-   return -1;
-}
-
-/**
- * cipso_v4_bitmap_setbit - Sets a single bit in a bitmap
- * @bitmap: the bitmap
- * @bit: the bit
- * @state: if non-zero, set the bit (1) else clear the bit (0)
- *
- * Description:
- * Set a single bit in the bitmask.  Returns zero on success, negative values
- * on error.
- */
-static void cipso_v4_bitmap_setbit(unsigned char *bitmap,
-  u32 bit,
-  u8 state)
-{
-   u32 byte_spot;
-   u8 bitmask;
-
-   /* gcc always rounds to zero when doing integer division */
-   byte_spot = bit / 8;
-   bitmask = 0x80 >> (bit % 8);
-   if (state)
-   bitmap[byte_spot] |= bitmask;
-   else
-   bitmap[byte_spot] &= ~bitmask;
-}
-
-/**
  * cipso_v4_cache_entry_free - Frees a cache entry
  * @entry: the entry to free
  *
@@ -840,10 +770,10 @@ static int cipso_v4_map_cat_rbm_valid(const struct cipso_v4_doi *doi_def,
cipso_cat_size = doi_def->map.std->cat.cipso_size;
cipso_array = doi_def->map.std->cat.cipso;
for (;;) {
-   cat = cipso_v4_bitmap_walk(bitmap,
-  bitmap_len_bits,
-  cat + 1,
-  1);
+   cat = netlbl_bitmap_walk(bitmap,
+bitmap_len_bits,
+cat + 1,
+1);
if (cat < 0)
break;
if (cat >= cipso_cat_size ||
@@ -909,7 +839,7 @@ static int cipso_v4_map_cat_rbm_hton(const struct cipso_v4_doi *doi_def,
}
if (net_spot >= net_clen_bits)
return -ENOSPC;
-   cipso_v4_bitmap_setbit(net_cat, net_spot, 1);
+   netlbl_bitmap_setbit(net_cat, net_spot, 1);
 
if (net_spot > net_spot_max)
net_spot_max = net_spot;
@@ -951,10 +881,10 @@ static int cipso_v4_map_cat_rbm_ntoh(const struct cipso_v4_doi *doi_def,
}
 
for (;;) {
-   net_spot = cipso_v4_bitmap_walk(net_cat,
-   net_clen_bits,
-   net_spot + 1,
- 
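
Editorial note: the two helpers that the patch above moves out of cipso_ipv4.c treat the bitmap as MSB-first, so bit 0 is the 0x80 bit of byte 0. The following is a minimal userspace sketch of that behaviour, not the kernel code itself. The names bitmap_walk, bitmap_setbit and bitmap_selfcheck are hypothetical, and uint32_t/uint8_t stand in for the kernel's u32/u8. Unlike the code in the patch, the sketch re-indexes the byte on every step so that it never reads at or past bitmap_len.

```c
#include <assert.h>
#include <stdint.h>

/* Return the offset of the first bit at or after `offset` whose value matches
 * `state` (non-zero: look for a set bit, zero: look for a cleared bit).
 * Bits are numbered MSB-first within each byte. Returns -1 if not found. */
static int bitmap_walk(const unsigned char *bitmap, uint32_t bitmap_len,
                       uint32_t offset, uint8_t state)
{
    uint32_t bit_spot;

    for (bit_spot = offset; bit_spot < bitmap_len; bit_spot++) {
        unsigned char bitmask = 0x80 >> (bit_spot % 8);
        int is_set = (bitmap[bit_spot / 8] & bitmask) != 0;

        if (is_set == (state != 0))
            return (int)bit_spot;
    }
    return -1;
}

/* Set (state non-zero) or clear (state zero) a single MSB-first bit. */
static void bitmap_setbit(unsigned char *bitmap, uint32_t bit, uint8_t state)
{
    unsigned char bitmask = 0x80 >> (bit % 8);

    if (state)
        bitmap[bit / 8] |= bitmask;
    else
        bitmap[bit / 8] &= (unsigned char)~bitmask;
}

/* Small usage example: set bits 3 and 9, then walk the 16-bit map. */
static void bitmap_selfcheck(void)
{
    unsigned char map[2] = { 0, 0 };

    bitmap_setbit(map, 3, 1);
    bitmap_setbit(map, 9, 1);
    assert(map[0] == 0x10);                     /* bit 3 is 0x80 >> 3 */
    assert(map[1] == 0x40);                     /* bit 9 is 0x80 >> 1 in byte 1 */
    assert(bitmap_walk(map, 16, 0, 1) == 3);    /* first set bit */
    assert(bitmap_walk(map, 16, 4, 1) == 9);    /* next set bit after 3 */
    assert(bitmap_walk(map, 16, 10, 1) == -1);  /* no further set bits */
    assert(bitmap_walk(map, 16, 3, 0) == 4);    /* first cleared bit from 3 */
}
```

The callers in the diff use the same pattern as the self-check: they pass the previous result plus one as the next offset and stop when the walk returns a negative value.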

[RFC PATCH 13/17] calipso: Allow request sockets to be relabelled by the lsm.

2015-12-22 Thread Huw Davies
Request sockets need to have a label that takes into account the
incoming connection as well as their parent's label.  This is used
for the outgoing SYN-ACK and for their child full-socket.

Signed-off-by: Huw Davies 
---
 include/net/netlabel.h          |  6 
 net/ipv6/calipso.c              | 78 +
 net/ipv6/exthdrs.c              | 16 ++---
 net/netlabel/netlabel_calipso.c | 40 +
 net/netlabel/netlabel_calipso.h |  4 +++
 net/netlabel/netlabel_kapi.c    | 39 -
 security/selinux/netlabel.c     |  2 +-
 7 files changed, 171 insertions(+), 14 deletions(-)

diff --git a/include/net/netlabel.h b/include/net/netlabel.h
index b7ec76c..771a11c 100644
--- a/include/net/netlabel.h
+++ b/include/net/netlabel.h
@@ -229,6 +229,8 @@ struct netlbl_lsm_secattr {
  * @sock_getattr: retrieve the socket's attr
  * @sock_setattr: set the socket's attr
  * @sock_delattr: remove the socket's attr
+ * @req_setattr: set the req socket's attr
+ * @req_delattr: remove the req socket's attr
  *
  * Description:
  * This structure is filled out by the CALIPSO engine and passed
@@ -252,6 +254,10 @@ struct netlbl_calipso_ops {
const struct calipso_doi *doi_def,
const struct netlbl_lsm_secattr *secattr);
void (*sock_delattr)(struct sock *sk);
+   int (*req_setattr)(struct request_sock *req,
+  const struct calipso_doi *doi_def,
+  const struct netlbl_lsm_secattr *secattr);
+   void (*req_delattr)(struct request_sock *req);
 };
 
 /*
diff --git a/net/ipv6/calipso.c b/net/ipv6/calipso.c
index ce803e2..5d72669 100644
--- a/net/ipv6/calipso.c
+++ b/net/ipv6/calipso.c
@@ -886,6 +886,82 @@ done:
txopt_put(txopts);
 }
 
+/* request sock functions.
+ */
+
+/**
+ * calipso_req_setattr - Add a CALIPSO option to a connection request socket
+ * @req: the connection request socket
+ * @doi_def: the CALIPSO DOI to use
+ * @secattr: the specific security attributes of the socket
+ *
+ * Description:
+ * Set the CALIPSO option on the given socket using the DOI definition and
+ * security attributes passed to the function.  Returns zero on success and
+ * negative values on failure.
+ *
+ */
+static int calipso_req_setattr(struct request_sock *req,
+  const struct calipso_doi *doi_def,
+  const struct netlbl_lsm_secattr *secattr)
+{
+   struct ipv6_txoptions *txopts;
+   struct inet_request_sock *req_inet = inet_rsk(req);
+   struct ipv6_opt_hdr *old, *new;
+
+   if (req_inet->ipv6_opt && req_inet->ipv6_opt->hopopt)
+   old = req_inet->ipv6_opt->hopopt;
+   else
+   old = NULL;
+
+   new = calipso_opt_insert(old, doi_def, secattr);
+   if (IS_ERR(new))
+   return PTR_ERR(new);
+
+   txopts = ipv6_renew_options_kern(NULL, req_inet->ipv6_opt, IPV6_HOPOPTS,
+new, new ? ipv6_optlen(new) : 0);
+
+   kfree(new);
+
+   if (IS_ERR(txopts))
+   return PTR_ERR(txopts);
+
+   txopts = xchg(&req_inet->ipv6_opt, txopts);
+   txopt_put(txopts);
+
+   return 0;
+}
+
+/**
+ * calipso_req_delattr - Delete the CALIPSO option from a request socket
+ * @req: the request socket
+ *
+ * Description:
+ * Removes the CALIPSO option from a request socket, if present.
+ *
+ */
+static void calipso_req_delattr(struct request_sock *req)
+{
+   struct inet_request_sock *req_inet = inet_rsk(req);
+   struct ipv6_opt_hdr *new;
+   struct ipv6_txoptions *txopts;
+
+   if (!req_inet->ipv6_opt || !req_inet->ipv6_opt->hopopt)
+   return;
+
+   if (calipso_opt_del(req_inet->ipv6_opt->hopopt, &new))
+   return; /* Nothing to do */
+
+   txopts = ipv6_renew_options_kern(NULL, req_inet->ipv6_opt, IPV6_HOPOPTS,
+new, new ? ipv6_optlen(new) : 0);
+
+   if (!IS_ERR(txopts)) {
+   txopts = xchg(&req_inet->ipv6_opt, txopts);
+   txopt_put(txopts);
+   }
+   kfree(new);
+}
+
 static const struct netlbl_calipso_ops ops = {
.doi_add  = calipso_doi_add,
.doi_free = calipso_doi_free,
@@ -896,6 +972,8 @@ static const struct netlbl_calipso_ops ops = {
.sock_getattr = calipso_sock_getattr,
.sock_setattr = calipso_sock_setattr,
.sock_delattr = calipso_sock_delattr,
+   .req_setattr  = calipso_req_setattr,
+   .req_delattr  = calipso_req_delattr,
 };
 
 /**
diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
index 9426b26..e87c89b 100644
--- a/net/ipv6/exthdrs.c
+++ b/net/ipv6/exthdrs.c
@@ -793,7 +793,7 @@ static int ipv6_renew_option(void *ohdr,
  * specified option type is not copied into the new set of options.
  *
  * The new set of options is allocated from the socket option memory
- * buff
