date:20150616

This series contains updates to fm10k only.

Alex provides two fixes for fm10k.  The first folds the fm10k_pull_tail()
call into fm10k_add_rx_frag(), so that the fragment does not have to be
modified after it is added to the skb.  The second adds the braces that
were missing from an if statement.

The remaining patches are from Jacob and contain improvements and fixes
for fm10k.  The first makes it so that invalid multicast addresses are
simply skipped, which allows synchronizing the full list to proceed when
entries are added with the iproute2 tool.  Fixed a possible kernel panic
by using the correct transmit timestamp function.  Simplified the code
flow for setting the IN_PROGRESS bit of the shinfo for an skb that we
will be timestamping.  Fixed a bug in the timestamping transmit enqueue
code responsible for a NULL pointer dereference and invalid access of the
skb list by freeing the clone in the cases where we did not add it to the
queue.  Updated the PF code so that it resets the empty TQMAP/RQMAP
registers post-VFLR to prevent innocent VF drivers from triggering
malicious driver events.  The SYSTIME_CFG.Adjust direction bit is
actually supposed to indicate that the adjustment is positive, so fixed
the code to align correctly with the hardware and documentation.  Cleaned
up a local variable that is no longer used after a previous refactor of
the code.  Fixed the code flow so that we actually clear the enabled flag
as part of our removal of the LPORT.

The following are changes since commit 89d256bb69f2596c3a31ac51466eac9e1791c388:
  bpf: disallow bpf tc programs access current-pid,uid
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue master

Alexander Duyck (2):
  fm10k: fold fm10k_pull_tail into fm10k_add_rx_frag
  fm10k: Fix missing braces after if statement

Jacob Keller (15):
  fm10k: ignore invalid multicast address entries
  fm10k: use correct ethernet driver Tx timestamp function
  fm10k: move setting shinfo inside ts_tx_enqueue
  fm10k: fix incorrect free on skb in ts_tx_enqueue
  fm10k: add call to fm10k_clean_all_rx_rings in fm10k_down
  fm10k: use an unsigned int for i in ethtool_get_strings
  fm10k: remove extraneous NULL check on l2_accel
  fm10k: trivial fixup message style to include a colon
  fm10k: use dma_set_mask_and_coherent in fm10k_probe
  fm10k: force LPORT delete when updating VLAN or MAC address
  fm10k: re-map all possible VF queues after a VFLR
  fm10k: pack TLV overlay structures
  fm10k: fix incorrect DIR_NEVATIVE bit in 1588 code
  fm10k: remove err_no reference in fm10k_mbx.c
  fm10k: fix iov_msg_lport_state_pf issue

 drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c |  5 +-
 drivers/net/ethernet/intel/fm10k/fm10k_iov.c | 38 --
 drivers/net/ethernet/intel/fm10k/fm10k_main.c| 66 +++-
 drivers/net/ethernet/intel/fm10k/fm10k_mbx.c |  5 --
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c  | 11 +---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 27 +++---
 drivers/net/ethernet/intel/fm10k/fm10k_pf.c  | 18 ++-
 drivers/net/ethernet/intel/fm10k/fm10k_pf.h  |  8 +--
 drivers/net/ethernet/intel/fm10k/fm10k_ptp.c | 13 ++---
 drivers/net/ethernet/intel/fm10k/fm10k_type.h|  2 +-
 10 files changed, 84 insertions(+), 109 deletions(-)

-- 
2.4.3

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[net-next 01/17] fm10k: fold fm10k_pull_tail into fm10k_add_rx_frag

From: Alexander Duyck alexander.h.du...@redhat.com

This change folds the fm10k_pull_tail call into fm10k_add_rx_frag.  The
advantage to doing this is that the fragment doesn't have to be modified
after it is added to the skb.
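
The idea can be illustrated outside the kernel. The following is only a minimal user-space sketch, not the driver code: `struct sketch_frag` and `sketch_add_rx_frag()` are hypothetical names, and the header length is passed in rather than computed by eth_get_headlen(). It shows the header bytes being copied first, so that the fragment is created with its final offset and size.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the fragment entry that skb_add_rx_frag() fills. */
struct sketch_frag {
	size_t page_offset;	/* where the fragment starts inside the page */
	size_t size;		/* bytes carried by the fragment */
};

/*
 * Copy the first pull_len bytes of the receive buffer into the linear
 * area, then describe the remainder as a fragment.  The fragment is
 * created with its final offset and size, so it never has to be
 * adjusted after the fact.  Unlike the driver, this sketch copies
 * exactly pull_len bytes instead of rounding up to sizeof(long).
 */
static struct sketch_frag sketch_add_rx_frag(unsigned char *linear,
					     const unsigned char *page,
					     size_t buf_offset, size_t size,
					     size_t pull_len)
{
	const unsigned char *va = page + buf_offset;
	struct sketch_frag frag;

	memcpy(linear, va, pull_len);

	/* update all of the pointers before the fragment is added */
	va += pull_len;
	size -= pull_len;

	frag.page_offset = (size_t)(va - page);
	frag.size = size;
	return frag;
}
```

With a 1500-byte frame at buffer offset 2048 and a 54-byte header, the fragment would start at offset 2102 and carry 1446 bytes.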

Signed-off-by: Alexander Duyck alexander.h.du...@redhat.com
Tested-by: Krishneil Singh krishneil.k.si...@intel.com
Signed-off-by: Jeff Kirsher jeffrey.t.kirs...@intel.com
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 66 ---
 1 file changed, 20 insertions(+), 46 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index c754b20..982fdcd 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -269,16 +269,19 @@ static bool fm10k_add_rx_frag(struct fm10k_rx_buffer *rx_buffer,
  struct sk_buff *skb)
 {
struct page *page = rx_buffer->page;
+   unsigned char *va = page_address(page) + rx_buffer->page_offset;
unsigned int size = le16_to_cpu(rx_desc->w.length);
 #if (PAGE_SIZE < 8192)
unsigned int truesize = FM10K_RX_BUFSZ;
 #else
-   unsigned int truesize = ALIGN(size, L1_CACHE_BYTES);
+   unsigned int truesize = SKB_DATA_ALIGN(size);
 #endif
+   unsigned int pull_len;
 
-   if ((size <= FM10K_RX_HDR_LEN) && !skb_is_nonlinear(skb)) {
-   unsigned char *va = page_address(page) + rx_buffer->page_offset;
+   if (unlikely(skb_is_nonlinear(skb)))
+   goto add_tail_frag;
 
+   if (likely(size <= FM10K_RX_HDR_LEN)) {
memcpy(__skb_put(skb, size), va, ALIGN(size, sizeof(long)));
 
/* page is not reserved, we can reuse buffer as-is */
@@ -290,8 +293,21 @@ static bool fm10k_add_rx_frag(struct fm10k_rx_buffer *rx_buffer,
return false;
}
 
+   /* we need the header to contain the greater of either ETH_HLEN or
+    * 60 bytes if the skb->len is less than 60 for skb_pad.
+*/
+   pull_len = eth_get_headlen(va, FM10K_RX_HDR_LEN);
+
+   /* align pull length to size of long to optimize memcpy performance */
+   memcpy(__skb_put(skb, pull_len), va, ALIGN(pull_len, sizeof(long)));
+
+   /* update all of the pointers */
+   va += pull_len;
+   size -= pull_len;
+
+add_tail_frag:
skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page,
-   rx_buffer->page_offset, size, truesize);
+   (unsigned long)va & ~PAGE_MASK, size, truesize);
 
return fm10k_can_reuse_rx_page(rx_buffer, page, truesize);
 }
@@ -518,44 +534,6 @@ static bool fm10k_is_non_eop(struct fm10k_ring *rx_ring,
 }
 
 /**
- * fm10k_pull_tail - fm10k specific version of skb_pull_tail
- * @skb: pointer to current skb being adjusted
- *
- * This function is an fm10k specific version of __pskb_pull_tail.  The
- * main difference between this version and the original function is that
- * this function can make several assumptions about the state of things
- * that allow for significant optimizations versus the standard function.
- * As a result we can do things like drop a frag and maintain an accurate
- * truesize for the skb.
- */
-static void fm10k_pull_tail(struct sk_buff *skb)
-{
-   struct skb_frag_struct *frag = &skb_shinfo(skb)->frags[0];
-   unsigned char *va;
-   unsigned int pull_len;
-
-   /* it is valid to use page_address instead of kmap since we are
-* working with pages allocated out of the lomem pool per
-* alloc_page(GFP_ATOMIC)
-*/
-   va = skb_frag_address(frag);
-
-   /* we need the header to contain the greater of either ETH_HLEN or
-    * 60 bytes if the skb->len is less than 60 for skb_pad.
-    */
-   pull_len = eth_get_headlen(va, FM10K_RX_HDR_LEN);
-
-   /* align pull length to size of long to optimize memcpy performance */
-   skb_copy_to_linear_data(skb, va, ALIGN(pull_len, sizeof(long)));
-
-   /* update all of the pointers */
-   skb_frag_size_sub(frag, pull_len);
-   frag->page_offset += pull_len;
-   skb->data_len -= pull_len;
-   skb->tail += pull_len;
-}
-
-/**
  * fm10k_cleanup_headers - Correct corrupted or empty headers
  * @rx_ring: rx descriptor ring packet is being transacted on
  * @rx_desc: pointer to the EOP Rx descriptor
@@ -580,10 +558,6 @@ static bool fm10k_cleanup_headers(struct fm10k_ring *rx_ring,
return true;
}
 
-   /* place header in linear portion of buffer */
-   if (skb_is_nonlinear(skb))
-   fm10k_pull_tail(skb);
-
/* if eth_skb_pad returns an error the skb was freed */
if (eth_skb_pad(skb))
return true;
-- 
2.4.3


[net-next 12/17] fm10k: re-map all possible VF queues after a VFLR

From: Jacob Keller jacob.e.kel...@intel.com

During initialization, the VF counts its rings by walking the TQDLOC
registers. This works only if the TQMAP/RQMAP registers are set to map
all of the out-of-bound rings back to the first one. This allows the VF
to cleanly detect when it has run out of queues. Update the PF code so
that it resets the empty TQMAP/RQMAP registers post-VFLR to prevent
innocent VF drivers from triggering malicious driver events.
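
As a rough illustration of why the wrap-around mapping matters, here is a user-space sketch. It assumes, for illustration only, that the VF stops counting when a map slot points back at its first ring; the names `sketch_reset_qmap()`, `sketch_count_rings()` and `SKETCH_QMAP_STRIDE` are hypothetical and the array stands in for the TQMAP/RQMAP registers.

```c
#define SKETCH_QMAP_STRIDE 8u	/* hypothetical number of map slots per VF */

/*
 * PF side: map the first queues_per_pool slots to distinct rings and
 * point every remaining slot back at the first ring.
 */
static void sketch_reset_qmap(unsigned int *qmap, unsigned int first_ring,
			      unsigned int queues_per_pool)
{
	unsigned int i;

	for (i = 0; i < queues_per_pool; i++)
		qmap[i] = first_ring + i;

	/* repeat the first ring for all the remaining slots */
	for (i = queues_per_pool; i < SKETCH_QMAP_STRIDE; i++)
		qmap[i] = first_ring;
}

/* VF side (assumed behaviour): count slots until one wraps to the first ring. */
static unsigned int sketch_count_rings(const unsigned int *qmap)
{
	unsigned int i;

	for (i = 1; i < SKETCH_QMAP_STRIDE; i++)
		if (qmap[i] == qmap[0])
			break;

	return i;
}
```

If the trailing slots still hold stale values, the count in this model runs to the full stride, which is the situation the reset is meant to avoid.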

Signed-off-by: Matthew Vick matthew.v...@intel.com
Signed-off-by: Jacob Keller jacob.e.kel...@intel.com
Tested-by: Krishneil Singh krishneil.k.si...@intel.com
Signed-off-by: Jeff Kirsher jeffrey.t.kirs...@intel.com
---
 drivers/net/ethernet/intel/fm10k/fm10k_pf.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
index 891e218..3b94206 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
@@ -1046,6 +1046,12 @@ static s32 fm10k_iov_reset_resources_pf(struct fm10k_hw *hw,
fm10k_write_reg(hw, FM10K_RQMAP(qmap_idx + i), vf_q_idx + i);
}
 
+   /* repeat the first ring for all the remaining VF rings */
+   for (i = queues_per_pool; i < qmap_stride; i++) {
+   fm10k_write_reg(hw, FM10K_TQMAP(qmap_idx + i), vf_q_idx);
+   fm10k_write_reg(hw, FM10K_RQMAP(qmap_idx + i), vf_q_idx);
+   }
+
return 0;
 }
 
-- 
2.4.3


Re: [PATCH 7/7] slub: initial bulk free implementation

2015-06-16 17:57 GMT+09:00 Jesper Dangaard Brouer bro...@redhat.com:
 On Tue, 16 Jun 2015 10:21:10 +0200
 Jesper Dangaard Brouer bro...@redhat.com wrote:


 On Tue, 16 Jun 2015 16:28:06 +0900 Joonsoo Kim iamjoonsoo@lge.com 
 wrote:

  Is this really better than just calling __kmem_cache_free_bulk()?

 Yes, as can be seen by cover-letter, but my cover-letter does not seem
 to have reached mm-list.

 Measurements for the entire patchset:

 Bulk - Fallback bulking            - fastpath-bulking
    1 -  47 cycles(tsc) 11.921 ns  -  45 cycles(tsc) 11.461 ns   improved  4.3%
    2 -  46 cycles(tsc) 11.649 ns  -  28 cycles(tsc)  7.023 ns   improved 39.1%
    3 -  46 cycles(tsc) 11.550 ns  -  22 cycles(tsc)  5.671 ns   improved 52.2%
    4 -  45 cycles(tsc) 11.398 ns  -  19 cycles(tsc)  4.967 ns   improved 57.8%
    8 -  45 cycles(tsc) 11.303 ns  -  17 cycles(tsc)  4.298 ns   improved 62.2%
   16 -  44 cycles(tsc) 11.221 ns  -  17 cycles(tsc)  4.423 ns   improved 61.4%
   30 -  75 cycles(tsc) 18.894 ns  -  57 cycles(tsc) 14.497 ns   improved 24.0%
   32 -  73 cycles(tsc) 18.491 ns  -  56 cycles(tsc) 14.227 ns   improved 23.3%
   34 -  75 cycles(tsc) 18.962 ns  -  58 cycles(tsc) 14.638 ns   improved 22.7%
   48 -  80 cycles(tsc) 20.049 ns  -  64 cycles(tsc) 16.247 ns   improved 20.0%
   64 -  87 cycles(tsc) 21.929 ns  -  74 cycles(tsc) 18.598 ns   improved 14.9%
  128 -  98 cycles(tsc) 24.511 ns  -  89 cycles(tsc) 22.295 ns   improved  9.2%
  158 - 101 cycles(tsc) 25.389 ns  -  93 cycles(tsc) 23.390 ns   improved  7.9%
  250 - 104 cycles(tsc) 26.170 ns  - 100 cycles(tsc) 25.112 ns   improved  3.8%

 I'll do a compare against the previous patch, and post the results.

 Compare against previous patch:

 Run:   previous-patch- this patch
   1 -   49 cycles(tsc) 12.378 ns -  43 cycles(tsc) 10.775 ns  improved 12.2%
   2 -   37 cycles(tsc)  9.297 ns -  26 cycles(tsc)  6.652 ns  improved 29.7%
   3 -   33 cycles(tsc)  8.348 ns -  21 cycles(tsc)  5.347 ns  improved 36.4%
   4 -   31 cycles(tsc)  7.930 ns -  18 cycles(tsc)  4.669 ns  improved 41.9%
   8 -   30 cycles(tsc)  7.693 ns -  17 cycles(tsc)  4.404 ns  improved 43.3%
  16 -   32 cycles(tsc)  8.059 ns -  17 cycles(tsc)  4.493 ns  improved 46.9%

So, in your test, most of the objects probably come from one or two slabs,
and your algorithm is well optimized for this case. But is this workload
the normal case? If most of the objects come from many different slabs, the
bulk free API enables and disables interrupts very often, so I guess it
would work worse than just calling __kmem_cache_free_bulk(). Could you
test this case?
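
To make the concern concrete, here is a small user-space model. It is an assumption-laden sketch, not the SLUB code: it supposes that every object which does not sit on the CPU's current slab page takes the slow path and costs one interrupt enable/disable pair. `sketch_count_slowpath()` and `SKETCH_PAGE_SHIFT` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumed 4 KiB pages */

/*
 * Count the objects that do not live on cpu_page.  In this simplified
 * model each of them takes the slow path and therefore costs one
 * interrupt enable/disable pair.  NULL entries are skipped.
 */
static size_t sketch_count_slowpath(void *const *p, size_t size,
				    uintptr_t cpu_page)
{
	size_t i, slow = 0;

	for (i = 0; i < size; i++) {
		if (!p[i])
			continue;
		if (((uintptr_t)p[i] >> SKETCH_PAGE_SHIFT) != cpu_page)
			slow++;
	}

	return slow;
}
```

In this model a benchmark that frees objects from one or two pages reports a slow-path count near zero, while a workload spread over many slabs approaches one toggle per object.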

Thanks.

Re: [PATCH 7/7] slub: initial bulk free implementation

2015-06-16 18:20 GMT+09:00 Jesper Dangaard Brouer bro...@redhat.com:
 On Tue, 16 Jun 2015 16:23:28 +0900
 Joonsoo Kim iamjoonsoo@lge.com wrote:

 On Mon, Jun 15, 2015 at 05:52:56PM +0200, Jesper Dangaard Brouer wrote:
  This implements SLUB specific kmem_cache_free_bulk().  SLUB allocator
  now both have bulk alloc and free implemented.
 
  Play nice and reenable local IRQs while calling slowpath.
 
  Signed-off-by: Jesper Dangaard Brouer bro...@redhat.com
  ---
   mm/slub.c |   32 +++-
   1 file changed, 31 insertions(+), 1 deletion(-)
 
  diff --git a/mm/slub.c b/mm/slub.c
  index 98d0e6f73ec1..cc4f870677bb 100644
  --- a/mm/slub.c
  +++ b/mm/slub.c
  @@ -2752,7 +2752,37 @@ EXPORT_SYMBOL(kmem_cache_free);
 
   void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
   {
  -   __kmem_cache_free_bulk(s, size, p);
  +   struct kmem_cache_cpu *c;
  +   struct page *page;
  +   int i;
  +
  +   local_irq_disable();
  +   c = this_cpu_ptr(s->cpu_slab);
  +
  +   for (i = 0; i < size; i++) {
  +   void *object = p[i];
  +
  +   if (unlikely(!object))
  +   continue; // HOW ABOUT BUG_ON()???
  +
  +   page = virt_to_head_page(object);
  +   BUG_ON(s != page->slab_cache); /* Check if valid slab page */

 You need to use cache_from_obj() to support kmemcg accounting.
 And, slab_free_hook() should be called before free.

 Okay, but Christoph choose to not support kmem_cache_debug() in patch2/7.

 Should we/I try to add kmem cache debugging support?

kmem_cache_debug() is the check for the slab-internal debugging feature.
slab_free_hook() and the others I mentioned are also related to external
debugging features such as kasan and kmemleak. So, even if a debugged
kmem_cache isn't supported by the bulk API, external debugging features
should be supported.

 If adding these, then I would also need to add those on alloc path...

Yes, please.

Thanks.

Re: [net-next 07/17] fm10k: use an unsigned int for i in ethtool_get_strings


On 6/16/2015 4:47 PM, Jeff Kirsher wrote:

> From: Jacob Keller jacob.e.kel...@intel.com
>
> The value will never be negative, and we use the %i print format, use

   %i is the same as %d, AFAIR. You need %u for the unsigned variables.

> unsigned int for the loop counter. Issue found using cppcheck.
>
> Signed-off-by: Jacob Keller jacob.e.kel...@intel.com
> Tested-by: Krishneil Singh krishneil.k.si...@intel.com
> Signed-off-by: Jeff Kirsher jeffrey.t.kirs...@intel.com


WBR, Sergei
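
For reference, the format-specifier point can be shown with standard C alone. This is a small sketch in which `sketch_format_queue()` is a hypothetical helper; the format string mirrors the tx_queue string used by the driver's ethtool code. In printf-style formatting, %i and %d both expect a signed int, while %u is the conversion for an unsigned int.

```c
#include <stdio.h>

/*
 * Format a queue statistic name.  The counter is unsigned, so the
 * matching conversion is %u; %i and %d both expect a signed int.
 */
static int sketch_format_queue(char *buf, size_t len, unsigned int i)
{
	return snprintf(buf, len, "tx_queue_%u_packets", i);
}
```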


Re: [PATCH 06/11] IB/cma: Refactor RDMA IP CM private-data parsing code

On 16/06/2015 01:33, Hefty, Sean wrote:
 -static int cma_save_net_info(struct rdma_cm_id *id, struct rdma_cm_id
 *listen_id,
 - struct ib_cm_event *ib_event)
 +static u16 cma_port_from_service_id(__be64 service_id)
  {
 -struct cma_hdr *hdr;
 +return be64_to_cpu(service_id);
 +}
 
 Nit - Does the compiler not complain about the cast from u64 to u16?
 

Apparently it does, but only with W=3 (-Wconversion is included there).
W=3 produces about 6k warnings when compiling cma.c, so I don't usually
enable it.

I'll add a cast there to prevent the warning.
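
A user-space sketch of what the explicit cast expresses, under the assumption (taken from the function name in the quoted hunk) that the port is carried in the low 16 bits of the service ID. `sketch_be64_to_cpu()` and `sketch_port_from_service_id()` are hypothetical stand-ins that work on raw bytes rather than on the kernel's __be64 type.

```c
#include <stdint.h>

/* Convert a big-endian 64-bit value, given as raw bytes, to host order. */
static uint64_t sketch_be64_to_cpu(const unsigned char b[8])
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | b[i];

	return v;
}

/*
 * Keep only the low 16 bits of the service ID.  The explicit cast
 * documents the intended truncation, so -Wconversion has nothing to
 * complain about.
 */
static uint16_t sketch_port_from_service_id(const unsigned char service_id[8])
{
	return (uint16_t)sketch_be64_to_cpu(service_id);
}
```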

Haggai

Re: [net-next 10/17] fm10k: use dma_set_mask_and_coherent in fm10k_probe


On 6/16/2015 4:47 PM, Jeff Kirsher wrote:

> From: Jacob Keller jacob.e.kel...@intel.com
>
> This patch cleans up the use of dma_get_required_mask and uses the
> simpler dma_set_mask_and_coherent function instead of doing these as
> separate steps.
>
> I removed the dma_get_required_mask call because based on some minimal
> testing it appears that either (a) we're not doing the right thing with
> the call or (b) we don't need it anyways. If the value returned is
> 48bits, we'll end up trying with 48 bits anyways. If it's over 48bits,
> fm10k can't support that anyways, and we should try 48bits. If 48bits
> fails, we'll fallback to 32bits. This cleans up some very funky code.
>
> Signed-off-by: Jacob Keller jacob.e.kel...@intel.com
> Tested-by: Krishneil Singh krishneil.k.si...@intel.com
> Signed-off-by: Jeff Kirsher jeffrey.t.kirs...@intel.com
> ---
>   drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 24 ++--
>   1 file changed, 6 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
> index 5269b16..0381c8d1 100644
> --- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
> +++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
> @@ -1741,30 +1741,18 @@ static int fm10k_probe(struct pci_dev *pdev,
> struct fm10k_intfc *interface;
> struct fm10k_hw *hw;
> int err;
> -   u64 dma_mask;
>
> err = pci_enable_device_mem(pdev);
> if (err)
> return err;
>
> -   /* By default fm10k only supports a 48 bit DMA mask */
> -   dma_mask = DMA_BIT_MASK(48) | dma_get_required_mask(&pdev->dev);
> -
> -   if ((dma_mask <= DMA_BIT_MASK(32)) ||
> -   dma_set_mask_and_coherent(&pdev->dev, dma_mask)) {
> -   dma_mask &= DMA_BIT_MASK(32);
> -
> +   err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(48));
> +   if (err)
> err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
> -   err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
> -   if (err) {
> -   err = dma_set_coherent_mask(&pdev->dev,
> -   DMA_BIT_MASK(32));
> -   if (err) {
> -   dev_err(&pdev->dev,
> -   "No usable DMA configuration, aborting\n");
> -   goto err_dma;
> -   }
> -   }
> +   if (err) {
> +   dev_err(&pdev->dev,
> +   "DMA configuration failed: 0x%x\n", err);


Again, %d seems more suitable here.
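
The flow under review (try a 48-bit mask, fall back to 32 bits, then report the error code) can be modelled in user space. This is a sketch under stated assumptions: `sketch_setup_dma()` and the three mock callbacks are hypothetical, and the callback merely stands in for dma_set_mask_and_coherent(). The error is printed with %d, since error codes are negative decimal values.

```c
#include <stdint.h>
#include <stdio.h>

#define SKETCH_DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical platform hook: returns 0 when the mask is accepted. */
typedef int (*sketch_set_mask_fn)(uint64_t mask);

/* Mock platforms used to exercise the fallback. */
static int sketch_accept_any(uint64_t mask)
{
	(void)mask;
	return 0;
}

static int sketch_accept_32bit_only(uint64_t mask)
{
	return mask <= SKETCH_DMA_BIT_MASK(32) ? 0 : -5;
}

static int sketch_reject_all(uint64_t mask)
{
	(void)mask;
	return -5;
}

/* Try 48 bits first, then 32 bits; report the error code on failure. */
static int sketch_setup_dma(sketch_set_mask_fn set_mask, uint64_t *chosen)
{
	int err;

	*chosen = SKETCH_DMA_BIT_MASK(48);
	err = set_mask(*chosen);
	if (err) {
		*chosen = SKETCH_DMA_BIT_MASK(32);
		err = set_mask(*chosen);
	}
	if (err)
		fprintf(stderr, "DMA configuration failed: %d\n", err);

	return err;
}
```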

[...]

WBR, Sergei


Re: [PATCH 1/2] net: mvneta: introduce tx_csum_limit property

2015-06-16 Thread Jason Cooper

Thomas, Simon,

On Tue, Jun 16, 2015 at 12:00:34PM +0200, Thomas Petazzoni wrote:
 On Mon, 15 Jun 2015 17:54:41 +0200, Simon Guinot wrote:
   The current armada-370-neta would limit the HW checksumming features to
   packets smaller than 1600 bytes, while a new armada-xp-neta would not
   have this limit.
  
  This was also my first idea. But by doing this, we take the risk of
  losing the HW checksumming feature with jumbo frames on some currently
  working Armada XP setups. This may happen for example if a user is able
  to update the kernel but not the on-board DTB. In order to fix a feature
  on a SoC, we are breaking the DTB-kernel compatibility for the very same
  feature on an another SoC. I am not sure it is OK.

Frankly, this isn't a realistic scenario.  We've said from day one that
if the dtb is provided with the board that it needs to be updateable.
For exactly these kinds of situations.  Also, Thomas' assessment is
correct, everyone we've ever spoken to is keeping the dtb in sync with
the kernel.

As long as a board with the old dtb boots a newer kernel without
crashing, then it's fine.  afaict in this situation, the updated driver
should limit HW checksumming to packets smaller than 1600 bytes if the
compatible string is 'armada-370-neta', regardless of the actual SoC
underneath.  If the driver gets 'armada-xp-neta' then there is no
checksum limit.

Users with an Armada XP SoC and an old dtb will need to upgrade the dtb
in order to make use of HW checksumming on jumbo packets with newer
kernels.  Seems sane to me.
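
The proposed behaviour amounts to a lookup keyed on the compatible string alone. A minimal sketch, assuming the strings exactly as written in this thread; `sketch_tx_csum_limit()` is a hypothetical helper, and treating unknown strings conservatively is an assumption of the sketch, not something the thread decided.

```c
#include <limits.h>
#include <string.h>

#define SKETCH_CSUM_LIMIT_370 1600u	/* limit quoted in the thread */

/*
 * Return the frame size up to which HW checksumming stays enabled.
 * Only the compatible string is consulted, not the SoC actually
 * underneath.  Unknown strings get the conservative limit, which is
 * an assumption of this sketch.
 */
static unsigned int sketch_tx_csum_limit(const char *compatible)
{
	if (strcmp(compatible, "armada-xp-neta") == 0)
		return UINT_MAX;	/* no checksum limit */

	return SKETCH_CSUM_LIMIT_370;	/* 'armada-370-neta' and others */
}
```

An Armada XP board that still ships the old dtb would therefore get the 1600-byte limit until its dtb is updated.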

thx,

Jason.

Re: [PATCH 05/11] IB/cm: Share listening CM IDs

On 16/06/2015 01:13, Hefty, Sean wrote:
 @@ -722,6 +725,7 @@ struct ib_cm_id *ib_create_cm_id(struct ib_device *device,
  INIT_LIST_HEAD(&cm_id_priv->work_list);
  atomic_set(&cm_id_priv->work_count, -1);
  atomic_set(&cm_id_priv->refcount, 1);
 +cm_id_priv->listen_sharecount = 1;
 
 This is setting the listen count before we know whether the cm_id will 
 actually be used to listen.

Right. I'll move it to the new_id case in ib_cm_id_create_and_listen.

 
 
  return &cm_id_priv->id;

  error:
 @@ -847,11 +851,21 @@ retest:
  spin_lock_irq(&cm_id_priv->lock);
  switch (cm_id->state) {
  case IB_CM_LISTEN:
 -cm_id->state = IB_CM_IDLE;
  spin_unlock_irq(&cm_id_priv->lock);
 +
  spin_lock_irq(&cm.lock);
 +if (--cm_id_priv->listen_sharecount > 0) {
 +/* The id is still shared. */
 +atomic_dec(&cm_id_priv->refcount);
 +spin_unlock_irq(&cm.lock);
 +return;
 +}
  rb_erase(&cm_id_priv->service_node, &cm.listen_service_table);
  spin_unlock_irq(&cm.lock);
 +
 +spin_lock_irq(&cm_id_priv->lock);
 +cm_id->state = IB_CM_IDLE;
 +spin_unlock_irq(&cm_id_priv->lock);
 
 Why is the state being changed?  The cm_id is about to be freed anyway.

It matches the rest of the code, but I don't think it is actually being
used for listening ids. I will drop it.

 
 
  break;
  case IB_CM_SIDR_REQ_SENT:
  cm_id->state = IB_CM_IDLE;
 @@ -929,11 +943,32 @@ void ib_destroy_cm_id(struct ib_cm_id *cm_id)
  }
  EXPORT_SYMBOL(ib_destroy_cm_id);

 -int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64
 service_mask,
 - struct ib_cm_compare_data *compare_data)
 +/**
 + * __ib_cm_listen - Initiates listening on the specified service ID for
 + *   connection and service ID resolution requests.
 + * @cm_id: Connection identifier associated with the listen request.
 + * @service_id: Service identifier matched against incoming connection
 + *   and service ID resolution requests.  The service ID should be
 specified
 + *   network-byte order.  If set to IB_CM_ASSIGN_SERVICE_ID, the CM will
 + *   assign a service ID to the caller.
 + * @service_mask: Mask applied to service ID used to listen across a
 + *   range of service IDs.  If set to 0, the service ID is matched
 + *   exactly.  This parameter is ignored if %service_id is set to
 + *   IB_CM_ASSIGN_SERVICE_ID.
 + * @compare_data: This parameter is optional.  It specifies data that
 must
 + *   appear in the private data of a connection request for the specified
 + *   listen request.
 + * @lock: If set, lock the cm.lock spin-lock when adding the id to the
 + *   listener tree. When false, the caller must already hold the spin-
 lock,
 + *   and compare_data must be NULL.
 + */
 +static int __ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id,
 +  __be64 service_mask,
 +  struct ib_cm_compare_data *compare_data,
 +  bool lock)
  {
  struct cm_id_private *cm_id_priv, *cur_cm_id_priv;
 -unsigned long flags;
 +unsigned long flags = 0;
  int ret = 0;

  service_mask = service_mask ? service_mask : ~cpu_to_be64(0);
 @@ -959,7 +994,8 @@ int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,

  cm_id->state = IB_CM_LISTEN;

 -spin_lock_irqsave(&cm.lock, flags);
 +if (lock)
 +spin_lock_irqsave(&cm.lock, flags);
 
 I'm not a fan of this sort of locking structure.  Why not just move the 
 locking into the outside calls completely?  I.e. move to ib_cm_listen() 
 instead of passing in true.

The reason is that this function can sleep when called with compare_data
!= NULL, because it allocates the id's compare_data with GFP_KERNEL. But
since compare_data is going away in a later patch, I can actually fix the
locking at that point. I'll change the patch that removes compare_data
to also remove the lock parameter.
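
The structure Sean suggests, with locking moved entirely into the outer call, would look roughly like the sketch below. Everything here is hypothetical: a toy `struct sketch_lock` replaces the spinlock so that the lock state can be observed, and `sketch_listen_locked()` / `sketch_listen()` stand in for the inner and outer listen functions.

```c
/* Toy lock that only records whether it is held. */
struct sketch_lock {
	int held;
};

static void sketch_lock_acquire(struct sketch_lock *l) { l->held = 1; }
static void sketch_lock_release(struct sketch_lock *l) { l->held = 0; }

static struct sketch_lock table_lock;
static int table_count;

/*
 * Inner helper: the caller must already hold table_lock.  Returns -1
 * when that precondition is violated.
 */
static int sketch_listen_locked(void)
{
	if (!table_lock.held)
		return -1;

	table_count++;
	return 0;
}

/* Outer entry point: takes the lock itself, so no "bool lock" flag. */
static int sketch_listen(void)
{
	int ret;

	sketch_lock_acquire(&table_lock);
	ret = sketch_listen_locked();
	sketch_lock_release(&table_lock);

	return ret;
}
```

This split only works once nothing inside the locked helper can sleep, which is why it has to wait for the removal of compare_data.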

 
 
  if (service_id == IB_CM_ASSIGN_SERVICE_ID) {
  cm_id->service_id = cpu_to_be64(cm.listen_service_id++);
  cm_id->service_mask = ~cpu_to_be64(0);
 @@ -968,7 +1004,8 @@ int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
  cm_id->service_mask = service_mask;
  }
  cur_cm_id_priv = cm_insert_listen(cm_id_priv);
 -spin_unlock_irqrestore(&cm.lock, flags);
 +if (lock)
 +spin_unlock_irqrestore(&cm.lock, flags);

  if (cur_cm_id_priv) {
  cm_id->state = IB_CM_IDLE;
 @@ -978,8 +1015,86 @@ int ib_cm_listen(struct ib_cm_id *cm_id, __be64
 service_id, __be64 service_mask,
  }
  return ret;
  }
 +
 +int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64
 service_mask,
 + struct ib_cm_compare_data *compare_data)
 +{
 +return __ib_cm_listen(cm_id, service_id, service_mask, compare_data,
 +

[net-next 16/17] fm10k: fix iov_msg_lport_state_pf issue

From: Jacob Keller jacob.e.kel...@intel.com

When a VF issues an LPORT_STATE request to enable a port that is already
enabled, the PF will first disable the VF LPORT. Then it should
re-enable the VF again with the new requested settings. This ensures
that any switch rules are cleared by deleting the LPORT on the switch.
However, the flow is bugged because we actually check if the VF is
enabled at the end, and thus don't re-enable it. Fix the flow so that we
actually clear the enabled flags as part of our removal of the LPORT.

Signed-off-by: Jacob Keller jacob.e.kel...@intel.com
Tested-by: Krishneil Singh krishneil.k.si...@intel.com
Signed-off-by: Jeff Kirsher jeffrey.t.kirs...@intel.com
---
 drivers/net/ethernet/intel/fm10k/fm10k_pf.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
index ab81c00..54d1cd9 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
@@ -1351,6 +1351,14 @@ s32 fm10k_iov_msg_lport_state_pf(struct fm10k_hw *hw, u32 **results,
err = fm10k_update_lport_state_pf(hw, vf_info->glort,
  1, false);
 
+   /* we need to clear VF_FLAG_ENABLED flags in order to ensure
+    * that we actually re-enable the LPORT state below. Note that
+    * this has no impact if the VF is already disabled, as the
+    * flags are already cleared.
+    */
+   if (!err)
+   vf_info->vf_flags = FM10K_VF_FLAG_CAPABLE(vf_info);
+
/* when enabling the port we should reset the rate limiters */
hw->iov.ops.configure_tc(hw, vf_info->vf_idx, vf_info->rate);
 
-- 
2.4.3


[net-next 08/17] fm10k: remove extraneous NULL check on l2_accel

From: Jacob Keller jacob.e.kel...@intel.com

l2_accel was checked for NULL at the top of fm10k_dfwd_del_station, and
we return if it is not defined. Due to this, we already know it can't be
null here so a separate check is meaningless. Discovered via cppcheck.

Signed-off-by: Jacob Keller jacob.e.kel...@intel.com
Tested-by: Krishneil Singh krishneil.k.si...@intel.com
Signed-off-by: Jeff Kirsher jeffrey.t.kirs...@intel.com
---
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index 4c6b511..99228bf 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -1333,8 +1333,7 @@ static void fm10k_dfwd_del_station(struct net_device *dev, void *priv)
dglort.rss_l = fls(interface->ring_feature[RING_F_RSS].mask);
dglort.pc_l = fls(interface->ring_feature[RING_F_QOS].mask);
dglort.glort = interface->glort;
-   if (l2_accel)
-   dglort.shared_l = fls(l2_accel->size);
+   dglort.shared_l = fls(l2_accel->size);
hw->mac.ops.configure_dglort_map(hw, &dglort);
 
/* If table is empty remove it */
-- 
2.4.3


[net-next 06/17] fm10k: add call to fm10k_clean_all_rx_rings in fm10k_down

From: Jacob Keller jacob.e.kel...@intel.com

This prevents a memory leak in fm10k_set_ringparams. The leak occurs
because we go down, change ring parameters, and then come up. However,
fm10k_down on its own is not clearing the Rx rings. Since fm10k_up
assumes the rings are clean we basically drop the buffers and leak a
bunch of memory. Eventually we hit dirty page faults and reboot the
system. This issue does not occur elsewhere because other flows that
involve fm10k_down go through fm10k_close, which immediately calls
fm10k_free_all_rx_resources, which properly cleans the rings.
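
The leak can be modelled with a small ring of slots. This is a user-space sketch only: `struct sketch_ring` and the `sketch_*` functions are hypothetical, and a "buffer" is just a flag in a slot rather than a real page.

```c
#define SKETCH_RING_SIZE 4

struct sketch_ring {
	int filled[SKETCH_RING_SIZE];	/* 1 when the slot holds a buffer */
	int leaked;			/* buffers overwritten without a free */
};

/* Bring the ring up; like fm10k_up, this assumes the ring is clean. */
static void sketch_up(struct sketch_ring *r)
{
	int i;

	for (i = 0; i < SKETCH_RING_SIZE; i++) {
		if (r->filled[i])
			r->leaked++;	/* old buffer is dropped on the floor */
		r->filled[i] = 1;
	}
}

/* Release every buffer still on the ring. */
static void sketch_clean_rx(struct sketch_ring *r)
{
	int i;

	for (i = 0; i < SKETCH_RING_SIZE; i++)
		r->filled[i] = 0;
}

/* Take the ring down; clean_rx selects the behaviour after the fix. */
static void sketch_down(struct sketch_ring *r, int clean_rx)
{
	if (clean_rx)
		sketch_clean_rx(r);
}
```

A down/up cycle without the Rx clean leaks one full ring of buffers in this model; with the clean in place nothing is lost.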

Signed-off-by: Jacob Keller jacob.e.kel...@intel.com
Tested-by: Krishneil Singh krishneil.k.si...@intel.com
Signed-off-by: Jeff Kirsher jeffrey.t.kirs...@intel.com
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c 
b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index df9fda3..445014a 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -1559,6 +1559,7 @@ void fm10k_down(struct fm10k_intfc *interface)
 
/* free any buffers still on the rings */
fm10k_clean_all_tx_rings(interface);
+   fm10k_clean_all_rx_rings(interface);
 }
 
 /**
-- 
2.4.3


[net-next 02/17] fm10k: ignore invalid multicast address entries

From: Jacob Keller jacob.e.kel...@intel.com

This change fixes an issue with adding an invalid multicast address
using the iproute2 tool (ip maddr add <MADDR> dev <dev>). The iproute2
tool and the kernel do not validate or filter the multicast addresses
when adding them to the multicast list. Thus, when synchronizing this
list with an invalid entry, the action will be aborted with an error
since the fm10k driver currently validates the list. Consequently,
multicast entries beyond the invalid one will not be processed and
communicated with the switch via the mailbox. This change makes it so
that invalid addresses will simply be skipped and allows synchronizing
the full list to proceed.
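
In outline, the new behaviour is "skip and continue" rather than "abort on the first bad entry". The sketch below is a user-space illustration with hypothetical names (`sketch_update_mc_addr()` stands in for the per-address update that rejects invalid entries); the multicast test itself is the standard one, the low bit of the first octet.

```c
#include <stddef.h>
#include <stdint.h>

/* An Ethernet address is multicast when the low bit of octet 0 is set. */
static int sketch_is_multicast(const uint8_t addr[6])
{
	return addr[0] & 0x01;
}

/* Stand-in for the per-address update, which rejects invalid entries. */
static int sketch_update_mc_addr(const uint8_t addr[6])
{
	return sketch_is_multicast(addr) ? 0 : -1;
}

/*
 * Synchronize the whole list.  An invalid entry is skipped instead of
 * aborting the walk, so the entries after it are still processed.
 * Returns the number of addresses that were accepted.
 */
static size_t sketch_mc_sync(const uint8_t (*list)[6], size_t n)
{
	size_t i, programmed = 0;

	for (i = 0; i < n; i++) {
		if (sketch_update_mc_addr(list[i]))
			continue;	/* ignore the error and keep going */
		programmed++;
	}

	return programmed;
}
```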

Signed-off-by: Ngai-Mint Kwan ngai-mint.k...@intel.com
Signed-off-by: Jacob Keller jacob.e.kel...@intel.com
Tested-by: Krishneil Singh krishneil.k.si...@intel.com
Signed-off-by: Jeff Kirsher jeffrey.t.kirs...@intel.com
---
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index 2f4f41b..4c6b511 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -923,18 +923,12 @@ static int __fm10k_mc_sync(struct net_device *dev,
struct fm10k_intfc *interface = netdev_priv(dev);
struct fm10k_hw *hw = &interface->hw;
u16 vid, glort = interface->glort;
-   s32 err;
-
-   if (!is_multicast_ether_addr(addr))
-   return -EADDRNOTAVAIL;
 
/* update table with current entries */
for (vid = hw->mac.default_vid ? fm10k_find_next_vlan(interface, 0) : 0;
 vid < VLAN_N_VID;
 vid = fm10k_find_next_vlan(interface, vid)) {
-   err = hw->mac.ops.update_mc_addr(hw, glort, addr, vid, sync);
-   if (err)
-   return err;
+   hw->mac.ops.update_mc_addr(hw, glort, addr, vid, sync);
}
 
return 0;
-- 
2.4.3


[net-next 17/17] fm10k: Fix missing braces after if statement

From: Alexander Duyck alexander.h.du...@redhat.com

While reviewing the code I noticed that one of the commits added an if
statement followed by a for loop, but the if statement was missing the
braces around the loop.  This change corrects the coding style error.

Signed-off-by: Alexander Duyck alexander.h.du...@redhat.com
Acked-by: Jacob Keller jacob.e.kel...@intel.com
Tested-by: Krishneil Singh krishneil.k.si...@intel.com
Signed-off-by: Jeff Kirsher jeffrey.t.kirs...@intel.com
---
 drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
index 06f0b08..c6dc968 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
@@ -143,12 +143,13 @@ static void fm10k_get_strings(struct net_device *dev, u32 stringset, u8 *data)
 			p += ETH_GSTRING_LEN;
 		}
 
-		if (interface->hw.mac.type != fm10k_mac_vf)
+		if (interface->hw.mac.type != fm10k_mac_vf) {
 			for (i = 0; i < FM10K_PF_STATS_LEN; i++) {
 				memcpy(p, fm10k_gstrings_pf_stats[i].stat_string,
 				       ETH_GSTRING_LEN);
 				p += ETH_GSTRING_LEN;
 			}
+		}
 
 		for (i = 0; i < interface->hw.mac.max_queues; i++) {
 			sprintf(p, "tx_queue_%u_packets", i);
-- 
2.4.3


[net-next 09/17] fm10k: trivial fixup message style to include a colon

From: Jacob Keller jacob.e.kel...@intel.com

Fix up error message style to include a colon.

Signed-off-by: Jacob Keller jacob.e.kel...@intel.com
Tested-by: Krishneil Singh krishneil.k.si...@intel.com
Signed-off-by: Jeff Kirsher jeffrey.t.kirs...@intel.com
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 445014a..5269b16 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -1773,7 +1773,7 @@ static int fm10k_probe(struct pci_dev *pdev,
 					   fm10k_driver_name);
 	if (err) {
 		dev_err(&pdev->dev,
-			"pci_request_selected_regions failed 0x%x\n", err);
+			"pci_request_selected_regions failed: 0x%x\n", err);
 		goto err_pci_reg;
 	}
 
-- 
2.4.3


[net-next 15/17] fm10k: remove err_no reference in fm10k_mbx.c

From: Jacob Keller jacob.e.kel...@intel.com

The reference to err_no was left around after a previous code refactor.
We never use the value, and it doesn't seem to be used inside a hidden
macro reference. Discovered via cppcheck.

Signed-off-by: Jacob Keller jacob.e.kel...@intel.com
Tested-by: Krishneil Singh krishneil.k.si...@intel.com
Signed-off-by: Jeff Kirsher jeffrey.t.kirs...@intel.com
---
 drivers/net/ethernet/intel/fm10k/fm10k_mbx.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_mbx.c b/drivers/net/ethernet/intel/fm10k/fm10k_mbx.c
index 1b27383..1a4b526 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_mbx.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_mbx.c
@@ -1259,16 +1259,11 @@ static s32 fm10k_mbx_process_error(struct fm10k_hw *hw,
 				   struct fm10k_mbx_info *mbx)
 {
 	const u32 *hdr = &mbx->mbx_hdr;
-	s32 err_no;
 	u16 head;
 
 	/* we will need to pull all of the fields for verification */
 	head = FM10K_MSG_HDR_FIELD_GET(*hdr, HEAD);
 
-	/* we only have lower 10 bits of error number so add upper bits */
-	err_no = FM10K_MSG_HDR_FIELD_GET(*hdr, ERR_NO);
-	err_no |= ~FM10K_MSG_HDR_MASK(ERR_NO);
-
 	switch (mbx->state) {
 	case FM10K_STATE_OPEN:
 	case FM10K_STATE_DISCONNECT:
-- 
2.4.3


[net-next 14/17] fm10k: fix incorrect DIR_NEGATIVE bit in 1588 code

From: Jacob Keller jacob.e.kel...@intel.com

The SYSTIME_CFG.Adjust Direction bit is actually supposed to indicate
that the adjustment is positive. Fix the code to align correctly with
hardware and documentation.
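
To make the direction-bit semantics concrete, here is a small sketch of how a signed adjustment could be packed into a magnitude field plus a "positive" flag. encode_adjust() is a hypothetical helper, and the two mask values are illustrative assumptions; the real definitions are in fm10k_type.h.

```c
#include <stdint.h>

/* Illustrative values only; the real masks live in fm10k_type.h. */
#define ADJUST_MASK		0x3FFFFFFFu
#define ADJUST_DIR_POSITIVE	0x80000000u

/*
 * Hypothetical encoder: returns 0 and fills *reg on success, or -1 if
 * the magnitude does not fit in the adjustment field.
 */
static int encode_adjust(int64_t adjust, uint32_t *reg)
{
	uint64_t magnitude = adjust < 0 ? 0 - (uint64_t)adjust
					: (uint64_t)adjust;

	if (magnitude & ~(uint64_t)ADJUST_MASK)
		return -1;

	*reg = (uint32_t)magnitude;
	if (adjust > 0)
		*reg |= ADJUST_DIR_POSITIVE;	/* bit set means positive */
	return 0;
}
```

The point of the sketch is only the polarity: the flag is set for a positive adjustment and left clear for a negative one, which is the opposite of what the old DIR_NEGATIVE name implied.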

Signed-off-by: Jacob Keller jacob.e.kel...@intel.com
Tested-by: Krishneil Singh krishneil.k.si...@intel.com
Signed-off-by: Jeff Kirsher jeffrey.t.kirs...@intel.com
---
 drivers/net/ethernet/intel/fm10k/fm10k_pf.c   | 4 ++--
 drivers/net/ethernet/intel/fm10k/fm10k_type.h | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
index 3b94206..ab81c00 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
@@ -1792,8 +1792,8 @@ static s32 fm10k_adjust_systime_pf(struct fm10k_hw *hw, s32 ppb)
 	if (systime_adjust > FM10K_SW_SYSTIME_ADJUST_MASK)
 		return FM10K_ERR_PARAM;
 
-	if (ppb < 0)
-		systime_adjust |= FM10K_SW_SYSTIME_ADJUST_DIR_NEGATIVE;
+	if (ppb > 0)
+		systime_adjust |= FM10K_SW_SYSTIME_ADJUST_DIR_POSITIVE;
 
 	fm10k_write_sw_reg(hw, FM10K_SW_SYSTIME_ADJUST, (u32)systime_adjust);
 
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_type.h b/drivers/net/ethernet/intel/fm10k/fm10k_type.h
index 4af9668..2a17d82 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_type.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_type.h
@@ -369,7 +369,7 @@ struct fm10k_hw;
 /* Registers contained in BAR 4 for Switch management */
 #define FM10K_SW_SYSTIME_ADJUST	0x0224D
 #define FM10K_SW_SYSTIME_ADJUST_MASK	0x3FFF
-#define FM10K_SW_SYSTIME_ADJUST_DIR_NEGATIVE	0x8000
+#define FM10K_SW_SYSTIME_ADJUST_DIR_POSITIVE	0x8000
 #define FM10K_SW_SYSTIME_PULSE(_n)	((_n) + 0x02252)
 
 enum fm10k_int_source {
-- 
2.4.3


[net-next 10/17] fm10k: use dma_set_mask_and_coherent in fm10k_probe

From: Jacob Keller jacob.e.kel...@intel.com

This patch cleans up the use of dma_get_required_mask and uses the
simpler dma_set_mask_and_coherent function instead of doing these as
separate steps.

I removed the dma_get_required_mask call because, based on some minimal
testing, it appears that either (a) we're not doing the right thing with
the call or (b) we don't need it anyway. If the value returned is
48 bits, we'll end up trying with 48 bits anyway. If it's over 48 bits,
fm10k can't support that, and we should try 48 bits. If 48 bits
fails, we'll fall back to 32 bits. This cleans up some very funky code.
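
The resulting control flow (try 48 bits, fall back to 32 bits, report a single error) can be sketched outside the kernel as follows. This is only a model: set_mask_fn stands in for dma_set_mask_and_coherent(), and the three accept_* "platforms" are made up so the fallback can be exercised.

```c
#include <stdint.h>

/* Same shape as the kernel's DMA_BIT_MASK(): the low n bits set. */
#define BIT_MASK_N(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical stand-in for dma_set_mask_and_coherent(): 0 on success. */
typedef int (*set_mask_fn)(uint64_t mask);

/* Try the widest mask the device supports first, then fall back. */
static int configure_dma(set_mask_fn set_mask)
{
	int err;

	err = set_mask(BIT_MASK_N(48));
	if (err)
		err = set_mask(BIT_MASK_N(32));

	return err;	/* non-zero: no usable DMA configuration */
}

/* Made-up platforms used to exercise the fallback. */
static uint64_t last_mask;

static int accept_any(uint64_t mask)
{
	last_mask = mask;
	return 0;
}

static int accept_32_only(uint64_t mask)
{
	if (mask > BIT_MASK_N(32))
		return -5;
	last_mask = mask;
	return 0;
}

static int accept_none(uint64_t mask)
{
	(void)mask;
	return -5;
}
```

Calling configure_dma(accept_32_only) ends up with the 32-bit mask, while configure_dma(accept_none) returns the error from the last attempt so the caller can log it once and bail out.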

Signed-off-by: Jacob Keller jacob.e.kel...@intel.com
Tested-by: Krishneil Singh krishneil.k.si...@intel.com
Signed-off-by: Jeff Kirsher jeffrey.t.kirs...@intel.com
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 24 ++--
 1 file changed, 6 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 5269b16..0381c8d1 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -1741,30 +1741,18 @@ static int fm10k_probe(struct pci_dev *pdev,
 	struct fm10k_intfc *interface;
 	struct fm10k_hw *hw;
 	int err;
-	u64 dma_mask;
 
 	err = pci_enable_device_mem(pdev);
 	if (err)
 		return err;
 
-	/* By default fm10k only supports a 48 bit DMA mask */
-	dma_mask = DMA_BIT_MASK(48) | dma_get_required_mask(&pdev->dev);
-
-	if ((dma_mask <= DMA_BIT_MASK(32)) ||
-	    dma_set_mask_and_coherent(&pdev->dev, dma_mask)) {
-		dma_mask &= DMA_BIT_MASK(32);
-
+	err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(48));
+	if (err)
 		err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
-		err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
-		if (err) {
-			err = dma_set_coherent_mask(&pdev->dev,
-						    DMA_BIT_MASK(32));
-			if (err) {
-				dev_err(&pdev->dev,
-					"No usable DMA configuration, aborting\n");
-				goto err_dma;
-			}
-		}
+	if (err) {
+		dev_err(&pdev->dev,
+			"DMA configuration failed: 0x%x\n", err);
+		goto err_dma;
 	}
 
 	err = pci_request_selected_regions(pdev,
-- 
2.4.3


[net-next 13/17] fm10k: pack TLV overlay structures

From: Jacob Keller jacob.e.kel...@intel.com

This patch adds the __attribute__((packed)) indicator to some structures
which are overlaid onto a TLV message. These structures must be packed
as small as possible in order to correctly align when copied into the
mailbox buffer. Without doing so, the receiving mailbox code incorrectly
parses the values and we get invalid message responses from the switch
manager software.
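
The effect of packing on such an overlay can be seen with a small stand-alone comparison. The field layout below mirrors struct fm10k_swapi_1588_timestamp from the diff, but the type names ts_natural and ts_packed are invented for the example, and the attribute syntax assumes GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>

/* Natural layout: the compiler may pad the tail so that arrays of this
 * struct keep the 64-bit members aligned (typically 24 bytes on common
 * 64-bit ABIs).
 */
struct ts_natural {
	uint64_t egress;
	uint64_t ingress;
	uint16_t dglort;
	uint16_t sglort;
};

/* Packed layout: exactly the sum of the member sizes, no padding. */
struct ts_packed {
	uint64_t egress;
	uint64_t ingress;
	uint16_t dglort;
	uint16_t sglort;
} __attribute__((packed));

/* 8 + 8 + 2 + 2 bytes: the size a byte-exact message overlay expects. */
_Static_assert(sizeof(struct ts_packed) == 20,
	       "packed overlay must be exactly 20 bytes");
```

If the sender copies sizeof(struct) bytes of the unpacked variant into a message buffer, any trailing padding travels with it, which is the kind of mismatch the receiving parser would trip over.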

Signed-off-by: Jacob Keller jacob.e.kel...@intel.com
Tested-by: Krishneil Singh krishneil.k.si...@intel.com
Signed-off-by: Jeff Kirsher jeffrey.t.kirs...@intel.com
---
 drivers/net/ethernet/intel/fm10k/fm10k_pf.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pf.h 
b/drivers/net/ethernet/intel/fm10k/fm10k_pf.h
index 7ab1db4..40a0dbc 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pf.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pf.h
@@ -81,26 +81,26 @@ struct fm10k_mac_update {
__le16  glort;
u8  flags;
u8  action;
-};
+} __packed;
 
 struct fm10k_global_table_data {
__le32  used;
__le32  avail;
-};
+} __packed;
 
 struct fm10k_swapi_error {
__le32  status;
struct fm10k_global_table_data  mac;
struct fm10k_global_table_data  nexthop;
struct fm10k_global_table_data  ffu;
-};
+} __packed;
 
 struct fm10k_swapi_1588_timestamp {
__le64 egress;
__le64 ingress;
__le16 dglort;
__le16 sglort;
-};
+} __packed;
 
 s32 fm10k_msg_lport_map_pf(struct fm10k_hw *, u32 **, struct fm10k_mbx_info *);
 extern const struct fm10k_tlv_attr fm10k_lport_map_msg_attr[];
-- 
2.4.3


[net-next 07/17] fm10k: use an unsigned int for i in ethtool_get_strings

From: Jacob Keller jacob.e.kel...@intel.com

The value will never be negative, and we print it with the %u format, so
use an unsigned int for the loop counter. Issue found using cppcheck.

Signed-off-by: Jacob Keller jacob.e.kel...@intel.com
Tested-by: Krishneil Singh krishneil.k.si...@intel.com
Signed-off-by: Jeff Kirsher jeffrey.t.kirs...@intel.com
---
 drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
index 4b9d9f8..06f0b08 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
@@ -124,7 +124,7 @@ static void fm10k_get_strings(struct net_device *dev, u32 stringset, u8 *data)
 {
 	struct fm10k_intfc *interface = netdev_priv(dev);
 	char *p = (char *)data;
-	int i;
+	unsigned int i;
 
 	switch (stringset) {
 	case ETH_SS_TEST:
-- 
2.4.3


[net-next 05/17] fm10k: fix incorrect free on skb in ts_tx_enqueue

From: Jacob Keller jacob.e.kel...@intel.com

This patch resolves a bug in the ts_tx_enqueue code responsible for a
NULL pointer dereference and invalid access of the skb list. We
incorrectly freed the actual skb we found instead of our copy. Thus the
skb queue is essentially invalidated. Resolve this by freeing our clone
in the cases where we did not add it to the queue. This also avoids the
skb memory leak caused by failure to free the clone.
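
The ownership rule behind this fix can be modelled in a few lines of plain C. Everything here is a hypothetical stand-in (struct ts_buf for the sk_buff, a singly linked list for the skb queue, no locking); the sketch only shows which pointer must be freed when a matching entry is already queued.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an sk_buff that is waiting for a timestamp. */
struct ts_buf {
	uint16_t dglort;	/* destination tag used to match timestamps */
	struct ts_buf *next;
};

struct ts_queue {
	struct ts_buf *head;
};

/* Return the queued buffer with a matching dglort, or NULL if none. */
static struct ts_buf *ts_queue_find(const struct ts_queue *q, uint16_t dglort)
{
	for (struct ts_buf *b = q->head; b; b = b->next)
		if (b->dglort == dglort)
			return b;
	return NULL;
}

/*
 * Queue the clone unless a buffer with the same dglort is already pending.
 * Returns 1 if the clone was queued, 0 if it was dropped and freed.
 */
static int ts_enqueue(struct ts_queue *q, struct ts_buf *clone)
{
	struct ts_buf **tail = &q->head;

	if (ts_queue_find(q, clone->dglort)) {
		/* free OUR copy; the entry we found still belongs to the queue */
		free(clone);
		return 0;
	}

	while (*tail)
		tail = &(*tail)->next;
	clone->next = NULL;
	*tail = clone;
	return 1;
}
```

Freeing the found entry instead would leave a dangling pointer on the list and leak the clone, which matches the two symptoms the commit message describes.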

[  589.719320] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  589.722344] IP: [a0310e60] fm10k_ts_tx_subtask+0xb0/0x160 [fm10k]
[  589.723796] PGD 0
[  589.725228] Oops:  [#1] SMP

Signed-off-by: Jacob Keller jacob.e.kel...@intel.com
Tested-by: Krishneil Singh krishneil.k.si...@intel.com
Signed-off-by: Jeff Kirsher jeffrey.t.kirs...@intel.com
---
 drivers/net/ethernet/intel/fm10k/fm10k_ptp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_ptp.c b/drivers/net/ethernet/intel/fm10k/fm10k_ptp.c
index 39b8328..b4945e8 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_ptp.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_ptp.c
@@ -79,7 +79,7 @@ void fm10k_ts_tx_enqueue(struct fm10k_intfc *interface, struct sk_buff *skb)
 
 	/* if list is already has one then we just free the clone */
 	if (skb)
-		kfree_skb(skb);
+		dev_kfree_skb(clone);
 }
 
 void fm10k_ts_tx_hwtstamp(struct fm10k_intfc *interface, __le16 dglort,
-- 
2.4.3


[net-next 03/17] fm10k: use correct ethernet driver Tx timestamp function

From: Jacob Keller jacob.e.kel...@intel.com

skb_complete_tx_timestamp is intended for use by PHY drivers which
implement a different method of returning timestamps. This method is
intended to be used after a PHY driver accepts a cloned packet via its
phy_driver.txtstamp function. It is not correct to use it in a standard
Ethernet driver such as fm10k. This patch fixes the following possible
kernel panic.

[ 2744.552896] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GW  OE  3.19.3-200.fc21.x86_64 #1
[ 2744.552899] Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.02.03.8x23.060520140825 06/05/2014
[ 2744.552901]   2f4c8b10ea3f9848 88081ee03a38 
8176e215
[ 2744.552906]    88081ee03a78 
8109bc1a
[ 2744.552910]  88081ee03c50 88080e55fc00 88080e55fc00 
81647c50
[ 2744.552914] Call Trace:
[ 2744.552917]  IRQ  [8176e215] dump_stack+0x45/0x57
[ 2744.552931]  [8109bc1a] warn_slowpath_common+0x8a/0xc0
[ 2744.552936]  [81647c50] ? skb_queue_purge+0x20/0x40
[ 2744.552941]  [8109bd4a] warn_slowpath_null+0x1a/0x20
[ 2744.552946]  [81646911] skb_release_head_state+0xe1/0xf0
[ 2744.552950]  [81647b26] skb_release_all+0x16/0x30
[ 2744.552954]  [81647ba6] kfree_skb+0x36/0x90
[ 2744.552958]  [81647c50] skb_queue_purge+0x20/0x40
[ 2744.552964]  [81751f8d] packet_sock_destruct+0x1d/0x90
[ 2744.552968]  [81642053] __sk_free+0x23/0x140
[ 2744.552973]  [81642189] sk_free+0x19/0x20
[ 2744.552977]  [81647d60] skb_complete_tx_timestamp+0x50/0x60
[ 2744.552988]  [a02eee40] fm10k_ts_tx_hwtstamp+0xd0/0x100 [fm10k]
[ 2744.552994]  [a02e054e] fm10k_1588_msg_pf+0x12e/0x140 [fm10k]
[ 2744.553002]  [a02edf1d] fm10k_tlv_msg_parse+0x8d/0xc0 [fm10k]
[ 2744.553010]  [a02eb2d0] fm10k_mbx_dequeue_rx+0x60/0xb0 [fm10k]
[ 2744.553016]  [a02ebf98] fm10k_sm_mbx_process+0x178/0x3c0 [fm10k]
[ 2744.553022]  [a02e09ca] fm10k_msix_mbx_pf+0xfa/0x360 [fm10k]
[ 2744.553030]  [811030a7] ? get_next_timer_interrupt+0x1f7/0x270
[ 2744.553036]  [810f2a47] handle_irq_event_percpu+0x77/0x1a0
[ 2744.553041]  [810f2bab] handle_irq_event+0x3b/0x60
[ 2744.553045]  [810f5d6e] handle_edge_irq+0x6e/0x120
[ 2744.553054]  [81017414] handle_irq+0x74/0x140
[ 2744.553061]  [810bb54a] ? atomic_notifier_call_chain+0x1a/0x20
[ 2744.553066]  [817f] do_IRQ+0x4f/0xf0
[ 2744.553072]  [8177556d] common_interrupt+0x6d/0x6d
[ 2744.553074]  EOI  [81609b16] ? cpuidle_enter_state+0x66/0x160
[ 2744.553084]  [81609b01] ? cpuidle_enter_state+0x51/0x160
[ 2744.553087]  [81609cf7] cpuidle_enter+0x17/0x20
[ 2744.553092]  [810de101] cpu_startup_entry+0x321/0x3c0
[ 2744.553098]  [81764497] rest_init+0x77/0x80
[ 2744.553103]  [81d4f02c] start_kernel+0x4a4/0x4c5
[ 2744.553107]  [81d4e120] ? early_idt_handlers+0x120/0x120
[ 2744.553110]  [81d4e4d7] x86_64_start_reservations+0x2a/0x2c
[ 2744.553114]  [81d4e62b] x86_64_start_kernel+0x152/0x175

Signed-off-by: Jacob Keller jacob.e.kel...@intel.com
Tested-by: Krishneil Singh krishneil.k.si...@intel.com
Signed-off-by: Jeff Kirsher jeffrey.t.kirs...@intel.com
---
 drivers/net/ethernet/intel/fm10k/fm10k_ptp.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_ptp.c b/drivers/net/ethernet/intel/fm10k/fm10k_ptp.c
index 9043633..95f1d62 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_ptp.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_ptp.c
@@ -103,9 +103,10 @@ void fm10k_ts_tx_hwtstamp(struct fm10k_intfc *interface, __le16 dglort,
 	if (!skb)
 		return;
 
-	/* timestamp the sk_buff and return it to the socket */
+	/* timestamp the sk_buff and free out copy */
 	fm10k_systime_to_hwtstamp(interface, &shhwtstamps, systime);
-	skb_complete_tx_timestamp(skb, &shhwtstamps);
+	skb_tstamp_tx(skb, &shhwtstamps);
+	dev_kfree_skb_any(skb);
 }
 
 void fm10k_ts_tx_subtask(struct fm10k_intfc *interface)
-- 
2.4.3


[net-next 04/17] fm10k: move setting shinfo inside ts_tx_enqueue

From: Jacob Keller jacob.e.kel...@intel.com

This patch simplifies the code flow for setting the IN_PROGRESS bit of
the shinfo for an skb we will be timestamping.

Reported-by: Eric Dumazet eric.duma...@gmail.com
Signed-off-by: Jacob Keller jacob.e.kel...@intel.com
Tested-by: Krishneil Singh krishneil.k.si...@intel.com
Signed-off-by: Jeff Kirsher jeffrey.t.kirs...@intel.com
---
 drivers/net/ethernet/intel/fm10k/fm10k_ptp.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_ptp.c b/drivers/net/ethernet/intel/fm10k/fm10k_ptp.c
index 95f1d62..39b8328 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_ptp.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_ptp.c
@@ -70,16 +70,16 @@ void fm10k_ts_tx_enqueue(struct fm10k_intfc *interface, struct sk_buff *skb)
 	 * if none are present then insert skb in tail of list
 	 */
 	skb = fm10k_ts_tx_skb(interface, FM10K_CB(clone)->fi.w.dglort);
-	if (!skb)
+	if (!skb) {
+		skb_shinfo(clone)->tx_flags |= SKBTX_IN_PROGRESS;
 		__skb_queue_tail(list, clone);
+	}
 
 	spin_unlock_irqrestore(&list->lock, flags);
 
 	/* if list is already has one then we just free the clone */
 	if (skb)
 		kfree_skb(skb);
-	else
-		skb_shinfo(clone)->tx_flags |= SKBTX_IN_PROGRESS;
 }
 
 void fm10k_ts_tx_hwtstamp(struct fm10k_intfc *interface, __le16 dglort,
-- 
2.4.3


[net-next 11/17] fm10k: force LPORT delete when updating VLAN or MAC address

From: Jacob Keller jacob.e.kel...@intel.com

Currently, we don't notify the switch at all when the PF
administratively sets a new VLAN or MAC address. This causes the old
addresses to remain valid on the switch table. Since the PF is
overriding any configuration done directly by the VF, we choose to
simply re-create the LPORT for the VF. This does mean that all rules for
the VF will be dropped when we set something directly via the PF, but it
prevents some weird issues where the MAC/VLAN table retains some stale
configuration.

Signed-off-by: Jacob Keller jacob.e.kel...@intel.com
Tested-by: Krishneil Singh krishneil.k.si...@intel.com
Signed-off-by: Jeff Kirsher jeffrey.t.kirs...@intel.com
---
 drivers/net/ethernet/intel/fm10k/fm10k_iov.c | 38 +---
 1 file changed, 23 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
index 5b08e62..94571e6 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
@@ -400,11 +400,31 @@ int fm10k_iov_configure(struct pci_dev *pdev, int num_vfs)
 	return num_vfs;
 }
 
+static inline void fm10k_reset_vf_info(struct fm10k_intfc *interface,
+				       struct fm10k_vf_info *vf_info)
+{
+	struct fm10k_hw *hw = &interface->hw;
+
+	/* assigning the MAC address will send a mailbox message */
+	fm10k_mbx_lock(interface);
+
+	/* disable LPORT for this VF which clears switch rules */
+	hw->iov.ops.reset_lport(hw, vf_info);
+
+	/* assign new MAC+VLAN for this VF */
+	hw->iov.ops.assign_default_mac_vlan(hw, vf_info);
+
+	/* re-enable the LPORT for this VF */
+	hw->iov.ops.set_lport(hw, vf_info, vf_info->vf_idx,
+			      FM10K_VF_FLAG_MULTI_CAPABLE);
+
+	fm10k_mbx_unlock(interface);
+}
+
 int fm10k_ndo_set_vf_mac(struct net_device *netdev, int vf_idx, u8 *mac)
 {
 	struct fm10k_intfc *interface = netdev_priv(netdev);
 	struct fm10k_iov_data *iov_data = interface->iov_data;
-	struct fm10k_hw *hw = &interface->hw;
 	struct fm10k_vf_info *vf_info;
 
 	/* verify SR-IOV is active and that vf idx is valid */
@@ -419,13 +439,7 @@ int fm10k_ndo_set_vf_mac(struct net_device *netdev, int vf_idx, u8 *mac)
 	vf_info = &iov_data->vf_info[vf_idx];
 	ether_addr_copy(vf_info->mac, mac);
 
-	/* assigning the MAC will send a mailbox message so lock is needed */
-	fm10k_mbx_lock(interface);
-
-	/* assign MAC address to VF */
-	hw->iov.ops.assign_default_mac_vlan(hw, vf_info);
-
-	fm10k_mbx_unlock(interface);
+	fm10k_reset_vf_info(interface, vf_info);
 
 	return 0;
 }
@@ -455,16 +469,10 @@ int fm10k_ndo_set_vf_vlan(struct net_device *netdev, int vf_idx, u16 vid,
 	/* record default VLAN ID for VF */
 	vf_info->pf_vid = vid;
 
-	/* assigning the VLAN will send a mailbox message so lock is needed */
-	fm10k_mbx_lock(interface);
-
 	/* Clear the VLAN table for the VF */
 	hw->mac.ops.update_vlan(hw, FM10K_VLAN_ALL, vf_info->vsi, false);
 
-	/* Update VF assignment and trigger reset */
-	hw->iov.ops.assign_default_mac_vlan(hw, vf_info);
-
-	fm10k_mbx_unlock(interface);
+	fm10k_reset_vf_info(interface, vf_info);
 
 	return 0;
 }
-- 
2.4.3


Re: [net-next 09/17] fm10k: trivial fixup message style to include a colon


Hello.

On 6/16/2015 4:47 PM, Jeff Kirsher wrote:

> From: Jacob Keller jacob.e.kel...@intel.com
>
> Fix up error message style to include a colon.
>
> Signed-off-by: Jacob Keller jacob.e.kel...@intel.com
> Tested-by: Krishneil Singh krishneil.k.si...@intel.com
> Signed-off-by: Jeff Kirsher jeffrey.t.kirs...@intel.com
> ---
>   drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
> index 445014a..5269b16 100644
> --- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
> +++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
> @@ -1773,7 +1773,7 @@ static int fm10k_probe(struct pci_dev *pdev,
>  					   fm10k_driver_name);
>  	if (err) {
>  		dev_err(&pdev->dev,
> -			"pci_request_selected_regions failed 0x%x\n", err);
> +			"pci_request_selected_regions failed: 0x%x\n", err);

   I don't think printing the error in hexadecimal makes much sense, so you
might as well change that format to %d...

WBR, Sergei


RE: Setting up interfaces in loopback mode using SIOCETHTOOL

2015-06-16 Thread Skidmore, Donald C

> -----Original Message-----
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On Behalf Of Ashutosh Tripathi
> Sent: Tuesday, June 16, 2015 3:52 AM
> To: netdev@vger.kernel.org
> Subject: Setting up interfaces in loopback mode using SIOCETHTOOL
>
> Hi All,
>
> In spite of several days of trying to get information out of ethtool source
> code, I could not get a way to set the loopback mode for network interfaces.
> I was also referring to the driver code for ixgbe devices to get more info on
> the support it provides to ethtool. In the driver code, there seems to be a
> way to set the loopback mode. But I had no idea how to refer to this code
> from user space.
>
> The SIOCETHTOOL is pretty general purpose. There is a way to test the
> interface in loopback mode via the command line interface. But I just wanted
> to set the interface in loopback mode so that I could send packets on the
> interface myself and stress the interface.
>
> Please help me with any suggestion in this regard.
>
> Thanks,
>
> Ashutosh

Hey Ashutosh,

Currently ixgbe only uses MAC loopback via the ethtool self_test, which doesn't 
sound like what you're looking for.  I don't know of an ethtool interface to 
tell an adapter to go into loopback mode, but this might just be because the 
driver I currently work on (ixgbe) doesn't support it. :)

Thanks,
-Don Skidmore donald.c.skidm...@intel.com

Re: [PATCH 7/7] slub: initial bulk free implementation

On Tue, 16 Jun 2015, Joonsoo Kim wrote:

> So, in your test, most of objects may come from one or two slabs and your
> algorithm is well optimized for this case. But, is this workload normal case?

It is normal if the objects were bulk allocated because SLUB ensures that
all objects are first allocated from one page before moving to another.

> If most of objects comes from many different slabs, bulk free API does
> enabling/disabling interrupt very much so I guess it work worse than
> just calling __kmem_cache_free_bulk(). Could you test this case?

In case of SLAB this would be an issue since the queueing mechanism
destroys spatial locality. This is much less an issue for SLUB.



Re: [PATCH 7/7] slub: initial bulk free implementation

2015-06-16 Thread Jesper Dangaard Brouer

On Tue, 16 Jun 2015 10:10:25 -0500 (CDT)
Christoph Lameter c...@linux.com wrote:

> On Tue, 16 Jun 2015, Joonsoo Kim wrote:
>
> > So, in your test, most of objects may come from one or two slabs and your
> > algorithm is well optimized for this case. But, is this workload normal
> > case?
>
> It is normal if the objects were bulk allocated because SLUB ensures that
> all objects are first allocated from one page before moving to another.

Yes, exactly. Maybe SLAB is different? If so, then we can handle that
in the SLAB specific bulk implementation.

> > If most of objects comes from many different slabs, bulk free API does
> > enabling/disabling interrupt very much so I guess it work worse than
> > just calling __kmem_cache_free_bulk(). Could you test this case?
>
> In case of SLAB this would be an issue since the queueing mechanism
> destroys spatial locality. This is much less an issue for SLUB.

I think Kim is worried about the cost of the enable/disable calls, when
the slowpath gets called. But it is not a problem because the cost of
local_irq_{disable,enable} is very low (total cost 7 cycles).

It is very important that everybody realizes that the save+restore
variant is very expensive, this is key:

CPU: i7-4790K CPU @ 4.00GHz
* local_irq_{disable,enable}: 7 cycles(tsc) - 1.821 ns
* local_irq_{save,restore} : 37 cycles(tsc) - 9.443 ns

Even if EVERY object needs to call slowpath/__slab_free(), it will be
faster than calling the fallback, because I've demonstrated that the
this_cpu_cmpxchg_double() call costs 9 cycles.

--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer

p.s. for comparison[1] a function call cost is 5-6 cycles, and a function
pointer call cost is 6-10 cycles, depending on CPU.

[1]
https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_sample.c

[PATCH net] ipv4: include NLM_F_APPEND flag in append route notifications

2015-06-16 Thread Roopa Prabhu

From: Roopa Prabhu ro...@cumulusnetworks.com

This patch adds the NLM_F_APPEND flag to struct nlmsghdr->nlmsg_flags
in newroute notifications if the route add was an append.
(This is similar to how NLM_F_REPLACE is already part of new
route replace notifications today.)

This helps userspace determine if the route add operation was
an append.
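
On the receiving side, a listener could tell the three cases apart by looking at the flags of the RTM_NEWROUTE message header. The sketch below is an assumption about how userspace might use the flag, not code from any existing tool; classify_newroute() and enum route_op are hypothetical, and the flag values are the ones defined in <linux/netlink.h>.

```c
#include <stdint.h>

/* Flag values as defined in <linux/netlink.h>. */
#ifndef NLM_F_REPLACE
#define NLM_F_REPLACE	0x100
#endif
#ifndef NLM_F_APPEND
#define NLM_F_APPEND	0x800
#endif

/* Hypothetical classification of an RTM_NEWROUTE notification. */
enum route_op {
	ROUTE_ADDED,
	ROUTE_REPLACED,
	ROUTE_APPENDED,
};

/* nlmsg_flags is the 16-bit flags field of struct nlmsghdr. */
static enum route_op classify_newroute(uint16_t nlmsg_flags)
{
	if (nlmsg_flags & NLM_F_REPLACE)
		return ROUTE_REPLACED;
	if (nlmsg_flags & NLM_F_APPEND)
		return ROUTE_APPENDED;
	return ROUTE_ADDED;
}
```

Without the flag in the notification, an appended route and a plain add look identical to the listener, so it cannot tell where the new entry sits relative to existing routes with the same prefix.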

Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com
---
 net/ipv4/fib_trie.c |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 3c699c4..38ffc20 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1082,6 +1082,7 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 	struct trie *t = (struct trie *)tb->tb_data;
 	struct fib_alias *fa, *new_fa;
 	struct key_vector *l, *tp;
+	unsigned int nlflags = 0;
 	struct fib_info *fi;
 	u8 plen = cfg->fc_dst_len;
 	u8 slen = KEYLENGTH - plen;
@@ -1203,6 +1204,8 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 
 		if (!(cfg->fc_nlflags & NLM_F_APPEND))
 			fa = fa_first;
+		else
+			nlflags |= NLM_F_APPEND;
 	}
 	err = -ENOENT;
 	if (!(cfg->fc_nlflags & NLM_F_CREATE))
@@ -1238,7 +1241,7 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 
 	rt_cache_flush(cfg->fc_nlinfo.nl_net);
 	rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen, new_fa->tb_id,
-		  &cfg->fc_nlinfo, 0);
+		  &cfg->fc_nlinfo, nlflags);
 succeeded:
 	return 0;
 
-- 
1.7.10.4


Re: [PATCH 7/7] slub: initial bulk free implementation

On Tue, 16 Jun 2015, Joonsoo Kim wrote:

> > If adding these, then I would also need to add those on alloc path...
>
> Yes, please.

Let's fall back to the generic implementation for any of these things. We
need to focus on maximum performance in these functions. The more special
cases we have to handle the more all of this gets compromised.


[PATCH iproute2] ss: Include -E option for socket destroy events

2015-06-16 Thread Craig Gallek

Use the IPv4/IPv6/TCP/UDP multicast groups of NETLINK_SOCK_DIAG
to filter and display socket statistics as they are destroyed.
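
For readers unfamiliar with netlink multicast subscriptions, the following sketch shows how a subscription bitmask for these groups could be assembled. The enum matches the one this patch adds to sock_diag.h; group_bit() and destroy_group_mask() are hypothetical helpers, and the sketch assumes the legacy bitmask form of subscription, where group N is selected by bit N - 1.

```c
/* Group numbering as added to include/linux/sock_diag.h by this patch. */
enum sknetlink_groups {
	SKNLGRP_NONE,
	SKNLGRP_INET_TCP_DESTROY,
	SKNLGRP_INET_UDP_DESTROY,
	SKNLGRP_INET6_TCP_DESTROY,
	SKNLGRP_INET6_UDP_DESTROY,
	__SKNLGRP_MAX,
};

/* Netlink group N is selected by bit N - 1 of the subscription mask. */
static unsigned int group_bit(enum sknetlink_groups grp)
{
	return grp == SKNLGRP_NONE ? 0u : 1u << (grp - 1);
}

/* Hypothetical helper: build the mask from four yes/no choices. */
static unsigned int destroy_group_mask(int ipv4, int ipv6, int tcp, int udp)
{
	unsigned int groups = 0;

	if (ipv4 && tcp)
		groups |= group_bit(SKNLGRP_INET_TCP_DESTROY);
	if (ipv4 && udp)
		groups |= group_bit(SKNLGRP_INET_UDP_DESTROY);
	if (ipv6 && tcp)
		groups |= group_bit(SKNLGRP_INET6_TCP_DESTROY);
	if (ipv6 && udp)
		groups |= group_bit(SKNLGRP_INET6_UDP_DESTROY);

	return groups;
}
```

Asking for IPv4 TCP only yields a mask with just the lowest bit set, while asking for all four combinations sets the low four bits.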

Kernel support patch series: 24029a3603cfa633e8bc2b3fb3e48e76c497831d

Signed-off-by: Craig Gallek kr...@google.com
---
 include/linux/inet_diag.h |  3 +-
 include/linux/sock_diag.h | 10 +++
 misc/ss.c | 72 ++-
 3 files changed, 83 insertions(+), 2 deletions(-)

diff --git a/include/linux/inet_diag.h b/include/linux/inet_diag.h
index 0fb76bb..e83340b 100644
--- a/include/linux/inet_diag.h
+++ b/include/linux/inet_diag.h
@@ -111,9 +111,10 @@ enum {
INET_DIAG_SKMEMINFO,
INET_DIAG_SHUTDOWN,
INET_DIAG_DCTCPINFO,
+   INET_DIAG_PROTOCOL,  /* response attribute only */
 };
 
-#define INET_DIAG_MAX INET_DIAG_DCTCPINFO
+#define INET_DIAG_MAX INET_DIAG_PROTOCOL
 
 /* INET_DIAG_MEM */
 
diff --git a/include/linux/sock_diag.h b/include/linux/sock_diag.h
index 78996e2..024e1f4 100644
--- a/include/linux/sock_diag.h
+++ b/include/linux/sock_diag.h
@@ -23,4 +23,14 @@ enum {
SK_MEMINFO_VARS,
 };
 
+enum sknetlink_groups {
+   SKNLGRP_NONE,
+   SKNLGRP_INET_TCP_DESTROY,
+   SKNLGRP_INET_UDP_DESTROY,
+   SKNLGRP_INET6_TCP_DESTROY,
+   SKNLGRP_INET6_UDP_DESTROY,
+   __SKNLGRP_MAX,
+};
+#define SKNLGRP_MAX (__SKNLGRP_MAX - 1)
+
 #endif /* __SOCK_DIAG_H__ */
diff --git a/misc/ss.c b/misc/ss.c
index dba0901..9e59257 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -99,6 +99,7 @@ int show_proc_ctx = 0;
 int show_sock_ctx = 0;
 /* If show_users & show_proc_ctx only do user_ent_hash_build() once */
 int user_ent_hash_build_init = 0;
+int follow_events = 0;
 
 int netid_width;
 int state_width;
@@ -2030,6 +2031,9 @@ static int inet_show_sock(struct nlmsghdr *nlh, struct filter *f, int protocol)
if (f && f->f && run_ssfilter(f->f, &s) == 0)
return 0;
 
+   if (tb[INET_DIAG_PROTOCOL])
+       protocol = *(__u8 *)RTA_DATA(tb[INET_DIAG_PROTOCOL]);
+
inet_stats_print(&s, protocol);
 
if (show_options) {
@@ -3217,6 +3221,64 @@ static int netlink_show(struct filter *f)
return 0;
 }
 
+struct sock_diag_msg {
+   __u8 sdiag_family;
+};
+
+static int generic_show_sock(const struct sockaddr_nl *addr,
+   struct nlmsghdr *nlh, void *arg)
+{
+   struct sock_diag_msg *r = NLMSG_DATA(nlh);
+   struct inet_diag_arg inet_arg = { .f = arg, .protocol = IPPROTO_MAX };
+
+   switch (r->sdiag_family) {
+   case AF_INET:
+   case AF_INET6:
+       return show_one_inet_sock(addr, nlh, &inet_arg);
+   case AF_UNIX:
+   return unix_show_sock(addr, nlh, arg);
+   case AF_PACKET:
+   return packet_show_sock(addr, nlh, arg);
+   case AF_NETLINK:
+   return netlink_show_sock(addr, nlh, arg);
+   default:
+   return -1;
+   }
+}
+
+static int handle_follow_request(struct filter *f)
+{
+   int ret = -1;
+   int groups = 0;
+   struct rtnl_handle rth;
+
+   if (f->families & (1 << AF_INET) && f->dbs & (1 << TCP_DB))
+       groups |= 1 << (SKNLGRP_INET_TCP_DESTROY - 1);
+   if (f->families & (1 << AF_INET) && f->dbs & (1 << UDP_DB))
+       groups |= 1 << (SKNLGRP_INET_UDP_DESTROY - 1);
+   if (f->families & (1 << AF_INET6) && f->dbs & (1 << TCP_DB))
+       groups |= 1 << (SKNLGRP_INET6_TCP_DESTROY - 1);
+   if (f->families & (1 << AF_INET6) && f->dbs & (1 << UDP_DB))
+       groups |= 1 << (SKNLGRP_INET6_UDP_DESTROY - 1);
+
+   if (groups == 0)
+       return -1;
+
+   if (rtnl_open_byproto(&rth, groups, NETLINK_SOCK_DIAG))
+       return -1;
+
+   rth.dump = 0;
+   rth.local.nl_pid = 0;
+
+   if (rtnl_dump_filter(&rth, generic_show_sock, f))
+       goto Exit;
+
+   ret = 0;
+Exit:
+   rtnl_close(&rth);
+   return ret;
+}
+
 struct snmpstat
 {
int tcp_estab;
@@ -3399,6 +3461,7 @@ static void _usage(FILE *dest)
 "   -i, --info          show internal TCP information\n"
 "   -s, --summary       show socket usage summary\n"
 "   -b, --bpf           show bpf filter socket information\n"
+"   -E, --events        continually display sockets as they are destroyed\n"
 "   -Z, --context       display process SELinux security contexts\n"
 "   -z, --contexts      display process and socket SELinux security contexts\n"
 "   -N, --net           switch to the specified network namespace name\n"
@@ -3481,6 +3544,7 @@ static const struct option long_opts[] = {
 { "info", 0, 0, 'i' },
 { "processes", 0, 0, 'p' },
 { "bpf", 0, 0, 'b' },
+   { "events", 0, 0, 'E' },
 { "dccp", 0, 0, 'd' },
 { "tcp", 0, 0, 't' },
 { "udp", 0, 0, 'u' },
@@ -3516,7 +3580,7 @@ int main(int argc, char *argv[])
int ch;
int state_filter = 0;
 
-   while ((ch = getopt_long(argc, argv, "dhaletuwxnro460spbf:miA:D:F:vVzZN:",
+   while ((ch = getopt_long(argc, argv,
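
The group selection in handle_follow_request() above relies on the legacy netlink convention that multicast group n occupies bit n - 1 of the 32-bit nl_groups mask. A minimal standalone sketch of that mapping follows. The DEMO_ names and the demo_destroy_groups() helper are hypothetical stand-ins, not iproute2 or kernel API; only the enum ordering is taken from the patch.

```c
#include <stdint.h>

/* Illustrative copy of enum sknetlink_groups from the patch. */
enum demo_sknetlink_groups {
	DEMO_SKNLGRP_NONE,
	DEMO_SKNLGRP_INET_TCP_DESTROY,
	DEMO_SKNLGRP_INET_UDP_DESTROY,
	DEMO_SKNLGRP_INET6_TCP_DESTROY,
	DEMO_SKNLGRP_INET6_UDP_DESTROY,
};

/* Hypothetical selectors standing in for ss's family and db filters. */
#define DEMO_WANT_INET  (1u << 0)
#define DEMO_WANT_INET6 (1u << 1)
#define DEMO_WANT_TCP   (1u << 2)
#define DEMO_WANT_UDP   (1u << 3)

/* Group n maps to bit n - 1; group 0 means "no group". */
static uint32_t demo_group_bit(unsigned int group)
{
	return group ? (uint32_t)1 << (group - 1) : 0;
}

/* Build the subscription mask for the requested family/protocol pairs. */
static uint32_t demo_destroy_groups(unsigned int want)
{
	uint32_t groups = 0;

	if ((want & DEMO_WANT_INET) && (want & DEMO_WANT_TCP))
		groups |= demo_group_bit(DEMO_SKNLGRP_INET_TCP_DESTROY);
	if ((want & DEMO_WANT_INET) && (want & DEMO_WANT_UDP))
		groups |= demo_group_bit(DEMO_SKNLGRP_INET_UDP_DESTROY);
	if ((want & DEMO_WANT_INET6) && (want & DEMO_WANT_TCP))
		groups |= demo_group_bit(DEMO_SKNLGRP_INET6_TCP_DESTROY);
	if ((want & DEMO_WANT_INET6) && (want & DEMO_WANT_UDP))
		groups |= demo_group_bit(DEMO_SKNLGRP_INET6_UDP_DESTROY);

	return groups;
}
```

A mask of zero means no family/protocol pair was selected, which is the case the patch rejects with -1 before opening the socket.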

Re: [PATCH net] ipv4: include NLM_F_APPEND flag in append route notifications

2015-06-16 Thread roopa


On 6/16/15, 9:11 AM, Roopa Prabhu wrote:

From: Roopa Prabhu ro...@cumulusnetworks.com

This patch adds NLM_F_APPEND flag to struct nlmsghdr->nlmsg_flags
in newroute notifications if the route add was an append.
(This is similar to how NLM_F_REPLACE is already part of new
route replace notifications today)

This helps userspace determine if the route add operation was
an append.

Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com

please ignore, I plan to resend the patch against net-next

Re: [PATCH 2/7] slub bulk alloc: extract objects from the per cpu slab

On Tue, 16 Jun 2015, Joonsoo Kim wrote:

> Now I found that we need to call slab_pre_alloc_hook() before any operation
> on kmem_cache to support kmemcg accounting. And, we need to call
> slab_post_alloc_hook() on every allocated object to support many
> debugging features such as kasan and kmemleak.

Using the fallback function whenever debugging is active avoids that. This
needs to be fast. If performance is not the priority (debugging etc. active)
then the fallback should be fine.


Re: [PATCH 7/7] slub: initial bulk free implementation

On Tue, 16 Jun 2015, Jesper Dangaard Brouer wrote:

> It is very important that everybody realizes that the save+restore
> variant is very expensive, this is key:
>
> CPU: i7-4790K CPU @ 4.00GHz
>  * local_irq_{disable,enable}:  7 cycles(tsc) - 1.821 ns
>  * local_irq_{save,restore}  : 37 cycles(tsc) - 9.443 ns
>
> Even if EVERY object needs to call slowpath/__slab_free() it will be
> faster than calling the fallback, because I've demonstrated that the
> this_cpu_cmpxchg_double() call costs 9 cycles.

But the cmpxchg also stores a value. You need to add the cost of the store
to the cycles.


[PATCH net-next] ipv4: include NLM_F_APPEND flag in append route notifications

2015-06-16 Thread Roopa Prabhu

From: Roopa Prabhu ro...@cumulusnetworks.com

This patch adds NLM_F_APPEND flag to struct nlmsghdr->nlmsg_flags
in newroute notifications if the route add was an append.
(This is similar to how NLM_F_REPLACE is already part of new
route replace notifications today)

This helps userspace determine if the route add operation was
an append.

Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com
---
 net/ipv4/fib_trie.c |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 3c699c4..38ffc20 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1082,6 +1082,7 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
struct trie *t = (struct trie *)tb->tb_data;
struct fib_alias *fa, *new_fa;
struct key_vector *l, *tp;
+   unsigned int nlflags = 0;
struct fib_info *fi;
u8 plen = cfg->fc_dst_len;
u8 slen = KEYLENGTH - plen;
@@ -1203,6 +1204,8 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 
if (!(cfg->fc_nlflags & NLM_F_APPEND))
fa = fa_first;
+   else
+       nlflags |= NLM_F_APPEND;
}
err = -ENOENT;
if (!(cfg->fc_nlflags & NLM_F_CREATE))
@@ -1238,7 +1241,7 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 
rt_cache_flush(cfg->fc_nlinfo.nl_net);
rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen, new_fa->tb_id,
- &cfg->fc_nlinfo, 0);
+ &cfg->fc_nlinfo, nlflags);
 succeeded:
return 0;
 
-- 
1.7.10.4
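
To illustrate how userspace could consume the flag described in this patch, here is a minimal sketch that classifies an RTM_NEWROUTE notification by its nlmsg_flags. route_event_kind() is a hypothetical helper name, not part of iproute2 or the kernel. The sketch assumes Linux UAPI headers and that the message header has already been read from an rtnetlink socket.

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/*
 * Classify a route notification header.
 * Returns "append" when NLM_F_APPEND is set, "replace" when
 * NLM_F_REPLACE is set, "add" for any other RTM_NEWROUTE, and
 * "other" for message types that are not RTM_NEWROUTE.
 */
static const char *route_event_kind(const struct nlmsghdr *nlh)
{
	if (nlh->nlmsg_type != RTM_NEWROUTE)
		return "other";
	if (nlh->nlmsg_flags & NLM_F_APPEND)
		return "append";
	if (nlh->nlmsg_flags & NLM_F_REPLACE)
		return "replace";
	return "add";
}
```

Without the patch the append case would be reported with flags of 0, so a listener like this could not tell it apart from a plain add.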


[PATCH net] packet: read num_members once in packet_rcv_fanout()

2015-06-16 Thread Eric Dumazet

From: Eric Dumazet eduma...@google.com

We need to tell the compiler it must not read f->num_members multiple
times. Otherwise testing if num is not zero is flaky, and we could
attempt an invalid divide by 0 in fanout_demux_cpu().

Note the bug was present in packet_rcv_fanout_hash() and
packet_rcv_fanout_lb() but final 3.1 had a simple location
after commit 95ec3eb417115fb ("packet: Add 'cpu' fanout policy.")

Fixes: dc99f600698dc ("packet: Add fanout support.")
Signed-off-by: Eric Dumazet eduma...@google.com
Cc: Willem de Bruijn will...@google.com
---
Note : I chose to use READ_ONCE() but stable backports should use
ACCESS_ONCE()

 net/packet/af_packet.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index b5989c6ee551..131545a06f05 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1353,7 +1353,7 @@ static int packet_rcv_fanout(struct sk_buff *skb, struct net_device *dev,
 struct packet_type *pt, struct net_device *orig_dev)
 {
struct packet_fanout *f = pt->af_packet_priv;
-   unsigned int num = f->num_members;
+   unsigned int num = READ_ONCE(f->num_members);
struct packet_sock *po;
unsigned int idx;
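
The hazard fixed here can be sketched outside the kernel. If the compiler is free to reload the shared field, the zero check and the modulo may see different values. The sketch below takes a single snapshot through a volatile access, which is the idea behind READ_ONCE(). DEMO_READ_ONCE, struct demo_fanout and demo_pick_member() are hypothetical userspace stand-ins, and the macro assumes a compiler that supports __typeof__ (GCC or Clang).

```c
/* Userspace stand-in for READ_ONCE(); scalar types only. */
#define DEMO_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical miniature of the fanout group state. */
struct demo_fanout {
	unsigned int num_members; /* may be changed by another thread */
};

/*
 * Pick a member index for a given hash.
 * Returns an index in [0, num), or -1 when the group is empty.
 */
static int demo_pick_member(struct demo_fanout *f, unsigned int hash)
{
	/* One load: the check and the division use the same value. */
	unsigned int num = DEMO_READ_ONCE(f->num_members);

	if (!num)
		return -1;

	return (int)(hash % num);
}
```

This only shows the single-load pattern; it does not model the locking or memory ordering of the real fanout code.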
 



Re: [PATCH] crush:Make the function crush_ln static

On Tue, Jun 9, 2015 at 11:00 AM, Ilya Dryomov idryo...@gmail.com wrote:
 On Tue, Jun 9, 2015 at 1:51 AM, Nicholas Krause xerofo...@gmail.com wrote:
 This makes the function, crush_ln static now due to having
 only one caller in its own definition and declaration file
 of mapper.c

 Signed-off-by: Nicholas Krause xerofo...@gmail.com
 ---
  net/ceph/crush/mapper.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/net/ceph/crush/mapper.c b/net/ceph/crush/mapper.c
 index 5b47736..86778d4 100644
 --- a/net/ceph/crush/mapper.c
 +++ b/net/ceph/crush/mapper.c
 @@ -239,7 +239,7 @@ static int bucket_straw_choose(struct crush_bucket_straw 
 *bucket,
  }

  // compute 2^44*log2(input+1)
 -uint64_t crush_ln(unsigned xin)
 +static uint64_t crush_ln(unsigned xin)
  {
  unsigned x=xin, x1;
  int iexpon, index1, index2;

 This is tied up with a bunch of style cleanups, I'll apply it after
 they are sorted out.

I folded your patch into a cleanup commit [1] that also fixed
crush_ln() formatting - I hope you don't mind.

[1] 
https://github.com/ceph/ceph-client/commit/45c7a1f5df419e53c17f2eeb5680d6fb20a07162

Thanks,

Ilya

[PATCH v5] NET: Add ezchip ethernet driver

2015-06-16 Thread Noam Camus

From: Noam Camus no...@ezchip.com

Simple LAN device for debug or management purposes.
Device supports interrupts for RX and TX(completion).
Device does not have DMA ability.

Signed-off-by: Noam Camus no...@ezchip.com
Signed-off-by: Tal Zilcer t...@ezchip.com
Acked-by: Alexey Brodkin abrod...@synopsys.com
---
Change log for v5:
Basically it's all based on Florian's comments.
Main items are:
1) Move all interrupt chore to bottom-half
2) use memcpy_toio/fromio
3) dev_kfree_skb() moved to bottom-half
4) add set_rx_mode callback
5) use platform api toward non-DT platforms
---
 .../devicetree/bindings/net/ezchip_enet.txt|   15 +
 drivers/net/ethernet/Kconfig   |1 +
 drivers/net/ethernet/Makefile  |1 +
 drivers/net/ethernet/ezchip/Kconfig|   27 +
 drivers/net/ethernet/ezchip/Makefile   |1 +
 drivers/net/ethernet/ezchip/nps_enet.c |  652 
 drivers/net/ethernet/ezchip/nps_enet.h |  336 ++
 7 files changed, 1033 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/ezchip_enet.txt
 create mode 100644 drivers/net/ethernet/ezchip/Kconfig
 create mode 100644 drivers/net/ethernet/ezchip/Makefile
 create mode 100644 drivers/net/ethernet/ezchip/nps_enet.c
 create mode 100644 drivers/net/ethernet/ezchip/nps_enet.h

diff --git a/Documentation/devicetree/bindings/net/ezchip_enet.txt 
b/Documentation/devicetree/bindings/net/ezchip_enet.txt
new file mode 100644
index 000..4e29b2b
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/ezchip_enet.txt
@@ -0,0 +1,15 @@
+* EZchip NPS Management Ethernet port driver
+
+Required properties:
+- compatible: Should be "ezchip,nps-mgt-enet"
+- reg: Address and length of the register set for the device
+- interrupts: Should contain the ENET interrupt
+
+Examples:
+
+   ethernet@f0003000 {
+       compatible = "ezchip,nps-mgt-enet";
+       reg = <0xf0003000 0x44>;
+       interrupts = <7>;
+       mac-address = [ 00 11 22 33 44 55 ];
+   };
diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index eadcb05..1a6b1ba 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -66,6 +66,7 @@ config DNET
 source "drivers/net/ethernet/dec/Kconfig"
 source "drivers/net/ethernet/dlink/Kconfig"
 source "drivers/net/ethernet/emulex/Kconfig"
+source "drivers/net/ethernet/ezchip/Kconfig"
 source "drivers/net/ethernet/neterion/Kconfig"
 source "drivers/net/ethernet/faraday/Kconfig"
 source "drivers/net/ethernet/freescale/Kconfig"
diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
index 1367afc..489f9cc 100644
--- a/drivers/net/ethernet/Makefile
+++ b/drivers/net/ethernet/Makefile
@@ -29,6 +29,7 @@ obj-$(CONFIG_DNET) += dnet.o
 obj-$(CONFIG_NET_VENDOR_DEC) += dec/
 obj-$(CONFIG_NET_VENDOR_DLINK) += dlink/
 obj-$(CONFIG_NET_VENDOR_EMULEX) += emulex/
+obj-$(CONFIG_NET_VENDOR_EZCHIP) += ezchip/
 obj-$(CONFIG_NET_VENDOR_EXAR) += neterion/
 obj-$(CONFIG_NET_VENDOR_FARADAY) += faraday/
 obj-$(CONFIG_NET_VENDOR_FREESCALE) += freescale/
diff --git a/drivers/net/ethernet/ezchip/Kconfig 
b/drivers/net/ethernet/ezchip/Kconfig
new file mode 100644
index 000..d031177
--- /dev/null
+++ b/drivers/net/ethernet/ezchip/Kconfig
@@ -0,0 +1,27 @@
+#
+# EZchip network device configuration
+#
+
+config NET_VENDOR_EZCHIP
+   bool "EZchip devices"
+   default y
+   ---help---
+     If you have a network (Ethernet) device belonging to this class, say Y
+     and read the Ethernet-HOWTO, available from
+     <http://www.tldp.org/docs.html#howto>.
+
+     Note that the answer to this question doesn't directly affect the
+     kernel: saying N will just cause the configurator to skip all
+     the questions about EZchip devices. If you say Y, you will be asked for
+     your specific device in the following questions.
+
+if NET_VENDOR_EZCHIP
+
+config EZCHIP_NPS_MANAGEMENT_ENET
+   tristate "EZchip NPS management enet support"
+   ---help---
+     Simple LAN device for debug or management purposes.
+     Device supports interrupts for RX and TX(completion).
+     Device does not have DMA ability.
+
+endif
diff --git a/drivers/net/ethernet/ezchip/Makefile 
b/drivers/net/ethernet/ezchip/Makefile
new file mode 100644
index 000..e490176
--- /dev/null
+++ b/drivers/net/ethernet/ezchip/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_EZCHIP_NPS_MANAGEMENT_ENET) += nps_enet.o
diff --git a/drivers/net/ethernet/ezchip/nps_enet.c 
b/drivers/net/ethernet/ezchip/nps_enet.c
new file mode 100644
index 000..df3d5b0
--- /dev/null
+++ b/drivers/net/ethernet/ezchip/nps_enet.c
@@ -0,0 +1,652 @@
+/*
+ * Copyright(c) 2015 EZchip Technologies.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software

RE: [PATCH 04/11] IB/cm: Expose DGID in SIDR request events

2015-06-16 Thread Hefty, Sean

> The idea is to allow SIDR request to be sorted by the GID, when we will
> have alias GIDs for IPoIB.

Please limit this series, or at least the early patches in this series, to 
simply moving the de-mux out of the ib_cm and into the rdma_cm.

Re: [RFC PATCH net-next 0/4] switchdev: avoid duplicate packet forwarding

2015-06-16 Thread Scott Feldman

On Mon, Jun 15, 2015 at 11:04 PM, Jiri Pirko j...@resnulli.us wrote:
 Tue, Jun 16, 2015 at 01:25:51AM CEST, da...@davemloft.net wrote:
From: sfel...@gmail.com
Date: Sat, 13 Jun 2015 11:04:26 -0700

 The switchdev port driver must do two things:

 1) Generate a fwd_mark for each switch port, using some unique key of the
switch device (and optionally port).  This is a one-time operation done
when port's netdev is setup.

 2) On packet ingress from port, mark the skb with the ingress port's
fwd_mark.  If the device supports it, it's useful to only mark skbs
which were already forwarded by the device.  If the device does not
support such indication, all skbs can be marked, even if they're
local dst.

 Two new 32-bit fields are added to struct sk_buff and struct netdevice to
 hold the fwd_mark.  I've wrapped these with CONFIG_NET_SWITCHDEV for now. I
 tried using skb->mark for this purpose, but ebtables can overwrite the
 skb->mark before the bridge gets it, so that will not work.

 In general, this fwd_mark can be used for any case where a packet is
 forwarded by the device and a copy is sent to the CPU, to avoid the kernel
 re-forwarding the packet.  sFlow is another use-case that comes to mind,
 but I haven't explored the details.

Generally I'm against adding new fields to sk_buff but I'm trying to be
open minded. :-)

About the per-device fwd_mark, if the key attribute is uniqueness,
let's just do it right and use something like lib/idr.c to generate
truly unique indices at probe time for all devices using this
facility.  I like that better than having them be unique by a happy
accident.

 We already have a per-device unique key: dev->ifindex.
 That should be good for fwd_mark purposes I believe.

It would be great if we could use dev->ifindex, but fwd_mark is really
meant to mark device ports that belong to a group.  In the case of a bridge,
the device ports in the bridge should all have the same mark.  And
another device's ports in the same bridge would have a different mark
(so we can't use the bridge's dev->ifindex).  On ingress, the skb is
marked with the ingress port's mark.  If the skb is to be forwarded
out an egress port, the skb mark is compared with the egress port's mark.
If the marks match, then the device has already forwarded the pkt so the
kernel can consume_skb to avoid duplicate pkts on the wire.

So what we need is a unique mark for device ports within a fwding
group, such as a bridge.

I'm investigating Dave's suggestion to use IDR.  I think this will work...
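
The ingress/egress comparison described above can be sketched as follows. This is a rough illustration only: struct demo_port, struct demo_pkt and both helpers are hypothetical and do not correspond to the RFC's actual fields or functions, and treating a mark of 0 as "unmarked" is an assumption made for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-port state: ports of one device in one bridge share a mark. */
struct demo_port {
	uint32_t fwd_mark;
};

/* Hypothetical per-packet state standing in for the new skb field. */
struct demo_pkt {
	uint32_t fwd_mark;
};

/* On ingress, stamp the packet with the mark of the port it arrived on. */
static void demo_mark_on_ingress(struct demo_pkt *pkt, const struct demo_port *in)
{
	pkt->fwd_mark = in->fwd_mark;
}

/*
 * On egress, a matching non-zero mark means the device already
 * forwarded this packet in hardware, so software should not send a
 * second copy out of this port.
 */
static bool demo_skip_on_egress(const struct demo_pkt *pkt, const struct demo_port *out)
{
	return pkt->fwd_mark != 0 && pkt->fwd_mark == out->fwd_mark;
}
```

Ports belonging to a different device in the same bridge would carry a different mark, so packets crossing between devices are still forwarded by the kernel.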

[PATCH] stmmac: explicitly zero des0 & des1 on init

2015-06-16 Thread Alexey Brodkin

The current implementation of the descriptor init procedure only takes care
of the ownership flag, while it is perfectly possible for the underlying
memory to be filled with garbage on boot or driver installation.

Randomly set flags in non-zeroed des0 and des1 fields may lead to
unpredictable behavior of the GMAC DMA block.

The solution to this problem is as simple as explicitly zeroing both the
des0 and des1 fields of all buffer descriptors.

Signed-off-by: Alexey Brodkin abrod...@synopsys.com
Cc: Giuseppe Cavallaro peppe.cavall...@st.com
Cc: arc-linux-...@synopsys.com
Cc: linux-ker...@vger.kernel.org
Cc: sta...@vger.kernel.org
---
 drivers/net/ethernet/stmicro/stmmac/descs.h | 2 ++
 drivers/net/ethernet/stmicro/stmmac/enh_desc.c  | 3 ++-
 drivers/net/ethernet/stmicro/stmmac/norm_desc.c | 3 ++-
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/descs.h 
b/drivers/net/ethernet/stmicro/stmmac/descs.h
index ad39960..799c292 100644
--- a/drivers/net/ethernet/stmicro/stmmac/descs.h
+++ b/drivers/net/ethernet/stmicro/stmmac/descs.h
@@ -158,6 +158,8 @@ struct dma_desc {
u32 buffer2_size:13;
u32 reserved4:3;
} etx;  /* -- enhanced -- */
+
+   u64 all_flags;
} des01;
unsigned int des2;
unsigned int des3;
diff --git a/drivers/net/ethernet/stmicro/stmmac/enh_desc.c 
b/drivers/net/ethernet/stmicro/stmmac/enh_desc.c
index 1e2bcf5..7d9 100644
--- a/drivers/net/ethernet/stmicro/stmmac/enh_desc.c
+++ b/drivers/net/ethernet/stmicro/stmmac/enh_desc.c
@@ -240,6 +240,7 @@ static int enh_desc_get_rx_status(void *data, struct stmmac_extra_stats *x,
 static void enh_desc_init_rx_desc(struct dma_desc *p, int disable_rx_ic,
   int mode, int end)
 {
+   p->des01.all_flags = 0;
p->des01.erx.own = 1;
p->des01.erx.buffer1_size = BUF_SIZE_8KiB - 1;
 
@@ -254,7 +255,7 @@ static void enh_desc_init_rx_desc(struct dma_desc *p, int disable_rx_ic,
 
 static void enh_desc_init_tx_desc(struct dma_desc *p, int mode, int end)
 {
-   p->des01.etx.own = 0;
+   p->des01.all_flags = 0;
if (mode == STMMAC_CHAIN_MODE)
ehn_desc_tx_set_on_chain(p, end);
else
diff --git a/drivers/net/ethernet/stmicro/stmmac/norm_desc.c 
b/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
index 35ad4f4..48c3456 100644
--- a/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
+++ b/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
@@ -123,6 +123,7 @@ static int ndesc_get_rx_status(void *data, struct stmmac_extra_stats *x,
 static void ndesc_init_rx_desc(struct dma_desc *p, int disable_rx_ic, int mode,
    int end)
 {
+   p->des01.all_flags = 0;
p->des01.rx.own = 1;
p->des01.rx.buffer1_size = BUF_SIZE_2KiB - 1;
 
@@ -137,7 +138,7 @@ static void ndesc_init_rx_desc(struct dma_desc *p, int disable_rx_ic, int mode,
 
 static void ndesc_init_tx_desc(struct dma_desc *p, int mode, int end)
 {
-   p->des01.tx.own = 0;
+   p->des01.all_flags = 0;
if (mode == STMMAC_CHAIN_MODE)
ndesc_tx_set_on_chain(p, end);
else
-- 
2.4.2
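
The effect of the all_flags overlay can be shown with a simplified sketch. struct demo_desc below is a hypothetical two-word descriptor, not the real struct dma_desc layout, and the sketch assumes the bitfields pack into exactly 64 bits, which holds for GCC and Clang on common targets but is implementation-defined in general.

```c
#include <stdint.h>

/* Hypothetical simplified descriptor: bitfields overlaid by one 64-bit word. */
struct demo_desc {
	union {
		struct {
			unsigned int status:31;
			unsigned int own:1;
			unsigned int buffer1_size:13;
			unsigned int reserved:19;
		} rx;
		uint64_t all_flags; /* covers both 32-bit words at once */
	} des01;
};

/*
 * Initialise an RX descriptor: clear every flag first so stale memory
 * contents cannot leak into fields we do not set explicitly, then set
 * ownership and the buffer size.
 */
static void demo_init_rx_desc(struct demo_desc *p, unsigned int buf_size)
{
	p->des01.all_flags = 0;
	p->des01.rx.own = 1;
	p->des01.rx.buffer1_size = buf_size - 1;
}
```

Without the first assignment, status and reserved would keep whatever bits happened to be in memory, which is the situation the patch description warns about.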


Re: [PATCH] crush:Make the function crush_ln static

On Tue, Jun 16, 2015 at 7:18 PM, nick xerofo...@gmail.com wrote:
 Ilya,
 That's fine. I do have an unrelated question, though. I have over 90 patches
 lying around in terms of cleanups/fixes and need help getting them merged.
 Should I just resend them or wait until the end of this merge window?

Are all of those ceph patches?

Thanks,

Ilya

Re: [PATCH v2 net-next 0/3] bpf: share helpers between tracing and networking

2015-06-16 Thread Alexei Starovoitov


On 6/16/15 2:19 AM, Daniel Borkmann wrote:

if you really want to, you
could go via skb->sk->sk_socket->file and then retrieve credentials
from there for egress side (you can have a look at xt_owner). You'd
need a different *_proto helper for tc in that case, which would
then map to BPF_FUNC_get_current_uid_gid, etc. But that doesn't work
for ingress however, even if you would have early demux, so you
would need to let the eBPF helper function return an error code in
that case.


I was looking at cls_flow to do exactly that, but with a different helper
name, like bpf_get_socket_uid_gid(). The use case is to collect
network statistics per-user and per-process. I think Android is still using
some out-of-tree hacks for that. Ingress indeed is not solved by this
skb->sk->sk_socket approach. I considered kprobe style, but accessing
skb->len via probe_read is kernel specific, nonportable and slow-ish.
Ideally we would allow a blend of tracing and networking programs,
then the best solution would be one or two stable tracepoints in
the networking stack where the skb is visible and the receiving/transmitting
task is also visible, then skb->len and task->pid together would give a nice
foundation for accurate stats.

Re: [PATCH] crush:Make the function crush_ln static

On Tue, Jun 16, 2015 at 8:22 PM, nick xerofo...@gmail.com wrote:


 On 2015-06-16 01:16 PM, Ilya Dryomov wrote:
 On Tue, Jun 16, 2015 at 8:07 PM, nick xerofo...@gmail.com wrote:


 On 2015-06-16 12:47 PM, Ilya Dryomov wrote:
 On Tue, Jun 16, 2015 at 7:18 PM, nick xerofo...@gmail.com wrote:
 Ilya,
 That's fine. I do have an unrelated question, though. I have over 90 patches
 lying around in terms of cleanups/fixes and need help getting them merged.
 Should I just resend them or wait until the end of this merge window?

 Are all of those ceph patches?

 Thanks,

 Ilya

 Ilya,
 Sorry for the second email, but two of them are; I just checked. Would you
 like me to resend these two patches?

 No need to resend, just point them out in a reply.

 Thanks,

 Ilya

 The two patches have these subject headers:
 ceph:Remove the unused marco AES_KEY_SIZE
 ceph:Make the function ceph_monmap_contains bool
 If you can't find them I will resend them.

I can't - looks like ceph-devel wasn't CC'ed.  Please resend.

Thanks,

Ilya

[PATCH V2 net-next] net: rds: use for_each_sg() for scatterlist parsing

2015-06-16 Thread Fabian Frederick

This patch also renames sg to sglist and aligns function parameters.
See Documentation/DMA-API.txt - Part Id for scatterlist details

Signed-off-by: Fabian Frederick f...@skynet.be
---
This is untested.
V2: reorder variables (suggested by David S. Miller)

 net/rds/ib.h | 22 ++
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/net/rds/ib.h b/net/rds/ib.h
index c36d713..2de2898 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -235,28 +235,34 @@ extern struct workqueue_struct *rds_ib_wq;
  * doesn't define it.
  */
 static inline void rds_ib_dma_sync_sg_for_cpu(struct ib_device *dev,
-   struct scatterlist *sg, unsigned int sg_dma_len, int direction)
+ struct scatterlist *sglist,
+ unsigned int sg_dma_len,
+ int direction)
 {
+   struct scatterlist *sg;
unsigned int i;
 
-   for (i = 0; i < sg_dma_len; ++i) {
+   for_each_sg(sglist, sg, sg_dma_len, i) {
ib_dma_sync_single_for_cpu(dev,
-   ib_sg_dma_address(dev, &sg[i]),
-   ib_sg_dma_len(dev, &sg[i]),
+   ib_sg_dma_address(dev, sg),
+   ib_sg_dma_len(dev, sg),
direction);
}
 }
 #define ib_dma_sync_sg_for_cpu rds_ib_dma_sync_sg_for_cpu
 
 static inline void rds_ib_dma_sync_sg_for_device(struct ib_device *dev,
-   struct scatterlist *sg, unsigned int sg_dma_len, int direction)
+struct scatterlist *sglist,
+unsigned int sg_dma_len,
+int direction)
 {
+   struct scatterlist *sg;
unsigned int i;
 
-   for (i = 0; i < sg_dma_len; ++i) {
+   for_each_sg(sglist, sg, sg_dma_len, i) {
ib_dma_sync_single_for_device(dev,
-   ib_sg_dma_address(dev, &sg[i]),
-   ib_sg_dma_len(dev, &sg[i]),
+   ib_sg_dma_address(dev, sg),
+   ib_sg_dma_len(dev, sg),
direction);
}
 }
-- 
2.4.2
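
As background for this conversion: the kernel's for_each_sg() advances with sg_next(), so it also works when entries are not laid out as one flat array, which plain sg[i] indexing silently assumes. The toy sketch below mimics that with an explicit next pointer. struct demo_sg, demo_for_each_sg and demo_total_len() are hypothetical, and the real scatterlist encodes chaining differently.

```c
#include <stddef.h>

/* Toy scatterlist entry; an explicit link replaces the kernel's chain encoding. */
struct demo_sg {
	unsigned int length;
	struct demo_sg *next;
};

/* Walk at most nr entries by following links instead of indexing an array. */
#define demo_for_each_sg(sglist, sg, nr, i) \
	for ((i) = 0, (sg) = (sglist); (i) < (nr) && (sg) != NULL; \
	     (i)++, (sg) = (sg)->next)

/* Sum the lengths of the first nents entries of the list. */
static unsigned int demo_total_len(struct demo_sg *sglist, unsigned int nents)
{
	struct demo_sg *sg;
	unsigned int i;
	unsigned int total = 0;

	demo_for_each_sg(sglist, sg, nents, i)
		total += sg->length;

	return total;
}
```

The entries in the usage below are separate objects rather than array elements, which is the layout where array indexing would read the wrong memory.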


Re: [PATCH iproute2] ss: Include -E option for socket destroy events

2015-06-16 Thread Eric Dumazet

On Tue, 2015-06-16 at 12:02 -0400, Craig Gallek wrote:
 Use the IPv4/IPv6/TCP/UDP multicast groups of NETLINK_SOCK_DIAG
 to filter and display socket statistics as they are destroyed.
 
 Kernel support patch series: 24029a3603cfa633e8bc2b3fb3e48e76c497831d
 
 Signed-off-by: Craig Gallek kr...@google.com
 ---
  include/linux/inet_diag.h |  3 +-
  include/linux/sock_diag.h | 10 +++
  misc/ss.c | 72 
 ++-
  3 files changed, 83 insertions(+), 2 deletions(-)

I am not sure the filter works, and apparently some fields are not properly
reported (the source port, for example, looks wrong).

(Note: you probably need to filter in user space, but ss should already
have support for this.)

lpaa23:~# ./ss -Eit dst 10.246.7.152
State   Recv-Q Send-Q Local Address:Port Peer Address:Port
UNCONN  0  1 :::10.246.7.151:*  :::10.126.178.16:14354
 wscale:7,7 rto:221 rtt:20.165/5.91 ato:40 mss:1448 cwnd:10 send 5.7Mbps lastsnd:21 lastrcv:15023 lastack:1 pacing_rate 11.5Mbps unacked:1 rcv_rtt:20 rcv_space:28960
UNCONN  0  1   127.0.0.1:* 127.0.0.1:9551
 rto:1000 mss:524 cwnd:10 lastsnd:67817 lastrcv:67817 lastack:67817 unacked:1
UNCONN  0  1   127.0.0.1:* 127.0.0.1:9551
 rto:1000 mss:524 cwnd:10 lastsnd:67818 lastrcv:67818 lastack:67818 unacked:1
UNCONN  0  1   127.0.0.1:* 127.0.0.1:9551
 rto:1000 mss:524 cwnd:10 lastsnd:68742 lastrcv:68742 lastack:68742 unacked:1
UNCONN  0  0  10.246.7.151:*  10.246.7.152:50227
 wscale:7,7 rto:201 rtt:0.203/0.209 mss:1448 cwnd:254 ssthresh:251 send 14494.3Mbps lastsnd:1 lastrcv:71533 pacing_rate 28970.7Mbps retrans:0/5 rcv_space:29200
UNCONN  0  0  10.246.7.151:*  10.246.7.152:12865
 wscale:7,7 rto:201 rtt:0.363/0.375 ato:40 mss:1448 cwnd:10 ssthresh:293 send 319.1Mbps lastsnd:1004 lastrcv:1 lastack:1 pacing_rate 636.7Mbps rcv_rtt:126.125 rcv_space:29200
UNCONN  0  1   127.0.0.1:* 127.0.0.1:9551
 rto:1000 mss:524 cwnd:10 lastsnd:73036 lastrcv:73036 lastack:73036 unacked:1
^C



Re: [PATCH] crush:Make the function crush_ln static

On Tue, Jun 16, 2015 at 7:57 PM, nick xerofo...@gmail.com wrote:


 On 2015-06-16 12:47 PM, Ilya Dryomov wrote:
 On Tue, Jun 16, 2015 at 7:18 PM, nick xerofo...@gmail.com wrote:
 Ilya,
 That's fine. I do have an unrelated question, though. I have over 90 patches
 lying around in terms of cleanups/fixes and need help getting them merged.
 Should I just resend them or wait until the end of this merge window?

 Are all of those ceph patches?

 Thanks,

 Ilya

 No, they're all over the networking stack, including drivers.

I'd suggest reaching out to the respective maintainers, especially if
those ~90 patches can be broken into a few chunks that a small number
of people can deal with.

If that doesn't get you anywhere, try resending the whole batch to
triv...@kernel.org.  Jiri may be able to merge it through his trivial
tree.

Thanks,

Ilya

Re: displayed name changed in ip link show for bridge- and other interfaces

2015-06-16 Thread Oliver Hartkopp


On 16.06.2015 19:35, Oliver Hartkopp wrote:


ps. will apply the patch from Nicolas if it fixes the ip output.


No it didn't - I have no bridging configured in my kernel %-)

[PATCH net-next] packet: free packet_rollover after synchronize_net

2015-06-16 Thread Willem de Bruijn

From: Willem de Bruijn will...@google.com

Destruction of the po->rollover must be delayed until there are no
more packets in flight that can access it. The field is destroyed in
packet_release, before synchronize_net. Delay using rcu.

Fixes: 0648ab70afe6 ("packet: rollover prepare: per-socket state")

Suggested-by: Eric Dumazet eduma...@google.com
Signed-off-by: Willem de Bruijn will...@google.com
---
 net/packet/af_packet.c | 3 ++-
 net/packet/internal.h  | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index fd51641..20e8c40 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1634,7 +1634,8 @@ static void fanout_release(struct sock *sk)
}
mutex_unlock(&fanout_mutex);
 
-   kfree(po->rollover);
+   if (po->rollover)
+       kfree_rcu(po->rollover, rcu);
 }
 
 static const struct proto_ops packet_ops;
diff --git a/net/packet/internal.h b/net/packet/internal.h
index c035d26..e20b3e8 100644
--- a/net/packet/internal.h
+++ b/net/packet/internal.h
@@ -89,6 +89,7 @@ struct packet_fanout {
 
 struct packet_rollover {
int sock;
+   struct rcu_head rcu;
atomic_long_t   num;
atomic_long_t   num_huge;
atomic_long_t   num_failed;
-- 
2.2.0.rc0.207.ga3a616c


Re: [PATCH] crush:Make the function crush_ln static

On Tue, Jun 16, 2015 at 8:07 PM, nick xerofo...@gmail.com wrote:


 On 2015-06-16 12:47 PM, Ilya Dryomov wrote:
 On Tue, Jun 16, 2015 at 7:18 PM, nick xerofo...@gmail.com wrote:
 Ilya,
 That's fine. I do have an unrelated question, though. I have over 90 patches 
 lying around in terms of cleanups/
 fixes and need help getting them merged. Should I just resend them or wait 
 until the end of this merge window.

 Are all of those ceph patches?

 Thanks,

 Ilya

 Ilya,
 Sorry for the second email but two of them are. I just checked, would you 
 like me to resend
 these two patches.

No need to resend, just point them out in a reply.

Thanks,

Ilya

[PATCH] fm10k: Don't assume page fragments are page size

2015-06-16 Thread Alexander Duyck

This change pulls out the optimization that assumed that all fragments
would be limited to page size.  That hasn't been the case for some time now
and to assume this is incorrect as the TCP allocator can provide up to a
32K page fragment.

Signed-off-by: Alexander Duyck alexander.h.du...@redhat.com
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c |7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c 
b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 982fdcdc795b..620ff5e9dc59 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -1079,9 +1079,7 @@ netdev_tx_t fm10k_xmit_frame_ring(struct sk_buff *skb,
struct fm10k_tx_buffer *first;
int tso;
u32 tx_flags = 0;
-#if PAGE_SIZE > FM10K_MAX_DATA_PER_TXD
unsigned short f;
-#endif
u16 count = TXD_USE_COUNT(skb_headlen(skb));
 
/* need: 1 descriptor per page * PAGE_SIZE/FM10K_MAX_DATA_PER_TXD,
@@ -1089,12 +1087,9 @@ netdev_tx_t fm10k_xmit_frame_ring(struct sk_buff *skb,
 *   + 2 desc gap to keep tail from touching head
 * otherwise try next time
 */
-#if PAGE_SIZE > FM10K_MAX_DATA_PER_TXD
for (f = 0; f < skb_shinfo(skb)->nr_frags; f++)
count += TXD_USE_COUNT(skb_shinfo(skb)->frags[f].size);
-#else
-   count += skb_shinfo(skb)->nr_frags;
-#endif
+
if (fm10k_maybe_stop_tx(tx_ring, count + 3)) {
tx_ring->tx_stats.tx_busy++;
return NETDEV_TX_BUSY;
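
The descriptor accounting that the patch makes unconditional can be sketched in userspace as follows. This is only an illustration: MAX_DATA_PER_TXD is assumed here to be 16 KiB, and txd_use_count() and count_descriptors() are hypothetical stand-ins for the driver's TXD_USE_COUNT() macro and the loop in fm10k_xmit_frame_ring().

```c
#include <assert.h>
#include <stddef.h>

/* Assumed per-descriptor data limit (16 KiB); illustrative value only. */
#define MAX_DATA_PER_TXD (1u << 14)

/* Descriptors needed for one buffer: ceiling division by the limit. */
static unsigned int txd_use_count(unsigned int size)
{
	return (size + MAX_DATA_PER_TXD - 1) / MAX_DATA_PER_TXD;
}

/* Sum the descriptors for the linear head plus every fragment. */
static unsigned int count_descriptors(unsigned int headlen,
				      const unsigned int *frag_sizes,
				      size_t nr_frags)
{
	unsigned int count = txd_use_count(headlen);
	size_t f;

	for (f = 0; f < nr_frags; f++)
		count += txd_use_count(frag_sizes[f]);

	return count;
}
```

With a 32K fragment, counting one descriptor per fragment would come up one short, which is the case the removed #else branch got wrong.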


[PATCH net] packet: avoid out of bounds read in round robin fanout

2015-06-16 Thread Willem de Bruijn

From: Willem de Bruijn will...@google.com

PACKET_FANOUT_LB computes f->rr_cur such that it is modulo
f->num_members. It returns the old value unconditionally, but
f->num_members may have changed since the last store. This can be
fixed with

  -return cur
  +return cur < num ? : 0;

When modifying the logic, simplify it further by replacing the loop
with an unconditional atomic increment.

Fixes: dc99f600698d (packet: Add fanout support.)
Suggested-by: Eric Dumazet eduma...@google.com
Signed-off-by: Willem de Bruijn will...@google.com
---
 net/packet/af_packet.c | 19 ++-
 1 file changed, 2 insertions(+), 17 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index b5989c6..efd35e8 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1272,16 +1272,6 @@ static void packet_sock_destruct(struct sock *sk)
sk_refcnt_debug_dec(sk);
 }
 
-static int fanout_rr_next(struct packet_fanout *f, unsigned int num)
-{
-   int x = atomic_read(&f->rr_cur) + 1;
-
-   if (x >= num)
-   x = 0;
-
-   return x;
-}
-
 static unsigned int fanout_demux_hash(struct packet_fanout *f,
  struct sk_buff *skb,
  unsigned int num)
@@ -1293,13 +1283,8 @@ static unsigned int fanout_demux_lb(struct packet_fanout 
*f,
struct sk_buff *skb,
unsigned int num)
 {
-   int cur, old;
-
-   cur = atomic_read(&f->rr_cur);
-   while ((old = atomic_cmpxchg(&f->rr_cur, cur,
-fanout_rr_next(f, num))) != cur)
-   cur = old;
-   return cur;
+   unsigned int val = atomic_inc_return(&f->rr_cur);
+   return val % num;
 }
 
 static unsigned int fanout_demux_cpu(struct packet_fanout *f,
-- 
2.2.0.rc0.207.ga3a616c
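
A minimal userspace sketch of the new round-robin selection, assuming C11 atomics in place of the kernel's atomic_t. struct fanout_model and demux_lb() are hypothetical names used only for illustration. The point is that the counter is never stored pre-reduced: the modulo is applied against the current member count on every call, so the result stays in range even if the count changed since the last call.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for struct packet_fanout: only the counter. */
struct fanout_model {
	atomic_uint rr_cur;
};

/*
 * Pick the next member round-robin.  The free-running counter is
 * reduced modulo the current num, so the result is always < num.
 * num must be non-zero.
 */
static unsigned int demux_lb(struct fanout_model *f, unsigned int num)
{
	/* atomic_fetch_add returns the old value; +1 mimics an inc-and-return. */
	unsigned int val = atomic_fetch_add(&f->rr_cur, 1) + 1;

	return val % num;
}
```

Unsigned wraparound of the counter is well defined, so no reset logic is needed.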


Re: [PATCH 6/7] slub: improve bulk alloc strategy

On Mon, 15 Jun 2015 17:52:46 +0200 Jesper Dangaard Brouer bro...@redhat.com 
wrote:

 Call slowpath __slab_alloc() from within the bulk loop, as the
 side-effect of this call likely repopulates c->freelist.
 
 Choose to reenable local IRQs while calling slowpath.
 
 Saving some optimizations for later.  E.g. it is possible to
 extract parts of __slab_alloc() and avoid the unnecessary and
 expensive (37 cycles) local_irq_{save,restore}.  For now, be
 happy calling __slab_alloc() this lower icache impact of this
 func and I don't have to worry about correctness.
 
 ...

 --- a/mm/slub.c
 +++ b/mm/slub.c
 @@ -2776,8 +2776,23 @@ bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t 
 flags, size_t size,
   for (i = 0; i < size; i++) {
   void *object = c->freelist;
  
 - if (!object)
 - break;
 + if (unlikely(!object)) {
 + c->tid = next_tid(c->tid);
 + local_irq_enable();
 +
 + /* Invoke slow path one time, then retry fastpath
 +  * as side-effect have updated c->freelist
 +  */

That isn't very grammatical.

Block comments are formatted

/*
 * like this
 */

please.


 + p[i] = __slab_alloc(s, flags, NUMA_NO_NODE,
 + _RET_IP_, c);
 + if (unlikely(!p[i])) {
 + __kmem_cache_free_bulk(s, i, p);
 + return false;
 + }
 + local_irq_disable();
 + c = this_cpu_ptr(s->cpu_slab);
 + continue; /* goto for-loop */
 + }
  


Re: [PATCH net-next 00/15] Simplify netfilter and network namespaces

2015-06-16 Thread Eric W. Biederman

Pablo Neira Ayuso pa...@netfilter.org writes:

 On Mon, Jun 15, 2015 at 07:26:13PM -0500, Eric W. Biederman wrote:
 [...]
 So what I am in the processes of doing is reviewing and testing
 the combined set of patches and hopefully I will have something
 for you soon (tomorrow?).  Unless Pablo has objections.

 Please, feel free to take over my patchset and improve it. That has
 consumed part of my weekend and I have several open branches in my
 internal tree that I need to push forward.

 So I'd be really happy if you polish them and get this done.

 Let me know if I can help in any case.

Will do.  I have found more bugs lurking in your boilerplate changes
than I am comfortable with so I am in the process of refactoring things
a bit more than you did so the sweeping changes are more pedantic
and much less likely to introduce bugs.

Eric


Re: [RFC PATCH net-next 0/4] switchdev: avoid duplicate packet forwarding

2015-06-16 Thread Jiri Pirko

Tue, Jun 16, 2015 at 06:47:47PM CEST, sfel...@gmail.com wrote:
On Mon, Jun 15, 2015 at 11:04 PM, Jiri Pirko j...@resnulli.us wrote:
 Tue, Jun 16, 2015 at 01:25:51AM CEST, da...@davemloft.net wrote:
From: sfel...@gmail.com
Date: Sat, 13 Jun 2015 11:04:26 -0700

 The switchdev port driver must do two things:

 1) Generate a fwd_mark for each switch port, using some unique key of the
switch device (and optionally port).  This is a one-time operation done
when port's netdev is setup.

 2) On packet ingress from port, mark the skb with the ingress port's
fwd_mark.  If the device supports it, it's useful to only mark skbs
which were already forwarded by the device.  If the device does not
support such indication, all skbs can be marked, even if they're
local dst.

 Two new 32-bit fields are added to struct sk_buff and struct netdevice to
 hold the fwd_mark.  I've wrapped these with CONFIG_NET_SWITCHDEV for now. I
 tried using skb->mark for this purpose, but ebtables can overwrite the
 skb->mark before the bridge gets it, so that will not work.

 In general, this fwd_mark can be used for any case where a packet is
 forwarded by the device and a copy is sent to the CPU, to avoid the kernel
 re-forwarding the packet.  sFlow is another use-case that comes to mind,
 but I haven't explored the details.

Generally I'm against adding new fields to sk_buff but I'm trying to be
open minded. :-)

About the per-device fwd_mark, if the key attribute is uniqueness,
let's just do it right and use something like lib/idr.c to generate
truly unique indices at probe time for all devices using this
facility.  I like that better than having them be unique by a happy
accident.

 We already have a per-device unique key: dev->ifindex.
 That should be good for fwd_mark purposes I believe.

It would be great if we could use dev->index, but fwd_mark is really
to mark device ports that belong to a group.  In the case of a bridge,
the device ports in the bridge should all have the same mark.  And
another device's ports in the same bridge would have a different mark
(so we can't use the bridge's dev->ifindex).  On ingress, the skb is
marked with the ingress port's mark.  If the skb is to be forwarded
out an egress port, the skb mark is compared with egress port's mark.
If marks compare, then the device has already forwarded the pkt so the
kernel can consume_skb to avoid duplicate pkts on the wire.

So what we need is a unique mark for device ports within a fwding
group, such as a bridge.

Yep, have a group of netdevs, pick one of them and use its ifindex for
the whole group.

I'm investigating Dave's suggestion to use IDR.  I think this will work...
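
The mark comparison discussed in this thread can be modelled in a few lines. This is a sketch under stated assumptions, not the proposed kernel code: struct port_model, struct pkt_model, mark_on_ingress() and should_forward() are hypothetical names, and treating a mark of 0 as "unmarked" is an assumption made here for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical port: all ports of one switch in a bridge share a mark. */
struct port_model {
	uint32_t fwd_mark;
};

/* Hypothetical packet: carries the mark of the port it arrived on. */
struct pkt_model {
	uint32_t fwd_mark;
};

/* On ingress, stamp the packet with the ingress port's mark. */
static void mark_on_ingress(struct pkt_model *pkt,
			    const struct port_model *in)
{
	pkt->fwd_mark = in->fwd_mark;
}

/*
 * On egress, a matching non-zero mark means the device has already
 * forwarded this packet in hardware, so software should not send
 * a duplicate.  Mark 0 is assumed to mean "unmarked".
 */
static bool should_forward(const struct pkt_model *pkt,
			   const struct port_model *out)
{
	return pkt->fwd_mark == 0 || pkt->fwd_mark != out->fwd_mark;
}
```

Ports of a second switch in the same bridge carry a different mark, so packets still get forwarded to them in software.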

Re: [Intel-wired-lan] [PATCH] fm10k: Don't assume page fragments are page size

Acked-by: Jacob Keller jacob.e.kel...@intel.com

Regards,
Jake

On Tue, 2015-06-16 at 11:47 -0700, Alexander Duyck wrote:
 This change pulls out the optimization that assumed that all 
 fragments
 would be limited to page size.  That hasn't been the case for some 
 time now
 and to assume this is incorrect as the TCP allocator can provide up 
 to a
 32K page fragment.
 
 Signed-off-by: Alexander Duyck alexander.h.du...@redhat.com
 ---
  drivers/net/ethernet/intel/fm10k/fm10k_main.c |7 +--
  1 file changed, 1 insertion(+), 6 deletions(-)
 
 diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c 
 b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
 index 982fdcdc795b..620ff5e9dc59 100644
 --- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
 +++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
 @@ -1079,9 +1079,7 @@ netdev_tx_t fm10k_xmit_frame_ring(struct 
 sk_buff *skb,
   struct fm10k_tx_buffer *first;
   int tso;
   u32 tx_flags = 0;
 -#if PAGE_SIZE > FM10K_MAX_DATA_PER_TXD
   unsigned short f;
 -#endif
   u16 count = TXD_USE_COUNT(skb_headlen(skb));
  
   /* need: 1 descriptor per page * 
 PAGE_SIZE/FM10K_MAX_DATA_PER_TXD,
 @@ -1089,12 +1087,9 @@ netdev_tx_t fm10k_xmit_frame_ring(struct 
 sk_buff *skb,
*   + 2 desc gap to keep tail from touching head
* otherwise try next time
*/
 -#if PAGE_SIZE > FM10K_MAX_DATA_PER_TXD
   for (f = 0; f < skb_shinfo(skb)->nr_frags; f++)
   count += TXD_USE_COUNT(skb_shinfo(skb)
 ->frags[f].size);
 -#else
 - count += skb_shinfo(skb)->nr_frags;
 -#endif
 +
   if (fm10k_maybe_stop_tx(tx_ring, count + 3)) {
   tx_ring->tx_stats.tx_busy++;
   return NETDEV_TX_BUSY;
 
 ___
 Intel-wired-lan mailing list
 intel-wired-...@lists.osuosl.org
 http://lists.osuosl.org/mailman/listinfo/intel-wired-lan

Re: [PATCH 1/7] slab: infrastructure for bulk object allocation and freeing

On Mon, 15 Jun 2015 17:51:56 +0200 Jesper Dangaard Brouer bro...@redhat.com 
wrote:

 +bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 + void **p)
 +{
 + return kmem_cache_alloc_bulk(s, flags, size, p);
 +}

hm, any call to this function is going to be nasty, brutal and short.

--- 
a/mm/slab.c~slab-infrastructure-for-bulk-object-allocation-and-freeing-v3-fix
+++ a/mm/slab.c
@@ -3425,7 +3425,7 @@ EXPORT_SYMBOL(kmem_cache_free_bulk);
 bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
void **p)
 {
-   return kmem_cache_alloc_bulk(s, flags, size, p);
+   return __kmem_cache_alloc_bulk(s, flags, size, p);
 }
 EXPORT_SYMBOL(kmem_cache_alloc_bulk);
 
_


Re: [PATCH 2/7] slub bulk alloc: extract objects from the per cpu slab

On Mon, 15 Jun 2015 17:52:07 +0200 Jesper Dangaard Brouer bro...@redhat.com 
wrote:

 From: Christoph Lameter c...@linux.com
 
 [NOTICE: Already in AKPM's quilt-queue]
 
 First piece: acceleration of retrieval of per cpu objects
 
 If we are allocating lots of objects then it is advantageous to disable
 interrupts and avoid the this_cpu_cmpxchg() operation to get these objects
 faster.
 
 Note that we cannot do the fast operation if debugging is enabled, because
 we would have to add extra code to do all the debugging checks.  And it
 would not be fast anyway.
 
 Note also that the requirement of having interrupts disabled
 avoids having to do processor flag operations.
 
 Allocate as many objects as possible in the fast way and then fall back to
 the generic implementation for the rest of the objects.
 
 ...

 --- a/mm/slub.c
 +++ b/mm/slub.c
 @@ -2759,7 +2759,32 @@ EXPORT_SYMBOL(kmem_cache_free_bulk);
  bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
   void **p)
  {
 - return kmem_cache_alloc_bulk(s, flags, size, p);
 + if (!kmem_cache_debug(s)) {
 + struct kmem_cache_cpu *c;
 +
 + /* Drain objects in the per cpu slab */
 + local_irq_disable();
 + c = this_cpu_ptr(s->cpu_slab);
 +
 + while (size) {
 + void *object = c->freelist;
 +
 + if (!object)
 + break;
 +
 + c->freelist = get_freepointer(s, object);
 + *p++ = object;
 + size--;
 +
 + if (unlikely(flags & __GFP_ZERO))
 + memset(object, 0, s->object_size);
 + }
 + c->tid = next_tid(c->tid);
 +
 + local_irq_enable();

It might be worth adding

if (!size)
return true;

here.  To avoid the pointless call to __kmem_cache_alloc_bulk().

It depends on the typical success rate of this allocation loop.  Do you
know what this is?

 + }
 +
 + return __kmem_cache_alloc_bulk(s, flags, size, p);
  }
  EXPORT_SYMBOL(kmem_cache_alloc_bulk);
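
The shape of the quoted fast path, together with the suggested early return, can be sketched in userspace like this. Everything here is a simplified model: struct free_obj, struct cpu_slab_model, slow_alloc_bulk() and alloc_bulk() are hypothetical names, malloc() stands in for the generic slow path, and the interrupt disabling and tid update are left out. obj_size is assumed to be at least sizeof(struct free_obj).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical free object: its first word links to the next free one. */
struct free_obj {
	struct free_obj *next;
};

/* Hypothetical stand-in for the per-cpu slab: just a freelist head. */
struct cpu_slab_model {
	struct free_obj *freelist;
};

/* Slow path stand-in: plain malloc; undoes its own work on failure. */
static bool slow_alloc_bulk(size_t obj_size, size_t size, void **p)
{
	size_t i;

	for (i = 0; i < size; i++) {
		p[i] = malloc(obj_size);
		if (!p[i]) {
			while (i--)
				free(p[i]);
			return false;
		}
	}
	return true;
}

/*
 * Drain the freelist first, then fall back to the slow path only for
 * what is still missing.  *from_fast reports how many objects came
 * from the freelist.  On slow path failure the objects already taken
 * from the freelist are NOT released here; that is the error path
 * problem patch 4/7 in this series is about.
 */
static bool alloc_bulk(struct cpu_slab_model *c, size_t obj_size,
		       size_t size, void **p, size_t *from_fast)
{
	size_t taken = 0;

	while (size) {
		struct free_obj *object = c->freelist;

		if (!object)
			break;

		c->freelist = object->next;
		*p++ = object;
		size--;
		taken++;
	}
	*from_fast = taken;

	if (!size)	/* early return: nothing left for the slow path */
		return true;

	return slow_alloc_bulk(obj_size, size, p);
}
```

Whether the early return pays off depends on how often the freelist alone satisfies the request, which is the question raised above.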


Re: [PATCH v4 3/3] net/xen-netback: Don't mix hexa and decimal with 0x in the printf format

2015-06-16 Thread Joe Perches

On Tue, 2015-06-16 at 23:07 +0300, Sergei Shtylyov wrote:
 On 06/16/2015 10:10 PM, Julien Grall wrote:
  Append 0x to all %x in order to avoid confusion while reading when there is other
  decimal value in the log.
[]
  @@ -874,7 +874,7 @@ static inline void xenvif_grant_handle_set(struct 
  xenvif_queue *queue,
  if (unlikely(queue->grant_tx_handle[pending_idx] !=
   NETBACK_INVALID_HANDLE)) {
  netdev_err(queue->vif->dev,
  -  "Trying to overwrite active handle! pending_idx: 
  %x\n",
  +  "Trying to overwrite active handle! pending_idx: 
  0x%x\n",
 
 Using %#x is shorter and does the same.

That's true, but it's also far less common.

$ git grep -E %#[\*\d\.]*x | wc -l
1419
$ git grep 0x% | wc -l
29844
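
For reference, the two spellings can be compared with the standard C library's snprintf(). This sketch says nothing about the kernel's own vsnprintf(), which is a separate implementation; fmt_explicit() and fmt_alt() are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format v with a literal "0x" prefix in the format string. */
static void fmt_explicit(char *buf, size_t len, unsigned int v)
{
	snprintf(buf, len, "0x%x", v);
}

/* Format v with the '#' (alternate form) flag. */
static void fmt_alt(char *buf, size_t len, unsigned int v)
{
	snprintf(buf, len, "%#x", v);
}
```

In standard C the two agree for every non-zero value; for zero the '#' flag adds no prefix, so the alternate form yields "0" where the literal prefix yields "0x0".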



Re: displayed name changed in ip link show for bridge- and other interfaces

2015-06-16 Thread Oliver Hartkopp


On 15.06.2015 17:54, Stephen Hemminger wrote:

On Mon, 15 Jun 2015 11:13:12 +0200
Nicolas Dichtel nicolas.dich...@6wind.com wrote:


Theoretically, virtual interfaces should advertise an IFLA_LINK to 0.
I don't know what is the best fix:
   - patching iproute2 to avoid this '@NONE'
   - patching the kernel (see below).



Sorry this is an ABI change. The kernel has to go back
to doing the same thing as before.



Isn't this too late right now at 4.1-rc8 stage???

At least the patch suggested for br_device.c at

http://marc.info/?l=linux-netdev&m=143435960111768&w=2

would be necessary in all networking drivers, right?

I currently see this @NONE stuff with virtual CAN devices too.

Regards,
Oliver

ps. will apply the patch from Nicolas if it fixes the ip output.

Re: [PATCH net] packet: avoid out of bounds read in round robin fanout

2015-06-16 Thread Willem de Bruijn

On Tue, Jun 16, 2015 at 5:07 PM, Willem de Bruijn will...@google.com wrote:
 From: Willem de Bruijn will...@google.com

 PACKET_FANOUT_LB computes f->rr_cur such that it is modulo
 f->num_members. It returns the old value unconditionally, but
 f->num_members may have changed since the last store. This can be
 fixed with

   -return cur
   +return cur < num ? : 0;

Well, that test is bad. Should be return cur < num ? cur : 0. But the
patch is more concise, anyway.


 When modifying the logic, simplify it further by replacing the loop
 with an unconditional atomic increment.

 Fixes: dc99f600698d (packet: Add fanout support.)
 Suggested-by: Eric Dumazet eduma...@google.com
 Signed-off-by: Willem de Bruijn will...@google.com

[PATCH RESEND] pkt_sched: sch_qfq: remove redundant -if- control statement

2015-06-16 Thread Andrea Parri

The control !hlist_unhashed() in qfq_destroy_agg() is unnecessary
because already performed in hlist_del_init(), so remove it.

Signed-off-by: Andrea Parri parri.and...@gmail.com
---
 net/sched/sch_qfq.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 3ec7e88..b8d73bc 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -339,8 +339,7 @@ static struct qfq_aggregate *qfq_choose_next_agg(struct 
qfq_sched *);
 
 static void qfq_destroy_agg(struct qfq_sched *q, struct qfq_aggregate *agg)
 {
-   if (!hlist_unhashed(&agg->nonfull_next))
-   hlist_del_init(&agg->nonfull_next);
+   hlist_del_init(&agg->nonfull_next);
q->wsum -= agg->class_weight;
if (q->wsum != 0)
q->iwsum = ONE_FP / q->wsum;
-- 
1.9.1
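
Why the outer test is redundant can be seen from a small userspace model of an hlist-style node. This is a sketch that mirrors the idea rather than the kernel headers: struct hnode, struct hhead, unhashed(), add_head() and del_init() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node: pprev points at whatever pointer refers to us. */
struct hnode {
	struct hnode *next;
	struct hnode **pprev;
};

struct hhead {
	struct hnode *first;
};

/* A node is "unhashed" when it is on no list. */
static int unhashed(const struct hnode *n)
{
	return !n->pprev;
}

static void add_head(struct hnode *n, struct hhead *h)
{
	struct hnode *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/*
 * Unlink and reinitialise.  The unhashed check lives inside, so
 * callers do not need to repeat it before calling.
 */
static void del_init(struct hnode *n)
{
	if (!unhashed(n)) {
		struct hnode *next = n->next;
		struct hnode **pprev = n->pprev;

		*pprev = next;
		if (next)
			next->pprev = pprev;
		n->next = NULL;
		n->pprev = NULL;
	}
}
```

Calling del_init() on a node that was never added, or calling it twice, is harmless in this model.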


[BUG,BISECTED] mvneta: second interface no more usable on mirabox

2015-06-16 Thread Arnaud Ebalard

Hi,

On Mirabox, the second ethernet interface is no more usable on 4.1-rc*
series (no packets coming out of the interface, when using dhclient for
instance). It works as expected on 4.0.

Bisecting the issue, I ended up on 898b2970e2c9 (mvneta: implement
SGMII-based in-band link state signaling). Reverting that commit gives 
me back the second interface.

Then, I also tested on a NETGEAR ReadyNAS 104, which is also powered by
the same SoC (Armada 370) and also has two (mvneta-supported) ethernet
interfaces. With an unmodified 4.1-rc8, only one of the two interfaces
is available. Reverting 898b2970e2c9 makes both usable again.

FWIW, mirabox and RN104 ethernet interfaces use RGMII.

Cheers,

a+

Re: [PATCH 4/7] slub: fix error path bug in kmem_cache_alloc_bulk

On Mon, 15 Jun 2015 17:52:26 +0200 Jesper Dangaard Brouer bro...@redhat.com 
wrote:

 The current kmem_cache/SLAB bulking API need to release all objects
 in case the layer cannot satisfy the full request.
 
 If __kmem_cache_alloc_bulk() fails, all allocated objects in array
 should be freed, but, __kmem_cache_alloc_bulk() can't know
 about objects allocated by this slub specific kmem_cache_alloc_bulk()
 function.

Can we fold patches 2, 3 and 4 into a single patch?

And maybe patch 5 as well.  I don't think we need all these
development-time increments in the permanent record.

[PATCH v4 1/3] net/xen-netfront: Correct printf format in xennet_get_responses

2015-06-16 Thread Julien Grall

rx->status is an int16_t, print it using %d rather than %u in order to
have a meaningful value when the field is negative.

Also use %u rather than %x for rx->offset.

Signed-off-by: Julien Grall julien.gr...@citrix.com
Reviewed-by: David Vrabel david.vra...@citrix.com
Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Cc: Boris Ostrovsky boris.ostrov...@oracle.com
Cc: netdev@vger.kernel.org

---
Changes in v4:
- Use %u for the rx->offset because offset is unsigned

Changes in v3:
- Use %d for the rx->offset too.

Changes in v2:
- Add David's Reviewed-by
---
 drivers/net/xen-netfront.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index e031c94..281720f 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -733,7 +733,7 @@ static int xennet_get_responses(struct netfront_queue 
*queue,
if (unlikely(rx->status < 0 ||
 rx->offset + rx->status > PAGE_SIZE)) {
if (net_ratelimit())
-   dev_warn(dev, "rx->offset: %x, size: %u\n",
+   dev_warn(dev, "rx->offset: %u, size: %d\n",
 rx->offset, rx->status);
xennet_move_rx_slot(queue, skb, ref);
err = -EINVAL;
-- 
2.1.4
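
The effect of the conversion specifier on a negative 16-bit status can be shown with a small userspace sketch. fmt_status_signed() and fmt_status_unsigned() are hypothetical helpers; the explicit cast in the second one keeps the call well defined while modelling what an unsigned conversion displays.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* int16_t is promoted to int when passed to a variadic function. */
static void fmt_status_signed(char *buf, size_t len, int16_t status)
{
	snprintf(buf, len, "%d", status);
}

/* Explicit conversion to unsigned: a negative value becomes a huge one. */
static void fmt_status_unsigned(char *buf, size_t len, int16_t status)
{
	snprintf(buf, len, "%u", (unsigned int)status);
}
```

For a status of -1 the signed form gives "-1", while the unsigned form gives UINT_MAX (4294967295 where unsigned int is 32 bits), which is not a meaningful value in a log line.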


Re: [PATCH net-next] packet: free packet_rollover after synchronize_net

2015-06-16 Thread Eric Dumazet

On Tue, 2015-06-16 at 12:51 -0400, Willem de Bruijn wrote:
 From: Willem de Bruijn will...@google.com
 
 Destruction of the po->rollover must be delayed until there are no
 more packets in flight that can access it. The field is destroyed in
 packet_release, before synchronize_net. Delay using rcu.
 
 Fixes: 0648ab70afe6 (packet: rollover prepare: per-socket state)
 
 Suggested-by: Eric Dumazet eduma...@google.com
 Signed-off-by: Willem de Bruijn will...@google.com

Acked-by: Eric Dumazet eduma...@google.com



Re: [PATCH/RFC net-next] openvswitch: allow output of MPLS packets on tunnel vports

2015-06-16 Thread Pravin Shelar

On Mon, Jun 15, 2015 at 5:39 PM, Simon Horman
simon.hor...@netronome.com wrote:
 Currently output of MPLS packets on tunnel vports is not allowed by the
 datapath and, moreover, flows that match on MPLS packets and output to
 tunnel vports are rejected by the datapath. The flows are rejected
 regardless of if they also output to non-tunnel vports which is allowed for
 MPLS packets and the following is logged by the kernel.

 openvswitch: netlink: Flow actions may not be safe on all matching packets.

 This patch addresses the above by allowing output of MPLS packets to tunnel
 vports.

 My recollection of adding MPLS support to the datapath was that a rather
 conservative approach was taken in order to minimise the chance of fallout.
 This patch proposes relaxing one restriction which was introduced at that
 time.

 My limited testing has not isolated any side effects of this change.

 Signed-off-by: Simon Horman simon.hor...@netronome.com
 ---
  net/openvswitch/flow_netlink.c | 3 ---
  1 file changed, 3 deletions(-)

 diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
 index 624e41c4267f..a5d3c0ae8ac8 100644
 --- a/net/openvswitch/flow_netlink.c
 +++ b/net/openvswitch/flow_netlink.c
 @@ -1847,9 +1847,6 @@ static int validate_set(const struct nlattr *a,
 break;

 case OVS_KEY_ATTR_TUNNEL:
 -   if (eth_p_mpls(eth_type))
 -   return -EINVAL;
 -
One of the problems is with setting skb->inner_protocol. MPLS and
tunnel both need to set the inner protocol field. So the outer encapsulation
would just overwrite the earlier inner protocol field on the packet transmit
path.

Re: [net-next 09/17] fm10k: trivial fixup message style to include a colon

On Tue, 2015-06-16 at 17:16 +0300, Sergei Shtylyov wrote:
 Hello.
 
 On 6/16/2015 4:47 PM, Jeff Kirsher wrote:
 
  From: Jacob Keller jacob.e.kel...@intel.com
 
  Fix up error message style to include a colon.
 
  Signed-off-by: Jacob Keller jacob.e.kel...@intel.com
  Tested-by: Krishneil Singh krishneil.k.si...@intel.com
  Signed-off-by: Jeff Kirsher jeffrey.t.kirs...@intel.com
  ---
drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
 
  diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c 
  b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
  index 445014a..5269b16 100644
  --- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
  +++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
  @@ -1773,7 +1773,7 @@ static int fm10k_probe(struct pci_dev *pdev,
 fm10k_driver_name);
  if (err) {
  dev_err(&pdev->dev,
  -   "pci_request_selected_regions failed 
  0x%x\n", err);
  +   "pci_request_selected_regions failed: 
  0x%x\n", err);
 
 I don't think printing error in hexadecimal makes much sense, so 
 you might 
 as well fix that format to %d...
 
 WBR, Sergei
 

Sure thing.

Regards,
Jake

[PATCH RFC net] neigh: do not modify unlinked entries

2015-06-16 Thread Julian Anastasov

The lockless lookups can return entry that is unlinked.
Sometimes they get reference before last neigh_cleanup_and_release,
sometimes they do not need reference. Later, any
modification attempts may result in the following problems:

1. entry is not destroyed immediately because neigh_update
can start the timer for dead entry, eg. on change to NUD_REACHABLE
state. As result, entry lives for some time but is invisible
and out of control.

2. __neigh_event_send can run in parallel with neigh_destroy
while refcnt=0 but if timer is started and expired refcnt can
reach 0 for second time leading to second neigh_destroy and
possible crash.

Thanks to Eric Dumazet and Ying Xue for their work and analyze
on the __neigh_event_send change.

Fixes: 767e97e1e0db (neigh: RCU conversion of struct neighbour)
Fixes: a263b3093641 (ipv4: Make neigh lookups directly in output packet path.)
Fixes: 6fd6ce2056de (ipv6: Do not depend on rt->n in ip6_finish_output2().)
Cc: Eric Dumazet eric.duma...@gmail.com
Cc: Ying Xue ying@windriver.com
Signed-off-by: Julian Anastasov j...@ssi.bg
---
 net/core/neighbour.c | 13 +
 1 file changed, 13 insertions(+)

  This is an RFC, so that it can get proper commit message,
testing and reports. In fact, I'm interested to see valid
stack dumps for the NEIGH: BUG, double timer add, state is %x
message without this patch and without any debug patches that
dump stack from neigh_hold or other places...

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 3de6542..2237c1b 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -957,6 +957,8 @@ int __neigh_event_send(struct neighbour *neigh, struct 
sk_buff *skb)
rc = 0;
if (neigh->nud_state & (NUD_CONNECTED | NUD_DELAY | NUD_PROBE))
goto out_unlock_bh;
+   if (neigh->dead)
+   goto out_dead;
 
if (!(neigh->nud_state & (NUD_STALE | NUD_INCOMPLETE))) {
if (NEIGH_VAR(neigh->parms, MCAST_PROBES) +
@@ -1013,6 +1015,13 @@ out_unlock_bh:
write_unlock(&neigh->lock);
local_bh_enable();
return rc;
+
+out_dead:
+   if (neigh->nud_state & NUD_STALE)
+   goto out_unlock_bh;
+   write_unlock_bh(&neigh->lock);
+   kfree_skb(skb);
+   return 1;
 }
 EXPORT_SYMBOL(__neigh_event_send);
 
@@ -1076,6 +1085,8 @@ int neigh_update(struct neighbour *neigh, const u8 
*lladdr, u8 new,
if (!(flags & NEIGH_UPDATE_F_ADMIN) &&
(old & (NUD_NOARP | NUD_PERMANENT)))
goto out;
+   if (neigh->dead)
+   goto out;
 
if (!(new & NUD_VALID)) {
neigh_del_timer(neigh);
@@ -1225,6 +1236,8 @@ EXPORT_SYMBOL(neigh_update);
  */
 void __neigh_set_probe_once(struct neighbour *neigh)
 {
+   if (neigh->dead)
+   return;
neigh->updated = jiffies;
if (!(neigh->nud_state & NUD_FAILED))
return;
-- 
1.9.3


pull request [net]: batman-adv 20150616

2015-06-16 Thread Antonio Quartulli

Hello David,

this is a pull request intended for net/linux-4.1.
I know it is rather late in the release cycle, but the fixes are
pretty small and they were worth being sent now.

Patch 1/2 fixes an overflow of a u32 variable that may happen while
computing the metric to select the best batman-GW in the network.
** It would be great if this patch could be queued for inclusion in
any stable release >= 3.2


Patch 2/2 prevents the Distributed ARP Table from forging unwanted
ARP replies on behalf of another host that may fool an Ethernet switch
sitting behind batman-adv.
** It would be great if this patch could be queued for inclusion in
any stable release >= 3.8


Please pull or let me know if something is wrong

Thanks a lot,
Antonio



The following changes since commit ac0a72a3e6e8d817f60ce4d9a8f3b43dc256d847:

  net/mlx4_core: Disable Granular QoS per VF under IB/Eth VPI configuration 
(2015-06-15 16:42:57 -0700)

are available in the git repository at:

  git://git.open-mesh.org/linux-merge.git tags/batman-adv-fix-for-davem

for you to fetch changes up to c065d51055924f6daad8e16307c364602b0f9805:

  batman-adv: avoid DAT to mess up LAN state (2015-06-16 11:13:12 +0200)


Included changes:
- fix gateway selection metric overflow
- avoid the Distributed ARP Table to fool Ethernet switches by forging
  unexpected packets


Antonio Quartulli (1):
  batman-adv: avoid DAT to mess up LAN state

Ruben Wisniewski (1):
  batman-adv: Avoid u32 overflow during gateway select

 net/batman-adv/distributed-arp-table.c | 18 +-
 net/batman-adv/gateway_client.c|  2 +-
 2 files changed, 14 insertions(+), 6 deletions(-)


[PATCH 1/2] batman-adv: Avoid u32 overflow during gateway select

2015-06-16 Thread Antonio Quartulli

From: Ruben Wisniewski ru...@freifunk-nrw.de

The gateway selection based on fast connections is using a single value
calculated from the average tq (0-255) and the download bandwidth (in
100Kibit). The formula for the first step (tq ** 2 * 10000 * bandwidth)
tends to overflow a u32 with low bandwidth settings like 50 [100KiBit]
and a tq value of over 92.

Changing this to a 64-bit unsigned integer supports a bandwidth_down of
up to ~2.8e10 [100KiBit] with a perfect tq of 255. This
is ~6.6 times higher than the maximum possible value of the gateway
announcement TVLV.

This problem only affects the non-default gw_sel_class 1.
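
The overflow described above can be sketched in isolation. This is a minimal illustration, assuming the first step is computed as tq * tq * 10000 * bandwidth (the scale factor for which the thresholds quoted in the commit message work out); the function names are hypothetical and are not taken from gateway_client.c.

```c
#include <stdint.h>

/* First step of the gateway factor, accumulated in 32 bits. Unsigned
 * arithmetic wraps modulo 2^32, so large products silently lose bits.
 */
static uint32_t gw_factor_step_u32(uint8_t tq_avg, uint32_t bandwidth_down)
{
	uint32_t factor = (uint32_t)tq_avg * tq_avg;

	factor *= 10000u;         /* assumed scale factor (100 * 100) */
	factor *= bandwidth_down; /* in units of 100 Kibit/s */
	return factor;
}

/* Same computation accumulated in 64 bits, as the patch does by
 * widening max_gw_factor and tmp_gw_factor.
 */
static uint64_t gw_factor_step_u64(uint8_t tq_avg, uint32_t bandwidth_down)
{
	uint64_t factor = (uint64_t)tq_avg * tq_avg;

	factor *= 10000u;
	factor *= bandwidth_down;
	return factor;
}
```

With bandwidth_down = 50, a tq of 92 yields 4232000000 in both versions, while a tq of 93 yields 4324500000 in 64 bits but wraps to 29532704 in 32 bits.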

Signed-off-by: Ruben Wisniewsi ru...@vfn-nrw.de
[s...@narfation.org: rewritten commit message, changed to kernel type]
Signed-off-by: Sven Eckelmann s...@narfation.org
Signed-off-by: Marek Lindner mareklind...@neomailbox.ch

Signed-off-by: Antonio Quartulli anto...@meshcoding.com
---
 net/batman-adv/gateway_client.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/batman-adv/gateway_client.c b/net/batman-adv/gateway_client.c
index 090828c..ca734f8 100644
--- a/net/batman-adv/gateway_client.c
+++ b/net/batman-adv/gateway_client.c
@@ -133,7 +133,7 @@ batadv_gw_get_best_gw_node(struct batadv_priv *bat_priv)
 	struct batadv_neigh_node *router;
 	struct batadv_neigh_ifinfo *router_ifinfo;
 	struct batadv_gw_node *gw_node, *curr_gw = NULL;
-	uint32_t max_gw_factor = 0, tmp_gw_factor = 0;
+	uint64_t max_gw_factor = 0, tmp_gw_factor = 0;
 	uint32_t gw_divisor;
 	uint8_t max_tq = 0;
 	uint8_t tq_avg;
-- 
2.4.3


[PATCH 2/2] batman-adv: avoid DAT to mess up LAN state

2015-06-16 Thread Antonio Quartulli

When a node running DAT receives an ARP request from the LAN for the
first time, it is likely that this node will request the ARP entry
through the distributed ARP table (DAT) in the mesh.

Once a DAT reply is received the asking node must check if the MAC
address for which the IP address has been asked is local. If it is, the
node must drop the ARP reply because the client should have replied on
its own locally.

Forwarding this reply means fooling any L2 bridge (e.g. Ethernet
switches) lying between the batman-adv node and the LAN. This happens
because the L2 bridge will think that the client sending the ARP reply
lies somewhere in the mesh, while this node is sitting in the same LAN.
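
The resulting drop decision can be sketched as a small standalone function. This is only an illustration under stated assumptions: is_my_client() and the local_clients table are hypothetical stand-ins for batadv_is_my_client() and the local translation table, and VLAN handling is left out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical stand-in for the table of clients attached to this node. */
static const uint8_t local_clients[][ETH_ALEN] = {
	{ 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 },
	{ 0x02, 0x00, 0x00, 0x00, 0x00, 0x02 },
};

/* Hypothetical simplification of batadv_is_my_client(): no VLAN id. */
static bool is_my_client(const uint8_t *mac)
{
	size_t i;

	for (i = 0; i < sizeof(local_clients) / sizeof(local_clients[0]); i++)
		if (memcmp(local_clients[i], mac, ETH_ALEN) == 0)
			return true;
	return false;
}

/* Returns true when an incoming ARP reply must be dropped instead of
 * being delivered to the soft interface.
 */
static bool dat_reply_dropped(const uint8_t *hw_src, const uint8_t *hw_dst)
{
	/* not addressed to one of my clients: nothing to deliver */
	bool dropped = !is_my_client(hw_dst);

	/* sent on behalf of one of my clients: it replies by itself */
	dropped |= is_my_client(hw_src);
	return dropped;
}
```

Only a reply whose destination is a local client and whose source is not a local client is delivered; every other combination is dropped.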

Reported-by: Simon Wunderlich s...@simonwunderlich.de
Signed-off-by: Antonio Quartulli anto...@meshcoding.com
Signed-off-by: Marek Lindner mareklind...@neomailbox.ch
---
 net/batman-adv/distributed-arp-table.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/net/batman-adv/distributed-arp-table.c b/net/batman-adv/distributed-arp-table.c
index aad022d..5a4d45a 100644
--- a/net/batman-adv/distributed-arp-table.c
+++ b/net/batman-adv/distributed-arp-table.c
@@ -1107,6 +1107,9 @@ void batadv_dat_snoop_outgoing_arp_reply(struct batadv_priv *bat_priv,
  * @bat_priv: the bat priv with all the soft interface information
  * @skb: packet to check
  * @hdr_size: size of the encapsulation header
+ *
+ * Returns true if the packet was snooped and consumed by DAT. False if the
+ * packet has to be delivered to the interface
  */
 bool batadv_dat_snoop_incoming_arp_reply(struct batadv_priv *bat_priv,
 					 struct sk_buff *skb, int hdr_size)
@@ -1114,7 +1117,7 @@ bool batadv_dat_snoop_incoming_arp_reply(struct batadv_priv *bat_priv,
 	uint16_t type;
 	__be32 ip_src, ip_dst;
 	uint8_t *hw_src, *hw_dst;
-	bool ret = false;
+	bool dropped = false;
 	unsigned short vid;
 
 	if (!atomic_read(&bat_priv->distributed_arp_table))
@@ -1143,12 +1146,17 @@ bool batadv_dat_snoop_incoming_arp_reply(struct batadv_priv *bat_priv,
 	/* if this REPLY is directed to a client of mine, let's deliver the
 	 * packet to the interface
 	 */
-	ret = !batadv_is_my_client(bat_priv, hw_dst, vid);
+	dropped = !batadv_is_my_client(bat_priv, hw_dst, vid);
+
+	/* if this REPLY is sent on behalf of a client of mine, let's drop the
+	 * packet because the client will reply by itself
+	 */
+	dropped |= batadv_is_my_client(bat_priv, hw_src, vid);
 out:
-	if (ret)
+	if (dropped)
 		kfree_skb(skb);
-	/* if ret == false - packet has to be delivered to the interface */
-	return ret;
+	/* if dropped == false - deliver to the interface */
+	return dropped;
 }
 
 /**
-- 
2.4.3


Re: [PATCH iproute2] ss: Include -E option for socket destroy events

2015-06-16 Thread Craig Gallek

> I am not sure filter works, and apparently some fields are not properly
> reported ? (source port for example looks wrong)
>
> (Note : you probably need to filter in user space, but ss should already
> have support for this)
Ah, good catch on the filter.  It can be fixed by
- if ((err = inet_show_sock(h, NULL, diag_arg->protocol)) < 0)
+ if ((err = inet_show_sock(h, diag_arg->f, diag_arg->protocol)) < 0)
in show_one_inet_sock.  I believe this worked previously because the
filter determined the parameters of the netlink request, effectively
causing the filtering to happen in the kernel.  However, if the
ssfilter defined a parameter that could not be used to select sockets via
the netlink request (either because it is a concept not available in
the request structure or because the multicast groups have no request
concept), this user space filtering parameter would be necessary.
Perhaps this was an optimization to not do userspace filtering when we
know we won't need it?  In that case, my updated version of this patch
should probably set the filter parameter of inet_diag_arg to NULL in
the case of a get request and to the actual userspace filter in the
case of monitoring the broadcast data.

The source port issue is a harder problem (related to the kernel
patches, not this one).  The point at which socket information is
broadcast happens after the unhash callback of the appropriate
protocol (called in sk_common_release).  This process sets the source
port of the socket to zero (via __inet_put_port for tcp and
udp_lib_unhash for udp).  If we want the source port to be returned, I
believe the only options are broadcasting earlier in the destruction
path (potentially missing any activity that may happen after that
point) or to store the source port in an additional location for later
use...

Re: [net-next 07/17] fm10k: use an unsigned int for i in ethtool_get_strings

On Tue, 2015-06-16 at 17:19 +0300, Sergei Shtylyov wrote:
> On 6/16/2015 4:47 PM, Jeff Kirsher wrote:
>
> > From: Jacob Keller jacob.e.kel...@intel.com
>
> > The value will never be negative, and we use the %i print format, use
>
> %i is the same as %d, AFAIR. You need %u for the unsigned variables.
>
> > unsigned int for the loop counter. Issue found using cppcheck.
>
> > Signed-off-by: Jacob Keller jacob.e.kel...@intel.com
> > Tested-by: Krishneil Singh krishneil.k.si...@intel.com
> > Signed-off-by: Jeff Kirsher jeffrey.t.kirs...@intel.com
>
> WBR, Sergei

Oops, you are right.

Regards,
Jake

Re: [net-next 07/17] fm10k: use an unsigned int for i in ethtool_get_strings

On Tue, 2015-06-16 at 13:33 -0700, Jacob E Keller wrote:
> On Tue, 2015-06-16 at 17:19 +0300, Sergei Shtylyov wrote:
> > On 6/16/2015 4:47 PM, Jeff Kirsher wrote:
> >
> > > From: Jacob Keller jacob.e.kel...@intel.com
> >
> > > The value will never be negative, and we use the %i print format, use
> >
> > %i is the same as %d, AFAIR. You need %u for the unsigned variables.
> >
> > > unsigned int for the loop counter. Issue found using cppcheck.
> >
> > > Signed-off-by: Jacob Keller jacob.e.kel...@intel.com
> > > Tested-by: Krishneil Singh krishneil.k.si...@intel.com
> > > Signed-off-by: Jeff Kirsher jeffrey.t.kirs...@intel.com
> >
> > WBR, Sergei
>
> Oops, you are right.
>
> Regards,
> Jake

The actual code does use %u, but I made a typo in the description. I am
sending an rc2.

Regards,
Jake
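
The point about conversion specifiers in this thread can be illustrated with a small userspace sketch. The helper name and the string it formats are chosen for illustration only and are not quoted from the fm10k driver.

```c
#include <stddef.h>
#include <stdio.h>

/* Format an unsigned loop index. %u is the conversion for unsigned int;
 * %i and %d are equivalent to each other and both expect a signed int.
 */
static int format_queue_stat(char *buf, size_t len, unsigned int i)
{
	return snprintf(buf, len, "tx_queue_%u_packets", i);
}
```

For small indices %i and %u would print the same digits, which is why a mismatch like this tends to be reported by static checkers rather than noticed at run time.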

[PATCH v4 3/3] net/xen-netback: Don't mix hexa and decimal with 0x in the printf format

2015-06-16 Thread Julien Grall

Prefix all %x with 0x to avoid confusion when reading logs that also
contain decimal values.

Also print some values in decimal rather than hexadecimal, for consistency
with netfront.

Signed-off-by: Julien Grall julien.gr...@citrix.com
Cc: Wei Liu wei.l...@citrix.com
Cc: Ian Campbell ian.campb...@citrix.com
Cc: netdev@vger.kernel.org

---
Changes in v4:
- Patch added
---
 drivers/net/xen-netback/netback.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index ba3ae30..11bd9d8 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -748,7 +748,7 @@ static int xenvif_count_requests(struct xenvif_queue *queue,
 		slots++;
 
 		if (unlikely((txp->offset + txp->size) > PAGE_SIZE)) {
-			netdev_err(queue->vif->dev, "Cross page boundary, txp->offset: %x, size: %u\n",
+			netdev_err(queue->vif->dev, "Cross page boundary, txp->offset: %u, size: %u\n",
 				   txp->offset, txp->size);
 			xenvif_fatal_tx_err(queue->vif);
 			return -EINVAL;
@@ -874,7 +874,7 @@ static inline void xenvif_grant_handle_set(struct xenvif_queue *queue,
 	if (unlikely(queue->grant_tx_handle[pending_idx] !=
 		     NETBACK_INVALID_HANDLE)) {
 		netdev_err(queue->vif->dev,
-			   "Trying to overwrite active handle! pending_idx: %x\n",
+			   "Trying to overwrite active handle! pending_idx: 0x%x\n",
 			   pending_idx);
 		BUG();
 	}
@@ -887,7 +887,7 @@ static inline void xenvif_grant_handle_reset(struct xenvif_queue *queue,
 	if (unlikely(queue->grant_tx_handle[pending_idx] ==
 		     NETBACK_INVALID_HANDLE)) {
 		netdev_err(queue->vif->dev,
-			   "Trying to unmap invalid handle! pending_idx: %x\n",
+			   "Trying to unmap invalid handle! pending_idx: 0x%x\n",
 			   pending_idx);
 		BUG();
 	}
@@ -1243,7 +1243,7 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 		/* No crossing a page as the payload mustn't fragment. */
 		if (unlikely((txreq.offset + txreq.size) > PAGE_SIZE)) {
 			netdev_err(queue->vif->dev,
-				   "txreq.offset: %x, size: %u, end: %lu\n",
+				   "txreq.offset: %u, size: %u, end: %lu\n",
 				   txreq.offset, txreq.size,
 				   (unsigned long)(txreq.offset&~PAGE_MASK) + txreq.size);
 			xenvif_fatal_tx_err(queue->vif);
@@ -1593,12 +1593,12 @@ static inline void xenvif_tx_dealloc_action(struct xenvif_queue *queue)
 					queue->pages_to_unmap,
 					gop - queue->tx_unmap_ops);
 		if (ret) {
-			netdev_err(queue->vif->dev, "Unmap fail: nr_ops %tx ret %d\n",
+			netdev_err(queue->vif->dev, "Unmap fail: nr_ops %tu ret %d\n",
 				   gop - queue->tx_unmap_ops, ret);
 			for (i = 0; i < gop - queue->tx_unmap_ops; ++i) {
 				if (gop[i].status != GNTST_okay)
 					netdev_err(queue->vif->dev,
-						   "host_addr: %llx handle: %x status: %d\n",
+						   "host_addr: 0x%llx handle: 0x%x status: %d\n",
 						   gop[i].host_addr,
 						   gop[i].handle,
 						   gop[i].status);
@@ -1731,7 +1731,7 @@ void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx)
 				&queue->mmap_pages[pending_idx], 1);
 	if (ret) {
 		netdev_err(queue->vif->dev,
-			   "Unmap fail: ret: %d pending_idx: %d host_addr: %llx handle: %x status: %d\n",
+			   "Unmap fail: ret: %d pending_idx: %d host_addr: %llx handle: 0x%x status: %d\n",
 			   ret,
 			   pending_idx,
 			   tx_unmap_op.host_addr,
-- 
2.1.4


[PATCH v4 2/3] net/xen-netback: Remove unused code in xenvif_rx_action

2015-06-16 Thread Julien Grall

The variables old_req_cons and ring_slots_used are assigned but never
used since commit 1650d5455bd2dc6b5ee134bd6fc1a3236c266b5b ("xen-netback:
always fully coalesce guest Rx packets").

Signed-off-by: Julien Grall julien.gr...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Cc: Ian Campbell ian.campb...@citrix.com
Cc: netdev@vger.kernel.org

---
Changes in v2:
- Add Wei's Acked-by
---
 drivers/net/xen-netback/netback.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 0d25943..ba3ae30 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -515,14 +515,9 @@ static void xenvif_rx_action(struct xenvif_queue *queue)
 
 	while (xenvif_rx_ring_slots_available(queue, XEN_NETBK_RX_SLOTS_MAX)
 	       && (skb = xenvif_rx_dequeue(queue)) != NULL) {
-		RING_IDX old_req_cons;
-		RING_IDX ring_slots_used;
-
 		queue->last_rx_time = jiffies;
 
-		old_req_cons = queue->rx.req_cons;
 		XENVIF_RX_CB(skb)->meta_slots_used = xenvif_gop_skb(skb, &npo, queue);
-		ring_slots_used = queue->rx.req_cons - old_req_cons;
 
 		__skb_queue_tail(&rxq, skb);
 	}
-- 
2.1.4


Re: [PATCH v4 3/3] net/xen-netback: Don't mix hexa and decimal with 0x in the printf format


Hello.

On 06/16/2015 10:10 PM, Julien Grall wrote:


Append 0x to all %x in order to avoid while reading when there is other
decimal value in the log.



Also replace some of the hexadecimal print to decimal to uniformize the
format with netfront.



Signed-off-by: Julien Grall julien.gr...@citrix.com
Cc: Wei Liu wei.l...@citrix.com
Cc: Ian Campbell ian.campb...@citrix.com
Cc: netdev@vger.kernel.org



---
 Changes in v4:
 - Patch added
---
  drivers/net/xen-netback/netback.c | 14 +++---
  1 file changed, 7 insertions(+), 7 deletions(-)



diff --git a/drivers/net/xen-netback/netback.c 
b/drivers/net/xen-netback/netback.c
index ba3ae30..11bd9d8 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c

[...]

@@ -874,7 +874,7 @@ static inline void xenvif_grant_handle_set(struct 
xenvif_queue *queue,
if (unlikely(queue-grant_tx_handle[pending_idx] !=
 NETBACK_INVALID_HANDLE)) {
netdev_err(queue-vif-dev,
-  Trying to overwrite active handle! pending_idx: 
%x\n,
+  Trying to overwrite active handle! pending_idx: 
0x%x\n,


   Using %#x is shorter and does the same.


   pending_idx);
BUG();
}
@@ -887,7 +887,7 @@ static inline void xenvif_grant_handle_reset(struct 
xenvif_queue *queue,
if (unlikely(queue-grant_tx_handle[pending_idx] ==
 NETBACK_INVALID_HANDLE)) {
netdev_err(queue-vif-dev,
-  Trying to unmap invalid handle! pending_idx: %x\n,
+  Trying to unmap invalid handle! pending_idx: 
0x%x\n,


   Same here.

[...]

@@ -1731,7 +1731,7 @@ void xenvif_idx_unmap(struct xenvif_queue *queue, u16 
pending_idx)
queue-mmap_pages[pending_idx], 1);
if (ret) {
netdev_err(queue-vif-dev,
-  Unmap fail: ret: %d pending_idx: %d host_addr: %llx 
handle: %x status: %d\n,
+  Unmap fail: ret: %d pending_idx: %d host_addr: %llx 
handle: 0x%x status: %d\n,


   And here.

[...]

WBR, Sergei


Re: [PATCH v4 3/3] net/xen-netback: Don't mix hexa and decimal with 0x in the printf format


Hello.

On 06/17/2015 01:09 AM, Joe Perches wrote:

>>> Append 0x to all %x in order to avoid while reading when there is other
>>> decimal value in the log.

> []

>>> @@ -874,7 +874,7 @@ static inline void xenvif_grant_handle_set(struct xenvif_queue *queue,
>>>  	if (unlikely(queue->grant_tx_handle[pending_idx] !=
>>>  		     NETBACK_INVALID_HANDLE)) {
>>>  		netdev_err(queue->vif->dev,
>>> -			   "Trying to overwrite active handle! pending_idx: %x\n",
>>> +			   "Trying to overwrite active handle! pending_idx: 0x%x\n",

>> Using %#x is shorter and does the same.

> That's true, but it's also far less common.

   Which is a pity... People just don't know the format specifiers well
enough. :-(

> $ git grep -E "%#[\*\d\.]*x" | wc -l
> 1419
> $ git grep "0x%" | wc -l
> 29844

   Which means 29 KB could theoretically be saved on allyesconfig build. :-)
(Actually less since the width specifiers will likely need to be fixed where
present.)
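
The two spellings discussed here, and the width caveat, can be compared with a short userspace sketch. It uses standard C snprintf(); the kernel's own vsnprintf() is assumed, not verified here, to behave the same way for these cases, and the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Literal "0x" followed by a plain %x conversion. */
static int fmt_literal_prefix(char *buf, size_t len, unsigned int v)
{
	return snprintf(buf, len, "0x%x", v);
}

/* The '#' flag asks printf to add the prefix itself. In standard C the
 * prefix is only added for nonzero values.
 */
static int fmt_alternate_form(char *buf, size_t len, unsigned int v)
{
	return snprintf(buf, len, "%#x", v);
}

/* With a field width the two spellings differ: for %#x the width covers
 * the prefix as well, so "%#06x" corresponds to "0x%04x", not "0x%06x".
 */
static int fmt_alternate_form_padded(char *buf, size_t len, unsigned int v)
{
	return snprintf(buf, len, "%#06x", v);
}
```

For a value of 0x1f both unpadded helpers produce "0x1f" and the padded one produces "0x001f". For zero, the literal form prints "0x0" while the alternate form prints a bare "0", which is one reason a mechanical conversion would need review.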


WBR, Sergei


Re: [BUG,BISECTED] mvneta: second interface no more usable on mirabox

2015-06-16 Thread Arnaud Ebalard

Hi,

Stas Sergeev s...@list.ru writes:

> 17.06.2015 00:44, Arnaud Ebalard wrote:
>> Hi,
>>
>> On Mirabox, the second ethernet interface is no more usable on 4.1-rc*
>> series (no packets coming out of the interface, when using dhclient for
>> instance). It works as expected on 4.0.
>>
>> Bisecting the issue, I ended up on 898b2970e2c9 ("mvneta: implement
>> SGMII-based in-band link state signaling"). Reverting that commit gives
>> me back the second interface.
>>
>> Then, I also tested on a NETGEAR ReadyNAS 104, which is also powered by
>> the same SoC (Armada 370) and also has two (mvneta-supported) ethernet
>> interfaces. With an unmodified 4.1-rc8, only one of the two interfaces
>> is available. Reverting 898b2970e2c9 makes both usable again.
>>
>> FWIW, mirabox and RN104 ethernet interfaces use RGMII.
> Hi, hope someone who can reproduce the problem,
> can provide a better help.
> I looked into a patch, and it seems most things are done
> under if (pp->use_inband_status), which is not your case.

Looking at the patch, yes. Both platforms I have which encounter the
problem use RGMII, and the if (pp->use_inband_status) changes are for
SGMII.

> But it seems the patch can still change a couple of flags
> for you, and maybe that makes a problem?

AFAICT, the autoneg config register (MVNETA_GMAC_AUTONEG_CONFIG) is
modified. The logic when the link status changes is also modified:
 - The MVNETA_GMAC_FORCE_LINK_DOWN flag is cleared when there is carrier.
   It was previously set when that event occurred.
 - The link down event logic is also modified.


> Please try the attached (absolutely untested) patch.
> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> index ce5f7f9..74176ec 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -1013,6 +1013,12 @@ static void mvneta_defaults_set(struct mvneta_port *pp)
>  		val = mvreg_read(pp, MVNETA_GMAC_CLOCK_DIVIDER);
>  		val |= MVNETA_GMAC_1MS_CLOCK_ENABLE;
>  		mvreg_write(pp, MVNETA_GMAC_CLOCK_DIVIDER, val);
> +	} else {
> +		val = mvreg_read(pp, MVNETA_GMAC_AUTONEG_CONFIG);
> +		val &= ~(MVNETA_GMAC_INBAND_AN_ENABLE |
> +		       MVNETA_GMAC_AN_SPEED_EN |
> +		       MVNETA_GMAC_AN_DUPLEX_EN);
> +		mvreg_write(pp, MVNETA_GMAC_AUTONEG_CONFIG, val);
>  	}
>  
>  	mvneta_set_ucast_table(pp, -1);

*Second interface is back w/ that patch applied*. I cannot tell whether it
 is a proper fix or just a valid workaround, though.
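
The quoted patch is meant to clear three autonegotiation bits while preserving the rest of the register, i.e. a `val &= ~(...)` read-modify-write. A minimal sketch of that operation follows; the macro names and bit positions are hypothetical and chosen for illustration only, the real definitions being in drivers/net/ethernet/marvell/mvneta.c.

```c
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define AN_INBAND_ENABLE (1u << 2)
#define AN_SPEED_EN      (1u << 7)
#define AN_DUPLEX_EN     (1u << 12)
#define AN_MASK          (AN_INBAND_ENABLE | AN_SPEED_EN | AN_DUPLEX_EN)

/* Clear the three autoneg bits and keep every other bit of the value
 * that was read from the register. Assigning ~AN_MASK directly would
 * instead set all remaining bits and discard the previous contents.
 */
static uint32_t autoneg_disable(uint32_t val)
{
	val &= ~AN_MASK;
	return val;
}
```

In the driver the value would be read with mvreg_read() before this step and written back with mvreg_write() afterwards, as the quoted hunk shows.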

Thanks for your feedback.

a+


[RFC PATCH 0/2] sctp: add new getsockopt option SCTP_SOCKOPT_PEELOFF_KERNEL

2015-06-16 Thread Marcelo Ricardo Leitner

Hi,

I'm trying to remove a direct dependency of the dlm module on the sctp one.
Currently the dlm code calls sctp_do_peeloff() directly, and this call alone
causes the sctp module to be loaded together with dlm. For that, we have
basically 3 options:
- Doing a module split on dlm
  - which I'm avoiding because it was already split once and merged back
    (more info in the patch 2 changelog)
  - and the sctp code in it is rather small compared with the sctp module
    itself
- Using some other infra that gets indirectly activated, like getsockopt()
  - It was like this before, but the exposed sockopt created a file
    descriptor for the new socket and that created some serious issues.
    More info in 2f2d76cc3e93 ("dlm: Do not allocate a fd for peeloff")
- Doing something like ipv6_stub (which is used by vxlan) or similar
  - but I don't feel that's a good way out here; it doesn't feel right.

So I'm going with the 2nd option again, but this time also creating a new
sockopt that is only accessible to kernel users of this protocol, so that
it is safe to return a struct socket * directly via the getsockopt()
results. This is the tricky part of this series.

It smells hacky, yes, but currently most sctp calls are wrapped behind
kernel_*(). Even if we set a flag (like netlink does) saying that this
is a kernel socket, we still have the issue of getting the function call
through and returning such an unusual return value.

I kept the __user marker on the sctp_getsockopt_peeloff_kernel() prototype
and its helpers just to avoid issues with static checkers.

The kernel path is not really tested yet. Mainly I'd like to know what you
think: is this feasible? A getsockopt option only reachable by the kernel
itself? I couldn't find any other like this.

Thanks,
Marcelo

Marcelo Ricardo Leitner (2):
  sctp: add new getsockopt option SCTP_SOCKOPT_PEELOFF_KERNEL
  dlm: avoid using sctp_do_peeloff directly

 fs/dlm/lowcomms.c         | 17 ++++++++---------
 include/uapi/linux/sctp.h | 12 ++++++++++++
 net/sctp/socket.c         | 39 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 59 insertions(+), 9 deletions(-)

-- 
2.4.1


[RFC PATCH 1/2] sctp: add new getsockopt option SCTP_SOCKOPT_PEELOFF_KERNEL

2015-06-16 Thread Marcelo Ricardo Leitner

SCTP has this operation to peel off associations from a given socket and
create a new socket using this association. We currently have two ways
to use this operation:
- via getsockopt(), on which it will also create and return a file
  descriptor for this new socket
- via sctp_do_peeloff(), which is for kernel only

The caveat with using sctp_do_peeloff() directly is that it creates a
dependency on the SCTP module, while all other operations are handled via
the kernel_{socket,sendmsg,getsockopt...}() interface. This causes the
kernel to load the SCTP module even when it's not directly used.

This patch then creates a new sockopt that is to be used only by kernel
users of this protocol. This new sockopt will not allocate a file
descriptor but instead just return the socket pointer directly.

If called by a user application, it will just return -EPERM.

Even though it's not intended for user applications, it's listed in the
uapi header. That's because hiding it wouldn't add any extra security,
and it keeps the sockopt list in one place, so it's easy to check which
numbers are available.
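
The in/out use of the option value described above can be sketched in plain userspace C. Everything here is a hypothetical stand-in: struct fake_socket replaces struct socket, the caller_is_kernel flag replaces the address-limit check, and memcpy() replaces copy_from_user()/copy_to_user().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct socket. */
struct fake_socket {
	int assoc_id;
};

/* In/out argument: the caller fills in associd, the callee overwrites
 * the same storage with a pointer to the peeled-off socket.
 */
typedef union {
	int associd;
	struct fake_socket *socket;
} peeloff_kernel_arg_t;

/* A single static slot keeps the sketch short; real code allocates. */
static struct fake_socket peeled;

static int getsockopt_peeloff_kernel(bool caller_is_kernel,
				     void *optval, int *optlen)
{
	peeloff_kernel_arg_t arg;

	if (!caller_is_kernel)
		return -EPERM;  /* user callers never reach the handler */
	if (*optlen < (int)sizeof(arg))
		return -EINVAL;

	memcpy(&arg, optval, sizeof(arg));  /* read the association id */
	peeled.assoc_id = arg.associd;
	arg.socket = &peeled;               /* result reuses the storage */
	memcpy(optval, &arg, sizeof(arg));
	*optlen = (int)sizeof(arg);
	return 0;
}
```

A kernel caller would set associd, invoke the option, and then read the socket member; a rejected call leaves the argument untouched.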

Signed-off-by: Marcelo Ricardo Leitner marcelo.leit...@gmail.com
---
 include/uapi/linux/sctp.h | 12 ++++++++++++
 net/sctp/socket.c         | 39 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+)

diff --git a/include/uapi/linux/sctp.h b/include/uapi/linux/sctp.h
index ce70fe6b45df3e841c35accbdb6379c16563893c..b3aad3ce456ab3c1ebf4d81fdb7269ba40b3d92a 100644
--- a/include/uapi/linux/sctp.h
+++ b/include/uapi/linux/sctp.h
@@ -105,6 +105,10 @@ typedef __s32 sctp_assoc_t;
 #define SCTP_SOCKOPT_BINDX_ADD	100	/* BINDX requests for adding addrs */
 #define SCTP_SOCKOPT_BINDX_REM	101	/* BINDX requests for removing addrs. */
 #define SCTP_SOCKOPT_PEELOFF	102	/* peel off association. */
+#define SCTP_SOCKOPT_PEELOFF_KERNEL	103	/* peel off association.
+						 * only valid for kernel
+						 * users
+						 */
 /* Options 104-106 are deprecated and removed. Do not use this space */
 #define SCTP_SOCKOPT_CONNECTX_OLD	107	/* CONNECTX old requests. */
 #define SCTP_GET_PEER_ADDRS	108	/* Get all peer address. */
@@ -892,6 +896,14 @@ typedef struct {
 	int sd;
 } sctp_peeloff_arg_t;
 
+/* This is the union that is passed as an argument(optval) to
+ * getsockopt(SCTP_SOCKOPT_PEELOFF_KERNEL).
+ */
+typedef union {
+	sctp_assoc_t associd;
+	struct socket *socket;
+} sctp_peeloff_kernel_arg_t;
+
 /*
  *  Peer Address Thresholds socket option
  */
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index f09de7fac2e6acddad8b2e046dbf626e329cb674..dab6f9be260229f012e10e8f67c80ce99c0d2d06 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4448,6 +4448,32 @@ int sctp_do_peeloff(struct sock *sk, sctp_assoc_t id, struct socket **sockp)
 }
 EXPORT_SYMBOL(sctp_do_peeloff);
 
+static int sctp_getsockopt_peeloff_kernel(struct sock *sk, int len,
+					  char __user *optval, int __user *optlen)
+{
+	sctp_peeloff_kernel_arg_t peeloff;
+	struct socket *newsock;
+	int retval = 0;
+
+	if (len < sizeof(sctp_peeloff_kernel_arg_t))
+		return -EINVAL;
+	len = sizeof(sctp_peeloff_kernel_arg_t);
+	if (copy_from_user(&peeloff, optval, len))
+		return -EFAULT;
+
+	retval = sctp_do_peeloff(sk, peeloff.associd, &newsock);
+	if (retval < 0)
+		goto out;
+
+	peeloff.socket = newsock;
+	if (copy_to_user(optval, &peeloff, len)) {
+		sock_release(newsock);
+		return -EFAULT;
+	}
+out:
+	return retval;
+}
+
 static int sctp_getsockopt_peeloff(struct sock *sk, int len, char __user *optval, int __user *optlen)
 {
 	sctp_peeloff_arg_t peeloff;
@@ -5943,6 +5969,11 @@ static int sctp_getsockopt_recvnxtinfo(struct sock *sk,	int len,
 	return 0;
 }
 
+static inline bool sctp_is_kernel(void)
+{
+	return segment_eq(get_fs(), KERNEL_DS);
+}
+
 static int sctp_getsockopt(struct sock *sk, int level, int optname,
 			   char __user *optval, int __user *optlen)
 {
@@ -5986,6 +6017,14 @@ static int sctp_getsockopt(struct sock *sk, int level, int optname,
 	case SCTP_SOCKOPT_PEELOFF:
 		retval = sctp_getsockopt_peeloff(sk, len, optval, optlen);
 		break;
+	case SCTP_SOCKOPT_PEELOFF_KERNEL:
+		if (!sctp_is_kernel()) {
+			retval = -EPERM;
+			break;
+		}
+		retval = sctp_getsockopt_peeloff_kernel(sk, len, optval,
+							optlen);
+		break;
 	case SCTP_PEER_ADDR_PARAMS:
 		retval = sctp_getsockopt_peer_addr_params(sk, len, optval,

[RFC PATCH 2/2] dlm: avoid using sctp_do_peeloff directly

2015-06-16 Thread Marcelo Ricardo Leitner

This patch reverts 2f2d76cc3e93 ("dlm: Do not allocate a fd for
peeloff") but also makes use of a new sockopt,
SCTP_SOCKOPT_PEELOFF_KERNEL, which avoids allocating file descriptors
while doing this operation.

This avoids creating a direct dependency of dlm on the sctp module,
which can then be left unloaded if dlm is not really using it.

Note that this was preferred over a module split, because the module was
split once and merged back in 2007 by commit 6ed7257b4670 ("[DLM]
Consolidate transport protocols"), and we don't want to revert that.

Signed-off-by: Marcelo Ricardo Leitner marcelo.leit...@gmail.com
---
 fs/dlm/lowcomms.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 754fd6c0b7470bab272b071e6ca6e4969e4e4209..aa50131e51ceaf2d56cc2252fe6c0c17b80af769 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -52,7 +52,6 @@
 #include <linux/mutex.h>
 #include <linux/sctp.h>
 #include <linux/slab.h>
-#include <net/sctp/sctp.h>
 #include <net/ipv6.h>
 
 #include "dlm_internal.h"
@@ -671,6 +670,8 @@ static void process_sctp_notification(struct connection *con,
 	int prim_len, ret;
 	int addr_len;
 	struct connection *new_con;
+	sctp_peeloff_kernel_arg_t parg;
+	int parglen = sizeof(parg);
 
 	/*
 	 * We get this before any data for an association.
@@ -719,19 +720,17 @@ static void process_sctp_notification(struct connection *con,
 				return;
 
 			/* Peel off a new sock */
-			lock_sock(con->sock->sk);
-			ret = sctp_do_peeloff(con->sock->sk,
-				sn->sn_assoc_change.sac_assoc_id,
-				&new_con->sock);
-			release_sock(con->sock->sk);
+			parg.associd = sn->sn_assoc_change.sac_assoc_id;
+			ret = kernel_getsockopt(con->sock, IPPROTO_SCTP,
+						SCTP_SOCKOPT_PEELOFF_KERNEL,
+						(void *)&parg, &parglen);
 			if (ret < 0) {
 				log_print("Can't peel off a socket for "
 					  "connection %d to node %d: err=%d",
-					  (int)sn->sn_assoc_change.sac_assoc_id,
-					  nodeid, ret);
+					  parg.associd, nodeid, ret);
 				return;
 			}
-			add_sock(new_con->sock, new_con);
+			add_sock(parg.socket, new_con);
 
 			linger.l_onoff = 1;
 			linger.l_linger = 0;
-- 
2.4.1


[PATCH] cxgb3: avoid needless buffer copy for firmware

2015-06-16 Thread Kees Cook

There's no reason to perform a buffer copy for the firmware name. This
also avoids a NULL dereference (impossible with the current callers) if
there is no matching firmware.
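
The pattern the patch moves to can be sketched outside the driver. All names below are hypothetical: edc_fw_name() stands in for get_edc_fw_name(), load_firmware() stands in for request_firmware(), and the file names are made up for illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical lookup; returns NULL when no firmware matches. */
static const char *edc_fw_name(int edc_idx)
{
	switch (edc_idx) {
	case 0:
		return "example/edc_a.bin";
	case 1:
		return "example/edc_b.bin";
	default:
		return NULL;
	}
}

/* Hypothetical loader standing in for request_firmware(). */
static int load_firmware(const char *name)
{
	return name[0] ? 0 : -ENOENT;
}

/* Use the looked-up pointer directly instead of copying it into a
 * local buffer, and only call the loader when a name was found.
 */
static int get_edc_fw(int edc_idx)
{
	const char *fw_name = edc_fw_name(edc_idx);
	int ret = -EINVAL; /* result when no name matches */

	if (fw_name)
		ret = load_firmware(fw_name);
	return ret;
}
```

Besides saving the copy, this drops the removed snprintf() call, which passed the looked-up name as the format string rather than as an argument.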

Signed-off-by: Kees Cook keesc...@chromium.org
---
 drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c
index b96e4bfcac41..8f7aa53a4c4b 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c
@@ -1025,19 +1025,19 @@ int t3_get_edc_fw(struct cphy *phy, int edc_idx, int size)
 {
 	struct adapter *adapter = phy->adapter;
 	const struct firmware *fw;
-	char buf[64];
+	const char *fw_name;
 	u32 csum;
 	const __be32 *p;
 	u16 *cache = phy->phy_cache;
-	int i, ret;
-
-	snprintf(buf, sizeof(buf), get_edc_fw_name(edc_idx));
+	int i, ret = -EINVAL;
 
-	ret = request_firmware(&fw, buf, &adapter->pdev->dev);
+	fw_name = get_edc_fw_name(edc_idx);
+	if (fw_name)
+		ret = request_firmware(&fw, fw_name, &adapter->pdev->dev);
 	if (ret < 0) {
 		dev_err(&adapter->pdev->dev,
 			"could not upgrade firmware: unable to load %s\n",
-			buf);
+			fw_name);
 		return ret;
 	}
 
 
-- 
1.9.1


-- 
Kees Cook
Chrome OS Security

[PATCH] isdn: disable HiSax NetJet driver on microblaze arch

2015-06-16 Thread Nicolai Stange

Fix an allmodconfig compilation failure on microblaze due to big endian
architectures being apparently unsupported by the NetJet code:
  drivers/isdn/hisax/nj_s.c: In function 'setup_netjet_s':
  drivers/isdn/hisax/nj_s.c:265:2:
  error: #error not running on big endian machines now

Modify the relevant Kconfig such that the NetJet code is not built on
microblaze anymore.

Note that endianness on microblaze is not determined through Kconfig,
but by means of a compiler-provided CPP macro, namely __MICROBLAZEEL__.
However, gcc defaults to big endianness on that platform.
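
How a compiler-provided macro decides endianness at build time can be sketched as follows. __MICROBLAZEEL__ is the macro named above; the __BYTE_ORDER__ fallback is a GCC/Clang predefined macro added here as an assumption so the sketch also builds on other targets, and BUILD_LITTLE_ENDIAN and runtime_little_endian() are hypothetical names.

```c
#include <stdint.h>

/* Compile-time view: decided by what the compiler predefines, not by
 * any Kconfig symbol.
 */
#if defined(__MICROBLAZEEL__)
#define BUILD_LITTLE_ENDIAN 1
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define BUILD_LITTLE_ENDIAN 1
#else
#define BUILD_LITTLE_ENDIAN 0
#endif

/* Run-time view: inspect the first byte of a known 32-bit pattern. */
static int runtime_little_endian(void)
{
	const uint32_t probe = 1;

	return *(const unsigned char *)&probe == 1;
}
```

Because Kconfig cannot see these macros, a dependency expressed in Kconfig can only exclude the architecture as a whole, which is what the patch does.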

Signed-off-by: Nicolai Stange nicsta...@gmail.com
---
 The maintainer tree listed under ISDN SUBSYSTEM in MAINTAINERS does
 not exist anymore. I created the diff against the Linus tree.

 drivers/isdn/hisax/Kconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/isdn/hisax/Kconfig b/drivers/isdn/hisax/Kconfig
index 97465ac..eb83d94 100644
--- a/drivers/isdn/hisax/Kconfig
+++ b/drivers/isdn/hisax/Kconfig
@@ -237,7 +237,7 @@ config HISAX_MIC
 
 config HISAX_NETJET
 	bool "NETjet card"
-	depends on PCI && (BROKEN || !(PPC || PARISC || M68K || (MIPS && !CPU_LITTLE_ENDIAN) || FRV || (XTENSA && !CPU_LITTLE_ENDIAN)))
+	depends on PCI && (BROKEN || !(PPC || PARISC || M68K || (MIPS && !CPU_LITTLE_ENDIAN) || FRV || (XTENSA && !CPU_LITTLE_ENDIAN) || MICROBLAZE))
 	depends on VIRT_TO_BUS
 	help
 	  This enables HiSax support for the NetJet from Traverse
@@ -249,7 +249,7 @@ config HISAX_NETJET
 
 config HISAX_NETJET_U
 	bool "NETspider U card"
-	depends on PCI && (BROKEN || !(PPC || PARISC || M68K || (MIPS && !CPU_LITTLE_ENDIAN) || FRV || (XTENSA && !CPU_LITTLE_ENDIAN)))
+	depends on PCI && (BROKEN || !(PPC || PARISC || M68K || (MIPS && !CPU_LITTLE_ENDIAN) || FRV || (XTENSA && !CPU_LITTLE_ENDIAN) || MICROBLAZE))
 	depends on VIRT_TO_BUS
 	help
 	  This enables HiSax support for the Netspider U interface ISDN card
-- 
2.4.3


ax88179_178a: ethernet to usb dongle disconnect crash

2015-06-16 Thread Vivek Bhagat

Hi All,

I have connected my PC and TV board as below:

PC (network i/f) --- ethernet to usb dongle --- usb port of TV board

When I power off my board, I get a kernel crash.
Please have a look at the log attached here. I debugged and found that
unregister_netdev() in usbnet_disconnect() clears the net_device object
while an skb is still being processed; that skb then tries to access the
net_device object, which leads to the crash. The dongle is using the
ax88179_178a.ko module.

Backtrace:
[1-11856.1782] [c045ea38] (fib_compute_spec_dst+0x0/0x17c) from
[c04308c0] (ipv4_pktinfo_prepare+0x38/0x70)
[1-11856.1880]  r6: r5:c5d55800 r4:c5d55800
[1-11856.1926] [c0430888] (ipv4_pktinfo_prepare+0x0/0x70) from
[c04512d4] (udp_queue_rcv_skb+0x1c8/0x384)
[1-11856.2023]  r5:c5d55800 r4:cbdf4380
[1-11856.2059] [c045110c] (udp_queue_rcv_skb+0x0/0x384) from
[c0451590] (flush_stack+0x100/0x118)
[1-11856.2148]  r7:cd398480 r6: r5:cbdf4380 r4:c5d55800
[1-11856.2205] [c0451490] (flush_stack+0x0/0x118) from [c04518b0]
(__udp4_lib_mcast_deliver.isra.44+0x308/0x370)
[1-11856.2308] [c04515a8]
(__udp4_lib_mcast_deliver.isra.44+0x0/0x370) from [c0451dd0]
(__udp4_lib_rcv+0x4b8/0x588)
[1-11856.2413] [c0451918] (__udp4_lib_rcv+0x0/0x588) from
[c0451ec0] (udp_rcv+0x20/0x28)
[1-11856.2495] [c0451ea0] (udp_rcv+0x0/0x28) from [c042836c]
(ip_local_deliver_finish+0x118/0x27c)
[1-11856.2585] [c0428254] (ip_local_deliver_finish+0x0/0x27c) from
[c0428988] (ip_local_deliver+0x8c/0x98)
[1-11856.2683]  r7:d4b01a40 r6:cd398480 r5: r4:cd398480
[1-11856.2740] [c04288fc] (ip_local_deliver+0x0/0x98) from
[c0428798] (ip_rcv_finish+0x2c8/0x344)
[1-11856.2829]  r4:00295a38
[1-11856.2855] [c04284d0] (ip_rcv_finish+0x0/0x344) from
[c0428cb8] (ip_rcv+0x324/0x3f0)
[1-11856.2936]  r7:d4b01a40 r6:cd398480 r5:d2e8b720 r4:c0737500
[1-11856.2993] [c0428994] (ip_rcv+0x0/0x3f0) from [c03f69d4]
(__netif_receive_skb_core+0x4d0/0x568)
[1-11856.3084]  r7: r6: r5:c0708b34 r4:c070a728
[1-11856.3141] [c03f6504] (__netif_receive_skb_core+0x0/0x568) from
[c03f7038] (__netif_receive_skb+0x20/0x70)
[1-11856.3242] [c03f7018] (__netif_receive_skb+0x0/0x70) from
[c03f8584] (process_backlog+0xec/0x1e8)
[1-11856.3335]  r5: r4:d7c828c4
[1-11856.3371] [c03f8498] (process_backlog+0x0/0x1e8) from
[c03f82e4] (net_rx_action+0x104/0x2b8)
[1-11856.3460] [c03f81e0] (net_rx_action+0x0/0x2b8) from
[c0044460] (__do_softirq+0x180/0x304)
[1-11856.3548] [c00442e0] (__do_softirq+0x0/0x304) from [c0044708]
(do_softirq+0x74/0xc4)
[1-11856.3630] [c0044694] (do_softirq+0x0/0xc4) from [c03f56f0]
(netif_rx_ni+0x50/0x78)
[1-11856.3711]  r7:d1708780 r6:cd399380 r5: r4:c850a020
[1-11856.3768] [c03f56a0] (netif_rx_ni+0x0/0x78) from [c03f57ec]
(dev_loopback_xmit+0xd4/0xe0)
[1-11856.3855]  r5: r4:cd398480
[1-11856.3891] [c03f5718] (dev_loopback_xmit+0x0/0xe0) from
[c042da08] (ip_mc_output+0x114/0x234)
[1-11856.3980]  r5: r4:cd398480
[1-11856.4016] [c042d8f4] (ip_mc_output+0x0/0x234) from [c042d3cc]
(ip_local_out+0x38/0x3c)
[1-11856.4100]  r9:00c6 r8:d2e8b734 r7:cbdf7800 r6:c0737500 r5:c0737500
r4:cd399380
[1-11856.4179] [c042d394] (ip_local_out+0x0/0x3c) from [c042e5bc]
(ip_send_skb+0x20/0x88)
[1-11856.4261]  r5:c0737500 r4:cd399380
[1-11856.4297] [c042e59c] (ip_send_skb+0x0/0x88) from [c044e400]
(udp_send_skb+0x250/0x314)
[1-11856.4382]  r5: r4:cd399380
[1-11856.4417] [c044e1b0] (udp_send_skb+0x0/0x314) from [c04505b4]
(udp_sendmsg+0x6a8/0x6d0)
[1-11856.4503] [c044ff0c] (udp_sendmsg+0x0/0x6d0) from [c04587a8]
(inet_sendmsg+0x94/0xc4)
[1-11856.4586] [c0458714] (inet_sendmsg+0x0/0xc4) from [c03e2624]
(sock_sendmsg+0xa0/0xbc)
[1-11856.4670]  r7:c8406e40 r6:c850bf5c r5:00be r4:c4cda300
[1-11856.4726] [c03e2584] (sock_sendmsg+0x0/0xbc) from [c03e3c2c]
(___sys_sendmsg.part.16+0x1a0/0x244)
[1-11856.4820]  r7:c4cda300 r6: r5:c850be7c r4:
[1-11856.4877] [c03e3a8c] (___sys_sendmsg.part.16+0x0/0x244) from
[c03e4db4] (__sys_sendmsg+0x5c/0x80)
[1-11856.4971] [c03e4d58] (__sys_sendmsg+0x0/0x80) from [c03e4df0]
(SyS_sendmsg+0x18/0x1c)
[1-11856.5055]  r6:bdfda694 r5:bdfda60c r4:002841c0
[1-11856.5101] [c03e4dd8] (SyS_sendmsg+0x0/0x1c) from [c00114c0]
(ret_fast_syscall+0x0/0x48)

Please help.

Thanks,
Vivek
[0-11852.8705] usb 5-1: USB disconnect, device number 10
[0-11852.8720] ax88179_178a 5-1:1.0 eth1: Failed to read reg index 0x0002: -19
[0-11852.8785] ax88179_178a 5-1:1.0 eth1: Failed to write reg index 0x0002: -19
[0-11852.8866] [SA_DEBUG] trying to ifconfig down (net device close )
[0-11852.8929] Main Output MUTE !! MainOutGain[100]
[0-11852.8968]  DEV_NAME : eth1 PID: 372 (khubd) PPID: 2 (kthreadd)
[0-11852.9037] [SA_DEBUG] trying to ifconfig down (net device close )
[0-11852.9094] vivek, inetdev_event: NETDEV_UNREGISTER called
)/ie.wd(34)/g_no[0-11852.9157] vivek, inetdev_destroy freeing ip_ptr 
tilist(0x0018dcc[3-11852.9219] [bptime] process open : app-boot-manage(30031)
[1-11852.9219] vivek, ip_ptr is gone
[1-11852.9219] dual-tv (1493): undefined

[RESUBMIT Patch 1/1] net: replace if()/BUG with BUG_ON

2015-06-16 Thread Maninder Singh

Use BUG_ON(condition) instead of if(condition)/BUG().

Signed-off-by: Maninder Singh maninder...@samsung.com
Reviewed-by: Akhilesh Kumar akhiles...@samsung.com
---
 net/packet/af_packet.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index b5989c6..c91d405 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -547,8 +547,7 @@ static void prb_setup_retire_blk_timer(struct packet_sock *po, int tx_ring)
 {
 	struct tpacket_kbdq_core *pkc;
 
-	if (tx_ring)
-		BUG();
+	BUG_ON(tx_ring);
 
 	pkc = tx_ring ? GET_PBDQC_FROM_RB(&po->tx_ring) :
 			GET_PBDQC_FROM_RB(&po->rx_ring);
-- 
1.7.1

Re: Re: [PATCH 1/1] net: replace if()/BUG with BUG_ON()

2015-06-16 Thread Maninder Singh

Hi David, 

 Use BUG_ON(condition) instead of if(condition)/BUG()
 
 Signed-off-by: Maninder Singh maninder...@samsung.com
 Reviewed-by: Akhilesh Kumar akhiles...@samsung.com

Your email client corrupted this patch, making it unusable

I resent the patch with git send-email; the result looks fine, so hopefully
it is no longer corrupted.

Thanks

[PATCH net-next] cxgb4: Add PCI device ID for custom T522 T520 adapter

2015-06-16 Thread Hariprasad Shenai

Signed-off-by: Hariprasad Shenai haripra...@chelsio.com
---
 drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h 
b/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h
index 1a9a6f3..d7ca106 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h
@@ -153,6 +153,8 @@ CH_PCI_DEVICE_ID_TABLE_DEFINE_BEGIN
CH_PCI_ID_TABLE_FENTRY(0x5088), /* Custom T570-CR */
CH_PCI_ID_TABLE_FENTRY(0x5089), /* Custom T520-CR */
CH_PCI_ID_TABLE_FENTRY(0x5090), /* Custom T540-CR */
+   CH_PCI_ID_TABLE_FENTRY(0x5091), /* Custom T522-CR */
+   CH_PCI_ID_TABLE_FENTRY(0x5092), /* Custom T520-CR */
 CH_PCI_DEVICE_ID_TABLE_DEFINE_END;
 
 #endif /* __T4_PCI_ID_TBL_H__ */
-- 
2.3.4

[PATCH net-next] Modify Liquidio Kconfig for crc lib

2015-06-16 Thread Raghu Vatsavayi

The following patch changes the liquidio Kconfig so that it selects
LIBCRC32C.

Signed-off-by: Derek Chickles derek.chick...@caviumnetworks.com
Signed-off-by: Satanand Burla satananda.bu...@caviumnetworks.com
Signed-off-by: Felix Manlunas felix.manlu...@caviumnetworks.com
Signed-off-by: Raghu Vatsavayi raghu.vatsav...@caviumnetworks.com
---
 drivers/net/ethernet/cavium/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cavium/Kconfig 
b/drivers/net/ethernet/cavium/Kconfig
index 5e7a0e2..c4d6bbe 100644
--- a/drivers/net/ethernet/cavium/Kconfig
+++ b/drivers/net/ethernet/cavium/Kconfig
@@ -46,7 +46,7 @@ config LIQUIDIO
depends on 64BIT
select PTP_1588_CLOCK
select FW_LOADER
-   select LIBCRC32
+   select LIBCRC32C
---help---
  This driver supports Cavium LiquidIO Intelligent Server Adapters
  based on CN66XX and CN68XX chips.
-- 
1.8.4.2

Re: [PATCH v4 3/3] net/xen-netback: Don't mix hexa and decimal with 0x in the printf format

2015-06-16 Thread Joe Perches

On Wed, 2015-06-17 at 01:29 +0300, Sergei Shtylyov wrote:
 Hello.
 
 On 06/17/2015 01:09 AM, Joe Perches wrote:
 
  Prefix all %x with 0x in order to avoid confusion while reading when
  there are other decimal values in the log.
 
  []
 
  @@ -874,7 +874,7 @@ static inline void xenvif_grant_handle_set(struct xenvif_queue *queue,
   	if (unlikely(queue->grant_tx_handle[pending_idx] !=
   		     NETBACK_INVALID_HANDLE)) {
   		netdev_err(queue->vif->dev,
  -			   "Trying to overwrite active handle! pending_idx: %x\n",
  +			   "Trying to overwrite active handle! pending_idx: 0x%x\n",
 
   Using %#x is shorter and does the same.
 
  That's true, but it's also far less common.
 
 Which is a pity... People just don't know the format specifiers well 
 enough. :-(
 
  $ git grep -E "%#[\*\d\.]*x" | wc -l
  1419
  $ git grep "0x%" | wc -l
  29844
 
 Which means 29 KB could theoretically be saved on allyesconfig build. :-)
 (Actually less since the width specifiers will likely need to be fixed where 
 present.)

And less than that because a lot of these are in
arch specific code.

0x%x is easier and simpler to visualize than %#x.

But you are welcome to try to make the kernel smaller.
One byte at a time.

There are ~14.5k uses of 0x%x in ~10.5k lines and
~2600 files that would be changed.

That's a lot of lines and a lot of patches.

$ git grep --name-only "0x%x" | xargs sed -i -e 's/0x%x/%#x/g'
$ git diff | wc
  96250  415388 3949872

Only a 4M patch.

The pretty common (~5k) 0x%08x would be %#010x
so that doesn't save any space.

but this one's a ~3.5M patch.

$ git grep --name-only -P "0x%\d+\w*x" | xargs perl -p -i -e \
  's/0x%0(\d+)(\w*)x/"\%#0" . eval($1 + 2) . "$2x"/eg'
$ git diff | wc
  80857  344565 3306990

enjoy...

Re: [RFC PATCH net-next 0/4] switchdev: avoid duplicate packet forwarding

2015-06-16 Thread Scott Feldman

On Tue, Jun 16, 2015 at 2:11 PM, Jiri Pirko j...@resnulli.us wrote:
 Tue, Jun 16, 2015 at 06:47:47PM CEST, sfel...@gmail.com wrote:
On Mon, Jun 15, 2015 at 11:04 PM, Jiri Pirko j...@resnulli.us wrote:
 Tue, Jun 16, 2015 at 01:25:51AM CEST, da...@davemloft.net wrote:
From: sfel...@gmail.com
Date: Sat, 13 Jun 2015 11:04:26 -0700

 The switchdev port driver must do two things:

 1) Generate a fwd_mark for each switch port, using some unique key of the
switch device (and optionally port).  This is a one-time operation done
when port's netdev is setup.

 2) On packet ingress from port, mark the skb with the ingress port's
fwd_mark.  If the device supports it, it's useful to only mark skbs
which were already forwarded by the device.  If the device does not
support such indication, all skbs can be marked, even if they're
local dst.

 Two new 32-bit fields are added to struct sk_buff and struct net_device to
 hold the fwd_mark.  I've wrapped these with CONFIG_NET_SWITCHDEV for now.  I
 tried using skb->mark for this purpose, but ebtables can overwrite the
 skb->mark before the bridge gets it, so that will not work.

 In general, this fwd_mark can be used for any case where a packet is
 forwarded by the device and a copy is sent to the CPU, to avoid the kernel
 re-forwarding the packet.  sFlow is another use-case that comes to mind,
 but I haven't explored the details.

Generally I'm against adding new fields to sk_buff but I'm trying to be
open minded. :-)

About the per-device fwd_mark, if the key attribute is uniqueness,
let's just do it right and use something like lib/idr.c to generate
truly unique indices at probe time for all devices using this
facility.  I like that better than having them be unique by a happy
accident.

 We already have a per-device unique key: dev->ifindex.
 That should be good for fwd_mark purposes I believe.

It would be great if we could use dev->ifindex, but fwd_mark is really
meant to mark device ports that belong to a group.  In the case of a bridge,
the device ports in the bridge should all have the same mark.  And
another device's ports in the same bridge would have a different mark
(so we can't use the bridge's dev->ifindex).  On ingress, the skb is
marked with the ingress port's mark.  If the skb is to be forwarded
out an egress port, the skb mark is compared with the egress port's mark.
If the marks match, then the device has already forwarded the pkt, so the
kernel can consume_skb to avoid duplicate pkts on the wire.

So what we need is a unique mark for device ports within a fwding
group, such as a bridge.

 Yep, have a group of netdevs, pick one of them and use its ifindex for
 the whole group.

That will not work because ports from two switches in the same bridge
need different marks...that's how the bridge knows which ports to fwd
and which ones to skip.

Example:

br0
   sw1p1 (mark=3)
   sw1p2 (mark=3)
   sw2p1 (mark=7)
   sw2p2 (mark=7)

Two switches, sw1 and sw2, in bridge br0.  Let's say sw1 receives an
unknown unicast pkt. It'll flood the pkt to its other switch ports
(sw1p2, in this case) and send a copy to the CPU (the bridge), with
skb->mark=3.  The bridge will flood pkt to sw1p2, sw2p1, and sw2p2,
but our little check in dev.c will drop the pkt to sw1p2 just before
egress onto wire.  Pkt goes out on sw2p1 and sw2p2.  This is why we
can't use just the br0 ifindex to generate the mark.  We need
something unique about the switch port and the bridge ifindex to give
us a mark for a port.

I'm going to send v2 which uses switch port ppid + some group ifindex
to generate port mark.  Group ifindex could be bond ifindex or bridge
ifindex or even zero if ports are L3 router ports and we want to prune
any L3 forwarding by the kernel.  There is some flexibility here,
depending on what we're trying to do.  We have the switch port ppid,
so we might as well use it.

-scott

Re: [RFC PATCH net-next 0/4] switchdev: avoid duplicate packet forwarding

2015-06-16 Thread Jiri Pirko

Tue, Jun 16, 2015 at 01:25:51AM CEST, da...@davemloft.net wrote:
From: sfel...@gmail.com
Date: Sat, 13 Jun 2015 11:04:26 -0700

 The switchdev port driver must do two things:

 1) Generate a fwd_mark for each switch port, using some unique key of the
switch device (and optionally port).  This is a one-time operation done
when port's netdev is setup.

 2) On packet ingress from port, mark the skb with the ingress port's
fwd_mark.  If the device supports it, it's useful to only mark skbs
which were already forwarded by the device.  If the device does not
support such indication, all skbs can be marked, even if they're
local dst.

 Two new 32-bit fields are added to struct sk_buff and struct net_device to
 hold the fwd_mark.  I've wrapped these with CONFIG_NET_SWITCHDEV for now.  I
 tried using skb->mark for this purpose, but ebtables can overwrite the
 skb->mark before the bridge gets it, so that will not work.

 In general, this fwd_mark can be used for any case where a packet is
 forwarded by the device and a copy is sent to the CPU, to avoid the kernel
 re-forwarding the packet.  sFlow is another use-case that comes to mind,
 but I haven't explored the details.

Generally I'm against adding new fields to sk_buff but I'm trying to be
open minded. :-)

About the per-device fwd_mark, if the key attribute is uniqueness,
let's just do it right and use something like lib/idr.c to generate
truly unique indices at probe time for all devices using this
facility.  I like that better than having them be unique by a happy
accident.

We already have a per-device unique key: dev->ifindex.
That should be good for fwd_mark purposes I believe.

Re: [PATCH 04/11] IB/cm: Expose DGID in SIDR request events

On 16/06/2015 00:32, Hefty, Sean wrote:
  drivers/infiniband/core/cm.c | 7 +++
  include/rdma/ib_cm.h | 2 ++
  2 files changed, 9 insertions(+)

 diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
 index c5f5f89e274a..46f99ec4080a 100644
 --- a/drivers/infiniband/core/cm.c
 +++ b/drivers/infiniband/core/cm.c
 @@ -2983,6 +2983,13 @@ static void cm_format_sidr_req_event(struct cm_work *work,
  	param->pkey = __be16_to_cpu(sidr_req_msg->pkey);
  	param->listen_id = listen_id;
  	param->service_id = sidr_req_msg->service_id;
 +	if (work->mad_recv_wc->wc->wc_flags & IB_WC_GRH) {
 +		param->grh = 1;
 +		memcpy(&param->dgid, &work->mad_recv_wc->recv_buf.grh->dgid,
 +		       sizeof(param->dgid));
 +	} else {
 +		param->grh = 0;
 
 What is the use case here?  Are you trying to sort by device?  How does the 
 GID of the GMP relate to the listen?

The idea is to allow SIDR requests to be sorted by the GID once we
have alias GIDs for IPoIB.

Unlike the CM requests, SIDR requests do not contain the remote GID, so
I thought we could use the GID from the GRH and turn on GRH on such systems.

Re: [PATCH 02/11] IB/ipoib: Return IPoIB devices matching connection parameters

On 15/06/2015 20:22, Jason Gunthorpe wrote:
 On Mon, Jun 15, 2015 at 11:47:07AM +0300, Haggai Eran wrote:
 
 +/* Called with an RCU read lock taken */
 
 Add _rcu to the name? That is the standard convention.

Sure, I'll change that.

 
 +/* returns an IPoIB netdev on top a given ipoib device matching a pkey_index
 + * and address, if one exists. */
 +static struct net_device *ipoib_match_gid_pkey_addr(struct ipoib_dev_priv *priv,
 +						     const union ib_gid *gid,
 +						     u16 pkey_index,
 +						     const struct sockaddr *addr)
 +{
 +	struct ipoib_dev_priv *child_priv;
 +	struct net_device *net_dev = NULL;
 +
 +	if (priv->pkey_index == pkey_index &&
 +	    (!gid || !memcmp(gid, &priv->local_gid, sizeof(*gid)))) {
 +		net_dev = ipoib_get_net_dev_match_addr(addr, priv->dev);
 +		if (net_dev)
 +			return net_dev;
 
 As I said already, this should not even look at the sockaddr unless
 there are multiple possible hits on the other parameters,
What is the goal here? The only difference omitting the IP check will
make is when sending a request to a matching GID but with the wrong IP.
Is it important that we pass these requests here so that they will be
dropped at the rdma_cm module?

Also, note that ipoib_get_net_dev_match_addr can return a different
net_dev from the one ipoib created. When using bonding, it will find the
IP address on the bonding device, and return the bonding net_dev instead.

 and there
 should be a comment explaining the sockaddr is only a hack to make up
 for having an incomplete LLADDR.

Sure, I'll add a comment.

 
 That way people not using same guid children do not get incorrect
 functionality..
 
 +static struct net_device *ipoib_get_net_dev_by_params(
 +struct ib_device *dev, u8 port, u16 pkey,
 +const union ib_gid *gid, const struct sockaddr *addr)
 
 [..]
 
 +	ret = ib_find_cached_pkey(dev, port, pkey, &pkey_index);
 +	if (ret)
 +		return NULL;
 +
 +	if (!rdma_protocol_ib(dev, port))
 +		return NULL;
 
 This if should be first I'd think.

Okay.

 
 
 +	dev_list = ib_get_client_data(dev, &ipoib_client);
 +	if (!dev_list)
 +		return NULL;
 
 Is the locking OK here? This access is protected by lists_rwsem,
 but for instance ib_unregister_device holds only the device_mutex when
 calling client->remove, which kfree's dev_list. Looks wrong to me.

I think you're right. Perhaps we can switch the client data to NULL in
ib_unregister_device under the lists_rwsem. Then the
ipoib_get_net_dev_by_params call will know to skip it. The remove()
callback will need to be augmented with the client data as a parameter,
because it won't be able to retrieve it using ib_get_client_data anymore.

Haggai

Re: [PATCH 2/7] slub bulk alloc: extract objects from the per cpu slab

On Mon, Jun 15, 2015 at 05:52:07PM +0200, Jesper Dangaard Brouer wrote:
 From: Christoph Lameter c...@linux.com
 
 [NOTICE: Already in AKPM's quilt-queue]
 
 First piece: acceleration of retrieval of per cpu objects
 
 If we are allocating lots of objects then it is advantageous to disable
 interrupts and avoid the this_cpu_cmpxchg() operation to get these objects
 faster.
 
 Note that we cannot do the fast operation if debugging is enabled, because
 we would have to add extra code to do all the debugging checks.  And it
 would not be fast anyway.
 
 Note also that the requirement of having interrupts disabled
 avoids having to do processor flag operations.
 
 Allocate as many objects as possible in the fast way and then fall back to
 the generic implementation for the rest of the objects.
 
 Signed-off-by: Christoph Lameter c...@linux.com
 Cc: Jesper Dangaard Brouer bro...@redhat.com
 Cc: Pekka Enberg penb...@kernel.org
 Cc: David Rientjes rient...@google.com
 Cc: Joonsoo Kim iamjoonsoo@lge.com
 Signed-off-by: Andrew Morton a...@linux-foundation.org
 ---
  mm/slub.c |   27 ++-
  1 file changed, 26 insertions(+), 1 deletion(-)
 
 diff --git a/mm/slub.c b/mm/slub.c
 index 80f17403e503..d18f8e195ac4 100644
 --- a/mm/slub.c
 +++ b/mm/slub.c
 @@ -2759,7 +2759,32 @@ EXPORT_SYMBOL(kmem_cache_free_bulk);
  bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
   void **p)
  {
 - return kmem_cache_alloc_bulk(s, flags, size, p);
 +	if (!kmem_cache_debug(s)) {
 +		struct kmem_cache_cpu *c;
 +
 +		/* Drain objects in the per cpu slab */
 +		local_irq_disable();
 +		c = this_cpu_ptr(s->cpu_slab);
 +
 +		while (size) {
 +			void *object = c->freelist;
 +
 +			if (!object)
 +				break;
 +
 +			c->freelist = get_freepointer(s, object);
 +			*p++ = object;
 +			size--;
 +
 +			if (unlikely(flags & __GFP_ZERO))
 +				memset(object, 0, s->object_size);
 +		}
 +		c->tid = next_tid(c->tid);
 +
 +		local_irq_enable();
 +	}
 +
 +	return __kmem_cache_alloc_bulk(s, flags, size, p);
  }
  EXPORT_SYMBOL(kmem_cache_alloc_bulk);

I have now found that we need to call slab_pre_alloc_hook() before any
operation on the kmem_cache to support kmemcg accounting. We also need to
call slab_post_alloc_hook() on every allocated object to support
debugging features such as kasan and kmemleak.

Thanks.

Re: [PATCH 7/7] slub: initial bulk free implementation

On Mon, Jun 15, 2015 at 05:52:56PM +0200, Jesper Dangaard Brouer wrote:
 This implements SLUB specific kmem_cache_free_bulk().  The SLUB allocator
 now has both bulk alloc and bulk free implemented.
 
 Play nice and reenable local IRQs while calling slowpath.
 
 Signed-off-by: Jesper Dangaard Brouer bro...@redhat.com
 ---
  mm/slub.c |   32 +++-
  1 file changed, 31 insertions(+), 1 deletion(-)
 
 diff --git a/mm/slub.c b/mm/slub.c
 index 98d0e6f73ec1..cc4f870677bb 100644
 --- a/mm/slub.c
 +++ b/mm/slub.c
 @@ -2752,7 +2752,37 @@ EXPORT_SYMBOL(kmem_cache_free);
  
  void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
  {
 - __kmem_cache_free_bulk(s, size, p);
 +	struct kmem_cache_cpu *c;
 +	struct page *page;
 +	int i;
 +
 +	local_irq_disable();
 +	c = this_cpu_ptr(s->cpu_slab);
 +
 +	for (i = 0; i < size; i++) {
 +		void *object = p[i];
 +
 +		if (unlikely(!object))
 +			continue; // HOW ABOUT BUG_ON()???
 +
 +		page = virt_to_head_page(object);
 +		BUG_ON(s != page->slab_cache); /* Check if valid slab page */
 +
 +		if (c->page == page) {
 +			/* Fastpath: local CPU free */
 +			set_freepointer(s, object, c->freelist);
 +			c->freelist = object;
 +		} else {
 +			c->tid = next_tid(c->tid);
 +			local_irq_enable();
 +			/* Slowpath: overhead locked cmpxchg_double_slab */
 +			__slab_free(s, page, object, _RET_IP_);
 +			local_irq_disable();
 +			c = this_cpu_ptr(s->cpu_slab);

The SLUB free path doesn't need IRQ management in many cases, even though
it uses cmpxchg_double_slab. Is this really better than just calling
__kmem_cache_free_bulk()?

Thanks.

Re: [PATCH 7/7] slub: initial bulk free implementation