Re: [RFC PATCH 0/2] Crypto kernel TLS socket

2015-11-23 Thread Hannes Frederic Sowa
Hello,

On Mon, Nov 23, 2015, at 18:42, Dave Watson wrote:
> An approach for a kernel TLS socket.
> 
> Only the symmetric encryption / decryption is done in-kernel, as well
> as minimal framing handling.  The handshake is kept in userspace, and
> the negotiated cipher / keys / IVs are then set on the algif_tls
> socket, which is then hooked in to a tcp socket using
> sk_write_space/sk_data_ready hooks.
> 
> If a non application-data TLS record is seen, it is left on the TCP
> socket and an error is returned on the ALG socket, and the record is
> left for userspace to manage. Userspace can't ignore the message, but
> could just close the socket.
> 
> TLS could potentially also be done directly on the TCP socket, but
> seemed a bit harder to work with the OOB data for non application_data
> messages, and the sockopts / CMSGS already exist for ALG sockets.  The
> flip side is having to manage two fds in userspace.
> 
> Some reasons we're looking at this:
> 
> 1) Access to sendfile/splice for CDN-type applications.  We were
>    inspired by Netflix exploring this in FreeBSD
> 
>    https://people.freebsd.org/~rrs/asiabsd_2015_tls.pdf
> 
>    For perf, this patch is almost on par with userspace OpenSSL.
>    Currently there are some copies and allocs to support
>    scatter/gather in aesni-intel_glue.c, but with some extra work to
>    remove those (not included here), a sendfile() is faster than the
>    equivalent userspace read/SSL_write using a 128k buffer by 2~7%.

This argument is moot:

We already have mmap+vmsplice working on TCP sockets, and MSG_ERRQUEUE
notifications are already sent to signal when to advance the window. Please
provide a benchmark using those existing facilities.

I am pretty sure you need at least one data copy (as stated in the
referenced paper). Linux can already do this from user space. FreeBSD
only implements sendfile, so this was the easier route for them.
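For readers unfamiliar with the vmsplice path mentioned above, a minimal C sketch of the two splice steps may help. It is an illustration under stated assumptions, not a benchmark: a second pipe stands in for the TCP socket so the example runs locally, the helper name `vmsplice_forward()` is hypothetical, and the completion notifications Hannes refers to are not shown.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>	/* vmsplice(), splice() (GNU extensions) */
#include <sys/types.h>
#include <sys/uio.h>	/* struct iovec */
#include <unistd.h>	/* pipe(), read(), close() */

/*
 * Hypothetical helper: move 'len' bytes of 'buf' into a pipe with
 * vmsplice(), splice() them into a second pipe that stands in for a
 * TCP socket, then read them back into 'out'.
 * Returns the number of bytes read from the second pipe, or -1.
 */
ssize_t vmsplice_forward(const char *buf, size_t len, char *out, size_t outlen)
{
	int p1[2], p2[2];
	ssize_t n = -1;
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };

	if (pipe(p1) < 0)
		return -1;
	if (pipe(p2) < 0) {
		close(p1[0]);
		close(p1[1]);
		return -1;
	}

	/* Step 1: hand the user pages to the pipe. Historically Linux
	 * references these pages instead of copying them, so 'buf' should
	 * not be modified until the data has been consumed downstream. */
	if (vmsplice(p1[1], &iov, 1, 0) != (ssize_t)len)
		goto out;

	/* Step 2: pipe -> destination without a copy through user space. */
	if (splice(p1[0], NULL, p2[1], NULL, len, 0) != (ssize_t)len)
		goto out;

	n = read(p2[0], out, outlen);
out:
	close(p1[0]);
	close(p1[1]);
	close(p2[0]);
	close(p2[1]);
	return n;
}
```

With a real TCP socket the second `splice()` would target the socket fd, and the application would still need some signal before it can safely reuse `buf`.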

> 2) Access to the unencrypted bytes in kernelspace.  For example, Tom
>    Herbert's kcm would need this
> 
>    https://lwn.net/Articles/657999/
> 
> 3) NIC offload. To support running aesni routines on the NIC instead
>    of the processor, we would probably need enough of the framing
>    interface put in kernel.

This would require adding TOE (TCP offload engine) support. The kernel
community has been a strong opponent of TOE offloading.

Bye,
Hannes
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next] bpf: add show_fdinfo handler for maps

2015-11-23 Thread Hannes Frederic Sowa
On Mon, Nov 23, 2015, at 20:09, John Fastabend wrote:
> On 15-11-23 10:03 AM, Alexei Starovoitov wrote:
> > On Mon, Nov 23, 2015 at 05:11:58PM +0100, Hannes Frederic Sowa wrote:
> >>
> >> Actually, that is the reason why I mentioned it, so *the admin* can see
> >> something is going on. Do you want to protect ebpf from root? Skynet? ;)
> > 
> > correct. To me both root and non-root are users in the first place and
> > they both shouldn't be allowed to misuse it.
> > 
> >> In my opinion the kernel never should hide any information of the admin
> >> if they are accessible easily. Sampling the number of failed updates to
> >> a map or printing it via procfs/ebpffs seems to be just a matter of how
> >> difficult it should be done. The map has a lock, so the number is fairly
> > 
> > map_lookup is actually lockless. It's a critical path and should be
> > as fast as possible. No extra stats just for debugging.
> > 
> >> accurate. Sampling and plotting size of hash maps without having kprobes
> >> installed would be a nice thing, because it reduces complexity and this
> >> is nice to have.
> > 
> > doing 'cat' from procfs is, of course, easier to use, but it's an extra
> > code that permanently lives in memory, whereas kprobe+bpf is a run-time
> > debugging.
> 
> Hopefully not jumping in off-base here (I've read most of the thread), but
> what I've been doing is loading programs with debug ebpf code in them
> to keep a statistics map(s) and then I read that from userspace for
> stats. It works pretty well and lets me compile out the debug code when
> I want and also doesn't need kprobe at all. Also I can implement
> sampling so that the debug code only runs every .01% or something like
> that so it can be used in "real" systems. My "real" systems are just a
> couple node test setup but it seems to be ok ;)
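John's approach can be sketched in plain C. This is a user-space model, not an eBPF program: `struct debug_stats`, `on_event()` and the 1-in-10000 rate (0.01%) are assumptions chosen to mirror his description. In a real BPF program the counters would live in a map, and random sampling could use the `bpf_get_prandom_u32()` helper instead of a deterministic counter.

```c
#include <stdint.h>

/* Hypothetical statistics record, standing in for an eBPF stats map. */
struct debug_stats {
	uint64_t events;	/* every event seen */
	uint64_t samples;	/* events that ran the debug path */
};

/* 0.01% sampling: run the debug path on 1 of every 10000 events. */
#define SAMPLE_EVERY 10000u

void on_event(struct debug_stats *st)
{
	st->events++;
	if (st->events % SAMPLE_EVERY == 0)
		st->samples++;	/* expensive debug bookkeeping would go here */
}
```

Compiling the debug path out then amounts to not calling `on_event()` at all, which matches John's point that production builds need not carry the statistics code.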

OK, I am fine with waiting until there is user demand.

Anyway, everything you refer to is code under your own control. I am
worried about BPF code that is not under my control.

Thanks,
Hannes


Re: [RFC PATCH 2/2] Crypto kernel tls socket

2015-11-23 Thread Sowmini Varadhan
On (11/23/15 09:43), Dave Watson wrote:
> Currently gcm(aes) represents ~80% of our SSL connections.
> 
> Userspace interface:
> 
> 1) A transform and op socket are created using the userspace crypto interface
> 2) Setsockopt ALG_SET_AUTHSIZE is called
> 3) Setsockopt ALG_SET_KEY is called twice, since we need both send/recv keys
> 4) ALG_SET_IV cmsgs are sent twice, since we need both send/recv IVs.
>    To support userspace heartbeats, changeciphersuite, etc, we would also need
>    to get these back out, use them, then reset them via CMSG.
> 5) ALG_SET_OP cmsg is overloaded to mean FD to read/write from.
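As a rough sketch of steps 1-3 using only the mainline AF_ALG interface (not the TLS socket type from this patch set), assuming Linux headers that provide `<linux/if_alg.h>`: the helper names are hypothetical, and note that the mainline constant for the tag size is `ALG_SET_AEAD_AUTHSIZE`, which takes the size in `optlen` with a NULL `optval`. Only one key is set here; the two-key (send/recv) scheme is specific to the patch.

```c
#include <linux/if_alg.h>	/* struct sockaddr_alg, ALG_SET_* */
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SOL_ALG
#define SOL_ALG 279		/* older libcs may not define it */
#endif

/* Fill in the AF_ALG address for the AEAD transform "gcm(aes)". */
void gcm_aes_addr(struct sockaddr_alg *sa)
{
	memset(sa, 0, sizeof(*sa));
	sa->salg_family = AF_ALG;
	strcpy((char *)sa->salg_type, "aead");
	strcpy((char *)sa->salg_name, "gcm(aes)");
}

/*
 * Hypothetical helper: create and bind the transform socket, set the
 * tag size and a single key, then accept() an op socket.
 * Returns the op fd, or -1 on failure.
 */
int open_gcm_aes(const unsigned char *key, unsigned int keylen,
		 unsigned int authsize)
{
	struct sockaddr_alg sa;
	int tfm, op = -1;

	gcm_aes_addr(&sa);
	tfm = socket(AF_ALG, SOCK_SEQPACKET, 0);
	if (tfm < 0)
		return -1;

	if (bind(tfm, (struct sockaddr *)&sa, sizeof(sa)) == 0 &&
	    /* the tag size is passed in optlen, with a NULL optval */
	    setsockopt(tfm, SOL_ALG, ALG_SET_AEAD_AUTHSIZE, NULL, authsize) == 0 &&
	    setsockopt(tfm, SOL_ALG, ALG_SET_KEY, key, keylen) == 0)
		op = accept(tfm, NULL, 0);

	/* the op socket keeps its own reference to the transform */
	close(tfm);
	return op;
}
```

The IVs (step 4) would then travel as `ALG_SET_IV` control messages on `sendmsg()` calls to the op socket.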

[from patch 0/2:]
> If a non application-data TLS record is seen, it is left on the TCP
> socket and an error is returned on the ALG socket, and the record is
> left for userspace to manage.

Interesting approach.

FWIW, I was hoping to discuss solutions for securing traffic tunnelled
over L3 at netdev 1.1, so hopefully we'll be able to go over the
trade-offs there. 

I'm trying to see how your approach would fit an RDS-type use case.
RDS-TCP is similar in concept to kcm, except that RDS has its own
header for multiplexing and does not depend on BPF for basic things
like reassembling the datagram.
If I were to use this for RDS-TCP, the tls_tcp_read_sock() logic
would be merged into the recv_actor callback for RDS, right? Then a TLS
control-plane message could show up in the middle of the
data stream, so we would really have to freeze processing of the data
stream until the control-plane message is processed?
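To make the "stop at the first control record" behaviour concrete, here is a hedged C sketch that walks TLS records in a buffer and returns the offset of the first record that is not application_data. It assumes the standard 5-byte record header (type, version, length) and the content-type values from RFC 5246; `tls_first_control_record()` is a hypothetical helper, not code from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* TLS record content types (RFC 5246, section 6.2.1). */
enum {
	TLS_CHANGE_CIPHER_SPEC = 20,
	TLS_ALERT              = 21,
	TLS_HANDSHAKE          = 22,
	TLS_APPLICATION_DATA   = 23,
};

#define TLS_HDR_LEN 5	/* type(1) + version(2) + length(2) */

/*
 * Return the offset of the first record header whose type is not
 * application_data, i.e. the point where in-kernel processing would
 * stop and leave the stream to user space. Returns len if no such
 * header is found in the buffered bytes (a trailing partial
 * application_data record is simply left for the next call).
 */
size_t tls_first_control_record(const uint8_t *buf, size_t len)
{
	size_t off = 0;

	while (len - off >= TLS_HDR_LEN) {
		size_t rec_len = ((size_t)buf[off + 3] << 8) | buf[off + 4];

		if (buf[off] != TLS_APPLICATION_DATA)
			return off;	/* control record: freeze here */
		if (len - off - TLS_HDR_LEN < rec_len)
			break;		/* incomplete record */
		off += TLS_HDR_LEN + rec_len;
	}
	return len;
}
```

Everything before the returned offset can be decrypted in order; everything from it onward has to wait for user space, which is exactly the stall Sowmini is asking about.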

I'm concerned about the possibilities for asynchrony once the data plane
is separated from the control plane: userspace TLS does not have to deal
with this, but here the control plane stays in userspace while the data
plane moves into the kernel. (And IPsec/IKE already has plenty of headaches
from this sort of thing.)

In the tls.c example that you have, the opfd is generated from
the accept() on the AF_ALG socket; how would this work if I wanted
my opfd to be a PF_RDS or a PF_KCM socket or similar?

One concern is that this patchset provides a solution for the "80%"
case, but what about the other 20% (and non-x86 platforms)?
E.g., if I get a cipher-suite request that AES-NI does not cover, what
would happen (punt to userspace?)

--Sowmini



Re: [PATCH 1/2] arm64: bpf: add 'store immediate' instruction

2015-11-23 Thread Shi, Yang

Hi folks,

Any more comments on this patch (store immediate only)?

I need more time to add XADD (I suppose everyone agrees it is
equivalent to atomic_add). However, this patch is independent of XADD,
so could we apply it first?


Thanks,
Yang


On 11/12/2015 7:45 PM, Z Lim wrote:
> On Thu, Nov 12, 2015 at 11:33 AM, Shi, Yang  wrote:
>> On 11/11/2015 4:39 AM, Will Deacon wrote:
>>> Wait a second, we're both talking rubbish here :) The STR (immediate)
>>> form is referring to the addressing mode, whereas this patch wants to
>>> store an immediate value to memory, which does need moving to a register
>>> first.
>>
>> Yes, the immediate means immediate offset for addressing index. Doesn't mean
>> to store immediate to memory.
>>
>> I don't think any load-store architecture has store immediate instruction.
>
> Indeed. Sorry for the noise.
>
> Somehow Will caught a whiff of whatever I was smoking then :)
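Will's point can be illustrated with a tiny, hypothetical JIT-style emitter: because the ISA has no store-immediate form, a BPF "store immediate" needs a scratch register. The scratch register (w10), the text output and the function name are illustrative assumptions, not the arm64 JIT's actual code; it assumes `imm` fits a single MOV and `off` is a valid scaled offset (larger values would need MOVZ/MOVK or a register offset).

```c
#include <stdio.h>
#include <string.h>

/*
 * Emit arm64 assembly text for *(u32 *)(dst + off) = imm.
 * In "STR (immediate)" the immediate is the address offset, so the
 * value to store must first be materialized in a scratch register.
 * Returns the snprintf() result (number of characters written).
 */
int emit_st_imm32(char *out, size_t n, int dst, int off, int imm)
{
	return snprintf(out, n,
			"mov w10, #%d\n"		/* value -> scratch reg */
			"str w10, [x%d, #%d]\n",	/* store reg at dst+off */
			imm, dst, off);
}
```

For example, `emit_st_imm32(buf, sizeof(buf), 0, 8, 42)` produces a MOV into w10 followed by `str w10, [x0, #8]`: two instructions where an x86 JIT could use one.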





Re: [PATCH net-next 0/6] kcm: Kernel Connection Multiplexor (KCM)

2015-11-23 Thread Hannes Frederic Sowa
Hello Tom,

On Mon, Nov 23, 2015, at 18:33, Tom Herbert wrote:
> > For me this still looks a little bit like messages could be delimited by
> > TCP PSH flag, where we might need to have some more fine grained control
> > over and besides that just adding better fanout semantics to TCP, no?
> >
> The TCP PSH flag is not defined for message delineation (neither is
> urgent pointer). We can't change that (many people have tried to add
> message semantics to TCP protocol but have always failed miserably).
> The fact is TCP is always going to be a stream based protocol. Period!
> :-) It is up to the application to interpret the stream and extract
> messages. Even if we could somehow apply the PSH bit to "help" in
> message delineation, we would need to change senders to use the PSH
> bit in that fashion for it to be of benefit to receivers.

I see the TCP PSH flag as an optimization, and I agree it is hard to
make proper use of it on the Internet. But in a datacenter, where
everything is under control, could this be done?

Anyway, decoding arbitrary messages with potentially huge lengths in the
kernel could result in starvation if you adhere to the socket receive
buffer limits at all times. So I wonder whether a forward-progress
guarantee can be achieved here independently of the eBPF program. I really
see this becoming a problem as soon as people use it for privilege
separation. Will there be central error handling?
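The starvation concern can be illustrated with a simplified framing check. This sketch assumes a hypothetical 4-byte big-endian length prefix in place of a KCM BPF program, and `check_frame()` is a made-up name; it shows why a parser that honours a receive-buffer limit must reject a message whose declared length can never fit, rather than wait for it forever.

```c
#include <stddef.h>
#include <stdint.h>

enum frame_status { FRAME_READY, FRAME_NEED_MORE, FRAME_TOO_BIG };

/*
 * Inspect the bytes buffered so far ('have') against a receive-buffer
 * limit ('rcvbuf'). On success *msg_len holds the payload length.
 */
enum frame_status check_frame(const uint8_t *buf, size_t have, size_t rcvbuf,
			      size_t *msg_len)
{
	if (have < 4)
		return FRAME_NEED_MORE;

	*msg_len = ((size_t)buf[0] << 24) | ((size_t)buf[1] << 16) |
		   ((size_t)buf[2] << 8) | buf[3];

	/* The buffer can never hold this message: without this check the
	 * parser would wait indefinitely, i.e. starve. */
	if (4 + *msg_len > rcvbuf)
		return FRAME_TOO_BIG;

	return have >= 4 + *msg_len ? FRAME_READY : FRAME_NEED_MORE;
}
```

Where a BPF program computes the length instead, the same check has to be applied to whatever length the program returns, which is one reason central error handling matters.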

Also, would it make sense to add a TCP option here instead of using the
TCP PSH flag? Not sure yet...

> > Do kcm sockets still allow streaming unlimited amounts of data? E.g. if
> > you want to pass a data stream attached to a rpc message? I think not
> > allowing streaming is a major shortcoming then (even though this will
> > induce head of line blocking).
> >
> RPC messages can be of arbitrary size and with SOCK_SEQPACKET,
> messages can be sent or received in multiple calls. No HOL blocking
> since messages are constructed on KCM sockets before starting to send
> on TCP sockets. Socket buffer limits are respected. KCM does not
> enforce a maximum message size; if an application does have a maximum
> then that can be checked in the BPF code.

I was referring to head-of-line blocking on the receiver's end, the same
as with TCP in user space, where one data stream (or one huge message)
keeps the byte stream busy so that no other messages behind it can be
delivered. For low latency I would rather use multiple streams or switch
to UDP with retransmission handled in user space.

I think this problem increasingly comes down to improving the epoll
interface with better CPU-steered wake-up capabilities, to make it more
protocol-agnostic. Some programs, e.g., also want to be woken up only once
an HTTP header has been received completely; SO_RCVLOWAT was made for
this, and FreeBSD has accept_filter for this kind of thing.
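For reference, setting SO_RCVLOWAT from C looks like this. It is a minimal sketch: `set_rcvlowat()` is a hypothetical helper, and the option only expresses a byte count, so it approximates "wake me when the header is complete" only when the application knows how many bytes to wait for.

```c
#include <sys/socket.h>
#include <unistd.h>

/*
 * Ask the kernel not to report 'fd' as readable until at least 'lowat'
 * bytes are queued, then read the value back.
 * Returns the effective low-water mark, or -1 on error.
 */
int set_rcvlowat(int fd, int lowat)
{
	socklen_t len = sizeof(lowat);

	if (setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &lowat, sizeof(lowat)) < 0)
		return -1;
	if (getsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &lowat, &len) < 0)
		return -1;
	return lowat;
}
```

A variable-length header (as in HTTP) does not map onto a fixed threshold, which is where something like FreeBSD's accept_filter, or a programmable parser, goes further than SO_RCVLOWAT.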

Do you want to use this in Thrift, which is mainly Java-based, and reuse
the existing NIO infrastructure?

> >> Future support:
> >>
> >>  - Integration with TLS (TLS-in-kernel is a separate initiative).
> >
> > This is interesting:
> >
> > Regarding the last week's discussion about better OOB support in TCP
> > e.g. for SOCKET_DESTROY, do you already have a plan to handle TLS alerts
> > and do CHANGE_CIPHER on the socket synchronously?
> >
> Dave should be posting the basic TLS-in-the-kernel patches shortly,
> those will be a better context for discussion.

Thanks, I am looking at them right now. :)

Thanks,
Hannes


[net-next 12/15] i40evf: fix compiler warning of unused variable

2015-11-23 Thread Jeff Kirsher
From: Jesse Brandeburg 

The compiler complained about an unused variable, which the driver was
only using to store the result of an rd32 that exists to clear a register
unconditionally. Drop the unused variable and reuse an existing one.

Signed-off-by: Jesse Brandeburg 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index d962164..6ad6265 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -307,10 +307,9 @@ static irqreturn_t i40evf_msix_aq(int irq, void *data)
struct i40e_hw *hw = >hw;
u32 val;
 
-   /* handle non-queue interrupts */
-   rd32(hw, I40E_VFINT_ICR01);
-   rd32(hw, I40E_VFINT_ICR0_ENA1);
-
+   /* handle non-queue interrupts, these reads clear the registers */
+   val = rd32(hw, I40E_VFINT_ICR01);
+   val = rd32(hw, I40E_VFINT_ICR0_ENA1);
 
val = rd32(hw, I40E_VFINT_DYN_CTL01) |
  I40E_VFINT_DYN_CTL01_CLEARPBA_MASK;
-- 
2.5.0
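As the new comment in the diff says, these reads clear the registers, so they must stay even though their values are never used. A hypothetical model in plain C (a `struct fake_hw` in ordinary memory rather than MMIO, so no volatile accessors) shows the side effect the patch preserves by assigning the result to `val`.

```c
#include <stdint.h>

/* Hypothetical model of a read-to-clear interrupt cause register. */
struct fake_hw {
	uint32_t icr;	/* latched interrupt causes */
};

/* Reading returns the latched bits and clears them as a side effect. */
uint32_t rd32_clear(struct fake_hw *hw)
{
	uint32_t val = hw->icr;

	hw->icr = 0;	/* the side effect the driver relies on */
	return val;
}
```

Dropping the call because its result looks unused would leave the causes latched; keeping the call and discarding (or overwriting) the value is what the patch does.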



[net-next 11/15] ixgbe: Remove CS4227 diagnostic code

2015-11-23 Thread Jeff Kirsher
From: Mark Rustad 

Testing has now shown that the diagnostic code used with the CS4227
is no longer needed, so remove it.

Signed-off-by: Mark Rustad 
Tested-by: Darin Miller 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c | 80 ---
 1 file changed, 80 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
index 005f01b..bf2ae8d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
@@ -87,79 +87,6 @@ static s32 ixgbe_write_cs4227(struct ixgbe_hw *hw, u16 reg, u16 value)
 }
 
 /**
- * ixgbe_check_cs4227_reg - Perform diag on a CS4227 register
- * @hw: pointer to hardware structure
- * @reg: the register to check
- *
- * Performs a diagnostic on a register in the CS4227 chip. Returns an error
- * if it is not operating correctly.
- * This function assumes that the caller has acquired the proper semaphore.
- */
-static s32 ixgbe_check_cs4227_reg(struct ixgbe_hw *hw, u16 reg)
-{
-   s32 status;
-   u32 retry;
-   u16 reg_val;
-
-   reg_val = (IXGBE_CS4227_EDC_MODE_DIAG << 1) | 1;
-   status = ixgbe_write_cs4227(hw, reg, reg_val);
-   if (status)
-   return status;
-   for (retry = 0; retry < IXGBE_CS4227_RETRIES; retry++) {
-   msleep(IXGBE_CS4227_CHECK_DELAY);
-   reg_val = 0x;
-   ixgbe_read_cs4227(hw, reg, &reg_val);
-   if (!reg_val)
-   break;
-   }
-   if (reg_val) {
-   hw_err(hw, "CS4227 reg 0x%04X failed diagnostic\n", reg);
-   return status;
-   }
-
-   return 0;
-}
-
-/**
- * ixgbe_get_cs4227_status - Return CS4227 status
- * @hw: pointer to hardware structure
- *
- * Performs a diagnostic on the CS4227 chip. Returns an error if it is
- * not operating correctly.
- * This function assumes that the caller has acquired the proper semaphore.
- */
-static s32 ixgbe_get_cs4227_status(struct ixgbe_hw *hw)
-{
-   s32 status;
-   u16 value = 0;
-
-   /* Exit if the diagnostic has already been performed. */
-   status = ixgbe_read_cs4227(hw, IXGBE_CS4227_SCRATCH, &value);
-   if (status)
-   return status;
-   if (value == IXGBE_CS4227_RESET_COMPLETE)
-   return 0;
-
-   /* Check port 0. */
-   status = ixgbe_check_cs4227_reg(hw, IXGBE_CS4227_LINE_SPARE24_LSB);
-   if (status)
-   return status;
-
-   status = ixgbe_check_cs4227_reg(hw, IXGBE_CS4227_HOST_SPARE24_LSB);
-   if (status)
-   return status;
-
-   /* Check port 1. */
-   status = ixgbe_check_cs4227_reg(hw, IXGBE_CS4227_LINE_SPARE24_LSB +
-   (1 << 12));
-   if (status)
-   return status;
-
-   return ixgbe_check_cs4227_reg(hw, IXGBE_CS4227_HOST_SPARE24_LSB +
- (1 << 12));
-}
-
-/**
  * ixgbe_read_pe - Read register from port expander
  * @hw: pointer to hardware structure
  * @reg: register number to read
@@ -328,13 +255,6 @@ static void ixgbe_check_cs4227(struct ixgbe_hw *hw)
return;
}
 
-   /* Is the CS4227 working correctly? */
-   status = ixgbe_get_cs4227_status(hw);
-   if (status) {
-   hw_err(hw, "CS4227 status failed: %d", status);
-   goto out;
-   }
-
/* Record completion for next time. */
status = ixgbe_write_cs4227(hw, IXGBE_CS4227_SCRATCH,
IXGBE_CS4227_RESET_COMPLETE);
-- 
2.5.0



[net-next 14/15] ixgbevf: fix spoofed packets with random MAC

2015-11-23 Thread Jeff Kirsher
From: Emil Tantilov 

If ixgbevf is loaded while the corresponding PF interface is down
and the driver assigns a random MAC address, that address can be
overwritten with the value of hw->mac.perm_addr, which would be 0 at
that point.

To avoid this case, we initialize hw->mac.perm_addr to the randomly
generated address and do not overwrite it unless we receive an ACK from ixgbe.

Reported-by: John Greene 
Signed-off-by: Emil Tantilov 
Tested-by: Krishneil Singh 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 1 +
 drivers/net/ethernet/intel/ixgbevf/vf.c   | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 47c71e1..dbbd1be 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -2664,6 +2664,7 @@ static int ixgbevf_sw_init(struct ixgbevf_adapter *adapter)
dev_info(>dev, "Assigning random MAC address\n");
eth_hw_addr_random(netdev);
ether_addr_copy(hw->mac.addr, netdev->dev_addr);
+   ether_addr_copy(hw->mac.perm_addr, netdev->dev_addr);
}
 
/* Enable dynamic interrupt throttling rates */
diff --git a/drivers/net/ethernet/intel/ixgbevf/vf.c b/drivers/net/ethernet/intel/ixgbevf/vf.c
index 427f360..61a98f4 100644
--- a/drivers/net/ethernet/intel/ixgbevf/vf.c
+++ b/drivers/net/ethernet/intel/ixgbevf/vf.c
@@ -117,7 +117,9 @@ static s32 ixgbevf_reset_hw_vf(struct ixgbe_hw *hw)
msgbuf[0] != (IXGBE_VF_RESET | IXGBE_VT_MSGTYPE_NACK))
return IXGBE_ERR_INVALID_MAC_ADDR;
 
-   ether_addr_copy(hw->mac.perm_addr, addr);
+   if (msgbuf[0] == (IXGBE_VF_RESET | IXGBE_VT_MSGTYPE_ACK))
+   ether_addr_copy(hw->mac.perm_addr, addr);
+
hw->mac.mc_filter_type = msgbuf[IXGBE_VF_MC_TYPE_WORD];
 
return 0;
-- 
2.5.0
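The rule this patch adds to ixgbevf_reset_hw_vf() can be modelled in isolation. In this sketch `VF_RESET`, `MSG_ACK` and `MSG_NACK` are illustrative stand-ins for `IXGBE_VF_RESET`, `IXGBE_VT_MSGTYPE_ACK` and `IXGBE_VT_MSGTYPE_NACK` (values chosen for the example), and `handle_reset_reply()` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative flag values, not taken from the driver headers. */
#define VF_RESET 0x01u
#define MSG_ACK  0x80000000u
#define MSG_NACK 0x40000000u

struct vf_mac {
	uint8_t perm_addr[6];
};

/*
 * Adopt the PF-supplied address only on ACK. On NACK, keep whatever
 * perm_addr already holds (e.g. the random address set at init).
 * Returns -1 for an unexpected reply, 0 otherwise.
 */
int handle_reset_reply(struct vf_mac *mac, uint32_t msg0, const uint8_t addr[6])
{
	if (msg0 != (VF_RESET | MSG_ACK) && msg0 != (VF_RESET | MSG_NACK))
		return -1;

	if (msg0 == (VF_RESET | MSG_ACK))
		memcpy(mac->perm_addr, addr, 6);
	return 0;
}
```

Together with the `ether_addr_copy(hw->mac.perm_addr, ...)` added in ixgbevf_sw_init(), this means a NACK no longer replaces the random address with zeros.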



[net-next 01/15] fm10k: do not assume VF always has 1 queue

2015-11-23 Thread Jeff Kirsher
From: Jacob Keller 

It is possible that the PF has not yet assigned resources to the VF.
Although rare, this could result in the VF attempting to read queues it
does not own, triggering FUM or THI faults in the PF. To prevent this,
check queue 0 before continuing in init_hw_vf.

Signed-off-by: Jacob Keller 
Tested-by: Krishneil Singh 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/fm10k/fm10k_type.h | 1 +
 drivers/net/ethernet/intel/fm10k/fm10k_vf.c   | 7 ++++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_type.h b/drivers/net/ethernet/intel/fm10k/fm10k_type.h
index 318a212..35afd71 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_type.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_type.h
@@ -77,6 +77,7 @@ struct fm10k_hw;
 #define FM10K_PCIE_SRIOV_CTRL_VFARI 0x10
 
 #define FM10K_ERR_PARAM -2
+#define FM10K_ERR_NO_RESOURCES -3
 #define FM10K_ERR_REQUESTS_PENDING -4
 #define FM10K_ERR_RESET_REQUESTED  -5
 #define FM10K_ERR_DMA_PENDING  -6
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_vf.c b/drivers/net/ethernet/intel/fm10k/fm10k_vf.c
index 36c8b0a..3a18ef1 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_vf.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_vf.c
@@ -103,7 +103,12 @@ static s32 fm10k_init_hw_vf(struct fm10k_hw *hw)
s32 err;
u16 i;
 
-   /* assume we always have at least 1 queue */
+   /* verify we have at least 1 queue */
+   if (!~fm10k_read_reg(hw, FM10K_TXQCTL(0)) ||
+   !~fm10k_read_reg(hw, FM10K_RXQCTL(0)))
+   return FM10K_ERR_NO_RESOURCES;
+
+   /* determine how many queues we have */
for (i = 1; tqdloc0 && (i < FM10K_MAX_QUEUES_POOL); i++) {
/* verify the Descriptor cache offsets are increasing */
tqdloc = ~fm10k_read_reg(hw, FM10K_TQDLOC(i));
-- 
2.5.0
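The `!~fm10k_read_reg(...)` test in the patch is compact; a small C sketch spells it out. `~reg` is zero only when `reg` is all ones, so the condition is true exactly when the register read back as 0xFFFFFFFF, which the patch treats as "no queue 0 for this VF". The helper names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True exactly when the register read returned all ones. */
static inline bool reg_reads_all_ones(uint32_t reg)
{
	return !~reg;		/* same as reg == UINT32_MAX */
}

/* Mirror of the new check: both TXQCTL(0) and RXQCTL(0) must be real. */
bool vf_has_queue0(uint32_t txqctl0, uint32_t rxqctl0)
{
	return !reg_reads_all_ones(txqctl0) && !reg_reads_all_ones(rxqctl0);
}
```

When `vf_has_queue0()` would be false, fm10k_init_hw_vf() returns FM10K_ERR_NO_RESOURCES instead of probing further queues.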



[net-next 07/15] ixgbe: Fix handling of NAPI budget when multiple queues are enabled per vector

2015-11-23 Thread Jeff Kirsher
From: Alexander Duyck 

This patch corrects an issue in which the polling routine would increase
the budget for Rx to at least 1 per queue if multiple queues were present.
This would result in Rx packets being processed when the budget was 0, which
is meant to indicate that no Rx can be handled.

Signed-off-by: Alexander Duyck 
Tested-by: Darin Miller 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index c9b7e5e..4fa94a3 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2783,7 +2783,8 @@ int ixgbe_poll(struct napi_struct *napi, int budget)
ixgbe_for_each_ring(ring, q_vector->tx)
clean_complete &= !!ixgbe_clean_tx_irq(q_vector, ring);
 
-   if (!ixgbe_qv_lock_napi(q_vector))
+   /* Exit if we are called by netpoll or busy polling is active */
+   if ((budget <= 0) || !ixgbe_qv_lock_napi(q_vector))
return budget;
 
/* attempt to distribute budget to each queue fairly, but don't allow
-- 
2.5.0
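The budget issue fixed here (and in the matching fm10k patch) can be reproduced with a simplified model of the per-ring split. `rx_work_allowed()` is hypothetical: it mirrors only the "divide the budget across rings, but never below 1" rule and the new early exit, and ignores Tx cleanup and the value the real poll routine returns.

```c
/*
 * Total Rx work a poll pass would allow for 'budget' spread over
 * 'rx_rings' queues. 'netpoll_guard' enables the early exit added
 * by the patch.
 */
int rx_work_allowed(int budget, int rx_rings, int netpoll_guard)
{
	int per_ring;

	/* The fix: a budget of 0 (netpoll) means no Rx at all. */
	if (netpoll_guard && budget <= 0)
		return 0;

	per_ring = rx_rings > 1 ? budget / rx_rings : budget;
	if (rx_rings > 1 && per_ring < 1)
		per_ring = 1;	/* the "at least 1 per queue" floor */

	return per_ring * rx_rings;
}
```

Without the guard, a budget of 0 on a vector with four Rx rings still allows four packets of Rx work; with it, the routine returns before touching Rx.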



[net-next 04/15] ixgbe: Delete redundant include file

2015-11-23 Thread Jeff Kirsher
From: Mark Rustad 

Delete a redundant include of net/vxlan.h.

Signed-off-by: Mark Rustad 
Tested-by: Darin Miller 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 4089d77..450db04 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -65,9 +65,6 @@
 #include "ixgbe_common.h"
 #include "ixgbe_dcb_82599.h"
 #include "ixgbe_sriov.h"
-#ifdef CONFIG_IXGBE_VXLAN
-#include <net/vxlan.h>
-#endif
 
 char ixgbe_driver_name[] = "ixgbe";
 static const char ixgbe_driver_string[] =
-- 
2.5.0



[net-next 05/15] ixgbe: fix multiple kernel-doc errors

2015-11-23 Thread Jeff Kirsher
From: Jean Sacren 

The commit dfaf891dd3e1 ("ixgbe: Refactor the RSS configuration code")
introduced a few kernel-doc errors:

1) The function name is missing;
2) The format is wrong;
3) The short description is redundant.

Fix all of the above so that kernel-doc processes these comments correctly.

Signed-off-by: Jean Sacren 
Tested-by: Darin Miller 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 450db04..c9b7e5e 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3312,8 +3312,7 @@ static void ixgbe_configure_srrctl(struct ixgbe_adapter *adapter,
 }
 
 /**
- * Return a number of entries in the RSS indirection table
- *
+ * ixgbe_rss_indir_tbl_entries - Return RSS indirection table entries
  * @adapter: device handle
  *
  *  - 82598/82599/X540: 128
@@ -3331,8 +3330,7 @@ u32 ixgbe_rss_indir_tbl_entries(struct ixgbe_adapter *adapter)
 }
 
 /**
- * Write the RETA table to HW
- *
+ * ixgbe_store_reta - Write the RETA table to HW
  * @adapter: device handle
  *
  * Write the RSS redirection table stored in adapter.rss_indir_tbl[] to HW.
@@ -3371,8 +3369,7 @@ void ixgbe_store_reta(struct ixgbe_adapter *adapter)
 }
 
 /**
- * Write the RETA table to HW (for x550 devices in SRIOV mode)
- *
+ * ixgbe_store_vfreta - Write the RETA table to HW (x550 devices in SRIOV mode)
  * @adapter: device handle
  *
  * Write the RSS redirection table stored in adapter.rss_indir_tbl[] to HW.
-- 
2.5.0
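For reference, the kernel-doc shape the patch restores looks like this on a hypothetical function: the first line is `name - short description`, the `@param` lines follow immediately, and longer text comes after a blank comment line. `example_table_entries()` and its values are made up for the example.

```c
/**
 * example_table_entries - Return the number of table entries
 * @wide: nonzero if the (hypothetical) device uses the larger table
 *
 * Longer description goes here, after the parameter list and a blank
 * comment line.
 *
 * Return: 256 if @wide is nonzero, 128 otherwise.
 */
unsigned int example_table_entries(int wide)
{
	return wide ? 256 : 128;
}
```

A comment that starts with free text instead of the function name, as the removed lines did, is what kernel-doc reported as an error.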



[net-next 03/15] ixgbe: drop null test before destroy functions

2015-11-23 Thread Jeff Kirsher
From: Julia Lawall 

Remove unneeded NULL test.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@ expression x; @@
-if (x != NULL)
  \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
// </smpl>

Signed-off-by: Julia Lawall 
Tested-by: Darin Miller 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
index 631c603..5f98870 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
@@ -620,8 +620,7 @@ static void ixgbe_fcoe_dma_pool_free(struct ixgbe_fcoe *fcoe, unsigned int cpu)
struct ixgbe_fcoe_ddp_pool *ddp_pool;
 
ddp_pool = per_cpu_ptr(fcoe->ddp_pool, cpu);
-   if (ddp_pool->pool)
-   dma_pool_destroy(ddp_pool->pool);
+   dma_pool_destroy(ddp_pool->pool);
ddp_pool->pool = NULL;
 }
 
-- 
2.5.0
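The change relies on dma_pool_destroy() accepting a NULL pool, the same convention free() follows. A hypothetical pool type in plain C shows the pattern that lets callers drop their own NULL checks.

```c
#include <stdlib.h>

/* Hypothetical pool object. */
struct pool {
	void *mem;
};

/* Like free(), a NULL argument is accepted and ignored, so callers do
 * not need an "if (p) pool_destroy(p);" guard. */
void pool_destroy(struct pool *p)
{
	if (!p)
		return;
	free(p->mem);
	free(p);
}

struct pool *pool_create(size_t size)
{
	struct pool *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	p->mem = malloc(size);
	if (!p->mem) {
		free(p);
		return NULL;
	}
	return p;
}
```

The caller in ixgbe_fcoe_dma_pool_free() keeps its `ddp_pool->pool = NULL;` line, so a second call is still harmless.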



[net-next 10/15] ixgbe/ixgbevf: use napi_schedule_irqoff()

2015-11-23 Thread Jeff Kirsher
From: Alexander Duyck 

The ixgbe_intr and ixgbe/ixgbevf_msix_clean_rings functions run from hard
interrupt context or with interrupts already disabled in netpoll.

They can therefore use napi_schedule_irqoff() instead of napi_schedule().

Signed-off-by: Alexander Duyck 
Tested-by: Darin Miller 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 4 ++--
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 4fa94a3..c95042e 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2754,7 +2754,7 @@ static irqreturn_t ixgbe_msix_clean_rings(int irq, void *data)
/* EIAM disabled interrupts (on this vector) for us */
 
if (q_vector->rx.ring || q_vector->tx.ring)
-   napi_schedule(&q_vector->napi);
+   napi_schedule_irqoff(&q_vector->napi);
 
return IRQ_HANDLED;
 }
@@ -2948,7 +2948,7 @@ static irqreturn_t ixgbe_intr(int irq, void *data)
ixgbe_ptp_check_pps_event(adapter, eicr);
 
/* would disable interrupts here but EIAM disabled it */
-   napi_schedule(&q_vector->napi);
+   napi_schedule_irqoff(&q_vector->napi);
 
/*
 * re-enable link(maybe) and non-queue interrupts, no flush.
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index e678178..1b15f95 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -1288,7 +1288,7 @@ static irqreturn_t ixgbevf_msix_clean_rings(int irq, void *data)
 
/* EIAM disabled interrupts (on this vector) for us */
if (q_vector->rx.ring || q_vector->tx.ring)
-   napi_schedule(&q_vector->napi);
+   napi_schedule_irqoff(&q_vector->napi);
 
return IRQ_HANDLED;
 }
-- 
2.5.0



[net-next 09/15] ixgbevf: Limit lowest interrupt rate for adaptive interrupt moderation to 12K

2015-11-23 Thread Jeff Kirsher
From: Alexander Duyck 

This patch is the ixgbevf version of commit 8ac34f10a5ea4 ("ixgbe: Limit
lowest interrupt rate for adaptive interrupt moderation to 12K").

The same logic applies here, with the same results: a netperf test will
starve for memory in the interval between one Tx interrupt and the next.
As a result, the ixgbevf driver underperformed compared to vhost_net.

Signed-off-by: Alexander Duyck 
Tested-by: Darin Miller 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbevf/ethtool.c  | 2 +-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h  | 3 +--
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 6 +++---
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ethtool.c b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
index d3e5f5b..c48aef6 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
@@ -774,7 +774,7 @@ static int ixgbevf_set_coalesce(struct net_device *netdev,
adapter->tx_itr_setting = ec->tx_coalesce_usecs;
 
if (adapter->tx_itr_setting == 1)
-   tx_itr_param = IXGBE_10K_ITR;
+   tx_itr_param = IXGBE_12K_ITR;
else
tx_itr_param = adapter->tx_itr_setting;
 
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index ec31472..68ec7daa 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -326,8 +326,7 @@ static inline bool ixgbevf_qv_disable(struct ixgbevf_q_vector *q_vector)
 #define IXGBE_MIN_RSC_ITR  24
 #define IXGBE_100K_ITR 40
 #define IXGBE_20K_ITR  200
-#define IXGBE_10K_ITR  400
-#define IXGBE_8K_ITR   500
+#define IXGBE_12K_ITR  336
 
 /* Helper macros to switch between ints/sec and what the register uses.
  * And yes, it's the same math going both ways.  The lowest value
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 2955186..e678178 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -1138,7 +1138,7 @@ static void ixgbevf_configure_msix(struct ixgbevf_adapter *adapter)
if (q_vector->tx.ring && !q_vector->rx.ring) {
/* Tx only vector */
if (adapter->tx_itr_setting == 1)
-   q_vector->itr = IXGBE_10K_ITR;
+   q_vector->itr = IXGBE_12K_ITR;
else
q_vector->itr = adapter->tx_itr_setting;
} else {
@@ -1196,7 +1196,7 @@ static void ixgbevf_update_itr(struct ixgbevf_q_vector *q_vector,
/* simple throttle rate management
 *    0-20MB/s lowest (100000 ints/s)
 *   20-100MB/s low   (20000 ints/s)
-*  100-1249MB/s bulk (8000 ints/s)
+*  100-1249MB/s bulk (12000 ints/s)
 */
/* what was last interrupt timeslice? */
timepassed_us = q_vector->itr >> 2;
@@ -1247,7 +1247,7 @@ static void ixgbevf_set_itr(struct ixgbevf_q_vector *q_vector)
break;
case bulk_latency:
default:
-   new_itr = IXGBE_8K_ITR;
+   new_itr = IXGBE_12K_ITR;
break;
}
 
-- 
2.5.0
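The hunks above replace the 10K and 8K settings with a single 12K value of 336. The line `timepassed_us = q_vector->itr >> 2` suggests that ITR values count 0.25 us steps; assuming that, this small userspace sketch (the helper name is hypothetical, not part of the driver) converts register values back into interrupt rates:

```c
#include <stdint.h>

/* Values from ixgbevf.h after this patch. The 0.25 us unit is an
 * assumption inferred from "timepassed_us = q_vector->itr >> 2". */
#define IXGBE_100K_ITR 40
#define IXGBE_20K_ITR  200
#define IXGBE_12K_ITR  336

/* Hypothetical helper: ITR value -> interrupts per second.
 * interval_us = itr / 4, so ints/s = 1000000 / (itr / 4) = 4000000 / itr. */
static uint32_t itr_to_ints_per_sec(uint32_t itr)
{
	if (itr == 0)
		return 0;	/* treat 0 as "no throttling" for this sketch */
	return 4000000u / itr;
}
```

Under that assumption, 336 works out to about 11.9K interrupts per second, close to the advertised 12K, while the removed 400 and 500 correspond to 10K and 8K.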

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[net-next 06/15] fm10k: Fix handling of NAPI budget when multiple queues are enabled per vector

2015-11-23 Thread Jeff Kirsher
From: Alexander Duyck 

This patch corrects an issue in which the polling routine would increase
the budget for Rx to at least 1 per queue if multiple queues were present.
This would result in Rx packets being processed when the budget was 0, which
is meant to indicate that no Rx can be handled.

Signed-off-by: Alexander Duyck 
Tested-by: Krishneil Singh 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index e76a44c..746a198 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -1428,6 +1428,10 @@ static int fm10k_poll(struct napi_struct *napi, int budget)
fm10k_for_each_ring(ring, q_vector->tx)
clean_complete &= fm10k_clean_tx_irq(q_vector, ring);
 
+   /* Handle case where we are called by netpoll with a budget of 0 */
+   if (budget <= 0)
+   return budget;
+
/* attempt to distribute budget to each queue fairly, but don't
 * allow the budget to go below 1 because we'll exit polling
 */
-- 
2.5.0
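The effect of the early return can be modelled in userspace. This is a minimal sketch, not the driver code: rx_budget_per_ring() is a hypothetical helper that mimics how the poll routine splits its budget across several Rx rings on one vector, returning 0 when no Rx work is allowed.

```c
/* Hypothetical model of the Rx budget split in a NAPI poll routine that
 * services several Rx rings per vector. Returns the per-ring budget;
 * 0 means no Rx work may be done on this call. */
static int rx_budget_per_ring(int budget, int rx_ring_count)
{
	/* netpoll may call with a budget of 0: do no Rx work at all.
	 * Without this check, the clamp below would hand every ring a
	 * budget of 1 and Rx packets would be cleaned anyway. */
	if (budget <= 0)
		return 0;

	/* Distribute the budget fairly, but never below 1 per ring,
	 * because a zero per-ring budget would make polling exit early. */
	if (rx_ring_count > 1) {
		int per_ring = budget / rx_ring_count;

		return per_ring > 1 ? per_ring : 1;
	}
	return budget;
}
```

Without the `budget <= 0` check, a budget of 0 with four rings would yield 1 per ring, which is exactly the case the patch closes.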



[net-next 02/15] fm10k: Correct MTU for jumbo frames

2015-11-23 Thread Jeff Kirsher
From: Jacob Keller 

Based on hardware testing, the host interface supports up to 15368 bytes
as the maximum frame size. To determine the correct MTU, we subtract 8
for the internal switch tag, 14 for the L2 header, and 4 for the
appended FCS header, resulting in 15342 bytes of payload for our maximum
MTU on jumbo frames.

Signed-off-by: Matthew Vick 
Signed-off-by: Jacob Keller 
Acked-by: Bruce Allan 
Tested-by: Krishneil Singh 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/fm10k/fm10k.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k.h b/drivers/net/ethernet/intel/fm10k/fm10k.h
index 1444020..48809e5 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k.h
@@ -33,7 +33,7 @@
 #include "fm10k_pf.h"
 #include "fm10k_vf.h"
 
-#define FM10K_MAX_JUMBO_FRAME_SIZE 15358   /* Maximum supported size 15K */
+#define FM10K_MAX_JUMBO_FRAME_SIZE 15342   /* Maximum supported size 15K */
 
 #define MAX_QUEUES FM10K_MAX_QUEUES_PF
 
-- 
2.5.0
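The MTU arithmetic from the commit message, written out as a tiny C check. The macro and function names are illustrative only; the driver itself simply hardcodes the result in FM10K_MAX_JUMBO_FRAME_SIZE.

```c
/* Frame-size budget from the commit message (illustrative names). */
#define HOST_IF_MAX_FRAME 15368	/* max frame supported by the host interface */
#define SWITCH_TAG_LEN    8	/* internal switch tag */
#define L2_HDR_LEN        14	/* Ethernet header */
#define FCS_LEN           4	/* frame check sequence */

/* Largest MTU (L3 payload) that still fits in the maximum frame. */
static int max_jumbo_mtu(void)
{
	return HOST_IF_MAX_FRAME - SWITCH_TAG_LEN - L2_HDR_LEN - FCS_LEN;
}
```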



[net-next 08/15] ixgbe: Add KR mode support for CS4227 chip

2015-11-23 Thread Jeff Kirsher
From: Mark Rustad 

KR auto-neg mode is what we will be using going forward. The SW
interface for this mode is different from what was used for iXFI.

Signed-off-by: Mark Rustad 
Tested-by: Phil Schmitt 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c | 85 +++
 1 file changed, 62 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
index ebe0ac9..005f01b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
@@ -26,6 +26,8 @@
 #include "ixgbe_common.h"
 #include "ixgbe_phy.h"
 
+static s32 ixgbe_setup_kr_speed_x550em(struct ixgbe_hw *, ixgbe_link_speed);
+
 static s32 ixgbe_get_invariants_X550_x(struct ixgbe_hw *hw)
 {
struct ixgbe_mac_info *mac = &hw->mac;
@@ -1257,31 +1259,71 @@ ixgbe_setup_mac_link_sfp_x550em(struct ixgbe_hw *hw,
if (status)
return status;
 
-   /* Configure CS4227 LINE side to 10G SR. */
-   slice = IXGBE_CS4227_LINE_SPARE22_MSB + (hw->bus.lan_id << 12);
-   value = IXGBE_CS4227_SPEED_10G;
-   status = ixgbe_write_i2c_combined_generic(hw, IXGBE_CS4227, slice,
- value);
-
-   /* Configure CS4227 for HOST connection rate then type. */
-   slice = IXGBE_CS4227_HOST_SPARE22_MSB + (hw->bus.lan_id << 12);
-   value = speed & IXGBE_LINK_SPEED_10GB_FULL ?
-   IXGBE_CS4227_SPEED_10G : IXGBE_CS4227_SPEED_1G;
-   status = ixgbe_write_i2c_combined_generic(hw, IXGBE_CS4227, slice,
- value);
+   if (!(hw->phy.nw_mng_if_sel & IXGBE_NW_MNG_IF_SEL_INT_PHY_MODE)) {
+   /* Configure CS4227 LINE side to 10G SR. */
+   slice = IXGBE_CS4227_LINE_SPARE22_MSB + (hw->bus.lan_id << 12);
+   value = IXGBE_CS4227_SPEED_10G;
+   status = ixgbe_write_i2c_combined_generic(hw, IXGBE_CS4227,
+ slice, value);
+   if (status)
+   goto i2c_err;
 
-   slice = IXGBE_CS4227_HOST_SPARE24_LSB + (hw->bus.lan_id << 12);
-   if (setup_linear)
-   value = (IXGBE_CS4227_EDC_MODE_CX1 << 1) | 1;
-   else
+   slice = IXGBE_CS4227_LINE_SPARE24_LSB + (hw->bus.lan_id << 12);
value = (IXGBE_CS4227_EDC_MODE_SR << 1) | 1;
-   status = ixgbe_write_i2c_combined_generic(hw, IXGBE_CS4227, slice,
- value);
+   status = ixgbe_write_i2c_combined_generic(hw, IXGBE_CS4227,
+ slice, value);
+   if (status)
+   goto i2c_err;
+
+   /* Configure CS4227 for HOST connection rate then type. */
+   slice = IXGBE_CS4227_HOST_SPARE22_MSB + (hw->bus.lan_id << 12);
+   value = speed & IXGBE_LINK_SPEED_10GB_FULL ?
+   IXGBE_CS4227_SPEED_10G : IXGBE_CS4227_SPEED_1G;
+   status = ixgbe_write_i2c_combined_generic(hw, IXGBE_CS4227,
+ slice, value);
+   if (status)
+   goto i2c_err;
 
-   /* If internal link mode is XFI, then setup XFI internal link. */
-   if (!(hw->phy.nw_mng_if_sel & IXGBE_NW_MNG_IF_SEL_INT_PHY_MODE))
+   slice = IXGBE_CS4227_HOST_SPARE24_LSB + (hw->bus.lan_id << 12);
+   if (setup_linear)
+   value = (IXGBE_CS4227_EDC_MODE_CX1 << 1) | 1;
+   else
+   value = (IXGBE_CS4227_EDC_MODE_SR << 1) | 1;
+   status = ixgbe_write_i2c_combined_generic(hw, IXGBE_CS4227,
+ slice, value);
+   if (status)
+   goto i2c_err;
+
+   /* Setup XFI internal link. */
status = ixgbe_setup_ixfi_x550em(hw, &speed);
+   if (status) {
+   hw_dbg(hw, "setup_ixfi failed with %d\n", status);
+   return status;
+   }
+   } else {
+   /* Configure internal PHY for KR/KX. */
+   status = ixgbe_setup_kr_speed_x550em(hw, speed);
+   if (status) {
+   hw_dbg(hw, "setup_kr_speed failed with %d\n", status);
+   return status;
+   }
 
+   /* Configure CS4227 LINE side to proper mode. */
+   slice = IXGBE_CS4227_LINE_SPARE24_LSB + (hw->bus.lan_id << 12);
+   if (setup_linear)
+   value = (IXGBE_CS4227_EDC_MODE_CX1 << 1) | 1;
+   else
+   value = (IXGBE_CS4227_EDC_MODE_SR << 1) | 1;
+  

[net-next 15/15] intel: i40e: fix confused code

2015-11-23 Thread Jeff Kirsher
From: Rasmus Villemoes 

This code is pretty confused. The variable name 'bytes_not_copied'
clearly indicates that the programmer knew the semantics of
copy_{to,from}_user, but then the return value is checked for being
negative and used as a -Exxx return value.

I'm not sure this is the proper fix, but at least we get rid of the
dead code which pretended to check for access faults.

Signed-off-by: Rasmus Villemoes 
Acked-by: Shannon Nelson 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 24 ++--
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index d4b7af9..d1a91c8 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -103,8 +103,8 @@ static ssize_t i40e_dbg_dump_read(struct file *filp, char __user *buffer,
len = min_t(int, count, (i40e_dbg_dump_data_len - *ppos));
 
bytes_not_copied = copy_to_user(buffer, &i40e_dbg_dump_buf[*ppos], len);
-   if (bytes_not_copied < 0)
-   return bytes_not_copied;
+   if (bytes_not_copied)
+   return -EFAULT;
 
*ppos += len;
return len;
@@ -353,8 +353,8 @@ static ssize_t i40e_dbg_command_read(struct file *filp, char __user *buffer,
bytes_not_copied = copy_to_user(buffer, buf, len);
kfree(buf);
 
-   if (bytes_not_copied < 0)
-   return bytes_not_copied;
+   if (bytes_not_copied)
+   return -EFAULT;
 
*ppos = len;
return len;
@@ -981,12 +981,10 @@ static ssize_t i40e_dbg_command_write(struct file *filp,
if (!cmd_buf)
return count;
bytes_not_copied = copy_from_user(cmd_buf, buffer, count);
-   if (bytes_not_copied < 0) {
+   if (bytes_not_copied) {
kfree(cmd_buf);
-   return bytes_not_copied;
+   return -EFAULT;
}
-   if (bytes_not_copied > 0)
-   count -= bytes_not_copied;
cmd_buf[count] = '\0';
 
cmd_buf_tmp = strchr(cmd_buf, '\n');
@@ -2034,8 +2032,8 @@ static ssize_t i40e_dbg_netdev_ops_read(struct file *filp, char __user *buffer,
bytes_not_copied = copy_to_user(buffer, buf, len);
kfree(buf);
 
-   if (bytes_not_copied < 0)
-   return bytes_not_copied;
+   if (bytes_not_copied)
+   return -EFAULT;
 
*ppos = len;
return len;
@@ -2068,10 +2066,8 @@ static ssize_t i40e_dbg_netdev_ops_write(struct file *filp,
memset(i40e_dbg_netdev_ops_buf, 0, sizeof(i40e_dbg_netdev_ops_buf));
bytes_not_copied = copy_from_user(i40e_dbg_netdev_ops_buf,
  buffer, count);
-   if (bytes_not_copied < 0)
-   return bytes_not_copied;
-   else if (bytes_not_copied > 0)
-   count -= bytes_not_copied;
+   if (bytes_not_copied)
+   return -EFAULT;
i40e_dbg_netdev_ops_buf[count] = '\0';
 
buf_tmp = strchr(i40e_dbg_netdev_ops_buf, '\n');
-- 
2.5.0
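To make the dead check concrete: copy_to_user() returns the number of bytes it could not copy, so a negative result never occurs and a partial copy slipped through as success. The sketch below models the old and new patterns in userspace with a hypothetical fake_copy_to_user() that "faults" after fault_at bytes; it is an illustration, not the driver code.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for the kernel's copy_to_user(): copies at most
 * 'fault_at' bytes and, like the real helper, returns the number of
 * bytes NOT copied (0 on success, never negative). */
static unsigned long fake_copy_to_user(void *dst, const void *src,
				       unsigned long n, unsigned long fault_at)
{
	unsigned long ok = n < fault_at ? n : fault_at;

	memcpy(dst, src, ok);
	return n - ok;
}

/* Old pattern: a byte count is never < 0, so the error branch is dead
 * and a partial copy is reported as a full read. */
static ssize_t read_old(char *dst, const char *src, size_t len,
			unsigned long fault_at)
{
	long bytes_not_copied = (long)fake_copy_to_user(dst, src, len, fault_at);

	if (bytes_not_copied < 0)	/* never true */
		return bytes_not_copied;
	return (ssize_t)len;
}

/* New pattern from the patch: any shortfall is an access fault. */
static ssize_t read_new(char *dst, const char *src, size_t len,
			unsigned long fault_at)
{
	if (fake_copy_to_user(dst, src, len, fault_at))
		return -EFAULT;
	return (ssize_t)len;
}
```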



[net-next 13/15] ixgbevf: use ether_addr_copy instead of memcpy

2015-11-23 Thread Jeff Kirsher
From: Emil Tantilov 

Replace some instances of memcpy for setting up the MAC address with
ether_addr_copy().

Signed-off-by: Emil Tantilov 
Tested-by: Darin Miller 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 1b15f95..47c71e1 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -2260,10 +2260,8 @@ void ixgbevf_reset(struct ixgbevf_adapter *adapter)
}
 
if (is_valid_ether_addr(adapter->hw.mac.addr)) {
-   memcpy(netdev->dev_addr, adapter->hw.mac.addr,
-  netdev->addr_len);
-   memcpy(netdev->perm_addr, adapter->hw.mac.addr,
-  netdev->addr_len);
+   ether_addr_copy(netdev->dev_addr, adapter->hw.mac.addr);
+   ether_addr_copy(netdev->perm_addr, adapter->hw.mac.addr);
}
 
adapter->last_reset = jiffies;
@@ -2659,13 +2657,13 @@ static int ixgbevf_sw_init(struct ixgbevf_adapter *adapter)
else if (is_zero_ether_addr(adapter->hw.mac.addr))
dev_info(&pdev->dev,
 "MAC address not assigned by administrator.\n");
-   memcpy(netdev->dev_addr, hw->mac.addr, netdev->addr_len);
+   ether_addr_copy(netdev->dev_addr, hw->mac.addr);
}
 
if (!is_valid_ether_addr(netdev->dev_addr)) {
dev_info(&pdev->dev, "Assigning random MAC address\n");
eth_hw_addr_random(netdev);
-   memcpy(hw->mac.addr, netdev->dev_addr, netdev->addr_len);
+   ether_addr_copy(hw->mac.addr, netdev->dev_addr);
}
 
/* Enable dynamic interrupt throttling rates */
@@ -3695,8 +3693,8 @@ static int ixgbevf_set_mac(struct net_device *netdev, void *p)
if (!is_valid_ether_addr(addr->sa_data))
return -EADDRNOTAVAIL;
 
-   memcpy(netdev->dev_addr, addr->sa_data, netdev->addr_len);
-   memcpy(hw->mac.addr, addr->sa_data, netdev->addr_len);
+   ether_addr_copy(netdev->dev_addr, addr->sa_data);
+   ether_addr_copy(hw->mac.addr, addr->sa_data);
 
spin_lock_bh(&adapter->mbx_lock);
 
-- 
2.5.0



Re: [PATCH net-next] bpf: add show_fdinfo handler for maps

2015-11-23 Thread John Fastabend
On 15-11-23 11:12 AM, Hannes Frederic Sowa wrote:
> On Mon, Nov 23, 2015, at 20:09, John Fastabend wrote:
>> On 15-11-23 10:03 AM, Alexei Starovoitov wrote:
>>> On Mon, Nov 23, 2015 at 05:11:58PM +0100, Hannes Frederic Sowa wrote:

 Actually, that is the reason why I mentioned it, so *the admin* can see
 something is going on. Do you want to protect ebpf from root? Skynet? ;)
>>>
>>> correct. To me both root and non-root are users in the first place and
>>> they both shouldn't be allowed to misuse it.
>>>
 In my opinion the kernel never should hide any information of the admin
 if they are accessible easily. Sampling the number of failed updates to
 a map or printing it via procfs/ebpffs seems to be just a matter of how
 difficult it should be done. The map has a lock, so the number is fairly
>>>
>>> map_lookup is actually lockless. It's a critical path and should be
>>> as fast as possible. No extra stats just for debugging.
>>>
 accurate. Sampling and plotting size of hash maps without having kprobes
 installed would be a nice thing, because it reduces complexity and this
 is nice to have.
>>>
>>> doing 'cat' from procfs is, of course, easier to use, but it's an extra
>>> code that permanently lives in memory, whereas kprobe+bpf is a run-time
>>> debugging.
>>
>> Hopefully not jumping in off-base here (I've read most of the thread), but
>> what I've been doing is loading programs with debug ebpf code in them
>> to keep a statistics map(s) and then I read that from userspace for
>> stats. It works pretty well and lets me compile out the debug code when
>> I want and also doesn't need kprobe at all. Also I can implement
>> sampling so that the debug code only runs every .01% or something like
>> that so it can be used in "real" systems. My "real" systems are just a
>> couple node test setup but it seems to be ok ;)
> 
> Ok, I am fine to wait until there is user demand.
> 
> Anyway, all you refer to is code you have under your control. I am
> worried about bpf code that is not under my control.
> 

Right, I've not gotten this far. To date everything I've been looking at
is owned by the admin. So probably some more use cases there I haven't
looked at.

> Thanks,
> Hannes
> 



Re: [PATCH net-next 10/18] net/mlx5e: Write vlan list into vport context

2015-11-23 Thread Saeed Mahameed
On Mon, Nov 23, 2015 at 7:30 PM, Alexander Duyck wrote:
> On 11/23/2015 03:11 AM, Or Gerlitz wrote:
>>
>> From: Saeed Mahameed 
>>
>> Each Vport/vNIC must notify the underlying e-Switch layer
>> of vlan table changes in order to update SR-IOV FDB tables.
>>
>> We do that at vlan_rx_add_vid and vlan_rx_kill_vid ndos.
>>
>> Signed-off-by: Saeed Mahameed 
>> Signed-off-by: Or Gerlitz 
>> ---
>>   drivers/net/ethernet/mellanox/mlx5/core/en.h   |  1 +
>>   .../ethernet/mellanox/mlx5/core/en_flow_table.c| 49 ++
>>   2 files changed, 50 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
>> index 69f1c1a..89313d4 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
>> @@ -465,6 +465,7 @@ enum {
>>   };
>>
>>   struct mlx5e_vlan_db {
>> +   unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
>> u32   active_vlans_ft_ix[VLAN_N_VID];
>> u32   untagged_rule_ft_ix;
>> u32   any_vlan_rule_ft_ix;
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c b/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
>> index 9a021be..3c0cf22 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
>> @@ -502,6 +502,46 @@ add_eth_addr_rule_out:
>> return err;
>>   }
>>
>> +static int mlx5e_vport_context_update_vlans(struct mlx5e_priv *priv)
>> +{
>> +   struct net_device *ndev = priv->netdev;
>> +   int max_list_size;
>> +   int list_size;
>> +   u16 *vlans;
>> +   int vlan;
>> +   int err;
>> +   int i;
>> +
>> +   list_size = 0;
>> +   for_each_set_bit(vlan, priv->vlan.active_vlans, VLAN_N_VID)
>> +   list_size++;
>> +
>> +   max_list_size = 1 << MLX5_CAP_GEN(priv->mdev, log_max_vlan_list);
>> +
>> +   if (list_size > max_list_size) {
>> +   netdev_warn(ndev,
>> +   "netdev vlans list size (%d) > (%d) max vport list size, some vlans will be dropped\n",
>> +   list_size, max_list_size);
>> +   list_size = max_list_size;
>> +   }
>> +
>> +   vlans = kcalloc(list_size, sizeof(*vlans), GFP_KERNEL);
>> +   if (!vlans)
>> +   return -ENOMEM;
>> +
>> +   i = 0;
>> +   for_each_set_bit(vlan, priv->vlan.active_vlans, VLAN_N_VID)
>> +   vlans[i++] = vlan;
>> +
>
>
> You capped the allocation at max_list_size above, but you are technically
> populating up to the original value of list_size here.  I believe that opens
> you up to a buffer overrun.  You probably need to add a check for i >=
> list_size and exit the loop if true.
>
True, will fix this. Thanks for noticing.
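The fix suggested in review (stop filling once i reaches the capped list_size) can be sketched in userspace C. vid_set(), vid_test(), N_VID and build_vlan_list() are hypothetical stand-ins for set_bit()/for_each_set_bit(), VLAN_N_VID and the patch's mlx5e_vport_context_update_vlans(); this is an illustration, not the actual respin.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define N_VID 4096	/* stands in for VLAN_N_VID */
#define ULONG_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Userspace stand-ins for set_bit()/test_bit() on a VLAN bitmap. */
static void vid_set(unsigned long *bm, int vid)
{
	bm[vid / ULONG_BITS] |= 1UL << (vid % ULONG_BITS);
}

static int vid_test(const unsigned long *bm, int vid)
{
	return (bm[vid / ULONG_BITS] >> (vid % ULONG_BITS)) & 1UL;
}

/* Allocation is capped at max_list entries and the fill loop stops at the
 * same cap, so it cannot overrun. Returns entries written, -1 on ENOMEM. */
static int build_vlan_list(const unsigned long *active, int max_list,
			   uint16_t **out)
{
	int list_size = 0, i = 0, vid;
	uint16_t *vlans;

	for (vid = 0; vid < N_VID; vid++)
		list_size += vid_test(active, vid);
	if (list_size > max_list)
		list_size = max_list;	/* some VLANs will be dropped */

	vlans = calloc(list_size ? list_size : 1, sizeof(*vlans));
	if (!vlans)
		return -1;

	for (vid = 0; vid < N_VID; vid++) {
		if (!vid_test(active, vid))
			continue;
		if (i >= list_size)	/* the check suggested in review */
			break;
		vlans[i++] = (uint16_t)vid;
	}
	*out = vlans;
	return i;
}
```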

>
>> +   err = mlx5_modify_nic_vport_vlans(priv->mdev, vlans, list_size);
>> +   if (err)
>> +   netdev_err(ndev, "Failed to modify vport vlans list err(%d)\n",
>> +  err);
>> +
>> +   kfree(vlans);
>> +   return err;
>> +}
>> +
>>   enum mlx5e_vlan_rule_type {
>> MLX5E_VLAN_RULE_TYPE_UNTAGGED,
>> MLX5E_VLAN_RULE_TYPE_ANY_VID,
>> @@ -552,6 +592,10 @@ static int mlx5e_add_vlan_rule(struct mlx5e_priv *priv,
>>  1);
>> break;
>> default: /* MLX5E_VLAN_RULE_TYPE_MATCH_VID */
>> +   err = mlx5e_vport_context_update_vlans(priv);
>> +   if (err)
>> +   goto add_vlan_rule_out;
>> +
>> ft_ix = &priv->vlan.active_vlans_ft_ix[vid];
>> MLX5_SET(fte_match_param, match_value,
>> outer_headers.vlan_tag,
>>  1);
>> @@ -588,6 +632,7 @@ static void mlx5e_del_vlan_rule(struct mlx5e_priv *priv,
>> case MLX5E_VLAN_RULE_TYPE_MATCH_VID:
>> mlx5_del_flow_table_entry(priv->ft.vlan,
>> priv->vlan.active_vlans_ft_ix[vid]);
>> +   mlx5e_vport_context_update_vlans(priv);
>> break;
>> }
>>   }
>> @@ -619,6 +664,8 @@ int mlx5e_vlan_rx_add_vid(struct net_device *dev, __always_unused __be16 proto,
>>   {
>> struct mlx5e_priv *priv = netdev_priv(dev);
>>
>> +   set_bit(vid, priv->vlan.active_vlans);
>> +
>> return mlx5e_add_vlan_rule(priv, MLX5E_VLAN_RULE_TYPE_MATCH_VID,
>> vid);
>>   }
>>
>> @@ -627,6 +674,8 @@ int mlx5e_vlan_rx_kill_vid(struct net_device *dev, __always_unused __be16 proto,
>>   {
>> struct mlx5e_priv *priv = netdev_priv(dev);
>>
>> +   clear_bit(vid, priv->vlan.active_vlans);
>> +
>> mlx5e_del_vlan_rule(priv, MLX5E_VLAN_RULE_TYPE_MATCH_VID, vid);
>>
>> return 0;
>>
>

Re: [PATCH 13/14] mm: memcontrol: account socket memory in unified hierarchy memory controller

2015-11-23 Thread Johannes Weiner
On Mon, Nov 23, 2015 at 01:00:59PM +0300, Vladimir Davydov wrote:
> I've another question regarding this socket_work: its reclaim target
> always equals CHARGE_BATCH. Can't it result in a workload exceeding
> memory.high in case there are a lot of allocations coming from different
> cpus? In this case the work might not manage to complete before another
> allocation happens. Maybe we should accumulate the number of pages to
> be reclaimed by the work, as we do in try_charge?

Actually, try_to_free_mem_cgroup_pages() rounds it up to 2MB anyway. I
would hate to add locking or more atomics to accumulate a reclaim goal
for the worker on spec, so let's wait to see if this is a real issue.

> > > BTW why do we need this work at all? Why is reclaim_high called from
> > > task_work not enough?
> > 
> > The problem lies in the memcg association: the random task that gets
> > interrupted by an arriving packet might not be in the same memcg as
> > the one owning the receiving socket. And multiple interrupts could happen
> > while we're in the kernel already charging pages. We'd basically have
> > to maintain a list of memcgs that need to run reclaim_high associated
> > with current.
> > 
> 
> Right, I think this is worth placing in a comment to memcg->socket_work.

Okay, will do.

> I wonder if we could use it *instead* of task_work for handling every
> allocation, not only socket-related. Would it make any sense? Maybe it
> could reduce the latency experienced by tasks in memory cgroups.

No, we *want* charging tasks to do reclaim work once memory.high is
breached, in order to match their speed to memory availability. That
needs to remain synchronous.

What we could try is make memcg->socket_work purely about the receive
side when we're inside the softirq, and arm the per-task work when in
process context on the sending side. I'll look into that.


[net-next 00/15][pull request] Intel Wired LAN Driver Updates 2015-11-23

2015-11-23 Thread Jeff Kirsher
This series contains updates to ixgbe, ixgbevf, fm10k, i40e and i40evf.

Jacob fixes an issue where a VF could attempt to read queues it does not
own; to prevent this, we check queue 0 before we continue.

Matthew fixes the MTU for jumbo frames for fm10k.

Julia Lawall cleans up an unneeded NULL test in ixgbe.

Mark cleans up a redundant header inclusion, adds KR mode support for the
CS4227 chip, and removes CS4227 diagnostic code that is no longer needed.

Jean Sacren fixes kernel documentation for ixgbe.

Alex Duyck fixes an fm10k and ixgbe issue in which the polling routine would
increase the budget for receive to at least 1 per queue if multiple queues were
present.  This would result in receive packets being processed when the budget
was 0, which is meant to indicate that no receive can be handled.  He also fixes
an ixgbevf performance issue where a netperf test would starve for memory in the
time from one transmit interrupt to the next, by limiting the lowest interrupt
rate for adaptive interrupt moderation to 12K.  Finally, he updates ixgbe and
ixgbevf to use napi_schedule_irqoff() where the drivers run from hard interrupt
context or with interrupts already disabled in netpoll.

Jesse fixes a compiler warning about an unused variable for i40evf.

John Greene fixes an issue with ixgbevf where, if the VF driver is loaded
while the corresponding PF interface is down, the driver assigns a random
MAC address, which can be overwritten with the value of hw->mac.perm_addr,
which is 0 at that point.  Avoid this case by initializing hw->mac.perm_addr
to the randomly generated address and not setting it unless we receive an
ACK from ixgbe.

Rasmus Villemoes cleans up some confusing code in i40e debugfs code.

The following are changes since commit 3d40e44361eab3dd6c969241d12dac7466eb7174:
  Merge branch 'dsa-gpio-reset'
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue master

Alexander Duyck (4):
  fm10k: Fix handling of NAPI budget when multiple queues are enabled
per vector
  ixgbe: Fix handling of NAPI budget when multiple queues are enabled
per vector
  ixgbevf: Limit lowest interrupt rate for adaptive interrupt moderation
to 12K
  ixgbe/ixgbevf: use napi_schedule_irqoff()

Emil Tantilov (2):
  ixgbevf: use ether_addr_copy instead of memcpy
  ixgbevf: fix spoofed packets with random MAC

Jacob Keller (2):
  fm10k: do not assume VF always has 1 queue
  fm10k: Correct MTU for jumbo frames

Jean Sacren (1):
  ixgbe: fix multiple kernel-doc errors

Jesse Brandeburg (1):
  i40evf: fix compiler warning of unused variable

Julia Lawall (1):
  ixgbe: drop null test before destroy functions

Mark Rustad (3):
  ixgbe: Delete redundant include file
  ixgbe: Add KR mode support for CS4227 chip
  ixgbe: Remove CS4227 diagnostic code

Rasmus Villemoes (1):
  intel: i40e: fix confused code

 drivers/net/ethernet/intel/fm10k/fm10k.h  |   2 +-
 drivers/net/ethernet/intel/fm10k/fm10k_main.c |   4 +
 drivers/net/ethernet/intel/fm10k/fm10k_type.h |   1 +
 drivers/net/ethernet/intel/fm10k/fm10k_vf.c   |   7 +-
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c|  24 ++--
 drivers/net/ethernet/intel/i40evf/i40evf_main.c   |   7 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c |   3 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  19 +--
 drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c | 165 --
 drivers/net/ethernet/intel/ixgbevf/ethtool.c  |   2 +-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h  |   3 +-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |  23 ++-
 drivers/net/ethernet/intel/ixgbevf/vf.c   |   4 +-
 13 files changed, 111 insertions(+), 153 deletions(-)

-- 
2.5.0



Re: [PATCH net-next 0/6] kcm: Kernel Connection Multiplexor (KCM)

2015-11-23 Thread David Miller
From: Tom Herbert 
Date: Mon, 23 Nov 2015 09:33:44 -0800

> The TCP PSH flag is not defined for message delineation (neither is
> urgent pointer). We can't change that (many people have tried to add
> message semantics to the TCP protocol but have always failed miserably).

Agreed.

My only gripe with kcm right now is a lack of a native sendpage.
We should be able to zero copy data through KCM streams without
any problems whatsoever.


Re: [PATCH net-next] bpf: add show_fdinfo handler for maps

2015-11-23 Thread Alexei Starovoitov
On Mon, Nov 23, 2015 at 05:11:58PM +0100, Hannes Frederic Sowa wrote:
> 
> Actually, that is the reason why I mentioned it, so *the admin* can see
> something is going on. Do you want to protect ebpf from root? Skynet? ;)

correct. To me both root and non-root are users in the first place and
they both shouldn't be allowed to misuse it.

> In my opinion the kernel never should hide any information of the admin
> if they are accessible easily. Sampling the number of failed updates to
> a map or printing it via procfs/ebpffs seems to be just a matter of how
> difficult it should be done. The map has a lock, so the number is fairly

map_lookup is actually lockless. It's a critical path and should be
as fast as possible. No extra stats just for debugging.

> accurate. Sampling and plotting size of hash maps without having kprobes
> installed would be a nice thing, because it reduces complexity and this
> is nice to have.

doing 'cat' from procfs is, of course, easier to use, but it's an extra
code that permanently lives in memory, whereas kprobe+bpf is a run-time
debugging.



Re: [PATCH 09/14] net: tcp_memcontrol: simplify linkage between socket and page counter

2015-11-23 Thread Johannes Weiner
On Mon, Nov 23, 2015 at 12:36:46PM +0300, Vladimir Davydov wrote:
> On Fri, Nov 20, 2015 at 01:56:48PM -0500, Johannes Weiner wrote:
> > I actually had all this at first, but then wondered if it makes more
> > sense to keep the legacy code in isolation. Don't you think it would
> > be easier to keep track of what's v1 and what's v2 if we keep the
> > legacy stuff physically separate as much as possible? In particular I
> > found that 'tcp_mem.' marker really useful while working on the code.
> > 
> > In the same vein, tcp_memcontrol.c doesn't really hurt anybody and I'd
> > expect it to remain mostly unopened and unchanged in the future. But
> > if we merge it into memcontrol.c, that code will likely be in the way
> > and we'd have to make it explicit somehow that this is not actually
> > part of the new memory controller anymore.
> > 
> > What do you think?
> 
> There isn't much code left in tcp_memcontrol.c, and not all of it is
> legacy. We still want to call tcp_init_cgroup and tcp_destroy_cgroup
> from memcontrol.c - in fact, it's the only call site, so I think we'd
> better keep these functions there. Apart from init/destroy, there is
> only stuff for handling legacy files, which is relatively small and
> isolated. We can just put it along with memsw and kmem legacy files in
> the end of memcontrol.c adding a comment that it's legacy. Personally,
> I'd find the code easier to follow then, because currently the logic
> behind the ACTIVE flag as well as memcg->tcp_mem init/use/destroy turns
> out to be scattered between two files in different subsystems for no
> apparent reason now, as it does not need tcp_prot any more. Besides,
> this would allow us to accurately reuse the ACTIVE flag in init/destroy
> for inc/dec static branch and probably in sock_update_memcg instead of
> sprinkling cgroup_subsys_on_dfl all over the place, which would make the
> code a bit cleaner IMO (in fact, that's why I proposed to drop ACTIVATED
> bit and replace cg_proto->flags with ->active bool).

As far as I can see, all of tcp_memcontrol.c is legacy, including the
init and destroy functions. We only call them to set up the legacy
tcp_mem state and do legacy jump-label maintenance. Delete it all and
the unified hierarchy controller would still work. So I don't really
see the benefits of consolidating it, only more risk of convoluting the code.

That being said, if you care strongly about it and see opportunities
to cut down code and make things more readable, please feel free to
turn the flags -> bool patch into a followup series and I'll be happy
to review it.

Thanks!


Re: [PATCH net-next 1/2] net: l3mdev: Add master device lookup by index

2015-11-23 Thread David Ahern

On 11/22/15 9:35 PM, David Miller wrote:
> From: David Ahern
> Date: Sun, 22 Nov 2015 21:02:04 -0700
>
>> I am confused by that response given that sk_bound_dev_if is one of
>> the key principles for the VRF implementation. Applications wanting to
>> communicate over interfaces in a VRF have to set sk_bound_dev_if.
>
> Yes, they have to set it explicitly.
>
> You are setting it for them in response to the connection
> creation, and that's what I object to.

The intent is to not require having N-listen sockets/threads/tasks to 
support N-vrfs for scalability reasons. Having a special DEVICE_ANY 
index adds complexity to socket lookups, so I dropped that idea long 
ago. Would guarding this behavior by a sysctl be acceptable?



Re: [patch] net/hsr: fix a warning message

2015-11-23 Thread David Miller
From: Dan Carpenter 
Date: Sat, 21 Nov 2015 13:34:12 +0300

> WARN_ON_ONCE() takes a condition, it doesn't take an error message.  I
> have converted this to WARN() instead.
> 
> Signed-off-by: Dan Carpenter 

Applied, thanks Dan.


Re: [PATCH net-next 09/18] net/mlx5e: Write UC/MC list and promisc mode into vport context

2015-11-23 Thread Saeed Mahameed
On Mon, Nov 23, 2015 at 7:23 PM, Alexander Duyck
 wrote:
> On 11/23/2015 03:11 AM, Or Gerlitz wrote:
>>
>> From: Saeed Mahameed 
>>
>> Each Vport/vNIC must notify the underlying e-Switch layer
>> of UC/MC list and promisc mode updates, in order to update
>> L2 tables and SR-IOV FDB tables.
>>
>> We do that in the set_rx_mode ndo.
>>
>> Preparation for Ethernet SR-IOV and L2 table management.
>>
>> Signed-off-by: Saeed Mahameed 
>> Signed-off-by: Or Gerlitz 
>> ---
>>   .../ethernet/mellanox/mlx5/core/en_flow_table.c| 99 ++
>>   1 file changed, 99 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c b/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
>> index 22d603f..9a021be 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
>> @@ -671,6 +671,103 @@ static void mlx5e_sync_netdev_addr(struct mlx5e_priv *priv)
>> netif_addr_unlock_bh(netdev);
>>   }
>>
>> +/* Returns a pointer to an array of type u8[][ETH_ALEN] */
>> +static u8 (*mlx5e_build_addr_array(struct mlx5e_priv *priv, int list_type,
>> +  int *size))[ETH_ALEN]
>
>
> This is just ugly.  Isn't there a way you can just return a u8 pointer and
> assume the ETH_ALEN stride?  If nothing else it seems like it would be
> better to just create a structure or typedef containing the u8 array and to
> return a pointer to that since all the ETH_ALEN here really represents is a
> stride within your array.
>
I thought twice before writing the code this way. Indeed it looks ugly;
although it is standard C syntax, it might be cleaner to just use a
typedef or a structure.
I will try your suggestion.
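For illustration, a minimal userspace sketch of Alexander's suggestion. `struct mac_addr` and both helpers are hypothetical (not mlx5 code), and `ETH_ALEN` is redefined locally so the example builds outside the kernel. Wrapping the bytes in a struct gives the same ETH_ALEN stride without a pointer-to-array declarator in the return type.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ETH_ALEN 6 /* same value as the kernel's ETH_ALEN */

/* Hypothetical wrapper type: one MAC address per array element. */
struct mac_addr {
	unsigned char addr[ETH_ALEN];
};

/* Assumption made explicit: no padding, so the stride is ETH_ALEN. */
_Static_assert(sizeof(struct mac_addr) == ETH_ALEN,
	       "struct mac_addr must not be padded");

/* Before: pointer-to-array return type, as in the posted patch. */
static unsigned char (*alloc_addr_array_raw(size_t n))[ETH_ALEN]
{
	return calloc(n, ETH_ALEN);
}

/* After: a plain struct pointer; a[i] still advances ETH_ALEN bytes. */
static struct mac_addr *alloc_addr_array(size_t n)
{
	return calloc(n, sizeof(struct mac_addr));
}
```

The `_Static_assert` documents the no-padding assumption; in kernel code a `__packed` annotation would be one way to make it explicit.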

>
>> +{
>> +   bool is_uc = (list_type == MLX5_NVPRT_LIST_TYPE_UC);
>> +   struct net_device *ndev = priv->netdev;
>> +   struct mlx5e_eth_addr_hash_node *hn;
>> +   struct hlist_head *addr_list;
>> +   u8 (*addr_array)[ETH_ALEN];
>> +   struct hlist_node *tmp;
>> +   int max_list_size;
>> +   int list_size;
>> +   int hi;
>> +   int i;
>> +
>> +   list_size = is_uc ? 0 : (priv->eth_addr.broadcast_enabled ? 1 : 0);
>> +   max_list_size = is_uc ?
>> +   1 << MLX5_CAP_GEN(priv->mdev, log_max_current_uc_list) :
>> +   1 << MLX5_CAP_GEN(priv->mdev, log_max_current_mc_list);
>> +
>> +   addr_list = is_uc ? priv->eth_addr.netdev_uc : priv->eth_addr.netdev_mc;
>> +   mlx5e_for_each_hash_node(hn, tmp, addr_list, hi)
>> +   list_size++;
>> +
>> +   if (list_size > max_list_size) {
>> +   netdev_warn(ndev,
>> +   "netdev %s list size (%d) > (%d) max vport list size, some addresses will be dropped\n",
>> +   is_uc ? "UC" : "MC", list_size, max_list_size);
>> +   list_size = max_list_size;
>> +   }
>> +
>> +   addr_array = kcalloc(list_size, ETH_ALEN, GFP_KERNEL);
>> +   if (!addr_array)
>> +   return NULL;
>> +
>> +   i = 0;
>> +   if (is_uc) { /* Make sure our own address is pushed first */
>> +   mlx5e_for_each_hash_node(hn, tmp, addr_list, hi) {
>> +   if (ether_addr_equal(ndev->dev_addr, hn->ai.addr)) {
>> +   ether_addr_copy(addr_array[i++], ndev->dev_addr);
>> +   break;
>> +   }
>> +   }
>> +   }
>> +
>
>
> What is the point of this loop?  Is there a chance that the device address
> isn't going to be in the list somewhere?  Otherwise it seems like you could
> just follow the pattern you did for the broadcast address and just copy the
> dev_addr directly instead of crawling through the loop.
>
The main idea of this loop is to handle the case where the device UC
list is empty; in that case I don't need any special logic to decide
whether to push dev_addr at all.

Regarding your question of whether the device address is going to be in
the list: yes, it is always there when the device is up, since we push
it ourselves in mlx5e_sync_netdev_addr.

For the broadcast address, the pattern already existed before this patch,
and the broadcast address is not part of the netdev mc_list, so this is
simply the way to handle it.

Anyway, I tend to agree with you: the loop looks redundant.
I will do some thinking offline and come up with a better approach.
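One possible loop-free shape, sketched under the invariant described above (a non-empty UC list always contains dev_addr while the device is up): push the own address directly when the list is non-empty, then skip it in the single copy loop. Plain arrays stand in for the driver's hash list, and `build_uc_list()` and `addr_equal()` are hypothetical helpers, not mlx5 functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical stand-in for ether_addr_equal(). */
static int addr_equal(const unsigned char *a, const unsigned char *b)
{
	return memcmp(a, b, ETH_ALEN) == 0;
}

/*
 * Copy up to max_out addresses from list[0..n) into out, with own first.
 * Assumes, as stated in the thread, that a non-empty list contains own.
 * Returns the number of addresses written.
 */
static size_t build_uc_list(unsigned char (*out)[ETH_ALEN], size_t max_out,
			    const unsigned char (*list)[ETH_ALEN], size_t n,
			    const unsigned char *own)
{
	size_t i = 0, j;

	/* Empty list: push nothing, not even our own address. */
	if (n == 0 || max_out == 0)
		return 0;

	memcpy(out[i++], own, ETH_ALEN);

	for (j = 0; j < n && i < max_out; j++) {
		if (addr_equal(list[j], own))
			continue; /* already pushed first */
		memcpy(out[i++], list[j], ETH_ALEN);
	}
	return i;
}
```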

Thanks.

>
>> +   if (!is_uc && priv->eth_addr.broadcast_enabled)
>> +   ether_addr_copy(addr_array[i++], ndev->broadcast);
>> +
>> +   mlx5e_for_each_hash_node(hn, tmp, addr_list, hi) {
>> +   if (ether_addr_equal(ndev->dev_addr, hn->ai.addr))
>> +   continue;
>> +   if (i >= list_size)
>> + 

Re: [PATCHSET v3] netfilter, cgroup: implement cgroup2 path match in xt_cgroup

2015-11-23 Thread Tejun Heo
Hello,

On Mon, Nov 23, 2015 at 10:53:46AM -0500, Tejun Heo wrote:
> > [   11.594536] [ cut here ]
> > [   11.595274] WARNING: CPU: 1 PID: 1 at kernel/cgroup_pids.c:97 
> > pids_cancel.constprop.6+0x31/0x40()
> > [   11.595958] Modules linked in:
> > [   11.596199] CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #196
> > [   11.596689] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
> > [   11.597632]  81f66d8b 88007c04bb90 8155ccdc 
> > 
> > [   11.598234]  88007c04bbc8 810de202 8800793dda00 
> > 88007a096800
> > [   11.598877]  88007c04bc80 88007a6b6200 0001 
> > 88007c04bbd8
> > [   11.599547] Call Trace:
> > [   11.599784]  [] dump_stack+0x4e/0x82
> > [   11.600197]  [] warn_slowpath_common+0x82/0xc0
> > [   11.600705]  [] warn_slowpath_null+0x1a/0x20
> > [   11.601208]  [] pids_cancel.constprop.6+0x31/0x40
> > [   11.601764]  [] pids_can_attach+0x6d/0xf0
> 
> Yeah, this is a known problem regarding css's lifetime.  Working on
> it.  The earlier dump, I think, is likely to have been caused by the
> same issue.

Just posted the fix for this issue.  Can you please verify the fix?

 http://lkml.kernel.org/g/20151123195541.ga19...@mtj.duckdns.org

Thanks a lot!

-- 
tejun


Re: [PATCH net-next 0/6] kcm: Kernel Connection Multiplexor (KCM)

2015-11-23 Thread Tom Herbert
On Mon, Nov 23, 2015 at 11:54 AM, David Miller  wrote:
> From: Tom Herbert 
> Date: Mon, 23 Nov 2015 09:33:44 -0800
>
>> The TCP PSH flag is not defined for message delineation (neither is
>> urgent pointer). We can't change that (many people have tried to add
>> message semantics to TCP protocol but have always failed miserably).
>
> Agreed.
>
> My only gripe with kcm right now is a lack of a native sendpage.
> We should be able to zero copy data through KCM streams without
> any problems whatsoever.

Right, there is no reason zero copy won't work here. I was just trying
to keep the initial implementation small for reviewability.


[PATCH] drivers: net: xgene: fix: ifconfig up/down crash

2015-11-23 Thread Iyappan Subramanian
This fixes a kernel crash when doing ifconfig down and up in a loop:

[ 124.028237] Call trace:
[ 124.030670] [] memcpy+0x20/0x180
[ 124.035436] [] skb_clone+0x3c/0xa8
[ 124.040374] [] __skb_tstamp_tx+0xc0/0x118
[ 124.045918] [] skb_tstamp_tx+0x10/0x1c
[ 124.051203] [] xgene_enet_start_xmit+0x2e4/0x33c
[ 124.057352] [] dev_hard_start_xmit+0x2e8/0x400
[ 124.063327] [] sch_direct_xmit+0x90/0x1d4
[ 124.068870] [] __dev_queue_xmit+0x28c/0x498
[ 124.074585] [] dev_queue_xmit_sk+0x10/0x1c
[ 124.080216] [] ip_finish_output2+0x3d0/0x438
[ 124.086017] [] ip_finish_output+0x198/0x1ac
[ 124.091732] [] ip_output+0xec/0x164
[ 124.096755] [] ip_local_out_sk+0x38/0x48
[ 124.102211] [] ip_queue_xmit+0x288/0x330
[ 124.107668] [] tcp_transmit_skb+0x908/0x964
[ 124.113383] [] tcp_send_ack+0x128/0x138
[ 124.118753] [] __tcp_ack_snd_check+0x5c/0x94
[ 124.124555] [] tcp_rcv_established+0x554/0x68c
[ 124.130530] [] tcp_v4_do_rcv+0xa4/0x37c
[ 124.135900] [] release_sock+0xb4/0x150
[ 124.141184] [] tcp_recvmsg+0x448/0x9e0
[ 124.146468] [] inet_recvmsg+0xa0/0xc0
[ 124.151666] [] sock_recvmsg+0x10/0x1c
[ 124.156863] [] SyS_recvfrom+0xa4/0xf8
[ 124.162061] Code: f2400c84 540001c0 cb040042 3664 (38401423)
[ 124.168133] ---[ end trace 7ab2550372e8a65b ]---

The fix is to reorder the napi_enable, napi_disable, request_irq and
free_irq calls, and to move register_netdev after dma_coerce_mask_and_coherent.

Signed-off-by: Iyappan Subramanian 
Tested-by: Khuong Dinh 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 29 +---
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index 991412c..1adfe70 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -688,10 +688,10 @@ static int xgene_enet_open(struct net_device *ndev)
mac_ops->tx_enable(pdata);
mac_ops->rx_enable(pdata);
 
+   xgene_enet_napi_enable(pdata);
ret = xgene_enet_register_irq(ndev);
if (ret)
return ret;
-   xgene_enet_napi_enable(pdata);
 
if (pdata->phy_mode == PHY_INTERFACE_MODE_RGMII)
phy_start(pdata->phy_dev);
@@ -715,13 +715,13 @@ static int xgene_enet_close(struct net_device *ndev)
else
cancel_delayed_work_sync(&pdata->link_work);
 
-   xgene_enet_napi_disable(pdata);
-   xgene_enet_free_irq(ndev);
-   xgene_enet_process_ring(pdata->rx_ring, -1);
-
mac_ops->tx_disable(pdata);
mac_ops->rx_disable(pdata);
 
+   xgene_enet_free_irq(ndev);
+   xgene_enet_napi_disable(pdata);
+   xgene_enet_process_ring(pdata->rx_ring, -1);
+
return 0;
 }
 
@@ -1474,15 +1474,15 @@ static int xgene_enet_probe(struct platform_device *pdev)
}
ndev->hw_features = ndev->features;
 
-   ret = register_netdev(ndev);
+   ret = dma_coerce_mask_and_coherent(dev, DMA_BIT_MASK(64));
if (ret) {
-   netdev_err(ndev, "Failed to register netdev\n");
+   netdev_err(ndev, "No usable DMA configuration\n");
goto err;
}
 
-   ret = dma_coerce_mask_and_coherent(dev, DMA_BIT_MASK(64));
+   ret = register_netdev(ndev);
if (ret) {
-   netdev_err(ndev, "No usable DMA configuration\n");
+   netdev_err(ndev, "Failed to register netdev\n");
goto err;
}
 
@@ -1490,14 +1490,17 @@ static int xgene_enet_probe(struct platform_device *pdev)
if (ret)
goto err;
 
-   xgene_enet_napi_add(pdata);
mac_ops = pdata->mac_ops;
-   if (pdata->phy_mode == PHY_INTERFACE_MODE_RGMII)
+   if (pdata->phy_mode == PHY_INTERFACE_MODE_RGMII) {
ret = xgene_enet_mdio_config(pdata);
-   else
+   if (ret)
+   goto err;
+   } else {
INIT_DELAYED_WORK(&pdata->link_work, mac_ops->link_state);
+   }
 
-   return ret;
+   xgene_enet_napi_add(pdata);
+   return 0;
 err:
unregister_netdev(ndev);
free_netdev(ndev);
-- 
1.9.1
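The ordering in this patch can be modelled with a toy state machine. This is a hypothetical userspace sketch, not xgene code: `struct dev_model` and `irq_fires()` are made up, and the invariant checked (an interrupt must never arrive while NAPI is disabled) is an assumed reading of why the new order is safer, not a root cause stated in the patch. With the old order in `dev_open()` (IRQ registered before NAPI is enabled), the assertion in `irq_fires()` would trip.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state touched by open/close. */
struct dev_model {
	bool napi_enabled;
	bool irq_registered;
	bool mac_enabled;
};

/* While the IRQ is registered an interrupt may arrive at any moment;
 * the assumed invariant is that NAPI is enabled whenever that happens. */
static void irq_fires(const struct dev_model *d)
{
	if (d->irq_registered)
		assert(d->napi_enabled);
}

static void dev_open(struct dev_model *d)
{
	d->mac_enabled = true;
	d->napi_enabled = true;   /* enable NAPI first ... */
	irq_fires(d);
	d->irq_registered = true; /* ... then allow interrupts */
	irq_fires(d);
}

static void dev_close(struct dev_model *d)
{
	d->mac_enabled = false;    /* stop TX/RX in the MAC */
	irq_fires(d);
	d->irq_registered = false; /* free the IRQ ... */
	irq_fires(d);
	d->napi_enabled = false;   /* ... before disabling NAPI */
	irq_fires(d);
}
```

Cycling `dev_open()`/`dev_close()` in a loop mirrors the "ifconfig down and up in a loop" reproducer at the level of this model only.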



Re: [PATCH RFC v2 net-next 2/2] tcp: Add Redundant Data Bundling (RDB)

2015-11-23 Thread Bendik Rønning Opstad
On 23/11/15 18:43, Eric Dumazet wrote:
> On Mon, 2015-11-23 at 17:26 +0100, Bendik Rønning Opstad wrote:
> 
>> > +
>> > +tcp_rdb_max_skbs - INTEGER
>> > +  Enable restriction on how many previous SKBs in the output queue
>> > +  RDB may include data from. A value of 1 will restrict bundling to
>> > +  only the data from the last packet that was sent.
>> > +  Default: 1
>> > +
> skb is an internal thing. I would rather not expose a sysctl with such
> name.
> 
> It can be multi-segment or not (if GSO/TSO is enabled).
> 
> So even a single skb can hold very different amounts of data, from 1 byte to ~64 KB.

I see your point about not exposing the internal naming. What about
tcp_rdb_max_packets?




Re: [PATCH net-next v2 0/9] net: ipmr: cleanups and minor improvements

2015-11-23 Thread David Miller
From: Nikolay Aleksandrov 
Date: Sat, 21 Nov 2015 15:57:23 +0100

> Since I'll have to work with ipmr, I decided to clean it up and do some
> minor improvements. Functionally there are almost no changes except the
> SLAB_PANIC removal. Most of the patches just re-design some functions to
> be clearer and more concise and try to remove the ifdef web that was
> inside. There's more information in each commit. This is the first set,
> the end goal is to introduce complete netlink support and control over
> the mfc and vif devices.
> I've tried to test all of the setsockopt/getsockopt options, and also
> made builds with various ipmr kconfig options turned on and off.
> 
> v2: change patch 7 to keep SLAB_PANIC and just drop the unnecessary null
> check

This series is pretty much fine, applied, thanks.


Re: [net-next:master 48/50] net/dsa/dsa.c:783:16: error: implicit declaration of function 'gpio_to_desc'

2015-11-23 Thread David Miller
From: kbuild test robot 
Date: Tue, 24 Nov 2015 01:34:51 +0800

> All error/warnings (new ones prefixed by >>):
> 
>net/dsa/dsa.c: In function 'dsa_of_probe':
>>> net/dsa/dsa.c:783:16: error: implicit declaration of function 
>>> 'gpio_to_desc' [-Werror=implicit-function-declaration]
>cd->reset = gpio_to_desc(gpio);
>^
>>> net/dsa/dsa.c:783:14: warning: assignment makes pointer from integer 
>>> without a cast [-Wint-conversion]
>cd->reset = gpio_to_desc(gpio);
>  ^
>>> net/dsa/dsa.c:784:4: error: implicit declaration of function 
>>> 'gpiod_direction_output' [-Werror=implicit-function-declaration]
>gpiod_direction_output(cd->reset, 0);
>^
>cc1: some warnings being treated as errors

It looks like these specific gpio interfaces are not designed such
that we get default do-nothing versions when CONFIG_GPIO is not set.

Andrew, you'll have to cope with this somehow. Perhaps add the
missing cases to the CPP #else branch in include/asm-generic/gpio.h.
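The pattern referred to here, where a header supplies static inline do-nothing versions when a config option is off, looks roughly like this. All names (`CONFIG_MY_GPIO`, `my_gpio_to_desc()`, `my_gpiod_direction_output()`, `reset_chip()`) are hypothetical stand-ins, not the real gpiolib declarations, and the real kernel stubs may return different values.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical option; left undefined here so the stubs are used. */
/* #define CONFIG_MY_GPIO 1 */

struct my_gpio_desc {
	int id;
};

#ifdef CONFIG_MY_GPIO
/* Real implementations would live elsewhere when the option is on. */
struct my_gpio_desc *my_gpio_to_desc(unsigned int gpio);
int my_gpiod_direction_output(struct my_gpio_desc *desc, int value);
#else
/* Do-nothing fallbacks so callers compile without #ifdefs. */
static inline struct my_gpio_desc *my_gpio_to_desc(unsigned int gpio)
{
	(void)gpio;
	return NULL;
}

static inline int my_gpiod_direction_output(struct my_gpio_desc *desc,
					    int value)
{
	(void)desc;
	(void)value;
	return -ENOSYS;
}
#endif

/* Caller written once, valid whether or not the option is enabled. */
static int reset_chip(unsigned int gpio)
{
	struct my_gpio_desc *reset = my_gpio_to_desc(gpio);

	if (!reset)
		return 0; /* no reset line available: skip */
	return my_gpiod_direction_output(reset, 0);
}
```

With `CONFIG_MY_GPIO` undefined, `reset_chip()` compiles and simply skips the reset; defining it would require real implementations to link.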



Re: stmmac: debugfs broken with multiple ethernets.

2015-11-23 Thread Pavel Machek
Hi!

> stmmac_main will happily try to create two directories with the same
> name.
> 
> I guess something like
> 
> static int id;
> char name[100];
> 
> sprintf(name, STMMAC_RESOURCE_NAME "_%d", id++)
> ...
> 
> might be suitable, but did not try that further.

Hmm. It seems this is fixed in v4.1. Sorry for the noise.
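For reference, the counter-suffix idea quoted above can be sketched in userspace as follows. `RESOURCE_NAME` stands in for `STMMAC_RESOURCE_NAME` and `make_instance_name()` is hypothetical; this does not describe how the problem was actually fixed in v4.1.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical prefix standing in for STMMAC_RESOURCE_NAME. */
#define RESOURCE_NAME "stmmaceth"

/*
 * Build a per-instance directory name: "stmmaceth_0", "stmmaceth_1", ...
 * Returns snprintf()'s result, so a value >= len means truncation.
 */
static int make_instance_name(char *buf, size_t len)
{
	/* Not safe against concurrent callers; kernel code would
	 * want an atomic counter here. */
	static int id;

	return snprintf(buf, len, RESOURCE_NAME "_%d", id++);
}
```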

Best regards,


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [PATCHSET v3] netfilter, cgroup: implement cgroup2 path match in xt_cgroup

2015-11-23 Thread Daniel Wagner
On 11/23/2015 08:11 AM, Daniel Wagner wrote:
> [3.217648] systemd[1]: tmp.mount: Directory /tmp to mount over is not 
> empty, mounting anyway.
> [3.224665] BUG: spinlock bad magic on CPU#1, systemd/1
> [3.225653]  lock: cgroup_sk_update_lock+0x0/0x60, .magic: , 
> .owner: systemd/1, .owner_cpu: 1
> [3.227034] CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #195
> [3.227862] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
> [3.228906]  834a2160 88007c043ad0 81551edc 
> 88007c028000
> [3.229512]  88007c043af0 81136868 834a2160 
> 88007aff5940
> [3.230105]  88007c043b08 81136b05 834a2160 
> 88007c043b20
> [3.230716] Call Trace:
> [3.230906]  [] dump_stack+0x4e/0x82
> [3.231289]  [] spin_dump+0x78/0xc0
> [3.231642]  [] do_raw_spin_unlock+0x75/0xd0
> [3.232039]  [] _raw_spin_unlock+0x27/0x50
> [3.232431]  [] update_classid_sock+0x68/0x80
> [3.232836]  [] iterate_fd+0x71/0x150
> [3.233197]  [] update_classid+0x47/0x80
> [3.233571]  [] cgrp_attach+0x14/0x20
> [3.233929]  [] cgroup_taskset_migrate+0x1e1/0x330
> [3.234366]  [] cgroup_migrate+0xf5/0x190
> [3.234747]  [] ? cgroup_migrate+0x5/0x190
> [3.235130]  [] cgroup_attach_task+0x176/0x200
> [3.235543]  [] ? cgroup_attach_task+0x5/0x200
> [3.235953]  [] __cgroup_procs_write+0x2ad/0x460
> [3.236377]  [] ? __cgroup_procs_write+0x5e/0x460
> [3.236805]  [] cgroup_procs_write+0x14/0x20
> [3.237205]  [] cgroup_file_write+0x35/0x1c0
> [3.237600]  [] kernfs_fop_write+0x141/0x190
> [3.237998]  [] __vfs_write+0x28/0xe0
> [3.238361]  [] ? percpu_down_read+0x57/0xa0
> [3.238761]  [] ? __sb_start_write+0xb4/0xf0
> [3.239154]  [] ? __sb_start_write+0xb4/0xf0
> [3.239554]  [] vfs_write+0xac/0x1a0
> [3.239930]  [] ? __fget_light+0x66/0x90
> [3.240308]  [] SyS_write+0x49/0xb0
> [3.240656]  [] entry_SYSCALL_64_fastpath+0x12/0x76

I have enabled a few additional cgroup controllers as well, because I was
trying to figure out why I only see the 'memory' cgroup controller in 
cgroup.controllers. pid and io show up but not net_prio or net_cls.
Not sure why systemd (v227) is not mounting them.

Though, after a while a similar call trace is produced. I guess this
has nothing to do with the current changes.

[   11.594536] [ cut here ]
[   11.595274] WARNING: CPU: 1 PID: 1 at kernel/cgroup_pids.c:97 
pids_cancel.constprop.6+0x31/0x40()
[   11.595958] Modules linked in:
[   11.596199] CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #196
[   11.596689] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
[   11.597632]  81f66d8b 88007c04bb90 8155ccdc 

[   11.598234]  88007c04bbc8 810de202 8800793dda00 
88007a096800
[   11.598877]  88007c04bc80 88007a6b6200 0001 
88007c04bbd8
[   11.599547] Call Trace:
[   11.599784]  [] dump_stack+0x4e/0x82
[   11.600197]  [] warn_slowpath_common+0x82/0xc0
[   11.600705]  [] warn_slowpath_null+0x1a/0x20
[   11.601208]  [] pids_cancel.constprop.6+0x31/0x40
[   11.601764]  [] pids_can_attach+0x6d/0xf0
[   11.602245]  [] cgroup_taskset_migrate+0x6a/0x330
[   11.602795]  [] cgroup_migrate+0xf5/0x190
[   11.603276]  [] ? cgroup_migrate+0x5/0x190
[   11.603788]  [] cgroup_attach_task+0x176/0x200
[   11.604308]  [] ? cgroup_attach_task+0x5/0x200
[   11.604831]  [] __cgroup_procs_write+0x2ad/0x460
[   11.605367]  [] ? __cgroup_procs_write+0x5e/0x460
[   11.605929]  [] cgroup_procs_write+0x14/0x20
[   11.606448]  [] cgroup_file_write+0x35/0x1c0
[   11.606931]  [] kernfs_fop_write+0x141/0x190
[   11.607401]  [] __vfs_write+0x28/0xe0
[   11.607834]  [] ? percpu_down_read+0x57/0xa0
[   11.608366]  [] ? __sb_start_write+0xb4/0xf0
[   11.608874]  [] ? __sb_start_write+0xb4/0xf0
[   11.609343]  [] vfs_write+0xac/0x1a0
[   11.609843]  [] ? __fget_light+0x66/0x90
[   11.610315]  [] SyS_write+0x49/0xb0
[   11.610756]  [] entry_SYSCALL_64_fastpath+0x12/0x76
[   11.611305] ---[ end trace 7f953d0ce5af99ea ]---



pull-request: can-next 2015-11-23

2015-11-23 Thread Marc Kleine-Budde
Hello David,

this is a pull request of a single patch for net-next/master.

The patch by Kedareswara rao Appana converts the xilinx CAN driver to
runtime_pm.

Marc

---

The following changes since commit 3f8c0f7efb4fcac11f31afa97584d06118c614bb:

  gianfar: use of_property_read_bool() (2015-11-22 20:47:14 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next.git tags/linux-can-next-for-4.5-20151123

for you to fetch changes up to 4716620d1b6291ce45522d1346c086f76b995d1c:

  can: xilinx: Convert to runtime_pm (2015-11-23 09:51:34 +0100)


linux-can-next-for-4.5-20151123


Kedareswara rao Appana (1):
  can: xilinx: Convert to runtime_pm

 drivers/net/can/xilinx_can.c | 176 +--
 1 file changed, 101 insertions(+), 75 deletions(-)

-- 
Pengutronix e.K.  | Marc Kleine-Budde   |
Industrial Linux Solutions| Phone: +49-231-2826-924 |
Vertretung West/Dortmund  | Fax:   +49-5121-206917- |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |





Re: [PATCH net-next 4/6] kcm: Kernel Connection Multiplexor module

2015-11-23 Thread Daniel Borkmann

On 11/20/2015 10:21 PM, Tom Herbert wrote:
[...]

+
+/* Macro to invoke filter function. */
+#define KCM_RUN_FILTER(prog, ctx) \
+   (*prog->bpf_func)(ctx, prog->insnsi)


Any reason to redefine this macro?

We already have an identical one:

#define BPF_PROG_RUN(filter, ctx)  (*filter->bpf_func)(ctx, filter->insnsi)

[...]

+static int kcm_attach_ioctl(struct socket *sock, struct kcm_attach *info)
+{
+   struct socket *csock;
+   struct bpf_prog *prog;
+   int err;
+
+   csock = sockfd_lookup(info->fd, &err);
+   if (!csock)
+   return -ENOENT;
+
+   prog = bpf_prog_get(info->bpf_fd);
+   if (IS_ERR(prog)) {
+   err = PTR_ERR(prog);
+   goto out;
+   }
+
+   if (prog->type != BPF_PROG_TYPE_SOCKET_FILTER) {
+   bpf_prog_put(prog);


I'd move this and the bpf_prog_put() below under an out_put label, too.


+   err = -EINVAL;
+   goto out;
+   }
+
+   err = kcm_attach(sock, csock, prog);
+   if (err) {
+   bpf_prog_put(prog);


^^^


+   goto out;
+   }
+
+   /* Keep reference on file also */
+
+   return 0;
+out:
+   fput(csock->file);
+   return err;
+}

[...]
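The out_put suggestion can be sketched with hypothetical reference-counting helpers standing in for sockfd_lookup()/fput() and bpf_prog_get()/bpf_prog_put(); only the goto structure is meant to carry over. Each error after the program reference is taken jumps to `out_put`, which drops that reference exactly once and falls through to `out`.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical counters standing in for the socket file and BPF
 * program references taken in kcm_attach_ioctl(). */
static int file_refs, prog_refs;

static int get_file(int fd)
{
	if (fd < 0)
		return -ENOENT;
	file_refs++;
	return 0;
}

static void put_file(void)
{
	file_refs--;
}

static int get_prog(int bpf_fd)
{
	if (bpf_fd < 0)
		return -EBADF;
	prog_refs++;
	return 0;
}

static void put_prog(void)
{
	prog_refs--;
}

/* Same control flow as kcm_attach_ioctl(), with a single put site. */
static int attach(int fd, int bpf_fd, int prog_type_ok, int attach_err)
{
	int err;

	err = get_file(fd);
	if (err)
		return err;

	err = get_prog(bpf_fd);
	if (err)
		goto out;

	if (!prog_type_ok) {
		err = -EINVAL;
		goto out_put;
	}

	err = attach_err; /* stands in for kcm_attach() */
	if (err)
		goto out_put;

	/* Success: keep both references. */
	return 0;

out_put:
	put_prog();
out:
	put_file();
	return err;
}
```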


Re: [PATCH net-next v2 2/9] net: ipmr: always define mroute_reg_vif_num

2015-11-23 Thread Nikolay Aleksandrov
On 11/23/2015 06:23 AM, Cong Wang wrote:
> On Sat, Nov 21, 2015 at 6:57 AM, Nikolay Aleksandrov
>  wrote:
>> From: Nikolay Aleksandrov 
>>
>> Before mroute_reg_vif_num was defined only if any of the CONFIG_PIMSM_
>> options were set, but that's not really necessary as the size of the
>> struct is the same in both cases (checked with pahole, both cases size
>> is 3256 bytes) and we can remove some unnecessary ifdefs to simplify the
>> code.
>>
> 
> Not sure if this really simplifies the code, since now
> mroute_reg_vif_num is hidden
> deeper after your patch and there are still some code under CONFIG_IP_PIMSM.
> 
CONFIG_IP_PIMSM is removed in the next patch, and it's not "hidden" any
more than it was before.

> If you really care about it, how about introducing a helper function
> to set and get
> mrt->mroute_reg_vif_num?
> 
Patches are welcome; if you don't get to it, I will with the next
set.
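The helper idea might look like this minimal sketch. `struct mr_table_sketch` and the `mrt_reg_vif_*()` helpers are hypothetical, and it assumes -1 is used as the "no register VIF" value.

```c
#include <assert.h>

/* Hypothetical, trimmed-down stand-in for struct mr_table. */
struct mr_table_sketch {
	int mroute_reg_vif_num; /* assumed: -1 when no register VIF exists */
};

static inline int mrt_reg_vif_get(const struct mr_table_sketch *mrt)
{
	return mrt->mroute_reg_vif_num;
}

static inline void mrt_reg_vif_set(struct mr_table_sketch *mrt, int vifi)
{
	mrt->mroute_reg_vif_num = vifi;
}

static inline int mrt_has_reg_vif(const struct mr_table_sketch *mrt)
{
	return mrt_reg_vif_get(mrt) >= 0;
}
```

Callers would then go through the helpers instead of touching the field directly, which keeps any remaining config-dependent logic in one place.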


Re: [PATCH net-next v2 4/9] net: ipmr: fix code and comment style

2015-11-23 Thread Nikolay Aleksandrov
On 11/23/2015 06:30 AM, Cong Wang wrote:
> On Sat, Nov 21, 2015 at 6:57 AM, Nikolay Aleksandrov
>  wrote:
>> -
>> -/*
>> - * Setup for IP multicast routing
>> - */
>> +/* Setup for IP multicast routing */
>>  static int __net_init ipmr_net_init(struct net *net)
> 
> Comments like this one are never useful so can be just removed.
> 

Will do, though this is not the point of the patch. There are also
incorrect comments that need to be changed or removed. All in
due time; this is just a small, trivial step to abide by the style.


[PATCH net-next v3 0/2] Netronome NFP4000/NFP6000 NIC VF driver

2015-11-23 Thread Jakub Kicinski
This patchset adds support for VFs of Netronome's NFP-4000 and NFP-6000
based NICs. We are currently also preparing the submission for the PF
driver, but it is not quite ready yet. The PF driver can be found on
GitHub:

https://github.com/Netronome/nfp-drv-kmods

changes since v2:

Per DaveM's comment, I've dropped the code which was managing the MSI-X
table from the driver side, but I still had to open-code the unmasking,
since no appropriate core function is exported.

There were quite a few small changes to this series since v2. For anyone
interested in the full changelog: the previous revision was based on
16ecade76a89 ("nfp_net_main: Correct some minor issues") in the GitHub
repo, and this one is based on HEAD.


Jakub Kicinski (2):
  pci_ids: add Netronome Systems vendor
  net: add driver for Netronome NFP4000/NFP6000 NIC VFs

 MAINTAINERS|7 +
 drivers/net/ethernet/Kconfig   |1 +
 drivers/net/ethernet/Makefile  |1 +
 drivers/net/ethernet/netronome/Kconfig |   33 +
 drivers/net/ethernet/netronome/Makefile|5 +
 drivers/net/ethernet/netronome/nfp/Makefile|8 +
 drivers/net/ethernet/netronome/nfp/nfp_net.h   |  767 ++
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 2472 
 drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h  |  323 +++
 .../net/ethernet/netronome/nfp/nfp_net_debugfs.c   |  235 ++
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   |  640 +
 .../net/ethernet/netronome/nfp/nfp_netvf_main.c|  385 +++
 include/linux/pci_ids.h|2 +
 13 files changed, 4879 insertions(+)
 create mode 100644 drivers/net/ethernet/netronome/Kconfig
 create mode 100644 drivers/net/ethernet/netronome/Makefile
 create mode 100644 drivers/net/ethernet/netronome/nfp/Makefile
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_net.h
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_net_common.c
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_net_debugfs.c
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_netvf_main.c

-- 
1.9.1



[PATCH net-next v3 1/2] pci_ids: add Netronome Systems vendor

2015-11-23 Thread Jakub Kicinski
Add PCI vendor id for Netronome Systems.

Signed-off-by: Jakub Kicinski 
Signed-off-by: Rolf Neugebauer 
---
 include/linux/pci_ids.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index d9ba49cedc5d..1acbefc4bbda 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -2495,6 +2495,8 @@
 #define PCI_DEVICE_ID_KORENIX_JETCARDF2	0x1700
 #define PCI_DEVICE_ID_KORENIX_JETCARDF3	0x17ff
 
+#define PCI_VENDOR_ID_NETRONOME		0x19ee
+
 #define PCI_VENDOR_ID_QMI  0x1a32
 
 #define PCI_VENDOR_ID_AZWAVE   0x1a3b
-- 
1.9.1



[PATCH net-next 17/18] net/mlx5: E-Switch, Introduce get vf statistics

2015-11-23 Thread Or Gerlitz
From: Saeed Mahameed 

Add support for getting VF statistics using the query vport
counter command.

Signed-off-by: Saeed Mahameed 
Signed-off-by: Or Gerlitz 
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 67 +++
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h |  3 +
 2 files changed, 70 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index ea14664..8f428bf 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1213,3 +1213,70 @@ int mlx5_eswitch_set_vport_vlan(struct mlx5_eswitch *esw,
 
return modify_esw_vport_cvlan(esw->dev, vport, vlan, qos, set);
 }
+
+int mlx5_eswitch_get_vport_stats(struct mlx5_eswitch *esw,
+int vport,
+struct ifla_vf_stats *vf_stats)
+{
+   int outlen = MLX5_ST_SZ_BYTES(query_vport_counter_out);
+   u32 in[MLX5_ST_SZ_DW(query_vport_counter_in)];
+   int err = 0;
+   u32 *out;
+
+   if (!ESW_ALLOWED(esw))
+   return -EPERM;
+   if (!LEGAL_VPORT(esw, vport))
+   return -EINVAL;
+
+   out = mlx5_vzalloc(outlen);
+   if (!out)
+   return -ENOMEM;
+
+   memset(in, 0, sizeof(in));
+
+   MLX5_SET(query_vport_counter_in, in, opcode,
+MLX5_CMD_OP_QUERY_VPORT_COUNTER);
+   MLX5_SET(query_vport_counter_in, in, op_mod, 0);
+   MLX5_SET(query_vport_counter_in, in, vport_number, vport);
+   if (vport)
+   MLX5_SET(query_vport_counter_in, in, other_vport, 1);
+
+   memset(out, 0, outlen);
+   err = mlx5_cmd_exec(esw->dev, in, sizeof(in), out, outlen);
+   if (err)
+   goto free_out;
+
+   #define MLX5_GET_CTR(p, x) \
+   MLX5_GET64(query_vport_counter_out, p, x)
+
+   memset(vf_stats, 0, sizeof(*vf_stats));
+   vf_stats->rx_packets =
+   MLX5_GET_CTR(out, received_eth_unicast.packets) +
+   MLX5_GET_CTR(out, received_eth_multicast.packets) +
+   MLX5_GET_CTR(out, received_eth_broadcast.packets);
+
+   vf_stats->rx_bytes =
+   MLX5_GET_CTR(out, received_eth_unicast.octets) +
+   MLX5_GET_CTR(out, received_eth_multicast.octets) +
+   MLX5_GET_CTR(out, received_eth_broadcast.octets);
+
+   vf_stats->tx_packets =
+   MLX5_GET_CTR(out, transmitted_eth_unicast.packets) +
+   MLX5_GET_CTR(out, transmitted_eth_multicast.packets) +
+   MLX5_GET_CTR(out, transmitted_eth_broadcast.packets);
+
+   vf_stats->tx_bytes =
+   MLX5_GET_CTR(out, transmitted_eth_unicast.octets) +
+   MLX5_GET_CTR(out, transmitted_eth_multicast.octets) +
+   MLX5_GET_CTR(out, transmitted_eth_broadcast.octets);
+
+   vf_stats->multicast =
+   MLX5_GET_CTR(out, received_eth_multicast.packets);
+
+   vf_stats->broadcast =
+   MLX5_GET_CTR(out, received_eth_broadcast.packets);
+
+free_out:
+   kvfree(out);
+   return err;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index b4284a4..9bac542 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -154,5 +154,8 @@ int mlx5_eswitch_set_vport_vlan(struct mlx5_eswitch *esw,
int vport, u16 vlan, u8 qos);
 int mlx5_eswitch_get_vport_config(struct mlx5_eswitch *esw,
  int vport, struct ifla_vf_info *ivi);
+int mlx5_eswitch_get_vport_stats(struct mlx5_eswitch *esw,
+int vport,
+struct ifla_vf_stats *vf_stats);
 #endif /* __MLX5_ESWITCH_H__ */
 
-- 
2.3.7
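The aggregation performed by mlx5_eswitch_get_vport_stats() can be shown without the firmware command plumbing. In this sketch the counters are assumed to be already decoded into a hypothetical `struct vport_counters`; in the patch they are read from the `query_vport_counter_out` layout with `MLX5_GET64()`, and the result goes into `struct ifla_vf_stats` rather than the stand-in `struct vf_stats_sketch`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded counter pair. */
struct pkt_octets {
	uint64_t packets;
	uint64_t octets;
};

struct vport_counters {
	struct pkt_octets rx_uc, rx_mc, rx_bc;
	struct pkt_octets tx_uc, tx_mc, tx_bc;
};

/* Stand-in for the ifla_vf_stats fields the patch fills in. */
struct vf_stats_sketch {
	uint64_t rx_packets, tx_packets, rx_bytes, tx_bytes;
	uint64_t broadcast, multicast;
};

static void fill_vf_stats(const struct vport_counters *c,
			  struct vf_stats_sketch *s)
{
	memset(s, 0, sizeof(*s));
	/* Totals are the sum of unicast, multicast and broadcast. */
	s->rx_packets = c->rx_uc.packets + c->rx_mc.packets + c->rx_bc.packets;
	s->rx_bytes   = c->rx_uc.octets  + c->rx_mc.octets  + c->rx_bc.octets;
	s->tx_packets = c->tx_uc.packets + c->tx_mc.packets + c->tx_bc.packets;
	s->tx_bytes   = c->tx_uc.octets  + c->tx_mc.octets  + c->tx_bc.octets;
	/* As in the patch, multicast/broadcast come from the RX side only. */
	s->multicast = c->rx_mc.packets;
	s->broadcast = c->rx_bc.packets;
}
```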



[PATCH net-next 12/18] net/mlx5: E-Switch, Introduce FDB hardware capabilities

2015-11-23 Thread Or Gerlitz
From: Saeed Mahameed 

Define the hardware structures and capabilities needed for
E-Switch FDB flow tables, and read them on driver load.

Signed-off-by: Saeed Mahameed 
Signed-off-by: Or Gerlitz 
---
 drivers/net/ethernet/mellanox/mlx5/core/fw.c | 13 +
 include/linux/mlx5/device.h  | 15 +++
 include/linux/mlx5/mlx5_ifc.h| 13 +
 3 files changed, 41 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
index 9335e5a..bf6e3df 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
@@ -160,6 +160,19 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev)
if (err)
return err;
}
+
+   if (MLX5_CAP_GEN(dev, vport_group_manager) &&
+   MLX5_CAP_GEN(dev, eswitch_flow_table)) {
+   err = mlx5_core_get_caps(dev, MLX5_CAP_ESWITCH_FLOW_TABLE,
+HCA_CAP_OPMOD_GET_CUR);
+   if (err)
+   return err;
+   err = mlx5_core_get_caps(dev, MLX5_CAP_ESWITCH_FLOW_TABLE,
+HCA_CAP_OPMOD_GET_MAX);
+   if (err)
+   return err;
+   }
+
return 0;
 }
 
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 90a4cb6..bce9cae 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -1138,6 +1138,7 @@ enum mlx5_cap_type {
MLX5_CAP_IPOIB_OFFLOADS,
MLX5_CAP_EOIB_OFFLOADS,
MLX5_CAP_FLOW_TABLE,
+   MLX5_CAP_ESWITCH_FLOW_TABLE,
/* NUM OF CAP Types */
MLX5_CAP_NUM
 };
@@ -1175,6 +1176,20 @@ enum mlx5_cap_type {
 #define MLX5_CAP_FLOWTABLE_MAX(mdev, cap) \
MLX5_GET(flow_table_nic_cap, mdev->hca_caps_max[MLX5_CAP_FLOW_TABLE], cap)
 
+#define MLX5_CAP_ESW_FLOWTABLE(mdev, cap) \
+   MLX5_GET(flow_table_eswitch_cap, \
+mdev->hca_caps_cur[MLX5_CAP_ESWITCH_FLOW_TABLE], cap)
+
+#define MLX5_CAP_ESW_FLOWTABLE_MAX(mdev, cap) \
+   MLX5_GET(flow_table_eswitch_cap, \
+mdev->hca_caps_max[MLX5_CAP_ESWITCH_FLOW_TABLE], cap)
+
+#define MLX5_CAP_ESW_FLOWTABLE_FDB(mdev, cap) \
+   MLX5_CAP_ESW_FLOWTABLE(mdev, flow_table_properties_nic_esw_fdb.cap)
+
+#define MLX5_CAP_ESW_FLOWTABLE_FDB_MAX(mdev, cap) \
+   MLX5_CAP_ESW_FLOWTABLE_MAX(mdev, flow_table_properties_nic_esw_fdb.cap)
+
 #define MLX5_CAP_ODP(mdev, cap)\
MLX5_GET(odp_cap, mdev->hca_caps_cur[MLX5_CAP_ODP], cap)
 
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 39487d0..ae7c08a 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -447,6 +447,18 @@ struct mlx5_ifc_flow_table_nic_cap_bits {
u8 reserved_3[0x7200];
 };
 
+struct mlx5_ifc_flow_table_eswitch_cap_bits {
+   u8 reserved_0[0x200];
+
+   struct mlx5_ifc_flow_table_prop_layout_bits flow_table_properties_nic_esw_fdb;
+
+   struct mlx5_ifc_flow_table_prop_layout_bits flow_table_properties_esw_acl_ingress;
+
+   struct mlx5_ifc_flow_table_prop_layout_bits flow_table_properties_esw_acl_egress;
+
+   u8  reserved_1[0x7800];
+};
+
 struct mlx5_ifc_per_protocol_networking_offload_caps_bits {
u8 csum_cap[0x1];
u8 vlan_cap[0x1];
@@ -1846,6 +1858,7 @@ union mlx5_ifc_hca_cap_union_bits {
struct mlx5_ifc_roce_cap_bits roce_cap;
struct mlx5_ifc_per_protocol_networking_offload_caps_bits per_protocol_networking_offload_caps;
struct mlx5_ifc_flow_table_nic_cap_bits flow_table_nic_cap;
+   struct mlx5_ifc_flow_table_eswitch_cap_bits flow_table_eswitch_cap;
u8 reserved_0[0x8000];
 };
 
-- 
2.3.7



[PATCH net-next 14/18] net/mlx5: E-Switch, Introduce Vport administration functions

2015-11-23 Thread Or Gerlitz
From: Saeed Mahameed 

Implement functions to set a VF's MAC address and link state and to
query a VF's configuration, to be used later by the netdev VF ndos or
any other management API.
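The ndos added later in this series pass `vf + 1` as the vport number, and `mlx5_eswitch_get_vport_config()` reports `ivi->vf = vport - 1`. Vport 0 is therefore the PF itself. The following is a minimal user-space sketch of that mapping together with the `ESW_ALLOWED`/`LEGAL_VPORT` checks. It is not driver code: `struct esw_model` and the helper names are hypothetical, and plain booleans stand in for the capability queries.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mlx5_eswitch: only what the checks need. */
struct esw_model {
	bool vport_group_manager; /* models MLX5_CAP_GEN(dev, vport_group_manager) */
	bool is_pf;               /* models mlx5_core_is_pf(dev) */
	int total_vports;         /* vports 0..total_vports-1 are legal */
};

/* VF index as passed to ndo_set_vf_* -> E-Switch vport number. */
static int vf_to_vport(int vf)
{
	return vf + 1;
}

/* E-Switch vport number -> VF index reported in ifla_vf_info. */
static int vport_to_vf(int vport)
{
	assert(vport > 0); /* vport 0 is the PF, not a VF */
	return vport - 1;
}

/* Same order as the patch: permission check first, then range check. */
static int esw_check_vport(const struct esw_model *esw, int vport)
{
	if (!(esw && esw->vport_group_manager && esw->is_pf))
		return -EPERM;
	if (!(vport >= 0 && vport < esw->total_vports))
		return -EINVAL;
	return 0;
}
```

With `total_vports = 5`, VF 3 maps to vport 4 and passes. VF 4 maps to vport 5 and gets `-EINVAL`. A caller that is not the PF gets `-EPERM` regardless of the vport.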

Signed-off-by: Saeed Mahameed 
Signed-off-by: Or Gerlitz 
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 61 +++
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 10 +++-
 2 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index a208be7..590a06c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1022,3 +1022,64 @@ void mlx5_eswitch_vport_event(struct mlx5_eswitch *esw, struct mlx5_eqe *eqe)
queue_work(esw->work_queue, >vport_change_handler);
spin_unlock(>lock);
 }
+
+/* Vport Administration */
+#define ESW_ALLOWED(esw) \
+   (esw && MLX5_CAP_GEN(esw->dev, vport_group_manager) && mlx5_core_is_pf(esw->dev))
+#define LEGAL_VPORT(esw, vport) (vport >= 0 && vport < esw->total_vports)
+
+int mlx5_eswitch_set_vport_mac(struct mlx5_eswitch *esw,
+  int vport, u8 mac[ETH_ALEN])
+{
+   int err = 0;
+
+   if (!ESW_ALLOWED(esw))
+   return -EPERM;
+   if (!LEGAL_VPORT(esw, vport))
+   return -EINVAL;
+
+   err = mlx5_modify_nic_vport_mac_address(esw->dev, vport, mac);
+   if (err) {
+   mlx5_core_warn(esw->dev,
+  "Failed to mlx5_modify_nic_vport_mac vport(%d) err=(%d)\n",
+  vport, err);
+   return err;
+   }
+
+   return err;
+}
+
+int mlx5_eswitch_set_vport_state(struct mlx5_eswitch *esw,
+int vport, int link_state)
+{
+   if (!ESW_ALLOWED(esw))
+   return -EPERM;
+   if (!LEGAL_VPORT(esw, vport))
+   return -EINVAL;
+
+   return mlx5_modify_vport_admin_state(esw->dev,
+MLX5_QUERY_VPORT_STATE_IN_OP_MOD_ESW_VPORT,
+vport, link_state);
+}
+
+int mlx5_eswitch_get_vport_config(struct mlx5_eswitch *esw,
+ int vport, struct ifla_vf_info *ivi)
+{
+   if (!ESW_ALLOWED(esw))
+   return -EPERM;
+   if (!LEGAL_VPORT(esw, vport))
+   return -EINVAL;
+
+   memset(ivi, 0, sizeof(*ivi));
+   ivi->vf = vport - 1;
+
+   mlx5_query_nic_vport_mac_address(esw->dev, vport, ivi->mac);
+   ivi->linkstate = mlx5_query_vport_admin_state(esw->dev,
+ MLX5_QUERY_VPORT_STATE_IN_OP_MOD_ESW_VPORT,
+ vport);
+   ivi->vlan = 0;
+   ivi->qos = 0;
+   ivi->spoofchk = 0;
+
+   return 0;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index aec1ec0..c5827ad 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -33,6 +33,8 @@
 #ifndef __MLX5_ESWITCH_H__
 #define __MLX5_ESWITCH_H__
 
+#include 
+#include 
 #include 
 
 #define MLX5_MAX_UC_PER_VPORT(dev) \
@@ -139,10 +141,16 @@ struct mlx5_eswitch {
 
 /* E-Switch API */
 int mlx5_eswitch_init(struct mlx5_core_dev *dev);
+
 void mlx5_eswitch_cleanup(struct mlx5_eswitch *esw);
 void mlx5_eswitch_vport_event(struct mlx5_eswitch *esw, struct mlx5_eqe *eqe);
 int mlx5_eswitch_enable_sriov(struct mlx5_eswitch *esw, int nvfs);
 void mlx5_eswitch_disable_sriov(struct mlx5_eswitch *esw);
-
+int mlx5_eswitch_set_vport_mac(struct mlx5_eswitch *esw,
+  int vport, u8 mac[ETH_ALEN]);
+int mlx5_eswitch_set_vport_state(struct mlx5_eswitch *esw,
+int vport, int link_state);
+int mlx5_eswitch_get_vport_config(struct mlx5_eswitch *esw,
+ int vport, struct ifla_vf_info *ivi);
 #endif /* __MLX5_ESWITCH_H__ */
 
-- 
2.3.7



Re: [PATCH v2 06/27] brcm80211: move under broadcom vendor directory

2015-11-23 Thread Arend van Spriel

On 11/23/2015 11:28 AM, Arend van Spriel wrote:
> On 11/22/2015 06:23 PM, Kalle Valo wrote:
>> Arend van Spriel  writes:
>>
>>> On 11/19/2015 08:48 AM, Kalle Valo wrote:
>>>> Hauke Mehrtens  writes:
>>>>
>>>>> On 11/18/2015 03:45 PM, Kalle Valo wrote:
>>>>>> Part of reorganising wireless drivers directory and Kconfig. Note
>>>>>> that I had to edit Makefiles from subdirectories to use the new
>>>>>> location.
>>>>>>
>>>>>> Signed-off-by: Kalle Valo 
>>>>>> ---
>>>>>
>>>>> I would prefer to remove the brcm80211 directory in this process
>>>>> and create:
>>>>> drivers/net/wireless/broadcom/brcmfmac
>>>>> drivers/net/wireless/broadcom/brcmsmac
>>>>> drivers/net/wireless/broadcom/brcmutil
>>>>> drivers/net/wireless/broadcom/include
>>>>>
>>>>> This way we have one directory less.
>>>>
>>>> I think this could be done separately. This patchset is big enough
>>>> already, I would not like to make it any more complicated.
>>>>
>>>> And I actually like the brcm80211 directory, I would not mind
>>>> keeping it still.
>>>
>>> I prefer to keep it, as brcmsmac and brcmfmac rely on the brcmutil
>>> module, so I want to keep them together under brcm80211.
>>>
>>> So does this patch go in before or after the patches I submitted
>>> before the merge window? I hope after :-p
>>
>> Sorry, the vendor patches go in first :) It's much safer that way.
>>
>> But I think that git should be smart enough and your patchset from
>> before the merge window should still apply without issues.
>
> Will see if that is true when I merge it in our internal repo. :-p

Just applied the pending patches using 'git am -3' and that works fine.
So when told to be smart, git is indeed smart ;-)


Regards,
Arend



Re: [PATCH net-next v2 8/9] net: ipmr: rearrange and cleanup setsockopt

2015-11-23 Thread Nikolay Aleksandrov
On 11/23/2015 06:44 AM, Cong Wang wrote:
> On Sat, Nov 21, 2015 at 6:57 AM, Nikolay Aleksandrov
>  wrote:
>>  net/ipv4/ipmr.c | 191 +++-
>>  1 file changed, 107 insertions(+), 84 deletions(-)
> 
> Does this really simplify the code? :-/
> 
Did I really say it does? :-) Now, to the point: it makes this
setsockopt much easier to reason about. Before, it took the lock
conditionally in some cases and not in others, and "v" was sometimes
signed and sometimes unsigned; now it is clear which type is used. I've
left a comment explaining why the only special case (MRT_DONE) needs to
unlock rtnl.
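The locking shape described above can be modelled in user space: take the lock once, handle every option under it, and let exactly one documented case drop it early because its callee takes the lock itself. This is a hypothetical sketch, not the ipmr code. A pthread mutex stands in for rtnl, and the option names `OPT_INIT`, `OPT_DONE` and `OPT_ASSERT` are made up.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical option codes standing in for MRT_INIT, MRT_DONE, MRT_ASSERT. */
enum { OPT_INIT, OPT_DONE, OPT_ASSERT };

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER; /* models rtnl */
static int initialized;
static int assert_flag;

/* Takes big_lock itself, so the caller must NOT hold it (a non-recursive
 * mutex would deadlock); this is why the DONE case unlocks first. */
static int teardown_takes_lock(void)
{
	pthread_mutex_lock(&big_lock);
	initialized = 0;
	pthread_mutex_unlock(&big_lock);
	return 0;
}

static int model_setsockopt(int optname, int val)
{
	int ret = 0;

	pthread_mutex_lock(&big_lock);
	switch (optname) {
	case OPT_INIT:
		if (initialized)
			ret = -EADDRINUSE;
		else
			initialized = 1;
		break;
	case OPT_DONE:
		/* The only special case: drop the lock and skip the common
		 * unlock below, because the callee locks on its own. */
		pthread_mutex_unlock(&big_lock);
		return teardown_takes_lock();
	case OPT_ASSERT:
		assert_flag = !!val; /* one agreed-upon type for "val" */
		break;
	default:
		ret = -ENOPROTOOPT;
		break;
	}
	pthread_mutex_unlock(&big_lock);
	return ret;
}
```

Every path other than `OPT_DONE` leaves through the single unlock at the bottom. That is what makes the function easy to audit.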





[PATCH net-next 06/18] net/mlx5: Introduce access functions to modify/query vport state

2015-11-23 Thread Or Gerlitz
From: Saeed Mahameed 

In preparation for Ethernet SR-IOV, add an API that enables each
E-Switch manager (PF) to configure the link state of its VFs in the
E-Switch.
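The query and modify helpers in this patch share one rule. The target vport number is always written into the command, and `other_vport` is set only when that number is non-zero, i.e. when the caller addresses a vport other than its own. The sketch below shows that rule in isolation. It uses a hypothetical host-order struct rather than the real big-endian `MLX5_SET()` layout, and the caller supplies the opcode value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host-order view of the fields the helpers fill in; the
 * real command is a big-endian bit layout defined in mlx5_ifc.h. */
struct vport_state_cmd {
	uint16_t opcode;
	uint8_t op_mod;        /* 0x0 VNIC vport, 0x1 E-Switch vport */
	uint8_t other_vport;   /* 1 = target a vport other than our own */
	uint16_t vport_number;
	uint8_t admin_state;   /* used by the modify command only */
};

static void build_vport_state_cmd(struct vport_state_cmd *cmd,
				  uint16_t opcode, uint8_t op_mod,
				  uint16_t vport, uint8_t admin_state)
{
	assert(cmd != NULL);
	memset(cmd, 0, sizeof(*cmd));
	cmd->opcode = opcode;
	cmd->op_mod = op_mod;
	cmd->vport_number = vport;
	if (vport)                /* vport 0 means "my own vport" */
		cmd->other_vport = 1;
	cmd->admin_state = admin_state;
}
```

This is why the existing `mlx5e_update_carrier()` caller simply passes `0`: it queries its own vport, so `other_vport` stays clear.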

Signed-off-by: Saeed Mahameed 
Signed-off-by: Or Gerlitz 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/vport.c   | 61 ---
 include/linux/mlx5/mlx5_ifc.h |  1 +
 include/linux/mlx5/vport.h|  6 ++-
 4 files changed, 62 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 2ef717f..007e464 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -63,7 +63,7 @@ static void mlx5e_update_carrier(struct mlx5e_priv *priv)
u8 port_state;
 
port_state = mlx5_query_vport_state(mdev,
-   MLX5_QUERY_VPORT_STATE_IN_OP_MOD_VNIC_VPORT);
+   MLX5_QUERY_VPORT_STATE_IN_OP_MOD_VNIC_VPORT, 0);
 
if (port_state == VPORT_STATE_UP)
netif_carrier_on(priv->netdev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
index 986d0d3..b017a7e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
@@ -36,26 +36,75 @@
 #include 
 #include "mlx5_core.h"
 
-u8 mlx5_query_vport_state(struct mlx5_core_dev *mdev, u8 opmod)
+static int _mlx5_query_vport_state(struct mlx5_core_dev *mdev, u8 opmod,
+  u16 vport, u32 *out, int outlen)
 {
-   u32 in[MLX5_ST_SZ_DW(query_vport_state_in)];
-   u32 out[MLX5_ST_SZ_DW(query_vport_state_out)];
int err;
+   u32 in[MLX5_ST_SZ_DW(query_vport_state_in)];
 
memset(in, 0, sizeof(in));
 
MLX5_SET(query_vport_state_in, in, opcode,
 MLX5_CMD_OP_QUERY_VPORT_STATE);
MLX5_SET(query_vport_state_in, in, op_mod, opmod);
+   MLX5_SET(query_vport_state_in, in, vport_number, vport);
+   if (vport)
+   MLX5_SET(query_vport_state_in, in, other_vport, 1);
 
-   err = mlx5_cmd_exec_check_status(mdev, in, sizeof(in), out,
-sizeof(out));
+   err = mlx5_cmd_exec_check_status(mdev, in, sizeof(in), out, outlen);
if (err)
mlx5_core_warn(mdev, "MLX5_CMD_OP_QUERY_VPORT_STATE failed\n");
 
+   return err;
+}
+
+u8 mlx5_query_vport_state(struct mlx5_core_dev *mdev, u8 opmod, u16 vport)
+{
+   u32 out[MLX5_ST_SZ_DW(query_vport_state_out)] = {0};
+
+   _mlx5_query_vport_state(mdev, opmod, vport, out, sizeof(out));
+
return MLX5_GET(query_vport_state_out, out, state);
 }
-EXPORT_SYMBOL(mlx5_query_vport_state);
+EXPORT_SYMBOL_GPL(mlx5_query_vport_state);
+
+u8 mlx5_query_vport_admin_state(struct mlx5_core_dev *mdev, u8 opmod, u16 vport)
+{
+   u32 out[MLX5_ST_SZ_DW(query_vport_state_out)] = {0};
+
+   _mlx5_query_vport_state(mdev, opmod, vport, out, sizeof(out));
+
+   return MLX5_GET(query_vport_state_out, out, admin_state);
+}
+EXPORT_SYMBOL(mlx5_query_vport_admin_state);
+
+int mlx5_modify_vport_admin_state(struct mlx5_core_dev *mdev, u8 opmod,
+ u16 vport, u8 state)
+{
+   u32 in[MLX5_ST_SZ_DW(modify_vport_state_in)];
+   u32 out[MLX5_ST_SZ_DW(modify_vport_state_out)];
+   int err;
+
+   memset(in, 0, sizeof(in));
+
+   MLX5_SET(modify_vport_state_in, in, opcode,
+MLX5_CMD_OP_MODIFY_VPORT_STATE);
+   MLX5_SET(modify_vport_state_in, in, op_mod, opmod);
+   MLX5_SET(modify_vport_state_in, in, vport_number, vport);
+
+   if (vport)
+   MLX5_SET(modify_vport_state_in, in, other_vport, 1);
+
+   MLX5_SET(modify_vport_state_in, in, admin_state, state);
+
+   err = mlx5_cmd_exec_check_status(mdev, in, sizeof(in), out,
+sizeof(out));
+   if (err)
+   mlx5_core_warn(mdev, "MLX5_CMD_OP_MODIFY_VPORT_STATE failed\n");
+
+   return err;
+}
+EXPORT_SYMBOL(mlx5_modify_vport_admin_state);
 
 static int mlx5_query_nic_vport_context(struct mlx5_core_dev *mdev, u16 vport,
u32 *out, int outlen)
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 836cf0e..6551847 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -2946,6 +2946,7 @@ struct mlx5_ifc_query_vport_state_out_bits {
 
 enum {
MLX5_QUERY_VPORT_STATE_IN_OP_MOD_VNIC_VPORT  = 0x0,
+   MLX5_QUERY_VPORT_STATE_IN_OP_MOD_ESW_VPORT   = 0x1,
 };
 
 struct mlx5_ifc_query_vport_state_in_bits {
diff --git a/include/linux/mlx5/vport.h b/include/linux/mlx5/vport.h
index 00bbec8..c1bba59 100644
--- 

[PATCH net-next 11/18] net/mlx5: Introducing E-Switch and l2 table

2015-11-23 Thread Or Gerlitz
From: Saeed Mahameed 

E-Switch is the software entity that represents and manages ConnectX-4
inter-HCA Ethernet L2 switching.

E-Switch has its own virtual ports (vports); each vNIC/VF is connected
to the device through an E-Switch vport.

Each E-Switch is managed by one vNIC identified by
HCA_CAP.vport_group_manager (usually the PF, vport[0]), and its main
responsibility is to forward each packet to the right vport.

The E-Switch needs to manage its own L2 table and FDB tables.

The L2 table is a flow table managed by firmware; it is needed in
Multi-host (multi-PF) configurations for inter-HCA switching between
PFs.

The FDB table is a flow table managed entirely by the E-Switch driver;
its main responsibility is to switch packets between the E-Switch's
internal vports and its uplink vport.

This patch introduces only E-Switch L2 table management; FDB management
will come later, when Ethernet SR-IOV/VFs are enabled.

This is in preparation for Ethernet SR-IOV and L2 table management.
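L2 table entries are addressed by index, and the driver releases them through `free_l2_table_index()` (visible in the later FDB patch). This excerpt does not show how the index pool is kept. The sketch below assumes a fixed-size bitmap with first-fit allocation. The table size and all names are hypothetical, and the real size comes from device capabilities.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define L2_TABLE_SIZE 64 /* hypothetical; the driver sizes it from HW caps */

struct l2_index_pool {
	uint64_t used; /* bit i set => table index i is in use */
};

/* First-fit allocation: stores the index and returns 0, or -ENOSPC. */
static int alloc_l2_table_index_model(struct l2_index_pool *pool, uint32_t *ix)
{
	for (uint32_t i = 0; i < L2_TABLE_SIZE; i++) {
		if (!(pool->used & (UINT64_C(1) << i))) {
			pool->used |= UINT64_C(1) << i;
			*ix = i;
			return 0;
		}
	}
	return -ENOSPC;
}

static void free_l2_table_index_model(struct l2_index_pool *pool, uint32_t ix)
{
	assert(ix < L2_TABLE_SIZE);
	pool->used &= ~(UINT64_C(1) << ix);
}
```

First-fit keeps freed low indices reusable immediately. That is one simple policy, not necessarily the driver's.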

Signed-off-by: Saeed Mahameed 
Signed-off-by: Or Gerlitz 
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/eq.c  |  13 +
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 500 ++
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 123 ++
 drivers/net/ethernet/mellanox/mlx5/core/main.c|  18 +
 include/linux/mlx5/device.h   |   8 +
 include/linux/mlx5/driver.h   |   4 +
 7 files changed, 667 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 4d51039..a075591 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -3,6 +3,6 @@ obj-$(CONFIG_MLX5_CORE) += mlx5_core.o
 mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
health.o mcg.o cq.o srq.o alloc.o qp.o port.o mr.o pd.o   \
mad.o transobj.o vport.o sriov.o
-mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o flow_table.o \
+mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o flow_table.o eswitch.o \
en_main.o en_flow_table.o en_ethtool.o en_tx.o en_rx.o \
en_txrx.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 713ead5..23c244a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -35,6 +35,9 @@
 #include 
 #include 
 #include "mlx5_core.h"
+#ifdef CONFIG_MLX5_CORE_EN
+#include "eswitch.h"
+#endif
 
 enum {
MLX5_EQE_SIZE   = sizeof(struct mlx5_eqe),
@@ -287,6 +290,11 @@ static int mlx5_eq_int(struct mlx5_core_dev *dev, struct mlx5_eq *eq)
break;
 #endif
 
+#ifdef CONFIG_MLX5_CORE_EN
+   case MLX5_EVENT_TYPE_NIC_VPORT_CHANGE:
+   mlx5_eswitch_vport_event(dev->priv.eswitch, eqe);
+   break;
+#endif
default:
mlx5_core_warn(dev, "Unhandled event 0x%x on EQ 0x%x\n",
   eqe->type, eq->eqn);
@@ -459,6 +467,11 @@ int mlx5_start_eqs(struct mlx5_core_dev *dev)
if (MLX5_CAP_GEN(dev, pg))
async_event_mask |= (1ull << MLX5_EVENT_TYPE_PAGE_FAULT);
 
+   if (MLX5_CAP_GEN(dev, port_type) == MLX5_CAP_PORT_TYPE_ETH &&
+   MLX5_CAP_GEN(dev, vport_group_manager) &&
+   mlx5_core_is_pf(dev))
+   async_event_mask |= (1ull << MLX5_EVENT_TYPE_NIC_VPORT_CHANGE);
+
err = mlx5_create_map_eq(dev, &table->cmd_eq, MLX5_EQ_VEC_CMD,
 MLX5_NUM_CMD_EQE, 1ull << MLX5_EVENT_TYPE_CMD,
 "mlx5_cmd_eq", &dev->priv.uuari.uars[0]);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
new file mode 100644
index 000..1f2f804
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -0,0 +1,500 @@
+/*
+ * Copyright (c) 2015, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *  

[PATCH net-next 13/18] net/mlx5: E-Switch, Add SR-IOV (FDB) support

2015-11-23 Thread Or Gerlitz
From: Saeed Mahameed 

Enable E-Switch SR-IOV for nvfs+1 vports.

Create an E-Switch FDB for L2 UC/MC MAC steering between the VFs/PF and
the external vport (uplink).

FDB contains forwarding rules such as:
UC MAC0 -> vport0(PF).
UC MAC1 -> vport1.
UC MAC2 -> vport2.
MC MACX -> vport0, vport2, Uplink.
MC MACY -> vport1, Uplink.

For unmatched traffic FDB has the following default rules:
Unmatched traffic (src vport != Uplink) -> Uplink.
Unmatched traffic (src vport == Uplink) -> vport0(PF).

FDB rules population:
Each NIC vport (VF) notifies the E-Switch manager of its UC/MC vport
context changes via the modify vport context command. The change is
translated into an event handled by the E-Switch manager (PF), which
updates the FDB table accordingly.
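The forwarding behaviour listed above amounts to an exact-MAC lookup with two default rules for unmatched traffic. The user-space model below is a sketch only: a tiny linear table stands in for the hardware flow table. The uplink marker `UPLINK` is hypothetical (the patch defines `UPLINK_VPORT`, but its value is cut off in this copy). The model also ignores hairpin and source-port filtering.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define UPLINK    (-1) /* hypothetical uplink marker, not UPLINK_VPORT */
#define PF_VPORT  0
#define MAX_DESTS 4

struct fdb_rule {
	uint8_t mac[6];
	int ndests;
	int dests[MAX_DESTS]; /* one vport for UC, several for MC */
};

/* Fill out[] with the destination vports for a frame; returns the count. */
static int fdb_forward(const struct fdb_rule *rules, int nrules,
		       const uint8_t dmac[6], int src_vport,
		       int out[MAX_DESTS])
{
	for (int i = 0; i < nrules; i++) {
		if (memcmp(rules[i].mac, dmac, 6) == 0) {
			assert(rules[i].ndests >= 1 && rules[i].ndests <= MAX_DESTS);
			memcpy(out, rules[i].dests,
			       sizeof(out[0]) * (size_t)rules[i].ndests);
			return rules[i].ndests;
		}
	}
	/* Default rules: VF/PF traffic goes to the uplink, uplink traffic to the PF. */
	out[0] = (src_vport != UPLINK) ? UPLINK : PF_VPORT;
	return 1;
}
```

A UC rule carries one destination, while an MC rule such as "MC MACX -> vport0, vport2, Uplink" carries several. That multi-destination case is why the patch builds a destination list per flow rule.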

Signed-off-by: Saeed Mahameed 
Signed-off-by: Or Gerlitz 
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  | 682 ++---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  25 +
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|   1 +
 drivers/net/ethernet/mellanox/mlx5/core/sriov.c|  14 +-
 include/linux/mlx5/device.h|   6 +
 include/linux/mlx5/flow_table.h|   9 +
 include/linux/mlx5/mlx5_ifc.h  |   7 +-
 7 files changed, 661 insertions(+), 83 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 1f2f804..a208be7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -34,9 +34,12 @@
 #include 
 #include 
 #include 
+#include 
 #include "mlx5_core.h"
 #include "eswitch.h"
 
+#define UPLINK_VPORT 0x
+
 #define MLX5_DEBUG_ESWITCH_MASK BIT(3)
 
 #define esw_info(dev, format, ...) \
@@ -54,18 +57,26 @@ enum {
MLX5_ACTION_DEL  = 2,
 };
 
-/* HW UC L2 table hash node */
-struct mlx5_uc_l2addr {
+/* E-Switch UC L2 table hash node */
+struct esw_uc_addr {
struct l2addr_node node;
-   u8 action;
u32table_index;
u32vport;
 };
 
-/* Vport UC L2 table hash node */
-struct mlx5_vport_addr {
-   struct l2addr_node node;
-   u8 action;
+/* E-Switch MC FDB table hash node */
+struct esw_mc_addr { /* SRIOV only */
+   struct l2addr_node node;
+   struct mlx5_flow_rule *uplink_rule; /* Forward to uplink rule */
+   u32refcnt;
+};
+
+/* Vport UC/MC hash node */
+struct vport_addr {
+   struct l2addr_node node;
+   u8 action;
+   u32vport;
+   struct mlx5_flow_rule *flow_rule; /* SRIOV only */
 };
 
 enum {
@@ -73,7 +84,11 @@ enum {
MC_ADDR_CHANGE = BIT(1),
 };
 
-static int arm_vport_context_events_cmd(struct mlx5_core_dev *dev, int vport,
+/* Vport context events */
+#define SRIOV_VPORT_EVENTS (UC_ADDR_CHANGE | \
+   MC_ADDR_CHANGE)
+
+static int arm_vport_context_events_cmd(struct mlx5_core_dev *dev, u16 vport,
u32 events_mask)
 {
int in[MLX5_ST_SZ_DW(modify_nic_vport_context_in)];
@@ -196,97 +211,492 @@ static void del_l2_table_entry(struct mlx5_core_dev *dev, u32 index)
free_l2_table_index(l2_table, index);
 }
 
-/* SW E-Switch L2 Table management */
-static int l2_table_addr_add(struct mlx5_eswitch *esw,
-u8 mac[ETH_ALEN], u32 vport)
+/* E-Switch FDB flow steering */
+struct dest_node {
+   struct list_head list;
+   struct mlx5_flow_destination dest;
+};
+
+static int _mlx5_flow_rule_apply(struct mlx5_flow_rule *fr)
 {
-   struct hlist_head *hash;
-   struct mlx5_uc_l2addr *addr;
+   bool was_valid = fr->valid;
+   struct dest_node *dest_n;
+   u32 dest_list_size = 0;
+   void *in_match_value;
+   u32 *flow_context;
+   u32 flow_index;
int err;
+   int i;
+
+   if (list_empty(&fr->dest_list)) {
+   if (fr->valid)
+   mlx5_del_flow_table_entry(fr->ft, fr->fi);
+   fr->valid = false;
+   return 0;
+   }
+
+   list_for_each_entry(dest_n, &fr->dest_list, list)
+   dest_list_size++;
 
-   hash = esw->l2_table.l2_hash;
-   addr = l2addr_hash_find(hash, mac, struct mlx5_uc_l2addr);
-   if (addr) {
+   flow_context = mlx5_vzalloc(MLX5_ST_SZ_BYTES(flow_context) +
+   MLX5_ST_SZ_BYTES(dest_format_struct) *
+   dest_list_size);
+   if (!flow_context)
+   return -ENOMEM;
+
+   MLX5_SET(flow_context, flow_context, flow_tag, fr->flow_tag);
+   MLX5_SET(flow_context, flow_context, action, fr->action);
+   MLX5_SET(flow_context, flow_context, destination_list_size,
+ 

[PATCH net-next 18/18] net/mlx5e: Add support for SR-IOV ndos

2015-11-23 Thread Or Gerlitz
From: Saeed Mahameed 

Implement and enable SR-IOV ndos to manage SR-IOV configuration via
netdev netlink API.

Signed-off-by: Saeed Mahameed 
Signed-off-by: Or Gerlitz 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 84 ++-
 1 file changed, 83 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 007e464..49c0d75 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -32,6 +32,7 @@
 
 #include 
 #include "en.h"
+#include "eswitch.h"
 
 struct mlx5e_rq_param {
u32rqc[MLX5_ST_SZ_DW(rqc)];
@@ -1931,6 +1932,79 @@ static int mlx5e_change_mtu(struct net_device *netdev, int new_mtu)
return err;
 }
 
+static int mlx5e_set_vf_mac(struct net_device *dev, int vf, u8 *mac)
+{
+   struct mlx5e_priv *priv = netdev_priv(dev);
+   struct mlx5_core_dev *mdev = priv->mdev;
+
+   return mlx5_eswitch_set_vport_mac(mdev->priv.eswitch, vf + 1, mac);
+}
+
+static int mlx5e_set_vf_vlan(struct net_device *dev, int vf, u16 vlan, u8 qos)
+{
+   struct mlx5e_priv *priv = netdev_priv(dev);
+   struct mlx5_core_dev *mdev = priv->mdev;
+
+   return mlx5_eswitch_set_vport_vlan(mdev->priv.eswitch, vf + 1,
+  vlan, qos);
+}
+
+static int mlx5_vport_link2ifla(u8 esw_link)
+{
+   switch (esw_link) {
+   case MLX5_ESW_VPORT_ADMIN_STATE_DOWN:
+   return IFLA_VF_LINK_STATE_DISABLE;
+   case MLX5_ESW_VPORT_ADMIN_STATE_UP:
+   return IFLA_VF_LINK_STATE_ENABLE;
+   };
+   return IFLA_VF_LINK_STATE_AUTO;
+}
+
+static int mlx5_ifla_link2vport(u8 ifla_link)
+{
+   switch (ifla_link) {
+   case IFLA_VF_LINK_STATE_DISABLE:
+   return MLX5_ESW_VPORT_ADMIN_STATE_DOWN;
+   case IFLA_VF_LINK_STATE_ENABLE:
+   return MLX5_ESW_VPORT_ADMIN_STATE_UP;
+   };
+   return MLX5_ESW_VPORT_ADMIN_STATE_AUTO;
+}
+
+static int mlx5e_set_vf_link_state(struct net_device *dev, int vf,
+  int link_state)
+{
+   struct mlx5e_priv *priv = netdev_priv(dev);
+   struct mlx5_core_dev *mdev = priv->mdev;
+
+   return mlx5_eswitch_set_vport_state(mdev->priv.eswitch, vf + 1,
+   mlx5_ifla_link2vport(link_state));
+}
+
+static int mlx5e_get_vf_config(struct net_device *dev,
+  int vf, struct ifla_vf_info *ivi)
+{
+   struct mlx5e_priv *priv = netdev_priv(dev);
+   struct mlx5_core_dev *mdev = priv->mdev;
+   int err;
+
+   err = mlx5_eswitch_get_vport_config(mdev->priv.eswitch, vf + 1, ivi);
+   if (err)
+   return err;
+   ivi->linkstate = mlx5_vport_link2ifla(ivi->linkstate);
+   return 0;
+}
+
+static int mlx5e_get_vf_stats(struct net_device *dev,
+ int vf, struct ifla_vf_stats *vf_stats)
+{
+   struct mlx5e_priv *priv = netdev_priv(dev);
+   struct mlx5_core_dev *mdev = priv->mdev;
+
+   return mlx5_eswitch_get_vport_stats(mdev->priv.eswitch, vf + 1,
+   vf_stats);
+}
+
 static struct net_device_ops mlx5e_netdev_ops = {
.ndo_open= mlx5e_open,
.ndo_stop= mlx5e_close,
@@ -1941,7 +2015,7 @@ static struct net_device_ops mlx5e_netdev_ops = {
.ndo_vlan_rx_add_vid = mlx5e_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid= mlx5e_vlan_rx_kill_vid,
.ndo_set_features= mlx5e_set_features,
-   .ndo_change_mtu  = mlx5e_change_mtu,
+   .ndo_change_mtu  = mlx5e_change_mtu
 };
 
 static int mlx5e_check_required_hca_cap(struct mlx5_core_dev *mdev)
@@ -2041,6 +2115,14 @@ static void mlx5e_build_netdev(struct net_device *netdev)
if (priv->params.num_tc > 1)
mlx5e_netdev_ops.ndo_select_queue = mlx5e_select_queue;
 
+   if (MLX5_CAP_GEN(mdev, vport_group_manager)) {
+   mlx5e_netdev_ops.ndo_set_vf_mac = mlx5e_set_vf_mac;
+   mlx5e_netdev_ops.ndo_set_vf_vlan = mlx5e_set_vf_vlan;
+   mlx5e_netdev_ops.ndo_get_vf_config = mlx5e_get_vf_config;
+   mlx5e_netdev_ops.ndo_set_vf_link_state = mlx5e_set_vf_link_state;
+   mlx5e_netdev_ops.ndo_get_vf_stats = mlx5e_get_vf_stats;
+   }
+
netdev->netdev_ops= &mlx5e_netdev_ops;
netdev->watchdog_timeo= 15 * HZ;
 
-- 
2.3.7



[PATCH net-next 02/18] net/mlx5_core: Add base sriov support

2015-11-23 Thread Or Gerlitz
From: Eli Cohen 

This patch adds base SR-IOV support for mlx5 devices. The same driver
is used for both PFs and VFs; the driver identifies VFs through the
flag MLX5_PCI_DEV_IS_VF added to the PCI table entries. Virtual
functions are created as usual by writing a value to the sriov_numvfs
sysfs file of the PF device. Once instantiated, all VFs will be probed
by the driver on the hypervisor. They can be gracefully unbound through
/sys/bus/pci/drivers/mlx5_core/unbind.
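Creating VFs from user space means writing a decimal count to the PF's `sriov_numvfs` file, for example `/sys/bus/pci/devices/<PF address>/sriov_numvfs`. The helper below is a small sketch. It takes the path as a parameter so that it can also be exercised on an ordinary file. As far as I know, the PCI core refuses to change a non-zero VF count directly to another non-zero count, so the count is normally set to 0 first.

```c
#include <assert.h>
#include <stdio.h>

/* Write a VF count to a sriov_numvfs-style file; returns 0 on success. */
static int write_num_vfs(const char *path, int num_vfs)
{
	FILE *f;
	int ok;

	assert(num_vfs >= 0);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", num_vfs) > 0;
	/* The buffered write reaches the kernel at fclose(); a sysfs store
	 * error (e.g. VFs already enabled) is reported there. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Checking the `fclose()` result matters for sysfs. The `fprintf()` only fills the stdio buffer, and the kernel's answer arrives when the buffer is flushed.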

mlx5_wait_for_vf_pages() was added to ensure that when a VF dies without
executing a proper teardown, the hypervisor driver waits until all of the
pages that were allocated at the hypervisor to maintain the VF's
operation are returned.
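`mlx5_wait_for_vf_pages()` is only described in prose here. The idea can be sketched as a bounded poll of an outstanding-page counter. In the sketch, a hypothetical callback replaces the driver's page accounting, and an iteration budget replaces a real timeout. `struct fake_pages` exists only to drive the example.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical: returns how many pages the VFs still hold. */
typedef int (*pages_left_fn)(void *ctx);

/* Poll until no pages are outstanding or the poll budget runs out. */
static int wait_for_vf_pages_model(pages_left_fn pages_left, void *ctx,
				   int max_polls)
{
	assert(max_polls > 0);
	for (int i = 0; i < max_polls; i++) {
		if (pages_left(ctx) == 0)
			return 0;
		/* A driver would sleep between polls (e.g. msleep()). */
	}
	return -ETIMEDOUT;
}

/* Example source: reports the current count, then drops it by `step`. */
struct fake_pages {
	int left;
	int step;
};

static int fake_pages_left(void *ctx)
{
	struct fake_pages *fp = ctx;
	int now = fp->left;

	fp->left = (fp->left > fp->step) ? fp->left - fp->step : 0;
	return now;
}
```

Bounding the wait lets the PF driver give up and report an error instead of hanging forever if a VF never returns its pages.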

In order for the VF to be operational, the PF needs to call enable_hca
for it. This can be done before the VFs are created through a call to
pci_enable_sriov.

If there are VFs assigned to VMs when the PF driver is unloaded, all the
VFs will experience a system error while the PF driver unloads cleanly;
in this case pci_disable_sriov is not called and the devices still show
up in lspci. Once the PF driver is reloaded, it will sync its data
structures which maintain state on its VFs.

Signed-off-by: Eli Cohen 
Signed-off-by: Or Gerlitz 
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  36 +++-
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|   2 +
 .../net/ethernet/mellanox/mlx5/core/pagealloc.c|  38 
 drivers/net/ethernet/mellanox/mlx5/core/sriov.c| 221 +
 include/linux/mlx5/driver.h|  24 +++
 include/linux/mlx5/mlx5_ifc.h  |   4 +-
 7 files changed, 318 insertions(+), 9 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sriov.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 26a68b8..4d51039 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -2,7 +2,7 @@ obj-$(CONFIG_MLX5_CORE) += mlx5_core.o
 
 mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
health.o mcg.o cq.o srq.o alloc.o qp.o port.o mr.o pd.o   \
-   mad.o transobj.o vport.o
+   mad.o transobj.o vport.o sriov.o
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o flow_table.o \
en_main.o en_flow_table.o en_ethtool.o en_tx.o en_rx.o \
en_txrx.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index f2e64dc..66e2b37 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -454,6 +454,9 @@ static int set_hca_ctrl(struct mlx5_core_dev *dev)
struct mlx5_reg_host_endianess he_out;
int err;
 
+   if (!mlx5_core_is_pf(dev))
+   return 0;
+
memset(&he_in, 0, sizeof(he_in));
he_in.he = MLX5_SET_HOST_ENDIANNESS;
err = mlx5_core_access_reg(dev, &he_in,  sizeof(he_in),
@@ -1049,6 +1052,12 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
mlx5_init_srq_table(dev);
mlx5_init_mr_table(dev);
 
+   err = mlx5_sriov_init(dev);
+   if (err) {
+   dev_err(&pdev->dev, "sriov init failed %d\n", err);
+   goto err_sriov;
+   }
+
err = mlx5_register_device(dev);
if (err) {
dev_err(&pdev->dev, "mlx5_register_device failed %d\n", err);
@@ -1065,6 +1074,10 @@ out:
 
return 0;
 
+err_sriov:
+   if (mlx5_sriov_cleanup(dev))
+   dev_err(&dev->pdev->dev, "sriov cleanup failed\n");
+
 err_reg_dev:
mlx5_cleanup_mr_table(dev);
mlx5_cleanup_srq_table(dev);
@@ -1120,6 +1133,13 @@ static int mlx5_unload_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
 {
int err = 0;
 
+   err = mlx5_sriov_cleanup(dev);
+   if (err) {
+   dev_warn(&dev->pdev->dev, "%s: sriov cleanup failed - abort\n",
+__func__);
+   return err;
+   }
+
mutex_lock(&dev->intf_state_mutex);
if (dev->interface_state == MLX5_INTERFACE_STATE_DOWN) {
dev_warn(&dev->pdev->dev, "%s: interface is down, NOP\n",
@@ -1192,6 +1212,7 @@ static int init_one(struct pci_dev *pdev,
return -ENOMEM;
}
priv = &dev->priv;
+   priv->pci_dev_data = id->driver_data;
 
pci_set_drvdata(pdev, dev);
 
@@ -1362,12 +1383,12 @@ static const struct pci_error_handlers mlx5_err_handler = {
 };
 
 static const struct pci_device_id mlx5_core_pci_table[] = {
-   { PCI_VDEVICE(MELLANOX, 0x1011) }, /* Connect-IB */
-   { PCI_VDEVICE(MELLANOX, 0x1012) }, /* Connect-IB VF */
-   { PCI_VDEVICE(MELLANOX, 0x1013) }, /* ConnectX-4 */
-   { 

[PATCH net-next 15/18] net/mlx5: E-Switch, Introduce HCA cap and E-Switch vport context

2015-11-23 Thread Or Gerlitz
From: Saeed Mahameed 

Unlike the NIC vport context, the E-Switch vport context is managed by
the E-Switch manager (vport_group_manager) and not by the NIC (VF)
driver.

The E-Switch manager can access (read/modify) the E-Switch context of
any of its vports.

Currently the E-Switch vport context includes only customer (C-VLAN)
and service (S-VLAN) VLAN insertion and stripping data (for later
support of VST mode).
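The `mlx5_ifc` structures describe big-endian layouts in which each `u8 name[N]` member is an N-bit field, packed MSB-first into 32-bit words. `MLX5_CAP_ESW()` reads such a field from the cached capability buffer. In the new `e_switch_cap` layout, the five capability bits sit at bit offsets 0 to 4 of the first dword, followed by 0x1b reserved bits and 0x7e0 more. That makes 0x800 bits, or 256 bytes, in total. The sketch below shows the extraction arithmetic in a self-contained form. The helper and enum names are hypothetical and are not the kernel's macros.

```c
#include <assert.h>
#include <stdint.h>

/* Bit offsets of the e_switch_cap fields from the start of the layout
 * (each u8 name[0x1] member is one bit). */
enum {
	ESW_CAP_VPORT_SVLAN_STRIP = 0,
	ESW_CAP_VPORT_CVLAN_STRIP = 1,
	ESW_CAP_VPORT_SVLAN_INSERT = 2,
	ESW_CAP_VPORT_CVLAN_INSERT_IF_NOT_EXIST = 3,
	ESW_CAP_VPORT_CVLAN_INSERT_OVERWRITE = 4,
};

/* Read a big-endian 32-bit word without alignment assumptions. */
static uint32_t load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Extract bit_sz bits at bit_off, counting MSB-first within each dword;
 * the field must not cross a 32-bit word boundary. */
static uint32_t ifc_get(const uint8_t *buf, unsigned bit_off, unsigned bit_sz)
{
	unsigned shift;
	uint32_t mask;

	assert(bit_sz >= 1 && bit_sz <= 32);
	assert((bit_off & 31) + bit_sz <= 32);
	shift = 32 - bit_sz - (bit_off & 31);
	mask = (bit_sz == 32) ? 0xFFFFFFFFu : ((1u << bit_sz) - 1);
	return (load_be32(buf + 4 * (bit_off / 32)) >> shift) & mask;
}
```

Setting the first byte of a zeroed 256-byte buffer to `0xA0` marks `vport_svlan_strip` and `vport_svlan_insert` as supported. The other three bits read as 0.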

Signed-off-by: Saeed Mahameed 
Signed-off-by: Or Gerlitz 
---
 drivers/net/ethernet/mellanox/mlx5/core/fw.c | 11 
 include/linux/mlx5/device.h  |  9 +++
 include/linux/mlx5/mlx5_ifc.h| 90 
 3 files changed, 110 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
index bf6e3df..1c9f9a5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
@@ -173,6 +173,17 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev)
return err;
}
 
+   if (MLX5_CAP_GEN(dev, vport_group_manager)) {
+   err = mlx5_core_get_caps(dev, MLX5_CAP_ESWITCH,
+HCA_CAP_OPMOD_GET_CUR);
+   if (err)
+   return err;
+   err = mlx5_core_get_caps(dev, MLX5_CAP_ESWITCH,
+HCA_CAP_OPMOD_GET_MAX);
+   if (err)
+   return err;
+   }
+
return 0;
 }
 
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 88eb449..7d3a85f 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -1145,6 +1145,7 @@ enum mlx5_cap_type {
MLX5_CAP_EOIB_OFFLOADS,
MLX5_CAP_FLOW_TABLE,
MLX5_CAP_ESWITCH_FLOW_TABLE,
+   MLX5_CAP_ESWITCH,
/* NUM OF CAP Types */
MLX5_CAP_NUM
 };
@@ -1196,6 +1197,14 @@ enum mlx5_cap_type {
 #define MLX5_CAP_ESW_FLOWTABLE_FDB_MAX(mdev, cap) \
MLX5_CAP_ESW_FLOWTABLE_MAX(mdev, flow_table_properties_nic_esw_fdb.cap)
 
+#define MLX5_CAP_ESW(mdev, cap) \
+   MLX5_GET(e_switch_cap, \
+mdev->hca_caps_cur[MLX5_CAP_ESWITCH], cap)
+
+#define MLX5_CAP_ESW_MAX(mdev, cap) \
+   MLX5_GET(e_switch_cap, \
+mdev->hca_caps_max[MLX5_CAP_ESWITCH], cap)
+
 #define MLX5_CAP_ODP(mdev, cap)\
MLX5_GET(odp_cap, mdev->hca_caps_cur[MLX5_CAP_ODP], cap)
 
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index a81b008..f5d9449 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -459,6 +459,17 @@ struct mlx5_ifc_flow_table_eswitch_cap_bits {
u8  reserved_1[0x7800];
 };
 
+struct mlx5_ifc_e_switch_cap_bits {
+   u8 vport_svlan_strip[0x1];
+   u8 vport_cvlan_strip[0x1];
+   u8 vport_svlan_insert[0x1];
+   u8 vport_cvlan_insert_if_not_exist[0x1];
+   u8 vport_cvlan_insert_overwrite[0x1];
+   u8 reserved_0[0x1b];
+
+   u8 reserved_1[0x7e0];
+};
+
 struct mlx5_ifc_per_protocol_networking_offload_caps_bits {
u8 csum_cap[0x1];
u8 vlan_cap[0x1];
@@ -1860,6 +1871,7 @@ union mlx5_ifc_hca_cap_union_bits {
struct mlx5_ifc_per_protocol_networking_offload_caps_bits per_protocol_networking_offload_caps;
struct mlx5_ifc_flow_table_nic_cap_bits flow_table_nic_cap;
struct mlx5_ifc_flow_table_eswitch_cap_bits flow_table_eswitch_cap;
+   struct mlx5_ifc_e_switch_cap_bits e_switch_cap;
u8 reserved_0[0x8000];
 };
 
@@ -2305,6 +2317,26 @@ struct mlx5_ifc_hca_vport_context_bits {
u8 reserved_6[0xca0];
 };
 
+struct mlx5_ifc_esw_vport_context_bits {
+   u8 reserved_0[0x3];
+   u8 vport_svlan_strip[0x1];
+   u8 vport_cvlan_strip[0x1];
+   u8 vport_svlan_insert[0x1];
+   u8 vport_cvlan_insert[0x2];
+   u8 reserved_1[0x18];
+
+   u8 reserved_2[0x20];
+
+   u8 svlan_cfi[0x1];
+   u8 svlan_pcp[0x3];
+   u8 svlan_id[0xc];
+   u8 cvlan_cfi[0x1];
+   u8 cvlan_pcp[0x3];
+   u8 cvlan_id[0xc];
+
+   u8 reserved_3[0x7a0];
+};
+
 enum {
MLX5_EQC_STATUS_OK= 0x0,
MLX5_EQC_STATUS_EQ_WRITE_FAILURE  = 0xa,
@@ -3743,6 +3775,64 @@ struct mlx5_ifc_query_flow_group_in_bits {
u8 reserved_5[0x120];
 };
 
+struct mlx5_ifc_query_esw_vport_context_out_bits {
+   u8 status[0x8];
+   u8 reserved_0[0x18];
+
+   u8 syndrome[0x20];
+
+   u8 reserved_1[0x40];
+
+   struct mlx5_ifc_esw_vport_context_bits esw_vport_context;
+};
+
+struct mlx5_ifc_query_esw_vport_context_in_bits {
+   u8 opcode[0x10];
+   u8 reserved_0[0x10];
+
+   u8   

[PATCH net-next 03/18] net/mlx5: Add HW capabilities and structs for SR-IOV E-Switch.

2015-11-23 Thread Or Gerlitz
From: Saeed Mahameed 

Update the HCA capabilities and HW structs to include the capabilities
needed by the upcoming Ethernet Switch (SR-IOV E-Switch).

Signed-off-by: Saeed Mahameed 
Signed-off-by: Or Gerlitz 
---
 include/linux/mlx5/mlx5_ifc.h | 26 +++---
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 9b76fdd..836cf0e 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -665,7 +665,7 @@ struct mlx5_ifc_cmd_hca_cap_bits {
u8 reserved_17[0x1];
u8 ets[0x1];
u8 nic_flow_table[0x1];
-   u8 reserved_18_0;
+   u8 eswitch_flow_table[0x1];
u8 early_vf_enable;
u8 reserved_18[0x2];
u8 local_ca_ack_delay[0x5];
@@ -789,22 +789,30 @@ struct mlx5_ifc_cmd_hca_cap_bits {
u8 reserved_60[0x1b];
u8 log_max_wq_sz[0x5];
 
-   u8 reserved_61[0xa0];
-
+   u8 nic_vport_change_event[0x1];
+   u8 reserved_61[0xa];
+   u8 log_max_vlan_list[0x5];
u8 reserved_62[0x3];
+   u8 log_max_current_mc_list[0x5];
+   u8 reserved_63[0x3];
+   u8 log_max_current_uc_list[0x5];
+
+   u8 reserved_64[0x80];
+
+   u8 reserved_65[0x3];
u8 log_max_l2_table[0x5];
-   u8 reserved_63[0x8];
+   u8 reserved_66[0x8];
u8 log_uar_page_sz[0x10];
 
-   u8 reserved_64[0x100];
+   u8 reserved_67[0xe0];
 
-   u8 reserved_65[0x1f];
+   u8 reserved_68[0x1f];
u8 cqe_zip[0x1];
 
u8 cqe_zip_timeout[0x10];
u8 cqe_zip_max_num[0x10];
 
-   u8 reserved_66[0x220];
+   u8 reserved_69[0x220];
 };
 
 enum {
@@ -2135,10 +2143,6 @@ struct mlx5_ifc_rmpc_bits {
struct mlx5_ifc_wq_bits wq;
 };
 
-enum {
-   MLX5_NIC_VPORT_CONTEXT_ALLOWED_LIST_TYPE_CURRENT_UC_MAC_ADDRESS  = 0x0,
-};
-
 struct mlx5_ifc_nic_vport_context_bits {
u8 reserved_0[0x1f];
u8 roce_en[0x1];
-- 
2.3.7

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next 01/18] net/mlx5_core: Modify enable/disable hca functions

2015-11-23 Thread Or Gerlitz
From: Eli Cohen 

Modify these functions to take a func_id argument stating which function
we are referring to. This is done in preparation for SRIOV support, where
a PF driver needs to control its virtual functions.
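A minimal sketch of how a PF might use the new func_id argument to enable a range of VFs and roll back on failure. The device structure and the enable/disable functions below are hypothetical stand-ins for mlx5_core_enable_hca()/mlx5_core_disable_hca(). Mapping VF n to function ID n + 1 is an assumption; this patch itself only passes 0, for the PF's own function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_FUNCS 16

/* Hypothetical device model: which function IDs are enabled, plus an
 * optional function ID whose ENABLE_HCA should fail (-1: never fail). */
struct fake_dev {
	bool enabled[MAX_FUNCS];
	int fail_func_id;
};

/* Stand-in for mlx5_core_enable_hca(dev, func_id). */
static int fake_enable_hca(struct fake_dev *dev, uint16_t func_id)
{
	assert(func_id < MAX_FUNCS);
	if ((int)func_id == dev->fail_func_id)
		return -EIO;
	dev->enabled[func_id] = true;
	return 0;
}

/* Stand-in for mlx5_core_disable_hca(dev, func_id). */
static int fake_disable_hca(struct fake_dev *dev, uint16_t func_id)
{
	assert(func_id < MAX_FUNCS);
	dev->enabled[func_id] = false;
	return 0;
}

/* Enable VF0..VF(num_vfs-1) as function IDs 1..num_vfs (assumed mapping).
 * On failure, disable the VFs already enabled, in reverse order. */
static int enable_vfs(struct fake_dev *dev, int num_vfs)
{
	int err;
	int vf;

	for (vf = 0; vf < num_vfs; vf++) {
		err = fake_enable_hca(dev, (uint16_t)(vf + 1));
		if (err)
			goto rollback;
	}
	return 0;

rollback:
	while (--vf >= 0)
		fake_disable_hca(dev, (uint16_t)(vf + 1));
	return err;
}
```

Unwinding in reverse order leaves the device in the same state it had before the call whenever any single enable fails.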

Signed-off-by: Eli Cohen 
Signed-off-by: Or Gerlitz 
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 45 ++
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|  2 +
 2 files changed, 23 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 4ac8d4cc..f2e64dc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -462,42 +462,39 @@ static int set_hca_ctrl(struct mlx5_core_dev *dev)
return err;
 }
 
-static int mlx5_core_enable_hca(struct mlx5_core_dev *dev)
+int mlx5_core_enable_hca(struct mlx5_core_dev *dev, u16 func_id)
 {
+   u32 out[MLX5_ST_SZ_DW(enable_hca_out)];
+   u32 in[MLX5_ST_SZ_DW(enable_hca_in)];
int err;
-   struct mlx5_enable_hca_mbox_in in;
-   struct mlx5_enable_hca_mbox_out out;
 
-   memset(&in, 0, sizeof(in));
-   memset(&out, 0, sizeof(out));
-   in.hdr.opcode = cpu_to_be16(MLX5_CMD_OP_ENABLE_HCA);
+   memset(in, 0, sizeof(in));
+   MLX5_SET(enable_hca_in, in, opcode, MLX5_CMD_OP_ENABLE_HCA);
+   MLX5_SET(enable_hca_in, in, function_id, func_id);
+   memset(out, 0, sizeof(out));
+
err = mlx5_cmd_exec(dev, &in, sizeof(in), &out, sizeof(out));
if (err)
return err;
 
-   if (out.hdr.status)
-   return mlx5_cmd_status_to_err(&out.hdr);
-
-   return 0;
+   return mlx5_cmd_status_to_err_v2(out);
 }
 
-static int mlx5_core_disable_hca(struct mlx5_core_dev *dev)
+int mlx5_core_disable_hca(struct mlx5_core_dev *dev, u16 func_id)
 {
+   u32 out[MLX5_ST_SZ_DW(disable_hca_out)];
+   u32 in[MLX5_ST_SZ_DW(disable_hca_in)];
int err;
-   struct mlx5_disable_hca_mbox_in in;
-   struct mlx5_disable_hca_mbox_out out;
 
-   memset(&in, 0, sizeof(in));
-   memset(&out, 0, sizeof(out));
-   in.hdr.opcode = cpu_to_be16(MLX5_CMD_OP_DISABLE_HCA);
-   err = mlx5_cmd_exec(dev, &in, sizeof(in), &out, sizeof(out));
+   memset(in, 0, sizeof(in));
+   MLX5_SET(disable_hca_in, in, opcode, MLX5_CMD_OP_DISABLE_HCA);
+   MLX5_SET(disable_hca_in, in, function_id, func_id);
+   memset(out, 0, sizeof(out));
+   err = mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
if (err)
return err;
 
-   if (out.hdr.status)
-   return mlx5_cmd_status_to_err(&out.hdr);
-
-   return 0;
+   return mlx5_cmd_status_to_err_v2(out);
 }
 
 static int mlx5_irq_set_affinity_hint(struct mlx5_core_dev *mdev, int i)
@@ -942,7 +939,7 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
 
mlx5_pagealloc_init(dev);
 
-   err = mlx5_core_enable_hca(dev);
+   err = mlx5_core_enable_hca(dev, 0);
if (err) {
dev_err(&pdev->dev, "enable hca failed\n");
goto err_pagealloc_cleanup;
@@ -1106,7 +1103,7 @@ reclaim_boot_pages:
mlx5_reclaim_startup_pages(dev);
 
 err_disable_hca:
-   mlx5_core_disable_hca(dev);
+   mlx5_core_disable_hca(dev, 0);
 
 err_pagealloc_cleanup:
mlx5_pagealloc_cleanup(dev);
@@ -1149,7 +1146,7 @@ static int mlx5_unload_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
}
mlx5_pagealloc_stop(dev);
mlx5_reclaim_startup_pages(dev);
-   mlx5_core_disable_hca(dev);
+   mlx5_core_disable_hca(dev, 0);
mlx5_pagealloc_cleanup(dev);
mlx5_cmd_cleanup(dev);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index cee5b7a..1ed2239 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -90,6 +90,8 @@ void mlx5_core_event(struct mlx5_core_dev *dev, enum mlx5_dev_event event,
 unsigned long param);
 void mlx5_enter_error_state(struct mlx5_core_dev *dev);
 void mlx5_disable_device(struct mlx5_core_dev *dev);
+int mlx5_core_enable_hca(struct mlx5_core_dev *dev, u16 func_id);
+int mlx5_core_disable_hca(struct mlx5_core_dev *dev, u16 func_id);
 
 void mlx5e_init(void);
 void mlx5e_cleanup(void);
-- 
2.3.7

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next 00/18] Introducing ConnectX-4 Ethernet SRIOV

2015-11-23 Thread Or Gerlitz
Hi Dave,

This patchset introduces support for Ethernet SRIOV in the ConnectX-4
family of 100G Ethernet NICs.

Some features are still missing, but all the basic SRIOV functionality
is already there.

Basic Introduction:
ConnectX-4 HW architecture provides two kinds of underlying HW switches.

MPFS (Multi Physical Function Switch) or L2 Table in Software terms:

The HCA has one MPFS switch per physical port. This switch is responsible
for forwarding unicast traffic to the various overlying Physical Functions
(PFs). Multicast traffic is flooded to all the PFs. Each PF can request,
through the SET_L2_TABLE_ENTRY HW command, that a unicast MAC be forwarded
to its E-Switch uplink vport (covered later).

MPFS has five ports: four are connected to PFs (one per PF) and one is
connected directly to the physical port (physical link).
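The MPFS forwarding decision described above can be modelled in a few lines. This is a hypothetical software model, not the hardware: struct l2_entry and mpfs_forward() are illustrative names, and returning no destination for unknown unicast is an assumption the text does not cover.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN	6
#define MPFS_NUM_PFS	4	/* four PF-facing ports, per the text above */

/* One L2 table entry: a unicast MAC owned by a PF (hypothetical layout). */
struct l2_entry {
	bool valid;
	uint8_t mac[ETH_ALEN];
	int pf;
};

/* Group bit of the first octet: set for multicast and broadcast. */
static bool is_multicast_mac(const uint8_t *mac)
{
	return (mac[0] & 0x01) != 0;
}

/*
 * Forwarding decision for a frame received on the physical port.
 * Returns a bitmask of destination PFs (bit n = PF n).
 */
static unsigned int mpfs_forward(const struct l2_entry *tbl, int n,
				 const uint8_t *dmac)
{
	int i;

	if (is_multicast_mac(dmac))
		return (1u << MPFS_NUM_PFS) - 1;	/* flood to all PFs */

	for (i = 0; i < n; i++)
		if (tbl[i].valid && memcmp(tbl[i].mac, dmac, ETH_ALEN) == 0)
			return 1u << tbl[i].pf;

	return 0;	/* unknown unicast: assumption of this sketch */
}
```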

E-Switch (Ethernet Switch): 

The HCA has one E-Switch per physical function. The main responsibility
of this component is to forward unicast/multicast and vlan tagged/untagged
traffic to the various Virtual Functions (VFs) allocated by the PF. Unlike
MPFS, the E-Switch FDB table must be created explicitly by the PF: it is a
HW flow table managed by the PF driver whenever the vport_group_manager
capability bit is set for this PF.

The E-Switch ports are Virtual Ports (vports). vport0 and the uplink vport
are special: vport0 represents the PF, and the uplink vport is the PF's
external link, connected to the MPFS switch (if it exists).
vport1..vportN represent the VF0..VF(N-1) egress/ingress ports.

E-Switch FDB contains forwarding rules such as:
UC MAC0 -> vport0(PF).
UC MAC1 -> vport1.
UC MAC2 -> vport2.
MC MACX -> vport0, vport2, Uplink.
MC MACY -> vport1, Uplink.

For unmatched traffic FDB has the following default rules:
Unmatched Traffic (src vport != Uplink) -> Uplink.
Unmatched Traffic (src vport == Uplink) -> vport0(PF).
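The FDB behaviour listed above, including the two default rules, can be sketched as a lookup that returns a set of destination vports. The bitmask encoding (bit n for vport n, a separate bit for the uplink) and the UPLINK_VPORT source ID are assumptions made for this illustration; fdb_lookup() is hypothetical, not the driver or HW implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN	6
#define VPORT_BIT(n)	(1u << (n))	/* vport0 = PF, vport n = VF(n-1) */
#define UPLINK_BIT	(1u << 31)	/* hypothetical bit for the uplink */
#define UPLINK_VPORT	0xffffu		/* hypothetical source ID of uplink */

struct fdb_rule {
	uint8_t mac[ETH_ALEN];
	uint32_t dests;		/* bitmask of destination vports */
};

/* Look up dmac for a frame received from src_vport. */
static uint32_t fdb_lookup(const struct fdb_rule *rules, int n,
			   const uint8_t *dmac, unsigned int src_vport)
{
	int i;

	for (i = 0; i < n; i++)
		if (memcmp(rules[i].mac, dmac, ETH_ALEN) == 0)
			return rules[i].dests;

	/* Default rules for unmatched traffic */
	if (src_vport != UPLINK_VPORT)
		return UPLINK_BIT;	/* from PF/VF: send to the uplink */
	return VPORT_BIT(0);		/* from the uplink: send to the PF */
}
```

A multicast rule simply lists several destinations, while the default rules depend only on where the frame came from.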

NIC VPort context:
Each NIC (VF/PF) has its own vport context, which stores the current NIC
vport state (UC/MC and vlan lists) and other NIC properties such as MTU,
promisc mode, etc. The NIC (VF/PF) driver is responsible for constantly
updating this context.

FDB rules population:
Each NIC vport (VF/PF) notifies the E-Switch manager of its UC/MC vport
context changes via the modify vport context command. The command is
translated into an event handled by the E-Switch manager (PF), which
updates the FDB table accordingly.
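One way an E-Switch manager could handle such an event is to compare the MAC list it last programmed for the vport with the list it just queried, then add or delete FDB rules for the difference. The sketch below assumes that approach; diff_mac_lists() and its index outputs are hypothetical and only compute what would need to change, without issuing any commands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

static bool mac_in_list(const uint8_t (*list)[ETH_ALEN], int n,
			const uint8_t *mac)
{
	int i;

	for (i = 0; i < n; i++)
		if (memcmp(list[i], mac, ETH_ALEN) == 0)
			return true;
	return false;
}

/*
 * prev: MACs the manager last programmed for this vport.
 * cur:  MACs just queried from the vport context.
 * add_idx receives indices into cur that need a new FDB rule;
 * del_idx receives indices into prev whose FDB rule should be removed.
 */
static void diff_mac_lists(const uint8_t (*prev)[ETH_ALEN], int n_prev,
			   const uint8_t (*cur)[ETH_ALEN], int n_cur,
			   int *add_idx, int *n_add,
			   int *del_idx, int *n_del)
{
	int i;

	*n_add = 0;
	*n_del = 0;
	for (i = 0; i < n_cur; i++)
		if (!mac_in_list(prev, n_prev, cur[i]))
			add_idx[(*n_add)++] = i;
	for (i = 0; i < n_prev; i++)
		if (!mac_in_list(cur, n_cur, prev[i]))
			del_idx[(*n_del)++] = i;
}
```

Addresses present in both lists keep their existing rules, so an unchanged list produces no FDB updates.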

Both PF and VF use the same driver and submit commands directly to the firmware.
The PF sees the vport_group_manager capability bit and as such runs the code
to populate the embedded switches as explained above.

The patches go as follows:

Patches 1-2 introduce the basic PCI SRIOV functionality and the support
for ConnectX-4 to enable specific VFs via the enable/disable HCA commands.
These two patches will also be used later for the IB SRIOV flow.

Patches 3-8 introduce the basic E-Switch capabilities and commands to be
used later by the VF to modify and update its NIC vport context, and by the
PF (E-Switch manager) driver to query the VF NIC context and act accordingly.

Patches 9-10 provide the functionality a VF/PF NIC driver needs to support
SRIOV, mainly vport context update support.

Patch 11 ("net/mlx5: Introducing E-Switch and l2 table") introduces the basic
E-Switch support and infrastructure to read vport context events and to update
the MPFS L2 table with the UC MAC addresses requested by the PF.

Patches 12-18 introduce SRIOV enablement and E-Switch FDB table management.
They add the basic E-Switch public API to set and get SRIOV properties, to be
used in the PF netdev SRIOV ndos.

The patchset was applied on top of commit 3f8c0f7 "gianfar: use of_property_read_bool()".

Saeed, Eli and Or.

Eli Cohen (2):
  net/mlx5_core: Modify enable/disable hca functions
  net/mlx5_core: Add base sriov support

Saeed Mahameed (16):
  net/mlx5: Add HW capabilities and structs for SR-IOV E-Switch.
  net/mlx5: Update access functions to Query/Modify vport MAC address
  net/mlx5: Introduce access functions to modify/query vport mac lists
  net/mlx5: Introduce access functions to modify/query vport state
  net/mlx5: Introduce access functions to modify/query vport promisc mode
  net/mlx5: Introduce access functions to modify/query vport vlans
  net/mlx5e: Write UC/MC list and promisc mode into vport context
  net/mlx5e: Write vlan list into vport context
  net/mlx5: Introducing E-Switch and l2 table
  net/mlx5: E-Switch, Introduce FDB hardware capabilities
  net/mlx5: E-Switch, Add SR-IOV (FDB) support
  net/mlx5: E-Switch, Introduce Vport administration functions
  net/mlx5: E-Switch, Introduce HCA cap and E-Switch vport context
  net/mlx5: E-Switch, Introduce set vport vlan (VST mode)
  net/mlx5: E-Switch, Introduce get vf statistics
  net/mlx5e: Add support for SR-IOV ndos

 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |4 +-
 

[PATCH net-next 07/18] net/mlx5: Introduce access functions to modify/query vport promisc mode

2015-11-23 Thread Or Gerlitz
From: Saeed Mahameed 

These functions are needed to notify the upcoming SR-IOV E-Switch (FDB)
manager (PF) of NIC vport (VF) promisc mode changes.

Preparation for Ethernet SRIOV and L2 table management.
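The modify command is expected to change only the fields selected in field_select, which is why mlx5_modify_nic_vport_promisc() in this patch sets field_select.promisc. The sketch below models that behaviour under this assumption. The FS_* masks were derived by hand from the updated mlx5_ifc_modify_nic_vport_field_select_bits layout (bit 0 being the MSB of the dword). struct nic_vport_ctx and apply_modify() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* field_select masks: bit offset k (0 = MSB) maps to 1u << (31 - k). */
#define FS_MTU			(1u << (31 - 25))
#define FS_CHANGE_EVENT		(1u << (31 - 26))
#define FS_PROMISC		(1u << (31 - 27))
#define FS_PERMANENT_ADDRESS	(1u << (31 - 28))
#define FS_ADDRESSES_LIST	(1u << (31 - 29))

/* Hypothetical subset of the NIC vport context. */
struct nic_vport_ctx {
	uint16_t mtu;
	uint8_t promisc_uc;
	uint8_t promisc_mc;
	uint8_t promisc_all;
};

/* Apply req to ctx, touching only the fields selected in field_select. */
static void apply_modify(struct nic_vport_ctx *ctx,
			 const struct nic_vport_ctx *req, uint32_t field_select)
{
	if (field_select & FS_MTU)
		ctx->mtu = req->mtu;
	if (field_select & FS_PROMISC) {
		ctx->promisc_uc = req->promisc_uc;
		ctx->promisc_mc = req->promisc_mc;
		ctx->promisc_all = req->promisc_all;
	}
}
```

Because only the promisc bit is selected, a zero-initialized mtu in the same inbox does not overwrite the vport's current MTU.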

Signed-off-by: Saeed Mahameed 
Signed-off-by: Or Gerlitz 
---
 drivers/net/ethernet/mellanox/mlx5/core/vport.c | 62 +
 include/linux/mlx5/mlx5_ifc.h   | 28 +--
 include/linux/mlx5/vport.h  |  9 
 3 files changed, 94 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
index b017a7e..68aa51d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
@@ -576,3 +576,65 @@ int mlx5_query_hca_vport_node_guid(struct mlx5_core_dev *dev,
return err;
 }
 EXPORT_SYMBOL_GPL(mlx5_query_hca_vport_node_guid);
+
+int mlx5_query_nic_vport_promisc(struct mlx5_core_dev *mdev,
+u32 vport,
+int *promisc_uc,
+int *promisc_mc,
+int *promisc_all)
+{
+   u32 *out;
+   int outlen = MLX5_ST_SZ_BYTES(query_nic_vport_context_out);
+   int err;
+
+   out = kzalloc(outlen, GFP_KERNEL);
+   if (!out)
+   return -ENOMEM;
+
+   err = mlx5_query_nic_vport_context(mdev, vport, out, outlen);
+   if (err)
+   goto out;
+
+   *promisc_uc = MLX5_GET(query_nic_vport_context_out, out,
+  nic_vport_context.promisc_uc);
+   *promisc_mc = MLX5_GET(query_nic_vport_context_out, out,
+  nic_vport_context.promisc_mc);
+   *promisc_all = MLX5_GET(query_nic_vport_context_out, out,
+   nic_vport_context.promisc_all);
+
+out:
+   kfree(out);
+   return err;
+}
+EXPORT_SYMBOL_GPL(mlx5_query_nic_vport_promisc);
+
+int mlx5_modify_nic_vport_promisc(struct mlx5_core_dev *mdev,
+ int promisc_uc,
+ int promisc_mc,
+ int promisc_all)
+{
+   void *in;
+   int inlen = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in);
+   int err;
+
+   in = mlx5_vzalloc(inlen);
+   if (!in) {
+   mlx5_core_err(mdev, "failed to allocate inbox\n");
+   return -ENOMEM;
+   }
+
+   MLX5_SET(modify_nic_vport_context_in, in, field_select.promisc, 1);
+   MLX5_SET(modify_nic_vport_context_in, in,
+nic_vport_context.promisc_uc, promisc_uc);
+   MLX5_SET(modify_nic_vport_context_in, in,
+nic_vport_context.promisc_mc, promisc_mc);
+   MLX5_SET(modify_nic_vport_context_in, in,
+nic_vport_context.promisc_all, promisc_all);
+
+   err = mlx5_modify_nic_vport_context(mdev, in, inlen);
+
+   kvfree(in);
+
+   return err;
+}
+EXPORT_SYMBOL_GPL(mlx5_modify_nic_vport_promisc);
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 6551847..2728b5f6 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -2147,16 +2147,31 @@ struct mlx5_ifc_nic_vport_context_bits {
u8 reserved_0[0x1f];
u8 roce_en[0x1];
 
-   u8 reserved_1[0x760];
+   u8 arm_change_event[0x1];
+   u8 reserved_1[0x1a];
+   u8 event_on_mtu[0x1];
+   u8 event_on_promisc_change[0x1];
+   u8 event_on_vlan_change[0x1];
+   u8 event_on_mc_address_change[0x1];
+   u8 event_on_uc_address_change[0x1];
 
-   u8 reserved_2[0x5];
+   u8 reserved_2[0xf0];
+
+   u8 mtu[0x10];
+
+   u8 reserved_3[0x640];
+
+   u8 promisc_uc[0x1];
+   u8 promisc_mc[0x1];
+   u8 promisc_all[0x1];
+   u8 reserved_4[0x2];
u8 allowed_list_type[0x3];
-   u8 reserved_3[0xc];
+   u8 reserved_5[0xc];
u8 allowed_list_size[0xc];
 
struct mlx5_ifc_mac_address_layout_bits permanent_address;
 
-   u8 reserved_4[0x20];
+   u8 reserved_6[0x20];
 
u8 current_uc_mac_address[0][0x40];
 };
@@ -4235,7 +4250,10 @@ struct mlx5_ifc_modify_nic_vport_context_out_bits {
 };
 
 struct mlx5_ifc_modify_nic_vport_field_select_bits {
-   u8 reserved_0[0x1c];
+   u8 reserved_0[0x19];
+   u8 mtu[0x1];
+   u8 change_event[0x1];
+   u8 promisc[0x1];
u8 permanent_address[0x1];
u8 addresses_list[0x1];
u8 roce_en[0x1];
diff --git a/include/linux/mlx5/vport.h b/include/linux/mlx5/vport.h
index c1bba59..dbbaed9 100644
--- 

[PATCH net-next 05/18] net/mlx5: Introduce access functions to modify/query vport mac lists

2015-11-23 Thread Or Gerlitz
From: Saeed Mahameed 

These functions are needed to notify the upcoming L2 table and SR-IOV
E-Switch (FDB) manager (PF) of NIC vport (VF) UC/MC MAC list changes.

Preparation for Ethernet SRIOV and L2 table management.
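Two details of the query path in this patch are worth spelling out. First, the requested list size is clamped to 1 << log_max_current_{uc,mc}_list (the modify path returns -ENOSPC instead). Second, each returned entry is a mac_address_layout of 0x40 bits (8 bytes) whose last 6 bytes hold the MAC, hence the "+ 2" when copying. A minimal userspace sketch of both steps, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN		6
#define MAC_LAYOUT_BYTES	8	/* mac_address_layout: 0x40 bits */

/* Clamp a requested list size to the device limit 1 << log_max. */
static int clamp_list_size(int requested, unsigned int log_max)
{
	int max = 1 << log_max;

	return requested > max ? max : requested;
}

/* Copy n MACs out of a packed array of mac_address_layout entries,
 * skipping the 2 leading bytes of each entry. */
static void unpack_mac_list(const uint8_t *entries, int n,
			    uint8_t (*out)[ETH_ALEN])
{
	int i;

	for (i = 0; i < n; i++)
		memcpy(out[i], entries + i * MAC_LAYOUT_BYTES + 2, ETH_ALEN);
}
```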

Signed-off-by: Saeed Mahameed 
Signed-off-by: Or Gerlitz 
---
 drivers/net/ethernet/mellanox/mlx5/core/vport.c | 119 
 include/linux/mlx5/device.h |   6 ++
 include/linux/mlx5/vport.h  |  10 ++
 3 files changed, 135 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
index 442916e..986d0d3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
@@ -150,6 +150,125 @@ int mlx5_modify_nic_vport_mac_address(struct mlx5_core_dev *mdev,
 }
 EXPORT_SYMBOL(mlx5_modify_nic_vport_mac_address);
 
+int mlx5_query_nic_vport_mac_list(struct mlx5_core_dev *dev,
+ u32 vport,
+ enum mlx5_list_type list_type,
+ u8 addr_list[][ETH_ALEN],
+ int *list_size)
+{
+   u32 in[MLX5_ST_SZ_DW(query_nic_vport_context_in)];
+   void *nic_vport_ctx;
+   int max_list_size;
+   int req_list_size;
+   int out_sz;
+   void *out;
+   int err;
+   int i;
+
+   req_list_size = *list_size;
+
+   max_list_size = list_type == MLX5_NVPRT_LIST_TYPE_UC ?
+   1 << MLX5_CAP_GEN(dev, log_max_current_uc_list) :
+   1 << MLX5_CAP_GEN(dev, log_max_current_mc_list);
+
+   if (req_list_size > max_list_size) {
+   mlx5_core_warn(dev, "Requested list size (%d) > (%d) max_list_size\n",
+  req_list_size, max_list_size);
+   req_list_size = max_list_size;
+   }
+
+   out_sz = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in) +
+   req_list_size * MLX5_ST_SZ_BYTES(mac_address_layout);
+
+   memset(in, 0, sizeof(in));
+   out = kzalloc(out_sz, GFP_KERNEL);
+   if (!out)
+   return -ENOMEM;
+
+   MLX5_SET(query_nic_vport_context_in, in, opcode,
+MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT);
+   MLX5_SET(query_nic_vport_context_in, in, allowed_list_type, list_type);
+   MLX5_SET(query_nic_vport_context_in, in, vport_number, vport);
+
+   if (vport)
+   MLX5_SET(query_nic_vport_context_in, in, other_vport, 1);
+
+   err = mlx5_cmd_exec_check_status(dev, in, sizeof(in), out, out_sz);
+   if (err)
+   goto out;
+
+   nic_vport_ctx = MLX5_ADDR_OF(query_nic_vport_context_out, out,
+nic_vport_context);
+   req_list_size = MLX5_GET(nic_vport_context, nic_vport_ctx,
+allowed_list_size);
+
+   *list_size = req_list_size;
+   for (i = 0; i < req_list_size; i++) {
+   u8 *mac_addr = MLX5_ADDR_OF(nic_vport_context,
+   nic_vport_ctx,
+   current_uc_mac_address[i]) + 2;
+   ether_addr_copy(addr_list[i], mac_addr);
+   }
+out:
+   kfree(out);
+   return err;
+}
+EXPORT_SYMBOL_GPL(mlx5_query_nic_vport_mac_list);
+
+int mlx5_modify_nic_vport_mac_list(struct mlx5_core_dev *dev,
+  enum mlx5_list_type list_type,
+  u8 addr_list[][ETH_ALEN],
+  int list_size)
+{
+   u32 out[MLX5_ST_SZ_DW(modify_nic_vport_context_out)];
+   void *nic_vport_ctx;
+   int max_list_size;
+   int in_sz;
+   void *in;
+   int err;
+   int i;
+
+   max_list_size = list_type == MLX5_NVPRT_LIST_TYPE_UC ?
+1 << MLX5_CAP_GEN(dev, log_max_current_uc_list) :
+1 << MLX5_CAP_GEN(dev, log_max_current_mc_list);
+
+   if (list_size > max_list_size)
+   return -ENOSPC;
+
+   in_sz = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in) +
+   list_size * MLX5_ST_SZ_BYTES(mac_address_layout);
+
+   memset(out, 0, sizeof(out));
+   in = kzalloc(in_sz, GFP_KERNEL);
+   if (!in)
+   return -ENOMEM;
+
+   MLX5_SET(modify_nic_vport_context_in, in, opcode,
+MLX5_CMD_OP_MODIFY_NIC_VPORT_CONTEXT);
+   MLX5_SET(modify_nic_vport_context_in, in,
+field_select.addresses_list, 1);
+
+   nic_vport_ctx = MLX5_ADDR_OF(modify_nic_vport_context_in, in,
+nic_vport_context);
+
+   MLX5_SET(nic_vport_context, nic_vport_ctx,
+allowed_list_type, list_type);
+   MLX5_SET(nic_vport_context, nic_vport_ctx,
+allowed_list_size, list_size);
+
+   for (i = 0; i < list_size; i++) {
+   u8 *curr_mac = 

[PATCH net-next 04/18] net/mlx5: Update access functions to Query/Modify vport MAC address

2015-11-23 Thread Or Gerlitz
From: Saeed Mahameed 

In preparation for SR-IOV, we add here an API that enables each e-switch
client (PF/VF) to configure its L2 MAC addresses, and the e-switch
manager (usually the PF) to access them in order to configure them into
the e-switch. Therefore we now pass a vport number parameter to
mlx5_query_nic_vport_context, so the PF can access other vports' contexts.

Preparation for Ethernet SRIOV and L2 table management.

Signed-off-by: Saeed Mahameed 
Signed-off-by: Or Gerlitz 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/vport.c   | 87 ---
 include/linux/mlx5/vport.h|  5 +-
 3 files changed, 81 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index f6a8cc7..2ef717f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2028,7 +2028,7 @@ static void mlx5e_set_netdev_dev_addr(struct net_device *netdev)
 {
struct mlx5e_priv *priv = netdev_priv(netdev);
 
-   mlx5_query_nic_vport_mac_address(priv->mdev, netdev->dev_addr);
+   mlx5_query_nic_vport_mac_address(priv->mdev, 0, netdev->dev_addr);
 }
 
 static void mlx5e_build_netdev(struct net_device *netdev)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
index b94177e..442916e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
@@ -57,33 +57,98 @@ u8 mlx5_query_vport_state(struct mlx5_core_dev *mdev, u8 opmod)
 }
 EXPORT_SYMBOL(mlx5_query_vport_state);
 
-void mlx5_query_nic_vport_mac_address(struct mlx5_core_dev *mdev, u8 *addr)
+static int mlx5_query_nic_vport_context(struct mlx5_core_dev *mdev, u16 vport,
+   u32 *out, int outlen)
+{
+   u32 in[MLX5_ST_SZ_DW(query_nic_vport_context_in)];
+
+   memset(in, 0, sizeof(in));
+
+   MLX5_SET(query_nic_vport_context_in, in, opcode,
+MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT);
+
+   MLX5_SET(query_nic_vport_context_in, in, vport_number, vport);
+   if (vport)
+   MLX5_SET(query_nic_vport_context_in, in, other_vport, 1);
+
+   return mlx5_cmd_exec_check_status(mdev, in, sizeof(in), out, outlen);
+}
+
+static int mlx5_modify_nic_vport_context(struct mlx5_core_dev *mdev, void *in,
+int inlen)
+{
+   u32 out[MLX5_ST_SZ_DW(modify_nic_vport_context_out)];
+
+   MLX5_SET(modify_nic_vport_context_in, in, opcode,
+MLX5_CMD_OP_MODIFY_NIC_VPORT_CONTEXT);
+
+   memset(out, 0, sizeof(out));
+   return mlx5_cmd_exec_check_status(mdev, in, inlen, out, sizeof(out));
+}
+
+int mlx5_query_nic_vport_mac_address(struct mlx5_core_dev *mdev,
+u16 vport, u8 *addr)
 {
-   u32  in[MLX5_ST_SZ_DW(query_nic_vport_context_in)];
u32 *out;
int outlen = MLX5_ST_SZ_BYTES(query_nic_vport_context_out);
u8 *out_addr;
+   int err;
 
out = mlx5_vzalloc(outlen);
if (!out)
-   return;
+   return -ENOMEM;
 
out_addr = MLX5_ADDR_OF(query_nic_vport_context_out, out,
nic_vport_context.permanent_address);
 
-   memset(in, 0, sizeof(in));
-
-   MLX5_SET(query_nic_vport_context_in, in, opcode,
-MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT);
-
-   memset(out, 0, outlen);
-   mlx5_cmd_exec_check_status(mdev, in, sizeof(in), out, outlen);
+   err = mlx5_query_nic_vport_context(mdev, vport, out, outlen);
+   if (err)
+   goto out;
 
ether_addr_copy(addr, &out_addr[2]);
 
+out:
kvfree(out);
+   return err;
+}
+EXPORT_SYMBOL_GPL(mlx5_query_nic_vport_mac_address);
+
+int mlx5_modify_nic_vport_mac_address(struct mlx5_core_dev *mdev,
+ u16 vport, u8 *addr)
+{
+   void *in;
+   int inlen = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in);
+   int err;
+   void *nic_vport_ctx;
+   u8 *perm_mac;
+
+   in = mlx5_vzalloc(inlen);
+   if (!in) {
+   mlx5_core_warn(mdev, "failed to allocate inbox\n");
+   return -ENOMEM;
+   }
+
+   MLX5_SET(modify_nic_vport_context_in, in,
+field_select.permanent_address, 1);
+   MLX5_SET(modify_nic_vport_context_in, in, vport_number, vport);
+
+   if (vport)
+   MLX5_SET(modify_nic_vport_context_in, in, other_vport, 1);
+
+   nic_vport_ctx = MLX5_ADDR_OF(modify_nic_vport_context_in,
+in, nic_vport_context);
+   perm_mac = MLX5_ADDR_OF(nic_vport_context, nic_vport_ctx,
+   permanent_address);
+
+   

[PATCH net-next 16/18] net/mlx5: E-Switch, Introduce set vport vlan (VST mode)

2015-11-23 Thread Or Gerlitz
From: Saeed Mahameed 

Add query and modify functions to control client vlan and qos
stripping or insertion in E-Switch vport contexts.
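For reference, here is a software model of what the cvlan insert ("insert only if no vlan in packet") and strip operations do to an Ethernet frame. The 802.1Q tag layout (TPID 0x8100, TCI = PCP(3) | CFI(1) | VID(12)) is standard. Applying insertion to traffic sent by the VF and stripping to traffic delivered to it is an assumption about VST mode. The functions are hypothetical, not the HW or driver implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VLAN_TPID 0x8100

/* 802.1Q TCI: PCP (3 bits) | CFI (1 bit, left 0 here) | VID (12 bits) */
static uint16_t vlan_tci(uint8_t pcp, uint16_t vid)
{
	return (uint16_t)(((pcp & 0x7) << 13) | (vid & 0xfff));
}

/* The ethertype/TPID follows the two 6-byte MAC addresses. */
static int frame_is_tagged(const uint8_t *frame, size_t len)
{
	return len >= 14 && ((frame[12] << 8) | frame[13]) == VLAN_TPID;
}

/* Insert a 4-byte tag after the MACs only if the frame is untagged.
 * buf must have room for len + 4 bytes. Returns the new length. */
static size_t cvlan_insert_if_not_exist(uint8_t *buf, size_t len,
					uint16_t vid, uint8_t pcp)
{
	uint16_t tci = vlan_tci(pcp, vid);

	assert(len >= 14);
	if (frame_is_tagged(buf, len))
		return len;
	memmove(buf + 16, buf + 12, len - 12);
	buf[12] = VLAN_TPID >> 8;
	buf[13] = VLAN_TPID & 0xff;
	buf[14] = (uint8_t)(tci >> 8);
	buf[15] = (uint8_t)(tci & 0xff);
	return len + 4;
}

/* Remove the outer 802.1Q tag if present. Returns the new length. */
static size_t cvlan_strip(uint8_t *buf, size_t len)
{
	if (len < 18 || !frame_is_tagged(buf, len))
		return len;
	memmove(buf + 12, buf + 16, len - 16);
	return len - 4;
}
```

In this model, inserting into an already tagged frame is a no-op, which is the "if no vlan" condition noted in modify_esw_vport_cvlan().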

Signed-off-by: Saeed Mahameed 
Signed-off-by: Or Gerlitz 
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 134 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h |   2 +
 2 files changed, 134 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 590a06c..ea14664 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -128,6 +128,116 @@ ex:
return err;
 }
 
+/* E-Switch vport context HW commands */
+static int query_esw_vport_context_cmd(struct mlx5_core_dev *mdev, u32 vport,
+  u32 *out, int outlen)
+{
+   u32 in[MLX5_ST_SZ_DW(query_esw_vport_context_in)];
+
+   memset(in, 0, sizeof(in));
+
+   MLX5_SET(query_nic_vport_context_in, in, opcode,
+MLX5_CMD_OP_QUERY_ESW_VPORT_CONTEXT);
+
+   MLX5_SET(query_esw_vport_context_in, in, vport_number, vport);
+   if (vport)
+   MLX5_SET(query_esw_vport_context_in, in, other_vport, 1);
+
+   return mlx5_cmd_exec_check_status(mdev, in, sizeof(in), out, outlen);
+}
+
+static int query_esw_vport_cvlan(struct mlx5_core_dev *dev, u32 vport,
+u16 *vlan, u8 *qos)
+{
+   u32 out[MLX5_ST_SZ_DW(query_esw_vport_context_out)];
+   int err;
+   bool cvlan_strip;
+   bool cvlan_insert;
+
+   memset(out, 0, sizeof(out));
+
+   *vlan = 0;
+   *qos = 0;
+
+   if (!MLX5_CAP_ESW(dev, vport_cvlan_strip) ||
+   !MLX5_CAP_ESW(dev, vport_cvlan_insert_if_not_exist))
+   return -ENOTSUPP;
+
+   err = query_esw_vport_context_cmd(dev, vport, out, sizeof(out));
+   if (err)
+   goto out;
+
+   cvlan_strip = MLX5_GET(query_esw_vport_context_out, out,
+  esw_vport_context.vport_cvlan_strip);
+
+   cvlan_insert = MLX5_GET(query_esw_vport_context_out, out,
+   esw_vport_context.vport_cvlan_insert);
+
+   if (cvlan_strip || cvlan_insert) {
+   *vlan = MLX5_GET(query_esw_vport_context_out, out,
+esw_vport_context.cvlan_id);
+   *qos = MLX5_GET(query_esw_vport_context_out, out,
+   esw_vport_context.cvlan_pcp);
+   }
+
+   esw_debug(dev, "Query Vport[%d] cvlan: VLAN %d qos=%d\n",
+ vport, *vlan, *qos);
+out:
+   return err;
+}
+
+static int modify_esw_vport_context_cmd(struct mlx5_core_dev *dev, u16 vport,
+   void *in, int inlen)
+{
+   u32 out[MLX5_ST_SZ_DW(modify_esw_vport_context_out)];
+
+   memset(out, 0, sizeof(out));
+
+   MLX5_SET(modify_esw_vport_context_in, in, vport_number, vport);
+   if (vport)
+   MLX5_SET(modify_esw_vport_context_in, in, other_vport, 1);
+
+   MLX5_SET(modify_esw_vport_context_in, in, opcode,
+MLX5_CMD_OP_MODIFY_ESW_VPORT_CONTEXT);
+
+   return mlx5_cmd_exec_check_status(dev, in, inlen,
+ out, sizeof(out));
+}
+
+static int modify_esw_vport_cvlan(struct mlx5_core_dev *dev, u32 vport,
+ u16 vlan, u8 qos, bool set)
+{
+   u32 in[MLX5_ST_SZ_DW(modify_esw_vport_context_in)];
+
+   memset(in, 0, sizeof(in));
+
+   if (!MLX5_CAP_ESW(dev, vport_cvlan_strip) ||
+   !MLX5_CAP_ESW(dev, vport_cvlan_insert_if_not_exist))
+   return -ENOTSUPP;
+
+   esw_debug(dev, "Set Vport[%d] VLAN %d qos %d set=%d\n",
+ vport, vlan, qos, set);
+
+   if (set) {
+   MLX5_SET(modify_esw_vport_context_in, in,
+esw_vport_context.vport_cvlan_strip, 1);
+   /* insert only if no vlan in packet */
+   MLX5_SET(modify_esw_vport_context_in, in,
+esw_vport_context.vport_cvlan_insert, 1);
+   MLX5_SET(modify_esw_vport_context_in, in,
+esw_vport_context.cvlan_pcp, qos);
+   MLX5_SET(modify_esw_vport_context_in, in,
+esw_vport_context.cvlan_id, vlan);
+   }
+
+   MLX5_SET(modify_esw_vport_context_in, in,
+field_select.vport_cvlan_strip, 1);
+   MLX5_SET(modify_esw_vport_context_in, in,
+field_select.vport_cvlan_insert, 1);
+
+   return modify_esw_vport_context_cmd(dev, vport, in, sizeof(in));
+}
+
 /* HW L2 Table (MPFS) management */
 static int set_l2_table_entry_cmd(struct mlx5_core_dev *dev, u32 index,
  u8 *mac, u8 vlan_valid, u16 vlan)
@@ -1065,6 +1175,9 @@ int mlx5_eswitch_set_vport_state(struct 

[PATCH net-next 09/18] net/mlx5e: Write UC/MC list and promisc mode into vport context

2015-11-23 Thread Or Gerlitz
From: Saeed Mahameed 

Each vport/vNIC must notify the underlying e-Switch layer of UC/MC list
and promisc mode updates, in order to update the L2 tables and SR-IOV
FDB tables.

We do that in the set_rx_mode ndo.

Preparation for Ethernet SRIOV and L2 table management.
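Here is a userspace model of the ordering used by mlx5e_build_addr_array() in this patch. The device's own address goes first in the UC list, and the broadcast address goes first in the MC list when enabled. Entries beyond the device limit are dropped from the end. build_addr_array() is a hypothetical simplification that works on a flat array instead of the driver's hash list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Returns the number of entries written to out (at most max_size). */
static int build_addr_array(const uint8_t (*addrs)[ETH_ALEN], int n,
			    const uint8_t *own, bool is_uc, bool bcast,
			    int max_size, uint8_t (*out)[ETH_ALEN])
{
	static const uint8_t bcast_addr[ETH_ALEN] = {
		0xff, 0xff, 0xff, 0xff, 0xff, 0xff
	};
	int i, k = 0;

	if (is_uc) {
		/* Push our own address first, if it is in the list */
		for (i = 0; i < n; i++) {
			if (memcmp(addrs[i], own, ETH_ALEN) == 0) {
				if (k < max_size)
					memcpy(out[k++], own, ETH_ALEN);
				break;
			}
		}
	} else if (bcast && k < max_size) {
		memcpy(out[k++], bcast_addr, ETH_ALEN);
	}

	/* Then the rest, skipping our own address, until the limit */
	for (i = 0; i < n && k < max_size; i++) {
		if (memcmp(addrs[i], own, ETH_ALEN) == 0)
			continue;
		memcpy(out[k++], addrs[i], ETH_ALEN);
	}
	return k;
}
```

Pushing the device's own address first means it is never among the addresses dropped when the list exceeds log_max_current_uc_list.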

Signed-off-by: Saeed Mahameed 
Signed-off-by: Or Gerlitz 
---
 .../ethernet/mellanox/mlx5/core/en_flow_table.c| 99 ++
 1 file changed, 99 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c b/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
index 22d603f..9a021be 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
@@ -671,6 +671,103 @@ static void mlx5e_sync_netdev_addr(struct mlx5e_priv *priv)
netif_addr_unlock_bh(netdev);
 }
 
+/* Returns a pointer to an array of type u8[][ETH_ALEN] */
+static u8 (*mlx5e_build_addr_array(struct mlx5e_priv *priv, int list_type,
+  int *size))[ETH_ALEN]
+{
+   bool is_uc = (list_type == MLX5_NVPRT_LIST_TYPE_UC);
+   struct net_device *ndev = priv->netdev;
+   struct mlx5e_eth_addr_hash_node *hn;
+   struct hlist_head *addr_list;
+   u8 (*addr_array)[ETH_ALEN];
+   struct hlist_node *tmp;
+   int max_list_size;
+   int list_size;
+   int hi;
+   int i;
+
+   list_size = is_uc ? 0 : (priv->eth_addr.broadcast_enabled ? 1 : 0);
+   max_list_size = is_uc ?
+   1 << MLX5_CAP_GEN(priv->mdev, log_max_current_uc_list) :
+   1 << MLX5_CAP_GEN(priv->mdev, log_max_current_mc_list);
+
+   addr_list = is_uc ? priv->eth_addr.netdev_uc : priv->eth_addr.netdev_mc;
+   mlx5e_for_each_hash_node(hn, tmp, addr_list, hi)
+   list_size++;
+
+   if (list_size > max_list_size) {
+   netdev_warn(ndev,
+   "netdev %s list size (%d) > (%d) max vport list size, some addresses will be dropped\n",
+   is_uc ? "UC" : "MC", list_size, max_list_size);
+   list_size = max_list_size;
+   }
+
+   addr_array = kcalloc(list_size, ETH_ALEN, GFP_KERNEL);
+   if (!addr_array)
+   return NULL;
+
+   i = 0;
+   if (is_uc) { /* Make sure our own address is pushed first */
+   mlx5e_for_each_hash_node(hn, tmp, addr_list, hi) {
+   if (ether_addr_equal(ndev->dev_addr, hn->ai.addr)) {
+   ether_addr_copy(addr_array[i++], ndev->dev_addr);
+   break;
+   }
+   }
+   }
+
+   if (!is_uc && priv->eth_addr.broadcast_enabled)
+   ether_addr_copy(addr_array[i++], ndev->broadcast);
+
+   mlx5e_for_each_hash_node(hn, tmp, addr_list, hi) {
+   if (ether_addr_equal(ndev->dev_addr, hn->ai.addr))
+   continue;
+   if (i >= list_size)
+   break;
+   ether_addr_copy(addr_array[i++], hn->ai.addr);
+   }
+
+   *size = list_size;
+   return addr_array;
+}
+
+static void mlx5e_vport_context_update_addr_list(struct mlx5e_priv *priv,
+int list_type)
+{
+   bool is_uc = (list_type == MLX5_NVPRT_LIST_TYPE_UC);
+   u8 (*mac_list)[ETH_ALEN];
+   int list_size;
+   int err;
+
+   mac_list = mlx5e_build_addr_array(priv, list_type, &list_size);
+   if (!mac_list) {
+   err = -ENOMEM;
+   goto out;
+   }
+
+   err = mlx5_modify_nic_vport_mac_list(priv->mdev,
+list_type,
+mac_list,
+list_size);
+out:
+   if (err)
+   netdev_err(priv->netdev,
+  "Failed to modify vport %s list err(%d)\n",
+  is_uc ? "UC" : "MC", err);
+   kfree(mac_list);
+}
+
+static void mlx5e_vport_context_update(struct mlx5e_priv *priv)
+{
+   struct mlx5e_eth_addr_db *ea = &priv->eth_addr;
+
+   mlx5e_vport_context_update_addr_list(priv, MLX5_NVPRT_LIST_TYPE_UC);
+   mlx5e_vport_context_update_addr_list(priv, MLX5_NVPRT_LIST_TYPE_MC);
+   mlx5_modify_nic_vport_promisc(priv->mdev, 0,
+ ea->allmulti_enabled,
+ ea->promisc_enabled);
+}
+
 static void mlx5e_apply_netdev_addr(struct mlx5e_priv *priv)
 {
struct mlx5e_eth_addr_hash_node *hn;
@@ -748,6 +845,8 @@ void mlx5e_set_rx_mode_work(struct work_struct *work)
ea->promisc_enabled   = promisc_enabled;
ea->allmulti_enabled  = allmulti_enabled;
ea->broadcast_enabled = broadcast_enabled;
+
+   mlx5e_vport_context_update(priv);
 }
 
 void mlx5e_init_eth_addr(struct mlx5e_priv *priv)
-- 
2.3.7
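The ordering that mlx5e_build_addr_array() applies can be modeled outside the kernel. The following is only a userspace approximation, and struct mac, build_addr_list() and its parameters are hypothetical stand-ins, not mlx5 API. It puts the device's own MAC first on a UC list (only if that address is actually on the list), puts the broadcast address first on an MC list when broadcast is enabled, then appends the remaining addresses and stops at the device maximum:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical wrapper so MAC arrays can be passed and copied by value. */
struct mac {
	unsigned char b[ETH_ALEN];
};

/* Returns the number of entries written to out (at most max_size). */
static int build_addr_list(const struct mac *addrs, int n,
			   const struct mac *own, bool is_uc,
			   bool bcast_enabled, int max_size, struct mac *out)
{
	static const struct mac bcast = { { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff } };
	int i = 0, k;

	assert(n == 0 || addrs != NULL);
	if (max_size <= 0)
		return 0;

	if (is_uc) {
		/* Own address first, but only if it is on the list. */
		for (k = 0; k < n; k++) {
			if (memcmp(&addrs[k], own, sizeof(*own)) == 0) {
				out[i++] = *own;
				break;
			}
		}
	} else if (bcast_enabled) {
		out[i++] = bcast;
	}

	/* Remaining addresses, skipping our own MAC, until the list is full. */
	for (k = 0; k < n && i < max_size; k++) {
		if (memcmp(&addrs[k], own, sizeof(*own)) == 0)
			continue;
		out[i++] = addrs[k];
	}
	return i;
}
```

Truncation happens after the "first" entry is placed, so the device's own address survives even when the list has to be cut.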


[PATCH net-next 08/18] net/mlx5: Introduce access functions to modify/query vport vlans

2015-11-23 Thread Or Gerlitz
From: Saeed Mahameed 

These functions are needed to notify the upcoming L2 table and SR-IOV
E-Switch (FDB) manager (PF) of NIC vport (VF) VLAN table changes.

Preparation for Ethernet SR-IOV and L2 table management.

Signed-off-by: Saeed Mahameed 
Signed-off-by: Or Gerlitz 
---
 drivers/net/ethernet/mellanox/mlx5/core/vport.c | 112 
 include/linux/mlx5/mlx5_ifc.h   |   7 ++
 include/linux/mlx5/vport.h  |   7 ++
 3 files changed, 126 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
index 68aa51d..076197e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
@@ -318,6 +318,118 @@ int mlx5_modify_nic_vport_mac_list(struct mlx5_core_dev *dev,
 }
 EXPORT_SYMBOL_GPL(mlx5_modify_nic_vport_mac_list);
 
+int mlx5_query_nic_vport_vlans(struct mlx5_core_dev *dev,
+  u32 vport,
+  u16 vlans[],
+  int *size)
+{
+   u32 in[MLX5_ST_SZ_DW(query_nic_vport_context_in)];
+   void *nic_vport_ctx;
+   int req_list_size;
+   int max_list_size;
+   int out_sz;
+   void *out;
+   int err;
+   int i;
+
+   req_list_size = *size;
+   max_list_size = 1 << MLX5_CAP_GEN(dev, log_max_vlan_list);
+   if (req_list_size > max_list_size) {
+   mlx5_core_warn(dev, "Requested list size (%d) > (%d) max list size\n",
+  req_list_size, max_list_size);
+   req_list_size = max_list_size;
+   }
+
+   out_sz = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in) +
+   req_list_size * MLX5_ST_SZ_BYTES(vlan_layout);
+
+   memset(in, 0, sizeof(in));
+   out = kzalloc(out_sz, GFP_KERNEL);
+   if (!out)
+   return -ENOMEM;
+
+   MLX5_SET(query_nic_vport_context_in, in, opcode,
+MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT);
+   MLX5_SET(query_nic_vport_context_in, in, allowed_list_type,
+MLX5_NVPRT_LIST_TYPE_VLAN);
+   MLX5_SET(query_nic_vport_context_in, in, vport_number, vport);
+
+   if (vport)
+   MLX5_SET(query_nic_vport_context_in, in, other_vport, 1);
+
+   err = mlx5_cmd_exec_check_status(dev, in, sizeof(in), out, out_sz);
+   if (err)
+   goto out;
+
+   nic_vport_ctx = MLX5_ADDR_OF(query_nic_vport_context_out, out,
+nic_vport_context);
+   req_list_size = MLX5_GET(nic_vport_context, nic_vport_ctx,
+allowed_list_size);
+
+   *size = req_list_size;
+   for (i = 0; i < req_list_size; i++) {
+   void *vlan_addr = MLX5_ADDR_OF(nic_vport_context,
+  nic_vport_ctx,
+  current_uc_mac_address[i]);
+   vlans[i] = MLX5_GET(vlan_layout, vlan_addr, vlan);
+   }
+out:
+   kfree(out);
+   return err;
+}
+EXPORT_SYMBOL_GPL(mlx5_query_nic_vport_vlans);
+
+int mlx5_modify_nic_vport_vlans(struct mlx5_core_dev *dev,
+   u16 vlans[],
+   int list_size)
+{
+   u32 out[MLX5_ST_SZ_DW(modify_nic_vport_context_out)];
+   void *nic_vport_ctx;
+   int max_list_size;
+   int in_sz;
+   void *in;
+   int err;
+   int i;
+
+   max_list_size = 1 << MLX5_CAP_GEN(dev, log_max_vlan_list);
+
+   if (list_size > max_list_size)
+   return -ENOSPC;
+
+   in_sz = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in) +
+   list_size * MLX5_ST_SZ_BYTES(vlan_layout);
+
+   memset(out, 0, sizeof(out));
+   in = kzalloc(in_sz, GFP_KERNEL);
+   if (!in)
+   return -ENOMEM;
+
+   MLX5_SET(modify_nic_vport_context_in, in, opcode,
+MLX5_CMD_OP_MODIFY_NIC_VPORT_CONTEXT);
+   MLX5_SET(modify_nic_vport_context_in, in,
+field_select.addresses_list, 1);
+
+   nic_vport_ctx = MLX5_ADDR_OF(modify_nic_vport_context_in, in,
+nic_vport_context);
+
+   MLX5_SET(nic_vport_context, nic_vport_ctx,
+allowed_list_type, MLX5_NVPRT_LIST_TYPE_VLAN);
+   MLX5_SET(nic_vport_context, nic_vport_ctx,
+allowed_list_size, list_size);
+
+   for (i = 0; i < list_size; i++) {
+   void *vlan_addr = MLX5_ADDR_OF(nic_vport_context,
+  nic_vport_ctx,
+  current_uc_mac_address[i]);
+   MLX5_SET(vlan_layout, vlan_addr, vlan, vlans[i]);
+   }
+
+   err = mlx5_cmd_exec_check_status(dev, in, in_sz, out, sizeof(out));
+   kfree(in);
+   return err;
+}
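Both helpers bound the list by the device capability 1 << log_max_vlan_list, but they react differently: the query clamps an oversized request and warns, while the modify path rejects it with -ENOSPC. Both then size the command buffer as a fixed header plus one fixed-size entry per VLAN. The following is a userspace sketch of that sizing logic. The function names are hypothetical, and the header and entry sizes are made-up parameters, not the real mlx5_ifc layouts:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Modify-style sizing: reject lists longer than the capability allows. */
static int cmd_size_for_list(unsigned int log_max, int list_size,
			     size_t hdr_bytes, size_t entry_bytes,
			     size_t *out_bytes)
{
	int max_list_size = 1 << log_max;

	assert(out_bytes != NULL);
	if (list_size < 0 || list_size > max_list_size)
		return -ENOSPC;
	*out_bytes = hdr_bytes + (size_t)list_size * entry_bytes;
	return 0;
}

/* Query-style sizing: silently shrink the request to the capability. */
static int clamp_request(unsigned int log_max, int req)
{
	int max_list_size = 1 << log_max;

	return req > max_list_size ? max_list_size : req;
}
```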

[PATCH net-next 10/18] net/mlx5e: Write vlan list into vport context

2015-11-23 Thread Or Gerlitz
From: Saeed Mahameed 

Each vport/vNIC must notify the underlying e-Switch layer of
VLAN table changes, in order to update the SR-IOV FDB tables.

We do that in the vlan_rx_add_vid and vlan_rx_kill_vid ndos.

Signed-off-by: Saeed Mahameed 
Signed-off-by: Or Gerlitz 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  1 +
 .../ethernet/mellanox/mlx5/core/en_flow_table.c| 49 ++
 2 files changed, 50 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 69f1c1a..89313d4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -465,6 +465,7 @@ enum {
 };
 
 struct mlx5e_vlan_db {
+   unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
u32   active_vlans_ft_ix[VLAN_N_VID];
u32   untagged_rule_ft_ix;
u32   any_vlan_rule_ft_ix;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c b/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
index 9a021be..3c0cf22 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
@@ -502,6 +502,46 @@ add_eth_addr_rule_out:
return err;
 }
 
+static int mlx5e_vport_context_update_vlans(struct mlx5e_priv *priv)
+{
+   struct net_device *ndev = priv->netdev;
+   int max_list_size;
+   int list_size;
+   u16 *vlans;
+   int vlan;
+   int err;
+   int i;
+
+   list_size = 0;
+   for_each_set_bit(vlan, priv->vlan.active_vlans, VLAN_N_VID)
+   list_size++;
+
+   max_list_size = 1 << MLX5_CAP_GEN(priv->mdev, log_max_vlan_list);
+
+   if (list_size > max_list_size) {
+   netdev_warn(ndev,
+   "netdev vlans list size (%d) > (%d) max vport list size, some vlans will be dropped\n",
+   list_size, max_list_size);
+   list_size = max_list_size;
+   }
+
+   vlans = kcalloc(list_size, sizeof(*vlans), GFP_KERNEL);
+   if (!vlans)
+   return -ENOMEM;
+
+   i = 0;
+   for_each_set_bit(vlan, priv->vlan.active_vlans, VLAN_N_VID) {
+   if (i >= list_size)
+   break;
+   vlans[i++] = vlan;
+   }
+
+   err = mlx5_modify_nic_vport_vlans(priv->mdev, vlans, list_size);
+   if (err)
+   netdev_err(ndev, "Failed to modify vport vlans list err(%d)\n",
+  err);
+
+   kfree(vlans);
+   return err;
+}
+
 enum mlx5e_vlan_rule_type {
MLX5E_VLAN_RULE_TYPE_UNTAGGED,
MLX5E_VLAN_RULE_TYPE_ANY_VID,
@@ -552,6 +592,10 @@ static int mlx5e_add_vlan_rule(struct mlx5e_priv *priv,
 1);
break;
default: /* MLX5E_VLAN_RULE_TYPE_MATCH_VID */
+   err = mlx5e_vport_context_update_vlans(priv);
+   if (err)
+   goto add_vlan_rule_out;
+
ft_ix = &priv->vlan.active_vlans_ft_ix[vid];
MLX5_SET(fte_match_param, match_value, outer_headers.vlan_tag,
 1);
@@ -588,6 +632,7 @@ static void mlx5e_del_vlan_rule(struct mlx5e_priv *priv,
case MLX5E_VLAN_RULE_TYPE_MATCH_VID:
mlx5_del_flow_table_entry(priv->ft.vlan,
  priv->vlan.active_vlans_ft_ix[vid]);
+   mlx5e_vport_context_update_vlans(priv);
break;
}
 }
@@ -619,6 +664,8 @@ int mlx5e_vlan_rx_add_vid(struct net_device *dev, __always_unused __be16 proto,
 {
struct mlx5e_priv *priv = netdev_priv(dev);
 
+   set_bit(vid, priv->vlan.active_vlans);
+
return mlx5e_add_vlan_rule(priv, MLX5E_VLAN_RULE_TYPE_MATCH_VID, vid);
 }
 
@@ -627,6 +674,8 @@ int mlx5e_vlan_rx_kill_vid(struct net_device *dev, __always_unused __be16 proto,
 {
struct mlx5e_priv *priv = netdev_priv(dev);
 
+   clear_bit(vid, priv->vlan.active_vlans);
+
mlx5e_del_vlan_rule(priv, MLX5E_VLAN_RULE_TYPE_MATCH_VID, vid);
 
return 0;
-- 
2.3.7
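The VLAN bookkeeping added here is a VLAN_N_VID-bit bitmap. vlan_rx_add_vid sets a bit and vlan_rx_kill_vid clears it. The list pushed to the vport is the set bits in ascending order, capped at 1 << log_max_vlan_list. The following userspace sketch shows that idea. The helpers vid_set(), vid_clear(), vid_test() and collect_vids() are hypothetical stand-ins for the kernel's set_bit(), clear_bit() and for_each_set_bit(). The copy loop enforces the cap itself, so a bitmap with more set bits than the cap cannot overrun the output buffer:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define VLAN_N_VID 4096
#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

static void vid_set(unsigned long *map, unsigned int vid)
{
	map[vid / BITS_PER_LONG] |= 1UL << (vid % BITS_PER_LONG);
}

static void vid_clear(unsigned long *map, unsigned int vid)
{
	map[vid / BITS_PER_LONG] &= ~(1UL << (vid % BITS_PER_LONG));
}

static int vid_test(const unsigned long *map, unsigned int vid)
{
	return (map[vid / BITS_PER_LONG] >> (vid % BITS_PER_LONG)) & 1UL;
}

/* Copy at most max_list active VIDs, in ascending order, into vlans[]. */
static int collect_vids(const unsigned long *map, uint16_t *vlans, int max_list)
{
	int n = 0;
	unsigned int vid;

	assert(max_list >= 0);
	for (vid = 0; vid < VLAN_N_VID && n < max_list; vid++)
		if (vid_test(map, vid))
			vlans[n++] = (uint16_t)vid;
	return n;
}
```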

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: network card doesn't recovered itself after a SYN flooding attack

2015-11-23 Thread Daniel Borkmann

[ cc'ing netdev and r8169 folks ]

On 11/22/2015 11:51 AM, Toralf Förster wrote:

On the 22nd of November at 21:26 UTC my server (64-bit stable Gentoo hardened)
suffered a DDoS attack.

From the kern.log:


Nov 20 22:26:29 tor-relay kernel: [2431358.124515] TCP: request_sock_TCP: Possible SYN flooding on port 80. Sending cookies.  Check SNMP counters.
Nov 20 22:26:48 tor-relay kernel: [2431377.216133] [ cut here ]
Nov 20 22:26:48 tor-relay kernel: [2431377.216141] WARNING: CPU: 7 PID: 12421 at net/sched/sch_generic.c:303 dev_watchdog+0x272/0x280()
Nov 20 22:26:48 tor-relay kernel: [2431377.216143] NETDEV WATCHDOG: enp3s0 (r8169): transmit queue 0 timed out
Nov 20 22:26:48 tor-relay kernel: [2431377.216145] Modules linked in:
Nov 20 22:26:48 tor-relay kernel: [2431377.216148]  af_packet nf_log_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_log_ipv4 nf_log_common xt_LOG xt_multiport nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables i2c_i801 i2c_core tpm_tis tpm thermal processor battery atkbd x86_pkg_temp_thermal button microcode fan
Nov 20 22:26:48 tor-relay kernel: [2431377.216173] CPU: 7 PID: 12421 Comm: emerge Not tainted 4.1.7-hardened-r1 #1
Nov 20 22:26:48 tor-relay kernel: [2431377.216174] Hardware name: System manufacturer System Product Name/P8H77-M PRO, BIOS 0922 09/10/2012
Nov 20 22:26:48 tor-relay kernel: [2431377.216176]  994fa966  99bced09 88041fbc3d18
Nov 20 22:26:48 tor-relay kernel: [2431377.216179]  99983e26  88041fbc3d68 88041fbc3d58
Nov 20 22:26:48 tor-relay kernel: [2431377.216182]  9947f08a 88041fbc3d48 99bced09 012f
Nov 20 22:26:48 tor-relay kernel: [2431377.216185] Call Trace:
Nov 20 22:26:48 tor-relay kernel: [2431377.216187][] ? print_modules+0x76/0xe0
Nov 20 22:26:48 tor-relay kernel: [2431377.216198]  [] dump_stack+0x45/0x5d
Nov 20 22:26:48 tor-relay kernel: [2431377.216203]  [] warn_slowpath_common+0x8a/0xd0
Nov 20 22:26:48 tor-relay kernel: [2431377.216205]  [] warn_slowpath_fmt+0x5a/0x70
Nov 20 22:26:48 tor-relay kernel: [2431377.216210]  [] ? task_tick_fair+0x2a8/0x760
Nov 20 22:26:48 tor-relay kernel: [2431377.216213]  [] dev_watchdog+0x272/0x280
Nov 20 22:26:48 tor-relay kernel: [2431377.216216]  [] ? dev_deactivate_queue+0x70/0x70
Nov 20 22:26:48 tor-relay kernel: [2431377.216219]  [] call_timer_fn+0x47/0x140
Nov 20 22:26:48 tor-relay kernel: [2431377.216222]  [] run_timer_softirq+0x291/0x450
Nov 20 22:26:48 tor-relay kernel: [2431377.216224]  [] ? dev_deactivate_queue+0x70/0x70
Nov 20 22:26:48 tor-relay kernel: [2431377.216228]  [] __do_softirq+0xf8/0x290
Nov 20 22:26:48 tor-relay kernel: [2431377.216230]  [] irq_exit+0x9d/0xb0
Nov 20 22:26:48 tor-relay kernel: [2431377.216235]  [] smp_apic_timer_interrupt+0x55/0x70
Nov 20 22:26:48 tor-relay kernel: [2431377.216237]  [] apic_timer_interrupt+0x97/0xa0
Nov 20 22:26:48 tor-relay kernel: [2431377.216239]
Nov 20 22:26:48 tor-relay kernel: [2431377.216241] ---[ end trace 93431a9382c0a11a ]---
Nov 20 22:26:48 tor-relay kernel: [2431377.237826] r8169 :03:00.0 enp3s0: link up
Nov 20 22:28:18 tor-relay kernel: [2431467.175659] r8169 :03:00.0 enp3s0: link up
Nov 20 22:28:30 tor-relay kernel: [2431479.172562] r8169 :03:00.0 enp3s0: link up
Nov 20 22:28:42 tor-relay kernel: [2431491.164472] r8169 :03:00.0 enp3s0: link up
Nov 20 22:28:54 tor-relay kernel: [2431503.170416] r8169 :03:00.0 enp3s0: link up
Nov 20 22:29:06 tor-relay kernel: [2431515.148333] r8169 :03:00.0 enp3s0: link up
Nov 20 22:29:18 tor-relay kernel: [2431527.143293] r8169 :03:00.0 enp3s0: link up
Nov 20 22:29:30 tor-relay kernel: [2431539.142164] r8169 :03:00.0 enp3s0: link up
Nov 20 22:29:42 tor-relay kernel: [2431551.124104] r8169 :03:00.0 enp3s0: link up
...
Nov 22 10:56:24 tor-relay kernel: [2562675.624512] r8169 :03:00.0 enp3s0: link up



The last line kept repeating and the network was down until I initiated a
hardware reset.

It looks to me as if the attack put the network card into a state from
which it couldn't recover by itself, or am I wrong?
Is there anything I should change on the system to avoid such a hang?





Re: [PATCH 13/14] mm: memcontrol: account socket memory in unified hierarchy memory controller

2015-11-23 Thread Vladimir Davydov
On Fri, Nov 20, 2015 at 02:25:06PM -0500, Johannes Weiner wrote:
> On Fri, Nov 20, 2015 at 04:10:33PM +0300, Vladimir Davydov wrote:
> > On Thu, Nov 12, 2015 at 06:41:32PM -0500, Johannes Weiner wrote:
> > ...
> > > @@ -5514,16 +5550,43 @@ void sock_release_memcg(struct sock *sk)
> > >   */
> > >  bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int 
> > > nr_pages)
> > >  {
> > > + unsigned int batch = max(CHARGE_BATCH, nr_pages);
> > >   struct page_counter *counter;
> > > + bool force = false;
> > >  
> > > - if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
> > > - nr_pages, &counter)) {
> > > - memcg->tcp_mem.memory_pressure = 0;
> > > +#ifdef CONFIG_MEMCG_KMEM
> > > + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> > > + if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
> > > + nr_pages, &counter)) {
> > > + memcg->tcp_mem.memory_pressure = 0;
> > > + return true;
> > > + }
> > > + page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
> > > + memcg->tcp_mem.memory_pressure = 1;
> > > + return false;
> > > + }
> > > +#endif
> > > + if (consume_stock(memcg, nr_pages))
> > >   return true;
> > > +retry:
> > > + if (page_counter_try_charge(&memcg->memory, batch, &counter))
> > > + goto done;
> > > +
> > > + if (batch > nr_pages) {
> > > + batch = nr_pages;
> > > + goto retry;
> > >   }
> > > - page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
> > > - memcg->tcp_mem.memory_pressure = 1;
> > > - return false;
> > > +
> > > + page_counter_charge(&memcg->memory, batch);
> > > + force = true;
> > > +done:
> > 
> > > + css_get_many(&memcg->css, batch);
> > 
> > Is there any point to get css reference per each charged page? For kmem
> > it is absolutely necessary, because dangling slabs must block
> > destruction of memcg's kmem caches, which are destroyed on css_free. But
> > for sockets there's no such problem: memcg will be destroyed only after
> > all sockets are destroyed and therefore uncharged (since
> > sock_update_memcg pins css).
> 
> I'm afraid we have to when we want to share 'stock' with cache and
> anon pages, which hold individual references. drain_stock() always
> assumes one reference per cached page.

Missed that, you're right.
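Johannes' point, that the shared stock assumes one css reference per cached page, can be illustrated with a small single-threaded userspace model of the quoted charge path. Everything below is hypothetical (struct memcg_model, charge_skmem(), drain_stock_model()), and the boolean return convention is this sketch's own choice. It only mirrors the control flow of the patch:

1. Charge a whole batch.
2. If that fails, fall back to exactly nr_pages.
3. If that also fails, force the charge.
4. Take one reference per charged page and park any surplus in the stock.

```c
#include <assert.h>
#include <stdbool.h>

struct counter_model {
	unsigned long usage;
	unsigned long limit;
};

struct memcg_model {
	struct counter_model memory;
	unsigned long stock;	/* pre-charged pages not yet handed out */
	unsigned long refs;	/* one css reference per charged page */
};

static bool counter_try_charge(struct counter_model *c, unsigned long n)
{
	if (c->usage + n > c->limit)
		return false;
	c->usage += n;
	return true;
}

/* Returns false if the charge had to be forced past the limit. */
static bool charge_skmem(struct memcg_model *m, unsigned long nr_pages,
			 unsigned long charge_batch)
{
	unsigned long batch = charge_batch > nr_pages ? charge_batch : nr_pages;
	bool force = false;

	assert(nr_pages > 0);
	if (m->stock >= nr_pages) {		/* like consume_stock() */
		m->stock -= nr_pages;
		return true;
	}
	if (!counter_try_charge(&m->memory, batch)) {
		/* retry with exactly nr_pages, then force the charge */
		if (!(batch > nr_pages &&
		      counter_try_charge(&m->memory, nr_pages))) {
			m->memory.usage += nr_pages;
			force = true;
		}
		batch = nr_pages;
	}
	m->refs += batch;			/* like css_get_many(..., batch) */
	if (batch > nr_pages)			/* like refill_stock() */
		m->stock += batch - nr_pages;
	return !force;
}

/* Like drain_stock(): uncharge the surplus and drop one ref per page. */
static void drain_stock_model(struct memcg_model *m)
{
	m->memory.usage -= m->stock;
	m->refs -= m->stock;
	m->stock = 0;
}
```

Because stocked pages already hold their reference, consuming from the stock does not touch refs, and draining must drop exactly one reference per stocked page.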

> 
> > > + if (batch > nr_pages)
> > > + refill_stock(memcg, batch - nr_pages);
> > > +
> > > + schedule_work(&memcg->socket_work);
> > 
> > I think it's suboptimal to schedule the work even if we are below the
> > high threshold.
> 
> Hm, it seemed unnecessary to duplicate the hierarchy check since this
> is in the batch-exhausted slowpath anyway.

Dunno, maybe you're right.

I have another question regarding this socket_work: its reclaim target
always equals CHARGE_BATCH. Couldn't that let a workload exceed
memory.high when a lot of allocations come from different CPUs? In that
case the work might not manage to complete before another allocation
happens. Maybe we should accumulate the number of pages to
be reclaimed by the work, as we do in try_charge?
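The accumulation suggested above could look roughly like the following. It is a userspace sketch using C11 atomics. pending_reclaim, note_over_high() and take_reclaim_target() are hypothetical names, and the real code would use kernel atomics and schedule_work() instead. Every CPU that overshoots memory.high adds its excess to a shared counter. The work item then takes the whole accumulated amount at once, so concurrent overshoots are not capped at a single CHARGE_BATCH:

```c
#include <assert.h>
#include <stdatomic.h>

/* Pages charged above memory.high that the work item has not reclaimed yet. */
static atomic_ulong pending_reclaim = 0;

/* Called by any CPU whose charge pushed the group over memory.high. */
static void note_over_high(unsigned long nr_pages)
{
	assert(nr_pages > 0);
	atomic_fetch_add(&pending_reclaim, nr_pages);
	/* the kernel version would schedule_work() here */
}

/* Called from the work item: claim the whole accumulated target. */
static unsigned long take_reclaim_target(void)
{
	return atomic_exchange(&pending_reclaim, 0UL);
}
```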

> 
> > BTW why do we need this work at all? Why is reclaim_high called from
> > task_work not enough?
> 
> The problem lies in the memcg association: the random task that gets
> interrupted by an arriving packet might not be in the same memcg as
> the one owning receiving socket. And multiple interrupts could happen
> while we're in the kernel already charging pages. We'd basically have
> to maintain a list of memcgs that need to run reclaim_high associated
> with current.
> 

Right, I think this is worth putting in a comment on memcg->socket_work.
I wonder if we could use it *instead* of task_work for handling every
allocation, not only socket-related ones. Would that make any sense? Maybe
it could reduce the latency experienced by tasks in memory cgroups.

Thanks,
Vladimir


Re: [PATCH net-next 3/6] net: Add MSG_BATCH flag

2015-11-23 Thread Hannes Frederic Sowa
Hello,

On Fri, Nov 20, 2015, at 22:21, Tom Herbert wrote:
> Add a new msg flag called MSG_BATCH. This flag is used in sendmsg to
> indicate that more messages will follow (i.e. a batch of messages is
> being sent). This is similar to MSG_MORE except that the following
> messages are not merged into one packet, they are sent individually.
> 
> MSG_BATCH is a performance optimization in cases where a socket
> implementation can benefit by transmitting packets in a batch.
> 
> This patch also updates sendmmsg so that each contained message except
> for the last one is marked as MSG_BATCH.

Is this flag only used for KCM because it does not make sense to expose
it to user space? If so, could this be made clearer? I don't see
such an optimization being needed for UDP or TCP.
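The sendmmsg() change described in the patch amounts to a per-message flag rule: every message except the last one carries MSG_BATCH. The following sketches that rule. MSG_BATCH_SKETCH and batch_flags() are hypothetical, and the value is a placeholder because the patch's actual constant is not quoted here:

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder value for this sketch only, not taken from the patch. */
#define MSG_BATCH_SKETCH 0x40000

/* Flags for message i of n in a sendmmsg()-style loop: all but the last
 * message are marked, telling a batching-aware socket more will follow. */
static int batch_flags(int base_flags, size_t i, size_t n)
{
	assert(i < n);
	return (i + 1 < n) ? (base_flags | MSG_BATCH_SKETCH) : base_flags;
}
```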

Thanks,
Hannes


Re: yet another uninterruptable hang in sendfile

2015-11-23 Thread Jan Kara
  Hello,

On Sat 21-11-15 14:24:45, Dmitry Vyukov wrote:
> On commit 8005c49d9aea74d382f474ce11afbbc7d7130bec (Nov 15).
> 
> The program is:
> 
> // autogenerated by syzkaller (http://github.com/google/syzkaller)
> #include 
> #include 
> #include 
> 
> int main()
> {
> long r0 = syscall(SYS_socket, 0x10ul, 0x2ul, 0x0ul, 0, 0, 0);
> long r1 = syscall(SYS_mmap, 0x2000ul, 0x1000ul, 0x3ul,
> 0x32ul, 0xul, 0x0ul);
> long r2 = syscall(SYS_mmap, 0x20001000ul, 0x1000ul, 0x3ul,
> 0x32ul, 0xul, 0x0ul);
> *(uint64_t*)0x2000153f = 0x20001f99;
> *(uint64_t*)0x20001547 = 0x67;
> *(uint64_t*)0x2000154f = 0x20001fa5;
> *(uint64_t*)0x20001557 = 0x5b;
> *(uint64_t*)0x2000155f = 0x20001000;
> *(uint64_t*)0x20001567 = 0x6;
> long r9 = syscall(SYS_readv, r0, 0x2000153ful, 0x3ul, 0, 0, 0);
> long r10 = syscall(SYS_mmap, 0x20002000ul, 0x1000ul, 0x3ul,
> 0x32ul, 0xul, 0x0ul);
> memcpy((void*)0x20002000, "\x65\x74\x68\x31\x00", 5);
> long r12 = syscall(SYS_memfd_create, 0x20002000ul, 0x1ul, 0, 0, 0, 0);
> long r13 = syscall(SYS_fallocate, r12, 0x0ul, 0x5616e07ul, 0x1ul, 0, 
> 0);
> memcpy((void*)0x2da2,
> "\x02\xbe\x98\x59\x88\xb1\x7b\xfd\xe6\x27\x95\xdc\x18\x4e\x04\x87\x28\x1a\xd0\x30\x52\xcd\xa5\xee\x09\x7f\xfa\x7a\x9b\x72\x17\xfa\x2a\xa1\xe1\x60\x09\xbb\xaf\xdd\x0b\x5c\xa8\x18\x81\x4b\x6d\x42\x11\x20\x4a\xd7\x9e\x86\x8b\x63\xd2\x36\xbf\x5f\xb0\x36\x13\x82\x79\xc8\x31\x3b\x3b\x1e",
> 70);
> memcpy((void*)0x28b7,
> "\x0a\x00\x33\xe8\x3d\xe7\x4a\xcc\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\xbf\xce\xa1\x60",
> 28);
> long r16 = syscall(SYS_sendto, r0, 0x2da2ul, 0x46ul,
> 0x8000ul, 0x28b7ul, 0x1cul);
> long r17 = syscall(SYS_sendfile, r0, r12, 0x2000ul,
> 0x4785d2c1ul, 0, 0);
> return 0;
> }
>
> 
> It hangs in unkillable state. It is probably similar issue to the
> other reported issues related to sendfile:
> https://groups.google.com/forum/#!topic/syzkaller/zfuHHRXL7Zg
> https://groups.google.com/forum/#!topic/syzkaller/sjA9DrBQviw

For me this hangs interruptibly in readv(2). When I remove that call, it
finishes in under a second, so I cannot easily test whether my patch fixes
the problem as well (although, AFAIU what the test does, it should). Can
you please test the patch in your setup? I'll send it shortly.

> However this one also blankets dmesg with zillions of:
> 
> [ 1682.801412] SELinux: unrecognized netlink message: protocol=0
> nlmsg_type=0 sclass=netlink_route_socket
> [ 1682.803565] SELinux: unrecognized netlink message: protocol=0
> nlmsg_type=0 sclass=netlink_route_socket
> [ 1682.804991] SELinux: unrecognized netlink message: protocol=0
> nlmsg_type=0 sclass=netlink_route_socket
> 
> The program should be killable.

I don't have SELinux configured so that may be what's making a difference.

Honza
-- 
Jan Kara 
SUSE Labs, CR


[PATCH] libertas: fix possible NULL dereference

2015-11-23 Thread Sudip Mukherjee
We were dereferencing cmd first and checking for NULL later. Let's
check for NULL first.

Signed-off-by: Sudip Mukherjee 
---
 drivers/net/wireless/marvell/libertas/cfg.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/marvell/libertas/cfg.c b/drivers/net/wireless/marvell/libertas/cfg.c
index 8317afd..e38ad1d 100644
--- a/drivers/net/wireless/marvell/libertas/cfg.c
+++ b/drivers/net/wireless/marvell/libertas/cfg.c
@@ -1108,7 +1108,7 @@ static int lbs_associate(struct lbs_private *priv,
size_t len, resp_ie_len;
int status;
int ret;
-   u8 *pos = &(cmd->iebuf[0]);
+   u8 *pos;
u8 *tmp;
 
lbs_deb_enter(LBS_DEB_CFG80211);
@@ -1117,6 +1117,7 @@ static int lbs_associate(struct lbs_private *priv,
ret = -ENOMEM;
goto done;
}
+   pos = &cmd->iebuf[0];
 
/*
 * cmd  50 00
-- 
1.9.1
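The underlying rule is that evaluating &cmd->iebuf[0] through a NULL cmd is already undefined behaviour in C, even though no load happens. A compiler may even use it to drop the later NULL check. The address therefore must be taken only after the allocation has been checked. The following userspace sketch shows the corrected ordering. struct assoc_cmd and alloc_assoc_cmd() are hypothetical stand-ins, not the real libertas command layout:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical command: a small header followed by a variable IE buffer. */
struct assoc_cmd {
	uint16_t capability;
	uint8_t iebuf[];
};

/* Allocate the command; hand out the IE cursor only once cmd is known
 * to be non-NULL. On failure returns NULL and sets *pos to NULL. */
static struct assoc_cmd *alloc_assoc_cmd(size_t ie_len, uint8_t **pos)
{
	struct assoc_cmd *cmd;

	assert(pos != NULL);
	*pos = NULL;
	cmd = calloc(1, sizeof(*cmd) + ie_len);
	if (!cmd)
		return NULL;
	*pos = &cmd->iebuf[0];	/* safe: cmd checked above */
	return cmd;
}
```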



Re: [PATCH 9/9] netfilter: implement xt_cgroup cgroup2 path match

2015-11-23 Thread Daniel Wagner
Hi Tejun,

On 11/21/2015 05:14 PM, Tejun Heo wrote:
> +static int
> cgroup_mt_check_v1(const struct xt_mtchk_param *par)
> +{
> + struct xt_cgroup_info_v1 *info = par->matchinfo;
> + struct cgroup *cgrp;
> +
> + if ((info->invert_path & ~1) || (info->invert_classid & ~1))
> + return -EINVAL;

The checks below use pr_info() in case the configuration is not valid.
Is this missing here on purpose?
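The `& ~1` test accepts only 0 or 1 in each invert field, because any other set bit makes the expression non-zero. The following userspace sketch shows that check, and it also prints a message on rejection as suggested. struct cgroup_info_sketch is a hypothetical subset of the match info, and fprintf() stands in for pr_info():

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical subset of the match info: both fields are booleans. */
struct cgroup_info_sketch {
	uint8_t invert_path;
	uint8_t invert_classid;
};

static int check_invert_flags(const struct cgroup_info_sketch *info)
{
	assert(info != NULL);
	if ((info->invert_path & ~1) || (info->invert_classid & ~1)) {
		/* userspace stand-in for a pr_info() on bad configuration */
		fprintf(stderr, "xt_cgroup: invalid invert flags\n");
		return -EINVAL;
	}
	return 0;
}
```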

I have lightly tested it and it seems to work (also on an older
kernel). I don't know if that qualifies for a Tested-by, but at least
an Acked-by should do the trick:

Tested-by: Daniel Wagner 
Acked-by: Daniel Wagner 

cheers,
daniel


Re: [PATCH net-next 0/6] kcm: Kernel Connection Multiplexor (KCM)

2015-11-23 Thread Sowmini Varadhan
On (11/23/15 10:53), Hannes Frederic Sowa wrote:
> > 
> >  - Integration with TLS (TLS-in-kernel is a separate initiative).
> 
> This is interesting:
> 
> Regarding the last week's discussion about better OOB support in TCP
> e.g. for SOCKET_DESTROY, do you already have a plan to handle TLS alerts
> and do CHANGE_CIPHER on the socket synchronously?

I have had that same question too. In fact I pointed this out already
in the thread at http://permalink.gmane.org/gmane.linux.network/382278

In addition to CCS, TLS does other complex things such as mid-session
regeneration of new session keys based on the master-secret. If you
move TLS to the kernel, there may be a lot of 
synchronicity/security/inter-op issues to resolve.

Perhaps it's better not to use "TLS" on the TCP socket at all, and instead
let each KCM application negotiate a crypto key (in any manner it wants),
set it on the PF_KCM socket, and then use that key to encrypt application
data just before passing it off to TCP. (Of course, then you have to deal
with the fact that BPF still needs to get to the clear data somehow)
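For context on the alert and ChangeCipherSpec question: TLS carries these on the same record stream as application data. They are distinguished only by the content type byte in each 5-byte record header (RFC 5246, section 6.2.1: 20 ChangeCipherSpec, 21 alert, 22 handshake, 23 application_data). The following minimal sketch shows the classification an in-kernel data path would need before it could hand a non-data record back to userspace. The function name is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* TLS record content types (RFC 5246, section 6.2.1). */
enum tls_content_type {
	TLS_CHANGE_CIPHER_SPEC = 20,
	TLS_ALERT = 21,
	TLS_HANDSHAKE = 22,
	TLS_APPLICATION_DATA = 23,
};

/* Parse the record header: type(1), version(2), length(2, big-endian).
 * Returns 1 for application data, 0 for a control record (CCS, alert,
 * handshake, ...), and -1 if the header is incomplete. */
static int tls_record_is_app_data(const uint8_t *buf, size_t len,
				  uint16_t *rec_len)
{
	assert(buf != NULL && rec_len != NULL);
	if (len < 5)
		return -1;
	*rec_len = (uint16_t)((buf[3] << 8) | buf[4]);
	return buf[0] == TLS_APPLICATION_DATA;
}
```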

--Sowmini




Re: [PATCH 8/9] netfilter: prepare xt_cgroup for multi revisions

2015-11-23 Thread Daniel Wagner
Hi Tejun,

On 11/21/2015 05:14 PM, Tejun Heo wrote:
> xt_cgroup will grow cgroup2 path based match.  Postfix existing
> symbols with _v0 and prepare for multi revision registration.
> 
> Signed-off-by: Tejun Heo 
> Cc: Daniel Borkmann 
> Cc: Daniel Wagner 

Same as in my reply to patch #9 (yes, I know I'm doing it in the wrong
order... but I can't stop now... :))

Tested-by: Daniel Wagner 
Acked-by: Daniel Wagner 

cheers,
daniel



Re: Asterisk deadlocks since Kernel 4.1

2015-11-23 Thread Stefan Priebe - Profihost AG
Am 19.11.2015 um 20:51 schrieb Stefan Priebe:
> 
> Am 19.11.2015 um 14:19 schrieb Florian Weimer:
>> On 11/19/2015 01:46 PM, Stefan Priebe - Profihost AG wrote:
>>
>>> I can try Kernel 4.4-rc1 next week. Or something else?
>>
>> I found this bug report which indicates that 4.1.10 works:
>>
>>
>>
>> But in your original report, you said that 4.1.13 is broken.
> 
> That's correct, I'm running 4.1.13.
> 
>> This backtrace:
>>
>>
>>
>> shows a lot of waiting on quite different netlink sockets.  So if this
>> is due to a race in Asterisk, it must have happened several times in a
>> row.

Kernel 4.4-rc2 works fine. How can we find out what is causing the
issue in 4.1? It's an LTS kernel, so it should be fixed!

Stefan


Re: Asterisk deadlocks since Kernel 4.1

2015-11-23 Thread Hannes Frederic Sowa
On Mon, Nov 23, 2015, at 13:44, Stefan Priebe - Profihost AG wrote:
> Am 19.11.2015 um 20:51 schrieb Stefan Priebe:
> > 
> > Am 19.11.2015 um 14:19 schrieb Florian Weimer:
> >> On 11/19/2015 01:46 PM, Stefan Priebe - Profihost AG wrote:
> >>
> >>> I can try Kernel 4.4-rc1 next week. Or something else?
> >>
> >> I found this bug report which indicates that 4.1.10 works:
> >>
> >>
> >>
> >> But in your original report, you said that 4.1.13 is broken.
> > 
> > That's correct, I'm running 4.1.13.
> > 
> >> This backtrace:
> >>
> >>
> >>
> >> shows a lot of waiting on quite different netlink sockets.  So if this
> >> is due to a race in Asterisk, it must have happened several times in a
> >> row.
> 
> Kernel 4.4-rc2 works fine. How can we find out what is causing the
> issue in 4.1? It's an LTS kernel, so it should be fixed!

Thanks for testing. I was not able to reproduce it at all on any kernel,
even with as much parallelism and as many threads as possible. Could you
try a git bisect?

Thanks,
Hannes


Re: [PATCH V3 net-next 4/5] net:hns: Add support of ethtool TSO set option for Hip06 in HNS

2015-11-23 Thread Salil Mehta



On 20/11/15 14:07, Sergei Shtylyov wrote:

On 11/19/2015 11:58 PM, Salil Mehta wrote:


From: Salil 

This patch adds the support of ethtool TSO option to V1 patch,
meant to add support of Hip06 SoC to HNS

Signed-off-by: Salil Mehta 
Signed-off-by: lisheng 
---
  drivers/net/ethernet/hisilicon/hns/hns_enet.c |   47 +

  1 file changed, 47 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index 055e14c..a0763ab 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -1386,6 +1386,51 @@ static int hns_nic_change_mtu(struct net_device *ndev, int new_mtu)
  return ret;
  }

+static int hns_nic_set_features(struct net_device *netdev,
+netdev_features_t features)
+{
+struct hns_nic_priv *priv = netdev_priv(netdev);
+struct hnae_handle *h = priv->ae_handle;
+
+switch (priv->enet_ver) {
+case AE_VERSION_1:
+if ((features & NETIF_F_TSO) || (features & NETIF_F_TSO6))


if (features & (NETIF_F_TSO | NETIF_F_TSO6))

Thanks, changed in the V4 patch.



+netdev_info(netdev, "enet v1 do not support tso!\n");
+break;


   The *break* should have the same indentation level as *if*.

Thanks for pointing that out, changed in the V4 patch.



+default:
+if ((features & NETIF_F_TSO) || (features & NETIF_F_TSO6)) {


if (features & (NETIF_F_TSO | NETIF_F_TSO6))


+priv->ops.fill_desc = fill_tso_desc;
+priv->ops.maybe_stop_tx = hns_nic_maybe_stop_tso;
+/* The chip only support 7*4096 */
+netif_set_gso_max_size(netdev, 7 * 4096);
+h->dev->ops->set_tso_stats(h, 1);
+} else {
+priv->ops.fill_desc = fill_v2_desc;
+priv->ops.maybe_stop_tx = hns_nic_maybe_stop_tx;
+h->dev->ops->set_tso_stats(h, 0);
+}
+break;


   Same here.

Thanks for pointing that out, changed in the V4 patch.



+}
+netdev->features = features;
+return 0;
+}
+
+static netdev_features_t hns_nic_fix_features(
+struct net_device *netdev, netdev_features_t features)
+{
+struct hns_nic_priv *priv = netdev_priv(netdev);
+
+switch (priv->enet_ver) {
+case AE_VERSION_1:
+features &= ~(NETIF_F_TSO | NETIF_F_TSO6 |
+NETIF_F_HW_VLAN_CTAG_FILTER);
+break;
+default:
+break;
+}


   Here it's indented correctly.

Got it, thanks!



+return features;
+}
+
  /**
   * nic_set_multicast_list - set mutl mac address
   * @netdev: net device

[...]

MBR, Sergei
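Sergei's suggested single-mask test is equivalent to the two-test form used in the patch. A minimal sketch to check that, assuming illustrative bit positions: DEMO_F_TSO and DEMO_F_TSO6 are hypothetical stand-ins, not the real NETIF_F_* values from the kernel headers.

```c
#include <stdint.h>

/* Hypothetical feature bits for illustration only; the real
 * NETIF_F_TSO / NETIF_F_TSO6 come from <linux/netdev_features.h>. */
typedef uint64_t features_t;
#define DEMO_F_TSO  ((features_t)1 << 16)
#define DEMO_F_TSO6 ((features_t)1 << 19)

/* Form used in the patch: two separate tests joined by || */
int tso_requested_or(features_t f)
{
	return (f & DEMO_F_TSO) || (f & DEMO_F_TSO6);
}

/* Form suggested in review: one test against a combined mask */
int tso_requested_mask(features_t f)
{
	return (f & (DEMO_F_TSO | DEMO_F_TSO6)) != 0;
}
```

Both helpers return 1 when either bit is set and 0 otherwise, so the review comment is purely about readability: the mask form states "any TSO flavour requested" as a single test.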





Re: Use-after-free in ppoll

2015-11-23 Thread Dmitry Vyukov
On Sun, Nov 22, 2015 at 7:46 PM, Rainer Weikusat
 wrote:
> Dmitry Vyukov  writes:
>> On Sun, Nov 22, 2015 at 3:32 PM, Rainer Weikusat
>>  wrote:
>>> Dmitry Vyukov  writes:
 Hello,

 On commit f2d10565b9bdbb722bd43e6e1a759eeddb9645c8 (Nov 20).

 The following program triggers use-after-free:

 // autogenerated by syzkaller (http://github.com/google/syzkaller)
 #include 
 #include 
 #include 
 #include 

 void *thread(void *p)
 {
 syscall(SYS_write, (long)p, 0x2000278ful, 0x1ul, 0, 0, 0);
 return 0;
 }
>>>
>>> [...]
>>>
>>>
 long r1 = syscall(SYS_socketpair, 0x1ul, 0x3ul, 0x0ul,
>>>
>>> [...]
>>>
 long r5 = syscall(SYS_close, r2, 0, 0, 0, 0, 0);
 pthread_t th;
 pthread_create(&th, 0, thread, (void*)(long)r3);
>>>
>>> [...]
>>>
 long r21 = syscall(SYS_ppoll, 0x2ffful, 0x3ul, 0x2ffcul, 
 0x2ffdul, 0x8ul, 0);
 return 0;
 }
>>>
>>> That's one of the already known sequences for triggering this issue:
>
> [...]
>
>> I have not read the code. But I just want to point out that all 3
>> reports are different. For example, in the first one, ppoll both frees
>> the object and then accesses it. That is, it is not write that frees
>> the object.
>
> The call trace is always the same:
>
> [ 2672.994366]  [] __asan_load4+0x6a/0x70
> [ 2672.994366]  [] do_raw_spin_lock+0x22/0x220
> [ 2672.994366]  [] _raw_spin_lock_irqsave+0x51/0x60
> [ 2672.994366]  [] remove_wait_queue+0x18/0x80
> [ 2672.994366]  [] poll_freewait+0x7b/0x130
> [ 2672.994366]  [] do_sys_poll+0x4dc/0x860
> [ 2672.994366]  [] SyS_ppoll+0x1a9/0x310
>
> And if you look at the poll implementation, the important part is this
> (fs/select.c, do_sys_poll):
>
> fdcount = do_poll(nfds, head, &table, end_time);
> poll_freewait(&table);
>
> do_poll calls the poll routine of each file descriptor, which causes
> "enqueuing of something" via the poll wait callback. For poll, that's the
> __pollwait routine in select.c:
>
> static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
> poll_table *p)
> {
> struct poll_wqueues *pwq = container_of(p, struct poll_wqueues, pt);
> struct poll_table_entry *entry = poll_get_entry(pwq);
> if (!entry)
> return;
> entry->filp = get_file(filp);
> entry->wait_address = wait_address;
> entry->key = p->_key;
> init_waitqueue_func_entry(&entry->wait, pollwake);
> entry->wait.private = pwq;
> add_wait_queue(wait_address, &entry->wait);
> }
>
> because of the close, this routine will be called with the peer_wait
> wait_queue_head of the non-closed socket of the socket pair as
> wait_address argument. And poll_freewait calls free_poll_entry for all
> entries on the poll table which is
>
> static void free_poll_entry(struct poll_table_entry *entry)
> {
> remove_wait_queue(entry->wait_address, &entry->wait);
> fput(entry->filp);
> }
>
> but by this time, the wait_address points to freed memory because the
> only thing which kept the socket it belonged to alive after the
> corresponding file descriptor was closed was the reference the other
> socket held. But that was dropped by unix_dgram_sendmsg upon detecting a
> dead peer.


Hi Rainer,

I am not questioning your conclusions. You definitely know better.

Btw, how close are you to a fix that everybody is happy with? I hit
this use-after-free very frequently.
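The lifetime problem Rainer describes (a poll entry records a raw pointer to a wait queue embedded in a socket whose last reference is dropped before poll_freewait() runs) can be modeled in a few lines of userspace C. This is a toy model under stated assumptions: the toy_* types and helpers are hypothetical, a `freed` flag stands in for kfree() so the dangling access is detected rather than performed, and nothing here is the kernel's poll or AF_UNIX code.

```c
#include <stdbool.h>

/* Toy stand-in for wait_queue_head_t; only counts waiters. */
struct toy_wait_queue_head {
	int nr_waiters;
};

/* Toy socket: a refcount plus the peer_wait queue embedded in it. */
struct toy_sock {
	int refcnt;
	bool freed; /* set instead of calling kfree(), so misuse is detectable */
	struct toy_wait_queue_head peer_wait;
};

/* Toy poll_table_entry: wait_address is a raw pointer and pins nothing. */
struct toy_poll_entry {
	struct toy_wait_queue_head *wait_address;
	struct toy_sock *owner; /* model-only: lets us check liveness */
};

/* Drop one reference; dropping the last one "frees" the socket. */
void toy_sock_put(struct toy_sock *sk)
{
	if (--sk->refcnt == 0)
		sk->freed = true;
}

/* Like __pollwait(): enqueue on the peer's queue without a ref on the peer. */
void toy_pollwait(struct toy_poll_entry *e, struct toy_sock *peer)
{
	e->wait_address = &peer->peer_wait;
	e->owner = peer;
	e->wait_address->nr_waiters++;
}

/* Like free_poll_entry(): returns false where the real code would
 * dereference freed memory inside remove_wait_queue(). */
bool toy_free_poll_entry(struct toy_poll_entry *e)
{
	if (e->owner->freed)
		return false;
	e->wait_address->nr_waiters--;
	return true;
}
```

Driving the model in the order from the report (drop the fd's reference on close, register the poll entry, let sendmsg drop the last remaining reference, then run the free_poll_entry step) makes toy_free_poll_entry() return false, which corresponds to the KASAN report in remove_wait_queue().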


Re: [PATCH 7/9] sock, cgroup: add sock->sk_cgroup

2015-11-23 Thread Daniel Wagner
Hi Tejun,

On 11/21/2015 05:13 PM, Tejun Heo wrote:
> Signed-off-by: Tejun Heo 
> Cc: Daniel Borkmann 
> Cc: Daniel Wagner 

I did a quick test, and for new connections the cgroup2 match worked as
expected. For an existing connection I wasn't able to trigger the match.

It is quite likely I am doing something wrong:

ssh into the box
# mkdir /sys/fs/cgroup/test
# echo $$ > /sys/fs/cgroup/test/cgroup.procs
# echo $PPID > /sys/fs/cgroup/test/cgroup.procs
# iptables -A OUTPUT -m cgroup --path test

Should I see matches with the existing ssh session?

cheers,
daniel


Re: Deadlock between bind and splice

2015-11-23 Thread Dmitry Vyukov
On Tue, Nov 10, 2015 at 3:59 AM, Al Viro  wrote:
> On Tue, Nov 10, 2015 at 02:38:54AM +, Al Viro wrote:
>> On Fri, Nov 06, 2015 at 07:42:15AM -0800, Eric Dumazet wrote:
>>
>> > Thank you for this report.
>> >
>> > pipe is part of fs, not net ;)
>>
>> AF_UNIX bind() vs. socketpair() interplay, OTOH...
>
> FWIW, BSD folks unlock the socket for the duration of mknod - mark it as
> "somebody's trying to bind it" to avoid the fun with racing double bind(),
> but that's about it.  Tempting, to be honest...
>
> BTW, why does unix_autobind() do allocation under ->readlock?  The allocation
> will be normally used - that if (u->addr) return; part is just dealing with
> an unlikely race, as far as I can see...


Hello,

This is still happening periodically for me. Is there a proposed fix?
I could test it.
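Al's aside about unix_autobind() points at a common pattern: allocate before taking the lock, use the lock only to publish the result, and free the allocation in the unlikely case another thread won the race. A minimal userspace sketch of that pattern; toy_unix_sock and toy_autobind() are hypothetical names, this is not the kernel's unix_autobind(), and it is not claimed to fix the reported deadlock.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical socket that may be auto-bound exactly once. */
struct toy_unix_sock {
	pthread_mutex_t lock;
	char *addr; /* NULL until bound */
};

int toy_autobind(struct toy_unix_sock *sk, const char *name)
{
	/* Allocate outside the lock: in the common case this buffer is used. */
	char *addr = malloc(strlen(name) + 1);

	if (!addr)
		return -1;
	strcpy(addr, name);

	pthread_mutex_lock(&sk->lock);
	if (sk->addr) {
		/* Lost the unlikely race: someone else bound it first. */
		pthread_mutex_unlock(&sk->lock);
		free(addr);
		return 0;
	}
	sk->addr = addr; /* publish under the lock */
	pthread_mutex_unlock(&sk->lock);
	return 0;
}
```

The lock is then held only for the check-and-publish step, so nothing that might sleep or recurse into the allocator runs under it.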


pull-request: can 2015-11-23

2015-11-23 Thread Marc Kleine-Budde
Hello David,

this is a pull request of three patches for the upcoming v4.4 release.

The first patch is by Mirza Krak. It fixes a problem with the sja1000 driver
after resuming from suspend by clearing all outstanding interrupts on start.
Oliver Hartkopp contributes two patches targeting almost all drivers; they fix
the assignment of the error location in CAN error messages.

regards,
Marc

---

The following changes since commit 4c6980462f32b4f282c5d8e5f7ea8070e2937725:

  net: ip6mr: fix static mfc/dev leaks on table destruction (2015-11-22 20:44:47 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can.git tags/linux-can-fixes-for-4.4-20151123

for you to fetch changes up to a2ec19f888f1fb06e2424486423a16f86ad1fcc4:

  can: remove obsolete assignment for CAN protocol error type (2015-11-23 09:37:38 +0100)


linux-can-fixes-for-4.4-20151123


Mirza Krak (1):
  can: sja1000: clear interrupts on start

Oliver Hartkopp (2):
  can: fix assignment of error location in CAN error messages
  can: remove obsolete assignment for CAN protocol error type

 drivers/net/can/bfin_can.c|  2 --
 drivers/net/can/c_can/c_can.c |  7 ++-
 drivers/net/can/cc770/cc770.c |  2 +-
 drivers/net/can/flexcan.c |  4 ++--
 drivers/net/can/janz-ican3.c  |  1 -
 drivers/net/can/m_can/m_can.c |  7 ++-
 drivers/net/can/pch_can.c |  3 +--
 drivers/net/can/rcar_can.c| 11 +--
 drivers/net/can/sja1000/sja1000.c |  4 +++-
 drivers/net/can/sun4i_can.c   |  1 -
 drivers/net/can/ti_hecc.c |  7 ++-
 drivers/net/can/usb/ems_usb.c |  1 -
 drivers/net/can/usb/esd_usb2.c|  1 -
 drivers/net/can/usb/kvaser_usb.c  |  5 ++---
 drivers/net/can/usb/usb_8dev.c|  4 +---
 drivers/net/can/xilinx_can.c  |  9 +++--
 16 files changed, 24 insertions(+), 45 deletions(-)



[PATCH 3/3] can: remove obsolete assignment for CAN protocol error type

2015-11-23 Thread Marc Kleine-Budde
From: Oliver Hartkopp 

The assignment 'cf->data[2] |= CAN_ERR_PROT_UNSPEC' used at CAN error message
creation time is obsolete as CAN_ERR_PROT_UNSPEC is zero and cf->data[2] is
initialized with zero in alloc_can_err_skb() anyway.

So we could either assign 'cf->data[2] = CAN_ERR_PROT_UNSPEC' correctly or we
can remove the obsolete OR operation entirely.
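The argument above comes down to OR-ing zero into a byte that is already zero. A tiny sketch, assuming only what the commit message states (CAN_ERR_PROT_UNSPEC is zero and alloc_can_err_skb() zeroes the payload); toy_alloc_err_data() is a hypothetical stand-in for that zeroing.

```c
#include <stdint.h>
#include <string.h>

/* The commit message states this data[2] constant is zero. */
#define CAN_ERR_PROT_UNSPEC 0x00

/* Hypothetical stand-in: alloc_can_err_skb() hands back a zeroed payload. */
void toy_alloc_err_data(uint8_t data[8])
{
	memset(data, 0, 8);
}

/* data[2] as produced by the old code path, which ORs in UNSPEC */
uint8_t prot_type_with_or(void)
{
	uint8_t data[8];

	toy_alloc_err_data(data);
	data[2] |= CAN_ERR_PROT_UNSPEC; /* x | 0 == x: no effect */
	return data[2];
}

/* data[2] as produced by the patched code path */
uint8_t prot_type_without_or(void)
{
	uint8_t data[8];

	toy_alloc_err_data(data);
	return data[2];
}
```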

Signed-off-by: Oliver Hartkopp 
Signed-off-by: Marc Kleine-Budde 
---
 drivers/net/can/bfin_can.c| 2 --
 drivers/net/can/c_can/c_can.c | 1 -
 drivers/net/can/janz-ican3.c  | 1 -
 drivers/net/can/m_can/m_can.c | 1 -
 drivers/net/can/rcar_can.c| 5 ++---
 drivers/net/can/sja1000/sja1000.c | 1 -
 drivers/net/can/sun4i_can.c   | 1 -
 drivers/net/can/ti_hecc.c | 1 -
 drivers/net/can/usb/ems_usb.c | 1 -
 drivers/net/can/usb/esd_usb2.c| 1 -
 drivers/net/can/usb/usb_8dev.c| 1 -
 drivers/net/can/xilinx_can.c  | 4 +---
 12 files changed, 3 insertions(+), 17 deletions(-)

diff --git a/drivers/net/can/bfin_can.c b/drivers/net/can/bfin_can.c
index 57dadd52b428..1deb8ff90a89 100644
--- a/drivers/net/can/bfin_can.c
+++ b/drivers/net/can/bfin_can.c
@@ -501,8 +501,6 @@ static int bfin_can_err(struct net_device *dev, u16 isrc, u16 status)
cf->data[2] |= CAN_ERR_PROT_FORM;
else if (status & SER)
cf->data[2] |= CAN_ERR_PROT_STUFF;
-   else
-   cf->data[2] |= CAN_ERR_PROT_UNSPEC;
}
 
priv->can.state = state;
diff --git a/drivers/net/can/c_can/c_can.c b/drivers/net/can/c_can/c_can.c
index 7c9892ab0a6a..f91b094288da 100644
--- a/drivers/net/can/c_can/c_can.c
+++ b/drivers/net/can/c_can/c_can.c
@@ -962,7 +962,6 @@ static int c_can_handle_bus_err(struct net_device *dev,
 * type of the last error to occur on the CAN bus
 */
cf->can_id |= CAN_ERR_PROT | CAN_ERR_BUSERROR;
-   cf->data[2] |= CAN_ERR_PROT_UNSPEC;
 
switch (lec_type) {
case LEC_STUFF_ERROR:
diff --git a/drivers/net/can/janz-ican3.c b/drivers/net/can/janz-ican3.c
index c1e85368a198..5d04f5464faf 100644
--- a/drivers/net/can/janz-ican3.c
+++ b/drivers/net/can/janz-ican3.c
@@ -1096,7 +1096,6 @@ static int ican3_handle_cevtind(struct ican3_dev *mod, struct ican3_msg *msg)
cf->data[2] |= CAN_ERR_PROT_STUFF;
break;
default:
-   cf->data[2] |= CAN_ERR_PROT_UNSPEC;
cf->data[3] = ecc & ECC_SEG;
break;
}
diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index 9dd3ca7a73aa..39cf911f7a1e 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -487,7 +487,6 @@ static int m_can_handle_lec_err(struct net_device *dev,
 * type of the last error to occur on the CAN bus
 */
cf->can_id |= CAN_ERR_PROT | CAN_ERR_BUSERROR;
-   cf->data[2] |= CAN_ERR_PROT_UNSPEC;
 
switch (lec_type) {
case LEC_STUFF_ERROR:
diff --git a/drivers/net/can/rcar_can.c b/drivers/net/can/rcar_can.c
index 9161f045d44c..bc46be39549d 100644
--- a/drivers/net/can/rcar_can.c
+++ b/drivers/net/can/rcar_can.c
@@ -241,10 +241,9 @@ static void rcar_can_error(struct net_device *ndev)
u8 ecsr;
 
netdev_dbg(priv->ndev, "Bus error interrupt:\n");
-   if (skb) {
+   if (skb)
cf->can_id |= CAN_ERR_BUSERROR | CAN_ERR_PROT;
-   cf->data[2] = CAN_ERR_PROT_UNSPEC;
-   }
+
ecsr = readb(&priv->regs->ecsr);
if (ecsr & RCAR_CAN_ECSR_ADEF) {
netdev_dbg(priv->ndev, "ACK Delimiter Error\n");
diff --git a/drivers/net/can/sja1000/sja1000.c b/drivers/net/can/sja1000/sja1000.c
index f10834be48a5..8dda3b703d39 100644
--- a/drivers/net/can/sja1000/sja1000.c
+++ b/drivers/net/can/sja1000/sja1000.c
@@ -449,7 +449,6 @@ static int sja1000_err(struct net_device *dev, uint8_t isrc, uint8_t status)
cf->data[2] |= CAN_ERR_PROT_STUFF;
break;
default:
-   cf->data[2] |= CAN_ERR_PROT_UNSPEC;
cf->data[3] = ecc & ECC_SEG;
break;
}
diff --git a/drivers/net/can/sun4i_can.c b/drivers/net/can/sun4i_can.c
index d9a42c646783..68ef0a4cd821 100644
--- a/drivers/net/can/sun4i_can.c
+++ b/drivers/net/can/sun4i_can.c
@@ -575,7 +575,6 @@ static int sun4i_can_err(struct net_device *dev, u8 isrc, u8 status)
cf->data[2] |= CAN_ERR_PROT_STUFF;
break;
default:
-   cf->data[2] |= CAN_ERR_PROT_UNSPEC;
cf->data[3] = (ecc & SUN4I_STA_ERR_SEG_CODE)

[PATCH 1/3] can: sja1000: clear interrupts on start

2015-11-23 Thread Marc Kleine-Budde
From: Mirza Krak 

According to the SJA1000 data sheet, the error-warning (EI) interrupt is not
cleared by putting the controller into reset mode.

Consider the following case:
- the system is suspended (echo mem > /sys/power/state) and the SJA1000 is
  left in operating state
- a bus error condition occurs which activates the EI interrupt; the system
  is still suspended, which means the EI interrupt will be neither handled
  nor cleared.

If both of the above events occur, on resume there is no way to return the
SJA1000 to operating state except by cycling power to it.

Simply reading the IR register on start clears any such pending
conditions.

Signed-off-by: Mirza Krak 
Reported-by: Christian Magnusson 
Cc: linux-stable 
Signed-off-by: Marc Kleine-Budde 
---
 drivers/net/can/sja1000/sja1000.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/can/sja1000/sja1000.c b/drivers/net/can/sja1000/sja1000.c
index 7b92e911a616..f10834be48a5 100644
--- a/drivers/net/can/sja1000/sja1000.c
+++ b/drivers/net/can/sja1000/sja1000.c
@@ -218,6 +218,9 @@ static void sja1000_start(struct net_device *dev)
priv->write_reg(priv, SJA1000_RXERR, 0x0);
priv->read_reg(priv, SJA1000_ECC);
 
+   /* clear interrupt flags */
+   priv->read_reg(priv, SJA1000_IR);
+
/* leave reset mode */
set_normal_mode(dev);
 }
-- 
2.6.2
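The fix relies on IR being a read-to-clear register that entering reset mode does not clear. A toy model of that behavior, under stated assumptions: the toy_* names, the struct layout and the TOY_IRQ_EI bit value are hypothetical, and in this model every IR bit clears on read.

```c
#include <stdint.h>

/* Hypothetical bit value for the error-warning interrupt flag. */
#define TOY_IRQ_EI 0x04

/* Toy controller state: latched interrupt flags and a mode bit. */
struct toy_sja1000 {
	uint8_t ir;     /* interrupt register: flags stay latched until read */
	int reset_mode;
};

/* Reading IR returns the latched flags and clears them (read-to-clear). */
uint8_t toy_read_ir(struct toy_sja1000 *c)
{
	uint8_t v = c->ir;

	c->ir = 0;
	return v;
}

/* Entering reset mode leaves IR untouched, as the commit message says. */
void toy_set_reset_mode(struct toy_sja1000 *c)
{
	c->reset_mode = 1;
}

/* Start sequence with the extra IR read added by the patch. */
void toy_start(struct toy_sja1000 *c)
{
	toy_set_reset_mode(c);
	(void)toy_read_ir(c); /* drop flags latched while suspended */
	c->reset_mode = 0;    /* leave reset mode */
}
```

Without the toy_read_ir() call in toy_start(), an EI flag latched during suspend would still be pending after the controller leaves reset mode.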



[PATCH 2/3] can: fix assignment of error location in CAN error messages

2015-11-23 Thread Marc Kleine-Budde
From: Oliver Hartkopp 

As Dan Carpenter reported in http://marc.info/?l=linux-can&m=144793696016187
the assignment of the error location in CAN error messages had some bitwise
overlaps. Indeed, the value assigned to data[3] is not a bitfield but a
single value which points to a location inside the CAN frame on the wire.

This patch fixes the assignments for the error locations in error messages.
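Why OR-ing two location codes goes wrong can be checked with plain arithmetic. A minimal sketch, assuming the data[3] values below match include/uapi/linux/can/error.h (they are copied locally so the example compiles on its own; check them against your headers).

```c
#include <stdint.h>

/* data[3] location codes, copied from linux/can/error.h for a standalone build */
#define CAN_ERR_PROT_LOC_CRC_SEQ 0x08
#define CAN_ERR_PROT_LOC_CRC_DEL 0x18
#define CAN_ERR_PROT_LOC_ACK     0x19
#define CAN_ERR_PROT_LOC_ACK_DEL 0x1B

/* Old style: OR two enumerated codes. Since they are not independent
 * bits, the result is just a different (wrong) location code. */
uint8_t old_ack_location(void)
{
	uint8_t loc = 0;

	loc |= (CAN_ERR_PROT_LOC_ACK | CAN_ERR_PROT_LOC_ACK_DEL);
	return loc;
}

/* Fixed style: assign exactly one location code. */
uint8_t new_ack_location(void)
{
	uint8_t loc = 0;

	loc = CAN_ERR_PROT_LOC_ACK;
	return loc;
}
```

With these values, ACK | ACK_DEL evaluates to ACK_DEL (0x19 | 0x1B == 0x1B) and CRC_SEQ | CRC_DEL evaluates to CRC_DEL (0x08 | 0x18 == 0x18), so the old code reported the delimiter rather than the slot or sequence it meant.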

Reported-by: Dan Carpenter 
Signed-off-by: Oliver Hartkopp 
Signed-off-by: Marc Kleine-Budde 
---
 drivers/net/can/c_can/c_can.c| 6 ++
 drivers/net/can/cc770/cc770.c| 2 +-
 drivers/net/can/flexcan.c| 4 ++--
 drivers/net/can/m_can/m_can.c| 6 ++
 drivers/net/can/pch_can.c| 3 +--
 drivers/net/can/rcar_can.c   | 6 +++---
 drivers/net/can/ti_hecc.c| 6 ++
 drivers/net/can/usb/kvaser_usb.c | 5 ++---
 drivers/net/can/usb/usb_8dev.c   | 3 +--
 drivers/net/can/xilinx_can.c | 5 ++---
 10 files changed, 18 insertions(+), 28 deletions(-)

diff --git a/drivers/net/can/c_can/c_can.c b/drivers/net/can/c_can/c_can.c
index 5d214d135332..7c9892ab0a6a 100644
--- a/drivers/net/can/c_can/c_can.c
+++ b/drivers/net/can/c_can/c_can.c
@@ -975,8 +975,7 @@ static int c_can_handle_bus_err(struct net_device *dev,
break;
case LEC_ACK_ERROR:
netdev_dbg(dev, "ack error\n");
-   cf->data[3] |= (CAN_ERR_PROT_LOC_ACK |
-   CAN_ERR_PROT_LOC_ACK_DEL);
+   cf->data[3] = CAN_ERR_PROT_LOC_ACK;
break;
case LEC_BIT1_ERROR:
netdev_dbg(dev, "bit1 error\n");
@@ -988,8 +987,7 @@ static int c_can_handle_bus_err(struct net_device *dev,
break;
case LEC_CRC_ERROR:
netdev_dbg(dev, "CRC error\n");
-   cf->data[3] |= (CAN_ERR_PROT_LOC_CRC_SEQ |
-   CAN_ERR_PROT_LOC_CRC_DEL);
+   cf->data[3] = CAN_ERR_PROT_LOC_CRC_SEQ;
break;
default:
break;
diff --git a/drivers/net/can/cc770/cc770.c b/drivers/net/can/cc770/cc770.c
index 70a8cbb29e75..1e37313054f3 100644
--- a/drivers/net/can/cc770/cc770.c
+++ b/drivers/net/can/cc770/cc770.c
@@ -578,7 +578,7 @@ static int cc770_err(struct net_device *dev, u8 status)
cf->data[2] |= CAN_ERR_PROT_BIT0;
break;
case STAT_LEC_CRC:
-   cf->data[3] |= CAN_ERR_PROT_LOC_CRC_SEQ;
+   cf->data[3] = CAN_ERR_PROT_LOC_CRC_SEQ;
break;
}
}
diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
index 868fe945e35a..41c0fc9f3b14 100644
--- a/drivers/net/can/flexcan.c
+++ b/drivers/net/can/flexcan.c
@@ -535,13 +535,13 @@ static void do_bus_err(struct net_device *dev,
if (reg_esr & FLEXCAN_ESR_ACK_ERR) {
netdev_dbg(dev, "ACK_ERR irq\n");
cf->can_id |= CAN_ERR_ACK;
-   cf->data[3] |= CAN_ERR_PROT_LOC_ACK;
+   cf->data[3] = CAN_ERR_PROT_LOC_ACK;
tx_errors = 1;
}
if (reg_esr & FLEXCAN_ESR_CRC_ERR) {
netdev_dbg(dev, "CRC_ERR irq\n");
cf->data[2] |= CAN_ERR_PROT_BIT;
-   cf->data[3] |= CAN_ERR_PROT_LOC_CRC_SEQ;
+   cf->data[3] = CAN_ERR_PROT_LOC_CRC_SEQ;
rx_errors = 1;
}
if (reg_esr & FLEXCAN_ESR_FRM_ERR) {
diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index ef655177bb5e..9dd3ca7a73aa 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -500,8 +500,7 @@ static int m_can_handle_lec_err(struct net_device *dev,
break;
case LEC_ACK_ERROR:
netdev_dbg(dev, "ack error\n");
-   cf->data[3] |= (CAN_ERR_PROT_LOC_ACK |
-   CAN_ERR_PROT_LOC_ACK_DEL);
+   cf->data[3] = CAN_ERR_PROT_LOC_ACK;
break;
case LEC_BIT1_ERROR:
netdev_dbg(dev, "bit1 error\n");
@@ -513,8 +512,7 @@ static int m_can_handle_lec_err(struct net_device *dev,
break;
case LEC_CRC_ERROR:
netdev_dbg(dev, "CRC error\n");
-   cf->data[3] |= (CAN_ERR_PROT_LOC_CRC_SEQ |
-   CAN_ERR_PROT_LOC_CRC_DEL);
+   cf->data[3] = CAN_ERR_PROT_LOC_CRC_SEQ;
break;
default:
break;
diff --git a/drivers/net/can/pch_can.c b/drivers/net/can/pch_can.c
index e187ca783da0..c1317889d3d8 100644
--- a/drivers/net/can/pch_can.c
+++ b/drivers/net/can/pch_can.c
@@ -559,8 +559,7 @@ static void pch_can_error(struct net_device *ndev, u32 status)
stats->rx_errors++;
break;

Re: [PATCH net-next 0/6] kcm: Kernel Connection Multiplexor (KCM)

2015-11-23 Thread Hannes Frederic Sowa
Hello,

On Fri, Nov 20, 2015, at 22:21, Tom Herbert wrote:
> Kernel Connection Multiplexor (KCM) is a facility that provides a
> message based interface over TCP for generic application protocols.
> The motivation for this is based on the observation that although
> TCP is byte stream transport protocol with no concept of message
> boundaries, a common use case is to implement a framed application
> layer protocol running over TCP. To date, most TCP stacks offer
> byte stream API for applications, which places the burden of message
> delineation, message I/O operation atomicity, and load balancing
> in the application. With KCM an application can efficiently send
> and receive application protocol messages over TCP using a
> datagram interface.

I am struggling a bit to see a real need to come up with a new socket
type and subsystem for that. It looks like you want to solve the same
problem that PACKET_FANOUT does? TCP has the PSH flag, which could help
delimit messages, and improving fanout along the lines of PACKET_FANOUT
would solve this same problem, too? A proper fallback has to be in user
space anyway, but messages could maybe simply be flagged with an skb->mark
and fanout could push them to the correct FANOUT-subsocket.

> In order to delineate messages in a TCP stream for receive in KCM, the
> kernel implements a message parser. For this we chose to employ BPF
> which is applied to the TCP stream. BPF code parses application layer
> messages and returns a message length. Nearly all binary application
> protocols are parsable in this manner, so KCM should be applicable
> across a wide range of applications. Other than message length
> determination in receive, KCM does not require any other application
> specific awareness. KCM does not implement any other application
> protocol semantics -- these are provided in userspace or could be
> implemented in a kernel module layered above KCM.
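The receive-side contract quoted above (inspect the head of the stream, return the length of the next message) can be sketched in plain C. This assumes a hypothetical framing of a 4-byte big-endian payload length followed by the payload; toy_msg_len() and toy_count_msgs() are illustrative names, not BPF and not KCM's actual interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical framing: 4-byte big-endian payload length, then payload. */
#define TOY_HDR_LEN 4

/*
 * Returns the total length (header + payload) of the message starting at
 * buf, or 0 if fewer than TOY_HDR_LEN bytes are available, meaning the
 * length cannot be determined yet.
 */
size_t toy_msg_len(const uint8_t *buf, size_t avail)
{
	uint32_t payload;

	if (avail < TOY_HDR_LEN)
		return 0;
	payload = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
		  ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
	return TOY_HDR_LEN + (size_t)payload;
}

/* Counts complete messages at the front of buf, stopping at a partial one. */
size_t toy_count_msgs(const uint8_t *buf, size_t len)
{
	size_t off = 0, n = 0, mlen;

	while ((mlen = toy_msg_len(buf + off, len - off)) != 0 &&
	       mlen <= len - off) {
		off += mlen;
		n++;
	}
	return n;
}
```

Roughly, toy_msg_len() plays the role the quoted text gives the BPF program, and the loop in toy_count_msgs() plays the role of the stream side that waits until that many bytes have arrived before delivering one message.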

To me this still looks a bit like messages could be delimited by the
TCP PSH flag, over which we might need some more fine-grained control,
and besides that just adding better fanout semantics to TCP, no?

Do kcm sockets still allow streaming unlimited amounts of data? E.g. if
you want to pass a data stream attached to an RPC message? If not, I think
not allowing streaming is a major shortcoming (even though streaming will
induce head-of-line blocking).

> Future support:
> 
>  - Integration with TLS (TLS-in-kernel is a separate initiative).

This is interesting:

Regarding last week's discussion about better OOB support in TCP,
e.g. for SOCKET_DESTROY, do you already have a plan to handle TLS alerts
and do CHANGE_CIPHER on the socket synchronously?

Thanks,
Hannes


Re: [PATCH v2 06/27] brcm80211: move under broadcom vendor directory

2015-11-23 Thread Arend van Spriel

On 11/22/2015 06:23 PM, Kalle Valo wrote:

Arend van Spriel  writes:


On 11/19/2015 08:48 AM, Kalle Valo wrote:

Hauke Mehrtens  writes:


On 11/18/2015 03:45 PM, Kalle Valo wrote:

Part of reorganising wireless drivers directory and Kconfig. Note that I had to
edit Makefiles from subdirectories to use the new location.

Signed-off-by: Kalle Valo 
---


I would prefer to remove the brcm80211 directory in this process and create:
drivers/net/wireless/broadcom/brcmfmac
drivers/net/wireless/broadcom/brcmsmac
drivers/net/wireless/broadcom/brcmutil
drivers/net/wireless/broadcom/include

This way we have one directory less.


I think this could be done separately. This patchset is big enough
already, and I would not like to make it any more complicated.

And I actually like the brcm80211 directory, I would not mind keeping it
still.


I prefer to keep it, as brcmsmac and brcmfmac rely on the brcmutil module,
so I want to keep them together under brcm80211.

So does this patch go in before or after the patches I submitted
before the merge window? I hope after :-p


Sorry, the vendor patches go in first :) It's much safer that way.

But I think that git should be smart enough and your patchset from
before the merge window should still apply without issues.


Will see if that is true when I merge it in our internal repo. :-p

Thanks,
Arend



Re: Deadlock between bind and splice

2015-11-23 Thread Hannes Frederic Sowa
On Mon, Nov 23, 2015, at 09:32, Dmitry Vyukov wrote:
> On Tue, Nov 10, 2015 at 3:59 AM, Al Viro  wrote:
> > On Tue, Nov 10, 2015 at 02:38:54AM +, Al Viro wrote:
> >> On Fri, Nov 06, 2015 at 07:42:15AM -0800, Eric Dumazet wrote:
> >>
> >> > Thank you for this report.
> >> >
> >> > pipe is part of fs, not net ;)
> >>
> >> AF_UNIX bind() vs. socketpair() interplay, OTOH...
> >
> > FWIW, BSD folks unlock the socket for the duration of mknod - mark it as
> > "somebody's trying to bind it" to avoid the fun with racing double bind(),
> > but that's about it.  Tempting, to be honest...
> >
> > BTW, why does unix_autobind() do allocation under ->readlock?  The allocation
> > will be normally used - that if (u->addr) return; part is just dealing with
> > an unlikely race, as far as I can see...
> 
> 
> Hello,
> 
> This is still happening periodically for me. Is there a proposed fix?
> I could test it.

No, we currently have no fix for that report. :/


Re: [PATCH 09/14] net: tcp_memcontrol: simplify linkage between socket and page counter

2015-11-23 Thread Vladimir Davydov
On Fri, Nov 20, 2015 at 01:56:48PM -0500, Johannes Weiner wrote:
> On Fri, Nov 20, 2015 at 03:42:16PM +0300, Vladimir Davydov wrote:
> > On Thu, Nov 12, 2015 at 06:41:28PM -0500, Johannes Weiner wrote:
> > > There won't be any separate counters for socket memory consumed by
> > > protocols other than TCP in the future. Remove the indirection and
> > 
> > I really want to believe you're right. And with vmpressure propagation
> > implemented properly you are likely to be right.
> > 
> > However, we might still want to account other socket protos to
> > memcg->memory in the unified hierarchy, e.g. UDP, or SCTP, or whatever
> > else. Adding new consumers should be trivial, but it will break the
> > legacy usecase, where only TCP sockets are supposed to be accounted.
> > What about adding a check to sock_update_memcg() so that it would enable
> > accounting only for TCP sockets in case legacy hierarchy is used?
> 
> Yup, I was thinking the same thing. But we can cross that bridge when
> we come to it and are actually adding further packet types.

Fair enough.
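The check Vladimir suggests for sock_update_memcg() boils down to a small predicate. A hypothetical sketch, assuming other protocols would only ever be accounted on the unified hierarchy; toy_should_account_sock() is not a kernel function, and the TOY_IPPROTO_* values simply mirror the IANA protocol numbers for TCP, UDP and SCTP.

```c
#include <stdbool.h>

/* Mirrors IPPROTO_TCP / IPPROTO_UDP / IPPROTO_SCTP protocol numbers. */
enum { TOY_IPPROTO_TCP = 6, TOY_IPPROTO_UDP = 17, TOY_IPPROTO_SCTP = 132 };

/* Hypothetical: decide whether a new socket should be charged to its memcg. */
bool toy_should_account_sock(int protocol, bool on_default_hierarchy)
{
	if (on_default_hierarchy)
		return true;                    /* unified hierarchy: any protocol */
	return protocol == TOY_IPPROTO_TCP;     /* legacy: TCP only, as before */
}
```

Keeping the legacy branch this narrow is what preserves the v1 behavior where only TCP sockets are accounted, while leaving room to add consumers on v2.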

> 
> > For the same reason, I think we'd better rename memcg->tcp_mem to
> > something like memcg->sk_mem or we can even drop the cg_proto struct
> > altogether embedding its fields directly to mem_cgroup struct.
> > 
> > Also, I don't see any reason to have tcp_memcontrol.c file. It's tiny
> > and with this patch it does not depend on tcp code any more. Let's move
> > it to memcontrol.c?
> 
> I actually had all this at first, but then wondered if it makes more
> sense to keep the legacy code in isolation. Don't you think it would
> be easier to keep track of what's v1 and what's v2 if we keep the
> legacy stuff physically separate as much as possible? In particular I
> found that 'tcp_mem.' marker really useful while working on the code.
> 
> In the same vein, tcp_memcontrol.c doesn't really hurt anybody and I'd
> expect it to remain mostly unopened and unchanged in the future. But
> if we merge it into memcontrol.c, that code will likely be in the way
> and we'd have to make it explicit somehow that this is not actually
> part of the new memory controller anymore.
> 
> What do you think?

There isn't much code left in tcp_memcontrol.c, and not all of it is
legacy. We still want to call tcp_init_cgroup and tcp_destroy_cgroup
from memcontrol.c - in fact, it's the only call site, so I think we'd
better keep these functions there. Apart from init/destroy, there is
only stuff for handling legacy files, which is relatively small and
isolated. We can just put it along with the memsw and kmem legacy files at
the end of memcontrol.c, adding a comment that it's legacy. Personally,
I'd find the code easier to follow then, because currently the logic
behind the ACTIVE flag as well as memcg->tcp_mem init/use/destroy turns
out to be scattered between two files in different subsystems for no
apparent reason now, as it does not need tcp_prot any more. Besides,
this would allow us to accurately reuse the ACTIVE flag in init/destroy
for inc/dec static branch and probably in sock_update_memcg instead of
sprinkling cgroup_subsys_on_dfl all over the place, which would make the
code a bit cleaner IMO (in fact, that's why I proposed to drop ACTIVATED
bit and replace cg_proto->flags with ->active bool).

Regarding the tcp_mem marker, well, currently it's OK, because we don't
account anything but TCP sockets, but when it changes (and I'm pretty
sure it will), we'll have to rename it anyway. For now, I'm OK with
leaving it as is though.

Thanks,
Vladimir


Re: [PATCH] netfilter: avoid harmless uninitialized variable warnings

2015-11-23 Thread Pablo Neira Ayuso
On Thu, Nov 19, 2015 at 01:49:59PM +0100, Arnd Bergmann wrote:
> Several ARM default configurations give us warnings on recent
> compilers about potentially uninitialized variables in the
> nfnetlink code in two functions:
> 
> net/netfilter/nfnetlink_queue.c: In function 'nfqnl_build_packet_message':
> net/netfilter/nfnetlink_queue.c:519:19: warning: 'nfnl_ct' may be used uninitialized in this function [-Wmaybe-uninitialized]
>   if (ct && nfnl_ct->build(skb, ct, ctinfo, NFQA_CT, NFQA_CT_INFO) < 0)
> 
> Moving the rcu_dereference(nfnl_ct_hook) call outside of the
> conditional code avoids the warning without forcing us to
> preinitialize the variable.

Applied, thanks Arnd.
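The pattern behind Arnd's fix: if a pointer is loaded only under one condition and used later under a related condition, the compiler may be unable to prove it is initialized and warns with -Wmaybe-uninitialized; loading it unconditionally removes the doubt. A minimal userspace sketch, assuming a hypothetical toy_ct_hook in place of nfnl_ct_hook and a plain global load in place of rcu_dereference().

```c
/* Hypothetical stand-in for the conntrack hook table. */
struct toy_ct_hook {
	int (*build)(int ct);
};

static int toy_build(int ct)
{
	return ct > 0 ? 0 : -1; /* pretend negative ct values fail to build */
}

static struct toy_ct_hook toy_hook = { toy_build };

/* In the kernel this pointer is RCU-protected and read via rcu_dereference(). */
struct toy_ct_hook *toy_ct_hook_ptr = &toy_hook;

/*
 * Before (simplified shape of the warning):
 *
 *	struct toy_ct_hook *nfnl_ct;
 *	if (want_ct)
 *		nfnl_ct = toy_ct_hook_ptr;
 *	...
 *	if (ct && nfnl_ct->build(ct) < 0)   <- may warn: maybe-uninitialized
 *
 * After: load the pointer unconditionally, outside the conditional code.
 */
int toy_build_message(int ct)
{
	struct toy_ct_hook *nfnl_ct = toy_ct_hook_ptr;

	if (ct && nfnl_ct->build(ct) < 0)
		return -1;
	return 0;
}
```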



[PATCH] drivers: net: xgene: optimizing the code

2015-11-23 Thread Saurabh Sengar
This patch does the following:
1. removes an unnecessary if/else condition
2. removes one variable
3. changes the return type of 2 functions to void, as their return values
   turn out to always be 0 after the above changes

Signed-off-by: Saurabh Sengar 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 25 +---
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index 991412c..6096d02 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -1084,7 +1084,7 @@ static const struct net_device_ops xgene_ndev_ops = {
 };
 
 #ifdef CONFIG_ACPI
-static int xgene_get_port_id_acpi(struct device *dev,
+static void xgene_get_port_id_acpi(struct device *dev,
  struct xgene_enet_pdata *pdata)
 {
acpi_status status;
@@ -1097,24 +1097,19 @@ static int xgene_get_port_id_acpi(struct device *dev,
pdata->port_id = temp;
}
 
-   return 0;
+   return;
 }
 #endif
 
-static int xgene_get_port_id_dt(struct device *dev, struct xgene_enet_pdata *pdata)
+static void xgene_get_port_id_dt(struct device *dev, struct xgene_enet_pdata *pdata)
 {
u32 id = 0;
-   int ret;
 
-   ret = of_property_read_u32(dev->of_node, "port-id", &id);
-   if (ret) {
-   pdata->port_id = 0;
-   ret = 0;
-   } else {
-   pdata->port_id = id & BIT(0);
-   }
+   of_property_read_u32(dev->of_node, "port-id", &id);
 
-   return ret;
+   pdata->port_id = id & BIT(0);
+
+   return;
 }
 
 static int xgene_get_tx_delay(struct xgene_enet_pdata *pdata)
@@ -1209,13 +1204,11 @@ static int xgene_enet_get_resources(struct xgene_enet_pdata *pdata)
}
 
if (dev->of_node)
-   ret = xgene_get_port_id_dt(dev, pdata);
+   xgene_get_port_id_dt(dev, pdata);
 #ifdef CONFIG_ACPI
else
-   ret = xgene_get_port_id_acpi(dev, pdata);
+   xgene_get_port_id_acpi(dev, pdata);
 #endif
-   if (ret)
-   return ret;
 
if (!device_get_mac_address(dev, ndev->dev_addr, ETH_ALEN))
eth_hw_addr_random(ndev);
-- 
1.9.1
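The cleanup relies on of_property_read_u32() leaving its output untouched when it fails, so with id preinitialized to 0 the old and new code compute the same port_id. A sketch that checks this equivalence with a hypothetical toy_read_u32() stand-in; its present/value parameters are illustrative inputs, not part of any real API.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for of_property_read_u32(): on failure it returns
 * an error and, like the real helper, leaves *out unmodified. */
static int toy_read_u32(int present, uint32_t value, uint32_t *out)
{
	if (!present)
		return -EINVAL;
	*out = value;
	return 0;
}

/* Old logic from the patch's '-' lines */
uint32_t old_port_id(int present, uint32_t value)
{
	uint32_t id = 0;

	if (toy_read_u32(present, value, &id))
		return 0;
	return id & 1u; /* BIT(0) */
}

/* New logic from the '+' lines: a failed read leaves id at 0 */
uint32_t new_port_id(int present, uint32_t value)
{
	uint32_t id = 0;

	toy_read_u32(present, value, &id);
	return id & 1u; /* BIT(0) */
}
```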



Re: [PATCH 9/9] netfilter: implement xt_cgroup cgroup2 path match

2015-11-23 Thread Daniel Borkmann

On 11/23/2015 02:43 PM, Daniel Borkmann wrote:

On 11/21/2015 07:54 PM, Florian Westphal wrote:

Tejun Heo  wrote:

On Sat, Nov 21, 2015 at 05:56:06PM +0100, Florian Westphal wrote:

+struct xt_cgroup_info_v1 {
+	__u8		has_path;
+	__u8		has_classid;
+	__u8		invert_path;
+	__u8		invert_classid;
+	char		path[PATH_MAX];
+	__u32		classid;
+
+	/* kernel internal data */
+	void		*priv __attribute__((aligned(8)));
+};


Ahem.  Am I reading this right? This struct is > 4k in size?
If so -- Ugh.  Does sizeof(path) really have to be PATH_MAX?
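Florian's size concern is easy to check with a userspace mirror of the quoted struct. A sketch, assuming stdint types in place of __u8/__u32, a GCC/Clang compiler for `__attribute__((aligned(8)))`, and Linux's PATH_MAX of 4096 (defined as a fallback if `<limits.h>` does not provide it).

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#ifndef PATH_MAX
#define PATH_MAX 4096 /* Linux value; fallback if <limits.h> omits it */
#endif

/* Userspace mirror of the proposed struct (stdint types for __u8/__u32). */
struct toy_xt_cgroup_info_v1 {
	uint8_t has_path;
	uint8_t has_classid;
	uint8_t invert_path;
	uint8_t invert_classid;
	char path[PATH_MAX];
	uint32_t classid;

	/* kernel internal data */
	void *priv __attribute__((aligned(8)));
};

/* Size of one match-info blob as userspace would lay it out. */
size_t toy_info_size(void)
{
	return sizeof(struct toy_xt_cgroup_info_v1);
}
```

On x86-64 this layout comes to 4112 bytes per rule: 4 flag bytes, 4096 bytes of path, 4 bytes of classid, then the 8-byte-aligned priv pointer.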


Hmmm... yeap, but would this be an actual problem?


Since the rule blob can be allocated via vmalloc, I guess "no", it's not
really a problem unless someone needs a really insane number of such rules.

I don't have any better suggestion, so I guess it's a necessary evil.

The only other question I have is whether PATH_MAX might be a possible
ABI breaker in the future.  It would have to be guaranteed that this is the
same size forever, else you'd get strange errors on rule insertion if
the sizes of the kernel and userspace versions differ.


Haven't looked deeply into kernfs, but if it's possible to get the object
from the struct file eventually, you could let iptables frontend open that
path and just pass the fd down. Would be sizeof(int) vs PATH_MAX then, i.e.
when you have a large number of rules to load.


( ... but with the downside that things like save/restore wouldn't work. )


pull request: bluetooth-next 2015-11-23

2015-11-23 Thread Johan Hedberg
Hi Dave,

Here's the first bluetooth-next pull request for the 4.5 kernel.

 - Add new Get Advertising Size Information management command
 - Add support for new system note message type on monitor channel
 - Refactor LE scan changes behind separate workqueue to avoid races
 - Fix issue with privacy feature when powering on adapter
 - Various minor fixes & cleanups here and there

Please let me know if there are any issues pulling. Thanks.

Johan

---
The following changes since commit d37b4c0a3647db23f41c5ee85701eec356d1:

  be2net: remove local variable 'status' (2015-11-18 15:21:41 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git for-upstream

for you to fetch changes up to dc4270c0cd880f1b28dd48f2a31d869d22da941e:

  Bluetooth: Increment management interface revision (2015-11-23 14:13:32 +0100)


Andrei Emeltchenko (2):
  Bluetooth: Fix mask for H5 header len
  Bluetooth: Use hex notation for mask

Andrzej Kaczmarek (1):
  Bluetooth: Fix powering on with privacy and advertising

Johan Hedberg (28):
  Bluetooth: Remove redundant setting to zero of bt_cb
  Bluetooth: Compress the size of struct hci_ctrl
  Bluetooth: Add clarifying comment why schedule_work is used
  Bluetooth: Remove unnecessary call to hci_update_background_scan
  Bluetooth: Move synchronous request handling into hci_request.c
  Bluetooth: Add 'sync' specifier to synchronous request APIs
  Bluetooth: Add stubs for synchronous HCI request functionality
  Bluetooth: Run all background scan updates through req_workqueue
  Bluetooth: Don't wait for HCI in Add/Remove Device
  Bluetooth: Add HCI status return parameter to hci_req_sync()
  Bluetooth: Use req_workqueue for explicit connect requests
  Bluetooth: Use req_workqueue for background scanning when powering on
  Bluetooth: Make __hci_update_background_scan private to hci_request.c
  Bluetooth: Move LE scan disable/restart behind req_workqueue
  Bluetooth: Add discovery type validity helper
  Bluetooth: Add error return value to hci_req_sync callback
  Bluetooth: Move Start Discovery to req_workqueue
  Bluetooth: Move Stop Discovery to req_workqueue
  Bluetooth: Fix BR/EDR Page Scan update with Add Device
  Bluetooth: Pass inquiry length to bredr_inquiry()
  Bluetooth: Simplify le_scan_disable_work()
  Bluetooth: Remove unnecessary le_scan_restart_work_complete() function
  Bluetooth: Fix specifying role for LE connections
  Bluetooth: Move check for ongoing connect earlier in hci_connect_le()
  Bluetooth: Remove conn_unfinished variable from hci_connect_le()
  Bluetooth: Simplify request cleanup code
  Bluetooth: Fix returning proper HCI status from __hci_req_sync
  Bluetooth: Increment management interface revision

Marcel Holtmann (14):
  Bluetooth: Move BR/EDR default events behind its features
  Bluetooth: Build LE event mask based on supported commands
  Bluetooth: Fix issue with HCI_QUIRK_FIXUP_INQUIRY_MODE and event mask
  Bluetooth: Make LE only events conditional on supported commands
  Bluetooth: Add hci_skb_* helper wrappers for bt_cb(skb) access
  Bluetooth: Use new hci_skb_pkt_* wrappers for core packet handling
  Bluetooth: Use new hci_skb_pkt_* wrappers for drivers
  Bluetooth: Add missing hci_skb_opcode for raw socket commands
  Bluetooth: Fix casting coding style within HCI sockets
  Bluetooth: Add support for sending system notes to monitor channel
  Bluetooth: Add support for controller specific logging
  Bluetooth: Add instance range check for Add Advertising command
  Bluetooth: Simplify if statements in tlv_data_is_valid function
  Bluetooth: Add support for Get Advertising Size Information command

Markus Elfring (2):
  mac802154: Delete an unnecessary check before the function call "kfree_skb"
  Bluetooth: Delete an unnecessary check before the function call "kfree_skb"

Prasanna Karthik (3):
  Bluetooth: clean up af_bluetooth code
  Bluetooth: Clean up hci_core code
  Bluetooth: remove unneeded variable in l2cap_stream_rx

 drivers/bluetooth/bfusb.c         |   9 +-
 drivers/bluetooth/bluecard_cs.c   |  25 +-
 drivers/bluetooth/bpa10x.c        |   4 +-
 drivers/bluetooth/bt3c_cs.c       |  11 +-
 drivers/bluetooth/btmrvl_main.c   |   8 +-
 drivers/bluetooth/btmrvl_sdio.c   |   4 +-
 drivers/bluetooth/btsdio.c        |   6 +-
 drivers/bluetooth/btuart_cs.c     |  11 +-
 drivers/bluetooth/btusb.c         |  48 +--
 drivers/bluetooth/btwilink.c      |   8 +-
 drivers/bluetooth/dtl1_cs.c       |  11 +-
 drivers/bluetooth/hci_ath.c       |   6 +-
 drivers/bluetooth/hci_bcm.c       |   2 +-
 drivers/bluetooth/hci_bcsp.c      |  25 +-
 drivers/bluetooth/hci_h4.c        |  16 +-
 drivers/bluetooth/hci_h5.c        |  18 +-
 

[PATCH] net: cdc_ncm: fix NULL pointer deref in cdc_ncm_bind_common

2015-11-23 Thread Bjørn Mork
Commit 77b0a099674a ("cdc-ncm: use common parser") added a dangerous
new trust in the CDC functional descriptors presented by the device,
unconditionally assuming that any device handled by the driver has
a CDC Union descriptor.

This descriptor is required by the NCM and MBIM specs, but crashing
on non-compliant devices is still unacceptable. Not only will that
allow malicious devices to crash the kernel, but in this case it is
also well known that there are non-compliant real devices on the
market - as shown by the comment accompanying the IAD workaround
in the same function.

The Sierra Wireless EM7305 is an example of such a device, having
a CDC header and a CDC MBIM descriptor but no CDC Union:

Interface Descriptor:
  bLength 9
  bDescriptorType 4
  bInterfaceNumber   12
  bAlternateSetting   0
  bNumEndpoints   1
  bInterfaceClass 2 Communications
  bInterfaceSubClass 14
  bInterfaceProtocol  0
  iInterface  0
  CDC Header:
bcdCDC   1.10
  CDC MBIM:
bcdMBIMVersion   1.00
wMaxControlMessage   4096
bNumberFilters   16
bMaxFilterSize   128
wMaxSegmentSize  4064
bmNetworkCapabilities 0x20
  8-byte ntb input size
  Endpoint Descriptor:
..

The conversion to a common parser also left the local cdc_union
variable untouched.  This caused the IAD workaround code to be applied
to all devices with an IAD descriptor, which was never intended.  Finish
the conversion by testing for hdr.usb_cdc_union_desc instead.

Cc: Oliver Neukum 
Fixes: 77b0a099674a ("cdc-ncm: use common parser")
Signed-off-by: Bjørn Mork 
---
 drivers/net/usb/cdc_ncm.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c
index a187f08113ec..3b1ba8237768 100644
--- a/drivers/net/usb/cdc_ncm.c
+++ b/drivers/net/usb/cdc_ncm.c
@@ -691,7 +691,6 @@ static void cdc_ncm_free(struct cdc_ncm_ctx *ctx)
 
 int cdc_ncm_bind_common(struct usbnet *dev, struct usb_interface *intf, u8 data_altsetting, int drvflags)
 {
-   const struct usb_cdc_union_desc *union_desc = NULL;
struct cdc_ncm_ctx *ctx;
struct usb_driver *driver;
u8 *buf;
@@ -725,15 +724,16 @@ int cdc_ncm_bind_common(struct usbnet *dev, struct usb_interface *intf, u8 data_
/* parse through descriptors associated with control interface */
cdc_parse_cdc_header(, intf, buf, len);
 
-   ctx->data = usb_ifnum_to_if(dev->udev,
-   hdr.usb_cdc_union_desc->bSlaveInterface0);
+   if (hdr.usb_cdc_union_desc)
+   ctx->data = usb_ifnum_to_if(dev->udev,
+   hdr.usb_cdc_union_desc->bSlaveInterface0);
ctx->ether_desc = hdr.usb_cdc_ether_desc;
ctx->func_desc = hdr.usb_cdc_ncm_desc;
ctx->mbim_desc = hdr.usb_cdc_mbim_desc;
ctx->mbim_extended_desc = hdr.usb_cdc_mbim_extended_desc;
 
/* some buggy devices have an IAD but no CDC Union */
-   if (!union_desc && intf->intf_assoc && intf->intf_assoc->bInterfaceCount == 2) {
+   if (!hdr.usb_cdc_union_desc && intf->intf_assoc && intf->intf_assoc->bInterfaceCount == 2) {
ctx->data = usb_ifnum_to_if(dev->udev, intf->cur_altsetting->desc.bInterfaceNumber + 1);
dev_dbg(>dev, "CDC Union missing - got slave from IAD\n");
}
-- 
2.1.4



Re: [PATCH 9/9] netfilter: implement xt_cgroup cgroup2 path match

2015-11-23 Thread Daniel Borkmann

On 11/21/2015 07:54 PM, Florian Westphal wrote:
> Tejun Heo  wrote:
>> On Sat, Nov 21, 2015 at 05:56:06PM +0100, Florian Westphal wrote:
>>>> +struct xt_cgroup_info_v1 {
>>>> +   __u8    has_path;
>>>> +   __u8    has_classid;
>>>> +   __u8    invert_path;
>>>> +   __u8    invert_classid;
>>>> +   char    path[PATH_MAX];
>>>> +   __u32   classid;
>>>> +
>>>> +   /* kernel internal data */
>>>> +   void    *priv __attribute__((aligned(8)));
>>>> +};
>>>
>>> Ahem.  Am I reading this right? This struct is > 4k in size?
>>> If so -- Ugh.  Does sizeof(path) really have to be PATH_MAX?
>>
>> Hmmm... yeap, but would this be an actual problem?
>
> Since the rule blob can be allocated via vmalloc, I guess "no"; it's not
> really a problem unless someone needs a really insane number of such rules.
>
> I don't have any better suggestion, so I guess it's a necessary evil.
>
> The only other question I have is whether PATH_MAX might be a possible
> ABI breaker in future.  It would have to be guaranteed that this is the
> same size forever, else you'd get strange errors on rule insertion if
> the sizes of the kernel and userspace versions differ.

Haven't looked deeply into kernfs, but if it's possible to get the object
from the struct file eventually, you could let the iptables frontend open
that path and just pass the fd down. That would be sizeof(int) vs PATH_MAX
then, which matters when you have a large number of rules to load.


[PATCH] net: fec: no need to test for the return type of of_property_read_u32

2015-11-23 Thread Saurabh Sengar
In case of error there is no need to set num_tx and num_rx to 1, because
of_property_read_u32 leaves these variables unchanged on error, i.e. they stay 1.

Signed-off-by: Saurabh Sengar 
---
 drivers/net/ethernet/freescale/fec_main.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index b2a3220..d2328fc 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -3277,7 +3277,6 @@ static void
 fec_enet_get_queue_num(struct platform_device *pdev, int *num_tx, int *num_rx)
 {
struct device_node *np = pdev->dev.of_node;
-   int err;
 
*num_tx = *num_rx = 1;
 
@@ -3285,13 +3284,9 @@ fec_enet_get_queue_num(struct platform_device *pdev, int *num_tx, int *num_rx)
return;
 
/* parse the num of tx and rx queues */
-   err = of_property_read_u32(np, "fsl,num-tx-queues", num_tx);
-   if (err)
-   *num_tx = 1;
+   of_property_read_u32(np, "fsl,num-tx-queues", num_tx);
 
-   err = of_property_read_u32(np, "fsl,num-rx-queues", num_rx);
-   if (err)
-   *num_rx = 1;
+   of_property_read_u32(np, "fsl,num-rx-queues", num_rx);
 
if (*num_tx < 1 || *num_tx > FEC_ENET_MAX_TX_QS) {
dev_warn(>dev, "Invalid num_tx(=%d), fall back to 1\n",
-- 
1.9.1



Re: severe regression in alx ethernet driver

2015-11-23 Thread Ldap Tester
The severe regression in the alx ethernet driver is still present in the
latest released kernel.  Patches have been posted in
https://bugzilla.kernel.org/show_bug.cgi?id=70761 There are reports
that these patches have resolved the problem.  Could you please review
these patches and see if they can be included upstream?  I have not
been able to update my kernel in almost six months.  Many other users
are suffering from this same problem.

On Mon, Sep 14, 2015 at 6:47 PM, Ldap Tester  wrote:
> There is a serious regression in the alx ethernet driver.  The driver
> stopped working after upgrading the kernel from 4.0.x to 4.1.x.
> Please see https://bugzilla.redhat.com/show_bug.cgi?id=1251434 and
> https://bugzilla.kernel.org/show_bug.cgi?id=70761 This regression is
> urgent, as I cannot update my kernel to include the latest security
> fixes.


Re: [PATCH net-next v4] mpls: support for dead routes

2015-11-23 Thread Robert Shearman

On 21/11/15 05:16, Roopa Prabhu wrote:

From: Roopa Prabhu 

Adds support for RTNH_F_DEAD and RTNH_F_LINKDOWN flags on mpls
routes due to link events. Also adds code to ignore dead
routes during route selection.

Unlike ip routes, mpls routes are not deleted when the route goes
dead. This is current mpls behaviour and this patch does not change
that. With this patch, however, routes will be marked dead.
Dead routes are not notified to userspace (this is consistent with ipv4
routes).


...

v3 - v4:
 - removed per route rt_flags and derive it from the nh_flags during dumps
 - use kmemdup to make a copy of the route during route updates
   due to link events


Looks much better. Thanks for making those changes Roopa.

I've just a couple of minor comments on this new version.


+static inline int mpls_route_alloc_size(int num_nh, u8 max_alen_aligned)


I think the standard practice is to not put inline on functions declared 
in .c files, but instead to just let the compiler use its best judgement 
as to whether it's worth inlining or not.



+{
+   struct mpls_route *rt;
+
+   return (ALIGN(sizeof(*rt) + num_nh * sizeof(*rt->rt_nh),
+ VIA_ALEN_ALIGN) + num_nh * max_alen_aligned);
+}
+



-static void mpls_ifdown(struct net_device *dev)
+static inline bool mpls_route_dev_exists(struct mpls_route *rt,


Ditto.


+struct net_device *dev)
+{
+   for_nexthops(rt) {
+   if (rtnl_dereference(nh->nh_dev) != dev)
+   continue;
+   return true;
+   } endfor_nexthops(rt);
+
+   return false;
+}
+
+static void mpls_ifdown(struct net_device *dev, int event)
  {
struct mpls_route __rcu **platform_label;
struct net *net = dev_net(dev);
-   struct mpls_dev *mdev;
+   struct mpls_route *rt_new;
unsigned index;

platform_label = rtnl_dereference(net->mpls.platform_label);
for (index = 0; index < net->mpls.platform_labels; index++) {
struct mpls_route *rt = rtnl_dereference(platform_label[index]);
+
if (!rt)
continue;
-   for_nexthops(rt) {
+
+   if (!mpls_route_dev_exists(rt, dev))
+   continue;
+
+   rt_new = kmemdup(rt, mpls_route_alloc_size(rt->rt_nhn,
+  rt->rt_max_alen),
+  GFP_KERNEL);


Shouldn't the above line be indented level with the opening bracket of 
kmemdup?



+   if (!rt_new) {
+   pr_warn("mpls_ifdown: kmemdup failed\n");


It isn't safe to leave the current route untouched if the net device is 
being deleted, since a nexthop will be left holding a stale pointer to 
it. Perhaps delete the route entirely in that case?



+   return;
+   }
+
+   for_nexthops(rt_new) {


Since the nexthop is being changed, this should be change_nexthops. I 
know this was a problem in the existing code you are changing in this 
patch, if it isn't too much trouble it would be good to fix this whilst 
reindenting it.



if (rtnl_dereference(nh->nh_dev) != dev)
continue;
-   nh->nh_dev = NULL;
-   } endfor_nexthops(rt);
+   switch (event) {
+   case NETDEV_DOWN:
+   case NETDEV_UNREGISTER:
+   nh->nh_flags |= RTNH_F_DEAD;
+   /* fall through */
+   case NETDEV_CHANGE:
+   nh->nh_flags |= RTNH_F_LINKDOWN;
+   rt_new->rt_nhn_alive--;
+   break;
+   }
+   if (event == NETDEV_UNREGISTER)
+   RCU_INIT_POINTER(nh->nh_dev, NULL);
+   } endfor_nexthops(rt_new);
+
+   mpls_route_update(net, index, rt_new, NULL, false);
}

-   mdev = mpls_dev_get(dev);
-   if (!mdev)
-   return;
+   return;
+}
+
+static void mpls_ifup(struct net_device *dev, unsigned int nh_flags)
+{
+   struct mpls_route __rcu **platform_label;
+   struct net *net = dev_net(dev);
+   struct mpls_route *rt_new;
+   unsigned index;
+   int alive;
+
+   platform_label = rtnl_dereference(net->mpls.platform_label);
+   for (index = 0; index < net->mpls.platform_labels; index++) {
+   struct mpls_route *rt = rtnl_dereference(platform_label[index]);
+
+   if (!rt)
+   continue;
+
+   if (!mpls_route_dev_exists(rt, dev))
+   continue;

-   mpls_dev_sysctl_unregister(mdev);
+   rt_new = kmemdup(rt, 

Re: [PATCH net] vrf: fix double free and memory corruption on register_netdevice failure

2015-11-23 Thread David Miller
From: Nikolay Aleksandrov 
Date: Sat, 21 Nov 2015 19:46:19 +0100

> From: Nikolay Aleksandrov 
> 
> When vrf's ->newlink is called, if register_netdevice() fails then it
> does free_netdev(), but that's also done by rtnl_newlink(), so a second
> free happens and memory gets corrupted. To reproduce, execute the
> following line a couple of times (1 - 5 is usually enough):
> $ for i in `seq 1 5`; do ip link add vrf: type vrf table 1; done;
> This works because register_netdevice() fails due to the invalid
> name "vrf:".
> 
> And here's a trace of one crash:
 ...
> Fixes: 193125dbd8eb ("net: Introduce VRF device driver")
> Signed-off-by: Nikolay Aleksandrov 

Applied and queued up for -stable, thanks.


Re: [PATCH iproute2] Add support for rt_tables.d

2015-11-23 Thread Stephen Hemminger
On Wed, 18 Nov 2015 11:03:20 -0800
David Ahern  wrote:

> Add support for reading table id/name mappings from rt_tables.d
> directory.
> 
> Signed-off-by: David Ahern 
> ---
>  lib/rt_names.c | 18 ++
>  1 file changed, 18 insertions(+)

This is a useful concept, and I am for incorporating it in the future.

The consensus practice for other utilities is to use a '.conf' suffix,
which allows for a README.

Probably should also ship a README file in the package.



[PATCH net-next 05/10] hv_netvsc: Eliminate the data field from struct hv_netvsc_packet

2015-11-23 Thread K. Y. Srinivasan
Eliminate the data field from struct hv_netvsc_packet.

Signed-off-by: K. Y. Srinivasan 
Reviewed-by: Haiyang Zhang 
---
 drivers/net/hyperv/hyperv_net.h   |    5 ++---
 drivers/net/hyperv/netvsc.c       |    5 +++--
 drivers/net/hyperv/netvsc_drv.c   |    3 ++-
 drivers/net/hyperv/rndis_filter.c |   11 +++
 4 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 7fa4f43..506d552 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -148,9 +148,6 @@ struct hv_netvsc_packet {
u64 send_completion_tid;
void *send_completion_ctx;
void (*send_completion)(void *context);
-
-   /* Points to the send/receive buffer where the ethernet frame is */
-   void *data;
struct hv_page_buffer *page_buf;
 };
 
@@ -196,6 +193,7 @@ void netvsc_linkstatus_callback(struct hv_device *device_obj,
 void netvsc_xmit_completion(void *context);
 int netvsc_recv_callback(struct hv_device *device_obj,
struct hv_netvsc_packet *packet,
+   void **data,
struct ndis_tcp_ip_checksum_info *csum_info,
struct vmbus_channel *channel);
 void netvsc_channel_cb(void *context);
@@ -206,6 +204,7 @@ int rndis_filter_device_add(struct hv_device *dev,
 void rndis_filter_device_remove(struct hv_device *dev);
 int rndis_filter_receive(struct hv_device *dev,
struct hv_netvsc_packet *pkt,
+   void **data,
struct vmbus_channel *channel);
 
 int rndis_filter_set_packet_filter(struct rndis_device *dev, u32 new_filter);
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 2de9e7f..8fbf816 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -1008,6 +1008,7 @@ static void netvsc_receive(struct netvsc_device *net_device,
int i;
int count = 0;
struct net_device *ndev;
+   void *data;
 
ndev = net_device->ndev;
 
@@ -1047,13 +1048,13 @@ static void netvsc_receive(struct netvsc_device *net_device,
for (i = 0; i < count; i++) {
/* Initialize the netvsc packet */
netvsc_packet->status = NVSP_STAT_SUCCESS;
-   netvsc_packet->data = (void *)((unsigned long)net_device->
+   data = (void *)((unsigned long)net_device->
recv_buf + vmxferpage_packet->ranges[i].byte_offset);
netvsc_packet->total_data_buflen =
vmxferpage_packet->ranges[i].byte_count;
 
/* Pass it to the upper layer */
-   rndis_filter_receive(device, netvsc_packet, channel);
+   rndis_filter_receive(device, netvsc_packet, , channel);
 
if (netvsc_packet->status != NVSP_STAT_SUCCESS)
status = NVSP_STAT_FAIL;
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 77c0849..c73afb1 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -685,6 +685,7 @@ void netvsc_linkstatus_callback(struct hv_device *device_obj,
  */
 int netvsc_recv_callback(struct hv_device *device_obj,
struct hv_netvsc_packet *packet,
+   void **data,
struct ndis_tcp_ip_checksum_info *csum_info,
struct vmbus_channel *channel)
 {
@@ -713,7 +714,7 @@ int netvsc_recv_callback(struct hv_device *device_obj,
 * Copy to skb. This copy is needed here since the memory pointed by
 * hv_netvsc_packet cannot be deallocated
 */
-   memcpy(skb_put(skb, packet->total_data_buflen), packet->data,
+   memcpy(skb_put(skb, packet->total_data_buflen), *data,
packet->total_data_buflen);
 
skb->protocol = eth_type_trans(skb, net);
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index 63584e7..be0fa9c 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -351,6 +351,7 @@ static inline void *rndis_get_ppi(struct rndis_packet *rpkt, u32 type)
 static void rndis_filter_receive_data(struct rndis_device *dev,
   struct rndis_message *msg,
   struct hv_netvsc_packet *pkt,
+  void **data,
   struct vmbus_channel *channel)
 {
struct rndis_packet *rndis_pkt;
@@ -383,7 +384,7 @@ static void rndis_filter_receive_data(struct rndis_device *dev,
 * the data packet to the stack, without the rndis trailer padding
 */
pkt->total_data_buflen = rndis_pkt->data_len;
-   pkt->data = (void *)((unsigned long)pkt->data + data_offset);
+   *data = 

[PATCH net-next 03/10] hv_netvsc: Eliminate the channel field in hv_netvsc_packet structure

2015-11-23 Thread K. Y. Srinivasan
Eliminate the channel field in hv_netvsc_packet structure.

Signed-off-by: K. Y. Srinivasan 
Reviewed-by: Haiyang Zhang 
---
 drivers/net/hyperv/hyperv_net.h   |   22 ++
 drivers/net/hyperv/netvsc.c       |   19 ---
 drivers/net/hyperv/netvsc_drv.c   |    5 +++--
 drivers/net/hyperv/rndis_filter.c |   10 ++
 4 files changed, 35 insertions(+), 21 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 7435673..ac24091 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -144,7 +144,6 @@ struct hv_netvsc_packet {
u32 total_data_buflen;
u32 pad1;
 
-   struct vmbus_channel *channel;
 
u64 send_completion_tid;
void *send_completion_ctx;
@@ -199,7 +198,8 @@ void netvsc_linkstatus_callback(struct hv_device *device_obj,
 void netvsc_xmit_completion(void *context);
 int netvsc_recv_callback(struct hv_device *device_obj,
struct hv_netvsc_packet *packet,
-   struct ndis_tcp_ip_checksum_info *csum_info);
+   struct ndis_tcp_ip_checksum_info *csum_info,
+   struct vmbus_channel *channel);
 void netvsc_channel_cb(void *context);
 int rndis_filter_open(struct hv_device *dev);
 int rndis_filter_close(struct hv_device *dev);
@@ -207,12 +207,12 @@ int rndis_filter_device_add(struct hv_device *dev,
void *additional_info);
 void rndis_filter_device_remove(struct hv_device *dev);
 int rndis_filter_receive(struct hv_device *dev,
-   struct hv_netvsc_packet *pkt);
+   struct hv_netvsc_packet *pkt,
+   struct vmbus_channel *channel);
 
 int rndis_filter_set_packet_filter(struct rndis_device *dev, u32 new_filter);
 int rndis_filter_set_device_mac(struct hv_device *hdev, char *mac);
 
-
 #define NVSP_INVALID_PROTOCOL_VERSION  ((u32)0x)
 
 #define NVSP_PROTOCOL_VERSION_1  2
@@ -1262,5 +1262,19 @@ struct rndis_message {
 #define TRANSPORT_INFO_IPV6_TCP ((INFO_IPV6 << 16) | INFO_TCP)
 #define TRANSPORT_INFO_IPV6_UDP ((INFO_IPV6 << 16) | INFO_UDP)
 
+static inline struct vmbus_channel *get_channel(struct hv_netvsc_packet *packet,
+   struct netvsc_device *net_device)
+
+{
+   struct vmbus_channel *out_channel;
+
+   out_channel = net_device->chn_table[packet->q_idx];
+   if (!out_channel) {
+   out_channel = net_device->dev->channel;
+   packet->q_idx = 0;
+   }
+   return out_channel;
+}
+
 
 #endif /* _HYPERV_NET_H */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 51e4c0f..52533ed 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -610,6 +610,7 @@ static inline void netvsc_free_send_slot(struct netvsc_device *net_device,
 }
 
 static void netvsc_send_completion(struct netvsc_device *net_device,
+  struct vmbus_channel *incoming_channel,
   struct hv_device *device,
   struct vmpacket_descriptor *packet)
 {
@@ -651,7 +652,7 @@ static void netvsc_send_completion(struct netvsc_device *net_device,
if (send_index != NETVSC_INVALID_INDEX)
netvsc_free_send_slot(net_device, send_index);
q_idx = nvsc_packet->q_idx;
-   channel = nvsc_packet->channel;
+   channel = incoming_channel;
nvsc_packet->send_completion(nvsc_packet->
 send_completion_ctx);
}
@@ -748,7 +749,7 @@ static inline int netvsc_send_pkt(
struct netvsc_device *net_device)
 {
struct nvsp_message nvmsg;
-   struct vmbus_channel *out_channel = packet->channel;
+   struct vmbus_channel *out_channel = get_channel(packet, net_device);
u16 q_idx = packet->q_idx;
struct net_device *ndev = net_device->ndev;
u64 req_id;
@@ -857,13 +858,9 @@ int netvsc_send(struct hv_device *device,
if (!net_device)
return -ENODEV;
 
-   out_channel = net_device->chn_table[q_idx];
-   if (!out_channel) {
-   out_channel = device->channel;
-   q_idx = 0;
-   packet->q_idx = 0;
-   }
-   packet->channel = out_channel;
+   out_channel = get_channel(packet, net_device);
+   q_idx = packet->q_idx;
+
packet->send_buf_index = NETVSC_INVALID_INDEX;
packet->cp_partial = false;
 
@@ -1043,7 +1040,6 @@ static void netvsc_receive(struct netvsc_device *net_device,
}
 
count = vmxferpage_packet->range_cnt;
-   netvsc_packet->channel = channel;
 
/* Each range represents 1 RNDIS pkt that contains 1 ethernet frame */
   
