[PATCH] Revert "net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on bound sockets"

2018-06-12 Thread Bart Van Assche
Revert the patch mentioned in the subject because it breaks at least
the Avahi mDNS daemon. That patch namely causes the Ubuntu 18.04 Avahi
daemon to fail to start:

Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Successfully called chroot().
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Successfully dropped remaining 
capabilities.
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: No service file found in 
/etc/avahi/services.
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: SO_REUSEADDR failed: Structure 
needs cleaning
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: SO_REUSEADDR failed: Structure 
needs cleaning
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Failed to create server: No 
suitable network protocol available
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: avahi-daemon 0.7 exiting.
Jun 12 09:49:24 ubuntu-vm systemd[1]: avahi-daemon.service: Main process 
exited, code=exited, status=255/n/a
Jun 12 09:49:24 ubuntu-vm systemd[1]: avahi-daemon.service: Failed with result 
'exit-code'.
Jun 12 09:49:24 ubuntu-vm systemd[1]: Failed to start Avahi mDNS/DNS-SD Stack.

Fixes: f396922d862a ("net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on 
bound sockets")
Cc: Maciej Żenczykowski 
Cc: Eric Dumazet 
Signed-off-by: Bart Van Assche 
---
 net/core/sock.c | 15 +--
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index f333d75ef1a9..bcc41829a16d 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -728,22 +728,9 @@ int sock_setsockopt(struct socket *sock, int level, int 
optname,
sock_valbool_flag(sk, SOCK_DBG, valbool);
break;
case SO_REUSEADDR:
-   val = (valbool ? SK_CAN_REUSE : SK_NO_REUSE);
-   if ((sk->sk_family == PF_INET || sk->sk_family == PF_INET6) &&
-   inet_sk(sk)->inet_num &&
-   (sk->sk_reuse != val)) {
-   ret = (sk->sk_state == TCP_ESTABLISHED) ? -EISCONN : 
-EUCLEAN;
-   break;
-   }
-   sk->sk_reuse = val;
+   sk->sk_reuse = (valbool ? SK_CAN_REUSE : SK_NO_REUSE);
break;
case SO_REUSEPORT:
-   if ((sk->sk_family == PF_INET || sk->sk_family == PF_INET6) &&
-   inet_sk(sk)->inet_num &&
-   (sk->sk_reuseport != valbool)) {
-   ret = (sk->sk_state == TCP_ESTABLISHED) ? -EISCONN : 
-EUCLEAN;
-   break;
-   }
sk->sk_reuseport = valbool;
break;
case SO_TYPE:
-- 
2.17.0



Re: [PATCH net-next 5/5] net/smc: return booleans instead of integers

2018-01-26 Thread Bart Van Assche
On Fri, 2018-01-26 at 09:28 +0100, Ursula Braun wrote:
> From: Gustavo A. R. Silva 
> 
> Return statements in functions returning bool should use
> true/false instead of 1/0.
> 
> This issue was detected with the help of Coccinelle.
> 
> Signed-off-by: Gustavo A. R. Silva 
> Signed-off-by: Ursula Braun 
> ---
>  net/smc/smc.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/smc/smc.h b/net/smc/smc.h
> index bfbe20234105..9518986c97b1 100644
> --- a/net/smc/smc.h
> +++ b/net/smc/smc.h
> @@ -252,12 +252,12 @@ static inline int smc_uncompress_bufsize(u8 compressed)
>  static inline bool using_ipsec(struct smc_sock *smc)
>  {
>   return (smc->clcsock->sk->sk_policy[0] ||
> - smc->clcsock->sk->sk_policy[1]) ? 1 : 0;
> + smc->clcsock->sk->sk_policy[1]) ? true : false;
>  }

Hello Ursula,

If you ever have to touch this code again, please follow the style of other 
kernel
code and leave out the "? true : false" part and also the parentheses. Both are
superfluous.

Thanks,

Bart.




Re: [PATCH 00/18] prevent bounds-check bypass via speculative execution

2018-01-08 Thread Bart Van Assche

On 01/05/18 22:30, Dan Williams wrote:

On Fri, Jan 5, 2018 at 6:22 PM, Eric W. Biederman  wrote:

Please expand this.

It is not clear what the static analysis is looking for.  Have a clear
description of what is being fixed is crucial for allowing any of these
changes.

For the details given in the change description what I read is magic
changes because a magic process says this code is vulnerable.


Yes, that was my first reaction to the patches as well, I try below to
add some more background and guidance, but in the end these are static
analysis reports across a wide swath of sub-systems. It's going to
take some iteration with domain experts to improve the patch
descriptions, and that's the point of this series, to get the better
trained eyes from the actual sub-system owners to take a look at these
reports.


More information about what the static analysis is looking for would 
definitely be welcome.


Additionally, since the analysis tool is not publicly available, how are 
authors of new kernel code assumed to verify whether or not their code 
needs to use nospec_array_ptr()? How are reviewers of kernel code 
assumed to verify whether or not nospec_array_ptr() is missing where it 
should be used?


Since this patch series only modifies the upstream kernel, how will 
out-of-tree drivers be fixed, e.g. the nVidia driver and the Android 
drivers?


Thanks,

Bart.


Re: [Cluster-devel] [PATCH 00/12] drop unneeded newline

2018-01-02 Thread Bart Van Assche
On Tue, 2018-01-02 at 15:00 +0100, Julia Lawall wrote:
> On Tue, 2 Jan 2018, Bob Peterson wrote:
> > - Original Message -
> > > - Original Message -
> > >
> > Still, the GFS2 and DLM code has a plethora of broken-up printk messages,
> > and I don't like the thought of re-combining them all.
> 
> Actually, the point of the patch was to remove the unnecessary \n at the
> end of the string, because log_print will add another one.  If you prefer
> to keep the string broken up, I can resend the patch in that form, but
> without the unnecessary \n.

Please combine any user-visible strings into a single line for which the
unneeded newline is dropped since these strings are modified anyway by
your patch.

Thanks,

Bart.

Re: [PATCH v2 0/5] Support for generalized use of make C={1,2} via a wrapper program

2017-12-18 Thread Bart Van Assche
On Mon, 2017-12-18 at 10:46 -0700, Jason Gunthorpe wrote:
> On Sun, Dec 17, 2017 at 10:00:17PM -0800, Joe Perches wrote:
> 
> > > Today when we run checkers we get so many warnings it is too hard to
> > > make any sense of it.
> > 
> > Here is a list of the checkpatch messages for drivers/infiniband
> > sorted by type.
> > 
> > Many of these might be corrected by using
> > 
> > $ ./scripts/checkpatch.pl -f --fix-inplace --types= \
> >   $(git ls-files drivers/infiniband/)
> 
> How many of these do you think it is worth to fix?
> 
> We do get a steady trickle of changes in this topic every cycle.
> 
> Is it better to just do a big number of them all at once? Do you have
> an idea how disruptive this kind of work is to the whole patch flow
> eg new patches no longer applying to for-next, backports no longer
> applying, merge conflicts?

In my opinion patches that only change the coding style and do not change any
functionality are annoying. Before posting a patch that fixes a bug the change
history (git log -p) has to be cheched to figure out which patch introduced
the bug. Patches that only change coding style pollute the change history.

Bart.

Re: [PATCH net-next 00/10] net/smc: updates 2017-09-20

2017-09-20 Thread Bart Van Assche
On Wed, 2017-09-20 at 13:58 +0200, Ursula Braun wrote:
> here is a collection of small smc-patches built for net-next improving
> the smc code in different areas.

Hello Ursula,

Can you provide us an update for the timeline of the plan to transition from
PF_SMC to PF_INET/PF_INET6 + SOCK_STREAM? See also
https://www.mail-archive.com/netdev@vger.kernel.org/msg166744.html.

Thanks,

Bart.

Re: [PATCH] net/smc: mark as BROKEN due to remote memory exposure

2017-05-14 Thread Bart Van Assche
On Sun, 2017-05-14 at 11:51 -0400, David Miller wrote:
> From: Christoph Hellwig 
> Date: Sun, 14 May 2017 07:58:48 +0200
> 
> > this patch has not been superceeded by anything, can you explain why
> > it has been marked as such in patchworks?
> 
> I think you're being overbearing by requiring this to be marked BROKEN
> and I would like you to explore other ways with the authors to fix
> whatever perceived problems you think SMC has.
> 
> You claim that this is somehow "urgent" is false.  You can ask
> distributions to disable SMC or whatever in the short term if it
> reallly, truly, bothers you.

Hello Dave,

There is agreement that the user-space API for using the SMC protocol must
be changed, namely by dropping AF_SMC and by making applications use the
SMC protocol through socket(AF_INET..., SOCK_STREAM, ...). What is your
plan to avoid that applications start using and depending on AF_SMC?

Thanks,

Bart.

Re: [PATCH] net/smc: mark as BROKEN due to remote memory exposure

2017-05-11 Thread Bart Van Assche
On Wed, 2017-05-10 at 09:26 +0200, Christoph Hellwig wrote:
> The driver has a lot of quality issues due to the lack of RDMA-side
> review, and explicitly bypasses APIs to register all memory once a
> connection is made, and thus allows remote access to memoery.
> 
> Mark it as broken until at least that part is fixed.

Since this is the only way to get the BROKEN marker in the v4.11 stable
kernel series:

Acked-by: Bart Van Assche <bart.vanass...@sandisk.com>

Re: [RFC iproute2 0/8] RDMA tool

2017-05-08 Thread Bart Van Assche
On Sun, 2017-05-07 at 12:20 +0200, Jiri Pirko wrote:
> Sat, May 06, 2017 at 04:40:24PM CEST, bart.vanass...@sandisk.com wrote:
> > On Sat, 2017-05-06 at 12:40 +0200, Jiri Pirko wrote:
> > > Thu, May 04, 2017 at 08:10:54PM CEST, bart.vanass...@sandisk.com wrote:
> > > > On Thu, 2017-05-04 at 21:02 +0300, Leon Romanovsky wrote:
> > > > > Following our discussion both in mailing list [1] and at the LPC 2016 
> > > > > [2],
> > > > > we would like to propose this RDMA tool to be part of iproute2 package
> > > > > and finally improve this situation.
> > > > 
> > > > Although I really appreciate your work: can you clarify why you would 
> > > > like to
> > > > add *RDMA* functionality to an *IP routing* tool? I haven't found any 
> > > > motivation
> > > > for adding RDMA functionality to iproute2 in [1].
> > > 
> > > Bart, please realize that iproute2 is much more than "*IP routing* tool".
> > > I understand you got confused by the name. Please see sources. Your 
> > > comment
> > > is totally pointless...
> > 
> > I asked for a clarification that should have been in the cover letter but 
> > that
> > was missing from that cover letter. So I think that was the right thing to 
> > do
> 
> I think that was just complete misunderstanding about what iproute2 is.

Hello Jiri,

I do not agree with your reply. The abbreviation "IP" occurs in the package
name and that is a reference to the "Internet Protocol". As far as I know
originally the iproute2 package contained only tools related to the Internet
Protocol. Other tools, e.g. the tipc tool, were added later on. What I'm
wondering about is whether it really is a good idea to add tools like tipc
and rdma to the iproute2 package. The iproute2 package is so essential that
it gets installed on every Linux system, including embedded systems and
smartphones based on Linux. Several companies maintain embedded Linux
distributions and tools to build software images. These tools provide a user
interface that allows to select what packages go into such an image. Adding
tools like tipc and rdma to the iproute2 package makes it harder than
necessary for those who build software images for embedded devices to
minimize the size of such an image. As you probably know even today the size
of a software image still matters for embedded devices. Something else I have
been wondering about is whether bundling the tipc and rdma tools in the
iproute2 package will make the job harder of people who build Android ROMs?
The ip tool is present in every Android ROM, and the size of these ROMs matters
because the larger these ROMs are the less space remains for apps.

Bart.

Re: [RFC iproute2 0/8] RDMA tool

2017-05-06 Thread Bart Van Assche
On Sat, 2017-05-06 at 12:40 +0200, Jiri Pirko wrote:
> Thu, May 04, 2017 at 08:10:54PM CEST, bart.vanass...@sandisk.com wrote:
> > On Thu, 2017-05-04 at 21:02 +0300, Leon Romanovsky wrote:
> > > Following our discussion both in mailing list [1] and at the LPC 2016 [2],
> > > we would like to propose this RDMA tool to be part of iproute2 package
> > > and finally improve this situation.
> > 
> > Although I really appreciate your work: can you clarify why you would like 
> > to
> > add *RDMA* functionality to an *IP routing* tool? I haven't found any 
> > motivation
> > for adding RDMA functionality to iproute2 in [1].
> 
> Bart, please realize that iproute2 is much more than "*IP routing* tool".
> I understand you got confused by the name. Please see sources. Your comment
> is totally pointless...

I asked for a clarification that should have been in the cover letter but that
was missing from that cover letter. So I think that was the right thing to do
instead of pointless. BTW, can you explain why you are using an e-mail address
that is hiding that you are a Mellanox employee?

Bart.

Re: [RFC iproute2 0/8] RDMA tool

2017-05-05 Thread Bart Van Assche
On Thu, 2017-05-04 at 21:45 +0300, Leon Romanovsky wrote:
> It is not hard to create new tool, the hardest part is to ensure that it is
> part of the distributions. Did you count how many months we are trying to
> add rdma-core to debian?

Hello Leon,

Sorry but I was not aware that the effort to add rdma-core to Debian was taking
that long. Please let me know if I can help with that effort.

Bart.

Re: [RFC iproute2 0/8] RDMA tool

2017-05-04 Thread Bart Van Assche
On Thu, 2017-05-04 at 21:25 +0300, Leon Romanovsky wrote:
> On Thu, May 04, 2017 at 06:10:54PM +0000, Bart Van Assche wrote:
> > On Thu, 2017-05-04 at 21:02 +0300, Leon Romanovsky wrote:
> > > Following our discussion both in mailing list [1] and at the LPC 2016 [2],
> > > we would like to propose this RDMA tool to be part of iproute2 package
> > > and finally improve this situation.
> > 
> > Hello Leon,
> > 
> > Although I really appreciate your work: can you clarify why you would like 
> > to
> > add *RDMA* functionality to an *IP routing* tool? I haven't found any 
> > motivation
> > for adding RDMA functionality to iproute2 in [1].
> 
> We are planning to reuse the same infrastructure provided by iproute2,
> like netlink parsing, access to distributions, same CLI and same standards.
> 
> Right now, RDMA is already tightened to netdev: iWARP, RoCE, IPoIB, HFI-VNIC.
> Many drivers (mlx, qed, i40, cxgb) are sharing code between net and
> RDMA.
> 
> I do expect that iproute2 will be installed on every machine with any
> type of connection, including IB and OPA.
> 
> So I think that it is enough to be part of that suite and don't invent
> our own for one specific tool.

Hello Leon,

Sorry but to me that sounds like a weak argument for including RDMA 
functionality
in iproute2. There is already a library for communication over netlink sockets,
namely libnl. Is there functionality that is in iproute2 but not in libnl and
that is needed for the new tool? If so, have you considered to create a new
library for that functionality?

Thanks,

Bart.

Re: [RFC iproute2 0/8] RDMA tool

2017-05-04 Thread Bart Van Assche
On Thu, 2017-05-04 at 21:02 +0300, Leon Romanovsky wrote:
> Following our discussion both in mailing list [1] and at the LPC 2016 [2],
> we would like to propose this RDMA tool to be part of iproute2 package
> and finally improve this situation.

Hello Leon,

Although I really appreciate your work: can you clarify why you would like to
add *RDMA* functionality to an *IP routing* tool? I haven't found any motivation
for adding RDMA functionality to iproute2 in [1].

Thanks,

Bart.

Re: net/smc and the RDMA core

2017-05-02 Thread Bart Van Assche
On Tue, 2017-05-02 at 14:25 +0200, Ursula Braun wrote:
> if you can point out specific issues, we will be happy to work with you
> to get them addressed!

Hello Ursula,

My list of issues that I would like to see addressed can be found below. Doug,
Christoph and others may have additional inputs. The issues that have not yet
been mentioned in other e-mails are:
- The SMC driver only supports one RDMA transport type (RoCE v1) but
  none of the other RDMA transport types (RoCE v2, IB and iWARP). New
  RDMA drivers should support all RDMA transport types transparently.
  The traditional approach to support multiple RDMA transport types is
  by using the RDMA/CM to establish connections.
- The implementation of the SMC driver only supports RoCEv1. This is
  a very unfortunate choice because:
  - RoCEv1 is not routable and hence is limited to a single Ethernet
broadcast domain.
  - RoCEv1 packets escape a whole bunch of mechanisms that only work
for IP packets. Firewalls pass all RoCEv1 packets and switches
do not restrict RoCEv1 packets to a single VLAN. This means that
if the network configuration is changed after an SMC connection
has been set up such that IP communication between the endpoints
of an SMC connection is blocked that the SMC RoCEv1 packets will
not be blocked by the network equipment of which the configuration
has just been changed.
- As already mentioned by Christoph, the SMC implementation uses RDMA
  calls that probably will be deprecated soon (ib_create_cq()) and
  should use the RDMA R/W API instead of building sge lists itself.

Bart.

Re: net/smc and the RDMA core

2017-05-02 Thread Bart Van Assche
On Tue, 2017-05-02 at 14:41 +0200, Ursula Braun wrote:
> On 05/01/2017 07:55 PM, Parav Pandit wrote:
> > Hi Bart, Ursula, Dave,
> > 
> > I am particularly concerned about SMC as address family.
> > It should not be treated as address family, but rather an additional
> > protocol similar for socket type SOCK_STREAM.
> 
> We tried to avoid changes of the kernel TCP code. A new address family
> seemed to be a feasible way to achieve this.

Hello Ursula,

I agree with Parav that introducing a new address family for SMC was an
unfortunate choice. As one can see in e.g. the implementation of the SCTP
protocol no changes to the TCP implementation are needed to add support
for a new SOCK_STREAM protocol. I think the SCTP implementation uses
inet_register_protosw() to register itself dynamically.

Bart.

Re: net/smc and the RDMA core

2017-05-01 Thread Bart Van Assche
On Mon, 2017-05-01 at 18:33 +0200, Christoph Hellwig wrote:
> Hi Ursual, hi netdev reviewers,
> 
> how did the smc protocol manage to get merged without any review 
> on linux-rdma at all?  As the results it seems it's very substandard
> in terms of RDMA API usage, e.g. it neither uses the proper CQ API
> nor the RDMA R/W API, and other will probably find additional issues
> as well.

Hello Dave and Ursula,

It seems very rude to me to have merged the SMC protocol driver without
having involved the linux-rdma community. Anyway, I have the following
questions for Dave and Ursula:
* Since the Linux kernel is standards based: where can we find the standard
  that defines the SMC wire protocol? If this protocol has not been
  standardized yet: in what file (other than *.[ch]) in the Linux kernel
  tree has this protocol been documented?
* What are the differences between the SMC protocol, the SDP protocol and
  the rsockets protocol? How do existing implementations for these protocols
  compare to each other from a performance point of view? If no performance
  comparison between these protocols is available, shouldn't the performance
  of these protocols have been compared with each other before a review of
  the SMC driver even started?
* What are the reasons why the SDP driver was never accepted upstream? Do
  the arguments why SDP was not accepted upstream also apply to the SMC
  driver (SDP = Sockets Direct Protocol)?
* Since SMC has to be selected by specifying AF_SMC, how are users expected
  to specify whether AF_INET, AF_INET6 or yet another address family should
  be used to set up a connection between SMC
endpoints?
* Is the SMC driver limited to RoCE? Are you aware that the rsockets library
  supports multiple transport layers (RoCE, IB and iWARP)?
* Since functionality that is similar what the SMC driver provides already
  exists in user space (rsockets), why has this functionality been
  reimplemented as a kernel driver (SMC)?

Bart.

Re: [patch] net/mlx4: && vs & typo

2017-02-28 Thread Bart Van Assche
On 02/28/2017 02:23 PM, Joe Perches wrote:
> On Tue, 2017-02-28 at 15:35 +0000, Bart Van Assche wrote:
>> On Tue, 2017-02-28 at 15:02 +0300, Dan Carpenter wrote:
>>> Bitwise & was obviously intended here.
> []
>>> diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h
> []
>>> @@ -109,7 +109,7 @@ static inline void  (u8 *addr, u64 mac)
>>> int i;
>>>  
>>> for (i = ETH_ALEN; i > 0; i--) {
>>> -   addr[i - 1] = mac && 0xFF;
>>> +   addr[i - 1] = mac & 0xFF;
>>> mac >>= 8;
>>> }
>>>  }
>>
>> Is this the only place where such a loop occurs?
> 
> Seems to be.
> 
>> Should a put_unaligned_be48()
>> function be introduced?
> 
> Why?  This is used exactly once.

Really? Here is an example of another open-coded version of
put_unaligned_be48() from arch/mips/cavium-octeon/octeon-platform.c:

new_mac[0] = (mac >> 40) & 0xff;
new_mac[1] = (mac >> 32) & 0xff;
new_mac[2] = (mac >> 24) & 0xff;
new_mac[3] = (mac >> 16) & 0xff;
new_mac[4] = (mac >> 8) & 0xff;
new_mac[5] = mac & 0xff;

Bart.


Re: [patch] net/mlx4: && vs & typo

2017-02-28 Thread Bart Van Assche
On Tue, 2017-02-28 at 15:02 +0300, Dan Carpenter wrote:
> Bitwise & was obviously intended here.
> 
> Fixes: 745d8ae4622c ("net/mlx4: Spoofcheck and zero MAC can't coexist")
> Signed-off-by: Dan Carpenter 
> ---
> Applies to net.git.
> 
> diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h
> index e965e5090d96..a858bcb6220b 100644
> --- a/include/linux/mlx4/driver.h
> +++ b/include/linux/mlx4/driver.h
> @@ -109,7 +109,7 @@ static inline void mlx4_u64_to_mac(u8 *addr, u64 mac)
>   int i;
>  
>   for (i = ETH_ALEN; i > 0; i--) {
> - addr[i - 1] = mac && 0xFF;
> + addr[i - 1] = mac & 0xFF;
>   mac >>= 8;
>   }
>  }

Is this the only place where such a loop occurs? Should a put_unaligned_be48()
function be introduced?

Bart.

Re: [RFC v3 01/11] IB/hfi-vnic: Virtual Network Interface Controller (VNIC) documentation

2017-02-08 Thread Bart Van Assche
On Tue, 2017-02-07 at 12:23 -0800, Vishwanathapura, Niranjana wrote:
> diff --git a/Documentation/infiniband/hfi_vnic.txt 
> b/Documentation/infiniband/hfi_vnic.txt
> new file mode 100644
> index 000..c6c801e
> --- /dev/null
> +++ b/Documentation/infiniband/hfi_vnic.txt
> @@ -0,0 +1,102 @@
> +Intel Omni-Path Host Fabric Interface (HFI) Virtual Network Interface
> +Controller (VNIC) feature supports Ethernet functionality over Omni-Path
> +fabric by encapsulating the Ethernet packets between HFI nodes.

Please add a reference to the document that defines the Ethernet encapsulation
wire protocol.

> +The patterns of exchanges of Omni-Path encapsulated Ethernet packets
> +involves one or more virtual Ethernet switches overlaid on the Omni-Path
> +fabric topology. A subset of HFI nodes on the Omni-Path fabric are
> +permitted to exchange encapsulated Ethernet packets across a particular
> +virtual Ethernet switch. The virtual Ethernet switches are logical
> +abstractions achieved by configuring the HFI nodes on the fabric for
> +header generation and processing. In the simplest configuration all HFI
> +nodes across the fabric exchange encapsulated Ethernet packets over a
> +single virtual Ethernet switch. A virtual Ethernet switch, is effectively
> +an independent Ethernet network. The configuration is performed by an
> +Ethernet Manager (EM) which is part of the trusted Fabric Manager (FM)
> +application. HFI nodes can have multiple VNICs each connected to a
> +different virtual Ethernet switch. The below diagram presents a case
> +of two virtual Ethernet switches with two HFI nodes.

Please elaborate this section. What is a virtual Ethernet switch? Is it a
software entity or something that is implemented in hardware? Also, how are
these independent Ethernet networks identified on the wire? The Linux kernel
already supports IB partitions and Ethernet VLANs. How do these independent
Ethernet networks compare to IB partitions and Ethernet VLANs? Which wire-
level header contains the identity of these Ethernet networks? Is it
possible to query from user space which Ethernet network a VNIC belongs to?
If so, with which API and which tools?

Thanks,

Bart.


Re: [RFC v3 00/11] HFI Virtual Network Interface Controller (VNIC)

2017-02-07 Thread Bart Van Assche
On Tue, 2017-02-07 at 16:54 -0800, Vishwanathapura, Niranjana wrote:
> On Tue, Feb 07, 2017 at 09:58:50PM +0000, Bart Van Assche wrote:
> > On Tue, 2017-02-07 at 21:44 +, Hefty, Sean wrote:
> > > This is Ethernet - not IP - encapsulation over a non-InfiniBand 
> > > device/protocol.
> > 
> > That's more than clear from the cover letter. In my opinion the cover letter
> > should explain why it is considered useful to have such a driver upstream
> > and what the use cases are of encapsulating Ethernet frames inside RDMA
> > packets.
> 
> We believe on our HW, HFI VNIC design gives better hardware resource usage 
> which is also scalable and hence room for better performance.
> Also as evident in the cover letter, it gives us better manageability by 
> defining virtual Ethernet switches overlaid on the fabric and
> use standard Ethernet support provided by Linux.

That kind of language is appropriate for a marketing brochure but not for a
technical forum. Even reading your statement twice did not make me any wiser.
You mentioned "better hardware resource usage". Compared to what? Is that
perhaps compared to IPoIB? Since Ethernet frames have an extra header and are
larger than IPoIB frames, how can larger frames result in better hardware
resource usage? And what is a virtual Ethernet switch? Is this perhaps packet
forwarding by software? If so, why are virtual Ethernet switches needed since
the Linux networking stack already supports packet forwarding?

Thanks,

Bart.


Re: [RFC v3 00/11] HFI Virtual Network Interface Controller (VNIC)

2017-02-07 Thread Bart Van Assche
On Tue, 2017-02-07 at 21:44 +, Hefty, Sean wrote:
> This is Ethernet - not IP - encapsulation over a non-InfiniBand 
> device/protocol.

That's more than clear from the cover letter. In my opinion the cover letter
should explain why it is considered useful to have such a driver upstream
and what the use cases are of encapsulating Ethernet frames inside RDMA
packets.

Bart.


Re: [RFC v3 00/11] HFI Virtual Network Interface Controller (VNIC)

2017-02-07 Thread Bart Van Assche
On Tue, 2017-02-07 at 12:22 -0800, Vishwanathapura, Niranjana wrote:
> Intel Omni-Path Host Fabric Interface (HFI) Virtual Network Interface
> Controller (VNIC) feature supports Ethernet functionality over Omni-Path
> fabric by encapsulating the Ethernet packets between HFI nodes.

This may have been stated before, but what is missing from this description
is an explanation of why accepting an Ethernet over RDMA driver in the
upstream kernel is considered useful. We already have an IPoIB driver, so
why to add another driver to the kernel tree for communicating IP packets
over an RDMA network?

Bart.


Re: [PATCH 4.10-rc3 09/13] iscsi: fix build errors when linux/phy*.h is removed from net/dsa.h

2017-01-31 Thread Bart Van Assche
On Tue, 2017-01-31 at 19:19 +, Russell King wrote:
> drivers/target/iscsi/iscsi_target_login.c:1135:7: error: implicit declaration 
> of function 'try_module_get' [-Werror=implicit-function-declaration]
> 
> Add linux/module.h to iscsi_target_login.c.
> 
> Signed-off-by: Russell King <rmk+ker...@armlinux.org.uk>
> ---
>  drivers/target/iscsi/iscsi_target_login.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/target/iscsi/iscsi_target_login.c 
> b/drivers/target/iscsi/iscsi_target_login.c
> index 450f51deb2a2..eab274d17b5c 100644
> --- a/drivers/target/iscsi/iscsi_target_login.c
> +++ b/drivers/target/iscsi/iscsi_target_login.c
> @@ -17,6 +17,7 @@
>   
> **/
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 

Reviewed-by: Bart Van Assche <bart.vanass...@sandisk.com>

Re: [PATCH] cxgbit: use T6 specific macro to set force bit

2017-01-25 Thread Bart Van Assche
On Tue, 2017-01-24 at 17:07 +0530, Varun Prakash wrote:
> For T6 adapters use T6 specific macro to set force bit.

Thanks, I have applied this patch.

Bart.

[PATCH v3 09/37] RDS: IB: Remove an unused structure member

2017-01-20 Thread Bart Van Assche
Signed-off-by: Bart Van Assche <bart.vanass...@sandisk.com>
Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
Cc: David S. Miller <da...@davemloft.net>
Cc: linux-r...@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: rds-de...@oss.oracle.com
---
 net/rds/ib_mr.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/rds/ib_mr.h b/net/rds/ib_mr.h
index 1c754f4acbe5..24c086db4511 100644
--- a/net/rds/ib_mr.h
+++ b/net/rds/ib_mr.h
@@ -45,7 +45,6 @@
 
 struct rds_ib_fmr {
struct ib_fmr   *fmr;
-   u64 *dma;
 };
 
 enum rds_ib_fr_state {
-- 
2.11.0



Re: [RFC] RESEND - rdmatool - tool for RDMA users

2017-01-19 Thread Bart Van Assche
On Thu, 2017-01-19 at 11:03 -0700, Jason Gunthorpe wrote:
> sysfs is unpopular because the 'one value per file' dogma is laregly
> unsuitable for complex mulit-value atomic changes which are common in
> netdev. You can force it to work, but it is pretty horrible..
> 
> It is also very expensive if you want to shuttle a lot of data, eg I
> could not see doing something like 'netstat' for IB through sysfs

Since the RDMA sysfs ABI defines a user space ABI and since user space
ABIs must be backwards compatible removing the existing sysfs ABI is
not an option. We will need to evaluate on a case-by-case basis whether
new functionality should use sysfs or whether another mechanism should
be used.

Bart.

Re: [RFC] RESEND - rdmatool - tool for RDMA users

2017-01-18 Thread Bart Van Assche
On 01/18/2017 10:31 AM, Jason Gunthorpe wrote:
> I think it depends on what this tool is supposed to cover, but based
> on the description, I would start with netlink-only.
> 
> The only place verbs covers a similar ground is in 'device
> capabilities' - for some of that you might want to open a new-uAPI
> verbs fd, but even the capability data from that would not be
> totally offensive to be accessed over netlink.
> 
> IMHO netlink should cover almost everything found in sysfs today.

We would need a very strong argument to introduce a netlink API that
duplicates existing sysfs API functionality. Since the sysfs API is
extensible, why not extend that API further? E.g. the SCST sysfs API
shows that more is possible with sysfs than what most kernel drivers
realize.

Bart.


[PATCH v2 12/26] IB: Convert ib_dma_*_coherent() argument type from u64 into dma_addr_t

2017-01-12 Thread Bart Van Assche
This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanass...@sandisk.com>
Reviewed-by: Leon Romanovsky <leo...@mellanox.com>
Cc: David S. Miller <da...@davemloft.net>
Cc: linux-r...@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: rds-de...@oss.oracle.com
---
 include/rdma/ib_verbs.h | 11 +++
 net/rds/ib.h|  6 +++---
 2 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 89e80eb77e06..de8dfb61d2b6 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -3109,15 +3109,10 @@ static inline void ib_dma_sync_single_for_device(struct 
ib_device *dev,
  */
 static inline void *ib_dma_alloc_coherent(struct ib_device *dev,
   size_t size,
-  u64 *dma_handle,
+  dma_addr_t *dma_handle,
   gfp_t flag)
 {
-   dma_addr_t handle;
-   void *ret;
-
-   ret = dma_alloc_coherent(dev->dma_device, size, , flag);
-   *dma_handle = handle;
-   return ret;
+   return dma_alloc_coherent(dev->dma_device, size, dma_handle, flag);
 }
 
 /**
@@ -3129,7 +3124,7 @@ static inline void *ib_dma_alloc_coherent(struct 
ib_device *dev,
  */
 static inline void ib_dma_free_coherent(struct ib_device *dev,
size_t size, void *cpu_addr,
-   u64 dma_handle)
+   dma_addr_t dma_handle)
 {
dma_free_coherent(dev->dma_device, size, cpu_addr, dma_handle);
 }
diff --git a/net/rds/ib.h b/net/rds/ib.h
index 45ac8e8e58f4..d21ca88ab628 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -134,7 +134,7 @@ struct rds_ib_connection {
struct rds_ib_work_ring i_send_ring;
struct rm_data_op   *i_data_op;
struct rds_header   *i_send_hdrs;
-   u64 i_send_hdrs_dma;
+   dma_addr_t  i_send_hdrs_dma;
struct rds_ib_send_work *i_sends;
atomic_ti_signaled_sends;
 
@@ -144,7 +144,7 @@ struct rds_ib_connection {
struct rds_ib_incoming  *i_ibinc;
u32 i_recv_data_rem;
struct rds_header   *i_recv_hdrs;
-   u64 i_recv_hdrs_dma;
+   dma_addr_t  i_recv_hdrs_dma;
struct rds_ib_recv_work *i_recvs;
u64 i_ack_recv; /* last ACK received */
struct rds_ib_refill_cache i_cache_incs;
@@ -161,7 +161,7 @@ struct rds_ib_connection {
struct rds_header   *i_ack;
struct ib_send_wr   i_ack_wr;
struct ib_sge   i_ack_sge;
-   u64 i_ack_dma;
+   dma_addr_t  i_ack_dma;
unsigned long   i_ack_queued;
 
/* Flow control related information
-- 
2.11.0



[PATCH v2 11/26] RDS: IB: Remove an unused structure member

2017-01-12 Thread Bart Van Assche
Signed-off-by: Bart Van Assche <bart.vanass...@sandisk.com>
Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
Cc: David S. Miller <da...@davemloft.net>
Cc: linux-r...@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: rds-de...@oss.oracle.com
---
 net/rds/ib_mr.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/rds/ib_mr.h b/net/rds/ib_mr.h
index 1c754f4acbe5..24c086db4511 100644
--- a/net/rds/ib_mr.h
+++ b/net/rds/ib_mr.h
@@ -45,7 +45,6 @@
 
 struct rds_ib_fmr {
struct ib_fmr   *fmr;
-   u64 *dma;
 };
 
 enum rds_ib_fr_state {
-- 
2.11.0



[PATCH 7/9] RDS: IB: Remove an unused structure member

2017-01-10 Thread Bart Van Assche
Signed-off-by: Bart Van Assche <bart.vanass...@sandisk.com>
Cc: Santosh Shilimkar <ssant...@kernel.org>
Cc: Santosh Shilimkar <santosh.shilim...@oracle.com>
Cc: David S. Miller <da...@davemloft.net>
Cc: linux-r...@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: rds-de...@oss.oracle.com
---
 net/rds/ib_mr.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/rds/ib_mr.h b/net/rds/ib_mr.h
index 1c754f4acbe5..24c086db4511 100644
--- a/net/rds/ib_mr.h
+++ b/net/rds/ib_mr.h
@@ -45,7 +45,6 @@
 
 struct rds_ib_fmr {
struct ib_fmr   *fmr;
-   u64 *dma;
 };
 
 enum rds_ib_fr_state {
-- 
2.11.0



[PATCH 9/9] treewide: Inline ib_dma_map_*() functions

2017-01-10 Thread Bart Van Assche
Almost all changes in this patch except the removal of local variables
that became superfluous and the actual removal of the ib_dma_map_*()
functions have been generated as follows:

git grep -lE 'ib_(sg_|)dma_' |
  xargs -d\\n \
sed -i -e 
's/\([^[:alnum:]_]\)ib_dma_\([^(]*\)(\&\([^,]\+\),/\1dma_\2(\3.dma_device,/g' \
   -e 
's/\([^[:alnum:]_]\)ib_dma_\([^(]*\)(\([^,]\+\),/\1dma_\2(\3->dma_device,/g' \
   -e 's/ib_sg_dma_\(len\|address\)(\([^,]\+\), /sg_dma_\1(/g'

Signed-off-by: Bart Van Assche <bart.vanass...@sandisk.com>
Reviewed-by: Christoph Hellwig <h...@lst.de>
Cc: Andreas Dilger <andreas.dil...@intel.com>
Cc: Anna Schumaker <anna.schuma...@netapp.com>
Cc: David S. Miller <da...@davemloft.net>
Cc: Eric Van Hensbergen <eri...@gmail.com>
Cc: James Simmons <jsimm...@infradead.org>
Cc: Latchesar Ionkov <lu...@ionkov.net>
Cc: Oleg Drokin <oleg.dro...@intel.com>
Cc: Ron Minnich <rminn...@sandia.gov>
Cc: Trond Myklebust <trond.mykleb...@primarydata.com>
Cc: de...@driverdev.osuosl.org
Cc: linux-...@vger.kernel.org
Cc: linux-n...@lists.infradead.org
Cc: linux-r...@vger.kernel.org
Cc: lustre-de...@lists.lustre.org
Cc: netdev@vger.kernel.org
Cc: rds-de...@oss.oracle.com
Cc: target-de...@vger.kernel.org
Cc: v9fs-develo...@lists.sourceforge.net
---
 drivers/infiniband/core/mad.c  |  28 +--
 drivers/infiniband/core/rw.c   |  30 ++-
 drivers/infiniband/core/umem.c |   4 +-
 drivers/infiniband/core/umem_odp.c |   6 +-
 drivers/infiniband/hw/mlx4/cq.c|   2 +-
 drivers/infiniband/hw/mlx4/mad.c   |  28 +--
 drivers/infiniband/hw/mlx4/mr.c|   4 +-
 drivers/infiniband/hw/mlx4/qp.c|  10 +-
 drivers/infiniband/hw/mlx5/mr.c|   4 +-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c|  20 +-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c|  22 +--
 drivers/infiniband/ulp/iser/iscsi_iser.c   |   6 +-
 drivers/infiniband/ulp/iser/iser_initiator.c   |  38 ++--
 drivers/infiniband/ulp/iser/iser_memory.c  |  12 +-
 drivers/infiniband/ulp/iser/iser_verbs.c   |   2 +-
 drivers/infiniband/ulp/isert/ib_isert.c|  60 +++---
 drivers/infiniband/ulp/srp/ib_srp.c|  50 +++--
 drivers/infiniband/ulp/srpt/ib_srpt.c  |  10 +-
 drivers/nvme/host/rdma.c   |  22 +--
 drivers/nvme/target/rdma.c |  20 +-
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h|  14 +-
 include/rdma/ib_verbs.h| 218 -
 net/9p/trans_rdma.c|  12 +-
 net/rds/ib.h   |  39 
 net/rds/ib_cm.c|  18 +-
 net/rds/ib_fmr.c   |  10 +-
 net/rds/ib_frmr.c  |   8 +-
 net/rds/ib_rdma.c  |   6 +-
 net/rds/ib_recv.c  |  14 +-
 net/rds/ib_send.c  |  28 +--
 net/sunrpc/xprtrdma/fmr_ops.c  |   6 +-
 net/sunrpc/xprtrdma/frwr_ops.c |   6 +-
 net/sunrpc/xprtrdma/rpc_rdma.c |  14 +-
 net/sunrpc/xprtrdma/svc_rdma_backchannel.c |   4 +-
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c|   8 +-
 net/sunrpc/xprtrdma/svc_rdma_sendto.c  |  14 +-
 net/sunrpc/xprtrdma/svc_rdma_transport.c   |   8 +-
 net/sunrpc/xprtrdma/verbs.c|   8 +-
 38 files changed, 275 insertions(+), 538 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index a009f7132c73..2d51f0bdc13f 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -1152,21 +1152,21 @@ int ib_send_mad(struct ib_mad_send_wr_private 
*mad_send_wr)
 
mad_agent = mad_send_wr->send_buf.mad_agent;
sge = mad_send_wr->sg_list;
-   sge[0].addr = ib_dma_map_single(mad_agent->device,
+   sge[0].addr = dma_map_single(mad_agent->device->dma_device,
mad_send_wr->send_buf.mad,
sge[0].length,
DMA_TO_DEVICE);
-   if (unlikely(ib_dma_mapping_error(mad_agent->device, sge[0].addr)))
+   if (unlikely(dma_mapping_error(mad_agent->device->dma_device, 
sge[0].addr)))
return -ENOMEM;
 
mad_send_wr->header_mapping = sge[0].addr;
 
-   sge[1].addr = ib_dma_map_single(mad_agent->device,
+   sge[1].addr = dma_map_single(mad_agent->device->dma_device,
ib_get_payload(mad_send_wr),
sge[1].length,

[PATCH 8/9] IB: Convert ib_dma_*_coherent() argument type from u64 into dma_addr_t

2017-01-10 Thread Bart Van Assche
This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanass...@sandisk.com>
Cc: David S. Miller <da...@davemloft.net>
Cc: linux-r...@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: rds-de...@oss.oracle.com
---
 include/rdma/ib_verbs.h | 11 +++
 net/rds/ib.h|  6 +++---
 2 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 89e80eb77e06..de8dfb61d2b6 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -3109,15 +3109,10 @@ static inline void ib_dma_sync_single_for_device(struct 
ib_device *dev,
  */
 static inline void *ib_dma_alloc_coherent(struct ib_device *dev,
   size_t size,
-  u64 *dma_handle,
+  dma_addr_t *dma_handle,
   gfp_t flag)
 {
-   dma_addr_t handle;
-   void *ret;
-
-   ret = dma_alloc_coherent(dev->dma_device, size, , flag);
-   *dma_handle = handle;
-   return ret;
+   return dma_alloc_coherent(dev->dma_device, size, dma_handle, flag);
 }
 
 /**
@@ -3129,7 +3124,7 @@ static inline void *ib_dma_alloc_coherent(struct 
ib_device *dev,
  */
 static inline void ib_dma_free_coherent(struct ib_device *dev,
size_t size, void *cpu_addr,
-   u64 dma_handle)
+   dma_addr_t dma_handle)
 {
dma_free_coherent(dev->dma_device, size, cpu_addr, dma_handle);
 }
diff --git a/net/rds/ib.h b/net/rds/ib.h
index 45ac8e8e58f4..d21ca88ab628 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -134,7 +134,7 @@ struct rds_ib_connection {
struct rds_ib_work_ring i_send_ring;
struct rm_data_op   *i_data_op;
struct rds_header   *i_send_hdrs;
-   u64 i_send_hdrs_dma;
+   dma_addr_t  i_send_hdrs_dma;
struct rds_ib_send_work *i_sends;
atomic_ti_signaled_sends;
 
@@ -144,7 +144,7 @@ struct rds_ib_connection {
struct rds_ib_incoming  *i_ibinc;
u32 i_recv_data_rem;
struct rds_header   *i_recv_hdrs;
-   u64 i_recv_hdrs_dma;
+   dma_addr_t  i_recv_hdrs_dma;
struct rds_ib_recv_work *i_recvs;
u64 i_ack_recv; /* last ACK received */
struct rds_ib_refill_cache i_cache_incs;
@@ -161,7 +161,7 @@ struct rds_ib_connection {
struct rds_header   *i_ack;
struct ib_send_wr   i_ack_wr;
struct ib_sge   i_ack_sge;
-   u64 i_ack_dma;
+   dma_addr_t  i_ack_dma;
unsigned long   i_ack_queued;
 
/* Flow control related information
-- 
2.11.0



Re: [PATCH] linux/types.h: enable endian checks for all sparse builds

2016-12-09 Thread Bart Van Assche
On 12/08/16 22:40, Madhani, Himanshu wrote:
> We’ll take a look and send patches to resolve these warnings.

Thanks!

Bart.



Re: [PATCH] linux/types.h: enable endian checks for all sparse builds

2016-12-08 Thread Bart Van Assche
On 12/07/16 21:54, Michael S. Tsirkin wrote:
> On Thu, Dec 08, 2016 at 05:21:47AM +0000, Bart Van Assche wrote:
>> Additionally, there are notable exceptions to the rule that most drivers
>> are endian-clean, e.g. drivers/scsi/qla2xxx. I would appreciate it if it
>> would remain possible to check such drivers with sparse without enabling
>> endianness checks. Have you considered to change #ifdef __CHECK_ENDIAN__
>> into e.g. #ifndef __DONT_CHECK_ENDIAN__?
>
> The right thing is probably just to fix these, isn't it?
> Until then, why not just ignore the warnings?

Neither option is realistic. With endian-checking enabled the qla2xxx 
driver triggers so many warnings that it becomes a real challenge to 
filter the non-endian warnings out manually:

$ for f in "" CF=-D__CHECK_ENDIAN__; do make M=drivers/scsi/qla2xxx C=2\
 $f |  -c ': warning:'; done
4
752

If you think it would be easy to fix the endian warnings triggered by 
the qla2xxx driver, you are welcome to try to fix these.

Bart.


Re: [PATCH] linux/types.h: enable endian checks for all sparse builds

2016-12-07 Thread Bart Van Assche
On 12/07/16 18:29, Michael S. Tsirkin wrote:
> By now, linux is mostly endian-clean. Enabling endian-ness
> checks for everyone produces about 200 new sparse warnings for me -
> less than 10% over the 2000 sparse warnings already there.
>
> Not a big deal, OTOH enabling this helps people notice
> they are introducing new bugs.
>
> So let's just drop __CHECK_ENDIAN__. Follow-up patches
> can drop distinction between __bitwise and __bitwise__.

Hello Michael,

This patch makes a whole bunch of ccflags-y += -D__CHECK_ENDIAN__ 
statements obsolete. Have you considered to remove these statements?

Additionally, there are notable exceptions to the rule that most drivers 
are endian-clean, e.g. drivers/scsi/qla2xxx. I would appreciate it if it 
would remain possible to check such drivers with sparse without enabling 
endianness checks. Have you considered to change #ifdef __CHECK_ENDIAN__ 
into e.g. #ifndef __DONT_CHECK_ENDIAN__?

Thanks,

Bart.


Re: [Lsf] [Lsf-pc] [LSF/MM TOPIC] Generic page-pool recycle facility?

2016-04-11 Thread Bart Van Assche

On 04/11/2016 11:37 AM, Jesper Dangaard Brouer wrote:

On Mon, 11 Apr 2016 14:46:25 -0300
Thadeu Lima de Souza Cascardo  wrote:


So, Jesper, please take into consideration that this pool design
would rather be per device. Otherwise, we allow some device to write
into another's device/driver memory.


Yes, that was my intended use.  I want to have a page-pool per device.
I actually, want to go as far as a page-pool per NIC HW RX-ring queue.

Because the other use-case for the page-pool is zero-copy RX.

The NIC HW trick is that we today can create a HW filter in the NIC
(via ethtool) and place that traffic into a separate RX queue in the
NIC.  Lets say matching NFS traffic or guest traffic. Then we can allow
RX zero-copy of these pages, into the application/guest, somehow
binding it to RX queue, e.g. introducing a "cross-domain-id" in the
page-pool page that need to match.


I think it is important to keep in mind that using a page pool for 
zero-copy RX is specific to protocols that are based on TCP/IP. 
Protocols like FC, SRP and iSER have been designed such that the side 
that allocates the buffers also initiates the data transfer (the target 
side). With TCP/IP however transferring data and allocating receive 
buffers happens by opposite sides of the connection.


Bart.


Re: [Lsf] [Lsf-pc] [LSF/MM TOPIC] Generic page-pool recycle facility?

2016-04-07 Thread Bart Van Assche

On 04/07/16 07:38, Christoph Hellwig wrote:

This is also very interesting for storage targets, which face the same
issue.  SCST has a mode where it caches some fully constructed SGLs,
which is probably very similar to what NICs want to do.


I think a cached allocator for page sets + the scatterlists that 
describe these page sets would not only be useful for SCSI target 
implementations but also for the Linux SCSI initiator. Today the scsi-mq 
code reserves space in each scsi_cmnd for a scatterlist of 
SCSI_MAX_SG_SEGMENTS. If scatterlists would be cached together with page 
sets less memory would be needed per scsi_cmnd. See also 
scsi_mq_setup_tags() and scsi_alloc_sgtable().


Bart.


Re: [PATCH] Revert "netpoll: Fix extra refcount release in netpoll_cleanup()"

2016-04-06 Thread Bart Van Assche

On 04/05/16 13:58, Bjorn Helgaas wrote:

This reverts commit 543e3a8da5a4c453e992d5351ef405d5e32f27d7.

Direct callers of __netpoll_setup() depend on it to set np->dev,
so we can't simply move that assignment up to netpoll_stup().

Reported-by: Bart Van Assche <bart.vanass...@sandisk.com>
Signed-off-by: Bjorn Helgaas <bhelg...@google.com>


Tested-by: Bart Van Assche <bart.vanass...@sandisk.com>

Thanks!

Bart.


Re: [net-next][PATCH 11/13] RDS: IB: add Fastreg MR (FRMR) detection support

2016-02-22 Thread Bart Van Assche

On 02/21/16 19:36, David Miller wrote:

From: Santosh Shilimkar 
Date: Sat, 20 Feb 2016 03:30:02 -0800


@@ -54,6 +55,8 @@ module_param(rds_ib_mr_8k_pool_size, int, 0444);
  MODULE_PARM_DESC(rds_ib_mr_8k_pool_size, " Max number of 8K mr per HCA");
  module_param(rds_ib_retry_count, int, 0444);
  MODULE_PARM_DESC(rds_ib_retry_count, " Number of hw retries before reporting an 
error");
+module_param(prefer_frmr, bool, 0444);
+MODULE_PARM_DESC(prefer_frmr, "Preferred MR method if both FMR and FRMR 
supported");


Sorry, you're going to have to create a real run time method to configure
this parameter.

I'm strongly against module parameters.

Please don't go into details about why this might be difficult to do,
I'm totally not interested.  Doing things properly is sometimes not
easy, that's life.


Hello Santosh,

What is the purpose of the prefer_frmr kernel module parameter ? Is this 
a parameter that is useful to RDS users or is its only purpose to allow 
developers of the RDS module to test both the FMR and FRMR code paths on 
hardware that supports both MR methods ?


Bart.



Re: BUG/ spinlock lockup, 2.6.24

2008-02-15 Thread Bart Van Assche
2008/2/15 Denys Fedoryshchenko [EMAIL PROTECTED]:
  I have random crashes, at least once per week. It is very difficult to catch
  error message, and only recently i setup netconsole. Now i got crash, but
  there is no traceback and only single line came over netconsole, mentioned
  before.

Did you already run memtest ? You can run memtest by booting from the
Knoppix CD-ROM or DVD. Most Linux distributions also have included
memtest on their bootable distribution CD's/DVD's.

Bart Van Assche.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html