[PATCH] Revert "net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on bound sockets"
Revert the patch mentioned in the subject because it breaks at least the Avahi mDNS daemon. That patch namely causes the Ubuntu 18.04 Avahi daemon to fail to start: Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Successfully called chroot(). Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Successfully dropped remaining capabilities. Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: No service file found in /etc/avahi/services. Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: SO_REUSEADDR failed: Structure needs cleaning Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: SO_REUSEADDR failed: Structure needs cleaning Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Failed to create server: No suitable network protocol available Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: avahi-daemon 0.7 exiting. Jun 12 09:49:24 ubuntu-vm systemd[1]: avahi-daemon.service: Main process exited, code=exited, status=255/n/a Jun 12 09:49:24 ubuntu-vm systemd[1]: avahi-daemon.service: Failed with result 'exit-code'. Jun 12 09:49:24 ubuntu-vm systemd[1]: Failed to start Avahi mDNS/DNS-SD Stack. Fixes: f396922d862a ("net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on bound sockets") Cc: Maciej Żenczykowski Cc: Eric Dumazet Signed-off-by: Bart Van Assche --- net/core/sock.c | 15 +-- 1 file changed, 1 insertion(+), 14 deletions(-) diff --git a/net/core/sock.c b/net/core/sock.c index f333d75ef1a9..bcc41829a16d 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -728,22 +728,9 @@ int sock_setsockopt(struct socket *sock, int level, int optname, sock_valbool_flag(sk, SOCK_DBG, valbool); break; case SO_REUSEADDR: - val = (valbool ? SK_CAN_REUSE : SK_NO_REUSE); - if ((sk->sk_family == PF_INET || sk->sk_family == PF_INET6) && - inet_sk(sk)->inet_num && - (sk->sk_reuse != val)) { - ret = (sk->sk_state == TCP_ESTABLISHED) ? -EISCONN : -EUCLEAN; - break; - } - sk->sk_reuse = val; + sk->sk_reuse = (valbool ? SK_CAN_REUSE : SK_NO_REUSE); break; case SO_REUSEPORT: - if ((sk->sk_family == PF_INET || sk->sk_family == PF_INET6) && - inet_sk(sk)->inet_num && - (sk->sk_reuseport != valbool)) { - ret = (sk->sk_state == TCP_ESTABLISHED) ? -EISCONN : -EUCLEAN; - break; - } sk->sk_reuseport = valbool; break; case SO_TYPE: -- 2.17.0
Re: [PATCH net-next 5/5] net/smc: return booleans instead of integers
On Fri, 2018-01-26 at 09:28 +0100, Ursula Braun wrote: > From: Gustavo A. R. Silva> > Return statements in functions returning bool should use > true/false instead of 1/0. > > This issue was detected with the help of Coccinelle. > > Signed-off-by: Gustavo A. R. Silva > Signed-off-by: Ursula Braun > --- > net/smc/smc.h | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/net/smc/smc.h b/net/smc/smc.h > index bfbe20234105..9518986c97b1 100644 > --- a/net/smc/smc.h > +++ b/net/smc/smc.h > @@ -252,12 +252,12 @@ static inline int smc_uncompress_bufsize(u8 compressed) > static inline bool using_ipsec(struct smc_sock *smc) > { > return (smc->clcsock->sk->sk_policy[0] || > - smc->clcsock->sk->sk_policy[1]) ? 1 : 0; > + smc->clcsock->sk->sk_policy[1]) ? true : false; > } Hello Ursula, If you ever have to touch this code again, please follow the style of other kernel code and leave out the "? true : false" part and also the parentheses. Both are superfluous. Thanks, Bart.
Re: [PATCH 00/18] prevent bounds-check bypass via speculative execution
On 01/05/18 22:30, Dan Williams wrote: On Fri, Jan 5, 2018 at 6:22 PM, Eric W. Biedermanwrote: Please expand this. It is not clear what the static analysis is looking for. Have a clear description of what is being fixed is crucial for allowing any of these changes. For the details given in the change description what I read is magic changes because a magic process says this code is vulnerable. Yes, that was my first reaction to the patches as well, I try below to add some more background and guidance, but in the end these are static analysis reports across a wide swath of sub-systems. It's going to take some iteration with domain experts to improve the patch descriptions, and that's the point of this series, to get the better trained eyes from the actual sub-system owners to take a look at these reports. More information about what the static analysis is looking for would definitely be welcome. Additionally, since the analysis tool is not publicly available, how are authors of new kernel code assumed to verify whether or not their code needs to use nospec_array_ptr()? How are reviewers of kernel code assumed to verify whether or not nospec_array_ptr() is missing where it should be used? Since this patch series only modifies the upstream kernel, how will out-of-tree drivers be fixed, e.g. the nVidia driver and the Android drivers? Thanks, Bart.
Re: [Cluster-devel] [PATCH 00/12] drop unneeded newline
On Tue, 2018-01-02 at 15:00 +0100, Julia Lawall wrote: > On Tue, 2 Jan 2018, Bob Peterson wrote: > > - Original Message - > > > - Original Message - > > > > > Still, the GFS2 and DLM code has a plethora of broken-up printk messages, > > and I don't like the thought of re-combining them all. > > Actually, the point of the patch was to remove the unnecessary \n at the > end of the string, because log_print will add another one. If you prefer > to keep the string broken up, I can resend the patch in that form, but > without the unnecessary \n. Please combine any user-visible strings into a single line for which the unneeded newline is dropped since these strings are modified anyway by your patch. Thanks, Bart.
Re: [PATCH v2 0/5] Support for generalized use of make C={1,2} via a wrapper program
On Mon, 2017-12-18 at 10:46 -0700, Jason Gunthorpe wrote: > On Sun, Dec 17, 2017 at 10:00:17PM -0800, Joe Perches wrote: > > > > Today when we run checkers we get so many warnings it is too hard to > > > make any sense of it. > > > > Here is a list of the checkpatch messages for drivers/infiniband > > sorted by type. > > > > Many of these might be corrected by using > > > > $ ./scripts/checkpatch.pl -f --fix-inplace --types= \ > > $(git ls-files drivers/infiniband/) > > How many of these do you think it is worth to fix? > > We do get a steady trickle of changes in this topic every cycle. > > Is it better to just do a big number of them all at once? Do you have > an idea how disruptive this kind of work is to the whole patch flow > eg new patches no longer applying to for-next, backports no longer > applying, merge conflicts? In my opinion patches that only change the coding style and do not change any functionality are annoying. Before posting a patch that fixes a bug the change history (git log -p) has to be cheched to figure out which patch introduced the bug. Patches that only change coding style pollute the change history. Bart.
Re: [PATCH net-next 00/10] net/smc: updates 2017-09-20
On Wed, 2017-09-20 at 13:58 +0200, Ursula Braun wrote: > here is a collection of small smc-patches built for net-next improving > the smc code in different areas. Hello Ursula, Can you provide us an update for the timeline of the plan to transition from PF_SMC to PF_INET/PF_INET6 + SOCK_STREAM? See also https://www.mail-archive.com/netdev@vger.kernel.org/msg166744.html. Thanks, Bart.
Re: [PATCH] net/smc: mark as BROKEN due to remote memory exposure
On Sun, 2017-05-14 at 11:51 -0400, David Miller wrote: > From: Christoph Hellwig> Date: Sun, 14 May 2017 07:58:48 +0200 > > > this patch has not been superceeded by anything, can you explain why > > it has been marked as such in patchworks? > > I think you're being overbearing by requiring this to be marked BROKEN > and I would like you to explore other ways with the authors to fix > whatever perceived problems you think SMC has. > > You claim that this is somehow "urgent" is false. You can ask > distributions to disable SMC or whatever in the short term if it > reallly, truly, bothers you. Hello Dave, There is agreement that the user-space API for using the SMC protocol must be changed, namely by dropping AF_SMC and by making applications use the SMC protocol through socket(AF_INET..., SOCK_STREAM, ...). What is your plan to avoid that applications start using and depending on AF_SMC? Thanks, Bart.
Re: [PATCH] net/smc: mark as BROKEN due to remote memory exposure
On Wed, 2017-05-10 at 09:26 +0200, Christoph Hellwig wrote: > The driver has a lot of quality issues due to the lack of RDMA-side > review, and explicitly bypasses APIs to register all memory once a > connection is made, and thus allows remote access to memoery. > > Mark it as broken until at least that part is fixed. Since this is the only way to get the BROKEN marker in the v4.11 stable kernel series: Acked-by: Bart Van Assche <bart.vanass...@sandisk.com>
Re: [RFC iproute2 0/8] RDMA tool
On Sun, 2017-05-07 at 12:20 +0200, Jiri Pirko wrote: > Sat, May 06, 2017 at 04:40:24PM CEST, bart.vanass...@sandisk.com wrote: > > On Sat, 2017-05-06 at 12:40 +0200, Jiri Pirko wrote: > > > Thu, May 04, 2017 at 08:10:54PM CEST, bart.vanass...@sandisk.com wrote: > > > > On Thu, 2017-05-04 at 21:02 +0300, Leon Romanovsky wrote: > > > > > Following our discussion both in mailing list [1] and at the LPC 2016 > > > > > [2], > > > > > we would like to propose this RDMA tool to be part of iproute2 package > > > > > and finally improve this situation. > > > > > > > > Although I really appreciate your work: can you clarify why you would > > > > like to > > > > add *RDMA* functionality to an *IP routing* tool? I haven't found any > > > > motivation > > > > for adding RDMA functionality to iproute2 in [1]. > > > > > > Bart, please realize that iproute2 is much more than "*IP routing* tool". > > > I understand you got confused by the name. Please see sources. Your > > > comment > > > is totally pointless... > > > > I asked for a clarification that should have been in the cover letter but > > that > > was missing from that cover letter. So I think that was the right thing to > > do > > I think that was just complete misunderstanding about what iproute2 is. Hello Jiri, I do not agree with your reply. The abbreviation "IP" occurs in the package name and that is a reference to the "Internet Protocol". As far as I know originally the iproute2 package contained only tools related to the Internet Protocol. Other tools, e.g. the tipc tool, were added later on. What I'm wondering about is whether it really is a good idea to add tools like tipc and rdma to the iproute2 package. The iproute2 package is so essential that it gets installed on every Linux system, including embedded systems and smartphones based on Linux. Several companies maintain embedded Linux distributions and tools to build software images. These tools provide a user interface that allows to select what packages go into such an image. Adding tools like tipc and rdma to the iproute2 package makes it harder than necessary for those who build software images for embedded devices to minimize the size of such an image. As you probably know even today the size of a software image still matters for embedded devices. Something else I have been wondering about is whether bundling the tipc and rdma tools in the iproute2 package will make the job harder of people who build Android ROMs? The ip tool is present in every Android ROM, and the size of these ROMs matters because the larger these ROMs are the less space remains for apps. Bart.
Re: [RFC iproute2 0/8] RDMA tool
On Sat, 2017-05-06 at 12:40 +0200, Jiri Pirko wrote: > Thu, May 04, 2017 at 08:10:54PM CEST, bart.vanass...@sandisk.com wrote: > > On Thu, 2017-05-04 at 21:02 +0300, Leon Romanovsky wrote: > > > Following our discussion both in mailing list [1] and at the LPC 2016 [2], > > > we would like to propose this RDMA tool to be part of iproute2 package > > > and finally improve this situation. > > > > Although I really appreciate your work: can you clarify why you would like > > to > > add *RDMA* functionality to an *IP routing* tool? I haven't found any > > motivation > > for adding RDMA functionality to iproute2 in [1]. > > Bart, please realize that iproute2 is much more than "*IP routing* tool". > I understand you got confused by the name. Please see sources. Your comment > is totally pointless... I asked for a clarification that should have been in the cover letter but that was missing from that cover letter. So I think that was the right thing to do instead of pointless. BTW, can you explain why you are using an e-mail address that is hiding that you are a Mellanox employee? Bart.
Re: [RFC iproute2 0/8] RDMA tool
On Thu, 2017-05-04 at 21:45 +0300, Leon Romanovsky wrote: > It is not hard to create new tool, the hardest part is to ensure that it is > part of the distributions. Did you count how many months we are trying to > add rdma-core to debian? Hello Leon, Sorry but I was not aware that the effort to add rdma-core to Debian was taking that long. Please let me know if I can help with that effort. Bart.
Re: [RFC iproute2 0/8] RDMA tool
On Thu, 2017-05-04 at 21:25 +0300, Leon Romanovsky wrote: > On Thu, May 04, 2017 at 06:10:54PM +0000, Bart Van Assche wrote: > > On Thu, 2017-05-04 at 21:02 +0300, Leon Romanovsky wrote: > > > Following our discussion both in mailing list [1] and at the LPC 2016 [2], > > > we would like to propose this RDMA tool to be part of iproute2 package > > > and finally improve this situation. > > > > Hello Leon, > > > > Although I really appreciate your work: can you clarify why you would like > > to > > add *RDMA* functionality to an *IP routing* tool? I haven't found any > > motivation > > for adding RDMA functionality to iproute2 in [1]. > > We are planning to reuse the same infrastructure provided by iproute2, > like netlink parsing, access to distributions, same CLI and same standards. > > Right now, RDMA is already tightened to netdev: iWARP, RoCE, IPoIB, HFI-VNIC. > Many drivers (mlx, qed, i40, cxgb) are sharing code between net and > RDMA. > > I do expect that iproute2 will be installed on every machine with any > type of connection, including IB and OPA. > > So I think that it is enough to be part of that suite and don't invent > our own for one specific tool. Hello Leon, Sorry but to me that sounds like a weak argument for including RDMA functionality in iproute2. There is already a library for communication over netlink sockets, namely libnl. Is there functionality that is in iproute2 but not in libnl and that is needed for the new tool? If so, have you considered to create a new library for that functionality? Thanks, Bart.
Re: [RFC iproute2 0/8] RDMA tool
On Thu, 2017-05-04 at 21:02 +0300, Leon Romanovsky wrote: > Following our discussion both in mailing list [1] and at the LPC 2016 [2], > we would like to propose this RDMA tool to be part of iproute2 package > and finally improve this situation. Hello Leon, Although I really appreciate your work: can you clarify why you would like to add *RDMA* functionality to an *IP routing* tool? I haven't found any motivation for adding RDMA functionality to iproute2 in [1]. Thanks, Bart.
Re: net/smc and the RDMA core
On Tue, 2017-05-02 at 14:25 +0200, Ursula Braun wrote: > if you can point out specific issues, we will be happy to work with you > to get them addressed! Hello Ursula, My list of issues that I would like to see addressed can be found below. Doug, Christoph and others may have additional inputs. The issues that have not yet been mentioned in other e-mails are: - The SMC driver only supports one RDMA transport type (RoCE v1) but none of the other RDMA transport types (RoCE v2, IB and iWARP). New RDMA drivers should support all RDMA transport types transparently. The traditional approach to support multiple RDMA transport types is by using the RDMA/CM to establish connections. - The implementation of the SMC driver only supports RoCEv1. This is a very unfortunate choice because: - RoCEv1 is not routable and hence is limited to a single Ethernet broadcast domain. - RoCEv1 packets escape a whole bunch of mechanisms that only work for IP packets. Firewalls pass all RoCEv1 packets and switches do not restrict RoCEv1 packets to a single VLAN. This means that if the network configuration is changed after an SMC connection has been set up such that IP communication between the endpoints of an SMC connection is blocked that the SMC RoCEv1 packets will not be blocked by the network equipment of which the configuration has just been changed. - As already mentioned by Christoph, the SMC implementation uses RDMA calls that probably will be deprecated soon (ib_create_cq()) and should use the RDMA R/W API instead of building sge lists itself. Bart.
Re: net/smc and the RDMA core
On Tue, 2017-05-02 at 14:41 +0200, Ursula Braun wrote: > On 05/01/2017 07:55 PM, Parav Pandit wrote: > > Hi Bart, Ursula, Dave, > > > > I am particularly concerned about SMC as address family. > > It should not be treated as address family, but rather an additional > > protocol similar for socket type SOCK_STREAM. > > We tried to avoid changes of the kernel TCP code. A new address family > seemed to be a feasible way to achieve this. Hello Ursula, I agree with Parav that introducing a new address family for SMC was an unfortunate choice. As one can see in e.g. the implementation of the SCTP protocol no changes to the TCP implementation are needed to add support for a new SOCK_STREAM protocol. I think the SCTP implementation uses inet_register_protosw() to register itself dynamically. Bart.
Re: net/smc and the RDMA core
On Mon, 2017-05-01 at 18:33 +0200, Christoph Hellwig wrote: > Hi Ursual, hi netdev reviewers, > > how did the smc protocol manage to get merged without any review > on linux-rdma at all? As the results it seems it's very substandard > in terms of RDMA API usage, e.g. it neither uses the proper CQ API > nor the RDMA R/W API, and other will probably find additional issues > as well. Hello Dave and Ursula, It seems very rude to me to have merged the SMC protocol driver without having involved the linux-rdma community. Anyway, I have the following questions for Dave and Ursula: * Since the Linux kernel is standards based: where can we find the standard that defines the SMC wire protocol? If this protocol has not been standardized yet: in what file (other than *.[ch]) in the Linux kernel tree has this protocol been documented? * What are the differences between the SMC protocol, the SDP protocol and the rsockets protocol? How do existing implementations for these protocols compare to each other from a performance point of view? If no performance comparison between these protocols is available, shouldn't the performance of these protocols have been compared with each other before a review of the SMC driver even started? * What are the reasons why the SDP driver was never accepted upstream? Do the arguments why SDP was not accepted upstream also apply to the SMC driver (SDP = Sockets Direct Protocol)? * Since SMC has to be selected by specifying AF_SMC, how are users expected to specify whether AF_INET, AF_INET6 or yet another address family should be used to set up a connection between SMC endpoints? * Is the SMC driver limited to RoCE? Are you aware that the rsockets library supports multiple transport layers (RoCE, IB and iWARP)? * Since functionality that is similar what the SMC driver provides already exists in user space (rsockets), why has this functionality been reimplemented as a kernel driver (SMC)? Bart.
Re: [patch] net/mlx4: && vs & typo
On 02/28/2017 02:23 PM, Joe Perches wrote: > On Tue, 2017-02-28 at 15:35 +0000, Bart Van Assche wrote: >> On Tue, 2017-02-28 at 15:02 +0300, Dan Carpenter wrote: >>> Bitwise & was obviously intended here. > [] >>> diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h > [] >>> @@ -109,7 +109,7 @@ static inline void (u8 *addr, u64 mac) >>> int i; >>> >>> for (i = ETH_ALEN; i > 0; i--) { >>> - addr[i - 1] = mac && 0xFF; >>> + addr[i - 1] = mac & 0xFF; >>> mac >>= 8; >>> } >>> } >> >> Is this the only place where such a loop occurs? > > Seems to be. > >> Should a put_unaligned_be48() >> function be introduced? > > Why? This is used exactly once. Really? Here is an example of another open-coded version of put_unaligned_be48() from arch/mips/cavium-octeon/octeon-platform.c: new_mac[0] = (mac >> 40) & 0xff; new_mac[1] = (mac >> 32) & 0xff; new_mac[2] = (mac >> 24) & 0xff; new_mac[3] = (mac >> 16) & 0xff; new_mac[4] = (mac >> 8) & 0xff; new_mac[5] = mac & 0xff; Bart.
Re: [patch] net/mlx4: && vs & typo
On Tue, 2017-02-28 at 15:02 +0300, Dan Carpenter wrote: > Bitwise & was obviously intended here. > > Fixes: 745d8ae4622c ("net/mlx4: Spoofcheck and zero MAC can't coexist") > Signed-off-by: Dan Carpenter> --- > Applies to net.git. > > diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h > index e965e5090d96..a858bcb6220b 100644 > --- a/include/linux/mlx4/driver.h > +++ b/include/linux/mlx4/driver.h > @@ -109,7 +109,7 @@ static inline void mlx4_u64_to_mac(u8 *addr, u64 mac) > int i; > > for (i = ETH_ALEN; i > 0; i--) { > - addr[i - 1] = mac && 0xFF; > + addr[i - 1] = mac & 0xFF; > mac >>= 8; > } > } Is this the only place where such a loop occurs? Should a put_unaligned_be48() function be introduced? Bart.
Re: [RFC v3 01/11] IB/hfi-vnic: Virtual Network Interface Controller (VNIC) documentation
On Tue, 2017-02-07 at 12:23 -0800, Vishwanathapura, Niranjana wrote: > diff --git a/Documentation/infiniband/hfi_vnic.txt > b/Documentation/infiniband/hfi_vnic.txt > new file mode 100644 > index 000..c6c801e > --- /dev/null > +++ b/Documentation/infiniband/hfi_vnic.txt > @@ -0,0 +1,102 @@ > +Intel Omni-Path Host Fabric Interface (HFI) Virtual Network Interface > +Controller (VNIC) feature supports Ethernet functionality over Omni-Path > +fabric by encapsulating the Ethernet packets between HFI nodes. Please add a reference to the document that defines the Ethernet encapsulation wire protocol. > +The patterns of exchanges of Omni-Path encapsulated Ethernet packets > +involves one or more virtual Ethernet switches overlaid on the Omni-Path > +fabric topology. A subset of HFI nodes on the Omni-Path fabric are > +permitted to exchange encapsulated Ethernet packets across a particular > +virtual Ethernet switch. The virtual Ethernet switches are logical > +abstractions achieved by configuring the HFI nodes on the fabric for > +header generation and processing. In the simplest configuration all HFI > +nodes across the fabric exchange encapsulated Ethernet packets over a > +single virtual Ethernet switch. A virtual Ethernet switch, is effectively > +an independent Ethernet network. The configuration is performed by an > +Ethernet Manager (EM) which is part of the trusted Fabric Manager (FM) > +application. HFI nodes can have multiple VNICs each connected to a > +different virtual Ethernet switch. The below diagram presents a case > +of two virtual Ethernet switches with two HFI nodes. Please elaborate this section. What is a virtual Ethernet switch? Is it a software entity or something that is implemented in hardware? Also, how are these independent Ethernet networks identified on the wire? The Linux kernel already supports IB partitions and Ethernet VLANs. How do these independent Ethernet networks compare to IB partitions and Ethernet VLANs? Which wire- level header contains the identity of these Ethernet networks? Is it possible to query from user space which Ethernet network a VNIC belongs to? If so, with which API and which tools? Thanks, Bart.
Re: [RFC v3 00/11] HFI Virtual Network Interface Controller (VNIC)
On Tue, 2017-02-07 at 16:54 -0800, Vishwanathapura, Niranjana wrote: > On Tue, Feb 07, 2017 at 09:58:50PM +0000, Bart Van Assche wrote: > > On Tue, 2017-02-07 at 21:44 +, Hefty, Sean wrote: > > > This is Ethernet - not IP - encapsulation over a non-InfiniBand > > > device/protocol. > > > > That's more than clear from the cover letter. In my opinion the cover letter > > should explain why it is considered useful to have such a driver upstream > > and what the use cases are of encapsulating Ethernet frames inside RDMA > > packets. > > We believe on our HW, HFI VNIC design gives better hardware resource usage > which is also scalable and hence room for better performance. > Also as evident in the cover letter, it gives us better manageability by > defining virtual Ethernet switches overlaid on the fabric and > use standard Ethernet support provided by Linux. That kind of language is appropriate for a marketing brochure but not for a technical forum. Even reading your statement twice did not make me any wiser. You mentioned "better hardware resource usage". Compared to what? Is that perhaps compared to IPoIB? Since Ethernet frames have an extra header and are larger than IPoIB frames, how can larger frames result in better hardware resource usage? And what is a virtual Ethernet switch? Is this perhaps packet forwarding by software? If so, why are virtual Ethernet switches needed since the Linux networking stack already supports packet forwarding? Thanks, Bart.
Re: [RFC v3 00/11] HFI Virtual Network Interface Controller (VNIC)
On Tue, 2017-02-07 at 21:44 +, Hefty, Sean wrote: > This is Ethernet - not IP - encapsulation over a non-InfiniBand > device/protocol. That's more than clear from the cover letter. In my opinion the cover letter should explain why it is considered useful to have such a driver upstream and what the use cases are of encapsulating Ethernet frames inside RDMA packets. Bart.
Re: [RFC v3 00/11] HFI Virtual Network Interface Controller (VNIC)
On Tue, 2017-02-07 at 12:22 -0800, Vishwanathapura, Niranjana wrote: > Intel Omni-Path Host Fabric Interface (HFI) Virtual Network Interface > Controller (VNIC) feature supports Ethernet functionality over Omni-Path > fabric by encapsulating the Ethernet packets between HFI nodes. This may have been stated before, but what is missing from this description is an explanation of why accepting an Ethernet over RDMA driver in the upstream kernel is considered useful. We already have an IPoIB driver, so why to add another driver to the kernel tree for communicating IP packets over an RDMA network? Bart.
Re: [PATCH 4.10-rc3 09/13] iscsi: fix build errors when linux/phy*.h is removed from net/dsa.h
On Tue, 2017-01-31 at 19:19 +, Russell King wrote: > drivers/target/iscsi/iscsi_target_login.c:1135:7: error: implicit declaration > of function 'try_module_get' [-Werror=implicit-function-declaration] > > Add linux/module.h to iscsi_target_login.c. > > Signed-off-by: Russell King <rmk+ker...@armlinux.org.uk> > --- > drivers/target/iscsi/iscsi_target_login.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/target/iscsi/iscsi_target_login.c > b/drivers/target/iscsi/iscsi_target_login.c > index 450f51deb2a2..eab274d17b5c 100644 > --- a/drivers/target/iscsi/iscsi_target_login.c > +++ b/drivers/target/iscsi/iscsi_target_login.c > @@ -17,6 +17,7 @@ > > **/ > > #include > +#include > #include > #include > #include Reviewed-by: Bart Van Assche <bart.vanass...@sandisk.com>
Re: [PATCH] cxgbit: use T6 specific macro to set force bit
On Tue, 2017-01-24 at 17:07 +0530, Varun Prakash wrote: > For T6 adapters use T6 specific macro to set force bit. Thanks, I have applied this patch. Bart.
[PATCH v3 09/37] RDS: IB: Remove an unused structure member
Signed-off-by: Bart Van Assche <bart.vanass...@sandisk.com> Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com> Cc: David S. Miller <da...@davemloft.net> Cc: linux-r...@vger.kernel.org Cc: netdev@vger.kernel.org Cc: rds-de...@oss.oracle.com --- net/rds/ib_mr.h | 1 - 1 file changed, 1 deletion(-) diff --git a/net/rds/ib_mr.h b/net/rds/ib_mr.h index 1c754f4acbe5..24c086db4511 100644 --- a/net/rds/ib_mr.h +++ b/net/rds/ib_mr.h @@ -45,7 +45,6 @@ struct rds_ib_fmr { struct ib_fmr *fmr; - u64 *dma; }; enum rds_ib_fr_state { -- 2.11.0
Re: [RFC] RESEND - rdmatool - tool for RDMA users
On Thu, 2017-01-19 at 11:03 -0700, Jason Gunthorpe wrote: > sysfs is unpopular because the 'one value per file' dogma is laregly > unsuitable for complex mulit-value atomic changes which are common in > netdev. You can force it to work, but it is pretty horrible.. > > It is also very expensive if you want to shuttle a lot of data, eg I > could not see doing something like 'netstat' for IB through sysfs Since the RDMA sysfs ABI defines a user space ABI and since user space ABIs must be backwards compatible removing the existing sysfs ABI is not an option. We will need to evaluate on a case-by-case basis whether new functionality should use sysfs or whether another mechanism should be used. Bart.
Re: [RFC] RESEND - rdmatool - tool for RDMA users
On 01/18/2017 10:31 AM, Jason Gunthorpe wrote: > I think it depends on what this tool is supposed to cover, but based > on the description, I would start with netlink-only. > > The only place verbs covers a similar ground is in 'device > capabilities' - for some of that you might want to open a new-uAPI > verbs fd, but even the capability data from that would not be > totally offensive to be accessed over netlink. > > IMHO netlink should cover almost everything found in sysfs today. We would need a very strong argument to introduce a netlink API that duplicates existing sysfs API functionality. Since the sysfs API is extensible, why not extend that API further? E.g. the SCST sysfs API shows that more is possible with sysfs than what most kernel drivers realize. Bart.
[PATCH v2 12/26] IB: Convert ib_dma_*_coherent() argument type from u64 into dma_addr_t
This patch does not change any functionality. Signed-off-by: Bart Van Assche <bart.vanass...@sandisk.com> Reviewed-by: Leon Romanovsky <leo...@mellanox.com> Cc: David S. Miller <da...@davemloft.net> Cc: linux-r...@vger.kernel.org Cc: netdev@vger.kernel.org Cc: rds-de...@oss.oracle.com --- include/rdma/ib_verbs.h | 11 +++ net/rds/ib.h| 6 +++--- 2 files changed, 6 insertions(+), 11 deletions(-) diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 89e80eb77e06..de8dfb61d2b6 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -3109,15 +3109,10 @@ static inline void ib_dma_sync_single_for_device(struct ib_device *dev, */ static inline void *ib_dma_alloc_coherent(struct ib_device *dev, size_t size, - u64 *dma_handle, + dma_addr_t *dma_handle, gfp_t flag) { - dma_addr_t handle; - void *ret; - - ret = dma_alloc_coherent(dev->dma_device, size, , flag); - *dma_handle = handle; - return ret; + return dma_alloc_coherent(dev->dma_device, size, dma_handle, flag); } /** @@ -3129,7 +3124,7 @@ static inline void *ib_dma_alloc_coherent(struct ib_device *dev, */ static inline void ib_dma_free_coherent(struct ib_device *dev, size_t size, void *cpu_addr, - u64 dma_handle) + dma_addr_t dma_handle) { dma_free_coherent(dev->dma_device, size, cpu_addr, dma_handle); } diff --git a/net/rds/ib.h b/net/rds/ib.h index 45ac8e8e58f4..d21ca88ab628 100644 --- a/net/rds/ib.h +++ b/net/rds/ib.h @@ -134,7 +134,7 @@ struct rds_ib_connection { struct rds_ib_work_ring i_send_ring; struct rm_data_op *i_data_op; struct rds_header *i_send_hdrs; - u64 i_send_hdrs_dma; + dma_addr_t i_send_hdrs_dma; struct rds_ib_send_work *i_sends; atomic_ti_signaled_sends; @@ -144,7 +144,7 @@ struct rds_ib_connection { struct rds_ib_incoming *i_ibinc; u32 i_recv_data_rem; struct rds_header *i_recv_hdrs; - u64 i_recv_hdrs_dma; + dma_addr_t i_recv_hdrs_dma; struct rds_ib_recv_work *i_recvs; u64 i_ack_recv; /* last ACK received */ struct rds_ib_refill_cache i_cache_incs; @@ -161,7 +161,7 @@ struct rds_ib_connection { struct rds_header *i_ack; struct ib_send_wr i_ack_wr; struct ib_sge i_ack_sge; - u64 i_ack_dma; + dma_addr_t i_ack_dma; unsigned long i_ack_queued; /* Flow control related information -- 2.11.0
[PATCH v2 11/26] RDS: IB: Remove an unused structure member
Signed-off-by: Bart Van Assche <bart.vanass...@sandisk.com> Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com> Cc: David S. Miller <da...@davemloft.net> Cc: linux-r...@vger.kernel.org Cc: netdev@vger.kernel.org Cc: rds-de...@oss.oracle.com --- net/rds/ib_mr.h | 1 - 1 file changed, 1 deletion(-) diff --git a/net/rds/ib_mr.h b/net/rds/ib_mr.h index 1c754f4acbe5..24c086db4511 100644 --- a/net/rds/ib_mr.h +++ b/net/rds/ib_mr.h @@ -45,7 +45,6 @@ struct rds_ib_fmr { struct ib_fmr *fmr; - u64 *dma; }; enum rds_ib_fr_state { -- 2.11.0
[PATCH 7/9] RDS: IB: Remove an unused structure member
Signed-off-by: Bart Van Assche <bart.vanass...@sandisk.com> Cc: Santosh Shilimkar <ssant...@kernel.org> Cc: Santosh Shilimkar <santosh.shilim...@oracle.com> Cc: David S. Miller <da...@davemloft.net> Cc: linux-r...@vger.kernel.org Cc: netdev@vger.kernel.org Cc: rds-de...@oss.oracle.com --- net/rds/ib_mr.h | 1 - 1 file changed, 1 deletion(-) diff --git a/net/rds/ib_mr.h b/net/rds/ib_mr.h index 1c754f4acbe5..24c086db4511 100644 --- a/net/rds/ib_mr.h +++ b/net/rds/ib_mr.h @@ -45,7 +45,6 @@ struct rds_ib_fmr { struct ib_fmr *fmr; - u64 *dma; }; enum rds_ib_fr_state { -- 2.11.0
[PATCH 9/9] treewide: Inline ib_dma_map_*() functions
Almost all changes in this patch except the removal of local variables that became superfluous and the actual removal of the ib_dma_map_*() functions have been generated as follows: git grep -lE 'ib_(sg_|)dma_' | xargs -d\\n \ sed -i -e 's/\([^[:alnum:]_]\)ib_dma_\([^(]*\)(\&\([^,]\+\),/\1dma_\2(\3.dma_device,/g' \ -e 's/\([^[:alnum:]_]\)ib_dma_\([^(]*\)(\([^,]\+\),/\1dma_\2(\3->dma_device,/g' \ -e 's/ib_sg_dma_\(len\|address\)(\([^,]\+\), /sg_dma_\1(/g' Signed-off-by: Bart Van Assche <bart.vanass...@sandisk.com> Reviewed-by: Christoph Hellwig <h...@lst.de> Cc: Andreas Dilger <andreas.dil...@intel.com> Cc: Anna Schumaker <anna.schuma...@netapp.com> Cc: David S. Miller <da...@davemloft.net> Cc: Eric Van Hensbergen <eri...@gmail.com> Cc: James Simmons <jsimm...@infradead.org> Cc: Latchesar Ionkov <lu...@ionkov.net> Cc: Oleg Drokin <oleg.dro...@intel.com> Cc: Ron Minnich <rminn...@sandia.gov> Cc: Trond Myklebust <trond.mykleb...@primarydata.com> Cc: de...@driverdev.osuosl.org Cc: linux-...@vger.kernel.org Cc: linux-n...@lists.infradead.org Cc: linux-r...@vger.kernel.org Cc: lustre-de...@lists.lustre.org Cc: netdev@vger.kernel.org Cc: rds-de...@oss.oracle.com Cc: target-de...@vger.kernel.org Cc: v9fs-develo...@lists.sourceforge.net --- drivers/infiniband/core/mad.c | 28 +-- drivers/infiniband/core/rw.c | 30 ++- drivers/infiniband/core/umem.c | 4 +- drivers/infiniband/core/umem_odp.c | 6 +- drivers/infiniband/hw/mlx4/cq.c| 2 +- drivers/infiniband/hw/mlx4/mad.c | 28 +-- drivers/infiniband/hw/mlx4/mr.c| 4 +- drivers/infiniband/hw/mlx4/qp.c| 10 +- drivers/infiniband/hw/mlx5/mr.c| 4 +- drivers/infiniband/ulp/ipoib/ipoib_cm.c| 20 +- drivers/infiniband/ulp/ipoib/ipoib_ib.c| 22 +-- drivers/infiniband/ulp/iser/iscsi_iser.c | 6 +- drivers/infiniband/ulp/iser/iser_initiator.c | 38 ++-- drivers/infiniband/ulp/iser/iser_memory.c | 12 +- drivers/infiniband/ulp/iser/iser_verbs.c | 2 +- drivers/infiniband/ulp/isert/ib_isert.c| 60 +++--- drivers/infiniband/ulp/srp/ib_srp.c| 50 +++-- drivers/infiniband/ulp/srpt/ib_srpt.c | 10 +- drivers/nvme/host/rdma.c | 22 +-- drivers/nvme/target/rdma.c | 20 +- .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h| 14 +- include/rdma/ib_verbs.h| 218 - net/9p/trans_rdma.c| 12 +- net/rds/ib.h | 39 net/rds/ib_cm.c| 18 +- net/rds/ib_fmr.c | 10 +- net/rds/ib_frmr.c | 8 +- net/rds/ib_rdma.c | 6 +- net/rds/ib_recv.c | 14 +- net/rds/ib_send.c | 28 +-- net/sunrpc/xprtrdma/fmr_ops.c | 6 +- net/sunrpc/xprtrdma/frwr_ops.c | 6 +- net/sunrpc/xprtrdma/rpc_rdma.c | 14 +- net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 4 +- net/sunrpc/xprtrdma/svc_rdma_recvfrom.c| 8 +- net/sunrpc/xprtrdma/svc_rdma_sendto.c | 14 +- net/sunrpc/xprtrdma/svc_rdma_transport.c | 8 +- net/sunrpc/xprtrdma/verbs.c| 8 +- 38 files changed, 275 insertions(+), 538 deletions(-) diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c index a009f7132c73..2d51f0bdc13f 100644 --- a/drivers/infiniband/core/mad.c +++ b/drivers/infiniband/core/mad.c @@ -1152,21 +1152,21 @@ int ib_send_mad(struct ib_mad_send_wr_private *mad_send_wr) mad_agent = mad_send_wr->send_buf.mad_agent; sge = mad_send_wr->sg_list; - sge[0].addr = ib_dma_map_single(mad_agent->device, + sge[0].addr = dma_map_single(mad_agent->device->dma_device, mad_send_wr->send_buf.mad, sge[0].length, DMA_TO_DEVICE); - if (unlikely(ib_dma_mapping_error(mad_agent->device, sge[0].addr))) + if (unlikely(dma_mapping_error(mad_agent->device->dma_device, sge[0].addr))) return -ENOMEM; mad_send_wr->header_mapping = sge[0].addr; - sge[1].addr = ib_dma_map_single(mad_agent->device, + sge[1].addr = dma_map_single(mad_agent->device->dma_device, ib_get_payload(mad_send_wr), sge[1].length,
[PATCH 8/9] IB: Convert ib_dma_*_coherent() argument type from u64 into dma_addr_t
This patch does not change any functionality. Signed-off-by: Bart Van Assche <bart.vanass...@sandisk.com> Cc: David S. Miller <da...@davemloft.net> Cc: linux-r...@vger.kernel.org Cc: netdev@vger.kernel.org Cc: rds-de...@oss.oracle.com --- include/rdma/ib_verbs.h | 11 +++ net/rds/ib.h| 6 +++--- 2 files changed, 6 insertions(+), 11 deletions(-) diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 89e80eb77e06..de8dfb61d2b6 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -3109,15 +3109,10 @@ static inline void ib_dma_sync_single_for_device(struct ib_device *dev, */ static inline void *ib_dma_alloc_coherent(struct ib_device *dev, size_t size, - u64 *dma_handle, + dma_addr_t *dma_handle, gfp_t flag) { - dma_addr_t handle; - void *ret; - - ret = dma_alloc_coherent(dev->dma_device, size, , flag); - *dma_handle = handle; - return ret; + return dma_alloc_coherent(dev->dma_device, size, dma_handle, flag); } /** @@ -3129,7 +3124,7 @@ static inline void *ib_dma_alloc_coherent(struct ib_device *dev, */ static inline void ib_dma_free_coherent(struct ib_device *dev, size_t size, void *cpu_addr, - u64 dma_handle) + dma_addr_t dma_handle) { dma_free_coherent(dev->dma_device, size, cpu_addr, dma_handle); } diff --git a/net/rds/ib.h b/net/rds/ib.h index 45ac8e8e58f4..d21ca88ab628 100644 --- a/net/rds/ib.h +++ b/net/rds/ib.h @@ -134,7 +134,7 @@ struct rds_ib_connection { struct rds_ib_work_ring i_send_ring; struct rm_data_op *i_data_op; struct rds_header *i_send_hdrs; - u64 i_send_hdrs_dma; + dma_addr_t i_send_hdrs_dma; struct rds_ib_send_work *i_sends; atomic_ti_signaled_sends; @@ -144,7 +144,7 @@ struct rds_ib_connection { struct rds_ib_incoming *i_ibinc; u32 i_recv_data_rem; struct rds_header *i_recv_hdrs; - u64 i_recv_hdrs_dma; + dma_addr_t i_recv_hdrs_dma; struct rds_ib_recv_work *i_recvs; u64 i_ack_recv; /* last ACK received */ struct rds_ib_refill_cache i_cache_incs; @@ -161,7 +161,7 @@ struct rds_ib_connection { struct rds_header *i_ack; struct ib_send_wr i_ack_wr; struct ib_sge i_ack_sge; - u64 i_ack_dma; + dma_addr_t i_ack_dma; unsigned long i_ack_queued; /* Flow control related information -- 2.11.0
Re: [PATCH] linux/types.h: enable endian checks for all sparse builds
On 12/08/16 22:40, Madhani, Himanshu wrote: > We’ll take a look and send patches to resolve these warnings. Thanks! Bart.
Re: [PATCH] linux/types.h: enable endian checks for all sparse builds
On 12/07/16 21:54, Michael S. Tsirkin wrote: > On Thu, Dec 08, 2016 at 05:21:47AM +0000, Bart Van Assche wrote: >> Additionally, there are notable exceptions to the rule that most drivers >> are endian-clean, e.g. drivers/scsi/qla2xxx. I would appreciate it if it >> would remain possible to check such drivers with sparse without enabling >> endianness checks. Have you considered to change #ifdef __CHECK_ENDIAN__ >> into e.g. #ifndef __DONT_CHECK_ENDIAN__? > > The right thing is probably just to fix these, isn't it? > Until then, why not just ignore the warnings? Neither option is realistic. With endian-checking enabled the qla2xxx driver triggers so many warnings that it becomes a real challenge to filter the non-endian warnings out manually: $ for f in "" CF=-D__CHECK_ENDIAN__; do make M=drivers/scsi/qla2xxx C=2\ $f | -c ': warning:'; done 4 752 If you think it would be easy to fix the endian warnings triggered by the qla2xxx driver, you are welcome to try to fix these. Bart.
Re: [PATCH] linux/types.h: enable endian checks for all sparse builds
On 12/07/16 18:29, Michael S. Tsirkin wrote: > By now, linux is mostly endian-clean. Enabling endian-ness > checks for everyone produces about 200 new sparse warnings for me - > less than 10% over the 2000 sparse warnings already there. > > Not a big deal, OTOH enabling this helps people notice > they are introducing new bugs. > > So let's just drop __CHECK_ENDIAN__. Follow-up patches > can drop distinction between __bitwise and __bitwise__. Hello Michael, This patch makes a whole bunch of ccflags-y += -D__CHECK_ENDIAN__ statements obsolete. Have you considered to remove these statements? Additionally, there are notable exceptions to the rule that most drivers are endian-clean, e.g. drivers/scsi/qla2xxx. I would appreciate it if it would remain possible to check such drivers with sparse without enabling endianness checks. Have you considered to change #ifdef __CHECK_ENDIAN__ into e.g. #ifndef __DONT_CHECK_ENDIAN__? Thanks, Bart.
Re: [Lsf] [Lsf-pc] [LSF/MM TOPIC] Generic page-pool recycle facility?
On 04/11/2016 11:37 AM, Jesper Dangaard Brouer wrote: On Mon, 11 Apr 2016 14:46:25 -0300 Thadeu Lima de Souza Cascardowrote: So, Jesper, please take into consideration that this pool design would rather be per device. Otherwise, we allow some device to write into another's device/driver memory. Yes, that was my intended use. I want to have a page-pool per device. I actually, want to go as far as a page-pool per NIC HW RX-ring queue. Because the other use-case for the page-pool is zero-copy RX. The NIC HW trick is that we today can create a HW filter in the NIC (via ethtool) and place that traffic into a separate RX queue in the NIC. Lets say matching NFS traffic or guest traffic. Then we can allow RX zero-copy of these pages, into the application/guest, somehow binding it to RX queue, e.g. introducing a "cross-domain-id" in the page-pool page that need to match. I think it is important to keep in mind that using a page pool for zero-copy RX is specific to protocols that are based on TCP/IP. Protocols like FC, SRP and iSER have been designed such that the side that allocates the buffers also initiates the data transfer (the target side). With TCP/IP however transferring data and allocating receive buffers happens by opposite sides of the connection. Bart.
Re: [Lsf] [Lsf-pc] [LSF/MM TOPIC] Generic page-pool recycle facility?
On 04/07/16 07:38, Christoph Hellwig wrote: This is also very interesting for storage targets, which face the same issue. SCST has a mode where it caches some fully constructed SGLs, which is probably very similar to what NICs want to do. I think a cached allocator for page sets + the scatterlists that describe these page sets would not only be useful for SCSI target implementations but also for the Linux SCSI initiator. Today the scsi-mq code reserves space in each scsi_cmnd for a scatterlist of SCSI_MAX_SG_SEGMENTS. If scatterlists would be cached together with page sets less memory would be needed per scsi_cmnd. See also scsi_mq_setup_tags() and scsi_alloc_sgtable(). Bart.
Re: [PATCH] Revert "netpoll: Fix extra refcount release in netpoll_cleanup()"
On 04/05/16 13:58, Bjorn Helgaas wrote: This reverts commit 543e3a8da5a4c453e992d5351ef405d5e32f27d7. Direct callers of __netpoll_setup() depend on it to set np->dev, so we can't simply move that assignment up to netpoll_stup(). Reported-by: Bart Van Assche <bart.vanass...@sandisk.com> Signed-off-by: Bjorn Helgaas <bhelg...@google.com> Tested-by: Bart Van Assche <bart.vanass...@sandisk.com> Thanks! Bart.
Re: [net-next][PATCH 11/13] RDS: IB: add Fastreg MR (FRMR) detection support
On 02/21/16 19:36, David Miller wrote: From: Santosh ShilimkarDate: Sat, 20 Feb 2016 03:30:02 -0800 @@ -54,6 +55,8 @@ module_param(rds_ib_mr_8k_pool_size, int, 0444); MODULE_PARM_DESC(rds_ib_mr_8k_pool_size, " Max number of 8K mr per HCA"); module_param(rds_ib_retry_count, int, 0444); MODULE_PARM_DESC(rds_ib_retry_count, " Number of hw retries before reporting an error"); +module_param(prefer_frmr, bool, 0444); +MODULE_PARM_DESC(prefer_frmr, "Preferred MR method if both FMR and FRMR supported"); Sorry, you're going to have to create a real run time method to configure this parameter. I'm strongly against module parameters. Please don't go into details about why this might be difficult to do, I'm totally not interested. Doing things properly is sometimes not easy, that's life. Hello Santosh, What is the purpose of the prefer_frmr kernel module parameter ? Is this a parameter that is useful to RDS users or is its only purpose to allow developers of the RDS module to test both the FMR and FRMR code paths on hardware that supports both MR methods ? Bart.
Re: BUG/ spinlock lockup, 2.6.24
2008/2/15 Denys Fedoryshchenko [EMAIL PROTECTED]: I have random crashes, at least once per week. It is very difficult to catch error message, and only recently i setup netconsole. Now i got crash, but there is no traceback and only single line came over netconsole, mentioned before. Did you already run memtest ? You can run memtest by booting from the Knoppix CD-ROM or DVD. Most Linux distributions also have included memtest on their bootable distribution CD's/DVD's. Bart Van Assche. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html