Re: [PATCH v2 18/29] scsi: aic7xxx: aic7xxx_osm: Remove unused variable 'tinfo'
On Mon, 2020-07-13 at 08:46 +0100, Lee Jones wrote: > Looks like none of the artifact from ahc_fetch_transinfo() are used > anymore. > > Fixes the following W=1 kernel build warning(s): > > drivers/scsi/aic7xxx/aic7xxx_osm.c: In function > ‘ahc_linux_target_alloc’: > drivers/scsi/aic7xxx/aic7xxx_osm.c:567:30: warning: variable ‘tinfo’ > set but not used [-Wunused-but-set-variable] > 567 | struct ahc_initiator_tinfo *tinfo; > | ^ > > Cc: Hannes Reinecke > Cc: "Daniel M. Eischen" > Cc: Doug Ledford FWIW, I can't seem to figure out how you got mine or Dan's email addresses as related to this driver. The MAINTAINERS file only lists Hannes. The driver Dan and I worked on was a different driver. It was named aic7xxx, but that was back in the 1990s. It was renamed to aic7xxx_old so that Adaptec could contribute this driver you are currently patching back around 2001 or so. And then maybe around 2010 or something like that, the aic7xxx_old driver that Dan and I worked on was removed from the upstream source tree entirely. So, just out of curiosity, how did you get mine and Dan's email addresses to put on the Cc: list for these patches? > Signed-off-by: Lee Jones > --- > drivers/scsi/aic7xxx/aic7xxx_osm.c | 5 - > 1 file changed, 5 deletions(-) > > diff --git a/drivers/scsi/aic7xxx/aic7xxx_osm.c > b/drivers/scsi/aic7xxx/aic7xxx_osm.c > index 2edfa0594f183..32bfe20d79cc1 100644 > --- a/drivers/scsi/aic7xxx/aic7xxx_osm.c > +++ b/drivers/scsi/aic7xxx/aic7xxx_osm.c > @@ -564,8 +564,6 @@ ahc_linux_target_alloc(struct scsi_target > *starget) > struct scsi_target **ahc_targp = > ahc_linux_target_in_softc(starget); > unsigned short scsirate; > struct ahc_devinfo devinfo; > - struct ahc_initiator_tinfo *tinfo; > - struct ahc_tmode_tstate *tstate; > char channel = starget->channel + 'A'; > unsigned int our_id = ahc->our_id; > unsigned int target_offset; > @@ -612,9 +610,6 @@ ahc_linux_target_alloc(struct scsi_target > *starget) > spi_max_offset(starget) = 0; > spi_min_period(starget) = > ahc_find_period(ahc, scsirate, maxsync); > - > - tinfo = ahc_fetch_transinfo(ahc, channel, ahc->our_id, > - starget->id, &tstate); > } > ahc_compile_devinfo(&devinfo, our_id, starget->id, > CAM_LUN_WILDCARD, channel, -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] RDMA/hns: Fix build error again
On Mon, 2019-10-21 at 23:51 +0200, Arnd Bergmann wrote: > On Mon, Oct 21, 2019 at 11:09 PM Doug Ledford > wrote: > > This fix looks reasonable, but since I can't test this at all, and > > I'm > > personally tired of trying and failing to fix this issue, I need to > > ask > > if you've tried all the permutations for this just to confirm it > > works > > in all valid cases? > > I'm fairly sure I would have found them all by now: Since I sent this > patch I built 4680 randconfig kernels, 293 of which had some HNS > driver enabled. > > I also like to think that I spent more time to think it through in > theory. I reviewed it pretty closely, and I couldn't find any way in which I thought it would fail, I'm just being picky because I want this to be the last fix ;-) -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] RDMA/hns: Fix build error again
On Mon, 2019-10-07 at 23:18 +0200, Arnd Bergmann wrote: > This is not the first attempt to fix building random configurations, > unfortunately the attempt in commit a07fc0bb483e ("RDMA/hns: Fix build > error") caused a new problem when CONFIG_INFINIBAND_HNS_HIP06=m > and CONFIG_INFINIBAND_HNS_HIP08=y: > > drivers/infiniband/hw/hns/hns_roce_main.o:(.rodata+0xe60): undefined > reference to `__this_module' > > Revert commits a07fc0bb483e ("RDMA/hns: Fix build error") and > a3e2d4c7e766 ("RDMA/hns: remove obsolete Kconfig comment") to get > back to the previous state, then fix the issues described there > differently, by adding more specific dependencies: INFINIBAND_HNS > can now only be built-in if at least one of HNS or HNS3 are > built-in, and the individual back-ends are only available if > that code is reachable from the main driver. > > Fixes: a07fc0bb483e ("RDMA/hns: Fix build error") > Fixes: a3e2d4c7e766 ("RDMA/hns: remove obsolete Kconfig comment") > Fixes: dd74282df573 ("RDMA/hns: Initialize the PCI device for hip08 > RoCE") > Fixes: 08805fdbeb2d ("RDMA/hns: Split hw v1 driver from hns roce > driver") > Signed-off-by: Arnd Bergmann This fix looks reasonable, but since I can't test this at all, and I'm personally tired of trying and failing to fix this issue, I need to ask if you've tried all the permutations for this just to confirm it works in all valid cases? -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: linux-next: Fixes tag needs some work in the rdma-fixes tree
On Mon, 2019-10-21 at 18:41 +1100, Stephen Rothwell wrote: > Hi all, > > In commit > > 612e0486ad08 ("iw_cxgb4: fix ECN check on the passive accept") > > Fixes tag > > Fixes: 92e7ae7172 ("iw_cxgb4: Choose appropriate hw mtu index and > ISS for iWARP connections") > > has these problem(s): > > - SHA1 should be at least 12 digits long > Can be fixed by setting core.abbrev to 12 (or more) or (for git > v2.11 > or later) just making sure it is not set (or set to "auto"). I'll leave it to Potnuri to fix his stuff. As for the rdma tree, the 10 digit hash is still unique as of today, so I won't rebase the official branch to fix this. However, I'll see about adding a check for this in my workflow. Thanks. -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [GIT PULL] Thermal management updates for v5.4-rc1
On Fri, 2019-09-27 at 21:52 +0200, Richard Weinberger wrote: > On Fri, Sep 27, 2019 at 9:30 PM Doug Ledford > wrote: > > Because there are literally thousands of developers working on > > kernel > > bits here and there, and you're swatting this particular fly one > > developer at a time. > > I strongly disagree. One of the golden rules of kernel development is, > read what Linus writes. Especially during the merge window. > > Thanks, > //richard Developers come and go. Your argument is temporally flawed. A developer might start working on the tree, read everything during a merge window, and not catch one of Linus' rebase rants prior to committing a rebase felony of their own. Besides, I don't think this rule of yours is all that universal. If Linus is off on a thread about arm64 device tree brokenness and someone does a rebase and he rants about it, I'm very likely to miss that rant. I read what I reasonably deem to be relevant to me and my work, or sometimes additional stuff that just jumps out at me. But I never learned to speed read so I don't even try to read it all and wouldn't agree to a rule that says I have to. -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [GIT PULL] Thermal management updates for v5.4-rc1
On Fri, 2019-09-27 at 12:41 -0700, Linus Torvalds wrote: > On Fri, Sep 27, 2019 at 12:29 PM Doug Ledford > wrote: > > Because there are literally thousands of developers working on > > kernel > > bits here and there, and you're swatting this particular fly one > > developer at a time. > > Well, at least these days it's also very clearly spelled out in the > Documentation directory. Yeah, but let's be fair. How many people read the documentation fully? Or even if some new maintainer read it, did it really sink in? Or will they rebase a year later not thinking about it? > And the "don't rebase" does get posted on the mailing lists each time, > and I've mentioned it over the years in my release notes too. Lots of people here and there miss those things. You never know who caught what. > Besides, I actually only work with about a hundred top-level > maintainers, not thousands. Yes, we have thousands of developers, but > doing the stats over the 5.0 releases, there have been "only" 131 > people sending me pull requests. Sure, more than a couple, but at the > same time it's not like this is a "every developer" kind of thing, > this is literally subsystem maintainers. We've got a fair number of > them, but it's definitely not about thousands. > > I feel like I've sent that email out way more than a hundred times > over the last 15+ years. I'm sure you probably have. I think I got it two or three times before I got all the nuances of which rebases were OK and which weren't :-). > .. and I don't think having git warn is right, since rebasing is > perfectly fine as you are doing development. > > It's really just that maintainers shouldn't do it for bad reasons and > at bad times. > > And "there was a conflict" and "yesterday" is really one of the > absolute worst reasons/times around. I think you send it out a lot because it doesn't get driven home in people's heads until they get yelled at. And there's more to the badness of a rebase than just annoying you when you want to see the conflicts to judge things for yourself. There are legitimate issues such as a rebase wipes out history. Or if you rebase a public tree it blows up everyone that is tracking that branch. These things apply even to when you are doing your own development. Or rebases might wipe out merge commits from other trees, depending on options. A git warning solely for you might not truly be appropriate, but I'm not at all convinced that a git warning with a more general "Here are the pros and cons of rebases, with a list of dos and donts if you are going to do a rebase" document is a bad idea, and might actually help you out too. -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [GIT PULL] Thermal management updates for v5.4-rc1
On Fri, 2019-09-27 at 11:34 -0700, Linus Torvalds wrote: > On Fri, Sep 27, 2019 at 6:08 AM Zhang Rui wrote: > > One thing to mention is that, all the patches have been tested in > > linux-next for weeks, but there is a conflict detected, because > > upstream has took commit eaf7b46083a7e34 ("docs: thermal: add it to > > the > > driver API") from jc-docs tree while I'm keeping a wrong version of > > the > > patch, so I just rebased my tree to fix this. > > Why do I have to say this EVERY single release? Because there are literally thousands of developers working on kernel bits here and there, and you're swatting this particular fly one developer at a time. I might suggest that you need to speak with the git people and politely ask them to add a warning to the rebase command itself so that it prints out something like: If you are doing linux kernel development, and you are doing a rebase, please read Documentation/When_Not_To_Rebase.rst before rebasing your code and sending it to Linus. You've been warned. Acknowledge receipt of warning and proceed with rebase? (y/N) You would have free reign to put one of your more monumental yet funny rants in place in the documentation. You could also have a global git config to turn off the "Don't annoy Linus with rebases" warning. But only mention that global config at the end of the kernel documentation so you know people have read it before they turn the warning off. Maybe that would help. > A conflict is not a reason to rebase. Conflicts happen. They happen a > lot. I deal with them, and it's usually trivial. > > If you feel it's not trivial, just describe what the resolution is, > rather than rebasing. Really. > > Rebasing for a random conflict (particularly in documentation, for > chrissake!) is like using an atomic bomb to swat a fly. You have all > those downsides, and there are basically _no_ upsides. It only makes > for more work for me because I have to re-write this email for the > millionth time, and that takes longer and is more aggravating than the > conflict would have taken to just sort out. > >Linus -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
[PULL REQUEST] Please pull rdma.git
Hi Linus, *Much* calmer week this week. Just one -rc patch queued up. The way the siw driver was locking around the traversal of the list of ipv6 addresses on a device was causing a scheduling while atomic issue. Bernard straightened it out by using the rtnl_lock. Here's the boiler plate: The following changes since commit c536277e0db1ad2e9fbb9dfd940c3565a14d9c52: RDMA/siw: Fix 64/32bit pointer inconsistency (2019-08-23 12:08:27 -0400) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git tags/for-linus for you to fetch changes up to 531a64e4c35bb9844b0cf813a6c9a87e00be05ff: RDMA/siw: Fix IPv6 addr_list locking (2019-08-28 10:29:19 -0400) Pull request for 5.3-rc6 - Fix locking on list traversal (siw) Signed-off-by: Doug Ledford Bernard Metzler (1): RDMA/siw: Fix IPv6 addr_list locking drivers/infiniband/sw/siw/siw_cm.c | 31 --- 1 file changed, 20 insertions(+), 11 deletions(-) -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PULL REQUEST] Please pull rdma.git
Hi Linus, I didn't notice I was on my personal email identity when I sent the pull request. Sorry about that. It's really me ;-) On Fri, 2019-08-23 at 14:48 -0400, Doug Ledford wrote: > Hi Linus, > > No beating around the bush: this is a monster pull request for an -rc5 > kernel. Intel hit me with a series of fixes for TID processing. > Mellanox hit me with a series for their UMR memory support. > > And we had one fix for siw that fixes the 32bit build warnings and > because of the number of casts that had to be changed to properly > silence the warnings, that one patch alone is a full 40% of the LOC of > this entire pull request. Given that this is the initial release > kernel > for siw, I'm trying to fix anything in it that we can, so that adds to > the impetus to take fixes for it like this one. > > I had to do a rebase early in the week. Jason had thought he put a > patch on the rc queue that he needed to be there so he could base some > work off of it, and it had actually not been placed there. So he > asked > me (on Tuesday) to fix that up before pushing my wip branch to the > official rc branch. I did, and that's why the early patches look like > they were all committed at the same time on Tuesday. That bunch had > been in my queue prior. > > The various patches all pass my test for being legitimate fixes and > not > attempts to slide new features or development into a late rc. Well, > they were all fixes with the exception of a couple clean up patches > people wrote for making the fixes they also wrote better (like a > cleanup > patch to move UMR checking into a function so that the remaining UMR > fix > patches can reference that function), so I left those in place too. > > My apologies for the LOC count and the number of patches here, it's > just > how the cards fell this cycle. I hope you agree with me that they're > justified fixes. > > Here's the boilerplate: > > The following changes since commit > d1abaeb3be7b5fa6d7a1fbbd2e14e3310005c4c1: > > Linux 5.3-rc5 (2019-08-18 14:31:08 -0700) > > are available in the Git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git > tags/for-linus > > for you to fetch changes up to > c536277e0db1ad2e9fbb9dfd940c3565a14d9c52: > > RDMA/siw: Fix 64/32bit pointer inconsistency (2019-08-23 12:08:27 > -0400) > > > Pull request for 5.3-rc5 > > - Fix siw buffer mapping issue > - Fix siw 32/64 casting issues > - Fix a KASAN access issue in bnxt_re > - Fix several memory leaks (hfi1, mlx4) > - Fix a NULL deref in cma_cleanup > - Fixes for UMR memory support in mlx5 (4 patch series) > - Fix namespace check for restrack > - Fixes for counter support > - Fixes for hfi1 TID processing (5 patch series) > - Fix potential NULL deref in siw > - Fix memory page calculations in mlx5 > > Signed-off-by: Doug Ledford > > > Bernard Metzler (3): > RDMA/siw: Fix potential NULL de-ref > RDMA/siw: Fix SGL mapping issues > RDMA/siw: Fix 64/32bit pointer inconsistency > > Ido Kalir (1): > IB/core: Fix NULL pointer dereference when bind QP to counter > > Jason Gunthorpe (1): > RDMA/mlx5: Fix MR npages calculation for IB_ACCESS_HUGETLB > > Kaike Wan (5): > IB/hfi1: Drop stale TID RDMA packets > IB/hfi1: Unsafe PSN checking for TID RDMA READ Resp packet > IB/hfi1: Add additional checks when handling TID RDMA READ RESP > packet > IB/hfi1: Add additional checks when handling TID RDMA WRITE DATA > packet > IB/hfi1: Drop stale TID RDMA packets that cause TIDErr > > Leon Romanovsky (2): > RDMA/counters: Properly implement PID checks > RDMA/restrack: Rewrite PID namespace check to be reliable > > Moni Shoua (4): > IB/mlx5: Consolidate use_umr checks into single function > IB/mlx5: Report and handle ODP support properly > IB/mlx5: Fix MR re-registration flow to use UMR properly > IB/mlx5: Block MR WR if UMR is not possible > > Selvin Xavier (1): > RDMA/bnxt_re: Fix stack-out-of-bounds in > bnxt_qplib_rcfw_send_message > > Wenwen Wang (3): > IB/mlx4: Fix memory leaks > infiniband: hfi1: fix a memory leak bug > infiniband: hfi1: fix memory leaks > > zhengbin (1): > RDMA/cma: fix null-ptr-deref Read in cma_cleanup > > drivers/infiniband/core/cma.c | 6 ++- > drivers/infiniband/core/counters.c | 10 ++-- > drivers/infiniband/core/nldev.c| 3 +- >
Re: [PATCH] infiniband: hfi1: fix memory leaks
On Sun, 2019-08-18 at 13:54 -0500, Wenwen Wang wrote: > In fault_opcodes_write(), 'data' is allocated through kcalloc(). > However, > it is not deallocated in the following execution if an error occurs, > leading to memory leaks. To fix this issue, introduce the 'free_data' > label > to free 'data' before returning the error. > > Signed-off-by: Wenwen Wang Applied to for-rc, thanks. -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] infiniband: hfi1: fix a memory leak bug
On Sun, 2019-08-18 at 14:29 -0500, Wenwen Wang wrote: > In fault_opcodes_read(), 'data' is not deallocated if > debugfs_file_get() > fails, leading to a memory leak. To fix this bug, introduce the > 'free_data' > label to free 'data' before returning the error. > > Signed-off-by: Wenwen Wang Applied to for-rc, thanks. -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] IB/mlx4: Fix memory leaks
On Sun, 2019-08-18 at 15:23 -0500, Wenwen Wang wrote: > In mlx4_ib_alloc_pv_bufs(), 'tun_qp->tx_ring' is allocated through > kcalloc(). However, it is not always deallocated in the following > execution > if an error occurs, leading to memory leaks. To fix this issue, free > 'tun_qp->tx_ring' whenever an error occurs. > > Signed-off-by: Wenwen Wang > --- Thanks, applied to for-rc. -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] RDMA/hns: Fix some white space check_mtu_validate()
On Fri, 2019-08-16 at 14:39 +0300, Dan Carpenter wrote: > This line was indented a bit too far. > > Signed-off-by: Dan Carpenter Thanks, applied to for-next. -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
[PULL REQUEST] Please pull rdma.git
Hi Linus, Fairly small pull request for -rc3. I'm out of town the rest of this week, so I made sure to clean out as much as possible from patchworks in enough time for 0-day to chew through it (Yay! for 0-day being back online! :-)). Jason might send through any emergency stuff that could pop up, otherwise I'm back next week. The only real thing of note is the siw ABI change. Since we just merged siw *this* release, there are no prior kernel releases to maintain kernel ABI with. I told Bernard that if there is anything else about the siw ABI he thinks he might want to change before it goes set in stone, he should get it in ASAP. The siw module was around for several years outside the kernel tree, and it had to be revamped considerably for inclusion upstream, so we are making no attempts to be backward compatible with the out of tree version. Once 5.3 is actually released, we will have our baseline ABI to maintain. Here's the boiler plate: The following changes since commit e21a712a9685488f5ce80495b37b9fdbe96c230d: Linux 5.3-rc3 (2019-08-04 18:40:12 -0700) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git tags/for-linus for you to fetch changes up to 2c8ccb37b08fe364f02a9914daca474d43151453: RDMA/siw: Change CQ flags from 64->32 bits (2019-08-13 12:22:06 -0400) Pull request for 5.3-rc3 - Fix a memory registration release flow issue that was causing a WARN_ON (mlx5) - If the counters for a port aren't allocated, then we can't do operations on the non-existent counters (core) - Check the right variable for error code result (mlx5) - Fix a use after free issue (mlx5) - Fix an off by one memory leak (siw) - Actually return an error code on error (core) - Allow siw to be built on 32bit arches (siw, ABI change, but OK since siw was just merged this merge window and there is no prior released kernel to maintain compatibility with and we also updated the rdma-core user space package to match) Signed-off-by: Doug Ledford Bernard Metzler (1): RDMA/siw: Change CQ flags from 64->32 bits Dan Carpenter (3): IB/mlx5: Check the correct variable in error handling code RDMA/siw: Fix a memory leak in siw_init_cpulist() RDMA/core: Fix error code in stat_get_doit_qp() Mark Zhang (1): RDMA/counter: Prevent QP counter binding if counters unsupported Yishai Hadas (2): IB/mlx5: Fix implicit MR release flow IB/mlx5: Fix use-after-free error while accessing ev_file pointer drivers/infiniband/core/counters.c| 6 ++ drivers/infiniband/core/nldev.c | 8 ++-- drivers/infiniband/core/umem_odp.c| 4 drivers/infiniband/hw/mlx5/devx.c | 11 ++- drivers/infiniband/hw/mlx5/odp.c | 24 +--- drivers/infiniband/sw/siw/Kconfig | 2 +- drivers/infiniband/sw/siw/siw.h | 2 +- drivers/infiniband/sw/siw/siw_main.c | 4 +--- drivers/infiniband/sw/siw/siw_qp.c| 14 ++ drivers/infiniband/sw/siw/siw_verbs.c | 16 +++- include/uapi/rdma/siw-abi.h | 3 ++- 11 files changed, 53 insertions(+), 41 deletions(-) -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] RDMA/hns: remove obsolete Kconfig comment
On Wed, 2019-08-07 at 11:22 +0800, YueHaibing wrote: > Since commit a07fc0bb483e ("RDMA/hns: Fix build error") > these kconfig comment is obsolete, so just remove it. > > Signed-off-by: YueHaibing Thanks, applied to for-next. -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH V2] mlx5: Fix formats with line continuation whitespace
On Fri, 2019-08-02 at 18:32 +, Saeed Mahameed wrote: > On Fri, 2019-08-02 at 11:09 -0700, Joe Perches wrote: > > On Tue, 2018-11-06 at 16:34 -0500, Doug Ledford wrote: > > > On Thu, 2018-11-01 at 09:34 +0200, Leon Romanovsky wrote: > > > > On Thu, Nov 01, 2018 at 12:24:08AM -0700, Joe Perches wrote: > > > > > The line continuations unintentionally add whitespace so > > > > > instead use coalesced formats to remove the whitespace. > > > > > > > > > > Signed-off-by: Joe Perches > > > > > --- > > > > > > > > > > v2: Remove excess space after %u > > > > > > > > > > drivers/net/ethernet/mellanox/mlx5/core/rl.c | 6 ++ > > > > > 1 file changed, 2 insertions(+), 4 deletions(-) > > > > > > > > > > > > > Thanks, > > > > Reviewed-by: Leon Romanovsky > > > > > > Applied, thanks. > > > > Still not upstream. How long does it take? > > > > Doug, Leon, this patch still apply, let me know what happened here ? > and if you want me to apply it to one of my branches. I'm not entirely sure what happened here. Obviously I said I had taken it, which I don't do under my normal workflow until I've actually applied and build tested the patch. For it to not make it into the tree means that I probably applied it to my wip/dl-for-next branch, but prior to moving it to for-next, I might have had a rebase and it got lost in the shuffle or something like that. My apologies for letting it slip through the cracks. Anyway, I pulled the patch from patchworks, applied it, and pushed it to k.o. -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
[PULL REQUEST] Please pull rdma.git
Hi Linus, Here's our second -rc pull request. Nothing particularly special in this one. The client removal deadlock fix is kindy tricky, but we had multiple eyes on it and no one could find a fault in it. A couple Spectre V1 fixes too. Otherwise, all just normal -rc fodder. Here's the boilerplate: The following changes since commit b7165bd0d6cbb93732559be6ea8774653b204480: IB/mlx5: Fix RSS Toeplitz setup to be aligned with the HW specification (2019-07-25 11:45:48 -0300) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git tags/for-linus for you to fetch changes up to 020fb3bebc224dfe9353a56ecbe2d5fac499dffc: RDMA/hns: Fix error return code in hns_roce_v1_rsv_lp_qp() (2019-08-01 12:53:53 -0400) Pull request for 5.3-rc2 - A couple Spectre V1 fixes (umad, hfi1) - Fix a tricky deadlock in the rdma core code with refcounting instead of locks (client removal patches) - Build errors (hns) - Fix a scheduling while atomic issue (mlx5) - Use after free fix (mad) - Fix error path return code (hns) - Null deref fix (siw_crypto_hash) - A few other misc. minor fixes Bernard Metzler (1): Do not dereference 'siw_crypto_shash' before checking Gal Pressman (1): RDMA/restrack: Track driver QP types in resource tracker Gustavo A. R. Silva (1): IB/hfi1: Fix Spectre v1 vulnerability Guy Levi (1): IB/mlx5: Fix MR registration flow to use UMR properly Jack Morgenstein (1): IB/mad: Fix use-after-free in ib mad completion handling Jason Gunthorpe (2): RDMA/devices: Do not deadlock during client removal RDMA/devices: Remove the lock around remove_client_context Leon Romanovsky (1): RDMA/mlx5: Release locks during notifier unregister Michal Kalderon (1): RDMA/qedr: Fix the hca_type and hca_rev returned in device attributes Tony Luck (1): IB/core: Add mitigation for Spectre V1 Wei Yongjun (1): RDMA/hns: Fix error return code in hns_roce_v1_rsv_lp_qp() YueHaibing (1): RDMA/hns: Fix build error drivers/infiniband/core/core_priv.h| 5 +- drivers/infiniband/core/device.c | 102 +++-- drivers/infiniband/core/mad.c | 20 +++--- drivers/infiniband/core/user_mad.c | 6 +- drivers/infiniband/hw/hfi1/verbs.c | 2 + drivers/infiniband/hw/hns/Kconfig | 6 +- drivers/infiniband/hw/hns/Makefile | 8 +-- drivers/infiniband/hw/hns/hns_roce_hw_v1.c | 4 +- drivers/infiniband/hw/mlx5/main.c | 7 +- drivers/infiniband/hw/mlx5/mr.c| 27 +++- drivers/infiniband/hw/qedr/main.c | 10 ++- drivers/infiniband/sw/siw/siw_qp.c | 6 +- include/rdma/ib_verbs.h| 4 +- 13 files changed, 124 insertions(+), 83 deletions(-) -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] IB/hfi1: Fix Spectre v1 vulnerability
On Wed, 2019-07-31 at 12:54 -0500, Gustavo A. R. Silva wrote: > sl is controlled by user-space, hence leading to a potential > exploitation of the Spectre variant 1 vulnerability. > > Fix this by sanitizing sl before using it to index ibp->sl_to_sc. > > Notice that given that speculation windows are large, the policy is > to kill the speculation on the first load and not worry if it can be > completed with a dependent load/store [1]. > > [1] > https://lore.kernel.org/lkml/20180423164740.gy17...@dhcp22.suse.cz/ > > Cc: sta...@vger.kernel.org > Signed-off-by: Gustavo A. R. Silva > --- Thanks, applied to for-rc. -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH -next] RDMA/hns: remove set but not used variable 'irq_num'
On Thu, 2019-08-01 at 13:10 +0300, Leon Romanovsky wrote: > On Wed, Jul 31, 2019 at 03:37:48PM +0800, YueHaibing wrote: > > Fixes gcc '-Wunused-but-set-variable' warning: > > > > drivers/infiniband/hw/hns/hns_roce_hw_v2.c: In function > > hns_roce_v2_cleanup_eq_table: > > drivers/infiniband/hw/hns/hns_roce_hw_v2.c:5920:6: > > warning: variable irq_num set but not used [-Wunused-but-set- > > variable] > > > > It is not used since > > commit 33db6f94847c ("RDMA/hns: Refactor eq table init for hip08") > > > > Reported-by: Hulk Robot > > Signed-off-by: YueHaibing > > --- > > drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 2 -- > > 1 file changed, 2 deletions(-) > > > > I'm hitting this error too. > > Thanks, > Reviewed-by: Leon Romanovsky Applied to for-next, thanks. -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH V2] IB/core: Add mitigation for Spectre V1
On Wed, 2019-07-31 at 14:52 -0400, Doug Ledford wrote: > On Wed, 2019-07-31 at 12:52 -0500, Gustavo A. R. Silva wrote: > > This is insufficient. The speculation windows are large: > > > > "Speculative execution on modern CPUs can run several > > hundred instructions ahead." [1] > > > > [1] https://spectreattack.com/spectre.pdf > > Thanks, I'll take a look at that. That issue aside, returning without > wasting time on two mutexes is still better IMO, so I like my patch > more > than the proposed one. Tony, would you like to resubmit? > Never mind, I took your V2 and fixed it up like I wanted. Patch applied, thanks. -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH V2] IB/core: Add mitigation for Spectre V1
On Wed, 2019-07-31 at 12:52 -0500, Gustavo A. R. Silva wrote: > This is insufficient. The speculation windows are large: > > "Speculative execution on modern CPUs can run several > hundred instructions ahead." [1] > > [1] https://spectreattack.com/spectre.pdf Thanks, I'll take a look at that. That issue aside, returning without wasting time on two mutexes is still better IMO, so I like my patch more than the proposed one. Tony, would you like to resubmit? -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH][next] RDMA/core: fix spelling mistake "Nelink" -> "Netlink"
On Wed, 2019-07-31 at 11:28 +0300, Leon Romanovsky wrote: > On Wed, Jul 31, 2019 at 09:01:44AM +0100, Colin King wrote: > > From: Colin Ian King > > > > There is a spelling mistake in a warning message, fix it. > > > > Signed-off-by: Colin Ian King > > --- > > drivers/infiniband/core/netlink.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > Thanks, > Reviewed-by: Leon Romanovsky Thanks, applied to for-next. -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH v6 20/57] infiniband: Remove dev_err() usage after platform_get_irq()
On Tue, 2019-07-30 at 11:15 -0700, Stephen Boyd wrote: > We don't need dev_err() messages when platform_get_irq() fails now > that > platform_get_irq() prints an error message itself when something goes > wrong. Let's remove these prints with a simple semantic patch. > > // > @@ > expression ret; > struct platform_device *E; > @@ > > ret = > ( > platform_get_irq(E, ...) > platform_get_irq_byname(E, ...) > ); > > if ( \( ret < 0 \| ret <= 0 \) ) > { > ( > -if (ret != -EPROBE_DEFER) > -{ ... > -dev_err(...); > -... } > ... > -dev_err(...); > ) > ... > } > // > > While we're here, remove braces on if statements that only have one > statement (manually). > > Cc: Doug Ledford > Cc: Jason Gunthorpe > Cc: linux-r...@vger.kernel.org > Cc: Greg Kroah-Hartman > Signed-off-by: Stephen Boyd > --- > > Please apply directly to subsystem trees > Thanks for being clear about where you wanted these applied. This patch applied to rdma for-next, thanks. -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH V2] IB/core: Add mitigation for Spectre V1
On Tue, 2019-07-30 at 21:39 -0700, Luck, Tony wrote: > Some processors may mispredict an array bounds check and > speculatively access memory that they should not. With > a user supplied array index we like to play things safe > by masking the value with the array size before it is > used as an index. > > Signed-off-by: Tony Luck > > --- > V2: Mask the index *AFTER* the bounds check. Issue pointed > out by Gustavo. Fix suggested by Ira. > > drivers/infiniband/core/user_mad.c | 8 +++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/drivers/infiniband/core/user_mad.c > b/drivers/infiniband/core/user_mad.c > index 9f8a48016b41..32cea5ed9ce1 100644 > --- a/drivers/infiniband/core/user_mad.c > +++ b/drivers/infiniband/core/user_mad.c > @@ -49,6 +49,7 @@ > #include > #include > #include > +#include > > #include > > @@ -888,7 +889,12 @@ static int ib_umad_unreg_agent(struct > ib_umad_file *file, u32 __user *arg) > mutex_lock(&file->port->file_mutex); > mutex_lock(&file->mutex); > > - if (id >= IB_UMAD_MAX_AGENTS || !__get_agent(file, id)) { > + if (id >= IB_UMAD_MAX_AGENTS) { > + ret = -EINVAL; > + goto out; > + } > + id = array_index_nospec(id, IB_UMAD_MAX_AGENTS); > + if (!__get_agent(file, id)) { > ret = -EINVAL; > goto out; > } I'm not sure this is the best fix for this. However, here is where I get to admit that I largely ignored the whole Spectre V1 thing, so I'm not sure I completely understand the vulnerability and the limits to that. But, looking at the function, it seems we can do an early return without ever taking any of the mutexes in the function in the case of id >= IB_UMAD_MAX_AGENTS, so if we did that, would that separate the checking of id far enough from the usage of it as an array index that we wouldn't need the clamp to prevent speculative prefetch? About how far, in code terms, does this speculative prefetch occur? This is the patch I was thinking of: diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c index 9f8a48016b41..1e507dd257ff 100644 --- a/drivers/infiniband/core/user_mad.c +++ b/drivers/infiniband/core/user_mad.c @@ -49,6 +49,7 @@ #include #include #include +#include #include @@ -884,11 +885,18 @@ static int ib_umad_unreg_agent(struct ib_umad_file *file, u32 __user *arg) if (get_user(id, arg)) return -EFAULT; + if (id >= IB_UMAD_MAX_AGENTS) + return -EINVAL; mutex_lock(&file->port->file_mutex); mutex_lock(&file->mutex); - if (id >= IB_UMAD_MAX_AGENTS || !__get_agent(file, id)) { + /* +* Is our check of id far enough away, code wise, to prevent +* speculative prefetch? +*/ + id = array_index_nospec(id, IB_UMAD_MAX_AGENTS); + if (!__get_agent(file, id)) { ret = -EINVAL; goto out; } -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] rdma: siw: remove unused variable
On Mon, 2019-07-29 at 16:03 -0300, Jason Gunthorpe wrote: > On Mon, Jul 29, 2019 at 02:19:35PM -0400, Doug Ledford wrote: > > On Fri, 2019-07-26 at 11:25 +0200, Anders Roxell wrote: > > > The variable 'p' si no longer used and the compiler rightly > > > complains > > > that it should be removed. > > > > > > ../drivers/infiniband/sw/siw/siw_mem.c: In function > > > ‘siw_free_plist’: > > > ../drivers/infiniband/sw/siw/siw_mem.c:66:16: warning: unused > > > variable > > > ‘p’ [-Wunused-variable] > > > struct page **p = chunk->plist; > > > ^ > > > > > > Rework to remove unused variable. > > > > > > Fixes: 8288d030447f ("mm/gup: add make_dirty arg to > > > put_user_pages_dirty_lock()") > > > > This commit hash and the commit subject does not exist in Linus' > > tree as > > of today. What tree is this being merged through, and is it slated > > to > > merge soon or is this a for-next item? > > This is though -mm, maybe John knows what is what Hmmm...if it's through -mm, doesn't that mean that we can't rely on the hash because the next time Andrew's tree rebases (using quilt or whatever it is he does) that the hash will change? It doesn't really matter too much...we can't take the fix anyway, it should probably be squashed into the patch that it's fixing, and if you follow Bernard's advice, you fix the problem by eliminating this function and changing the sole call site to just call put_user_pages_dirty_lock() directly. -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] rdma: siw: remove unused variable
On Fri, 2019-07-26 at 11:25 +0200, Anders Roxell wrote: > The variable 'p' si no longer used and the compiler rightly complains > that it should be removed. > > ../drivers/infiniband/sw/siw/siw_mem.c: In function ‘siw_free_plist’: > ../drivers/infiniband/sw/siw/siw_mem.c:66:16: warning: unused variable > ‘p’ [-Wunused-variable] > struct page **p = chunk->plist; > ^ > > Rework to remove unused variable. > > Fixes: 8288d030447f ("mm/gup: add make_dirty arg to > put_user_pages_dirty_lock()") This commit hash and the commit subject does not exist in Linus' tree as of today. What tree is this being merged through, and is it slated to merge soon or is this a for-next item? -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] infiniband: hw: cxgb3: Fix a possible null-pointer dereference in connect_reply_upcall()
On Thu, 2019-07-25 at 20:15 +0800, Jia-Ju Bai wrote: > In connect_reply_upcall(), there is an if statement on line 730 to > check > whether ep->com.cm_id is NULL: > if (ep->com.cm_id) > > When ep->com.cm_id is NULL, it is used on line 736: > ep->com.cm_id->rem_ref(ep->com.cm_id); > > Thus, a possible null-pointer dereference may occur. > > To fix this bug, ep->com.cm_id is checked before being used. > > This bug is found by a static analysis tool STCheck written by us. I think this is one of those theoretical issues that is a non-issue in practice. The cxgb3 driver is older than dirt and only hangs around because people out there are still using it in a few places. We have no reports of bugs in this function. I looked through the code, but it wasn't quickly obvious why this isn't an issue, but I suspect the real answer is "we can never have a negative status and not have a cm_id" as a result of the design of the code. Verifying that with an audit is more time than I want to spend though. So, I'm going to drop this patch. If you can find a plausible path by which this is actually a bug, then we can revisit it. But I don't want to go around changing a known working for years driver on the basis of "my new static code checker found this thing" versus someone who reports an actual crash (which is what this bug would cause if it exists). > Signed-off-by: Jia-Ju Bai > --- > drivers/infiniband/hw/cxgb3/iwch_cm.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c > b/drivers/infiniband/hw/cxgb3/iwch_cm.c > index 0bca72cb4d9a..2b31c4726d3e 100644 > --- a/drivers/infiniband/hw/cxgb3/iwch_cm.c > +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c > @@ -733,7 +733,8 @@ static void connect_reply_upcall(struct iwch_ep > *ep, int status) > ep->com.cm_id->event_handler(ep->com.cm_id, &event); > } > if (status < 0) { > - ep->com.cm_id->rem_ref(ep->com.cm_id); > + if (ep->com.cm_id) > + ep->com.cm_id->rem_ref(ep->com.cm_id); > ep->com.cm_id = NULL; > ep->com.qp = NULL; > } -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] RDMA/hns: Fix build error
On Wed, 2019-07-24 at 08:32 -0300, Jason Gunthorpe wrote: > On Wed, Jul 24, 2019 at 02:54:43PM +0800, YueHaibing wrote: > > If INFINIBAND_HNS_HIP08 is selected and HNS3 is m, > > but INFINIBAND_HNS is y, building fails: > > > > drivers/infiniband/hw/hns/hns_roce_hw_v2.o: In function > > `hns_roce_hw_v2_exit': > > hns_roce_hw_v2.c:(.exit.text+0xd): undefined reference to > > `hnae3_unregister_client' > > drivers/infiniband/hw/hns/hns_roce_hw_v2.o: In function > > `hns_roce_hw_v2_init': > > hns_roce_hw_v2.c:(.init.text+0xd): undefined reference to > > `hnae3_register_client' > > > > Also if INFINIBAND_HNS_HIP06 is selected and HNS_DSAF > > is m, but INFINIBAND_HNS is y, building fails: > > > > drivers/infiniband/hw/hns/hns_roce_hw_v1.o: In function > > `hns_roce_v1_reset': > > hns_roce_hw_v1.c:(.text+0x39fa): undefined reference to > > `hns_dsaf_roce_reset' > > hns_roce_hw_v1.c:(.text+0x3a25): undefined reference to > > `hns_dsaf_roce_reset' > > > > Reported-by: Hulk Robot > > Fixes: dd74282df573 ("RDMA/hns: Initialize the PCI device for hip08 > > RoCE") > > Fixes: 08805fdbeb2d ("RDMA/hns: Split hw v1 driver from hns roce > > driver") > > Signed-off-by: YueHaibing > > drivers/infiniband/hw/hns/Kconfig | 6 +++--- > > drivers/infiniband/hw/hns/Makefile | 8 ++-- > > 2 files changed, 5 insertions(+), 9 deletions(-) > > did you test this approach with CONFIG_MODULES=n? This version of the patch looks like the right fix. Applying to for-rc, thanks. -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
[PULL REQUEST] Please pull rdma.git
Hi Linus, This is probably our last -rc pull request. We don't have anything else outstanding at the moment anyway, and with the summer months on us and people taking trips, I expect the next weeks leading up to the merge window to be pretty calm and sedate. This has two simple, no brainer fixes for the EFA driver. Then it has ten not quite so simple fixes for the hfi1 driver. The problem with them is that they aren't simply one liner typo fixes. They're still fixes, but they're more complex issues like livelock under heavy load where the answer was to change work queue usage and spinlock usage to resolve the problem, or issues with orphaned requests during certain types of failures like link down which required some more complex work to fix too. They all look like legitimate fixes to me, they just aren't small like I wish they were. Here's the boilerplate: The following changes since commit d1fdb6d8f6a4109a4263176c84b899076a5f8008: Linux 5.2-rc4 (2019-06-08 20:24:46 -0700) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git tags/for-linus for you to fetch changes up to 7a5834e456f7fb3eca9b63af2a6bc7f460ae482f: RDMA/efa: Handle mmap insertions overflow (2019-06-18 16:27:24 -0400) Pull request for 5.1-rc5 - 2 minor EFA fixes - 10 hfi1 fixes related to scaling issues Gal Pressman (2): RDMA/efa: Fix success return value in case of error RDMA/efa: Handle mmap insertions overflow Kaike Wan (1): IB/hfi1: Validate fault injection opcode user input Mike Marciniszyn (9): IB/hfi1: Close PSM sdma_progress sleep window IB/hfi1: Correct tid qp rcd to match verbs context IB/hfi1: Avoid hardlockup with flushlist_lock IB/hfi1: Silence txreq allocation warnings IB/hfi1: Create inline to get extended headers IB/hfi1: Use aborts to trigger RC throttling IB/hfi1: Wakeup QPs orphaned on wait list after flush IB/hfi1: Handle wakeup of orphaned QPs for pio IB/hfi1: Handle port down properly in pio drivers/infiniband/hw/efa/efa_com_cmd.c | 24 +++ drivers/infiniband/hw/efa/efa_verbs.c| 21 ++--- drivers/infiniband/hw/hfi1/chip.c| 13 drivers/infiniband/hw/hfi1/chip.h| 1 + drivers/infiniband/hw/hfi1/fault.c | 5 +++ drivers/infiniband/hw/hfi1/hfi.h | 31 +++ drivers/infiniband/hw/hfi1/pio.c | 21 +++-- drivers/infiniband/hw/hfi1/rc.c | 53 +++- drivers/infiniband/hw/hfi1/sdma.c| 26 drivers/infiniband/hw/hfi1/tid_rdma.c| 4 +-- drivers/infiniband/hw/hfi1/ud.c | 4 +-- drivers/infiniband/hw/hfi1/user_sdma.c | 12 +++- drivers/infiniband/hw/hfi1/user_sdma.h | 1 - drivers/infiniband/hw/hfi1/verbs.c | 14 + drivers/infiniband/hw/hfi1/verbs.h | 1 + drivers/infiniband/hw/hfi1/verbs_txreq.c | 2 +- drivers/infiniband/hw/hfi1/verbs_txreq.h | 3 +- 17 files changed, 174 insertions(+), 62 deletions(-) -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: linux-next: manual merge of the rdma tree with Linus' tree
On Thu, 2019-06-20 at 12:10 +1000, Stephen Rothwell wrote: > 2d3c72ed5041 ("rdma: Remove nes") Yeah, not much you can do about tree wide patchsets conflicting with a removal ;-) -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: linux-next: manual merge of the rdma tree with Linus' tree
On Thu, 2019-06-20 at 12:06 +1000, Stephen Rothwell wrote: > Hi all, > > Today's linux-next merge of the rdma tree got a conflict in: > > include/rdma/ib_verbs.h > > between commit: > > dc1435c00fcd ("RDMA/srp: Rename SRP sysfs name after IB device > rename trigger") > > from Linus' tree and commit: > > 0e2d00eb6fd4 ("RDMA: Add NLDEV_GET_CHARDEV to allow char dev > discovery and autoload") > > from the rdma tree. > > I fixed it up (see below) and can carry the fix as necessary. This > is now fixed as far as linux-next is concerned, but any non trivial > conflicts should be mentioned to your upstream maintainer when your > tree > is submitted for merging. You may also want to consider cooperating > with the maintainer of the conflicting tree to minimise any > particularly > complex conflicts. > Yep, this one was expected. Thanks. -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH trivial] IB/hfi1: Spelling s/statisfied/satisfied/
On Mon, 2019-06-17 at 16:01 +0200, Geert Uytterhoeven wrote: > Signed-off-by: Geert Uytterhoeven > --- > drivers/infiniband/hw/hfi1/tid_rdma.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/infiniband/hw/hfi1/tid_rdma.c > b/drivers/infiniband/hw/hfi1/tid_rdma.c > index 6fb93032fbefcb7e..24f30ff6b5fbc868 100644 > --- a/drivers/infiniband/hw/hfi1/tid_rdma.c > +++ b/drivers/infiniband/hw/hfi1/tid_rdma.c > @@ -477,7 +477,7 @@ static struct rvt_qp *first_qp(struct > hfi1_ctxtdata *rcd, > * Must hold the qp s_lock and the exp_lock. > * > * Return: > - * false if either of the conditions below are statisfied: > + * false if either of the conditions below are satisfied: > * 1. The list is empty or > * 2. The indicated qp is at the head of the list and the > *HFI1_S_WAIT_TID_SPACE bit is set in qp->s_flags. Thanks, applied to for-next. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: RDMA: Clean destroy CQ in drivers do not return errors
On Fri, 2019-06-14 at 14:59 +0100, Colin Ian King wrote: > Hi, > > Static analysis with Coverity reported an issue with the following > commit: > > commit a52c8e2469c30cf7ac453d624aed9c168b23d1af > Author: Leon Romanovsky > Date: Tue May 28 14:37:28 2019 +0300 > > RDMA: Clean destroy CQ in drivers do not return errors > > In function bnxt_re_destroy_cq() contains the following: > > if (!cq->umem) > ib_umem_release(cq->umem); Given that the original test that was replaced was: if (!IS_ERR_OR_NULL(cq->umem)) we aren't really worried about a null cq, just that umem is valid. So, the logic is inverted on the test (or possibly we shouldn't have replaced !IS_ERR_OR_NULL(cq->umem) at all). But on closer inspection, the bnxt_re specific portion of this patch appears to have another problem in that it no longer checks the result of bnxt_qplib_destroy_cq() yet it does nothing to keep that function from failing. Leon, can you send a followup fix? > Coverity detects this as a deference after null check on the null > pointer cq->umem: > > "var_deref_model: Passing null pointer cq->umem to ib_umem_release, > which dereferences it" > > Is the logic inverted on that null check? > > Colin -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: linux-next: manual merge of the rdma tree with Linus' tree
On Fri, 2019-06-14 at 13:00 +1000, Stephen Rothwell wrote: > Hi all, > > Today's linux-next merge of the rdma tree got conflicts in: > > drivers/infiniband/core/uverbs_cmd.c > drivers/infiniband/core/uverbs_std_types_cq.c > > between commit: > > 6876aaedc8a1 ("RDMA/uverbs: Pass udata on uverbs error unwind") > > from Linus' tree and commit: > > e39afe3d6dbd ("RDMA: Convert CQ allocations to be under core > responsibility") > > from the rdma tree. > > I fixed it up (see below) and can carry the fix as necessary. This > is now fixed as far as linux-next is concerned, but any non trivial > conflicts should be mentioned to your upstream maintainer when your > tree > is submitted for merging. You may also want to consider cooperating > with the maintainer of the conflicting tree to minimise any > particularly > complex conflicts. > Thanks Stephen. There will be at least one more coming too. We've had a number of -rc patches and -next patches that touch the same area of code this cycle. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH v2] RDMA/cma: Make CM response timeout and # CM retries configurable
On Thu, 2019-06-13 at 18:58 +0200, Håkon Bugge wrote: > > On 13 Jun 2019, at 16:25, Doug Ledford wrote: > > > > On Tue, 2019-02-26 at 08:57 +0100, Håkon Bugge wrote: > > > During certain workloads, the default CM response timeout is too > > > short, leading to excessive retries. Hence, make it configurable > > > through sysctl. While at it, also make number of CM retries > > > configurable. > > > > > > The defaults are not changed. > > > > > > Signed-off-by: Håkon Bugge > > > --- > > > v1 -> v2: > > > * Added unregister_net_sysctl_table() in cma_cleanup() > > > --- > > > drivers/infiniband/core/cma.c | 52 > > > ++--- > > > -- > > > 1 file changed, 45 insertions(+), 7 deletions(-) > > > > This has been sitting on patchworks since forever. Presumably > > because > > Jason and I neither one felt like we really wanted it, but also > > couldn't justify flat refusing it. > > I thought the agreement was to use NL and iproute2. But I haven't had > the capacity. To be fair, the email thread was gone from my linux-rdma folder. So, I just had to review the entry in patchworks, and there was no captured discussion there. So, if the agreement was made, it must have been face to face some time and if I was involed, I had certainly forgotten by now. But I still needed to clean up patchworks, hence my email ;-). > > Well, I've made up my mind, so > > unless Jason wants to argue the other side, I'm rejecting this > > patch. > > Here's why. The whole concept of a timeout is to help recovery in > > a > > situation that overloads one end of the connection. There is a > > relationship between the max queue backlog on the one host and the > > timeout on the other host. > > If you refer to the backlog parameter in rdma_listen(), I cannot see > it being used at all for IB. No, not exactly. I was more referring to heavy load causing an overflow in the mad packet receive processing. We have IB_MAD_QP_RECV_SIZE set to 512 by default, but it can be changed at module load time of the ib_core module and that represents the maximum number of backlogged mad packets we can have waiting to be processed before we just drop them on the floor. There can be other places to drop them too, but this is the one I was referring to. > For CX-3, which is paravirtualized wrt. MAD packets, it is the proxy > UD receive queue length for the PF driver that can be construed as a > backlog. Remember that any MAD packet being sent from a VF or the PF > itself, is sent to a proxy UD QP in the PF. Those packets are then > multiplexed out on the real QP0/1. Incoming MAD packets are > demultiplexed and sent once more to the proxy QP in the VF. > > > Generally, in order for a request to get > > dropped and us to need to retransmit, the queue must already have a > > full backlog. So, how long does it take a heavily loaded system to > > process a full backlog? That, plus a fuzz for a margin of error, > > should be our timeout. We shouldn't be asking users to configure > > it. > > Customer configures #VMs and different workload may lead to way > different number of CM connections. The proxying of MAD packet > through the PF driver has a finite packet rate. With 64 VMs, 10.000 > QPs on each, all going down due to a switch failing or similar, you > have 640.000 DREQs to be sent, and with the finite packet rate of MA > packets through the PF, this takes more than the current CM timeout. > And then you re-transmit and increase the burden of the PF proxying. > > So, we can change the default to cope with this. But, a MAD packet is > unreliable, we may have transient loss. In this case, we want a short > timeout. > > > However, if users change the default backlog queue on their > > systems, > > *then* it would make sense to have the users also change the > > timeout > > here, but I think guidance would be helpful. > > > > So, to revive this patch, what I'd like to see is some attempt to > > actually quantify a reasonable timeout for the default backlog > > depth, > > then the patch should actually change the default to that > > reasonable > > timeout, and then put in the ability to adjust the timeout with > > some > > sort of doc guidance on how to calculate a reasonable timeout based > > on > > configured backlog depth. > > I can agree to this :-) > > > Thxs, Håkon > > > -- > > Doug Ledford > >GPG KeyID: B826A3330E572FDD > >Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 > > 2FDD -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH v2] RDMA/cma: Make CM response timeout and # CM retries configurable
On Thu, 2019-06-13 at 08:28 -0700, Bart Van Assche wrote: > On 6/13/19 7:25 AM, Doug Ledford wrote: > > So, to revive this patch, what I'd like to see is some attempt to > > actually quantify a reasonable timeout for the default backlog > > depth, > > then the patch should actually change the default to that > > reasonable > > timeout, and then put in the ability to adjust the timeout with > > some > > sort of doc guidance on how to calculate a reasonable timeout based > > on > > configured backlog depth. > > How about following the approach of the SRP initiator driver? It > derives > the CM timeout from the subnet manager timeout. The assumption > behind > this is that in large networks the subnet manager timeout has to be > set > higher than its default to make communication work. See also > srp_get_subnet_timeout(). Theoretically, the subnet manager needs a longer timeout in a bigger network because it's handling more data as a single point of lookup for the entire subnet. Individual machines, on the other hand, have the same backlog size (by default) regardless of the size of the network, and there is no guarantee that if the admin increased the subnet manager timeout, that they also increased the backlog queue depth size. So, while I like things that auto-tune like you are suggesting, the problem is that the one item does not directly correlate with the other. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH v2] RDMA/cma: Make CM response timeout and # CM retries configurable
On Tue, 2019-02-26 at 08:57 +0100, Håkon Bugge wrote: > During certain workloads, the default CM response timeout is too > short, leading to excessive retries. Hence, make it configurable > through sysctl. While at it, also make number of CM retries > configurable. > > The defaults are not changed. > > Signed-off-by: Håkon Bugge > --- > v1 -> v2: >* Added unregister_net_sysctl_table() in cma_cleanup() > --- > drivers/infiniband/core/cma.c | 52 ++--- > -- > 1 file changed, 45 insertions(+), 7 deletions(-) This has been sitting on patchworks since forever. Presumably because Jason and I neither one felt like we really wanted it, but also couldn't justify flat refusing it. Well, I've made up my mind, so unless Jason wants to argue the other side, I'm rejecting this patch. Here's why. The whole concept of a timeout is to help recovery in a situation that overloads one end of the connection. There is a relationship between the max queue backlog on the one host and the timeout on the other host. Generally, in order for a request to get dropped and us to need to retransmit, the queue must already have a full backlog. So, how long does it take a heavily loaded system to process a full backlog? That, plus a fuzz for a margin of error, should be our timeout. We shouldn't be asking users to configure it. However, if users change the default backlog queue on their systems, *then* it would make sense to have the users also change the timeout here, but I think guidance would be helpful. So, to revive this patch, what I'd like to see is some attempt to actually quantify a reasonable timeout for the default backlog depth, then the patch should actually change the default to that reasonable timeout, and then put in the ability to adjust the timeout with some sort of doc guidance on how to calculate a reasonable timeout based on configured backlog depth. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: linux-next: manual merge of the rdma-fixes tree with Linus' tree
On Tue, 2019-04-30 at 08:13 +1000, Stephen Rothwell wrote: > Hi all, > > Today's linux-next merge of the rdma-fixes tree got a conflict in: > > drivers/infiniband/core/uverbs_main.c > > between commit: > > 6a5c5d26c4c6 ("rdma: fix build errors on s390 and MIPS due to bad ZERO_PAGE > use") > > from Linus' tree and commit: > > d79a26b99f5f ("RDMA/uverbs: Fix compilation error on s390 and mips > platforms") > > from the rdma-fixes tree. > > I fixed it up (I just used the version from Linus' tree) and can carry the > fix as necessary. This is now fixed as far as linux-next is concerned, > but any non trivial conflicts should be mentioned to your upstream > maintainer when your tree is submitted for merging. You may also want > to consider cooperating with the maintainer of the conflicting tree to > minimise any particularly complex conflicts. > Sorry, I forgot to back that head commit out. Once Linus committed his fix the one in the rdma tree was superfluous (and wrong anyway). -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [GIT PULL] Please pull rdma.git
On Mon, 2019-04-29 at 21:00 +0300, Leon Romanovsky wrote: > On Mon, Apr 29, 2019 at 01:13:01PM -0400, Doug Ledford wrote: > > On Mon, 2019-04-29 at 09:48 -0700, Linus Torvalds wrote: > > > On Mon, Apr 29, 2019 at 9:29 AM Doug Ledford wrote: > > > > drivers/infiniband/core/uverbs_main.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > This trivial one-liner is actually incorrect. > > > > > > It should use 'vmf->address', because the point of the ZERO_PAGE > > > argument is to pick the page with the right virtual address alias for > > > broken architectures that need those kinds. > > > > > > I'm actually surprised s390 wants it, usually it's just MIPS that has > > > the horribly broken virtual address translation stuff. But it looks > > > like for s390 it's at least only a performance issue (ie it causes > > > some aliases in L1 that cause cacheline ping-pong rather than anything > > > else). > > > > That's what I get for listening to Jason ;-) > > > > Well, since you have just essentially re-written the patch to be > > correct, you are now the developer of origin. Do you want to commit the > > fix directly or shall I respin it for you to pull? > > Linus already committed it. > > The whole breakage is mystery for me, the patch which introduced > breakage, got successful build report from 0-build. Yeah, successful 0-day build is a good indicator, but not a guarantee. You just can't test all possible permutations and sometimes things slip by. But it's still way better than 20 years ago when half the patches weren't even compile tested before coming to Linus ;-) -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [GIT PULL] Please pull rdma.git
On Mon, 2019-04-29 at 09:48 -0700, Linus Torvalds wrote: > On Mon, Apr 29, 2019 at 9:29 AM Doug Ledford wrote: > > > > drivers/infiniband/core/uverbs_main.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > This trivial one-liner is actually incorrect. > > It should use 'vmf->address', because the point of the ZERO_PAGE > argument is to pick the page with the right virtual address alias for > broken architectures that need those kinds. > > I'm actually surprised s390 wants it, usually it's just MIPS that has > the horribly broken virtual address translation stuff. But it looks > like for s390 it's at least only a performance issue (ie it causes > some aliases in L1 that cause cacheline ping-pong rather than anything > else). That's what I get for listening to Jason ;-) Well, since you have just essentially re-written the patch to be correct, you are now the developer of origin. Do you want to commit the fix directly or shall I respin it for you to pull? -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [GIT PULL] Please pull RDMA subsystem changes
On Mon, 2019-04-29 at 11:42 -0400, Doug Ledford wrote: > On Mon, 2019-04-29 at 08:40 +, Jason Gunthorpe wrote: > > On Mon, Apr 29, 2019 at 08:09:47AM +0200, Heiko Carstens wrote: > > > On Sun, Apr 28, 2019 at 11:52:12AM +, Jason Gunthorpe wrote: > > > > Hi Linus, > > > > > > > > Third rc pull request > > > > > > > > Nothing particularly special here. There is a small merge conflict > > > > with Adrea's mm_still_valid patches which is resolved as below: > > > ... > > > > Jason Gunthorpe (3): > > > > RDMA/mlx5: Do not allow the user to write to the clock page > > > > RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages > > > > RDMA/ucontext: Fix regression with disassociate > > > > > > This doesn't compile. The patch below would fix it, but not sure if > > > this is what is intended: > > > > > > drivers/infiniband/core/uverbs_main.c: In function 'rdma_umap_fault': > > > drivers/infiniband/core/uverbs_main.c:898:28: error: 'struct vm_fault' > > > has no member named 'vm_start' > > >vmf->page = ZERO_PAGE(vmf->vm_start); > > > ^~ > > > diff --git a/drivers/infiniband/core/uverbs_main.c > > > b/drivers/infiniband/core/uverbs_main.c > > > index 7843e89235c3..65fe89b3fa2d 100644 > > > +++ b/drivers/infiniband/core/uverbs_main.c > > > @@ -895,7 +895,7 @@ static vm_fault_t rdma_umap_fault(struct vm_fault > > > *vmf) > > > > > > /* Read only pages can just use the system zero page. */ > > > if (!(vmf->vma->vm_flags & (VM_WRITE | VM_MAYWRITE))) { > > > - vmf->page = ZERO_PAGE(vmf->vm_start); > > > + vmf->page = ZERO_PAGE(vmf->vma->vm_start); > > > get_page(vmf->page); > > > return 0; > > > } > > > > > > > Thanks Heiko, this looks right to me. > > > > I'm surprised to be seeing this at this point, these patches should > > have been seen by 0 day for several days now, and they were in > > linux-next already too.. > > > > Doug, can you send this to Linus today? > > Yep. > Done. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
[GIT PULL] Please pull rdma.git
Hi Linus, As per the thread on our last pull request, this is the two line build fix for s390 and mips. The following changes since commit 2557fabd6e29f349bfa0ac13f38ac98aa5eafc74: RDMA/hns: Bugfix for mapping user db (2019-04-25 10:40:04 -0300) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git tags/for- linus for you to fetch changes up to d79a26b99f5f40db6863b1973750fd1d134d99b4: RDMA/uverbs: Fix compilation error on s390 and mips platforms (2019- 04-29 11:47:55 -0400) Pull request for 5.1-rc - Build fix for the last pull request on s390 Leon Romanovsky (1): RDMA/uverbs: Fix compilation error on s390 and mips platforms drivers/infiniband/core/uverbs_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [GIT PULL] Please pull RDMA subsystem changes
On Mon, 2019-04-29 at 08:40 +, Jason Gunthorpe wrote: > On Mon, Apr 29, 2019 at 08:09:47AM +0200, Heiko Carstens wrote: > > On Sun, Apr 28, 2019 at 11:52:12AM +, Jason Gunthorpe wrote: > > > Hi Linus, > > > > > > Third rc pull request > > > > > > Nothing particularly special here. There is a small merge conflict > > > with Adrea's mm_still_valid patches which is resolved as below: > > ... > > > Jason Gunthorpe (3): > > > RDMA/mlx5: Do not allow the user to write to the clock page > > > RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages > > > RDMA/ucontext: Fix regression with disassociate > > > > This doesn't compile. The patch below would fix it, but not sure if > > this is what is intended: > > > > drivers/infiniband/core/uverbs_main.c: In function 'rdma_umap_fault': > > drivers/infiniband/core/uverbs_main.c:898:28: error: 'struct vm_fault' has > > no member named 'vm_start' > >vmf->page = ZERO_PAGE(vmf->vm_start); > > ^~ > > diff --git a/drivers/infiniband/core/uverbs_main.c > > b/drivers/infiniband/core/uverbs_main.c > > index 7843e89235c3..65fe89b3fa2d 100644 > > +++ b/drivers/infiniband/core/uverbs_main.c > > @@ -895,7 +895,7 @@ static vm_fault_t rdma_umap_fault(struct vm_fault *vmf) > > > > /* Read only pages can just use the system zero page. */ > > if (!(vmf->vma->vm_flags & (VM_WRITE | VM_MAYWRITE))) { > > - vmf->page = ZERO_PAGE(vmf->vm_start); > > + vmf->page = ZERO_PAGE(vmf->vma->vm_start); > > get_page(vmf->page); > > return 0; > > } > > > > Thanks Heiko, this looks right to me. > > I'm surprised to be seeing this at this point, these patches should > have been seen by 0 day for several days now, and they were in > linux-next already too.. > > Doug, can you send this to Linus today? Yep. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] cxgb4: fix undefined behavior in mem.c
On Thu, 2019-02-28 at 16:57 -0700, Shaobo He wrote: > Good catch. But if we agree on that memory management functions are those > specified by the C standard, would it be OK to ignore so-called use after > free > or double free bugs for the kernel as C standard does not apply to kfree? No, most kernel use-after-free bugs are real bugs. This one might be technically a bug by certain readings of the standard, but it's a non- issue. Real use-after-free bugs don't just look at the value of a local stack variable to get the memory's old address (which is what this does, and the same could be achieved and be totally in spec by doing this: old_ptr = mhp; kfree(mhp); pr_debug("%p\n", old_ptr);) Real use after free things would actually dereference the pointer to either read or write from the old memory region. That leads to data corruption or kernel data leaks. Plus, in this case, the purpose of printing the literal value of mhp is simply to provide a unique name for tracing purposes. Since kfree() doesn't alter the local stack variable, the name is still present in the local stack variable at the point you call pr_debug(). It could be fixed. It's not like this patch is wrong. But I wouldn't submit it this late in the -rc cycle, I'd just take it for next. > On 2/28/19 4:33 PM, Bart Van Assche wrote: > > On Thu, 2019-02-28 at 16:18 -0700, Shaobo He wrote: > > > I can't afford a pdf version of the C standard. So I looked at the draft > > > version > > > used in the link I put in the commit message. It says (in 6.2.4:2), > > > > > > ``` > > > The lifetime of an object is the portion of program execution during which > > > storage is guaranteed to be reserved for it. An object exists, has a > > > constant > > > address, and retains its last-stored value throughout its lifetime. If an > > > object > > > is referred to outside of its lifetime, the behavior is undefined. The > > > value of > > > a pointer becomes indeterminate when the object it points to (or just > > > past) > > > reaches the end of its lifetime. > > > ``` > > > I couldn't find the definition of lifetime over a dynamically allocated > > > object > > > in the draft of C standard. I refer to this link > > > (https://en.cppreference.com/w/c/language/lifetime) which suggests that > > > the > > > lifetime of an allocated object ends after the deallocation function is > > > called > > > upon it. > > > > > > I think maybe the more problematic issue is that the value of a freed > > > pointer is > > > intermediate. > > > > In another section of the same draft I found the following: > > > > J.2 Undefined behavior [ ... ] The value of a pointer that refers to space > > deallocated by a call to the free or realloc function is used (7.22.3). > > > > Since the C standard explicitly refers to free() and realloc(), does that > > mean that that statement about undefined behavior does not apply to munmap() > > (for user space code) nor to kfree() (for kernel code)? > > > > Bart. > > -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: linux-next: manual merge of the net-next tree with the rdma tree
On Wed, 2019-02-27 at 11:25 +1100, Stephen Rothwell wrote: > Hi all, > > Today's linux-next merge of the net-next tree got a conflict in: > > drivers/infiniband/hw/mlx4/Kconfig > > between commit: > > 6fa8f1afd337 ("IB/{core,uverbs}: Move ib_umem_xxx functions from ib_core to > ib_uverbs") > > from the rdma tree and commit: > > f4b6bcc7002f ("net: devlink: turn devlink into a built-in") > > from the net-next tree. > > I fixed it up (see below) and can carry the fix as necessary. This > is now fixed as far as linux-next is concerned, but any non trivial > conflicts should be mentioned to your upstream maintainer when your tree > is submitted for merging. You may also want to consider cooperating > with the maintainer of the conflicting tree to minimise any particularly > complex conflicts. > Thanks Stephen, we'll add it to the (largish this release) list of conflicts to bring to Linus' attention. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] RDMA/cma: Make CM response timeout and # CM retries configurable
> On Feb 22, 2019, at 12:14 PM, Steve Wise wrote: > > > On 2/22/2019 10:36 AM, Jason Gunthorpe wrote: >> On Sun, Feb 17, 2019 at 06:09:09PM +0100, Håkon Bugge wrote: >>> During certain workloads, the default CM response timeout is too >>> short, leading to excessive retries. Hence, make it configurable >>> through sysctl. While at it, also make number of CM retries >>> configurable. >>> >>> The defaults are not changed. >>> >>> Signed-off-by: Håkon Bugge >>> drivers/infiniband/core/cma.c | 51 ++- >>> 1 file changed, 44 insertions(+), 7 deletions(-) >>> >>> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c >>> index c43512752b8a..ce99e1cd1029 100644 >>> +++ b/drivers/infiniband/core/cma.c >>> @@ -43,6 +43,7 @@ >>> #include >>> #include >>> #include >>> +#include >>> #include >>> >>> #include >>> @@ -68,13 +69,46 @@ MODULE_AUTHOR("Sean Hefty"); >>> MODULE_DESCRIPTION("Generic RDMA CM Agent"); >>> MODULE_LICENSE("Dual BSD/GPL"); >>> >>> -#define CMA_CM_RESPONSE_TIMEOUT 20 >>> #define CMA_QUERY_CLASSPORT_INFO_TIMEOUT 3000 >>> -#define CMA_MAX_CM_RETRIES 15 >>> #define CMA_CM_MRA_SETTING (IB_CM_MRA_FLAG_DELAY | 24) >>> #define CMA_IBOE_PACKET_LIFETIME 18 >>> #define CMA_PREFERRED_ROCE_GID_TYPE IB_GID_TYPE_ROCE_UDP_ENCAP >>> >>> +#define CMA_DFLT_CM_RESPONSE_TIMEOUT 20 >>> +static int cma_cm_response_timeout = CMA_DFLT_CM_RESPONSE_TIMEOUT; >>> +static int cma_cm_response_timeout_min = 8; >>> +static int cma_cm_response_timeout_max = 31; >>> +#undef CMA_DFLT_CM_RESPONSE_TIMEOUT >>> + >>> +#define CMA_DFLT_MAX_CM_RETRIES 15 >>> +static int cma_max_cm_retries = CMA_DFLT_MAX_CM_RETRIES; >>> +static int cma_max_cm_retries_min = 1; >>> +static int cma_max_cm_retries_max = 100; >>> +#undef CMA_DFLT_MAX_CM_RETRIES >>> + >>> +static struct ctl_table_header *cma_ctl_table_hdr; >>> +static struct ctl_table cma_ctl_table[] = { >>> + { >>> + .procname = "cma_cm_response_timeout", >>> + .data = &cma_cm_response_timeout, >>> + .maxlen = sizeof(cma_cm_response_timeout), >>> + .mode = 0644, >>> + .proc_handler = proc_dointvec_minmax, >>> + .extra1 = &cma_cm_response_timeout_min, >>> + .extra2 = &cma_cm_response_timeout_max, >>> + }, >>> + { >>> + .procname = "cma_max_cm_retries", >>> + .data = &cma_max_cm_retries, >>> + .maxlen = sizeof(cma_max_cm_retries), >>> + .mode = 0644, >>> + .proc_handler = proc_dointvec_minmax, >>> + .extra1 = &cma_max_cm_retries_min, >>> + .extra2 = &cma_max_cm_retries_max, >>> + }, >>> + { } >>> +}; >> Is sysctl the right approach here? Should it be rdma tool instead? >> >> Jason > > There are other rdma sysctls currently: net.rdma_ucm.max_backlog and > net.iw_cm.default_backlog. The core network stack seems to use sysctl > and not ip tool to set basically globals. > > To use rdma tool, we'd have to have some concept of a "module" object, I > guess. IE there's dev, link, and resource rdma tool objects currently. > But these cma timeout settings are really not per dev, link, nor a > resource. Maybe we have just a "core" object: rdma core set > cma_max_cm_retries min 8 max 30. I don’t know, I think you make a fairly good argument for leaving it as a sysctl. We have infrastructure in place for admins to set persistent sysctl settings. The per device/link settings need something different because link names and such can change. Since these are globals, I’d leave them where they are. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: Message signed with OpenPGP
Re: linux-next: build warning after merge of the rdma tree
On Thu, 2019-02-14 at 11:18 +1100, Stephen Rothwell wrote: > Hi all, > > After merging the rdma tree, today's linux-next build (x86_64 > allmodconfig) produced this warning: > > drivers/infiniband/hw/hfi1/qp.c: In function 'hfi1_setup_wqe': > drivers/infiniband/hw/hfi1/qp.c:328:3: warning: this statement may fall > through [-Wimplicit-fallthrough=] >hfi1_setup_tid_rdma_wqe(qp, wqe); >^~~~ > drivers/infiniband/hw/hfi1/qp.c:329:2: note: here > case IB_QPT_UC: > ^~~~ > > Introduced by commit > > f1ab4efa6d32 ("IB/hfi1: Enable TID RDMA READ protocol") > > I get this warning because I am building with -Wimplicit-fallthrough > in attempt to catch new additions early. The gcc warning can be turned > off by adding a /* fall through */ comment at the point the fall through > happens (assuming that the fall through is intentional). > Thanks Stephen, we'll sort it and make an appropriate fixup patch. Kaike? -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA
I think I've finally wrapped my head around all of this. Let's see if I have this right: * People are using filesystem DAX to expose byte addressable persistent memory because putting a filesystem on the memory makes an easy way to organize the data in the memory and share it between various processes. It's worth noting that this is not the only way to share this memory, and arguably not even the best way, but it's what people are doing. However, to get byte level addressability on the remote side, we must create files on the server side, mmap those files, then give a handle to the memory region to the client side that the client then addresses on a byte by byte basis. This is because all of the normal kernel based device sharing mechanisms are block based and don't provide byte level addressability. * People are asking for thin allocations, reflinks, deduplication, whatever else because persistent memory is still small in terms of size compared to the amount of data people want to put on it, so these techniques stretch its usefulness. * Because there is no kernel level mechanism for sharing byte addressable memory, this only works with specific applications that have been written to create files on byte addressable memory, mmap them, then share them out via RDMA. I bring this up because in the video linked in this email, Oracle is gushing about how great this feature is. But it's important to understand that this only works because the Oracle processes themselves are the filesystem sharing entity. That means at other points in this conversation where we've talked about the need for forward progress, and non-ODP hardware, and the talk has come down to sending SIGKILL to a process in order to free memory reservations, I feel confident in saying that Oracle would *never* agree to this. If you kill an Oracle process to make forward progress, you are probably also killing the very process that needed you to make progress in the first place. I'm pretty confident that Oracle will have no problem what-so-ever saying that ODP capable hardware is a hard requirement for using their software with DAX. * So if Oracle is likely to demand ODP hardware, period, are there other scenarios that might be more accepting of a more limited FS on top of DAX that doesn't support reflinks and deduplication? I can think of a possible yes to that answer rather easily. Message brokerage servers (amqp, qpid) have strict requirements about receiving a message and then making sure that it makes it once, and only once, to all subscribed receivers. A natural way of organizing this sort of thing is to create a persistent ring buffer for incoming messages, one per each connecting client that is sending messages. Then a log file for each client you are sending messages back out to. Putting these files on persistent memory and then mapping the ring buffer to the clients, and writing your own transmission journals to the persistent memory, would allow the program to be very robust in the face of a program or system crash. This sort of usage would not require any thin allocations, reflinks, or other such features, and yet would still find the filesystem organization useful. Therefore I think the answer is yes, there are at least some use cases that would find a less featureful filesystem that works with persistent memory and RDMA but without ODP to be of value. * Really though, as I said in my email to Tom Talpey, this entire situation is simply screaming that we are doing DAX networking wrong. We shouldn't be writing the networking code once in every single application that wants to do this. If we had a memory segment that we shared from server to client(s), and in that memory segment we implemented a clustered filesystem, then applications would simply mmap local files and be done with it. If the file needed to move, the kernel would update the mmap in the application, done. If you ask me, it is the attempt to do this the wrong way that is resulting in all this heartache. That said, for today, my recommendation would be to require ODP hardware for XFS filesystem with the DAX option, but allow ext2 filesystems to mount DAX filesystems on non-ODP hardware, and go in and modify the ext2 filesystem so that on DAX mounts, it disables hole punch and ftrunctate any time they would result in the forced removal of an established mmap. On Wed, 2019-02-06 at 14:44 -0800, Dan Williams wrote: > On Wed, Feb 6, 2019 at 2:25 PM Doug Ledford wrote: > > On Wed, 2019-02-06 at 15:08 -0700, Jason Gunthorpe wrote: > > > On Thu, Feb 07, 2019 at 08:03:56AM +1100, Dave Chinner wrote: > > > > On Wed, Feb 06, 2019 at 07:16:21PM +, Christopher Lameter wrote: > > > > > On Wed, 6 Feb 2019, Doug Ledford wrote: > > > > > > > > > > > > Most of the cases we want revoke for are things like truncate(). > >
Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA
On Thu, 2019-02-07 at 10:41 -0500, Tom Talpey wrote: > On 2/7/2019 10:37 AM, Doug Ledford wrote: > > On Thu, 2019-02-07 at 10:28 -0500, Tom Talpey wrote: > > > On 2/7/2019 10:04 AM, Chuck Lever wrote: > > > > > On Feb 7, 2019, at 12:23 AM, Jason Gunthorpe wrote: > > > > > > > > > > On Thu, Feb 07, 2019 at 02:52:58PM +1100, Dave Chinner wrote: > > > > > > > > > > > Requiring ODP capable hardware and applications that control RDMA > > > > > > access to use file leases and be able to cancel/recall client side > > > > > > delegations (like NFS is already able to do!) seems like a pretty > > > > > > > > > > So, what happens on NFS if the revoke takes too long? > > > > > > > > NFS distinguishes between "recall" and "revoke". Dave used "recall" > > > > here, it means that the server recalls the client's delegation. If > > > > the client doesn't respond, the server revokes the delegation > > > > unilaterally and other users are allowed to proceed. > > > > > > The SMB3 protocol has a similar "lease break" mechanism, btw. > > > > > > SMB3 "push mode" has long-expected to allow DAX mapping of files > > > only when an exclusive lease is held by the requesting client. > > > The server may recall the lease if the DAX mapping needs to change. > > > > > > Once local (MMU) and remote (RDMA) mappings are dropped, the > > > client may re-request that the server reestablish them. No > > > connection or process is terminated, and no data is silently lost. > > > > Yeah, but you're referring to a situation where the communication agent > > and the filesystem agent are one and the same and they work > > cooperatively to resolve the issue. With DAX under Linux, the > > filesystem agent and the communication agent are separate, and right > > now, to my knowledge, the filesystem agent doesn't tell the > > communication agent about a broken lease, it want's to be able to do > > things 100% transparently without any work on the communication agent's > > part. That works for ODP, but not for anything else. If the filesystem > > notified the communication agent of the need to drop the MMU region and > > rebuild it, the communication agent could communicate that to the remote > > host, and things would work. But there's no POSIX message for "your > > file is moving on media, redo your mmap". > > Indeed, the MMU notifier and the filesystem need to be integrated. And right now, the method of sharing this across the network is: persistent memory in machine local filesystem supporting a DAX mount custom application that knows how to mmap then rdma map files, and can manage the connection long term The point being that every single method of sharing this stuff is a one off custom application (Oracle just being one). I'm not really all that thrilled about the idea of writing the same mmap/rdma map/oob-management code in every single app out there. To me, this problem is screaming for a more general purpose kernel solution, just like NVMe over Fabrics. I'm thinking a clustered filesystem on top of a shared memory segment between hosts is a much more natural fit. Then applications just mmap the files locally, and the kernel does the rest. > I'm unmoved by the POSIX argument. This stuff didn't happen in 1990. > > Tom. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA
On Thu, 2019-02-07 at 10:28 -0500, Tom Talpey wrote: > On 2/7/2019 10:04 AM, Chuck Lever wrote: > > > > > On Feb 7, 2019, at 12:23 AM, Jason Gunthorpe wrote: > > > > > > On Thu, Feb 07, 2019 at 02:52:58PM +1100, Dave Chinner wrote: > > > > > > > Requiring ODP capable hardware and applications that control RDMA > > > > access to use file leases and be able to cancel/recall client side > > > > delegations (like NFS is already able to do!) seems like a pretty > > > > > > So, what happens on NFS if the revoke takes too long? > > > > NFS distinguishes between "recall" and "revoke". Dave used "recall" > > here, it means that the server recalls the client's delegation. If > > the client doesn't respond, the server revokes the delegation > > unilaterally and other users are allowed to proceed. > > The SMB3 protocol has a similar "lease break" mechanism, btw. > > SMB3 "push mode" has long-expected to allow DAX mapping of files > only when an exclusive lease is held by the requesting client. > The server may recall the lease if the DAX mapping needs to change. > > Once local (MMU) and remote (RDMA) mappings are dropped, the > client may re-request that the server reestablish them. No > connection or process is terminated, and no data is silently lost. Yeah, but you're referring to a situation where the communication agent and the filesystem agent are one and the same and they work cooperatively to resolve the issue. With DAX under Linux, the filesystem agent and the communication agent are separate, and right now, to my knowledge, the filesystem agent doesn't tell the communication agent about a broken lease, it want's to be able to do things 100% transparently without any work on the communication agent's part. That works for ODP, but not for anything else. If the filesystem notified the communication agent of the need to drop the MMU region and rebuild it, the communication agent could communicate that to the remote host, and things would work. But there's no POSIX message for "your file is moving on media, redo your mmap". -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA
On Wed, 2019-02-06 at 14:44 -0800, Dan Williams wrote: > On Wed, Feb 6, 2019 at 2:25 PM Doug Ledford wrote: > > Can someone give me a real world scenario that someone is *actually* > > asking for with this? > > I'll point to this example. At the 6:35 mark Kodi talks about the > Oracle use case for DAX + RDMA. > > https://youtu.be/ywKPPIE8JfQ?t=395 I watched this, and I see that Oracle is all sorts of excited that their storage machines can scale out, and they can access the storage and it has basically no CPU load on the storage server while performing millions of queries. What I didn't hear in there is why DAX has to be in the picture, or why Oracle couldn't do the same thing with a simple memory region exported directly to the RDMA subsystem, or why reflink or any of the other features you talk about are needed. So, while these things may legitimately be needed, this video did not tell me about how/why they are needed, just that RDMA is really, *really* cool for their use case and gets them 0% CPU utilization on their storage servers. I didn't watch the whole thing though. Do they get into that later on? Do they get to that level of technical discussion, or is this all higher level? -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA
On Wed, 2019-02-06 at 14:44 -0800, Dan Williams wrote: > On Wed, Feb 6, 2019 at 2:25 PM Doug Ledford wrote: > > On Wed, 2019-02-06 at 15:08 -0700, Jason Gunthorpe wrote: > > > On Thu, Feb 07, 2019 at 08:03:56AM +1100, Dave Chinner wrote: > > > > On Wed, Feb 06, 2019 at 07:16:21PM +, Christopher Lameter wrote: > > > > > On Wed, 6 Feb 2019, Doug Ledford wrote: > > > > > > > > > > > > Most of the cases we want revoke for are things like truncate(). > > > > > > > Shouldn't happen with a sane system, but we're trying to avoid > > > > > > > users > > > > > > > doing awful things like being able to DMA to pages that are now > > > > > > > part of > > > > > > > a different file. > > > > > > > > > > > > Why is the solution revoke then? Is there something besides > > > > > > truncate > > > > > > that we have to worry about? I ask because EBUSY is not currently > > > > > > listed as a return value of truncate, so extending the API to > > > > > > include > > > > > > EBUSY to mean "this file has pinned pages that can not be freed" is > > > > > > not > > > > > > (or should not be) totally out of the question. > > > > > > > > > > > > Admittedly, I'm coming in late to this conversation, but did I miss > > > > > > the > > > > > > portion where that alternative was ruled out? > > > > > > > > > > Coming in late here too but isnt the only DAX case that we are > > > > > concerned > > > > > about where there was an mmap with the O_DAX option to do direct write > > > > > though? If we only allow this use case then we may not have to worry > > > > > about > > > > > long term GUP because DAX mapped files will stay in the physical > > > > > location > > > > > regardless. > > > > > > > > No, that is not guaranteed. Soon as we have reflink support on XFS, > > > > writes will physically move the data to a new physical location. > > > > This is non-negotiatiable, and cannot be blocked forever by a gup > > > > pin. > > > > > > > > IOWs, DAX on RDMA requires a) page fault capable hardware so that > > > > the filesystem can move data physically on write access, and b) > > > > revokable file leases so that the filesystem can kick userspace out > > > > of the way when it needs to. > > > > > > Why do we need both? You want to have leases for normal CPU mmaps too? > > > > > > > Truncate is a red herring. It's definitely a case for revokable > > > > leases, but it's the rare case rather than the one we actually care > > > > about. We really care about making copy-on-write capable filesystems > > > > like > > > > XFS work with DAX (we've got people asking for it to be supported > > > > yesterday!), and that means DAX+RDMA needs to work with storage that > > > > can change physical location at any time. > > > > > > Then we must continue to ban longterm pin with DAX.. > > > > > > Nobody is going to want to deploy a system where revoke can happen at > > > any time and if you don't respond fast enough your system either locks > > > with some kind of FS meltdown or your process gets SIGKILL. > > > > > > I don't really see a reason to invest so much design work into > > > something that isn't production worthy. > > > > > > It *almost* made sense with ftruncate, because you could architect to > > > avoid ftruncate.. But just any FS op might reallocate? Naw. > > > > > > Dave, you said the FS is responsible to arbitrate access to the > > > physical pages.. > > > > > > Is it possible to have a filesystem for DAX that is more suited to > > > this environment? Ie designed to not require block reallocation (no > > > COW, no reflinks, different approach to ftruncate, etc) > > > > Can someone give me a real world scenario that someone is *actually* > > asking for with this? > > I'll point to this example. At the 6:35 mark Kodi talks about the > Oracle use case for DAX + RDMA. > > https://youtu.be/ywKPPIE8JfQ?t=395 Thanks for the link, I'll review the panel. > Currently the only way to
Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA
On Wed, 2019-02-06 at 15:08 -0700, Jason Gunthorpe wrote: > On Thu, Feb 07, 2019 at 08:03:56AM +1100, Dave Chinner wrote: > > On Wed, Feb 06, 2019 at 07:16:21PM +, Christopher Lameter wrote: > > > On Wed, 6 Feb 2019, Doug Ledford wrote: > > > > > > > > Most of the cases we want revoke for are things like truncate(). > > > > > Shouldn't happen with a sane system, but we're trying to avoid users > > > > > doing awful things like being able to DMA to pages that are now part > > > > > of > > > > > a different file. > > > > > > > > Why is the solution revoke then? Is there something besides truncate > > > > that we have to worry about? I ask because EBUSY is not currently > > > > listed as a return value of truncate, so extending the API to include > > > > EBUSY to mean "this file has pinned pages that can not be freed" is not > > > > (or should not be) totally out of the question. > > > > > > > > Admittedly, I'm coming in late to this conversation, but did I miss the > > > > portion where that alternative was ruled out? > > > > > > Coming in late here too but isnt the only DAX case that we are concerned > > > about where there was an mmap with the O_DAX option to do direct write > > > though? If we only allow this use case then we may not have to worry about > > > long term GUP because DAX mapped files will stay in the physical location > > > regardless. > > > > No, that is not guaranteed. Soon as we have reflink support on XFS, > > writes will physically move the data to a new physical location. > > This is non-negotiatiable, and cannot be blocked forever by a gup > > pin. > > > > IOWs, DAX on RDMA requires a) page fault capable hardware so that > > the filesystem can move data physically on write access, and b) > > revokable file leases so that the filesystem can kick userspace out > > of the way when it needs to. > > Why do we need both? You want to have leases for normal CPU mmaps too? > > > Truncate is a red herring. It's definitely a case for revokable > > leases, but it's the rare case rather than the one we actually care > > about. We really care about making copy-on-write capable filesystems like > > XFS work with DAX (we've got people asking for it to be supported > > yesterday!), and that means DAX+RDMA needs to work with storage that > > can change physical location at any time. > > Then we must continue to ban longterm pin with DAX.. > > Nobody is going to want to deploy a system where revoke can happen at > any time and if you don't respond fast enough your system either locks > with some kind of FS meltdown or your process gets SIGKILL. > > I don't really see a reason to invest so much design work into > something that isn't production worthy. > > It *almost* made sense with ftruncate, because you could architect to > avoid ftruncate.. But just any FS op might reallocate? Naw. > > Dave, you said the FS is responsible to arbitrate access to the > physical pages.. > > Is it possible to have a filesystem for DAX that is more suited to > this environment? Ie designed to not require block reallocation (no > COW, no reflinks, different approach to ftruncate, etc) Can someone give me a real world scenario that someone is *actually* asking for with this? Are DAX users demanding xfs, or is it just the filesystem of convenience? Do they need to stick with xfs? Are they really trying to do COW backed mappings for the RDMA targets? Or do they want a COW backed FS but are perfectly happy if the specific RDMA targets are *not* COW and are statically allocated? > > And that's the real problem we need to solve here. RDMA has no trust > > model other than "I'm userspace, I pinned you, trust me!". That's > > not good enough for FS-DAX+RDMA > > It is baked into the silicon, and I don't see much motion on this > front right now. My best hope is that IOMMU PASID will get widely > deployed and RDMA silicon will arrive that can use it. Seems to be > years away, if at all. > > At least we have one chip design that can work in a page faulting mode > .. > > Jason -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA
On Wed, 2019-02-06 at 13:04 -0800, Dan Williams wrote: > On Wed, Feb 6, 2019 at 12:14 PM Doug Ledford wrote: > > On Wed, 2019-02-06 at 11:45 -0800, Dan Williams wrote: > > > On Wed, Feb 6, 2019 at 10:52 AM Jason Gunthorpe wrote: > > > > On Wed, Feb 06, 2019 at 10:35:04AM -0800, Matthew Wilcox wrote: > > > > > > > > > > Admittedly, I'm coming in late to this conversation, but did I miss > > > > > > the > > > > > > portion where that alternative was ruled out? > > > > > > > > > > That's my preferred option too, but the preponderance of opinion leans > > > > > towards "We can't give people a way to make files un-truncatable". > > > > > > > > I haven't heard an explanation why blocking ftruncate is worse than > > > > giving people a way to break RDMA using process by calling ftruncate?? > > > > > > > > Isn't it exactly the same argument the other way? > > > > > > If the > > > RDMA application doesn't want it to happen, arrange for it by > > > permissions or other coordination to prevent truncation, > > > > I just argued the *exact* same thing, except from the other side: if you > > want a guaranteed ability to truncate, then arrange the perms so the > > RDMA or DAX capable things can't use the file. > > That doesn't make sense. All we have to work with is rwx bits. It's > possible to prevents writes / truncates. There's no permission bit for > mmap, O_DIRECT and RDMA mappings, hence leases. There's ownership. What you can't open, you can't mmap or O_DIRECT or whatever... Regardless though, this is mostly moot as Dave's email makes it clear the underlying issue that is the problem is not ftruncate, but other things. > > -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA
On Wed, 2019-02-06 at 12:20 -0800, Matthew Wilcox wrote: > On Wed, Feb 06, 2019 at 03:16:02PM -0500, Doug Ledford wrote: > > On Wed, 2019-02-06 at 11:40 -0800, Matthew Wilcox wrote: > > > On Wed, Feb 06, 2019 at 07:16:21PM +, Christopher Lameter wrote: > > > > though? If we only allow this use case then we may not have to worry > > > > about > > > > long term GUP because DAX mapped files will stay in the physical > > > > location > > > > regardless. > > > > > > ... except for truncate. And now that I think about it, there was a > > > desire to support hot-unplug which also needed revoke. > > > > We already support hot unplug of RDMA devices. But it is extreme. How > > does hot unplug deal with a program running from the device (something > > that would have returned ETXTBSY)? > > Not hot-unplugging the RDMA device but hot-unplugging an NV-DIMM. Is an NV-DIMM the only thing we use DAX on? -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA
On Wed, 2019-02-06 at 12:49 -0800, Matthew Wilcox wrote: > On Wed, Feb 06, 2019 at 03:47:53PM -0500, Doug Ledford wrote: > > On Wed, 2019-02-06 at 12:41 -0800, Matthew Wilcox wrote: > > > On Wed, Feb 06, 2019 at 03:28:35PM -0500, Doug Ledford wrote: > > > > On Wed, 2019-02-06 at 12:20 -0800, Matthew Wilcox wrote: > > > > > Not hot-unplugging the RDMA device but hot-unplugging an NV-DIMM. > > ^^^ I think you missed this line ^^^ Indeed, I did ;-) > > > You said "now that I think about it, there was a desire to support hot- > > unplug which also needed revoke". For us, hot unplug is done at the > > device level and means all connections must be torn down. So in the > > context of this argument, if people want revoke so DAX can migrate from > > one NV-DIMM to another, ok. But revoke does not help RDMA migrate. > > > > If, instead, you mean that you want to support hot unplug of an NV-DIMM > > that is currently the target of RDMA transfers, then I believe > > Christoph's answer on this is correct. It all boils down to which > > device you are talking about doing the hot unplug on. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA
On Wed, 2019-02-06 at 12:41 -0800, Matthew Wilcox wrote: > On Wed, Feb 06, 2019 at 03:28:35PM -0500, Doug Ledford wrote: > > On Wed, 2019-02-06 at 12:20 -0800, Matthew Wilcox wrote: > > > On Wed, Feb 06, 2019 at 03:16:02PM -0500, Doug Ledford wrote: > > > > On Wed, 2019-02-06 at 11:40 -0800, Matthew Wilcox wrote: > > > > > On Wed, Feb 06, 2019 at 07:16:21PM +, Christopher Lameter wrote: > > > > > > though? If we only allow this use case then we may not have to > > > > > > worry about > > > > > > long term GUP because DAX mapped files will stay in the physical > > > > > > location > > > > > > regardless. > > > > > > > > > > ... except for truncate. And now that I think about it, there was a > > > > > desire to support hot-unplug which also needed revoke. > > > > > > > > We already support hot unplug of RDMA devices. But it is extreme. How > > > > does hot unplug deal with a program running from the device (something > > > > that would have returned ETXTBSY)? > > > > > > Not hot-unplugging the RDMA device but hot-unplugging an NV-DIMM. > > > > > > It's straightforward to migrate text pages from one DIMM to another; > > > you remove the PTEs from the CPU's page tables, copy the data over and > > > pagefaults put the new PTEs in place. We don't have a way to do similar > > > things to an RDMA device, do we? > > > > We don't have a means of migration except in the narrowly scoped sense > > of queue pair migration as defined by the IBTA and implemented on some > > dual port IB cards. This narrowly scoped migration even still involves > > notification of the app. > > > > Since there's no guarantee that any other port can connect to the same > > machine as any port that's going away, it would always be a > > disconnect/reconnect sequence in the app to support this, not an under > > the covers migration. > > I don't understand you. We're not talking about migrating from one IB > card to another, we're talking about changing the addresses that an STag > refers to. You said "now that I think about it, there was a desire to support hot- unplug which also needed revoke". For us, hot unplug is done at the device level and means all connections must be torn down. So in the context of this argument, if people want revoke so DAX can migrate from one NV-DIMM to another, ok. But revoke does not help RDMA migrate. If, instead, you mean that you want to support hot unplug of an NV-DIMM that is currently the target of RDMA transfers, then I believe Christoph's answer on this is correct. It all boils down to which device you are talking about doing the hot unplug on. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA
On Wed, 2019-02-06 at 12:20 -0800, Matthew Wilcox wrote: > On Wed, Feb 06, 2019 at 03:16:02PM -0500, Doug Ledford wrote: > > On Wed, 2019-02-06 at 11:40 -0800, Matthew Wilcox wrote: > > > On Wed, Feb 06, 2019 at 07:16:21PM +, Christopher Lameter wrote: > > > > though? If we only allow this use case then we may not have to worry > > > > about > > > > long term GUP because DAX mapped files will stay in the physical > > > > location > > > > regardless. > > > > > > ... except for truncate. And now that I think about it, there was a > > > desire to support hot-unplug which also needed revoke. > > > > We already support hot unplug of RDMA devices. But it is extreme. How > > does hot unplug deal with a program running from the device (something > > that would have returned ETXTBSY)? > > Not hot-unplugging the RDMA device but hot-unplugging an NV-DIMM. > > It's straightforward to migrate text pages from one DIMM to another; > you remove the PTEs from the CPU's page tables, copy the data over and > pagefaults put the new PTEs in place. We don't have a way to do similar > things to an RDMA device, do we? We don't have a means of migration except in the narrowly scoped sense of queue pair migration as defined by the IBTA and implemented on some dual port IB cards. This narrowly scoped migration even still involves notification of the app. Since there's no guarantee that any other port can connect to the same machine as any port that's going away, it would always be a disconnect/reconnect sequence in the app to support this, not an under the covers migration. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA
On Wed, 2019-02-06 at 11:40 -0800, Matthew Wilcox wrote: > On Wed, Feb 06, 2019 at 07:16:21PM +, Christopher Lameter wrote: > > > > though? If we only allow this use case then we may not have to worry about > > long term GUP because DAX mapped files will stay in the physical location > > regardless. > > ... except for truncate. And now that I think about it, there was a > desire to support hot-unplug which also needed revoke. We already support hot unplug of RDMA devices. But it is extreme. How does hot unplug deal with a program running from the device (something that would have returned ETXTBSY)? -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA
On Wed, 2019-02-06 at 11:45 -0800, Dan Williams wrote: > On Wed, Feb 6, 2019 at 10:52 AM Jason Gunthorpe wrote: > > On Wed, Feb 06, 2019 at 10:35:04AM -0800, Matthew Wilcox wrote: > > > > > > Admittedly, I'm coming in late to this conversation, but did I miss the > > > > portion where that alternative was ruled out? > > > > > > That's my preferred option too, but the preponderance of opinion leans > > > towards "We can't give people a way to make files un-truncatable". > > > > I haven't heard an explanation why blocking ftruncate is worse than > > giving people a way to break RDMA using process by calling ftruncate?? > > > > Isn't it exactly the same argument the other way? > > > If the > RDMA application doesn't want it to happen, arrange for it by > permissions or other coordination to prevent truncation, I just argued the *exact* same thing, except from the other side: if you want a guaranteed ability to truncate, then arrange the perms so the RDMA or DAX capable things can't use the file. > but once the > two conflicting / valid requests have arrived at the filesystem try to > move the result forward to the user requested state not block and fail > indefinitely. Except this is wrong. We already have ETXTBSY, and arguably it is much easier for ETXTBSY to simply kill all of the running processes with extreme prejudice. But we don't do that. We block indefinitely. So, no, there is no expectation that things will "move forward to the user requested state". Not when pages are in use by the kernel, and very arguably pages being used for direct I/O are absolutely in use by the kernel, then truncate blocks. There is a major case of dissonant cognitive behavior here if the syscall supports ETXTBSY, even though the ability to kill apps using the text pages is trivial, but thinks supporting EBUSY is out of the question. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA
On Wed, 2019-02-06 at 10:35 -0800, Matthew Wilcox wrote: > On Wed, Feb 06, 2019 at 01:32:04PM -0500, Doug Ledford wrote: > > On Wed, 2019-02-06 at 09:52 -0800, Matthew Wilcox wrote: > > > On Wed, Feb 06, 2019 at 10:31:14AM -0700, Jason Gunthorpe wrote: > > > > On Wed, Feb 06, 2019 at 10:50:00AM +0100, Jan Kara wrote: > > > > > > > > > MM/FS asks for lease to be revoked. The revoke handler agrees with the > > > > > other side on cancelling RDMA or whatever and drops the page pins. > > > > > > > > This takes a trip through userspace since the communication protocol > > > > is entirely managed in userspace. > > > > > > > > Most existing communication protocols don't have a 'cancel operation'. > > > > > > > > > Now I understand there can be HW / communication failures etc. in > > > > > which case the driver could either block waiting or make sure future > > > > > IO will fail and drop the pins. > > > > > > > > We can always rip things away from the userspace.. However.. > > > > > > > > > But under normal conditions there should be a way to revoke the > > > > > access. And if the HW/driver cannot support this, then don't let it > > > > > anywhere near DAX filesystem. > > > > > > > > I think the general observation is that people who want to do DAX & > > > > RDMA want it to actually work, without data corruption, random process > > > > kills or random communication failures. > > > > > > > > Really, few users would actually want to run in a system where revoke > > > > can be triggered. > > > > > > > > So.. how can the FS/MM side provide a guarantee to the user that > > > > revoke won't happen under a certain system design? > > > > > > Most of the cases we want revoke for are things like truncate(). > > > Shouldn't happen with a sane system, but we're trying to avoid users > > > doing awful things like being able to DMA to pages that are now part of > > > a different file. > > > > Why is the solution revoke then? Is there something besides truncate > > that we have to worry about? I ask because EBUSY is not currently > > listed as a return value of truncate, so extending the API to include > > EBUSY to mean "this file has pinned pages that can not be freed" is not > > (or should not be) totally out of the question. > > > > Admittedly, I'm coming in late to this conversation, but did I miss the > > portion where that alternative was ruled out? > > That's my preferred option too, but the preponderance of opinion leans > towards "We can't give people a way to make files un-truncatable". Has anyone looked at the laundry list of possible failures truncate already has? Among others, ETXTBSY is already in the list, and it allows someone to make a file un-truncatable by running it. There's EPERM for multiple failures. In order for someone to make a file untruncatable using this, they would have to have perms to the file already anyway as well as perms to get the direct I/O pin. I see no reason why, if they have the perms to do it, that you don't allow them to. If you don't want someone else to make a file untruncatable that you want to truncate, then don't share file perms with them. What's the difficulty here? Really, creating this complex revoke thing to tear down I/O when people really *don't* want that I/O getting torn down seems like forcing a bad API on I/O to satisfy not doing what is an entirely natural extension to an existing API. You *shouldn't* have the right to truncate a file that is busy, and ETXTBSY is a perfect example of that, and an example of the API done right. This other -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA
On Wed, 2019-02-06 at 09:52 -0800, Matthew Wilcox wrote: > On Wed, Feb 06, 2019 at 10:31:14AM -0700, Jason Gunthorpe wrote: > > On Wed, Feb 06, 2019 at 10:50:00AM +0100, Jan Kara wrote: > > > > > MM/FS asks for lease to be revoked. The revoke handler agrees with the > > > other side on cancelling RDMA or whatever and drops the page pins. > > > > This takes a trip through userspace since the communication protocol > > is entirely managed in userspace. > > > > Most existing communication protocols don't have a 'cancel operation'. > > > > > Now I understand there can be HW / communication failures etc. in > > > which case the driver could either block waiting or make sure future > > > IO will fail and drop the pins. > > > > We can always rip things away from the userspace.. However.. > > > > > But under normal conditions there should be a way to revoke the > > > access. And if the HW/driver cannot support this, then don't let it > > > anywhere near DAX filesystem. > > > > I think the general observation is that people who want to do DAX & > > RDMA want it to actually work, without data corruption, random process > > kills or random communication failures. > > > > Really, few users would actually want to run in a system where revoke > > can be triggered. > > > > So.. how can the FS/MM side provide a guarantee to the user that > > revoke won't happen under a certain system design? > > Most of the cases we want revoke for are things like truncate(). > Shouldn't happen with a sane system, but we're trying to avoid users > doing awful things like being able to DMA to pages that are now part of > a different file. Why is the solution revoke then? Is there something besides truncate that we have to worry about? I ask because EBUSY is not currently listed as a return value of truncate, so extending the API to include EBUSY to mean "this file has pinned pages that can not be freed" is not (or should not be) totally out of the question. Admittedly, I'm coming in late to this conversation, but did I miss the portion where that alternative was ruled out? -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
[PULL REQUEST] Please pull rdma.git
Hi Linus, We have 5 small fixes for this pull request. One is a performance regression, so not necessarily strictly a fix, but it was small and reasonable and claimed to avoid thrashing in the scheduler, so I took it. The remaining are all legitimate fixes that match the "we take fixes any time" criteria. Here's the boilerplate: The following changes since commit 7bca603a69c0c239654a8f0bcb99e1a60b30040c: RDMA/mlx5: Initialize return variable in case pagefault was skipped (2018-11-29 15:16:45 -0700) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git tags/for-linus for you to fetch changes up to 37fbd834b4e492dc41743830cbe435f35120abd8: IB/core: Fix oops in netdev_next_upper_dev_rcu() (2018-12-12 12:14:49 -0500) Pull request for 4.20-rc - One performance regression for hfi1 - One kasan fix for hfi1 - A couple mlx5 fixes - A core oops fix Artemy Kovalyov (1): IB/mlx5: Fix implicit ODP interrupted page fault Mark Zhang (1): IB/core: Fix oops in netdev_next_upper_dev_rcu() Michael J. Ruhl (1): IB/hfi1: Fix a latency issue for small messages Piotr Stankiewicz (1): IB/hfi1: Fix an out-of-bounds access in get_hw_stats Yishai Hadas (1): IB/mlx5: Block DEVX umem from the non applicable cases drivers/infiniband/core/roce_gid_mgmt.c | 3 +++ drivers/infiniband/hw/hfi1/chip.c | 3 ++- drivers/infiniband/hw/hfi1/hfi.h| 2 ++ drivers/infiniband/hw/hfi1/qp.c | 7 +++ drivers/infiniband/hw/hfi1/verbs.c | 2 +- drivers/infiniband/hw/mlx5/devx.c | 4 +++- drivers/infiniband/hw/mlx5/odp.c| 9 - 7 files changed, 22 insertions(+), 8 deletions(-) -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: linux-next: manual merge of the mlx5-next tree with the rdma tree
FWIW this will go away in a day or two. I merged mlx5-next into rdma for-next in order to take a series that depended on it. On Wed, 2018-12-05 at 12:07 +1100, Stephen Rothwell wrote: > Hi Leon, > > Today's linux-next merge of the mlx5-next tree got a conflict in: > > drivers/infiniband/hw/mlx5/main.c > drivers/infiniband/hw/mlx5/mlx5_ib.h > > between commit: > > 36e235c88299 ("RDMA/mlx5: Use the uapi disablement APIs instead of code") > > from the rdma tree and commit: > > 81773ce5f07f ("RDMA/mlx5: Use stages for callback to setup and release > DEVX") > > from the mlx5-next tree. > > I fixed it up (see below) and can carry the fix as necessary. This > is now fixed as far as linux-next is concerned, but any non trivial > conflicts should be mentioned to your upstream maintainer when your tree > is submitted for merging. You may also want to consider cooperating > with the maintainer of the conflicting tree to minimise any particularly > complex conflicts. > -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] RDMA/uverbs: fix ptr_ret.cocci warnings
On Thu, 2018-11-29 at 16:37 -0700, Jason Gunthorpe wrote: > On Wed, Nov 28, 2018 at 07:21:30AM +0800, kbuild test robot wrote: > > From: kbuild test robot > > > > drivers/infiniband/core/uverbs_cmd.c:1095:1-3: WARNING: PTR_ERR_OR_ZERO can > > be used > > > > > > Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR > > > > Generated by: scripts/coccinelle/api/ptr_ret.cocci > > > > Fixes: 7106a9769715 ("RDMA/uverbs: Make write() handlers return 0 on > > success") > > Signed-off-by: kbuild test robot > > --- > > applied to for-next, thanks > > Jason This caused a conflict with your make write() handlers use a consistent flow series, which I fixed up during git am run. Just FYI in case you want to check out the conflict spot as a double check (but it was a simple fixup). -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] RDMA/hns: Use 64-bit arithmetic instead of 32-bit
On Thu, 2018-10-18 at 14:01 +0300, Leon Romanovsky wrote: > On Thu, Oct 18, 2018 at 10:02:58AM +0200, Gustavo A. R. Silva wrote: > > Cast *max_num_sg* to u64 in order to give the compiler complete > > information about the proper arithmetic to use. > > > > Notice that such variable is used in a context that expects an > > expression of type u64 (64 bits, unsigned) and the following > > expression is currently being evaluated using 32-bit > > arithmetic: > > And what is wrong with that? > Please fix static analyzer tool instead of fixing proper C code. Judging on the static analyzer tool's message, I don't see anything wrong with it. The code contains a potential unintentional overflow error. The author might have been well aware of the overflow and not cared and in that case this is valid C, but the analyzer has no way of knowing that, so it flags it for review. To silence the checker you could either cast the arithmetic to u64, or cast length to u32. Either would clear up the ambiguity. I guess I'm not seeing why you would blame the static checker in this case, it did the best it is possible for it to do. > Thanks > > > > > length = max_num_sg * page_size; > > > > Addresses-Coverity-ID: 1474517 ("Unintentional integer overflow") > > Signed-off-by: Gustavo A. R. Silva > > --- > > drivers/infiniband/hw/hns/hns_roce_mr.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c > > b/drivers/infiniband/hw/hns/hns_roce_mr.c > > index 521ad2a..d479d5e 100644 > > --- a/drivers/infiniband/hw/hns/hns_roce_mr.c > > +++ b/drivers/infiniband/hw/hns/hns_roce_mr.c > > @@ -1219,7 +1219,7 @@ struct ib_mr *hns_roce_alloc_mr(struct ib_pd *pd, > > enum ib_mr_type mr_type, > > int ret; > > > > page_size = 1 << (hr_dev->caps.pbl_buf_pg_sz + PAGE_SHIFT); > > - length = max_num_sg * page_size; > > + length = (u64)max_num_sg * page_size; > > > > if (mr_type != IB_MR_TYPE_MEM_REG) > > return ERR_PTR(-EINVAL); > > -- > > 2.7.4 > > -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH v3 0/2] IB/mlx4: Enable debug print of SMPs and enhance output for MADs
On Tue, 2018-10-09 at 15:27 +0200, Håkon Bugge wrote: > SMPs were not printed at all. Added printing of port number and TID to > all MADs > > > Håkon Bugge (2): > IB/mlx4: Enable debug print of SMPs > IB/mlx4: Add port and TID to MAD debug print > > drivers/infiniband/hw/mlx4/mad.c | 20 +++- > 1 file changed, 11 insertions(+), 9 deletions(-) > > -- > > v1 -> v2 > * Fixed incorrect format for printing the TID for the second patch > v2 -> v3 >* Made v3 consistently for cover-letter and the two patches, my v2 > was partly original submission plus a v2. That probably fooled > both the kbuild bot and patchwork > > 2.14.3 Thanks, applied. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] RDMA/ucma: Fix Spectre v1 vulnerability
On Tue, 2018-10-16 at 16:59 +0200, Gustavo A. R. Silva wrote: > hdr.cmd can be indirectly controlled by user-space, hence leading to > a potential exploitation of the Spectre variant 1 vulnerability. > > This issue was detected with the help of Smatch: > > drivers/infiniband/core/ucma.c:1686 ucma_write() warn: potential > spectre issue 'ucma_cmd_table' [r] (local cap) > > Fix this by sanitizing hdr.cmd before using it to index > ucm_cmd_table. > > Notice that given that speculation windows are large, the policy is > to kill the speculation on the first load and not worry if it can be > completed with a dependent load/store [1]. > > [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 > > Cc: sta...@vger.kernel.org > Signed-off-by: Gustavo A. R. Silva Thanks, applied to for-rc. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] IB/ucm: Fix Spectre v1 vulnerability
On Tue, 2018-10-16 at 16:32 +0200, Gustavo A. R. Silva wrote: > hdr.cmd can be indirectly controlled by user-space, hence leading to > a potential exploitation of the Spectre variant 1 vulnerability. > > This issue was detected with the help of Smatch: > > drivers/infiniband/core/ucm.c:1127 ib_ucm_write() warn: potential > spectre issue 'ucm_cmd_table' [r] (local cap) > > Fix this by sanitizing hdr.cmd before using it to index > ucm_cmd_table. > > Notice that given that speculation windows are large, the policy is > to kill the speculation on the first load and not worry if it can be > completed with a dependent load/store [1]. > > [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 > > Cc: sta...@vger.kernel.org > Signed-off-by: Gustavo A. R. Silva Thanks, applied to for-rc. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH rdma-next 00/18] RDMA: Add support for ib_device_ops
On Tue, 2018-10-09 at 14:46 -0400, Jason Gunthorpe wrote: > > > On Tue., Oct. 9, 2018, 2:44 p.m. Kamal Heib, wrote: > > On Tue, Oct 09, 2018 at 02:31:27PM -0400, Doug Ledford wrote: > > > On Tue, 2018-10-09 at 19:27 +0300, Kamal Heib wrote: > > > > This patchset introduce a new structure that will contain all the > > > > infiniband device operations, the structure will be used by the > > > > providers to initialize their supported operations. This patchset also > > > > includes the required changes in the core and ulps to start using it. > > > > > > > > Thanks, > > > > Kamal > > > > > > Hi Kamal, > > > > > > Please repost this to linux-r...@vger.kernel.org as that's how this gets > > > into patchworks. > > > > > > > Hi Doug, > > > > Oops, My bad instead of picking the linux-rdma email from get_maintainer.pl > > output > > I picked the linux-kernel email, I'll repost them soon and thanks for your > > reply. > > Wait for comments before reposting such a large series.. It didn't go to linux-rdma at all, so many of the people that might comment on it don't even know it exists, so I'm not sure how many comments he ought to wait for ;-) -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH rdma-next 00/18] RDMA: Add support for ib_device_ops
On Tue, 2018-10-09 at 19:27 +0300, Kamal Heib wrote: > This patchset introduce a new structure that will contain all the > infiniband device operations, the structure will be used by the > providers to initialize their supported operations. This patchset also > includes the required changes in the core and ulps to start using it. > > Thanks, > Kamal Hi Kamal, Please repost this to linux-r...@vger.kernel.org as that's how this gets into patchworks. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] IB/nes: Remove unnecessary parentheses
On Wed, 2018-09-19 at 20:29 -0700, Nathan Chancellor wrote: > Clang warns when more than one set of parentheses are used in single > conditional statements. > > drivers/infiniband/hw/nes/nes_hw.c:1446:27: warning: equality comparison > with extraneous parentheses [-Wparentheses-equality] > } while ((temp_phy_data2 == temp_phy_data)); > ~~~^~~~ > drivers/infiniband/hw/nes/nes_hw.c:1446:27: note: remove extraneous > parentheses around the comparison to silence this warning > } while ((temp_phy_data2 == temp_phy_data)); > ~ ^ ~ > drivers/infiniband/hw/nes/nes_hw.c:1446:27: note: use '=' to turn this > equality comparison into an assignment > } while ((temp_phy_data2 == temp_phy_data)); > ^~ > = > > Remove the unnecessary parentheses to silence this warning. > > Reported-by: Nick Desaulniers > Signed-off-by: Nathan Chancellor Applied to for-next, thanks. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] RDMA/cxgb4: remove redundant null pointer check before kfree_skb
On Thu, 2018-09-20 at 17:52 +0800, zhong jiang wrote: > kfree_skb has taken the null pointer into account. hence it is safe > to remove the redundant null pointer check before kfree_skb. > > Signed-off-by: zhong jiang Applied to for-next, thanks. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] IB/mlx4: Remove unnecessary parentheses
On Wed, 2018-09-19 at 20:32 -0700, Nathan Chancellor wrote: > Clang warns when more than one set of parentheses are used in single > conditional statements. > > drivers/infiniband/hw/mlx4/mcg.c:676:16: warning: equality comparison > with extraneous parentheses [-Wparentheses-equality] > if ((method == IB_MGMT_METHOD_GET_RESP)) { > ~~~^~ > drivers/infiniband/hw/mlx4/mcg.c:676:16: note: remove extraneous > parentheses around the comparison to silence this warning > if ((method == IB_MGMT_METHOD_GET_RESP)) { > ~ ^ ~ > drivers/infiniband/hw/mlx4/mcg.c:676:16: note: use '=' to turn this > equality comparison into an assignment > if ((method == IB_MGMT_METHOD_GET_RESP)) { > ^~ > = > > Remove the unnecessary parentheses to silence this warning. > > Reported-by: Nick Desaulniers > Signed-off-by: Nathan Chancellor Applied to for-next, thanks. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] ucma: fix a use-after-free in ucma_resolve_ip()
On Wed, 2018-09-12 at 16:27 -0700, Cong Wang wrote: > There is a race condition between ucma_close() and ucma_resolve_ip(): > > CPU0CPU1 > ucma_resolve_ip(): ucma_close(): > > ctx = ucma_get_ctx(file, cmd.id); > > list_for_each_entry_safe(ctx, tmp, &file->ctx_list, list) { > mutex_lock(&mut); > idr_remove(&ctx_idr, ctx->id); > mutex_unlock(&mut); > ... > mutex_lock(&mut); > if (!ctx->closing) { > mutex_unlock(&mut); > rdma_destroy_id(ctx->cm_id); > ... > ucma_free_ctx(ctx); > > ret = rdma_resolve_addr(); > ucma_put_ctx(ctx); > > Before idr_remove(), ucma_get_ctx() could still find the ctx > and after rdma_destroy_id(), rdma_resolve_addr() may still > access id_priv pointer. Also, ucma_put_ctx() may use ctx after > ucma_free_ctx() too. > > ucma_close() should call ucma_put_ctx() too which tests the > refcnt and waits for the last one releasing it. The similar > pattern is already used by ucma_destroy_id(). > > Reported-and-tested-by: syzbot+da2591e115d57a9cb...@syzkaller.appspotmail.com > Reported-by: syzbot+cfe3c1e8ef634ba89...@syzkaller.appspotmail.com > Cc: Jason Gunthorpe > Cc: Doug Ledford > Cc: Leon Romanovsky > Signed-off-by: Cong Wang Thanks, applied to for-rc. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH -next] RDMA: Remove duplicated include from ib_addr.h
On Thu, 2018-09-13 at 18:58 +0300, Leon Romanovsky wrote: > On Thu, Sep 13, 2018 at 09:47:52PM +0800, YueHaibing wrote: > > Remove duplicated include. > > > > Signed-off-by: YueHaibing > > --- > > include/rdma/ib_addr.h | 1 - > > 1 file changed, 1 deletion(-) > > > > Thanks, > Reviewed-by: Leon Romanovsky Thanks, applied to for-next. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] infiniband: nes: add unlikely() to assert()
On Wed, 2018-09-05 at 09:01 +0300, Leon Romanovsky wrote: > On Fri, Aug 31, 2018 at 10:24:18PM +0300, Igor Stoppa wrote: > > Typically the assert is expected to not fail. > > This whole assert can be removed. > > > > > Signed-off-by: Igor Stoppa > > Acked-by: Doug Ledford > > Most probably that I missed discussion, but anyway, does this > "Acked-by" come from internal or external discussion? This patch was part of a larger series on lkml. In that context, I acked it so that the series could be applied by whomever took it (it didn't belong on rdma-list as a series since only one patch out of some large number touched rdma files). Now it is being resent as not part of a series, but my ack was preserved. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH 01/23] infiniband: nes: add unlikely() to assert()
On Fri, 2018-08-31 at 01:34 +0300, Igor Stoppa wrote: > Typically the assert is expected to not fail. > > Signed-off-by: Igor Stoppa > Cc: Chien Tung > Cc: Roland Dreier > Cc: Faisal Latif > Cc: Doug Ledford > Cc: Jason Gunthorpe Acked-by: Doug Ledford > --- > drivers/infiniband/hw/nes/nes.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/infiniband/hw/nes/nes.h b/drivers/infiniband/hw/nes/nes.h > index bedaa02749fb..d2d0098f38e0 100644 > --- a/drivers/infiniband/hw/nes/nes.h > +++ b/drivers/infiniband/hw/nes/nes.h > @@ -151,7 +151,7 @@ do { \ > > #define assert(expr) \ > do { \ > - if (!(expr)) { \ > + if (unlikely(!(expr))) { \ > printk(KERN_ERR PFX "Assertion failed! %s, %s, %s, line %d\n", \ > #expr, __FILE__, __func__, __LINE__); \ > } \ -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [RDMA bug] KASAN: use-after-free Read in __list_del_entry_valid (4)
On Thu, 2018-08-23 at 16:39 +, Parav Pandit wrote: > > -Original Message- > > From: Jason Gunthorpe > > Sent: Thursday, August 23, 2018 9:55 AM > > To: Eric Biggers > > Cc: Doug Ledford ; linux-r...@vger.kernel.org; > > dasaratharaman.chandramo...@intel.com; Leon Romanovsky > > ; linux-kernel@vger.kernel.org; Mark Bloch > > ; Moni Shoua ; Parav Pandit > > ; syzkaller-b...@googlegroups.com; syzbot > > > > Subject: Re: [RDMA bug] KASAN: use-after-free Read in __list_del_entry_valid > > (4) > > > > On Wed, Aug 22, 2018 at 11:16:31PM -0700, Eric Biggers wrote: > > > Hello RDMA / InfiniBand maintainers, > > > > > > This is an RDMA bug and it still occurs on Linus' tree as of today > > > (commit 815f0ddb346c1960). > > > > > > I've also simplified the reproducer for it; see below after the original > > > report. > > > Apparently it involves a race between RDMA_USER_CM_CMD_RESOLVE_IP > > > > and > > > RDMA_USER_CM_CMD_LISTEN. > > > > That is an amazing reproducer! > > > > I have a feeling this is the same cause as all the other syzkaller bugs in > > this code: > > lack of any sane locking at all :\ > > > > We've talked about chucking a big lock around this whole thing, but nobody > > has > > done it yet.. It isn't so simple. > > > > I had some code in which reduces three locks (handler_lock, qp_mutex, > id_lock) to single mutex to protect the cm_id and protects every exported > symbol of rdmacm which works on cm_id. > But not ready enough to post it as patch yet. Lot of tests required before I > get there and some refactor too before that. Does it finally address the fact that the rdmacm code was written so that it was always synchronous but RoCE src gid (I think that's what it was, I'm typing this from long ago memory) lookup broke that assumption? -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] iw_cxgb4: add INFINIBAND_ADDR_TRANS dependency
On Wed, 2018-05-30 at 21:03 -0700, Greg Thelen wrote: > On Wed, May 30, 2018 at 4:01 PM Jason Gunthorpe wrote: > > > On Thu, May 31, 2018 at 12:40:54AM +0200, Arnd Bergmann wrote: > > > > On 5/30/2018 5:25 PM, Jason Gunthorpe wrote: > > > > > On Wed, May 30, 2018 at 05:10:35PM -0500, Steve Wise wrote: > > > > > > > > > > > > On 5/30/2018 5:04 PM, Jason Gunthorpe wrote: > > > > > > > On Wed, May 30, 2018 at 11:58:18PM +0200, Arnd Bergmann wrote: > > > > > > > > The newly added fill_res_ep_entry function fails to link if > > > > > > > > CONFIG_INFINIBAND_ADDR_TRANS is not set: > > > > > > > > > > > > > > > > drivers/infiniband/hw/cxgb4/restrack.o: In function > > `fill_res_ep_entry': > > > > > > > > restrack.c:(.text+0x3cc): undefined reference to > > > > > > > > `rdma_res_to_id' > > > > > > > > restrack.c:(.text+0x3d0): undefined reference to `rdma_iw_cm_id' > > > > > > > > > > > > > > > > This adds a Kconfig dependency for the driver. > > > > > > > > > > > > > > > > Fixes: 116aeb887371 ("iw_cxgb4: provide detailed > > provider-specific CM_ID information") > > > > > > > > Signed-off-by: Arnd Bergmann > > > > > > > > drivers/infiniband/hw/cxgb4/Kconfig | 1 + > > > > > > > > 1 file changed, 1 insertion(+) > > > > > > > > > > > > > > Oh, I think we need to solve this with maybe a header fill null > > stub > > > > > > > instead.. > > > > > > > > > > > > > > We don't want to disable drivers just because a user interface is > > > > > > > disabled. > > > > > > > > > > > > > > > > > > > Why does CONFIG_INFINIBAND_ADDR_TRANS disable building rdma_cm.ko? > > That > > > > > > is not correct. > > > > > > > > > > That seems like a reasonable thing to do.. > > > > > > > > rdma_ucm.ko is for usermode users, rdma_cm.ko is for kernel users, and > > > > is required for iwarp drivers. It seems rdma_cm.ko is not being > > > > compiled if ADDR_TRANS is not set. > > I think the intention was to completely disable rdma-cm, including all > > support for rx'ing remote packets? Greg? > > Yes. That's my goal when INFINIBAND_ADDR_TRANS is unset. > > > If this is required for iwarp then Arnd's patch is probably the right > > way to go.. > > Jason > > Agreed. > Acked-by: Greg Thelen If that's the case, then there should be a NOTE: in the Kconfig that disabling the connection manager completely disables iWARP hardware. I don't really think I like this approach though. At a minimum if you are going to make iWARP totally dependent on rdmacm, then there is zero reason that iw_cm.o should be part of the obj-$(CONFIG_INFINIBAND) Makefile recipe when ADDR_TRANS is disabled. We can take this patch as a band-aid, but IMO it's either incomplete or simply not the right solution. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] RDMA/qedr: fix spelling mistake: "adrresses" -> "addresses"
On Wed, 2018-05-30 at 10:40 +0100, Colin King wrote: > From: Colin Ian King > > Trivial fix to spelling mistake in DP_ERR error message > > Signed-off-by: Colin Ian King Thanks, applied. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH 6/6] infiniband: qplib_fp: Use dev_fmt
On Wed, 2018-05-09 at 08:15 -0700, Joe Perches wrote: > Convert the embedded dev_ uses to use the new dev_fmt > prefix and remove the prefix from the formats. > > Miscellanea: > > o Add missing terminating newlines to avoid possible interleaving > o Realign arguments > > Signed-off-by: Joe Perches I don't see any problem with this patch. For the IB part: Acked-by: Doug Ledford -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] iw_cxgb4: Fix an error handling path in 'c4iw_get_dma_mr()'
On Tue, 2018-05-08 at 07:44 +0200, Christophe JAILLET wrote: > The error handling path of 'c4iw_get_dma_mr()' does not free resources > in the correct order. > If an error occures, it can leak 'mhp->wr_waitp'. > > Fixes: a3f12da0e99a ("iw_cxgb4: allocate wait object for each memory object") > Signed-off-by: Christophe JAILLET Applied to for-rc, thanks. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH v4] nvmet,rxe: defer ip datagram sending to tasklet
On Tue, 2018-05-08 at 11:02 +0200, Alexandru Moise wrote: > This addresses 3 separate problems: > > 1. When using NVME over Fabrics we may end up sending IP > packets in interrupt context, we should defer this work > to a tasklet. > > [ 50.939957] WARNING: CPU: 3 PID: 0 at kernel/softirq.c:161 > __local_bh_enable_ip+0x1f/0xa0 > [ 50.942602] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Tainted: G > W 4.17.0-rc3-ARCH+ #104 > [ 50.945466] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > 1.11.0-20171110_100015-anatol 04/01/2014 > [ 50.948163] RIP: 0010:__local_bh_enable_ip+0x1f/0xa0 > [ 50.949631] RSP: 0018:88009c183900 EFLAGS: 00010006 > [ 50.951029] RAX: 80010403 RBX: 0200 RCX: > 0001 > [ 50.952636] RDX: RSI: 0200 RDI: > 817e04ec > [ 50.954278] RBP: 88009c183910 R08: 0001 R09: > 0614 > [ 50.956000] R10: ea00021d5500 R11: 0001 R12: > 817e04ec > [ 50.957779] R13: R14: 88009566f400 R15: > 8800956c7000 > [ 50.959402] FS: () GS:88009c18() > knlGS: > [ 50.961552] CS: 0010 DS: ES: CR0: 80050033 > [ 50.963798] CR2: 55c4ec0ccac0 CR3: 02209001 CR4: > 000606e0 > [ 50.966121] Call Trace: > [ 50.966845] > [ 50.967497] __dev_queue_xmit+0x62d/0x690 > [ 50.968722] dev_queue_xmit+0x10/0x20 > [ 50.969894] neigh_resolve_output+0x173/0x190 > [ 50.971244] ip_finish_output2+0x2b8/0x370 > [ 50.972527] ip_finish_output+0x1d2/0x220 > [ 50.973785] ? ip_finish_output+0x1d2/0x220 > [ 50.975010] ip_output+0xd4/0x100 > [ 50.975903] ip_local_out+0x3b/0x50 > [ 50.976823] rxe_send+0x74/0x120 > [ 50.977702] rxe_requester+0xe3b/0x10b0 > [ 50.978881] ? ip_local_deliver_finish+0xd1/0xe0 > [ 50.980260] rxe_do_task+0x85/0x100 > [ 50.981386] rxe_run_task+0x2f/0x40 > [ 50.982470] rxe_post_send+0x51a/0x550 > [ 50.983591] nvmet_rdma_queue_response+0x10a/0x170 > [ 50.985024] __nvmet_req_complete+0x95/0xa0 > [ 50.986287] nvmet_req_complete+0x15/0x60 > [ 50.987469] nvmet_bio_done+0x2d/0x40 > [ 50.988564] bio_endio+0x12c/0x140 > [ 50.989654] blk_update_request+0x185/0x2a0 > [ 50.990947] blk_mq_end_request+0x1e/0x80 > [ 50.991997] nvme_complete_rq+0x1cc/0x1e0 > [ 50.993171] nvme_pci_complete_rq+0x117/0x120 > [ 50.994355] __blk_mq_complete_request+0x15e/0x180 > [ 50.995988] blk_mq_complete_request+0x6f/0xa0 > [ 50.997304] nvme_process_cq+0xe0/0x1b0 > [ 50.998494] nvme_irq+0x28/0x50 > [ 50.999572] __handle_irq_event_percpu+0xa2/0x1c0 > [ 51.000986] handle_irq_event_percpu+0x32/0x80 > [ 51.002356] handle_irq_event+0x3c/0x60 > [ 51.003463] handle_edge_irq+0x1c9/0x200 > [ 51.004473] handle_irq+0x23/0x30 > [ 51.005363] do_IRQ+0x46/0xd0 > [ 51.006182] common_interrupt+0xf/0xf > [ 51.007129] > > 2. Work must always be offloaded to tasklet for rxe_post_send_kernel() > when using NVMEoF in order to solve lock ordering between neigh->ha_lock > seqlock and the nvme queue lock: > > [ 77.833783] Possible interrupt unsafe locking scenario: > [ 77.833783] > [ 77.835831]CPU0CPU1 > [ 77.837129] > [ 77.838313] lock(&(&n->ha_lock)->seqcount); > [ 77.839550]local_irq_disable(); > [ 77.841377]lock(&(&nvmeq->q_lock)->rlock); > [ 77.843222]lock(&(&n->ha_lock)->seqcount); > [ 77.845178] > [ 77.846298] lock(&(&nvmeq->q_lock)->rlock); > [ 77.847986] > [ 77.847986] *** DEADLOCK *** > > 3. Same goes for the lock ordering between sch->q.lock and nvme queue lock: > > [ 47.634271] Possible interrupt unsafe locking scenario: > [ 47.634271] > [ 47.636452]CPU0CPU1 > [ 47.637861] > [ 47.639285] lock(&(&sch->q.lock)->rlock); > [ 47.640654]local_irq_disable(); > [ 47.642451]lock(&(&nvmeq->q_lock)->rlock); > [ 47.644521]lock(&(&sch->q.lock)->rlock); > [ 47.646480] > [ 47.647263] lock(&(&nvmeq->q_lock)->rlock); > [ 47.648492] > [ 47.648492] *** DEADLOCK *** > > Using NVMEoF after this patch seems to finally be stable, without it, > rxe eventually deadlocks the whole system and causes RCU stalls. > > Signed-off-by: Alexandru Moise <00moses.alexande...@gmail.com> Thanks, applied to for-rc. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] IB: remove redundant INFINIBAND kconfig dependencies
On Thu, 2018-05-03 at 20:29 -0700, Greg Thelen wrote: > INFINIBAND_ADDR_TRANS depends on INFINIBAND. So there's no need for > options which depend INFINIBAND_ADDR_TRANS to also depend on INFINIBAND. > Remove the unnecessary INFINIBAND depends. > > Signed-off-by: Greg Thelen > --- > drivers/infiniband/ulp/srpt/Kconfig | 2 +- > drivers/nvme/host/Kconfig | 2 +- > drivers/nvme/target/Kconfig | 2 +- > drivers/staging/lustre/lnet/Kconfig | 2 +- > fs/cifs/Kconfig | 2 +- > net/9p/Kconfig | 2 +- > net/rds/Kconfig | 2 +- > net/sunrpc/Kconfig | 2 +- > 8 files changed, 8 insertions(+), 8 deletions(-) Thanks. Since the original changes you submitted went through -rc, I'm sending this cleanup there too. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: linux-next: manual merge of the rdma tree with the rdma-fixes tree
On Wed, 2018-05-02 at 13:22 +0300, Leon Romanovsky wrote: > On Mon, Apr 30, 2018 at 08:55:35PM -0400, Doug Ledford wrote: > > On Tue, 2018-05-01 at 10:10 +1000, Stephen Rothwell wrote: > > > Hi all, > > > > > > Today's linux-next merge of the rdma tree got a conflict in: > > > > > > drivers/infiniband/sw/rxe/rxe_resp.c > > > > > > between commit: > > > > > > 9fd4350ba895 ("B/rxe: avoid double kfree_skb") > > > > > > from the rdma-fixes tree and commit: > > > > > > 2e47350789eb ("IB/rxe: optimize the function duplicate_request") > > > > > > from the rdma tree. > > > > > > I fixed it up (I think - see below) and can carry the fix as necessary. > > > This is now fixed as far as linux-next is concerned, but any non trivial > > > conflicts should be mentioned to your upstream maintainer when your tree > > > is submitted for merging. You may also want to consider cooperating > > > with the maintainer of the conflicting tree to minimise any particularly > > > complex conflicts. > > > > > > > We will probably merge the for-rc branch into the for-next branch in the > > next few days, at which point we will do the conflict resolution > > ourselves and your need to carry anything should drop out. > > Isn't "rdma/wip/for-testing" branch intended for this? Not really. It's there to provide a pre-merged branch for people to test. But, I've rarely seen a release cycle where, *sometime*, we didn't get a patch set in the for-next that depends on changes in the for-rc area, and in that case, you need to merge for-rc into for-next. If we don't have that this cycle, then you're right, I won't merge for- rc into for-next and for-testing will be the throwaway merge branch. On occasion, if the merge fixups needed between for-rc and for-next get too difficult for a non-RDMA person to sus out, then we will do a merge of for-rc into for-next simply so we can provide the right merge fixup, but I doubt this merge fixup rises to that level. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH 3/5] ib_srpt: depend on INFINIBAND_ADDR_TRANS
On Tue, 2018-05-01 at 03:08 +, Greg Thelen wrote: > On Mon, Apr 30, 2018 at 4:35 PM Jason Gunthorpe wrote: > > > On Wed, Apr 25, 2018 at 03:33:39PM -0700, Greg Thelen wrote: > > > INFINIBAND_SRPT code depends on INFINIBAND_ADDR_TRANS provided symbols. > > > So declare the kconfig dependency. This is necessary to allow for > > > enabling INFINIBAND without INFINIBAND_ADDR_TRANS. > > > > > > Signed-off-by: Greg Thelen > > > Cc: Tarick Bedeir > > > drivers/infiniband/ulp/srpt/Kconfig | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/drivers/infiniband/ulp/srpt/Kconfig > > b/drivers/infiniband/ulp/srpt/Kconfig > > > index 31ee83d528d9..fb8b7182f05e 100644 > > > +++ b/drivers/infiniband/ulp/srpt/Kconfig > > > @@ -1,6 +1,6 @@ > > > config INFINIBAND_SRPT > > > tristate "InfiniBand SCSI RDMA Protocol target support" > > > - depends on INFINIBAND && TARGET_CORE > > > + depends on INFINIBAND && INFINIBAND_ADDR_TRANS && TARGET_CORE > > Isn't INFINIBAND && INFINIBAND_ADDR_TRANS a bit redundant? Can't have > > INFINIBAND_ADDR_TRANS without INFINIBAND. > > By kconfig INFINIBAND_ADDR_TRANS depends on INFINIBAND. So yes, it seems > redundant. I don't know if anyone has designs to break this dependency and > allow for ADDR_TRANS without INFINIBAND. No, not at the moment (and I'm not sure we ever would, it would only happen if the subsystem itself became something like RDMA and INFINIBAND specifically related only to the INFINIBAND link layer support, in which case you might want to enable RoCE without INFINIBAND or something like that, but no one has plans to do that as far as I'm aware). > Assuming not, I'd be willing to > amend my series removing redundant INFINIBAND and a followup series to > remove it from similar depends. Though I'm not familiar with rdma dev tree > lifecycle. Is rdma/for-rc a throw away branch (akin to linux-next), or > will it be merged into linus/master? If throwaway, then we can amend > its patches, otherwise followups will be needed. Followups will be needed. > Let me know what you'd prefer. Thanks. > > FYI from v4.17-rc3: > drivers/staging/lustre/lnet/Kconfig: depends on LNET && PCI && INFINIBAND > && INFINIBAND_ADDR_TRANS > net/9p/Kconfig: depends on INET && INFINIBAND && INFINIBAND_ADDR_TRANS > net/rds/Kconfig: depends on RDS && INFINIBAND && INFINIBAND_ADDR_TRANS > net/sunrpc/Kconfig: depends on SUNRPC && INFINIBAND && > INFINIBAND_ADDR_TRANS -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] RDMA/qedr: fix spelling mistake: "failes" -> "fails"
On Tue, 2018-05-01 at 09:25 +0100, Colin King wrote: > From: Colin Ian King > > Trivial fix to spelling mistake in DP_ERR error message > > Signed-off-by: Colin Ian King Thanks, applied. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] IB/cxgb4: use skb_put_zero()/__skb_put_zero
On Sat, 2018-04-28 at 06:53 -0500, Steve Wise wrote: > > -Original Message- > > From: YueHaibing > > Sent: Saturday, April 28, 2018 2:31 AM > > To: sw...@chelsio.com; dledf...@redhat.com; j...@ziepe.ca; > > mo...@mellanox.com > > Cc: linux-r...@vger.kernel.org; linux-kernel@vger.kernel.org; YueHaibing > > > > Subject: [PATCH] IB/cxgb4: use skb_put_zero()/__skb_put_zero > > > > Use the recently introduced helper to replace the pattern of > > skb_put_zero/__skb_put() && memset(). > > > > Signed-off-by: YueHaibing > > Looks ok. > > Reviewed-by: Steve Wise > > Applied to for-next, thanks. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: linux-next: manual merge of the rdma tree with the rdma-fixes tree
On Tue, 2018-05-01 at 10:10 +1000, Stephen Rothwell wrote: > Hi all, > > Today's linux-next merge of the rdma tree got a conflict in: > > drivers/infiniband/sw/rxe/rxe_resp.c > > between commit: > > 9fd4350ba895 ("B/rxe: avoid double kfree_skb") > > from the rdma-fixes tree and commit: > > 2e47350789eb ("IB/rxe: optimize the function duplicate_request") > > from the rdma tree. > > I fixed it up (I think - see below) and can carry the fix as necessary. > This is now fixed as far as linux-next is concerned, but any non trivial > conflicts should be mentioned to your upstream maintainer when your tree > is submitted for merging. You may also want to consider cooperating > with the maintainer of the conflicting tree to minimise any particularly > complex conflicts. > We will probably merge the for-rc branch into the for-next branch in the next few days, at which point we will do the conflict resolution ourselves and your need to carry anything should drop out. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] IB/core: Make ib_mad_client_id atomic
On Mon, 2018-04-30 at 08:49 -0600, Jason Gunthorpe wrote: > On Mon, Apr 23, 2018 at 10:16:18PM +0300, jackm wrote: > > > > > TIDs need to be globally unique on the entire machine. > > Jason, that is not exactly correct. > > The expecation for /dev/umad users is that they all receive locally > unique TID prefixes. The kernel may be OK to keep things port-specific > but it is slightly breaking the API we are presenting to userspace to > allow them to alias.. > > Jason Would people be happier with this commit message then: IB/core: Make ib_mad_client_id atomic Currently, the kernel protects access to the agent ID allocator on a per port basis using a spinlock, so it is impossible for two apps/threads on the same port to get the same TID, but it is entirely possible for two threads on different ports to end up with the same TID. As this can be confusing (regardless of it being legal according to the IB Spec 1.3, C13-18.1.1, in section 13.4.6.4 - TransactionID usage), and as the rdma-core user space API for /dev/umad devices implies unique TIDs even across ports, make the TID an atomic type so that no two allocations, regardless of port number, will be the same. Signed-off-by: Håkon Bugge Reviewed-by: Jack Morgenstein Reviewed-by: Ira Weiny Reviewed-by: Zhu Yanjun Signed-off-by: Doug Ledford -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] IB/core: Make ib_mad_client_id atomic
On Thu, 2018-04-26 at 20:51 +0200, Håkon Bugge wrote: > > Jason is out this week. I'll end up processing this one (probably later > > today). But I’ll fix up the commit message to suit my tastes when I do. > > Thank you, Doug and Jack, I reworded the commit message, let me know if you think I worded it wrong: commit 69f01b81539c62f3dd96f9f02138ad7b839a0c70 (HEAD -> k.o/wip/dl-for-rc) Author: Håkon Bugge Date: Wed Apr 18 16:24:50 2018 +0200 IB/core: Make ib_mad_client_id atomic Currently, kernel protects access to the agent ID allocator on a per port basis using a spinlock, so it is impossible for two apps/threads on the same port to get the same TID, but it is entirely possible for two threads on different ports to end up with the same TID. As this can be confusing (regardless of it being legal according to the IB Spec 1.3, C13-18.1.1, in section 13.4.6.4 - TransactionID usage: "Then initiating a new operation, MADHeader:TransactionID shall be set to such a value that within that MAD the combination of TIG, SGID, and MgmtClass is different from that of any other currently executing operation. If the MAD does not have a GRH, its SLID is used in the combination in place of an SGID." which guarantees we are legal because our different ports will have different SGID/SLID creating a unique tuple even if the TIDs are identical), and as we might want to open the TID allocator up to more parallel usage later, make the TID an atomic type so that no two allocations, regardless of port number, will be the same. Signed-off-by: Håkon Bugge Reviewed-by: Jack Morgenstein Reviewed-by: Ira Weiny Reviewed-by: Zhu Yanjun Signed-off-by: Doug Ledford -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] IB/rxe: add RXE_START_MASK for rxe_opcode IB_OPCODE_RC_SEND_ONLY_INV
On Thu, 2018-04-26 at 12:02 +0800, Yanjun Zhu wrote: > On 2018/4/26 11:52, Jianchao Wang wrote: > > w/o RXE_START_MASK, the last_psn of IB_OPCODE_RC_SEND_ONLY_INV > > will not be updated in update_wqe_psn, and the corresponding > > wqe will not be acked in rxe_completer due to its last_psn is > > zero. Finally, the other wqe will also not be able to be acked, > > because the wqe of IB_OPCODE_RC_SEND_ONLY_INV with last_psn 0 > > is still there. This causes large amount of io timeout when > > nvmeof is over rxe. > > > > Add RXE_START_MASK for IB_OPCODE_RC_SEND_ONLY_INV to fix this. > > > > Signed-off-by: Jianchao Wang > > Reviewed-by: Zhu Yanjun Thanks, applied to for-rc. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] RDMA/iwpm: fix memory leak on map_info
On Wed, 2018-04-25 at 17:24 +0100, Colin King wrote: > From: Colin Ian King > > In the cases where iwpm_hash_bucket is NULL and where function > get_mapinfo_hash_bucket returns NULL then the map_info is never added > to hash_bucket_head and hence there is a leak of map_info. Fix this > by nullifying hash_bucket_head and if that is null we know that > that map_info was not added to hash_bucket_head and hence map_info > should be free'd. > > Detected by CoverityScan, CID#1222481 ("Resource Leak") > > Fixes: 30dc5e63d6a5 ("RDMA/core: Add support for iWARP Port Mapper user space > service") > Signed-off-by: Colin Ian King Thanks, applied to for-rc. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] IB/ipoib: fix ipoib_start_xmit()'s return type
On Wed, 2018-04-25 at 18:35 +0300, Leon Romanovsky wrote: > On Tue, Apr 24, 2018 at 03:15:47PM +0200, Luc Van Oostenryck wrote: > > The method ndo_start_xmit() is defined as returning an 'netdev_tx_t', > > which is a typedef for an enum type, but the implementation in this > > driver returns an 'int'. > > > > Fix this by returning 'netdev_tx_t' in this driver too. > > > > Signed-off-by: Luc Van Oostenryck > > --- > > drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > Thanks, > Reviewed-by: Leon Romanovsky Thanks, applied to for-rc. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] IB/nes: fix nes_netdev_start_xmit()'s return type
On Wed, 2018-04-25 at 18:35 +0300, Leon Romanovsky wrote: > On Tue, Apr 24, 2018 at 08:36:12PM +0300, Leon Romanovsky wrote: > > On Tue, Apr 24, 2018 at 03:15:45PM +0200, Luc Van Oostenryck wrote: > > > The method ndo_start_xmit() is defined as returning an 'netdev_tx_t', > > > which is a typedef for an enum type, but the implementation in this > > > driver returns an 'int'. > > > > > > Fix this by returning 'netdev_tx_t' in this driver too. > > > > > > Signed-off-by: Luc Van Oostenryck > > > --- > > > drivers/infiniband/hw/nes/nes_nic.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > ipoib_start_xmit() needs the same fix. > > > > Thanks, > Reviewed-by: Leon Romanovsky Thanks, applied to for-rc. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] infiniband: hw: qib: Change return type to vm_fault_t
On Tue, 2018-04-17 at 20:04 +0530, Souptick Joarder wrote: > Use new return type vm_fault_t for fault handler. For > now, this is just documenting that the function returns > a VM_FAULT value rather than an errno. Once all instances > are converted, vm_fault_t will become a distinct type. > > Reference id -> 1c8f422059ae ("mm: change return type to > vm_fault_t") Thanks, applied to for-next. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH] infiniband: hw: hfi1: Change return type to vm_fault_t
On Tue, 2018-04-17 at 19:53 +0530, Souptick Joarder wrote: > Use new return type vm_fault_t for fault handler. For > now, this is just documenting that the function returns > a VM_FAULT value rather than an errno. Once all instances > are converted, vm_fault_t will become a distinct type. > > Reference id -> 1c8f422059ae ("mm: change return type to > vm_fault_t") > > Signed-off-by: Souptick Joarder Thanks, applied to for-next. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH 0/6] IB: make INFINIBAND_ADDR_TRANS configurable
On Thu, 2018-04-26 at 11:19 -0700, Greg Thelen wrote: > This series allows for CONFIG_INFINIBAND without > CONFIG_INFINIBAND_ADDR_TRANS (aka RDMA communication manager). > Fuzzing has been finding fair number of CM bugs. > So provide an option to disable it in systems which don't need it. > > Changes since last posting (https://lkml.org/lkml/2018/4/25/1266): > - added ("ib_srp: depend on INFINIBAND_ADDR_TRANS") patch [ snip explanation of series ] Thanks for taking the time to research this out. Series applied. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH for-rc] uapi: Fix SPDX tags for files referring to the 'OpenIB.org' license
On Sun, 2018-04-22 at 21:08 -0400, David Miller wrote: > From: Jason Gunthorpe > Date: Fri, 20 Apr 2018 09:49:10 -0600 > > > Based on discussion with Kate Stewart this license is not a > > BSD-2-Clause, but is now formally identified as Linux-OpenIB > > by SPDX. > > > > The key difference between the licenses is in the 'warranty' > > paragraph. > > > > if_infiniband.h refers to the 'OpenIB.org' license, but > > does not include the text, instead it links to an obsolete > > web site that contains a license that matches the BSD-2-Clause > > SPX. There is no 'three clause' version of the OpenIB.org > > license. > > > > Signed-off-by: Jason Gunthorpe > > Acked-by: David S. Miller > Thanks, applied. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD signature.asc Description: This is a digitally signed message part