from:"Doug Ledford"

Re: [PATCH v2 18/29] scsi: aic7xxx: aic7xxx_osm: Remove unused variable 'tinfo'

2020-07-14 Thread Doug Ledford

On Mon, 2020-07-13 at 08:46 +0100, Lee Jones wrote:
> Looks like none of the artifact from  ahc_fetch_transinfo() are used
> anymore.
> 
> Fixes the following W=1 kernel build warning(s):
> 
>  drivers/scsi/aic7xxx/aic7xxx_osm.c: In function
> ‘ahc_linux_target_alloc’:
>  drivers/scsi/aic7xxx/aic7xxx_osm.c:567:30: warning: variable ‘tinfo’
> set but not used [-Wunused-but-set-variable]
>  567 | struct ahc_initiator_tinfo *tinfo;
>  | ^
> 
> Cc: Hannes Reinecke 
> Cc: "Daniel M. Eischen" 
> Cc: Doug Ledford 

FWIW, I can't seem to figure out how you got mine or Dan's email
addresses as related to this driver.  The MAINTAINERS file only lists
Hannes.  The driver Dan and I worked on was a different driver.  It was
named aic7xxx, but that was back in the 1990s.  It was renamed to
aic7xxx_old so that Adaptec could contribute this driver you are
currently patching back around 2001 or so.  And then maybe around 2010
or something like that, the aic7xxx_old driver that Dan and I worked on
was removed from the upstream source tree entirely.  So, just out of
curiosity, how did you get mine and Dan's email addresses to put on the
Cc: list for these patches?

> Signed-off-by: Lee Jones 
> ---
>  drivers/scsi/aic7xxx/aic7xxx_osm.c | 5 -
>  1 file changed, 5 deletions(-)
> 
> diff --git a/drivers/scsi/aic7xxx/aic7xxx_osm.c
> b/drivers/scsi/aic7xxx/aic7xxx_osm.c
> index 2edfa0594f183..32bfe20d79cc1 100644
> --- a/drivers/scsi/aic7xxx/aic7xxx_osm.c
> +++ b/drivers/scsi/aic7xxx/aic7xxx_osm.c
> @@ -564,8 +564,6 @@ ahc_linux_target_alloc(struct scsi_target
> *starget)
>   struct scsi_target **ahc_targp =
> ahc_linux_target_in_softc(starget);
>   unsigned short scsirate;
>   struct ahc_devinfo devinfo;
> - struct ahc_initiator_tinfo *tinfo;
> - struct ahc_tmode_tstate *tstate;
>   char channel = starget->channel + 'A';
>   unsigned int our_id = ahc->our_id;
>   unsigned int target_offset;
> @@ -612,9 +610,6 @@ ahc_linux_target_alloc(struct scsi_target
> *starget)
>   spi_max_offset(starget) = 0;
>   spi_min_period(starget) = 
>   ahc_find_period(ahc, scsirate, maxsync);
> -
> - tinfo = ahc_fetch_transinfo(ahc, channel, ahc->our_id,
> -     starget->id, &tstate);
>   }
>   ahc_compile_devinfo(&devinfo, our_id, starget->id,
>   CAM_LUN_WILDCARD, channel,

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH] RDMA/hns: Fix build error again

2019-10-22 Thread Doug Ledford

On Mon, 2019-10-21 at 23:51 +0200, Arnd Bergmann wrote:
> On Mon, Oct 21, 2019 at 11:09 PM Doug Ledford 
> wrote:
> > This fix looks reasonable, but since I can't test this at all, and
> > I'm
> > personally tired of trying and failing to fix this issue, I need to
> > ask
> > if you've tried all the permutations for this just to confirm it
> > works
> > in all valid cases?
> 
> I'm fairly sure I would have found them all by now: Since I sent this
> patch I built 4680 randconfig kernels, 293 of which had some HNS
> driver enabled.
> 
> I also like to think that I spent more time to think it through in
> theory.

I reviewed it pretty closely, and I couldn't find any way in which I
thought it would fail, I'm just being picky because I want this to be
the last fix ;-)

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH] RDMA/hns: Fix build error again

2019-10-21 Thread Doug Ledford

On Mon, 2019-10-07 at 23:18 +0200, Arnd Bergmann wrote:
> This is not the first attempt to fix building random configurations,
> unfortunately the attempt in commit a07fc0bb483e ("RDMA/hns: Fix build
> error") caused a new problem when CONFIG_INFINIBAND_HNS_HIP06=m
> and CONFIG_INFINIBAND_HNS_HIP08=y:
> 
> drivers/infiniband/hw/hns/hns_roce_main.o:(.rodata+0xe60): undefined
> reference to `__this_module'
> 
> Revert commits a07fc0bb483e ("RDMA/hns: Fix build error") and
> a3e2d4c7e766 ("RDMA/hns: remove obsolete Kconfig comment") to get
> back to the previous state, then fix the issues described there
> differently, by adding more specific dependencies: INFINIBAND_HNS
> can now only be built-in if at least one of HNS or HNS3 are
> built-in, and the individual back-ends are only available if
> that code is reachable from the main driver.
> 
> Fixes: a07fc0bb483e ("RDMA/hns: Fix build error")
> Fixes: a3e2d4c7e766 ("RDMA/hns: remove obsolete Kconfig comment")
> Fixes: dd74282df573 ("RDMA/hns: Initialize the PCI device for hip08
> RoCE")
> Fixes: 08805fdbeb2d ("RDMA/hns: Split hw v1 driver from hns roce
> driver")
> Signed-off-by: Arnd Bergmann 

This fix looks reasonable, but since I can't test this at all, and I'm
personally tired of trying and failing to fix this issue, I need to ask
if you've tried all the permutations for this just to confirm it works
in all valid cases?

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: linux-next: Fixes tag needs some work in the rdma-fixes tree

2019-10-21 Thread Doug Ledford

On Mon, 2019-10-21 at 18:41 +1100, Stephen Rothwell wrote:
> Hi all,
> 
> In commit
> 
>   612e0486ad08 ("iw_cxgb4: fix ECN check on the passive accept")
> 
> Fixes tag
> 
>   Fixes: 92e7ae7172 ("iw_cxgb4: Choose appropriate hw mtu index and
> ISS for iWARP connections")
> 
> has these problem(s):
> 
>   - SHA1 should be at least 12 digits long
> Can be fixed by setting core.abbrev to 12 (or more) or (for git
> v2.11
> or later) just making sure it is not set (or set to "auto").

I'll leave it to Potnuri to fix his stuff.  As for the rdma tree, the 10
digit hash is still unique as of today, so I won't rebase the official
branch to fix this.  However, I'll see about adding a check for this in
my workflow.  Thanks.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [GIT PULL] Thermal management updates for v5.4-rc1

2019-09-27 Thread Doug Ledford

On Fri, 2019-09-27 at 21:52 +0200, Richard Weinberger wrote:
> On Fri, Sep 27, 2019 at 9:30 PM Doug Ledford 
> wrote:
> > Because there are literally thousands of developers working on
> > kernel
> > bits here and there, and you're swatting this particular fly one
> > developer at a time.
> 
> I strongly disagree. One of the golden rules of kernel development is,
> read what Linus writes. Especially during the merge window.
> 
> Thanks,
> //richard

Developers come and go.  Your argument is temporally flawed.  A
developer might start working on the tree, read everything during a
merge window, and not catch one of Linus' rebase rants prior to
committing a rebase felony of their own.

Besides, I don't think this rule of yours is all that universal.  If
Linus is off on a thread about arm64 device tree brokenness and someone
does a rebase and he rants about it, I'm very likely to miss that rant. 
I read what I reasonably deem to be relevant to me and my work, or
sometimes additional stuff that just jumps out at me.  But I never
learned to speed read so I don't even try to read it all and wouldn't
agree to a rule that says I have to.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

Re: [GIT PULL] Thermal management updates for v5.4-rc1

2019-09-27 Thread Doug Ledford

On Fri, 2019-09-27 at 12:41 -0700, Linus Torvalds wrote:
> On Fri, Sep 27, 2019 at 12:29 PM Doug Ledford 
> wrote:
> > Because there are literally thousands of developers working on
> > kernel
> > bits here and there, and you're swatting this particular fly one
> > developer at a time.
> 
> Well, at least these days it's also very clearly spelled out in the
> Documentation directory.

Yeah, but let's be fair.  How many people read the documentation fully? 
Or even if some new maintainer read it, did it really sink in?  Or will
they rebase a year later not thinking about it?

> And the "don't rebase" does get posted on the mailing lists each time,
> and I've mentioned it over the years in my release notes too.

Lots of people here and there miss those things.  You never know who
caught what.

> Besides, I actually only work with about a hundred top-level
> maintainers, not thousands. Yes, we have thousands of developers, but
> doing the stats over the 5.0 releases, there have been "only" 131
> people sending me pull requests. Sure, more than a couple, but at the
> same time it's not like this is a "every developer" kind of thing,
> this is literally subsystem maintainers. We've got a fair number of
> them, but it's definitely not about thousands.
> 
> I feel like I've sent that email out way more than a hundred times
> over the last 15+ years.

I'm sure you probably have.  I think I got it two or three times before
I got all the nuances of which rebases were OK and which weren't :-).

> .. and I don't think having git warn is right, since rebasing is
> perfectly fine as you are doing development.
> 
> It's really just that maintainers shouldn't do it for bad reasons and
> at bad times.
> 
> And "there was a conflict" and "yesterday" is really one of the
> absolute worst reasons/times around.

I think you send it out a lot because it doesn't get driven home in
people's heads until they get yelled at.  And there's more to the
badness of a rebase than just annoying you when you want to see the
conflicts to judge things for yourself.  There are legitimate issues
such as a rebase wipes out history.  Or if you rebase a public tree it
blows up everyone that is tracking that branch.  These things apply even
to when you are doing your own development.  Or rebases might wipe out
merge commits from other trees, depending on options.

A git warning solely for you might not truly be appropriate, but I'm not
at all convinced that a git warning with a more general "Here are the
pros and cons of rebases, with a list of dos and donts if you are going
to do a rebase" document is a bad idea, and might actually help you out
too.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

Re: [GIT PULL] Thermal management updates for v5.4-rc1

2019-09-27 Thread Doug Ledford

On Fri, 2019-09-27 at 11:34 -0700, Linus Torvalds wrote:
> On Fri, Sep 27, 2019 at 6:08 AM Zhang Rui  wrote:
> > One thing to mention is that, all the patches have been tested in
> > linux-next for weeks, but there is a conflict detected, because
> > upstream has took commit eaf7b46083a7e34 ("docs: thermal: add it to
> > the
> > driver API") from jc-docs tree while I'm keeping a wrong version of
> > the
> > patch, so I just rebased my tree to fix this.
> 
> Why do I have to say this EVERY single release?

Because there are literally thousands of developers working on kernel
bits here and there, and you're swatting this particular fly one
developer at a time.

I might suggest that you need to speak with the git people and politely
ask them to add a warning to the rebase command itself so that it prints
out something like:

If you are doing linux kernel development, and you are doing a rebase,
please read Documentation/When_Not_To_Rebase.rst before rebasing your
code and sending it to Linus.  You've been warned.

Acknowledge receipt of warning and proceed with rebase? (y/N)

You would have free reign to put one of your more monumental yet funny
rants in place in the documentation.

You could also have a global git config to turn off the "Don't annoy
Linus with rebases" warning.  But only mention that global config at the
end of the kernel documentation so you know people have read it before
they turn the warning off.

Maybe that would help.

> A conflict is not a reason to rebase. Conflicts happen. They happen a
> lot. I deal with them, and it's usually trivial.
> 
> If you feel it's not trivial, just describe what the resolution is,
> rather than rebasing. Really.
> 
> Rebasing for a random conflict (particularly in documentation, for
> chrissake!) is like using an atomic bomb to swat a fly.  You have all
> those downsides, and there are basically _no_ upsides. It only makes
> for more work for me because I have to re-write this email for the
> millionth time, and that takes longer and is more aggravating than the
> conflict would have taken to just sort out.
> 
>Linus
-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

[PULL REQUEST] Please pull rdma.git

2019-08-30 Thread Doug Ledford

Hi Linus,

*Much* calmer week this week.  Just one -rc patch queued up.  The way
the siw driver was locking around the traversal of the list of ipv6
addresses on a device was causing a scheduling while atomic issue. 
Bernard straightened it out by using the rtnl_lock.

Here's the boiler plate:

The following changes since commit c536277e0db1ad2e9fbb9dfd940c3565a14d9c52:

  RDMA/siw: Fix 64/32bit pointer inconsistency (2019-08-23 12:08:27 -0400)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git tags/for-linus

for you to fetch changes up to 531a64e4c35bb9844b0cf813a6c9a87e00be05ff:

  RDMA/siw: Fix IPv6 addr_list locking (2019-08-28 10:29:19 -0400)


Pull request for 5.3-rc6

- Fix locking on list traversal (siw)

Signed-off-by: Doug Ledford 


Bernard Metzler (1):
  RDMA/siw: Fix IPv6 addr_list locking

 drivers/infiniband/sw/siw/siw_cm.c | 31 ---
 1 file changed, 20 insertions(+), 11 deletions(-)

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PULL REQUEST] Please pull rdma.git

2019-08-23 Thread Doug Ledford

Hi Linus,

I didn't notice I was on my personal email identity when I sent the pull
request.  Sorry about that.  It's really me ;-)

On Fri, 2019-08-23 at 14:48 -0400, Doug Ledford wrote:
> Hi Linus,
> 
> No beating around the bush: this is a monster pull request for an -rc5
> kernel.  Intel hit me with a series of fixes for TID processing. 
> Mellanox hit me with a series for their UMR memory support.
> 
> And we had one fix for siw that fixes the 32bit build warnings and
> because of the number of casts that had to be changed to properly
> silence the warnings, that one patch alone is a full 40% of the LOC of
> this entire pull request.  Given that this is the initial release
> kernel
> for siw, I'm trying to fix anything in it that we can, so that adds to
> the impetus to take fixes for it like this one.
> 
> I had to do a rebase early in the week.  Jason had thought he put a
> patch on the rc queue that he needed to be there so he could base some
> work off of it, and it had actually not been placed there.  So he
> asked
> me (on Tuesday) to fix that up before pushing my wip branch to the
> official rc branch.  I did, and that's why the early patches look like
> they were all committed at the same time on Tuesday.  That bunch had
> been in my queue prior.
> 
> The various patches all pass my test for being legitimate fixes and
> not
> attempts to slide new features or development into a late rc.  Well,
> they were all fixes with the exception of a couple clean up patches
> people wrote for making the fixes they also wrote better (like a
> cleanup
> patch to move UMR checking into a function so that the remaining UMR
> fix
> patches can reference that function), so I left those in place too.
> 
> My apologies for the LOC count and the number of patches here, it's
> just
> how the cards fell this cycle.  I hope you agree with me that they're
> justified fixes.
> 
> Here's the boilerplate:
> 
> The following changes since commit
> d1abaeb3be7b5fa6d7a1fbbd2e14e3310005c4c1:
> 
>   Linux 5.3-rc5 (2019-08-18 14:31:08 -0700)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git
> tags/for-linus
> 
> for you to fetch changes up to
> c536277e0db1ad2e9fbb9dfd940c3565a14d9c52:
> 
>   RDMA/siw: Fix 64/32bit pointer inconsistency (2019-08-23 12:08:27
> -0400)
> 
> 
> Pull request for 5.3-rc5
> 
> - Fix siw buffer mapping issue
> - Fix siw 32/64 casting issues
> - Fix a KASAN access issue in bnxt_re
> - Fix several memory leaks (hfi1, mlx4)
> - Fix a NULL deref in cma_cleanup
> - Fixes for UMR memory support in mlx5 (4 patch series)
> - Fix namespace check for restrack
> - Fixes for counter support
> - Fixes for hfi1 TID processing (5 patch series)
> - Fix potential NULL deref in siw
> - Fix memory page calculations in mlx5
> 
> Signed-off-by: Doug Ledford 
> 
> 
> Bernard Metzler (3):
>   RDMA/siw: Fix potential NULL de-ref
>   RDMA/siw: Fix SGL mapping issues
>   RDMA/siw: Fix 64/32bit pointer inconsistency
> 
> Ido Kalir (1):
>   IB/core: Fix NULL pointer dereference when bind QP to counter
> 
> Jason Gunthorpe (1):
>   RDMA/mlx5: Fix MR npages calculation for IB_ACCESS_HUGETLB
> 
> Kaike Wan (5):
>   IB/hfi1: Drop stale TID RDMA packets
>   IB/hfi1: Unsafe PSN checking for TID RDMA READ Resp packet
>   IB/hfi1: Add additional checks when handling TID RDMA READ RESP
> packet
>   IB/hfi1: Add additional checks when handling TID RDMA WRITE DATA
> packet
>   IB/hfi1: Drop stale TID RDMA packets that cause TIDErr
> 
> Leon Romanovsky (2):
>   RDMA/counters: Properly implement PID checks
>   RDMA/restrack: Rewrite PID namespace check to be reliable
> 
> Moni Shoua (4):
>   IB/mlx5: Consolidate use_umr checks into single function
>   IB/mlx5: Report and handle ODP support properly
>   IB/mlx5: Fix MR re-registration flow to use UMR properly
>   IB/mlx5: Block MR WR if UMR is not possible
> 
> Selvin Xavier (1):
>   RDMA/bnxt_re: Fix stack-out-of-bounds in
> bnxt_qplib_rcfw_send_message
> 
> Wenwen Wang (3):
>   IB/mlx4: Fix memory leaks
>   infiniband: hfi1: fix a memory leak bug
>   infiniband: hfi1: fix memory leaks
> 
> zhengbin (1):
>   RDMA/cma: fix null-ptr-deref Read in cma_cleanup
> 
>  drivers/infiniband/core/cma.c  |  6 ++-
>  drivers/infiniband/core/counters.c | 10 ++--
>  drivers/infiniband/core/nldev.c|  3 +-
>

Re: [PATCH] infiniband: hfi1: fix memory leaks

2019-08-20 Thread Doug Ledford

On Sun, 2019-08-18 at 13:54 -0500, Wenwen Wang wrote:
> In fault_opcodes_write(), 'data' is allocated through kcalloc().
> However,
> it is not deallocated in the following execution if an error occurs,
> leading to memory leaks. To fix this issue, introduce the 'free_data'
> label
> to free 'data' before returning the error.
> 
> Signed-off-by: Wenwen Wang 

Applied to for-rc, thanks.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH] infiniband: hfi1: fix a memory leak bug

2019-08-20 Thread Doug Ledford

On Sun, 2019-08-18 at 14:29 -0500, Wenwen Wang wrote:
> In fault_opcodes_read(), 'data' is not deallocated if
> debugfs_file_get()
> fails, leading to a memory leak. To fix this bug, introduce the
> 'free_data'
> label to free 'data' before returning the error.
> 
> Signed-off-by: Wenwen Wang 

Applied to for-rc, thanks.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH] IB/mlx4: Fix memory leaks

2019-08-20 Thread Doug Ledford

On Sun, 2019-08-18 at 15:23 -0500, Wenwen Wang wrote:
> In mlx4_ib_alloc_pv_bufs(), 'tun_qp->tx_ring' is allocated through
> kcalloc(). However, it is not always deallocated in the following
> execution
> if an error occurs, leading to memory leaks. To fix this issue, free
> 'tun_qp->tx_ring' whenever an error occurs.
> 
> Signed-off-by: Wenwen Wang 
> ---

Thanks, applied to for-rc.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH] RDMA/hns: Fix some white space check_mtu_validate()

2019-08-20 Thread Doug Ledford

On Fri, 2019-08-16 at 14:39 +0300, Dan Carpenter wrote:
> This line was indented a bit too far.
> 
> Signed-off-by: Dan Carpenter 

Thanks, applied to for-next.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

[PULL REQUEST] Please pull rdma.git

2019-08-14 Thread Doug Ledford

Hi Linus,

Fairly small pull request for -rc3.  I'm out of town the rest of this
week, so I made sure to clean out as much as possible from patchworks in
enough time for 0-day to chew through it (Yay! for 0-day being back
online! :-)).  Jason might send through any emergency stuff that could
pop up, otherwise I'm back next week.

The only real thing of note is the siw ABI change.  Since we just merged
siw *this* release, there are no prior kernel releases to maintain
kernel ABI with.  I told Bernard that if there is anything else about
the siw ABI he thinks he might want to change before it goes set in
stone, he should get it in ASAP.  The siw module was around for several
years outside the kernel tree, and it had to be revamped considerably
for inclusion upstream, so we are making no attempts to be backward
compatible with the out of tree version.  Once 5.3 is actually released,
we will have our baseline ABI to maintain.

Here's the boiler plate:

The following changes since commit e21a712a9685488f5ce80495b37b9fdbe96c230d:

  Linux 5.3-rc3 (2019-08-04 18:40:12 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git tags/for-linus

for you to fetch changes up to 2c8ccb37b08fe364f02a9914daca474d43151453:

  RDMA/siw: Change CQ flags from 64->32 bits (2019-08-13 12:22:06 -0400)


Pull request for 5.3-rc3

- Fix a memory registration release flow issue that was causing a
  WARN_ON (mlx5)
- If the counters for a port aren't allocated, then we can't do
  operations on the non-existent counters (core)
- Check the right variable for error code result (mlx5)
- Fix a use after free issue (mlx5)
- Fix an off by one memory leak (siw)
- Actually return an error code on error (core)
- Allow siw to be built on 32bit arches (siw, ABI change, but OK since
  siw was just merged this merge window and there is no prior released
  kernel to maintain compatibility with and we also updated the
  rdma-core user space package to match)

Signed-off-by: Doug Ledford 


Bernard Metzler (1):
  RDMA/siw: Change CQ flags from 64->32 bits

Dan Carpenter (3):
  IB/mlx5: Check the correct variable in error handling code
  RDMA/siw: Fix a memory leak in siw_init_cpulist()
  RDMA/core: Fix error code in stat_get_doit_qp()

Mark Zhang (1):
  RDMA/counter: Prevent QP counter binding if counters unsupported

Yishai Hadas (2):
  IB/mlx5: Fix implicit MR release flow
  IB/mlx5: Fix use-after-free error while accessing ev_file pointer

 drivers/infiniband/core/counters.c|  6 ++
 drivers/infiniband/core/nldev.c   |  8 ++--
 drivers/infiniband/core/umem_odp.c|  4 
 drivers/infiniband/hw/mlx5/devx.c | 11 ++-
 drivers/infiniband/hw/mlx5/odp.c  | 24 +---
 drivers/infiniband/sw/siw/Kconfig |  2 +-
 drivers/infiniband/sw/siw/siw.h   |  2 +-
 drivers/infiniband/sw/siw/siw_main.c  |  4 +---
 drivers/infiniband/sw/siw/siw_qp.c| 14 ++
 drivers/infiniband/sw/siw/siw_verbs.c | 16 +++-
 include/uapi/rdma/siw-abi.h   |  3 ++-
 11 files changed, 53 insertions(+), 41 deletions(-)

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH] RDMA/hns: remove obsolete Kconfig comment

2019-08-07 Thread Doug Ledford

On Wed, 2019-08-07 at 11:22 +0800, YueHaibing wrote:
> Since commit a07fc0bb483e ("RDMA/hns: Fix build error")
> these kconfig comment is obsolete, so just remove it.
> 
> Signed-off-by: YueHaibing 

Thanks, applied to for-next.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH V2] mlx5: Fix formats with line continuation whitespace

2019-08-02 Thread Doug Ledford

On Fri, 2019-08-02 at 18:32 +, Saeed Mahameed wrote:
> On Fri, 2019-08-02 at 11:09 -0700, Joe Perches wrote:
> > On Tue, 2018-11-06 at 16:34 -0500, Doug Ledford wrote:
> > > On Thu, 2018-11-01 at 09:34 +0200, Leon Romanovsky wrote:
> > > > On Thu, Nov 01, 2018 at 12:24:08AM -0700, Joe Perches wrote:
> > > > > The line continuations unintentionally add whitespace so
> > > > > instead use coalesced formats to remove the whitespace.
> > > > > 
> > > > > Signed-off-by: Joe Perches 
> > > > > ---
> > > > > 
> > > > > v2: Remove excess space after %u
> > > > > 
> > > > >  drivers/net/ethernet/mellanox/mlx5/core/rl.c | 6 ++
> > > > >  1 file changed, 2 insertions(+), 4 deletions(-)
> > > > > 
> > > > 
> > > > Thanks,
> > > > Reviewed-by: Leon Romanovsky 
> > > 
> > > Applied, thanks.
> > 
> > Still not upstream.  How long does it take?
> > 
> 
> Doug, Leon, this patch still apply, let me know what happened here ?
> and if you want me to apply it to one of my branches.

I'm not entirely sure what happened here.  Obviously I said I had taken
it, which I don't do under my normal workflow until I've actually
applied and build tested the patch.  For it to not make it into the tree
means that I probably applied it to my wip/dl-for-next branch, but prior
to moving it to for-next, I might have had a rebase and it got lost in
the shuffle or something like that.  My apologies for letting it slip
through the cracks.  Anyway, I pulled the patch from patchworks, applied
it, and pushed it to k.o.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

[PULL REQUEST] Please pull rdma.git

2019-08-02 Thread Doug Ledford

Hi Linus,

Here's our second -rc pull request.  Nothing particularly special in
this one.  The client removal deadlock fix is kindy tricky, but we had
multiple eyes on it and no one could find a fault in it.  A couple
Spectre V1 fixes too.  Otherwise, all just normal -rc fodder.

Here's the boilerplate:

The following changes since commit b7165bd0d6cbb93732559be6ea8774653b204480:

  IB/mlx5: Fix RSS Toeplitz setup to be aligned with the HW specification 
(2019-07-25 11:45:48 -0300)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git tags/for-linus

for you to fetch changes up to 020fb3bebc224dfe9353a56ecbe2d5fac499dffc:

  RDMA/hns: Fix error return code in hns_roce_v1_rsv_lp_qp() (2019-08-01 
12:53:53 -0400)


Pull request for 5.3-rc2

- A couple Spectre V1 fixes (umad, hfi1)
- Fix a tricky deadlock in the rdma core code with refcounting instead
  of locks (client removal patches)
- Build errors (hns)
- Fix a scheduling while atomic issue (mlx5)
- Use after free fix (mad)
- Fix error path return code (hns)
- Null deref fix (siw_crypto_hash)
- A few other misc. minor fixes


Bernard Metzler (1):
  Do not dereference 'siw_crypto_shash' before checking

Gal Pressman (1):
  RDMA/restrack: Track driver QP types in resource tracker

Gustavo A. R. Silva (1):
  IB/hfi1: Fix Spectre v1 vulnerability

Guy Levi (1):
  IB/mlx5: Fix MR registration flow to use UMR properly

Jack Morgenstein (1):
  IB/mad: Fix use-after-free in ib mad completion handling

Jason Gunthorpe (2):
  RDMA/devices: Do not deadlock during client removal
  RDMA/devices: Remove the lock around remove_client_context

Leon Romanovsky (1):
  RDMA/mlx5: Release locks during notifier unregister

Michal Kalderon (1):
  RDMA/qedr: Fix the hca_type and hca_rev returned in device attributes

Tony Luck (1):
  IB/core: Add mitigation for Spectre V1

Wei Yongjun (1):
  RDMA/hns: Fix error return code in hns_roce_v1_rsv_lp_qp()

YueHaibing (1):
  RDMA/hns: Fix build error

 drivers/infiniband/core/core_priv.h|   5 +-
 drivers/infiniband/core/device.c   | 102 +++--
 drivers/infiniband/core/mad.c  |  20 +++---
 drivers/infiniband/core/user_mad.c |   6 +-
 drivers/infiniband/hw/hfi1/verbs.c |   2 +
 drivers/infiniband/hw/hns/Kconfig  |   6 +-
 drivers/infiniband/hw/hns/Makefile |   8 +--
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c |   4 +-
 drivers/infiniband/hw/mlx5/main.c  |   7 +-
 drivers/infiniband/hw/mlx5/mr.c|  27 +++-
 drivers/infiniband/hw/qedr/main.c  |  10 ++-
 drivers/infiniband/sw/siw/siw_qp.c |   6 +-
 include/rdma/ib_verbs.h|   4 +-
 13 files changed, 124 insertions(+), 83 deletions(-)

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH] IB/hfi1: Fix Spectre v1 vulnerability

2019-08-01 Thread Doug Ledford

On Wed, 2019-07-31 at 12:54 -0500, Gustavo A. R. Silva wrote:
> sl is controlled by user-space, hence leading to a potential
> exploitation of the Spectre variant 1 vulnerability.
> 
> Fix this by sanitizing sl before using it to index ibp->sl_to_sc.
> 
> Notice that given that speculation windows are large, the policy is
> to kill the speculation on the first load and not worry if it can be
> completed with a dependent load/store [1].
> 
> [1] 
> https://lore.kernel.org/lkml/20180423164740.gy17...@dhcp22.suse.cz/
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Gustavo A. R. Silva 
> ---

Thanks, applied to for-rc.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH -next] RDMA/hns: remove set but not used variable 'irq_num'

2019-08-01 Thread Doug Ledford

On Thu, 2019-08-01 at 13:10 +0300, Leon Romanovsky wrote:
> On Wed, Jul 31, 2019 at 03:37:48PM +0800, YueHaibing wrote:
> > Fixes gcc '-Wunused-but-set-variable' warning:
> > 
> > drivers/infiniband/hw/hns/hns_roce_hw_v2.c: In function
> > hns_roce_v2_cleanup_eq_table:
> > drivers/infiniband/hw/hns/hns_roce_hw_v2.c:5920:6:
> >  warning: variable irq_num set but not used [-Wunused-but-set-
> > variable]
> > 
> > It is not used since
> > commit 33db6f94847c ("RDMA/hns: Refactor eq table init for hip08")
> > 
> > Reported-by: Hulk Robot 
> > Signed-off-by: YueHaibing 
> > ---
> >  drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 2 --
> >  1 file changed, 2 deletions(-)
> > 
> 
> I'm hitting this error too.
> 
> Thanks,
> Reviewed-by: Leon Romanovsky 

Applied to for-next, thanks.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH V2] IB/core: Add mitigation for Spectre V1

2019-07-31 Thread Doug Ledford

On Wed, 2019-07-31 at 14:52 -0400, Doug Ledford wrote:
> On Wed, 2019-07-31 at 12:52 -0500, Gustavo A. R. Silva wrote:
> > This is insufficient. The speculation windows are large:
> > 
> > "Speculative  execution  on  modern  CPUs  can  run  several
> > hundred  instructions  ahead." [1]
> > 
> > [1] https://spectreattack.com/spectre.pdf
> 
> Thanks, I'll take a look at that.  That issue aside, returning without
> wasting time on two mutexes is still better IMO, so I like my patch
> more
> than the proposed one.  Tony, would you like to resubmit?
> 

Never mind, I took your V2 and fixed it up like I wanted.  Patch
applied, thanks.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH V2] IB/core: Add mitigation for Spectre V1

2019-07-31 Thread Doug Ledford

On Wed, 2019-07-31 at 12:52 -0500, Gustavo A. R. Silva wrote:
> This is insufficient. The speculation windows are large:
> 
> "Speculative  execution  on  modern  CPUs  can  run  several
> hundred  instructions  ahead." [1]
> 
> [1] https://spectreattack.com/spectre.pdf

Thanks, I'll take a look at that.  That issue aside, returning without
wasting time on two mutexes is still better IMO, so I like my patch more
than the proposed one.  Tony, would you like to resubmit?

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

Re: [PATCH][next] RDMA/core: fix spelling mistake "Nelink" -> "Netlink"

2019-07-31 Thread Doug Ledford

On Wed, 2019-07-31 at 11:28 +0300, Leon Romanovsky wrote:
> On Wed, Jul 31, 2019 at 09:01:44AM +0100, Colin King wrote:
> > From: Colin Ian King 
> > 
> > There is a spelling mistake in a warning message, fix it.
> > 
> > Signed-off-by: Colin Ian King 
> > ---
> >  drivers/infiniband/core/netlink.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> 
> Thanks,
> Reviewed-by: Leon Romanovsky 

Thanks, applied to for-next.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH v6 20/57] infiniband: Remove dev_err() usage after platform_get_irq()

2019-07-31 Thread Doug Ledford

On Tue, 2019-07-30 at 11:15 -0700, Stephen Boyd wrote:
> We don't need dev_err() messages when platform_get_irq() fails now
> that
> platform_get_irq() prints an error message itself when something goes
> wrong. Let's remove these prints with a simple semantic patch.
> 
> // 
> @@
> expression ret;
> struct platform_device *E;
> @@
> 
> ret =
> (
> platform_get_irq(E, ...)
> platform_get_irq_byname(E, ...)
> );
> 
> if ( \( ret < 0 \| ret <= 0 \) )
> {
> (
> -if (ret != -EPROBE_DEFER)
> -{ ...
> -dev_err(...);
> -... }
> ...
> -dev_err(...);
> )
> ...
> }
> // 
> 
> While we're here, remove braces on if statements that only have one
> statement (manually).
> 
> Cc: Doug Ledford 
> Cc: Jason Gunthorpe 
> Cc: linux-r...@vger.kernel.org
> Cc: Greg Kroah-Hartman 
> Signed-off-by: Stephen Boyd 
> ---
> 
> Please apply directly to subsystem trees
> 

Thanks for being clear about where you wanted these applied.  This patch
applied to rdma for-next, thanks.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH V2] IB/core: Add mitigation for Spectre V1

2019-07-31 Thread Doug Ledford

On Tue, 2019-07-30 at 21:39 -0700, Luck, Tony wrote:
> Some processors may mispredict an array bounds check and
> speculatively access memory that they should not. With
> a user supplied array index we like to play things safe
> by masking the value with the array size before it is
> used as an index.
> 
> Signed-off-by: Tony Luck 
> 
> ---
> V2: Mask the index *AFTER* the bounds check. Issue pointed
> out by Gustavo. Fix suggested by Ira.
> 
>  drivers/infiniband/core/user_mad.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/core/user_mad.c
> b/drivers/infiniband/core/user_mad.c
> index 9f8a48016b41..32cea5ed9ce1 100644
> --- a/drivers/infiniband/core/user_mad.c
> +++ b/drivers/infiniband/core/user_mad.c
> @@ -49,6 +49,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  
> @@ -888,7 +889,12 @@ static int ib_umad_unreg_agent(struct
> ib_umad_file *file, u32 __user *arg)
>   mutex_lock(&file->port->file_mutex);
>   mutex_lock(&file->mutex);
>  
> - if (id >= IB_UMAD_MAX_AGENTS || !__get_agent(file, id)) {
> + if (id >= IB_UMAD_MAX_AGENTS) {
> + ret = -EINVAL;
> + goto out;
> + }
> + id = array_index_nospec(id, IB_UMAD_MAX_AGENTS);
> + if (!__get_agent(file, id)) {
>   ret = -EINVAL;
>   goto out;
>   }

I'm not sure this is the best fix for this.  However, here is where I
get to admit that I largely ignored the whole Spectre V1 thing, so I'm
not sure I completely understand the vulnerability and the limits to
that.  But, looking at the function, it seems we can do an early return
without ever taking any of the mutexes in the function in the case of id
>= IB_UMAD_MAX_AGENTS, so if we did that, would that separate the
checking of id far enough from the usage of it as an array index that we
wouldn't need the clamp to prevent speculative prefetch?  About how far,
in code terms, does this speculative prefetch occur?

This is the patch I was thinking of:

diff --git a/drivers/infiniband/core/user_mad.c 
b/drivers/infiniband/core/user_mad.c
index 9f8a48016b41..1e507dd257ff 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -49,6 +49,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -884,11 +885,18 @@ static int ib_umad_unreg_agent(struct ib_umad_file *file, 
u32 __user *arg)
 
if (get_user(id, arg))
return -EFAULT;
+   if (id >= IB_UMAD_MAX_AGENTS)
+   return -EINVAL;
 
mutex_lock(&file->port->file_mutex);
mutex_lock(&file->mutex);
 
-   if (id >= IB_UMAD_MAX_AGENTS || !__get_agent(file, id)) {
+   /*
+* Is our check of id far enough away, code wise, to prevent
+* speculative prefetch?
+*/
+   id = array_index_nospec(id, IB_UMAD_MAX_AGENTS);
+   if (!__get_agent(file, id)) {
ret = -EINVAL;
goto out;
}

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH] rdma: siw: remove unused variable

2019-07-29 Thread Doug Ledford

On Mon, 2019-07-29 at 16:03 -0300, Jason Gunthorpe wrote:
> On Mon, Jul 29, 2019 at 02:19:35PM -0400, Doug Ledford wrote:
> > On Fri, 2019-07-26 at 11:25 +0200, Anders Roxell wrote:
> > > The variable 'p' si no longer used and the compiler rightly
> > > complains
> > > that it should be removed.
> > > 
> > > ../drivers/infiniband/sw/siw/siw_mem.c: In function
> > > ‘siw_free_plist’:
> > > ../drivers/infiniband/sw/siw/siw_mem.c:66:16: warning: unused
> > > variable
> > >  ‘p’ [-Wunused-variable]
> > >   struct page **p = chunk->plist;
> > > ^
> > > 
> > > Rework to remove unused variable.
> > > 
> > > Fixes: 8288d030447f ("mm/gup: add make_dirty arg to
> > > put_user_pages_dirty_lock()")
> > 
> > This commit hash and the commit subject does not exist in Linus'
> > tree as
> > of today.  What tree is this being merged through, and is it slated
> > to
> > merge soon or is this a for-next item?
> 
> This is though -mm, maybe John knows what is what

Hmmm...if it's through -mm, doesn't that mean that we can't rely on the
hash because the next time Andrew's tree rebases (using quilt or
whatever it is he does) that the hash will change?  It doesn't really
matter too much...we can't take the fix anyway, it should probably be
squashed into the patch that it's fixing, and if you follow Bernard's
advice, you fix the problem by eliminating this function and changing
the sole call site to just call put_user_pages_dirty_lock() directly.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH] rdma: siw: remove unused variable

2019-07-29 Thread Doug Ledford

On Fri, 2019-07-26 at 11:25 +0200, Anders Roxell wrote:
> The variable 'p' si no longer used and the compiler rightly complains
> that it should be removed.
> 
> ../drivers/infiniband/sw/siw/siw_mem.c: In function ‘siw_free_plist’:
> ../drivers/infiniband/sw/siw/siw_mem.c:66:16: warning: unused variable
>  ‘p’ [-Wunused-variable]
>   struct page **p = chunk->plist;
> ^
> 
> Rework to remove unused variable.
> 
> Fixes: 8288d030447f ("mm/gup: add make_dirty arg to
> put_user_pages_dirty_lock()")

This commit hash and the commit subject does not exist in Linus' tree as
of today.  What tree is this being merged through, and is it slated to
merge soon or is this a for-next item?

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH] infiniband: hw: cxgb3: Fix a possible null-pointer dereference in connect_reply_upcall()

2019-07-29 Thread Doug Ledford

On Thu, 2019-07-25 at 20:15 +0800, Jia-Ju Bai wrote:
> In connect_reply_upcall(), there is an if statement on line 730 to
> check
> whether ep->com.cm_id is NULL:
> if (ep->com.cm_id)
> 
> When ep->com.cm_id is NULL, it is used on line 736:
> ep->com.cm_id->rem_ref(ep->com.cm_id);
> 
> Thus, a possible null-pointer dereference may occur.
> 
> To fix this bug, ep->com.cm_id is checked before being used.
> 
> This bug is found by a static analysis tool STCheck written by us.

I think this is one of those theoretical issues that is a non-issue in
practice.  The cxgb3 driver is older than dirt and only hangs around
because people out there are still using it in a few places.  We have no
reports of bugs in this function.  I looked through the code, but it
wasn't quickly obvious why this isn't an issue, but I suspect the real
answer is "we can never have a negative status and not have a cm_id" as
a result of the design of the code.  Verifying that with an audit is
more time than I want to spend though.  So, I'm going to drop this
patch.  If you can find a plausible path by which this is actually a
bug, then we can revisit it.  But I don't want to go around changing a
known working for years driver on the basis of "my new static code
checker found this thing" versus someone who reports an actual crash
(which is what this bug would cause if it exists).

> Signed-off-by: Jia-Ju Bai 
> ---
>  drivers/infiniband/hw/cxgb3/iwch_cm.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c
> b/drivers/infiniband/hw/cxgb3/iwch_cm.c
> index 0bca72cb4d9a..2b31c4726d3e 100644
> --- a/drivers/infiniband/hw/cxgb3/iwch_cm.c
> +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c
> @@ -733,7 +733,8 @@ static void connect_reply_upcall(struct iwch_ep
> *ep, int status)
>   ep->com.cm_id->event_handler(ep->com.cm_id, &event);
>   }
>   if (status < 0) {
> - ep->com.cm_id->rem_ref(ep->com.cm_id);
> +     if (ep->com.cm_id)
> + ep->com.cm_id->rem_ref(ep->com.cm_id);
>   ep->com.cm_id = NULL;
>   ep->com.qp = NULL;
>   }

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

Re: [PATCH] RDMA/hns: Fix build error

2019-07-29 Thread Doug Ledford

On Wed, 2019-07-24 at 08:32 -0300, Jason Gunthorpe wrote:
> On Wed, Jul 24, 2019 at 02:54:43PM +0800, YueHaibing wrote:
> > If INFINIBAND_HNS_HIP08 is selected and HNS3 is m,
> > but INFINIBAND_HNS is y, building fails:
> > 
> > drivers/infiniband/hw/hns/hns_roce_hw_v2.o: In function
> > `hns_roce_hw_v2_exit':
> > hns_roce_hw_v2.c:(.exit.text+0xd): undefined reference to
> > `hnae3_unregister_client'
> > drivers/infiniband/hw/hns/hns_roce_hw_v2.o: In function
> > `hns_roce_hw_v2_init':
> > hns_roce_hw_v2.c:(.init.text+0xd): undefined reference to
> > `hnae3_register_client'
> > 
> > Also if INFINIBAND_HNS_HIP06 is selected and HNS_DSAF
> > is m, but INFINIBAND_HNS is y, building fails:
> > 
> > drivers/infiniband/hw/hns/hns_roce_hw_v1.o: In function
> > `hns_roce_v1_reset':
> > hns_roce_hw_v1.c:(.text+0x39fa): undefined reference to
> > `hns_dsaf_roce_reset'
> > hns_roce_hw_v1.c:(.text+0x3a25): undefined reference to
> > `hns_dsaf_roce_reset'
> > 
> > Reported-by: Hulk Robot 
> > Fixes: dd74282df573 ("RDMA/hns: Initialize the PCI device for hip08
> > RoCE")
> > Fixes: 08805fdbeb2d ("RDMA/hns: Split hw v1 driver from hns roce
> > driver")
> > Signed-off-by: YueHaibing 
> >  drivers/infiniband/hw/hns/Kconfig  | 6 +++---
> >  drivers/infiniband/hw/hns/Makefile | 8 ++--
> >  2 files changed, 5 insertions(+), 9 deletions(-)
> 
> did you test this approach with CONFIG_MODULES=n?

This version of the patch looks like the right fix.

Applying to for-rc, thanks.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

[PULL REQUEST] Please pull rdma.git

2019-06-21 Thread Doug Ledford

Hi Linus,

This is probably our last -rc pull request.  We don't have anything else
outstanding at the moment anyway, and with the summer months on us and
people taking trips, I expect the next weeks leading up to the merge
window to be pretty calm and sedate.

This has two simple, no brainer fixes for the EFA driver.

Then it has ten not quite so simple fixes for the hfi1 driver.  The
problem with them is that they aren't simply one liner typo fixes. 
They're still fixes, but they're more complex issues like livelock under
heavy load where the answer was to change work queue usage and spinlock
usage to resolve the problem, or issues with orphaned requests during
certain types of failures like link down which required some more
complex work to fix too.  They all look like legitimate fixes to me,
they just aren't small like I wish they were.

Here's the boilerplate:

The following changes since commit d1fdb6d8f6a4109a4263176c84b899076a5f8008:

  Linux 5.2-rc4 (2019-06-08 20:24:46 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git tags/for-linus

for you to fetch changes up to 7a5834e456f7fb3eca9b63af2a6bc7f460ae482f:

  RDMA/efa: Handle mmap insertions overflow (2019-06-18 16:27:24 -0400)


Pull request for 5.1-rc5

- 2 minor EFA fixes
- 10 hfi1 fixes related to scaling issues


Gal Pressman (2):
  RDMA/efa: Fix success return value in case of error
  RDMA/efa: Handle mmap insertions overflow

Kaike Wan (1):
  IB/hfi1: Validate fault injection opcode user input

Mike Marciniszyn (9):
  IB/hfi1: Close PSM sdma_progress sleep window
  IB/hfi1: Correct tid qp rcd to match verbs context
  IB/hfi1: Avoid hardlockup with flushlist_lock
  IB/hfi1: Silence txreq allocation warnings
  IB/hfi1: Create inline to get extended headers
  IB/hfi1: Use aborts to trigger RC throttling
  IB/hfi1: Wakeup QPs orphaned on wait list after flush
  IB/hfi1: Handle wakeup of orphaned QPs for pio
  IB/hfi1: Handle port down properly in pio

 drivers/infiniband/hw/efa/efa_com_cmd.c  | 24 +++
 drivers/infiniband/hw/efa/efa_verbs.c| 21 ++---
 drivers/infiniband/hw/hfi1/chip.c| 13 
 drivers/infiniband/hw/hfi1/chip.h|  1 +
 drivers/infiniband/hw/hfi1/fault.c   |  5 +++
 drivers/infiniband/hw/hfi1/hfi.h | 31 +++
 drivers/infiniband/hw/hfi1/pio.c | 21 +++--
 drivers/infiniband/hw/hfi1/rc.c  | 53 +++-
 drivers/infiniband/hw/hfi1/sdma.c| 26 
 drivers/infiniband/hw/hfi1/tid_rdma.c|  4 +--
 drivers/infiniband/hw/hfi1/ud.c  |  4 +--
 drivers/infiniband/hw/hfi1/user_sdma.c   | 12 +++-
 drivers/infiniband/hw/hfi1/user_sdma.h   |  1 -
 drivers/infiniband/hw/hfi1/verbs.c   | 14 +
 drivers/infiniband/hw/hfi1/verbs.h   |  1 +
 drivers/infiniband/hw/hfi1/verbs_txreq.c |  2 +-
 drivers/infiniband/hw/hfi1/verbs_txreq.h |  3 +-
 17 files changed, 174 insertions(+), 62 deletions(-)

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: linux-next: manual merge of the rdma tree with Linus' tree

2019-06-19 Thread Doug Ledford

On Thu, 2019-06-20 at 12:10 +1000, Stephen Rothwell wrote:
>   2d3c72ed5041 ("rdma: Remove nes")

Yeah, not much you can do about tree wide patchsets conflicting with a
removal ;-)

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: linux-next: manual merge of the rdma tree with Linus' tree

2019-06-19 Thread Doug Ledford

On Thu, 2019-06-20 at 12:06 +1000, Stephen Rothwell wrote:
> Hi all,
> 
> Today's linux-next merge of the rdma tree got a conflict in:
> 
>   include/rdma/ib_verbs.h
> 
> between commit:
> 
>   dc1435c00fcd ("RDMA/srp: Rename SRP sysfs name after IB device
> rename trigger")
> 
> from Linus' tree and commit:
> 
>   0e2d00eb6fd4 ("RDMA: Add NLDEV_GET_CHARDEV to allow char dev
> discovery and autoload")
> 
> from the rdma tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your
> tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any
> particularly
> complex conflicts.
> 

Yep, this one was expected.  Thanks.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH trivial] IB/hfi1: Spelling s/statisfied/satisfied/

2019-06-17 Thread Doug Ledford

On Mon, 2019-06-17 at 16:01 +0200, Geert Uytterhoeven wrote:
> Signed-off-by: Geert Uytterhoeven 
> ---
>  drivers/infiniband/hw/hfi1/tid_rdma.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/hw/hfi1/tid_rdma.c
> b/drivers/infiniband/hw/hfi1/tid_rdma.c
> index 6fb93032fbefcb7e..24f30ff6b5fbc868 100644
> --- a/drivers/infiniband/hw/hfi1/tid_rdma.c
> +++ b/drivers/infiniband/hw/hfi1/tid_rdma.c
> @@ -477,7 +477,7 @@ static struct rvt_qp *first_qp(struct
> hfi1_ctxtdata *rcd,
>   * Must hold the qp s_lock and the exp_lock.
>   *
>   * Return:
> - * false if either of the conditions below are statisfied:
> + * false if either of the conditions below are satisfied:
>   * 1. The list is empty or
>   * 2. The indicated qp is at the head of the list and the
>   *HFI1_S_WAIT_TID_SPACE bit is set in qp->s_flags.

Thanks, applied to for-next.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57
2FDD


signature.asc
Description: This is a digitally signed message part

Re: RDMA: Clean destroy CQ in drivers do not return errors

2019-06-14 Thread Doug Ledford

On Fri, 2019-06-14 at 14:59 +0100, Colin Ian King wrote:
> Hi,
> 
> Static analysis with Coverity reported an issue with the following
> commit:
> 
> commit a52c8e2469c30cf7ac453d624aed9c168b23d1af
> Author: Leon Romanovsky 
> Date:   Tue May 28 14:37:28 2019 +0300
> 
> RDMA: Clean destroy CQ in drivers do not return errors
> 
> In function bnxt_re_destroy_cq() contains the following:
> 
> if (!cq->umem)
> ib_umem_release(cq->umem);

Given that the original test that was replaced was:
if (!IS_ERR_OR_NULL(cq->umem))

we aren't really worried about a null cq, just that umem is valid.  So,
the logic is inverted on the test (or possibly we shouldn't have
replaced !IS_ERR_OR_NULL(cq->umem) at all).

But on closer inspection, the bnxt_re specific portion of this patch
appears to have another problem in that it no longer checks the result
of bnxt_qplib_destroy_cq() yet it does nothing to keep that function
from failing.

Leon, can you send a followup fix?

> Coverity detects this as a deference after null check on the null
> pointer cq->umem:
> 
> "var_deref_model: Passing null pointer cq->umem to ib_umem_release,
> which dereferences it"
> 
> Is the logic inverted on that null check?
> 
> Colin

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57
2FDD

signature.asc
Description: This is a digitally signed message part

Re: linux-next: manual merge of the rdma tree with Linus' tree

2019-06-14 Thread Doug Ledford

On Fri, 2019-06-14 at 13:00 +1000, Stephen Rothwell wrote:
> Hi all,
> 
> Today's linux-next merge of the rdma tree got conflicts in:
> 
>   drivers/infiniband/core/uverbs_cmd.c
>   drivers/infiniband/core/uverbs_std_types_cq.c
> 
> between commit:
> 
>   6876aaedc8a1 ("RDMA/uverbs: Pass udata on uverbs error unwind")
> 
> from Linus' tree and commit:
> 
>   e39afe3d6dbd ("RDMA: Convert CQ allocations to be under core
> responsibility")
> 
> from the rdma tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your
> tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any
> particularly
> complex conflicts.
> 

Thanks Stephen.  There will be at least one more coming too.  We've had
a number of -rc patches and -next patches that touch the same area of
code this cycle.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57
2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH v2] RDMA/cma: Make CM response timeout and # CM retries configurable

2019-06-13 Thread Doug Ledford

On Thu, 2019-06-13 at 18:58 +0200, Håkon Bugge wrote:
> > On 13 Jun 2019, at 16:25, Doug Ledford  wrote:
> > 
> > On Tue, 2019-02-26 at 08:57 +0100, Håkon Bugge wrote:
> > > During certain workloads, the default CM response timeout is too
> > > short, leading to excessive retries. Hence, make it configurable
> > > through sysctl. While at it, also make number of CM retries
> > > configurable.
> > > 
> > > The defaults are not changed.
> > > 
> > > Signed-off-by: Håkon Bugge 
> > > ---
> > > v1 -> v2:
> > >   * Added unregister_net_sysctl_table() in cma_cleanup()
> > > ---
> > > drivers/infiniband/core/cma.c | 52
> > > ++---
> > > --
> > > 1 file changed, 45 insertions(+), 7 deletions(-)
> > 
> > This has been sitting on patchworks since forever.  Presumably
> > because
> > Jason and I neither one felt like we really wanted it, but also
> > couldn't justify flat refusing it.
> 
> I thought the agreement was to use NL and iproute2. But I haven't had
> the capacity.

To be fair, the email thread was gone from my linux-rdma folder.  So, I
just had to review the entry in patchworks, and there was no captured
discussion there.  So, if the agreement was made, it must have been
face to face some time and if I was involed, I had certainly forgotten
by now.  But I still needed to clean up patchworks, hence my email ;-).

> >  Well, I've made up my mind, so
> > unless Jason wants to argue the other side, I'm rejecting this
> > patch. 
> > Here's why.  The whole concept of a timeout is to help recovery in
> > a
> > situation that overloads one end of the connection.  There is a
> > relationship between the max queue backlog on the one host and the
> > timeout on the other host.  
> 
> If you refer to the backlog parameter in rdma_listen(), I cannot see
> it being used at all for IB.

No, not exactly.  I was more referring to heavy load causing an
overflow in the mad packet receive processing.  We have
IB_MAD_QP_RECV_SIZE set to 512 by default, but it can be changed at
module load time of the ib_core module and that represents the maximum
number of backlogged mad packets we can have waiting to be processed
before we just drop them on the floor.  There can be other places to
drop them too, but this is the one I was referring to.

> For CX-3, which is paravirtualized wrt. MAD packets, it is the proxy
> UD receive queue length for the PF driver that can be construed as a
> backlog. Remember that any MAD packet being sent from a VF or the PF
> itself, is sent to a proxy UD QP in the PF. Those packets are then
> multiplexed out on the real QP0/1. Incoming MAD packets are
> demultiplexed and sent once more to the proxy QP in the VF.
> 
> > Generally, in order for a request to get
> > dropped and us to need to retransmit, the queue must already have a
> > full backlog.  So, how long does it take a heavily loaded system to
> > process a full backlog?  That, plus a fuzz for a margin of error,
> > should be our timeout.  We shouldn't be asking users to configure
> > it.
> 
> Customer configures #VMs and different workload may lead to way
> different number of CM connections. The proxying of MAD packet
> through the PF driver has a finite packet rate. With 64 VMs, 10.000
> QPs on each, all going down due to a switch failing or similar, you
> have 640.000 DREQs to be sent, and with the finite packet rate of MA
> packets through the PF, this takes more than the current CM timeout.
> And then you re-transmit and increase the burden of the PF proxying.
> 
> So, we can change the default to cope with this. But, a MAD packet is
> unreliable, we may have transient loss. In this case, we want a short
> timeout.
> 
> > However, if users change the default backlog queue on their
> > systems,
> > *then* it would make sense to have the users also change the
> > timeout
> > here, but I think guidance would be helpful.
> > 
> > So, to revive this patch, what I'd like to see is some attempt to
> > actually quantify a reasonable timeout for the default backlog
> > depth,
> > then the patch should actually change the default to that
> > reasonable
> > timeout, and then put in the ability to adjust the timeout with
> > some
> > sort of doc guidance on how to calculate a reasonable timeout based
> > on
> > configured backlog depth.
> 
> I can agree to this :-)
> 
> 
> Thxs, Håkon
> 
> > -- 
> > Doug Ledford 
> >GPG KeyID: B826A3330E572FDD
> >Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57
> > 2FDD

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57
2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH v2] RDMA/cma: Make CM response timeout and # CM retries configurable

2019-06-13 Thread Doug Ledford

On Thu, 2019-06-13 at 08:28 -0700, Bart Van Assche wrote:
> On 6/13/19 7:25 AM, Doug Ledford wrote:
> > So, to revive this patch, what I'd like to see is some attempt to
> > actually quantify a reasonable timeout for the default backlog
> > depth,
> > then the patch should actually change the default to that
> > reasonable
> > timeout, and then put in the ability to adjust the timeout with
> > some
> > sort of doc guidance on how to calculate a reasonable timeout based
> > on
> > configured backlog depth.
> 
> How about following the approach of the SRP initiator driver? It
> derives 
> the CM timeout from the subnet manager timeout. The assumption
> behind 
> this is that in large networks the subnet manager timeout has to be
> set 
> higher than its default to make communication work. See also 
> srp_get_subnet_timeout().

Theoretically, the subnet manager needs a longer timeout in a bigger
network because it's handling more data as a single point of lookup for
the entire subnet.  Individual machines, on the other hand, have the
same backlog size (by default) regardless of the size of the network,
and there is no guarantee that if the admin increased the subnet
manager timeout, that they also increased the backlog queue depth size.
So, while I like things that auto-tune like you are suggesting, the
problem is that the one item does not directly correlate with the
other.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57
2FDD

signature.asc
Description: This is a digitally signed message part

Re: [PATCH v2] RDMA/cma: Make CM response timeout and # CM retries configurable

2019-06-13 Thread Doug Ledford

On Tue, 2019-02-26 at 08:57 +0100, Håkon Bugge wrote:
> During certain workloads, the default CM response timeout is too
> short, leading to excessive retries. Hence, make it configurable
> through sysctl. While at it, also make number of CM retries
> configurable.
> 
> The defaults are not changed.
> 
> Signed-off-by: Håkon Bugge 
> ---
> v1 -> v2:
>* Added unregister_net_sysctl_table() in cma_cleanup()
> ---
>  drivers/infiniband/core/cma.c | 52 ++---
> --
>  1 file changed, 45 insertions(+), 7 deletions(-)

This has been sitting on patchworks since forever.  Presumably because
Jason and I neither one felt like we really wanted it, but also
couldn't justify flat refusing it.  Well, I've made up my mind, so
unless Jason wants to argue the other side, I'm rejecting this patch. 
Here's why.  The whole concept of a timeout is to help recovery in a
situation that overloads one end of the connection.  There is a
relationship between the max queue backlog on the one host and the
timeout on the other host.  Generally, in order for a request to get
dropped and us to need to retransmit, the queue must already have a
full backlog.  So, how long does it take a heavily loaded system to
process a full backlog?  That, plus a fuzz for a margin of error,
should be our timeout.  We shouldn't be asking users to configure it.

However, if users change the default backlog queue on their systems,
*then* it would make sense to have the users also change the timeout
here, but I think guidance would be helpful.

So, to revive this patch, what I'd like to see is some attempt to
actually quantify a reasonable timeout for the default backlog depth,
then the patch should actually change the default to that reasonable
timeout, and then put in the ability to adjust the timeout with some
sort of doc guidance on how to calculate a reasonable timeout based on
configured backlog depth.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57
2FDD

signature.asc
Description: This is a digitally signed message part

Re: linux-next: manual merge of the rdma-fixes tree with Linus' tree

2019-04-29 Thread Doug Ledford

On Tue, 2019-04-30 at 08:13 +1000, Stephen Rothwell wrote:
> Hi all,
> 
> Today's linux-next merge of the rdma-fixes tree got a conflict in:
> 
>   drivers/infiniband/core/uverbs_main.c
> 
> between commit:
> 
>   6a5c5d26c4c6 ("rdma: fix build errors on s390 and MIPS due to bad ZERO_PAGE 
> use")
> 
> from Linus' tree and commit:
> 
>   d79a26b99f5f ("RDMA/uverbs: Fix compilation error on s390 and mips 
> platforms")
> 
> from the rdma-fixes tree.
> 
> I fixed it up (I just used the version from Linus' tree) and can carry the
> fix as necessary. This is now fixed as far as linux-next is concerned,
> but any non trivial conflicts should be mentioned to your upstream
> maintainer when your tree is submitted for merging.  You may also want
> to consider cooperating with the maintainer of the conflicting tree to
> minimise any particularly complex conflicts.
> 

Sorry, I forgot to back that head commit out.  Once Linus committed his
fix the one in the rdma tree was superfluous (and wrong anyway).

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [GIT PULL] Please pull rdma.git

2019-04-29 Thread Doug Ledford

On Mon, 2019-04-29 at 21:00 +0300, Leon Romanovsky wrote:
> On Mon, Apr 29, 2019 at 01:13:01PM -0400, Doug Ledford wrote:
> > On Mon, 2019-04-29 at 09:48 -0700, Linus Torvalds wrote:
> > > On Mon, Apr 29, 2019 at 9:29 AM Doug Ledford  wrote:
> > > >  drivers/infiniband/core/uverbs_main.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > This trivial one-liner is actually incorrect.
> > > 
> > > It should use 'vmf->address', because the point of the ZERO_PAGE
> > > argument is to pick the page with the right virtual address alias for
> > > broken architectures that need those kinds.
> > > 
> > > I'm actually surprised s390 wants it, usually it's just MIPS that has
> > > the horribly broken virtual address translation stuff. But it looks
> > > like for s390 it's at least only a performance issue (ie it causes
> > > some aliases in L1 that cause cacheline ping-pong rather than anything
> > > else).
> > 
> > That's what I get for listening to Jason ;-)
> > 
> > Well, since you have just essentially re-written the patch to be
> > correct, you are now the developer of origin.  Do you want to commit the
> > fix directly or shall I respin it for you to pull?
> 
> Linus already committed it.
> 
> The whole breakage is mystery for me, the patch which introduced
> breakage, got successful build report from 0-build.

Yeah, successful 0-day build is a good indicator, but not a guarantee. 
You just can't test all possible permutations and sometimes things slip
by.  But it's still way better than 20 years ago when half the patches
weren't even compile tested before coming to Linus ;-)


-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [GIT PULL] Please pull rdma.git

2019-04-29 Thread Doug Ledford

On Mon, 2019-04-29 at 09:48 -0700, Linus Torvalds wrote:
> On Mon, Apr 29, 2019 at 9:29 AM Doug Ledford  wrote:
> > 
> >  drivers/infiniband/core/uverbs_main.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> This trivial one-liner is actually incorrect.
> 
> It should use 'vmf->address', because the point of the ZERO_PAGE
> argument is to pick the page with the right virtual address alias for
> broken architectures that need those kinds.
> 
> I'm actually surprised s390 wants it, usually it's just MIPS that has
> the horribly broken virtual address translation stuff. But it looks
> like for s390 it's at least only a performance issue (ie it causes
> some aliases in L1 that cause cacheline ping-pong rather than anything
> else).

That's what I get for listening to Jason ;-)

Well, since you have just essentially re-written the patch to be
correct, you are now the developer of origin.  Do you want to commit the
fix directly or shall I respin it for you to pull?

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [GIT PULL] Please pull RDMA subsystem changes

2019-04-29 Thread Doug Ledford

On Mon, 2019-04-29 at 11:42 -0400, Doug Ledford wrote:
> On Mon, 2019-04-29 at 08:40 +, Jason Gunthorpe wrote:
> > On Mon, Apr 29, 2019 at 08:09:47AM +0200, Heiko Carstens wrote:
> > > On Sun, Apr 28, 2019 at 11:52:12AM +, Jason Gunthorpe wrote:
> > > > Hi Linus,
> > > > 
> > > > Third rc pull request
> > > > 
> > > > Nothing particularly special here. There is a small merge conflict
> > > > with Adrea's mm_still_valid patches which is resolved as below:
> > > ...
> > > > Jason Gunthorpe (3):
> > > >   RDMA/mlx5: Do not allow the user to write to the clock page
> > > >   RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages
> > > >   RDMA/ucontext: Fix regression with disassociate
> > > 
> > > This doesn't compile. The patch below would fix it, but not sure if
> > > this is what is intended:
> > > 
> > > drivers/infiniband/core/uverbs_main.c: In function 'rdma_umap_fault':
> > > drivers/infiniband/core/uverbs_main.c:898:28: error: 'struct vm_fault' 
> > > has no member named 'vm_start'
> > >vmf->page = ZERO_PAGE(vmf->vm_start);
> > > ^~
> > > diff --git a/drivers/infiniband/core/uverbs_main.c 
> > > b/drivers/infiniband/core/uverbs_main.c
> > > index 7843e89235c3..65fe89b3fa2d 100644
> > > +++ b/drivers/infiniband/core/uverbs_main.c
> > > @@ -895,7 +895,7 @@ static vm_fault_t rdma_umap_fault(struct vm_fault 
> > > *vmf)
> > >  
> > >   /* Read only pages can just use the system zero page. */
> > >   if (!(vmf->vma->vm_flags & (VM_WRITE | VM_MAYWRITE))) {
> > > - vmf->page = ZERO_PAGE(vmf->vm_start);
> > > + vmf->page = ZERO_PAGE(vmf->vma->vm_start);
> > >   get_page(vmf->page);
> > >   return 0;
> > >   }
> > > 
> > 
> > Thanks Heiko, this looks right to me. 
> > 
> > I'm surprised to be seeing this at this point, these patches should
> > have been seen by 0 day for several days now, and they were in
> > linux-next already too..
> > 
> > Doug, can you send this to Linus today?
> 
> Yep.
> 

Done.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

[GIT PULL] Please pull rdma.git

2019-04-29 Thread Doug Ledford

Hi Linus,

As per the thread on our last pull request, this is the two line build
fix for s390 and mips.

The following changes since commit
2557fabd6e29f349bfa0ac13f38ac98aa5eafc74:

  RDMA/hns: Bugfix for mapping user db (2019-04-25 10:40:04 -0300)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git tags/for-
linus

for you to fetch changes up to d79a26b99f5f40db6863b1973750fd1d134d99b4:

  RDMA/uverbs: Fix compilation error on s390 and mips platforms (2019-
04-29 11:47:55 -0400)


Pull request for 5.1-rc

- Build fix for the last pull request on s390


Leon Romanovsky (1):
  RDMA/uverbs: Fix compilation error on s390 and mips platforms

 drivers/infiniband/core/uverbs_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [GIT PULL] Please pull RDMA subsystem changes

2019-04-29 Thread Doug Ledford

On Mon, 2019-04-29 at 08:40 +, Jason Gunthorpe wrote:
> On Mon, Apr 29, 2019 at 08:09:47AM +0200, Heiko Carstens wrote:
> > On Sun, Apr 28, 2019 at 11:52:12AM +, Jason Gunthorpe wrote:
> > > Hi Linus,
> > > 
> > > Third rc pull request
> > > 
> > > Nothing particularly special here. There is a small merge conflict
> > > with Adrea's mm_still_valid patches which is resolved as below:
> > ...
> > > Jason Gunthorpe (3):
> > >   RDMA/mlx5: Do not allow the user to write to the clock page
> > >   RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages
> > >   RDMA/ucontext: Fix regression with disassociate
> > 
> > This doesn't compile. The patch below would fix it, but not sure if
> > this is what is intended:
> > 
> > drivers/infiniband/core/uverbs_main.c: In function 'rdma_umap_fault':
> > drivers/infiniband/core/uverbs_main.c:898:28: error: 'struct vm_fault' has 
> > no member named 'vm_start'
> >vmf->page = ZERO_PAGE(vmf->vm_start);
> > ^~
> > diff --git a/drivers/infiniband/core/uverbs_main.c 
> > b/drivers/infiniband/core/uverbs_main.c
> > index 7843e89235c3..65fe89b3fa2d 100644
> > +++ b/drivers/infiniband/core/uverbs_main.c
> > @@ -895,7 +895,7 @@ static vm_fault_t rdma_umap_fault(struct vm_fault *vmf)
> >  
> > /* Read only pages can just use the system zero page. */
> > if (!(vmf->vma->vm_flags & (VM_WRITE | VM_MAYWRITE))) {
> > -   vmf->page = ZERO_PAGE(vmf->vm_start);
> > +   vmf->page = ZERO_PAGE(vmf->vma->vm_start);
> > get_page(vmf->page);
> >         return 0;
> > }
> > 
> 
> Thanks Heiko, this looks right to me. 
> 
> I'm surprised to be seeing this at this point, these patches should
> have been seen by 0 day for several days now, and they were in
> linux-next already too..
> 
> Doug, can you send this to Linus today?

Yep.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH] cxgb4: fix undefined behavior in mem.c

2019-03-01 Thread Doug Ledford

On Thu, 2019-02-28 at 16:57 -0700, Shaobo He wrote:
> Good catch. But if we agree on that memory management functions are those 
> specified by the C standard, would it be OK to ignore so-called use after 
> free 
> or double free bugs for the kernel as C standard does not apply to kfree?

No, most kernel use-after-free bugs are real bugs.  This one might be
technically a bug by certain readings of the standard, but it's a non-
issue.  Real use-after-free bugs don't just look at the value of a local
stack variable to get the memory's old address (which is what this does,
and the same could be achieved and be totally in spec by doing this:

old_ptr = mhp;
kfree(mhp);
pr_debug("%p\n", old_ptr);)

Real use after free things would actually dereference the pointer to
either read or write from the old memory region.  That leads to data
corruption or kernel data leaks.  Plus, in this case, the purpose of
printing the literal value of mhp is simply to provide a unique name for
tracing purposes.  Since kfree() doesn't alter the local stack variable,
the name is still present in the local stack variable at the point you
call pr_debug().

It could be fixed.  It's not like this patch is wrong.  But I wouldn't
submit it this late in the -rc cycle, I'd just take it for next.

> On 2/28/19 4:33 PM, Bart Van Assche wrote:
> > On Thu, 2019-02-28 at 16:18 -0700, Shaobo He wrote:
> > > I can't afford a pdf version of the C standard. So I looked at the draft 
> > > version
> > > used in the link I put in the commit message. It says (in 6.2.4:2),
> > > 
> > > ```
> > > The lifetime of an object is the portion of program execution during which
> > > storage is guaranteed to be reserved for it. An object exists, has a 
> > > constant
> > > address, and retains its last-stored value throughout its lifetime. If an 
> > > object
> > > is referred to outside of its lifetime, the behavior is undefined. The 
> > > value of
> > > a pointer becomes indeterminate when the object it points to (or just 
> > > past)
> > > reaches the end of its lifetime.
> > > ```
> > > I couldn't find the definition of lifetime over a dynamically allocated 
> > > object
> > > in the draft of C standard. I refer to this link
> > > (https://en.cppreference.com/w/c/language/lifetime) which suggests that 
> > > the
> > > lifetime of an allocated object ends after the deallocation function is 
> > > called
> > > upon it.
> > > 
> > > I think maybe the more problematic issue is that the value of a freed 
> > > pointer is
> > > intermediate.
> > 
> > In another section of the same draft I found the following:
> > 
> > J.2 Undefined behavior [ ... ] The value of a pointer that refers to space
> > deallocated by a call to the free or realloc function is used (7.22.3).
> > 
> > Since the C standard explicitly refers to free() and realloc(), does that
> > mean that that statement about undefined behavior does not apply to munmap()
> > (for user space code) nor to kfree() (for kernel code)?
> > 
> > Bart.
> > 

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: linux-next: manual merge of the net-next tree with the rdma tree

2019-02-27 Thread Doug Ledford

On Wed, 2019-02-27 at 11:25 +1100, Stephen Rothwell wrote:
> Hi all,
> 
> Today's linux-next merge of the net-next tree got a conflict in:
> 
>   drivers/infiniband/hw/mlx4/Kconfig
> 
> between commit:
> 
>   6fa8f1afd337 ("IB/{core,uverbs}: Move ib_umem_xxx functions from ib_core to 
> ib_uverbs")
> 
> from the rdma tree and commit:
> 
>   f4b6bcc7002f ("net: devlink: turn devlink into a built-in")
> 
> from the net-next tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
> 

Thanks Stephen, we'll add it to the (largish this release) list of
conflicts to bring to Linus' attention.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH] RDMA/cma: Make CM response timeout and # CM retries configurable

2019-02-22 Thread Doug Ledford



> On Feb 22, 2019, at 12:14 PM, Steve Wise  wrote:
> 
> 
> On 2/22/2019 10:36 AM, Jason Gunthorpe wrote:
>> On Sun, Feb 17, 2019 at 06:09:09PM +0100, Håkon Bugge wrote:
>>> During certain workloads, the default CM response timeout is too
>>> short, leading to excessive retries. Hence, make it configurable
>>> through sysctl. While at it, also make number of CM retries
>>> configurable.
>>> 
>>> The defaults are not changed.
>>> 
>>> Signed-off-by: Håkon Bugge 
>>> drivers/infiniband/core/cma.c | 51 ++-
>>> 1 file changed, 44 insertions(+), 7 deletions(-)
>>> 
>>> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
>>> index c43512752b8a..ce99e1cd1029 100644
>>> +++ b/drivers/infiniband/core/cma.c
>>> @@ -43,6 +43,7 @@
>>> #include 
>>> #include 
>>> #include 
>>> +#include 
>>> #include 
>>> 
>>> #include 
>>> @@ -68,13 +69,46 @@ MODULE_AUTHOR("Sean Hefty");
>>> MODULE_DESCRIPTION("Generic RDMA CM Agent");
>>> MODULE_LICENSE("Dual BSD/GPL");
>>> 
>>> -#define CMA_CM_RESPONSE_TIMEOUT 20
>>> #define CMA_QUERY_CLASSPORT_INFO_TIMEOUT 3000
>>> -#define CMA_MAX_CM_RETRIES 15
>>> #define CMA_CM_MRA_SETTING (IB_CM_MRA_FLAG_DELAY | 24)
>>> #define CMA_IBOE_PACKET_LIFETIME 18
>>> #define CMA_PREFERRED_ROCE_GID_TYPE IB_GID_TYPE_ROCE_UDP_ENCAP
>>> 
>>> +#define CMA_DFLT_CM_RESPONSE_TIMEOUT 20
>>> +static int cma_cm_response_timeout = CMA_DFLT_CM_RESPONSE_TIMEOUT;
>>> +static int cma_cm_response_timeout_min = 8;
>>> +static int cma_cm_response_timeout_max = 31;
>>> +#undef CMA_DFLT_CM_RESPONSE_TIMEOUT
>>> +
>>> +#define CMA_DFLT_MAX_CM_RETRIES 15
>>> +static int cma_max_cm_retries = CMA_DFLT_MAX_CM_RETRIES;
>>> +static int cma_max_cm_retries_min = 1;
>>> +static int cma_max_cm_retries_max = 100;
>>> +#undef CMA_DFLT_MAX_CM_RETRIES
>>> +
>>> +static struct ctl_table_header *cma_ctl_table_hdr;
>>> +static struct ctl_table cma_ctl_table[] = {
>>> +   {
>>> +   .procname   = "cma_cm_response_timeout",
>>> +   .data   = &cma_cm_response_timeout,
>>> +   .maxlen = sizeof(cma_cm_response_timeout),
>>> +   .mode   = 0644,
>>> +   .proc_handler   = proc_dointvec_minmax,
>>> +   .extra1 = &cma_cm_response_timeout_min,
>>> +   .extra2 = &cma_cm_response_timeout_max,
>>> +   },
>>> +   {
>>> +   .procname   = "cma_max_cm_retries",
>>> +   .data   = &cma_max_cm_retries,
>>> +   .maxlen = sizeof(cma_max_cm_retries),
>>> +   .mode   = 0644,
>>> +   .proc_handler   = proc_dointvec_minmax,
>>> +   .extra1 = &cma_max_cm_retries_min,
>>> +   .extra2 = &cma_max_cm_retries_max,
>>> +   },
>>> +   { }
>>> +};
>> Is sysctl the right approach here? Should it be rdma tool instead?
>> 
>> Jason
> 
> There are other rdma sysctls currently:  net.rdma_ucm.max_backlog and
> net.iw_cm.default_backlog.  The core network stack seems to use sysctl
> and not ip tool to set basically globals.
> 
> To use rdma tool, we'd have to have some concept of a "module" object, I
> guess.  IE there's dev, link, and resource rdma tool objects currently.
> But these cma timeout settings are really not per dev, link, nor a
> resource.   Maybe we have just a "core" object:  rdma core set
> cma_max_cm_retries min 8 max 30.

I don’t know, I think you make a fairly good argument for leaving it as a 
sysctl.  We have infrastructure in place for admins to set persistent sysctl 
settings.  The per device/link settings need something different because link 
names and such can change.  Since these are globals, I’d leave them where they 
are.

--
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD









signature.asc
Description: Message signed with OpenPGP

Re: linux-next: build warning after merge of the rdma tree

2019-02-13 Thread Doug Ledford

On Thu, 2019-02-14 at 11:18 +1100, Stephen Rothwell wrote:
> Hi all,
> 
> After merging the rdma tree, today's linux-next build (x86_64
> allmodconfig) produced this warning:
> 
> drivers/infiniband/hw/hfi1/qp.c: In function 'hfi1_setup_wqe':
> drivers/infiniband/hw/hfi1/qp.c:328:3: warning: this statement may fall 
> through [-Wimplicit-fallthrough=]
>hfi1_setup_tid_rdma_wqe(qp, wqe);
>^~~~
> drivers/infiniband/hw/hfi1/qp.c:329:2: note: here
>   case IB_QPT_UC:
>   ^~~~
> 
> Introduced by commit
> 
>   f1ab4efa6d32 ("IB/hfi1: Enable TID RDMA READ protocol")
> 
> I get this warning because I am building with -Wimplicit-fallthrough
> in attempt to catch new additions early.  The gcc warning can be turned
> off by adding a /* fall through */ comment at the point the fall through
> happens (assuming that the fall through is intentional).
> 

Thanks Stephen, we'll sort it and make an appropriate fixup patch.

Kaike?

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA

2019-02-07 Thread Doug Ledford

I think I've finally wrapped my head around all of this.  Let's see if I
have this right:

* People are using filesystem DAX to expose byte addressable persistent
memory because putting a filesystem on the memory makes an easy way to
organize the data in the memory and share it between various processes. 
It's worth noting that this is not the only way to share this memory,
and arguably not even the best way, but it's what people are doing. 
However, to get byte level addressability on the remote side, we must
create files on the server side, mmap those files, then give a handle to
the memory region to the client side that the client then addresses on a
byte by byte basis.  This is because all of the normal kernel based
device sharing mechanisms are block based and don't provide byte level
addressability.

* People are asking for thin allocations, reflinks, deduplication,
whatever else because persistent memory is still small in terms of size
compared to the amount of data people want to put on it, so these
techniques stretch its usefulness.

* Because there is no kernel level mechanism for sharing byte
addressable memory, this only works with specific applications that have
been written to create files on byte addressable memory, mmap them, then
share them out via RDMA.  I bring this up because in the video linked in
this email, Oracle is gushing about how great this feature is.  But it's
important to understand that this only works because the Oracle
processes themselves are the filesystem sharing entity.  That means at
other points in this conversation where we've talked about the need for
forward progress, and non-ODP hardware, and the talk has come down to
sending SIGKILL to a process in order to free memory reservations, I
feel confident in saying that Oracle would *never* agree to this.  If
you kill an Oracle process to make forward progress, you are probably
also killing the very process that needed you to make progress in the
first place.  I'm pretty confident that Oracle will have no problem
what-so-ever saying that ODP capable hardware is a hard requirement for
using their software with DAX.

* So if Oracle is likely to demand ODP hardware, period, are there other
scenarios that might be more accepting of a more limited FS on top of
DAX that doesn't support reflinks and deduplication?  I can think of a
possible yes to that answer rather easily.  Message brokerage servers
(amqp, qpid) have strict requirements about receiving a message and then
making sure that it makes it once, and only once, to all subscribed
receivers.  A natural way of organizing this sort of thing is to create
a persistent ring buffer for incoming messages, one per each connecting
client that is sending messages.  Then a log file for each client you
are sending messages back out to.  Putting these files on persistent
memory and then mapping the ring buffer to the clients, and writing your
own transmission journals to the persistent memory, would allow the
program to be very robust in the face of a program or system crash. 
This sort of usage would not require any thin allocations, reflinks, or
other such features, and yet would still find the filesystem
organization useful.  Therefore I think the answer is yes, there are at
least some use cases that would find a less featureful filesystem that
works with persistent memory and RDMA but without ODP to be of value.

* Really though, as I said in my email to Tom Talpey, this entire
situation is simply screaming that we are doing DAX networking wrong. 
We shouldn't be writing the networking code once in every single
application that wants to do this.  If we had a memory segment that we
shared from server to client(s), and in that memory segment we
implemented a clustered filesystem, then applications would simply mmap
local files and be done with it.  If the file needed to move, the kernel
would update the mmap in the application, done.  If you ask me, it is
the attempt to do this the wrong way that is resulting in all this
heartache.  That said, for today, my recommendation would be to require
ODP hardware for XFS filesystem with the DAX option, but allow ext2
filesystems to mount DAX filesystems on non-ODP hardware, and go in and
modify the ext2 filesystem so that on DAX mounts, it disables hole punch
and ftrunctate any time they would result in the forced removal of an
established mmap.

On Wed, 2019-02-06 at 14:44 -0800, Dan Williams wrote:
> On Wed, Feb 6, 2019 at 2:25 PM Doug Ledford  wrote:
> > On Wed, 2019-02-06 at 15:08 -0700, Jason Gunthorpe wrote:
> > > On Thu, Feb 07, 2019 at 08:03:56AM +1100, Dave Chinner wrote:
> > > > On Wed, Feb 06, 2019 at 07:16:21PM +, Christopher Lameter wrote:
> > > > > On Wed, 6 Feb 2019, Doug Ledford wrote:
> > > > > 
> > > > > > > Most of the cases we want revoke for are things like truncate().
> >

Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA

2019-02-07 Thread Doug Ledford

On Thu, 2019-02-07 at 10:41 -0500, Tom Talpey wrote:
> On 2/7/2019 10:37 AM, Doug Ledford wrote:
> > On Thu, 2019-02-07 at 10:28 -0500, Tom Talpey wrote:
> > > On 2/7/2019 10:04 AM, Chuck Lever wrote:
> > > > > On Feb 7, 2019, at 12:23 AM, Jason Gunthorpe  wrote:
> > > > > 
> > > > > On Thu, Feb 07, 2019 at 02:52:58PM +1100, Dave Chinner wrote:
> > > > > 
> > > > > > Requiring ODP capable hardware and applications that control RDMA
> > > > > > access to use file leases and be able to cancel/recall client side
> > > > > > delegations (like NFS is already able to do!) seems like a pretty
> > > > > 
> > > > > So, what happens on NFS if the revoke takes too long?
> > > > 
> > > > NFS distinguishes between "recall" and "revoke". Dave used "recall"
> > > > here, it means that the server recalls the client's delegation. If
> > > > the client doesn't respond, the server revokes the delegation
> > > > unilaterally and other users are allowed to proceed.
> > > 
> > > The SMB3 protocol has a similar "lease break" mechanism, btw.
> > > 
> > > SMB3 "push mode" has long-expected to allow DAX mapping of files
> > > only when an exclusive lease is held by the requesting client.
> > > The server may recall the lease if the DAX mapping needs to change.
> > > 
> > > Once local (MMU) and remote (RDMA) mappings are dropped, the
> > > client may re-request that the server reestablish them. No
> > > connection or process is terminated, and no data is silently lost.
> > 
> > Yeah, but you're referring to a situation where the communication agent
> > and the filesystem agent are one and the same and they work
> > cooperatively to resolve the issue.  With DAX under Linux, the
> > filesystem agent and the communication agent are separate, and right
> > now, to my knowledge, the filesystem agent doesn't tell the
> > communication agent about a broken lease, it want's to be able to do
> > things 100% transparently without any work on the communication agent's
> > part.  That works for ODP, but not for anything else.  If the filesystem
> > notified the communication agent of the need to drop the MMU region and
> > rebuild it, the communication agent could communicate that to the remote
> > host, and things would work.  But there's no POSIX message for "your
> > file is moving on media, redo your mmap".
> 
> Indeed, the MMU notifier and the filesystem need to be integrated.

And right now, the method of sharing this across the network is:

persistent memory in machine
  local filesystem supporting a DAX mount
custom application that knows how to mmap then rdma map files,
and can manage the connection long term

The point being that every single method of sharing this stuff is a one
off custom application (Oracle just being one).  I'm not really all that
thrilled about the idea of writing the same mmap/rdma map/oob-management 
code in every single app out there.  To me, this problem is screaming
for a more general purpose kernel solution, just like NVMe over Fabrics.
I'm thinking a clustered filesystem on top of a shared memory segment
between hosts is a much more natural fit.  Then applications just mmap
the files locally, and the kernel does the rest.

> I'm unmoved by the POSIX argument. This stuff didn't happen in 1990.
> 
> Tom.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA

2019-02-07 Thread Doug Ledford

On Thu, 2019-02-07 at 10:28 -0500, Tom Talpey wrote:
> On 2/7/2019 10:04 AM, Chuck Lever wrote:
> > 
> > > On Feb 7, 2019, at 12:23 AM, Jason Gunthorpe  wrote:
> > > 
> > > On Thu, Feb 07, 2019 at 02:52:58PM +1100, Dave Chinner wrote:
> > > 
> > > > Requiring ODP capable hardware and applications that control RDMA
> > > > access to use file leases and be able to cancel/recall client side
> > > > delegations (like NFS is already able to do!) seems like a pretty
> > > 
> > > So, what happens on NFS if the revoke takes too long?
> > 
> > NFS distinguishes between "recall" and "revoke". Dave used "recall"
> > here, it means that the server recalls the client's delegation. If
> > the client doesn't respond, the server revokes the delegation
> > unilaterally and other users are allowed to proceed.
> 
> The SMB3 protocol has a similar "lease break" mechanism, btw.
> 
> SMB3 "push mode" has long-expected to allow DAX mapping of files
> only when an exclusive lease is held by the requesting client.
> The server may recall the lease if the DAX mapping needs to change.
> 
> Once local (MMU) and remote (RDMA) mappings are dropped, the
> client may re-request that the server reestablish them. No
> connection or process is terminated, and no data is silently lost.

Yeah, but you're referring to a situation where the communication agent
and the filesystem agent are one and the same and they work
cooperatively to resolve the issue.  With DAX under Linux, the
filesystem agent and the communication agent are separate, and right
now, to my knowledge, the filesystem agent doesn't tell the
communication agent about a broken lease, it want's to be able to do
things 100% transparently without any work on the communication agent's
part.  That works for ODP, but not for anything else.  If the filesystem
notified the communication agent of the need to drop the MMU region and
rebuild it, the communication agent could communicate that to the remote
host, and things would work.  But there's no POSIX message for "your
file is moving on media, redo your mmap".

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA

2019-02-06 Thread Doug Ledford

On Wed, 2019-02-06 at 14:44 -0800, Dan Williams wrote:
> On Wed, Feb 6, 2019 at 2:25 PM Doug Ledford  wrote:
> > Can someone give me a real world scenario that someone is *actually*
> > asking for with this?
> 
> I'll point to this example. At the 6:35 mark Kodi talks about the
> Oracle use case for DAX + RDMA.
> 
> https://youtu.be/ywKPPIE8JfQ?t=395

I watched this, and I see that Oracle is all sorts of excited that their
storage machines can scale out, and they can access the storage and it
has basically no CPU load on the storage server while performing
millions of queries.  What I didn't hear in there is why DAX has to be
in the picture, or why Oracle couldn't do the same thing with a simple
memory region exported directly to the RDMA subsystem, or why reflink or
any of the other features you talk about are needed.  So, while these
things may legitimately be needed, this video did not tell me about
how/why they are needed, just that RDMA is really, *really* cool for
their use case and gets them 0% CPU utilization on their storage
servers.  I didn't watch the whole thing though.  Do they get into that
later on?  Do they get to that level of technical discussion, or is this
all higher level?

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA

2019-02-06 Thread Doug Ledford

On Wed, 2019-02-06 at 14:44 -0800, Dan Williams wrote:
> On Wed, Feb 6, 2019 at 2:25 PM Doug Ledford  wrote:
> > On Wed, 2019-02-06 at 15:08 -0700, Jason Gunthorpe wrote:
> > > On Thu, Feb 07, 2019 at 08:03:56AM +1100, Dave Chinner wrote:
> > > > On Wed, Feb 06, 2019 at 07:16:21PM +, Christopher Lameter wrote:
> > > > > On Wed, 6 Feb 2019, Doug Ledford wrote:
> > > > > 
> > > > > > > Most of the cases we want revoke for are things like truncate().
> > > > > > > Shouldn't happen with a sane system, but we're trying to avoid 
> > > > > > > users
> > > > > > > doing awful things like being able to DMA to pages that are now 
> > > > > > > part of
> > > > > > > a different file.
> > > > > > 
> > > > > > Why is the solution revoke then?  Is there something besides 
> > > > > > truncate
> > > > > > that we have to worry about?  I ask because EBUSY is not currently
> > > > > > listed as a return value of truncate, so extending the API to 
> > > > > > include
> > > > > > EBUSY to mean "this file has pinned pages that can not be freed" is 
> > > > > > not
> > > > > > (or should not be) totally out of the question.
> > > > > > 
> > > > > > Admittedly, I'm coming in late to this conversation, but did I miss 
> > > > > > the
> > > > > > portion where that alternative was ruled out?
> > > > > 
> > > > > Coming in late here too but isnt the only DAX case that we are 
> > > > > concerned
> > > > > about where there was an mmap with the O_DAX option to do direct write
> > > > > though? If we only allow this use case then we may not have to worry 
> > > > > about
> > > > > long term GUP because DAX mapped files will stay in the physical 
> > > > > location
> > > > > regardless.
> > > > 
> > > > No, that is not guaranteed. Soon as we have reflink support on XFS,
> > > > writes will physically move the data to a new physical location.
> > > > This is non-negotiatiable, and cannot be blocked forever by a gup
> > > > pin.
> > > > 
> > > > IOWs, DAX on RDMA requires a) page fault capable hardware so that
> > > > the filesystem can move data physically on write access, and b)
> > > > revokable file leases so that the filesystem can kick userspace out
> > > > of the way when it needs to.
> > > 
> > > Why do we need both? You want to have leases for normal CPU mmaps too?
> > > 
> > > > Truncate is a red herring. It's definitely a case for revokable
> > > > leases, but it's the rare case rather than the one we actually care
> > > > about. We really care about making copy-on-write capable filesystems 
> > > > like
> > > > XFS work with DAX (we've got people asking for it to be supported
> > > > yesterday!), and that means DAX+RDMA needs to work with storage that
> > > > can change physical location at any time.
> > > 
> > > Then we must continue to ban longterm pin with DAX..
> > > 
> > > Nobody is going to want to deploy a system where revoke can happen at
> > > any time and if you don't respond fast enough your system either locks
> > > with some kind of FS meltdown or your process gets SIGKILL.
> > > 
> > > I don't really see a reason to invest so much design work into
> > > something that isn't production worthy.
> > > 
> > > It *almost* made sense with ftruncate, because you could architect to
> > > avoid ftruncate.. But just any FS op might reallocate? Naw.
> > > 
> > > Dave, you said the FS is responsible to arbitrate access to the
> > > physical pages..
> > > 
> > > Is it possible to have a filesystem for DAX that is more suited to
> > > this environment? Ie designed to not require block reallocation (no
> > > COW, no reflinks, different approach to ftruncate, etc)
> > 
> > Can someone give me a real world scenario that someone is *actually*
> > asking for with this?
> 
> I'll point to this example. At the 6:35 mark Kodi talks about the
> Oracle use case for DAX + RDMA.
> 
> https://youtu.be/ywKPPIE8JfQ?t=395

Thanks for the link, I'll review the panel.

> Currently the only way to

Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA

2019-02-06 Thread Doug Ledford

On Wed, 2019-02-06 at 15:08 -0700, Jason Gunthorpe wrote:
> On Thu, Feb 07, 2019 at 08:03:56AM +1100, Dave Chinner wrote:
> > On Wed, Feb 06, 2019 at 07:16:21PM +, Christopher Lameter wrote:
> > > On Wed, 6 Feb 2019, Doug Ledford wrote:
> > > 
> > > > > Most of the cases we want revoke for are things like truncate().
> > > > > Shouldn't happen with a sane system, but we're trying to avoid users
> > > > > doing awful things like being able to DMA to pages that are now part 
> > > > > of
> > > > > a different file.
> > > > 
> > > > Why is the solution revoke then?  Is there something besides truncate
> > > > that we have to worry about?  I ask because EBUSY is not currently
> > > > listed as a return value of truncate, so extending the API to include
> > > > EBUSY to mean "this file has pinned pages that can not be freed" is not
> > > > (or should not be) totally out of the question.
> > > > 
> > > > Admittedly, I'm coming in late to this conversation, but did I miss the
> > > > portion where that alternative was ruled out?
> > > 
> > > Coming in late here too but isnt the only DAX case that we are concerned
> > > about where there was an mmap with the O_DAX option to do direct write
> > > though? If we only allow this use case then we may not have to worry about
> > > long term GUP because DAX mapped files will stay in the physical location
> > > regardless.
> > 
> > No, that is not guaranteed. Soon as we have reflink support on XFS,
> > writes will physically move the data to a new physical location.
> > This is non-negotiatiable, and cannot be blocked forever by a gup
> > pin.
> > 
> > IOWs, DAX on RDMA requires a) page fault capable hardware so that
> > the filesystem can move data physically on write access, and b)
> > revokable file leases so that the filesystem can kick userspace out
> > of the way when it needs to.
> 
> Why do we need both? You want to have leases for normal CPU mmaps too?
> 
> > Truncate is a red herring. It's definitely a case for revokable
> > leases, but it's the rare case rather than the one we actually care
> > about. We really care about making copy-on-write capable filesystems like
> > XFS work with DAX (we've got people asking for it to be supported
> > yesterday!), and that means DAX+RDMA needs to work with storage that
> > can change physical location at any time.
> 
> Then we must continue to ban longterm pin with DAX..
> 
> Nobody is going to want to deploy a system where revoke can happen at
> any time and if you don't respond fast enough your system either locks
> with some kind of FS meltdown or your process gets SIGKILL. 
> 
> I don't really see a reason to invest so much design work into
> something that isn't production worthy.
> 
> It *almost* made sense with ftruncate, because you could architect to
> avoid ftruncate.. But just any FS op might reallocate? Naw.
> 
> Dave, you said the FS is responsible to arbitrate access to the
> physical pages..
> 
> Is it possible to have a filesystem for DAX that is more suited to
> this environment? Ie designed to not require block reallocation (no
> COW, no reflinks, different approach to ftruncate, etc)

Can someone give me a real world scenario that someone is *actually*
asking for with this?  Are DAX users demanding xfs, or is it just the
filesystem of convenience?  Do they need to stick with xfs?  Are they
really trying to do COW backed mappings for the RDMA targets?  Or do
they want a COW backed FS but are perfectly happy if the specific RDMA
targets are *not* COW and are statically allocated?

> > And that's the real problem we need to solve here. RDMA has no trust
> > model other than "I'm userspace, I pinned you, trust me!". That's
> > not good enough for FS-DAX+RDMA
> 
> It is baked into the silicon, and I don't see much motion on this
> front right now. My best hope is that IOMMU PASID will get widely
> deployed and RDMA silicon will arrive that can use it. Seems to be
> years away, if at all.
> 
> At least we have one chip design that can work in a page faulting mode
> ..
> 
> Jason

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA

2019-02-06 Thread Doug Ledford

On Wed, 2019-02-06 at 13:04 -0800, Dan Williams wrote:
> On Wed, Feb 6, 2019 at 12:14 PM Doug Ledford  wrote:
> > On Wed, 2019-02-06 at 11:45 -0800, Dan Williams wrote:
> > > On Wed, Feb 6, 2019 at 10:52 AM Jason Gunthorpe  wrote:
> > > > On Wed, Feb 06, 2019 at 10:35:04AM -0800, Matthew Wilcox wrote:
> > > > 
> > > > > > Admittedly, I'm coming in late to this conversation, but did I miss 
> > > > > > the
> > > > > > portion where that alternative was ruled out?
> > > > > 
> > > > > That's my preferred option too, but the preponderance of opinion leans
> > > > > towards "We can't give people a way to make files un-truncatable".
> > > > 
> > > > I haven't heard an explanation why blocking ftruncate is worse than
> > > > giving people a way to break RDMA using process by calling ftruncate??
> > > > 
> > > > Isn't it exactly the same argument the other way?
> > > 
> > > If the
> > > RDMA application doesn't want it to happen, arrange for it by
> > > permissions or other coordination to prevent truncation,
> > 
> > I just argued the *exact* same thing, except from the other side: if you
> > want a guaranteed ability to truncate, then arrange the perms so the
> > RDMA or DAX capable things can't use the file.
> 
> That doesn't make sense. All we have to work with is rwx bits. It's
> possible to prevents writes / truncates. There's no permission bit for
> mmap, O_DIRECT and RDMA mappings, hence leases.

There's ownership.  What you can't open, you can't mmap or O_DIRECT or
whatever...

Regardless though, this is mostly moot as Dave's email makes it clear
the underlying issue that is the problem is not ftruncate, but other
things.
> > 

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA

2019-02-06 Thread Doug Ledford

On Wed, 2019-02-06 at 12:20 -0800, Matthew Wilcox wrote:
> On Wed, Feb 06, 2019 at 03:16:02PM -0500, Doug Ledford wrote:
> > On Wed, 2019-02-06 at 11:40 -0800, Matthew Wilcox wrote:
> > > On Wed, Feb 06, 2019 at 07:16:21PM +, Christopher Lameter wrote:
> > > > though? If we only allow this use case then we may not have to worry 
> > > > about
> > > > long term GUP because DAX mapped files will stay in the physical 
> > > > location
> > > > regardless.
> > > 
> > > ... except for truncate.  And now that I think about it, there was a
> > > desire to support hot-unplug which also needed revoke.
> > 
> > We already support hot unplug of RDMA devices.  But it is extreme.  How
> > does hot unplug deal with a program running from the device (something
> > that would have returned ETXTBSY)?
> 
> Not hot-unplugging the RDMA device but hot-unplugging an NV-DIMM.

Is an NV-DIMM the only thing we use DAX on?


-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA

2019-02-06 Thread Doug Ledford

On Wed, 2019-02-06 at 12:49 -0800, Matthew Wilcox wrote:
> On Wed, Feb 06, 2019 at 03:47:53PM -0500, Doug Ledford wrote:
> > On Wed, 2019-02-06 at 12:41 -0800, Matthew Wilcox wrote:
> > > On Wed, Feb 06, 2019 at 03:28:35PM -0500, Doug Ledford wrote:
> > > > On Wed, 2019-02-06 at 12:20 -0800, Matthew Wilcox wrote:
> > > > > Not hot-unplugging the RDMA device but hot-unplugging an NV-DIMM.
> 
> ^^^ I think you missed this line ^^^

Indeed, I did ;-)

> 
> > You said "now that I think about it, there was a desire to support hot-
> > unplug which also needed revoke".  For us, hot unplug is done at the
> > device level and means all connections must be torn down.  So in the
> > context of this argument, if people want revoke so DAX can migrate from
> > one NV-DIMM to another, ok.  But revoke does not help RDMA migrate.
> > 
> > If, instead, you mean that you want to support hot unplug of an NV-DIMM
> > that is currently the target of RDMA transfers, then I believe
> > Christoph's answer on this is correct.  It all boils down to which
> > device you are talking about doing the hot unplug on.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA

2019-02-06 Thread Doug Ledford

On Wed, 2019-02-06 at 12:41 -0800, Matthew Wilcox wrote:
> On Wed, Feb 06, 2019 at 03:28:35PM -0500, Doug Ledford wrote:
> > On Wed, 2019-02-06 at 12:20 -0800, Matthew Wilcox wrote:
> > > On Wed, Feb 06, 2019 at 03:16:02PM -0500, Doug Ledford wrote:
> > > > On Wed, 2019-02-06 at 11:40 -0800, Matthew Wilcox wrote:
> > > > > On Wed, Feb 06, 2019 at 07:16:21PM +, Christopher Lameter wrote:
> > > > > > though? If we only allow this use case then we may not have to 
> > > > > > worry about
> > > > > > long term GUP because DAX mapped files will stay in the physical 
> > > > > > location
> > > > > > regardless.
> > > > > 
> > > > > ... except for truncate.  And now that I think about it, there was a
> > > > > desire to support hot-unplug which also needed revoke.
> > > > 
> > > > We already support hot unplug of RDMA devices.  But it is extreme.  How
> > > > does hot unplug deal with a program running from the device (something
> > > > that would have returned ETXTBSY)?
> > > 
> > > Not hot-unplugging the RDMA device but hot-unplugging an NV-DIMM.
> > > 
> > > It's straightforward to migrate text pages from one DIMM to another;
> > > you remove the PTEs from the CPU's page tables, copy the data over and
> > > pagefaults put the new PTEs in place.  We don't have a way to do similar
> > > things to an RDMA device, do we?
> > 
> > We don't have a means of migration except in the narrowly scoped sense
> > of queue pair migration as defined by the IBTA and implemented on some
> > dual port IB cards.  This narrowly scoped migration even still involves
> > notification of the app.
> > 
> > Since there's no guarantee that any other port can connect to the same
> > machine as any port that's going away, it would always be a
> > disconnect/reconnect sequence in the app to support this, not an under
> > the covers migration.
> 
> I don't understand you.  We're not talking about migrating from one IB
> card to another, we're talking about changing the addresses that an STag
> refers to.

You said "now that I think about it, there was a desire to support hot-
unplug which also needed revoke".  For us, hot unplug is done at the
device level and means all connections must be torn down.  So in the
context of this argument, if people want revoke so DAX can migrate from
one NV-DIMM to another, ok.  But revoke does not help RDMA migrate.

If, instead, you mean that you want to support hot unplug of an NV-DIMM
that is currently the target of RDMA transfers, then I believe
Christoph's answer on this is correct.  It all boils down to which
device you are talking about doing the hot unplug on.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA

2019-02-06 Thread Doug Ledford

On Wed, 2019-02-06 at 12:20 -0800, Matthew Wilcox wrote:
> On Wed, Feb 06, 2019 at 03:16:02PM -0500, Doug Ledford wrote:
> > On Wed, 2019-02-06 at 11:40 -0800, Matthew Wilcox wrote:
> > > On Wed, Feb 06, 2019 at 07:16:21PM +, Christopher Lameter wrote:
> > > > though? If we only allow this use case then we may not have to worry 
> > > > about
> > > > long term GUP because DAX mapped files will stay in the physical 
> > > > location
> > > > regardless.
> > > 
> > > ... except for truncate.  And now that I think about it, there was a
> > > desire to support hot-unplug which also needed revoke.
> > 
> > We already support hot unplug of RDMA devices.  But it is extreme.  How
> > does hot unplug deal with a program running from the device (something
> > that would have returned ETXTBSY)?
> 
> Not hot-unplugging the RDMA device but hot-unplugging an NV-DIMM.
> 
> It's straightforward to migrate text pages from one DIMM to another;
> you remove the PTEs from the CPU's page tables, copy the data over and
> pagefaults put the new PTEs in place.  We don't have a way to do similar
> things to an RDMA device, do we?

We don't have a means of migration except in the narrowly scoped sense
of queue pair migration as defined by the IBTA and implemented on some
dual port IB cards.  This narrowly scoped migration even still involves
notification of the app.

Since there's no guarantee that any other port can connect to the same
machine as any port that's going away, it would always be a
disconnect/reconnect sequence in the app to support this, not an under
the covers migration.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA

2019-02-06 Thread Doug Ledford

On Wed, 2019-02-06 at 11:40 -0800, Matthew Wilcox wrote:
> On Wed, Feb 06, 2019 at 07:16:21PM +, Christopher Lameter wrote:
> > 
> > though? If we only allow this use case then we may not have to worry about
> > long term GUP because DAX mapped files will stay in the physical location
> > regardless.
> 
> ... except for truncate.  And now that I think about it, there was a
> desire to support hot-unplug which also needed revoke.

We already support hot unplug of RDMA devices.  But it is extreme.  How
does hot unplug deal with a program running from the device (something
that would have returned ETXTBSY)?

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA

2019-02-06 Thread Doug Ledford

On Wed, 2019-02-06 at 11:45 -0800, Dan Williams wrote:
> On Wed, Feb 6, 2019 at 10:52 AM Jason Gunthorpe  wrote:
> > On Wed, Feb 06, 2019 at 10:35:04AM -0800, Matthew Wilcox wrote:
> > 
> > > > Admittedly, I'm coming in late to this conversation, but did I miss the
> > > > portion where that alternative was ruled out?
> > > 
> > > That's my preferred option too, but the preponderance of opinion leans
> > > towards "We can't give people a way to make files un-truncatable".
> > 
> > I haven't heard an explanation why blocking ftruncate is worse than
> > giving people a way to break RDMA using process by calling ftruncate??
> > 
> > Isn't it exactly the same argument the other way?
> 
> 
> If the
> RDMA application doesn't want it to happen, arrange for it by
> permissions or other coordination to prevent truncation,

I just argued the *exact* same thing, except from the other side: if you
want a guaranteed ability to truncate, then arrange the perms so the
RDMA or DAX capable things can't use the file.

>  but once the
> two conflicting / valid requests have arrived at the filesystem try to
> move the result forward to the user requested state not block and fail
> indefinitely.

Except this is wrong.  We already have ETXTBSY, and arguably it is much
easier for ETXTBSY to simply kill all of the running processes with
extreme prejudice.  But we don't do that.  We block indefinitely.  So,
no, there is no expectation that things will "move forward to the user
requested state".  Not when pages are in use by the kernel, and very
arguably pages being used for direct I/O are absolutely in use by the
kernel, then truncate blocks.

There is a major case of dissonant cognitive behavior here if the
syscall supports ETXTBSY, even though the ability to kill apps using the
text pages is trivial, but thinks supporting EBUSY is out of the
question.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA

2019-02-06 Thread Doug Ledford

On Wed, 2019-02-06 at 10:35 -0800, Matthew Wilcox wrote:
> On Wed, Feb 06, 2019 at 01:32:04PM -0500, Doug Ledford wrote:
> > On Wed, 2019-02-06 at 09:52 -0800, Matthew Wilcox wrote:
> > > On Wed, Feb 06, 2019 at 10:31:14AM -0700, Jason Gunthorpe wrote:
> > > > On Wed, Feb 06, 2019 at 10:50:00AM +0100, Jan Kara wrote:
> > > > 
> > > > > MM/FS asks for lease to be revoked. The revoke handler agrees with the
> > > > > other side on cancelling RDMA or whatever and drops the page pins. 
> > > > 
> > > > This takes a trip through userspace since the communication protocol
> > > > is entirely managed in userspace.
> > > > 
> > > > Most existing communication protocols don't have a 'cancel operation'.
> > > > 
> > > > > Now I understand there can be HW / communication failures etc. in
> > > > > which case the driver could either block waiting or make sure future
> > > > > IO will fail and drop the pins. 
> > > > 
> > > > We can always rip things away from the userspace.. However..
> > > > 
> > > > > But under normal conditions there should be a way to revoke the
> > > > > access. And if the HW/driver cannot support this, then don't let it
> > > > > anywhere near DAX filesystem.
> > > > 
> > > > I think the general observation is that people who want to do DAX &
> > > > RDMA want it to actually work, without data corruption, random process
> > > > kills or random communication failures.
> > > > 
> > > > Really, few users would actually want to run in a system where revoke
> > > > can be triggered.
> > > > 
> > > > So.. how can the FS/MM side provide a guarantee to the user that
> > > > revoke won't happen under a certain system design?
> > > 
> > > Most of the cases we want revoke for are things like truncate().
> > > Shouldn't happen with a sane system, but we're trying to avoid users
> > > doing awful things like being able to DMA to pages that are now part of
> > > a different file.
> > 
> > Why is the solution revoke then?  Is there something besides truncate
> > that we have to worry about?  I ask because EBUSY is not currently
> > listed as a return value of truncate, so extending the API to include
> > EBUSY to mean "this file has pinned pages that can not be freed" is not
> > (or should not be) totally out of the question.
> > 
> > Admittedly, I'm coming in late to this conversation, but did I miss the
> > portion where that alternative was ruled out?
> 
> That's my preferred option too, but the preponderance of opinion leans
> towards "We can't give people a way to make files un-truncatable".

Has anyone looked at the laundry list of possible failures truncate
already has?  Among others, ETXTBSY is already in the list, and it
allows someone to make a file un-truncatable by running it.  There's
EPERM for multiple failures.  In order for someone to make a file
untruncatable using this, they would have to have perms to the file
already anyway as well as perms to get the direct I/O pin.  I see no
reason why, if they have the perms to do it, that you don't allow them
to.  If you don't want someone else to make a file untruncatable that
you want to truncate, then don't share file perms with them.  What's the
difficulty here?  Really, creating this complex revoke thing to tear
down I/O when people really *don't* want that I/O getting torn down
seems like forcing a bad API on I/O to satisfy not doing what is an
entirely natural extension to an existing API.  You *shouldn't* have the
right to truncate a file that is busy, and ETXTBSY is a perfect example
of that, and an example of the API done right.  This other

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA

2019-02-06 Thread Doug Ledford

On Wed, 2019-02-06 at 09:52 -0800, Matthew Wilcox wrote:
> On Wed, Feb 06, 2019 at 10:31:14AM -0700, Jason Gunthorpe wrote:
> > On Wed, Feb 06, 2019 at 10:50:00AM +0100, Jan Kara wrote:
> > 
> > > MM/FS asks for lease to be revoked. The revoke handler agrees with the
> > > other side on cancelling RDMA or whatever and drops the page pins. 
> > 
> > This takes a trip through userspace since the communication protocol
> > is entirely managed in userspace.
> > 
> > Most existing communication protocols don't have a 'cancel operation'.
> > 
> > > Now I understand there can be HW / communication failures etc. in
> > > which case the driver could either block waiting or make sure future
> > > IO will fail and drop the pins. 
> > 
> > We can always rip things away from the userspace.. However..
> > 
> > > But under normal conditions there should be a way to revoke the
> > > access. And if the HW/driver cannot support this, then don't let it
> > > anywhere near DAX filesystem.
> > 
> > I think the general observation is that people who want to do DAX &
> > RDMA want it to actually work, without data corruption, random process
> > kills or random communication failures.
> > 
> > Really, few users would actually want to run in a system where revoke
> > can be triggered.
> > 
> > So.. how can the FS/MM side provide a guarantee to the user that
> > revoke won't happen under a certain system design?
> 
> Most of the cases we want revoke for are things like truncate().
> Shouldn't happen with a sane system, but we're trying to avoid users
> doing awful things like being able to DMA to pages that are now part of
> a different file.

Why is the solution revoke then?  Is there something besides truncate
that we have to worry about?  I ask because EBUSY is not currently
listed as a return value of truncate, so extending the API to include
EBUSY to mean "this file has pinned pages that can not be freed" is not
(or should not be) totally out of the question.

Admittedly, I'm coming in late to this conversation, but did I miss the
portion where that alternative was ruled out?

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

[PULL REQUEST] Please pull rdma.git

2018-12-13 Thread Doug Ledford

Hi Linus,

We have 5 small fixes for this pull request.  One is a performance
regression, so not necessarily strictly a fix, but it was small and
reasonable and claimed to avoid thrashing in the scheduler, so I took
it.  The remaining are all legitimate fixes that match the "we take
fixes any time" criteria.

Here's the boilerplate:

The following changes since commit 7bca603a69c0c239654a8f0bcb99e1a60b30040c:

  RDMA/mlx5: Initialize return variable in case pagefault was skipped 
(2018-11-29 15:16:45 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git tags/for-linus

for you to fetch changes up to 37fbd834b4e492dc41743830cbe435f35120abd8:

  IB/core: Fix oops in netdev_next_upper_dev_rcu() (2018-12-12 12:14:49 -0500)


Pull request for 4.20-rc

- One performance regression for hfi1
- One kasan fix for hfi1
- A couple mlx5 fixes
- A core oops fix


Artemy Kovalyov (1):
  IB/mlx5: Fix implicit ODP interrupted page fault

Mark Zhang (1):
  IB/core: Fix oops in netdev_next_upper_dev_rcu()

Michael J. Ruhl (1):
  IB/hfi1: Fix a latency issue for small messages

Piotr Stankiewicz (1):
  IB/hfi1: Fix an out-of-bounds access in get_hw_stats

Yishai Hadas (1):
  IB/mlx5: Block DEVX umem from the non applicable cases

 drivers/infiniband/core/roce_gid_mgmt.c | 3 +++
 drivers/infiniband/hw/hfi1/chip.c   | 3 ++-
 drivers/infiniband/hw/hfi1/hfi.h| 2 ++
 drivers/infiniband/hw/hfi1/qp.c | 7 +++
 drivers/infiniband/hw/hfi1/verbs.c  | 2 +-
 drivers/infiniband/hw/mlx5/devx.c   | 4 +++-
 drivers/infiniband/hw/mlx5/odp.c| 9 -
 7 files changed, 22 insertions(+), 8 deletions(-)



-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: linux-next: manual merge of the mlx5-next tree with the rdma tree

2018-12-04 Thread Doug Ledford

FWIW this will go away in a day or two.  I merged mlx5-next into rdma
for-next in order to take a series that depended on it.

On Wed, 2018-12-05 at 12:07 +1100, Stephen Rothwell wrote:
> Hi Leon,
> 
> Today's linux-next merge of the mlx5-next tree got a conflict in:
> 
>   drivers/infiniband/hw/mlx5/main.c
>   drivers/infiniband/hw/mlx5/mlx5_ib.h
> 
> between commit:
> 
>   36e235c88299 ("RDMA/mlx5: Use the uapi disablement APIs instead of code")
> 
> from the rdma tree and commit:
> 
>   81773ce5f07f ("RDMA/mlx5: Use stages for callback to setup and release 
> DEVX")
> 
> from the mlx5-next tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
> 
-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH] RDMA/uverbs: fix ptr_ret.cocci warnings

2018-12-03 Thread Doug Ledford

On Thu, 2018-11-29 at 16:37 -0700, Jason Gunthorpe wrote:
> On Wed, Nov 28, 2018 at 07:21:30AM +0800, kbuild test robot wrote:
> > From: kbuild test robot 
> > 
> > drivers/infiniband/core/uverbs_cmd.c:1095:1-3: WARNING: PTR_ERR_OR_ZERO can 
> > be used
> > 
> > 
> >  Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
> > 
> > Generated by: scripts/coccinelle/api/ptr_ret.cocci
> > 
> > Fixes: 7106a9769715 ("RDMA/uverbs: Make write() handlers return 0 on 
> > success")
> > Signed-off-by: kbuild test robot 
> > ---
> 
> applied to for-next, thanks
> 
> Jason

This caused a conflict with your make write() handlers use a consistent
flow series, which I fixed up during git am run.  Just FYI in case you
want to check out the conflict spot as a double check (but it was a
simple fixup).

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH] RDMA/hns: Use 64-bit arithmetic instead of 32-bit

2018-10-18 Thread Doug Ledford

On Thu, 2018-10-18 at 14:01 +0300, Leon Romanovsky wrote:
> On Thu, Oct 18, 2018 at 10:02:58AM +0200, Gustavo A. R. Silva wrote:
> > Cast *max_num_sg* to u64 in order to give the compiler complete
> > information about the proper arithmetic to use.
> > 
> > Notice that such variable is used in a context that expects an
> > expression of type u64 (64 bits, unsigned) and the following
> > expression is currently being evaluated using 32-bit
> > arithmetic:
> 
> And what is wrong with that?
> Please fix static analyzer tool instead of fixing proper C code.

Judging on the static analyzer tool's message, I don't see anything
wrong with it.  The code contains a potential unintentional overflow
error.  The author might have been well aware of the overflow and not
cared and in that case this is valid C, but the analyzer has no way of
knowing that, so it flags it for review.  To silence the checker you
could either cast the arithmetic to u64, or cast length to u32.  Either
would clear up the ambiguity.  I guess I'm not seeing why you would
blame the static checker in this case, it did the best it is possible
for it to do.

> Thanks
> 
> > 
> > length = max_num_sg * page_size;
> > 
> > Addresses-Coverity-ID: 1474517 ("Unintentional integer overflow")
> > Signed-off-by: Gustavo A. R. Silva 
> > ---
> >  drivers/infiniband/hw/hns/hns_roce_mr.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c 
> > b/drivers/infiniband/hw/hns/hns_roce_mr.c
> > index 521ad2a..d479d5e 100644
> > --- a/drivers/infiniband/hw/hns/hns_roce_mr.c
> > +++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
> > @@ -1219,7 +1219,7 @@ struct ib_mr *hns_roce_alloc_mr(struct ib_pd *pd, 
> > enum ib_mr_type mr_type,
> > int ret;
> > 
> > page_size = 1 << (hr_dev->caps.pbl_buf_pg_sz + PAGE_SHIFT);
> > -   length = max_num_sg * page_size;
> > +   length = (u64)max_num_sg * page_size;
> > 
> > if (mr_type != IB_MR_TYPE_MEM_REG)
> > return ERR_PTR(-EINVAL);
> > --
> > 2.7.4
> > 

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH v3 0/2] IB/mlx4: Enable debug print of SMPs and enhance output for MADs

2018-10-16 Thread Doug Ledford

On Tue, 2018-10-09 at 15:27 +0200, Håkon Bugge wrote:
> SMPs were not printed at all. Added printing of port number and TID to
> all MADs
> 
> 
> Håkon Bugge (2):
>   IB/mlx4: Enable debug print of SMPs
>   IB/mlx4: Add port and TID to MAD debug print
> 
>  drivers/infiniband/hw/mlx4/mad.c | 20 +++-
>  1 file changed, 11 insertions(+), 9 deletions(-)
> 
> --
> 
> v1 -> v2
> * Fixed incorrect format for printing the TID for the second patch
> v2 -> v3
>* Made v3 consistently for cover-letter and the two patches, my v2
>  was partly original submission plus a v2. That probably fooled
>  both the kbuild bot and patchwork
> 
> 2.14.3

Thanks, applied.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH] RDMA/ucma: Fix Spectre v1 vulnerability

2018-10-16 Thread Doug Ledford

On Tue, 2018-10-16 at 16:59 +0200, Gustavo A. R. Silva wrote:
> hdr.cmd can be indirectly controlled by user-space, hence leading to
> a potential exploitation of the Spectre variant 1 vulnerability.
> 
> This issue was detected with the help of Smatch:
> 
> drivers/infiniband/core/ucma.c:1686 ucma_write() warn: potential
> spectre issue 'ucma_cmd_table' [r] (local cap)
> 
> Fix this by sanitizing hdr.cmd before using it to index
> ucm_cmd_table.
> 
> Notice that given that speculation windows are large, the policy is
> to kill the speculation on the first load and not worry if it can be
> completed with a dependent load/store [1].
> 
> [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Gustavo A. R. Silva 

Thanks, applied to for-rc.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH] IB/ucm: Fix Spectre v1 vulnerability

2018-10-16 Thread Doug Ledford

On Tue, 2018-10-16 at 16:32 +0200, Gustavo A. R. Silva wrote:
> hdr.cmd can be indirectly controlled by user-space, hence leading to
> a potential exploitation of the Spectre variant 1 vulnerability.
> 
> This issue was detected with the help of Smatch:
> 
> drivers/infiniband/core/ucm.c:1127 ib_ucm_write() warn: potential
> spectre issue 'ucm_cmd_table' [r] (local cap)
> 
> Fix this by sanitizing hdr.cmd before using it to index
> ucm_cmd_table.
> 
> Notice that given that speculation windows are large, the policy is
> to kill the speculation on the first load and not worry if it can be
> completed with a dependent load/store [1].
> 
> [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Gustavo A. R. Silva 

Thanks, applied to for-rc.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH rdma-next 00/18] RDMA: Add support for ib_device_ops

2018-10-09 Thread Doug Ledford

On Tue, 2018-10-09 at 14:46 -0400, Jason Gunthorpe wrote:
> 
> 
> On Tue., Oct. 9, 2018, 2:44 p.m. Kamal Heib,  wrote:
> > On Tue, Oct 09, 2018 at 02:31:27PM -0400, Doug Ledford wrote:
> > > On Tue, 2018-10-09 at 19:27 +0300, Kamal Heib wrote:
> > > > This patchset introduce a new structure that will contain all the
> > > > infiniband device operations, the structure will be used by the
> > > > providers to initialize their supported operations. This patchset also
> > > > includes the required changes in the core and ulps to start using it.
> > > > 
> > > > Thanks,
> > > > Kamal
> > > 
> > > Hi Kamal,
> > > 
> > > Please repost this to linux-r...@vger.kernel.org as that's how this gets
> > > into patchworks.
> > >
> > 
> > Hi Doug,
> > 
> > Oops, My bad instead of picking the linux-rdma email from get_maintainer.pl 
> > output
> > I picked the linux-kernel email, I'll repost them soon and thanks for your 
> > reply.
> 
> Wait for comments before reposting such a large series..

It didn't go to linux-rdma at all, so many of the people that might
comment on it don't even know it exists, so I'm not sure how many
comments he ought to wait for ;-)

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH rdma-next 00/18] RDMA: Add support for ib_device_ops

2018-10-09 Thread Doug Ledford

On Tue, 2018-10-09 at 19:27 +0300, Kamal Heib wrote:
> This patchset introduce a new structure that will contain all the
> infiniband device operations, the structure will be used by the
> providers to initialize their supported operations. This patchset also
> includes the required changes in the core and ulps to start using it.
> 
> Thanks,
> Kamal

Hi Kamal,

Please repost this to linux-r...@vger.kernel.org as that's how this gets
into patchworks.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH] IB/nes: Remove unnecessary parentheses

2018-09-21 Thread Doug Ledford

On Wed, 2018-09-19 at 20:29 -0700, Nathan Chancellor wrote:
> Clang warns when more than one set of parentheses are used in single
> conditional statements.
> 
> drivers/infiniband/hw/nes/nes_hw.c:1446:27: warning: equality comparison
> with extraneous parentheses [-Wparentheses-equality]
> } while ((temp_phy_data2 == temp_phy_data));
>   ~~~^~~~
> drivers/infiniband/hw/nes/nes_hw.c:1446:27: note: remove extraneous
> parentheses around the comparison to silence this warning
> } while ((temp_phy_data2 == temp_phy_data));
>  ~   ^   ~
> drivers/infiniband/hw/nes/nes_hw.c:1446:27: note: use '=' to turn this
> equality comparison into an assignment
> } while ((temp_phy_data2 == temp_phy_data));
>  ^~
>  =
> 
> Remove the unnecessary parentheses to silence this warning.
> 
> Reported-by: Nick Desaulniers 
> Signed-off-by: Nathan Chancellor 

Applied to for-next, thanks.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH] RDMA/cxgb4: remove redundant null pointer check before kfree_skb

2018-09-21 Thread Doug Ledford

On Thu, 2018-09-20 at 17:52 +0800, zhong jiang wrote:
> kfree_skb has taken the null pointer into account. hence it is safe
> to remove the redundant null pointer check before kfree_skb.
> 
> Signed-off-by: zhong jiang 

Applied to for-next, thanks.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH] IB/mlx4: Remove unnecessary parentheses

2018-09-21 Thread Doug Ledford

On Wed, 2018-09-19 at 20:32 -0700, Nathan Chancellor wrote:
> Clang warns when more than one set of parentheses are used in single
> conditional statements.
> 
> drivers/infiniband/hw/mlx4/mcg.c:676:16: warning: equality comparison
> with extraneous parentheses [-Wparentheses-equality]
> if ((method == IB_MGMT_METHOD_GET_RESP)) {
>  ~~~^~
> drivers/infiniband/hw/mlx4/mcg.c:676:16: note: remove extraneous
> parentheses around the comparison to silence this warning
> if ((method == IB_MGMT_METHOD_GET_RESP)) {
> ~   ^ ~
> drivers/infiniband/hw/mlx4/mcg.c:676:16: note: use '=' to turn this
> equality comparison into an assignment
> if ((method == IB_MGMT_METHOD_GET_RESP)) {
> ^~
> =
> 
> Remove the unnecessary parentheses to silence this warning.
> 
> Reported-by: Nick Desaulniers 
> Signed-off-by: Nathan Chancellor 

Applied to for-next, thanks.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH] ucma: fix a use-after-free in ucma_resolve_ip()

2018-09-13 Thread Doug Ledford

On Wed, 2018-09-12 at 16:27 -0700, Cong Wang wrote:
> There is a race condition between ucma_close() and ucma_resolve_ip():
> 
> CPU0CPU1
> ucma_resolve_ip():  ucma_close():
> 
> ctx = ucma_get_ctx(file, cmd.id);
> 
> list_for_each_entry_safe(ctx, tmp, &file->ctx_list, list) {
> mutex_lock(&mut);
> idr_remove(&ctx_idr, ctx->id);
> mutex_unlock(&mut);
> ...
> mutex_lock(&mut);
> if (!ctx->closing) {
> mutex_unlock(&mut);
> rdma_destroy_id(ctx->cm_id);
> ...
> ucma_free_ctx(ctx);
> 
> ret = rdma_resolve_addr();
> ucma_put_ctx(ctx);
> 
> Before idr_remove(), ucma_get_ctx() could still find the ctx
> and after rdma_destroy_id(), rdma_resolve_addr() may still
> access id_priv pointer. Also, ucma_put_ctx() may use ctx after
> ucma_free_ctx() too.
> 
> ucma_close() should call ucma_put_ctx() too which tests the
> refcnt and waits for the last one releasing it. The similar
> pattern is already used by ucma_destroy_id().
> 
> Reported-and-tested-by: syzbot+da2591e115d57a9cb...@syzkaller.appspotmail.com
> Reported-by: syzbot+cfe3c1e8ef634ba89...@syzkaller.appspotmail.com
> Cc: Jason Gunthorpe 
> Cc: Doug Ledford 
> Cc: Leon Romanovsky 
> Signed-off-by: Cong Wang 

Thanks, applied to for-rc.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH -next] RDMA: Remove duplicated include from ib_addr.h

2018-09-13 Thread Doug Ledford

On Thu, 2018-09-13 at 18:58 +0300, Leon Romanovsky wrote:
> On Thu, Sep 13, 2018 at 09:47:52PM +0800, YueHaibing wrote:
> > Remove duplicated include.
> > 
> > Signed-off-by: YueHaibing 
> > ---
> >  include/rdma/ib_addr.h | 1 -
> >  1 file changed, 1 deletion(-)
> > 
> 
> Thanks,
> Reviewed-by: Leon Romanovsky 

Thanks, applied to for-next.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part

Re: [PATCH] infiniband: nes: add unlikely() to assert()

2018-09-07 Thread Doug Ledford

On Wed, 2018-09-05 at 09:01 +0300, Leon Romanovsky wrote:
> On Fri, Aug 31, 2018 at 10:24:18PM +0300, Igor Stoppa wrote:
> > Typically the assert is expected to not fail.
> 
> This whole assert can be removed.
> 
> > 
> > Signed-off-by: Igor Stoppa 
> > Acked-by: Doug Ledford 
> 
> Most probably that I missed discussion, but anyway, does this
> "Acked-by" come from internal or external discussion?

This patch was part of a larger series on lkml.  In that context, I
acked it so that the series could be applied by whomever took it (it
didn't belong on rdma-list as a series since only one patch out of some
large number touched rdma files).  Now it is being resent as not part of
a series, but my ack was preserved.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

Re: [PATCH 01/23] infiniband: nes: add unlikely() to assert()

2018-08-31 Thread Doug Ledford

On Fri, 2018-08-31 at 01:34 +0300, Igor Stoppa wrote:
> Typically the assert is expected to not fail.
> 
> Signed-off-by: Igor Stoppa 
> Cc: Chien Tung 
> Cc: Roland Dreier 
> Cc: Faisal Latif 
> Cc: Doug Ledford 
> Cc: Jason Gunthorpe 

Acked-by: Doug Ledford 

> ---
>  drivers/infiniband/hw/nes/nes.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/hw/nes/nes.h b/drivers/infiniband/hw/nes/nes.h
> index bedaa02749fb..d2d0098f38e0 100644
> --- a/drivers/infiniband/hw/nes/nes.h
> +++ b/drivers/infiniband/hw/nes/nes.h
> @@ -151,7 +151,7 @@ do { \
>  
>  #define assert(expr) \
>  do { \
> - if (!(expr)) { \
> + if (unlikely(!(expr))) { \
>   printk(KERN_ERR PFX "Assertion failed! %s, %s, %s, line %d\n", \
>      #expr, __FILE__, __func__, __LINE__); \
>   } \

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

Re: [RDMA bug] KASAN: use-after-free Read in __list_del_entry_valid (4)

2018-08-23 Thread Doug Ledford

On Thu, 2018-08-23 at 16:39 +, Parav Pandit wrote:
> > -Original Message-
> > From: Jason Gunthorpe 
> > Sent: Thursday, August 23, 2018 9:55 AM
> > To: Eric Biggers 
> > Cc: Doug Ledford ; linux-r...@vger.kernel.org;
> > dasaratharaman.chandramo...@intel.com; Leon Romanovsky
> > ; linux-kernel@vger.kernel.org; Mark Bloch
> > ; Moni Shoua ; Parav Pandit
> > ; syzkaller-b...@googlegroups.com; syzbot
> > 
> > Subject: Re: [RDMA bug] KASAN: use-after-free Read in __list_del_entry_valid
> > (4)
> > 
> > On Wed, Aug 22, 2018 at 11:16:31PM -0700, Eric Biggers wrote:
> > > Hello RDMA / InfiniBand maintainers,
> > > 
> > > This is an RDMA bug and it still occurs on Linus' tree as of today
> > > (commit 815f0ddb346c1960).
> > > 
> > > I've also simplified the reproducer for it; see below after the original 
> > > report.
> > > Apparently it involves a race between RDMA_USER_CM_CMD_RESOLVE_IP
> > 
> > and
> > > RDMA_USER_CM_CMD_LISTEN.
> > 
> > That is an amazing reproducer!
> > 
> > I have a feeling this is the same cause as all the other syzkaller bugs in 
> > this code:
> > lack of any sane locking at all :\
> > 
> > We've talked about chucking a big lock around this whole thing, but nobody 
> > has
> > done it yet.. It isn't so simple.
> > 
> 
> I had some code in which reduces three locks (handler_lock, qp_mutex, 
> id_lock) to single mutex to protect the cm_id and protects every exported 
> symbol of rdmacm which works on cm_id.
> But not ready enough to post it as patch yet. Lot of tests required before I 
> get there and some refactor too before that.

Does it finally address the fact that the rdmacm code was written so
that it was always synchronous but RoCE src gid (I think that's what it
was, I'm typing this from long ago memory) lookup broke that assumption?

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

Re: [PATCH] iw_cxgb4: add INFINIBAND_ADDR_TRANS dependency

2018-05-31 Thread Doug Ledford

On Wed, 2018-05-30 at 21:03 -0700, Greg Thelen wrote:
> On Wed, May 30, 2018 at 4:01 PM Jason Gunthorpe  wrote:
> 
> > On Thu, May 31, 2018 at 12:40:54AM +0200, Arnd Bergmann wrote:
> > > > On 5/30/2018 5:25 PM, Jason Gunthorpe wrote:
> > > > > On Wed, May 30, 2018 at 05:10:35PM -0500, Steve Wise wrote:
> > > > > > 
> > > > > > On 5/30/2018 5:04 PM, Jason Gunthorpe wrote:
> > > > > > > On Wed, May 30, 2018 at 11:58:18PM +0200, Arnd Bergmann wrote:
> > > > > > > > The newly added fill_res_ep_entry function fails to link if
> > > > > > > > CONFIG_INFINIBAND_ADDR_TRANS is not set:
> > > > > > > > 
> > > > > > > > drivers/infiniband/hw/cxgb4/restrack.o: In function
> 
> `fill_res_ep_entry':
> > > > > > > > restrack.c:(.text+0x3cc): undefined reference to 
> > > > > > > > `rdma_res_to_id'
> > > > > > > > restrack.c:(.text+0x3d0): undefined reference to `rdma_iw_cm_id'
> > > > > > > > 
> > > > > > > > This adds a Kconfig dependency for the driver.
> > > > > > > > 
> > > > > > > > Fixes: 116aeb887371 ("iw_cxgb4: provide detailed
> 
> provider-specific CM_ID information")
> > > > > > > > Signed-off-by: Arnd Bergmann 
> > > > > > > >  drivers/infiniband/hw/cxgb4/Kconfig | 1 +
> > > > > > > >  1 file changed, 1 insertion(+)
> > > > > > > 
> > > > > > > Oh, I think we need to solve this with maybe a header fill null
> 
> stub
> > > > > > > instead..
> > > > > > > 
> > > > > > > We don't want to disable drivers just because a user interface is
> > > > > > > disabled.
> > > > > > > 
> > > > > > 
> > > > > > Why does CONFIG_INFINIBAND_ADDR_TRANS disable building rdma_cm.ko?
> 
> That
> > > > > > is not correct.
> > > > > 
> > > > > That seems like a reasonable thing to do..
> > > > 
> > > > rdma_ucm.ko is for usermode users, rdma_cm.ko is for kernel users, and
> > > > is required for iwarp drivers.  It seems rdma_cm.ko is not being
> > > > compiled if ADDR_TRANS is not set.
> > I think the intention was to completely disable rdma-cm, including all
> > support for rx'ing remote packets? Greg?
> 
> Yes.  That's my goal when INFINIBAND_ADDR_TRANS is unset.
> 
> > If this is required for iwarp then Arnd's patch is probably the right
> > way to go..
> > Jason
> 
> Agreed.
> Acked-by: Greg Thelen 

If that's the case, then there should be a NOTE: in the Kconfig that
disabling the connection manager completely disables iWARP hardware.

I don't really think I like this approach though.  At a minimum if you
are going to make iWARP totally dependent on rdmacm, then there is zero
reason that iw_cm.o should be part of the obj-$(CONFIG_INFINIBAND)
Makefile recipe when ADDR_TRANS is disabled.

We can take this patch as a band-aid, but IMO it's either incomplete or
simply not the right solution.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

Re: [PATCH] RDMA/qedr: fix spelling mistake: "adrresses" -> "addresses"

2018-05-31 Thread Doug Ledford

On Wed, 2018-05-30 at 10:40 +0100, Colin King wrote:
> From: Colin Ian King 
> 
> Trivial fix to spelling mistake in DP_ERR error message
> 
> Signed-off-by: Colin Ian King 

Thanks, applied.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

Re: [PATCH 6/6] infiniband: qplib_fp: Use dev_fmt

2018-05-15 Thread Doug Ledford

On Wed, 2018-05-09 at 08:15 -0700, Joe Perches wrote:
> Convert the embedded dev_ uses to use the new dev_fmt
> prefix and remove the prefix from the formats.
> 
> Miscellanea:
> 
> o Add missing terminating newlines to avoid possible interleaving
> o Realign arguments
> 
> Signed-off-by: Joe Perches 

I don't see any problem with this patch.  For the IB part:

Acked-by: Doug Ledford 

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

Re: [PATCH] iw_cxgb4: Fix an error handling path in 'c4iw_get_dma_mr()'

2018-05-09 Thread Doug Ledford

On Tue, 2018-05-08 at 07:44 +0200, Christophe JAILLET wrote:
> The error handling path of 'c4iw_get_dma_mr()' does not free resources
> in the correct order.
> If an error occures, it can leak 'mhp->wr_waitp'.
> 
> Fixes: a3f12da0e99a ("iw_cxgb4: allocate wait object for each memory object")
> Signed-off-by: Christophe JAILLET 

Applied to for-rc, thanks.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

Re: [PATCH v4] nvmet,rxe: defer ip datagram sending to tasklet

2018-05-09 Thread Doug Ledford

On Tue, 2018-05-08 at 11:02 +0200, Alexandru Moise wrote:
> This addresses 3 separate problems:
> 
> 1. When using NVME over Fabrics we may end up sending IP
> packets in interrupt context, we should defer this work
> to a tasklet.
> 
> [   50.939957] WARNING: CPU: 3 PID: 0 at kernel/softirq.c:161 
> __local_bh_enable_ip+0x1f/0xa0
> [   50.942602] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Tainted: G
> W 4.17.0-rc3-ARCH+ #104
> [   50.945466] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> 1.11.0-20171110_100015-anatol 04/01/2014
> [   50.948163] RIP: 0010:__local_bh_enable_ip+0x1f/0xa0
> [   50.949631] RSP: 0018:88009c183900 EFLAGS: 00010006
> [   50.951029] RAX: 80010403 RBX: 0200 RCX: 
> 0001
> [   50.952636] RDX:  RSI: 0200 RDI: 
> 817e04ec
> [   50.954278] RBP: 88009c183910 R08: 0001 R09: 
> 0614
> [   50.956000] R10: ea00021d5500 R11: 0001 R12: 
> 817e04ec
> [   50.957779] R13:  R14: 88009566f400 R15: 
> 8800956c7000
> [   50.959402] FS:  () GS:88009c18() 
> knlGS:
> [   50.961552] CS:  0010 DS:  ES:  CR0: 80050033
> [   50.963798] CR2: 55c4ec0ccac0 CR3: 02209001 CR4: 
> 000606e0
> [   50.966121] Call Trace:
> [   50.966845]  
> [   50.967497]  __dev_queue_xmit+0x62d/0x690
> [   50.968722]  dev_queue_xmit+0x10/0x20
> [   50.969894]  neigh_resolve_output+0x173/0x190
> [   50.971244]  ip_finish_output2+0x2b8/0x370
> [   50.972527]  ip_finish_output+0x1d2/0x220
> [   50.973785]  ? ip_finish_output+0x1d2/0x220
> [   50.975010]  ip_output+0xd4/0x100
> [   50.975903]  ip_local_out+0x3b/0x50
> [   50.976823]  rxe_send+0x74/0x120
> [   50.977702]  rxe_requester+0xe3b/0x10b0
> [   50.978881]  ? ip_local_deliver_finish+0xd1/0xe0
> [   50.980260]  rxe_do_task+0x85/0x100
> [   50.981386]  rxe_run_task+0x2f/0x40
> [   50.982470]  rxe_post_send+0x51a/0x550
> [   50.983591]  nvmet_rdma_queue_response+0x10a/0x170
> [   50.985024]  __nvmet_req_complete+0x95/0xa0
> [   50.986287]  nvmet_req_complete+0x15/0x60
> [   50.987469]  nvmet_bio_done+0x2d/0x40
> [   50.988564]  bio_endio+0x12c/0x140
> [   50.989654]  blk_update_request+0x185/0x2a0
> [   50.990947]  blk_mq_end_request+0x1e/0x80
> [   50.991997]  nvme_complete_rq+0x1cc/0x1e0
> [   50.993171]  nvme_pci_complete_rq+0x117/0x120
> [   50.994355]  __blk_mq_complete_request+0x15e/0x180
> [   50.995988]  blk_mq_complete_request+0x6f/0xa0
> [   50.997304]  nvme_process_cq+0xe0/0x1b0
> [   50.998494]  nvme_irq+0x28/0x50
> [   50.999572]  __handle_irq_event_percpu+0xa2/0x1c0
> [   51.000986]  handle_irq_event_percpu+0x32/0x80
> [   51.002356]  handle_irq_event+0x3c/0x60
> [   51.003463]  handle_edge_irq+0x1c9/0x200
> [   51.004473]  handle_irq+0x23/0x30
> [   51.005363]  do_IRQ+0x46/0xd0
> [   51.006182]  common_interrupt+0xf/0xf
> [   51.007129]  
> 
> 2. Work must always be offloaded to tasklet for rxe_post_send_kernel()
> when using NVMEoF in order to solve lock ordering between neigh->ha_lock
> seqlock and the nvme queue lock:
> 
> [   77.833783]  Possible interrupt unsafe locking scenario:
> [   77.833783]
> [   77.835831]CPU0CPU1
> [   77.837129]
> [   77.838313]   lock(&(&n->ha_lock)->seqcount);
> [   77.839550]local_irq_disable();
> [   77.841377]lock(&(&nvmeq->q_lock)->rlock);
> [   77.843222]lock(&(&n->ha_lock)->seqcount);
> [   77.845178]   
> [   77.846298] lock(&(&nvmeq->q_lock)->rlock);
> [   77.847986]
> [   77.847986]  *** DEADLOCK ***
> 
> 3. Same goes for the lock ordering between sch->q.lock and nvme queue lock:
> 
> [   47.634271]  Possible interrupt unsafe locking scenario:
> [   47.634271]
> [   47.636452]CPU0CPU1
> [   47.637861]
> [   47.639285]   lock(&(&sch->q.lock)->rlock);
> [   47.640654]local_irq_disable();
> [   47.642451]lock(&(&nvmeq->q_lock)->rlock);
> [   47.644521]lock(&(&sch->q.lock)->rlock);
> [   47.646480]   
> [   47.647263] lock(&(&nvmeq->q_lock)->rlock);
> [   47.648492]
> [   47.648492]  *** DEADLOCK ***
> 
> Using NVMEoF after this patch seems to finally be stable, without it,
> rxe eventually deadlocks the whole system and causes RCU stalls.
> 
> Signed-off-by: Alexandru Moise <00moses.alexande...@gmail.com>

Thanks, applied to for-rc.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

Re: [PATCH] IB: remove redundant INFINIBAND kconfig dependencies

2018-05-09 Thread Doug Ledford

On Thu, 2018-05-03 at 20:29 -0700, Greg Thelen wrote:
> INFINIBAND_ADDR_TRANS depends on INFINIBAND.  So there's no need for
> options which depend INFINIBAND_ADDR_TRANS to also depend on INFINIBAND.
> Remove the unnecessary INFINIBAND depends.
> 
> Signed-off-by: Greg Thelen 
> ---
>  drivers/infiniband/ulp/srpt/Kconfig | 2 +-
>  drivers/nvme/host/Kconfig   | 2 +-
>  drivers/nvme/target/Kconfig | 2 +-
>  drivers/staging/lustre/lnet/Kconfig | 2 +-
>  fs/cifs/Kconfig | 2 +-
>  net/9p/Kconfig  | 2 +-
>  net/rds/Kconfig | 2 +-
>  net/sunrpc/Kconfig  | 2 +-
>  8 files changed, 8 insertions(+), 8 deletions(-)

Thanks.  Since the original changes you submitted went through -rc, I'm
sending this cleanup there too.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

Re: linux-next: manual merge of the rdma tree with the rdma-fixes tree

2018-05-02 Thread Doug Ledford

On Wed, 2018-05-02 at 13:22 +0300, Leon Romanovsky wrote:
> On Mon, Apr 30, 2018 at 08:55:35PM -0400, Doug Ledford wrote:
> > On Tue, 2018-05-01 at 10:10 +1000, Stephen Rothwell wrote:
> > > Hi all,
> > > 
> > > Today's linux-next merge of the rdma tree got a conflict in:
> > > 
> > >   drivers/infiniband/sw/rxe/rxe_resp.c
> > > 
> > > between commit:
> > > 
> > >   9fd4350ba895 ("B/rxe: avoid double kfree_skb")
> > > 
> > > from the rdma-fixes tree and commit:
> > > 
> > >   2e47350789eb ("IB/rxe: optimize the function duplicate_request")
> > > 
> > > from the rdma tree.
> > > 
> > > I fixed it up (I think - see below) and can carry the fix as necessary.
> > > This is now fixed as far as linux-next is concerned, but any non trivial
> > > conflicts should be mentioned to your upstream maintainer when your tree
> > > is submitted for merging.  You may also want to consider cooperating
> > > with the maintainer of the conflicting tree to minimise any particularly
> > > complex conflicts.
> > > 
> > 
> > We will probably merge the for-rc branch into the for-next branch in the
> > next few days, at which point we will do the conflict resolution
> > ourselves and your need to carry anything should drop out.
> 
> Isn't "rdma/wip/for-testing" branch intended for this?

Not really.  It's there to provide a pre-merged branch for people to
test.  But, I've rarely seen a release cycle where, *sometime*, we
didn't get a patch set in the for-next that depends on changes in the
for-rc area, and in that case, you need to merge for-rc into for-next. 
If we don't have that this cycle, then you're right, I won't merge for-
rc into for-next and for-testing will be the throwaway merge branch.  On
occasion, if the merge fixups needed between for-rc and for-next get too
difficult for a non-RDMA person to sus out, then we will do a merge of
for-rc into for-next simply so we can provide the right merge fixup, but
I doubt this merge fixup rises to that level.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

Re: [PATCH 3/5] ib_srpt: depend on INFINIBAND_ADDR_TRANS

2018-05-01 Thread Doug Ledford

On Tue, 2018-05-01 at 03:08 +, Greg Thelen wrote:
> On Mon, Apr 30, 2018 at 4:35 PM Jason Gunthorpe  wrote:
> 
> > On Wed, Apr 25, 2018 at 03:33:39PM -0700, Greg Thelen wrote:
> > > INFINIBAND_SRPT code depends on INFINIBAND_ADDR_TRANS provided symbols.
> > > So declare the kconfig dependency.  This is necessary to allow for
> > > enabling INFINIBAND without INFINIBAND_ADDR_TRANS.
> > > 
> > > Signed-off-by: Greg Thelen 
> > > Cc: Tarick Bedeir 
> > >  drivers/infiniband/ulp/srpt/Kconfig | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/infiniband/ulp/srpt/Kconfig
> 
> b/drivers/infiniband/ulp/srpt/Kconfig
> > > index 31ee83d528d9..fb8b7182f05e 100644
> > > +++ b/drivers/infiniband/ulp/srpt/Kconfig
> > > @@ -1,6 +1,6 @@
> > >  config INFINIBAND_SRPT
> > >   tristate "InfiniBand SCSI RDMA Protocol target support"
> > > - depends on INFINIBAND && TARGET_CORE
> > > + depends on INFINIBAND && INFINIBAND_ADDR_TRANS && TARGET_CORE
> > Isn't INFINIBAND && INFINIBAND_ADDR_TRANS a bit redundant? Can't have
> > INFINIBAND_ADDR_TRANS without INFINIBAND.
> 
> By kconfig INFINIBAND_ADDR_TRANS depends on INFINIBAND.  So yes, it seems
> redundant.  I don't know if anyone has designs to break this dependency and
> allow for ADDR_TRANS without INFINIBAND.

No, not at the moment (and I'm not sure we ever would, it would only
happen if the subsystem itself became something like RDMA and INFINIBAND
specifically related only to the INFINIBAND link layer support, in which
case you might want to enable RoCE without INFINIBAND or something like
that, but no one has plans to do that as far as I'm aware).

>   Assuming not, I'd be willing to
> amend my series removing redundant INFINIBAND and a followup series to
> remove it from similar depends.  Though I'm not familiar with rdma dev tree
> lifecycle.  Is rdma/for-rc a throw away branch (akin to linux-next), or
> will it be merged into linus/master?  If throwaway, then we can amend
> its patches, otherwise followups will be needed.

Followups will be needed.

> Let me know what you'd prefer.  Thanks.
> 
> FYI from v4.17-rc3:
> drivers/staging/lustre/lnet/Kconfig:  depends on LNET && PCI && INFINIBAND
> && INFINIBAND_ADDR_TRANS
> net/9p/Kconfig:   depends on INET && INFINIBAND && INFINIBAND_ADDR_TRANS
> net/rds/Kconfig:  depends on RDS && INFINIBAND && INFINIBAND_ADDR_TRANS
> net/sunrpc/Kconfig:   depends on SUNRPC && INFINIBAND &&
> INFINIBAND_ADDR_TRANS

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

Re: [PATCH] RDMA/qedr: fix spelling mistake: "failes" -> "fails"

2018-05-01 Thread Doug Ledford

On Tue, 2018-05-01 at 09:25 +0100, Colin King wrote:
> From: Colin Ian King 
> 
> Trivial fix to spelling mistake in DP_ERR error message
> 
> Signed-off-by: Colin Ian King 

Thanks, applied.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

Re: [PATCH] IB/cxgb4: use skb_put_zero()/__skb_put_zero

2018-05-01 Thread Doug Ledford

On Sat, 2018-04-28 at 06:53 -0500, Steve Wise wrote:
> > -Original Message-
> > From: YueHaibing 
> > Sent: Saturday, April 28, 2018 2:31 AM
> > To: sw...@chelsio.com; dledf...@redhat.com; j...@ziepe.ca;
> > mo...@mellanox.com
> > Cc: linux-r...@vger.kernel.org; linux-kernel@vger.kernel.org; YueHaibing
> > 
> > Subject: [PATCH] IB/cxgb4: use skb_put_zero()/__skb_put_zero
> > 
> > Use the recently introduced helper to replace the pattern of
> > skb_put_zero/__skb_put() && memset().
> > 
> > Signed-off-by: YueHaibing 
> 
> Looks ok.
> 
> Reviewed-by: Steve Wise 
> 
> 

Applied to for-next, thanks.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

Re: linux-next: manual merge of the rdma tree with the rdma-fixes tree

2018-04-30 Thread Doug Ledford

On Tue, 2018-05-01 at 10:10 +1000, Stephen Rothwell wrote:
> Hi all,
> 
> Today's linux-next merge of the rdma tree got a conflict in:
> 
>   drivers/infiniband/sw/rxe/rxe_resp.c
> 
> between commit:
> 
>   9fd4350ba895 ("B/rxe: avoid double kfree_skb")
> 
> from the rdma-fixes tree and commit:
> 
>   2e47350789eb ("IB/rxe: optimize the function duplicate_request")
> 
> from the rdma tree.
> 
> I fixed it up (I think - see below) and can carry the fix as necessary.
> This is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
> 

We will probably merge the for-rc branch into the for-next branch in the
next few days, at which point we will do the conflict resolution
ourselves and your need to carry anything should drop out.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

Re: [PATCH] IB/core: Make ib_mad_client_id atomic

2018-04-30 Thread Doug Ledford

On Mon, 2018-04-30 at 08:49 -0600, Jason Gunthorpe wrote:
> On Mon, Apr 23, 2018 at 10:16:18PM +0300, jackm wrote:
> 
> > > > TIDs need to be globally unique on the entire machine.  
> > Jason, that is not exactly correct.
> 
> The expecation for /dev/umad users is that they all receive locally
> unique TID prefixes. The kernel may be OK to keep things port-specific
> but it is slightly breaking the API we are presenting to userspace to
> allow them to alias..
> 
> Jason

Would people be happier with this commit message then:

IB/core: Make ib_mad_client_id atomic

Currently, the kernel protects access to the agent ID allocator on a per
port basis using a spinlock, so it is impossible for two apps/threads on
the same port to get the same TID, but it is entirely possible for two
threads on different ports to end up with the same TID.  

As this can be confusing (regardless of it being legal according to the
IB Spec 1.3, C13-18.1.1, in section 13.4.6.4 - TransactionID usage),
and as the rdma-core user space API for /dev/umad devices implies unique
TIDs even across ports, make the TID an atomic type so that no two
allocations, regardless of port number, will be the same.

Signed-off-by: Håkon Bugge 
Reviewed-by: Jack Morgenstein 
Reviewed-by: Ira Weiny 
Reviewed-by: Zhu Yanjun 
Signed-off-by: Doug Ledford 

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

Re: [PATCH] IB/core: Make ib_mad_client_id atomic

2018-04-27 Thread Doug Ledford

On Thu, 2018-04-26 at 20:51 +0200, Håkon Bugge wrote:
> > Jason is out this week.  I'll end up processing this one (probably later
> > today).  But I’ll fix up the commit message to suit my tastes when I do.
> 
> Thank you, Doug and Jack,

I reworded the commit message, let me know if you think I worded it
wrong:

commit 69f01b81539c62f3dd96f9f02138ad7b839a0c70 (HEAD -> k.o/wip/dl-for-rc)
Author: Håkon Bugge 
Date:   Wed Apr 18 16:24:50 2018 +0200

IB/core: Make ib_mad_client_id atomic

Currently, kernel protects access to the agent ID allocator on a per
port basis using a spinlock, so it is impossible for two apps/threads on
the same port to get the same TID, but it is entirely possible for two
threads on different ports to end up with the same TID.  As this can be
confusing (regardless of it being legal according to the IB Spec 1.3,
C13-18.1.1, in section 13.4.6.4 - TransactionID usage: "Then initiating
a new operation, MADHeader:TransactionID shall be set to such a value
that within that MAD the combination of TIG, SGID, and MgmtClass is
different from that of any other currently executing operation. If the
MAD does not have a GRH, its SLID is used in the combination in place of
an SGID." which guarantees we are legal because our different ports will
have different SGID/SLID creating a unique tuple even if the TIDs are
identical), and as we might want to open the TID allocator up to more
parallel usage later, make the TID an atomic type so that no two
allocations, regardless of port number, will be the same.

Signed-off-by: Håkon Bugge 
Reviewed-by: Jack Morgenstein 
Reviewed-by: Ira Weiny 
Reviewed-by: Zhu Yanjun 
    Signed-off-by: Doug Ledford 

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

Re: [PATCH] IB/rxe: add RXE_START_MASK for rxe_opcode IB_OPCODE_RC_SEND_ONLY_INV

2018-04-27 Thread Doug Ledford

On Thu, 2018-04-26 at 12:02 +0800, Yanjun Zhu wrote:
> On 2018/4/26 11:52, Jianchao Wang wrote:
> > w/o RXE_START_MASK, the last_psn of IB_OPCODE_RC_SEND_ONLY_INV
> > will not be updated in update_wqe_psn, and the corresponding
> > wqe will not be acked in rxe_completer due to its last_psn is
> > zero. Finally, the other wqe will also not be able to be acked,
> > because the wqe of IB_OPCODE_RC_SEND_ONLY_INV with last_psn 0
> > is still there. This causes large amount of io timeout when
> > nvmeof is over rxe.
> > 
> > Add RXE_START_MASK for IB_OPCODE_RC_SEND_ONLY_INV to fix this.
> > 
> > Signed-off-by: Jianchao Wang 
> 
> Reviewed-by: Zhu Yanjun 

Thanks, applied to for-rc.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

Re: [PATCH] RDMA/iwpm: fix memory leak on map_info

2018-04-27 Thread Doug Ledford

On Wed, 2018-04-25 at 17:24 +0100, Colin King wrote:
> From: Colin Ian King 
> 
> In the cases where iwpm_hash_bucket is NULL and where function
> get_mapinfo_hash_bucket returns NULL then the map_info is never added
> to hash_bucket_head and hence there is a leak of map_info. Fix this
> by nullifying hash_bucket_head and if that is null we know that
> that map_info was not added to hash_bucket_head and hence map_info
> should be free'd.
> 
> Detected by CoverityScan, CID#1222481 ("Resource Leak")
> 
> Fixes: 30dc5e63d6a5 ("RDMA/core: Add support for iWARP Port Mapper user space 
> service")
> Signed-off-by: Colin Ian King 

Thanks, applied to for-rc.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

Re: [PATCH] IB/ipoib: fix ipoib_start_xmit()'s return type

2018-04-27 Thread Doug Ledford

On Wed, 2018-04-25 at 18:35 +0300, Leon Romanovsky wrote:
> On Tue, Apr 24, 2018 at 03:15:47PM +0200, Luc Van Oostenryck wrote:
> > The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
> > which is a typedef for an enum type, but the implementation in this
> > driver returns an 'int'.
> > 
> > Fix this by returning 'netdev_tx_t' in this driver too.
> > 
> > Signed-off-by: Luc Van Oostenryck 
> > ---
> >  drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> 
> Thanks,
> Reviewed-by: Leon Romanovsky 

Thanks, applied to for-rc.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

Re: [PATCH] IB/nes: fix nes_netdev_start_xmit()'s return type

2018-04-27 Thread Doug Ledford

On Wed, 2018-04-25 at 18:35 +0300, Leon Romanovsky wrote:
> On Tue, Apr 24, 2018 at 08:36:12PM +0300, Leon Romanovsky wrote:
> > On Tue, Apr 24, 2018 at 03:15:45PM +0200, Luc Van Oostenryck wrote:
> > > The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
> > > which is a typedef for an enum type, but the implementation in this
> > > driver returns an 'int'.
> > > 
> > > Fix this by returning 'netdev_tx_t' in this driver too.
> > > 
> > > Signed-off-by: Luc Van Oostenryck 
> > > ---
> > >  drivers/infiniband/hw/nes/nes_nic.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > ipoib_start_xmit() needs the same fix.
> > 
> 
> Thanks,
> Reviewed-by: Leon Romanovsky 

Thanks, applied to for-rc.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

Re: [PATCH] infiniband: hw: qib: Change return type to vm_fault_t

2018-04-27 Thread Doug Ledford

On Tue, 2018-04-17 at 20:04 +0530, Souptick Joarder wrote:
> Use new return type vm_fault_t for fault handler. For
> now, this is just documenting that the function returns
> a VM_FAULT value rather than an errno. Once all instances
> are converted, vm_fault_t will become a distinct type.
> 
> Reference id -> 1c8f422059ae ("mm: change return type to
> vm_fault_t")

Thanks, applied to for-next.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

Re: [PATCH] infiniband: hw: hfi1: Change return type to vm_fault_t

2018-04-27 Thread Doug Ledford

On Tue, 2018-04-17 at 19:53 +0530, Souptick Joarder wrote:
> Use new return type vm_fault_t for fault handler. For
> now, this is just documenting that the function returns
> a VM_FAULT value rather than an errno. Once all instances
> are converted, vm_fault_t will become a distinct type.
> 
> Reference id -> 1c8f422059ae ("mm: change return type to
> vm_fault_t")
> 
> Signed-off-by: Souptick Joarder 

Thanks, applied to for-next.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

Re: [PATCH 0/6] IB: make INFINIBAND_ADDR_TRANS configurable

2018-04-27 Thread Doug Ledford

On Thu, 2018-04-26 at 11:19 -0700, Greg Thelen wrote:
> This series allows for CONFIG_INFINIBAND without
> CONFIG_INFINIBAND_ADDR_TRANS (aka RDMA communication manager).
> Fuzzing has been finding fair number of CM bugs.
> So provide an option to disable it in systems which don't need it.
> 
> Changes since last posting (https://lkml.org/lkml/2018/4/25/1266):
> - added ("ib_srp: depend on INFINIBAND_ADDR_TRANS") patch

[ snip explanation of series ]

Thanks for taking the time to research this out.  Series applied.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

Re: [PATCH for-rc] uapi: Fix SPDX tags for files referring to the 'OpenIB.org' license

2018-04-23 Thread Doug Ledford

On Sun, 2018-04-22 at 21:08 -0400, David Miller wrote:
> From: Jason Gunthorpe 
> Date: Fri, 20 Apr 2018 09:49:10 -0600
> 
> > Based on discussion with Kate Stewart this license is not a
> > BSD-2-Clause, but is now formally identified as Linux-OpenIB
> > by SPDX.
> > 
> > The key difference between the licenses is in the 'warranty'
> > paragraph.
> > 
> > if_infiniband.h refers to the 'OpenIB.org' license, but
> > does not include the text, instead it links to an obsolete
> > web site that contains a license that matches the BSD-2-Clause
> > SPX. There is no 'three clause' version of the OpenIB.org
> > license.
> > 
> > Signed-off-by: Jason Gunthorpe 
> 
> Acked-by: David S. Miller 
> 

Thanks, applied.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

signature.asc
Description: This is a digitally signed message part

1 2 3 4 5 6 >

1 - 100 of 502 matches

Mail list logo