Re: ASMedia ASM2142 USB host controller tries to DMA to address zero when doing bulk reads from multiple devices

2020-08-01 Thread Forest Crossman
On Wed, Jul 29, 2020 at 8:22 AM Oliver O'Halloran  wrote:
>
> On Tue, Jul 21, 2020 at 3:51 PM Forest Crossman  wrote:
> >
> > Hello, again!
> >
> > After fixing the issue in my previous thread using this patch[1], I
> > decided to do some stress-testing of the controller to make sure it
> > could handle my intended workloads and that there were no further DMA
> > address issues that would need to be fixed. Unfortunately, it looks
> > like there's still more work to be done: when I try to do long bulk
> > reads from multiple devices simultaneously, eventually the host
> > controller sends a DMA write to address zero, which then triggers EEH
> > in my POWER9 system, causing the controller card to get hotplug-reset,
> > which of course kills the disk-reading processes. For more details on
> > the EEH errors, you can see my kernel's EEH message log[2].
>
> Take the logged address with a grain of salt. If an error occurs while
> translating the DMA address the PHB logs all zeros as the "DMA
> Address" because it only keeps around the bits that it needs to fetch
> the next level of the TCE table. The EEH dump says the error is due to
> a TCE permission mismatch, so odds are the ASMedia controller is
> writing to an address that's already been DMA-unmapped, hence the
> logged address being zeros.

Interesting, that's good to know. I saw that the RXE_TCE_FESR had the
"TCE Response Page Access Error" bit set, and had originally assumed
that just meant the DMA to address zero was triggering that error
since it wasn't in a mapped page, but after reading that bit's
description again I think I understand it now.

> Sorry, I probably should have mentioned that quirk in the last mail.
>
> > The results of the various tests I performed are listed below.
> >
> > Test results (all failures are due to DMA writes to address zero, all
> > hubs are USB 3.0/3.1 Gen1 only, and all disks are accessed via the
> > usb-storage driver):
> > - Reading simultaneously from two or more disks behind a hub connected
> > to one port on the host controller:
> >   - FAIL after 20-50 GB of data transferred for each device.
> > - Reading simultaneously from two disks, each connected directly to
> > one port on the host controller:
> >   - FAIL after about 800 GB of data transferred for each device.
> > - Reading from one disk behind a hub connected to one port on the host
> > controller:
> >   - OK for at least 2.7 TB of data transferred (I didn't test the
> > whole 8 TB disk).
> > - Writing simultaneously to two FL2000 dongles (using osmo-fl2k's
> > "fl2k_test"), each connected directly to one port on the host
> > controller:
> >   - OK, was able to write several dozen terabytes to each device over
> > the course of a little over 21 hours.
> >
> > Seeing how simultaneous writes to multiple devices and reads from
> > single devices both seem to work fine, I assume that means this is
> > being caused by some race condition in the host controller firmware
> > when it responds to multiple read requests.
>
> Most likely. It's possible it's a platform specific race with DMA
> map/unmap too, but I think we would be seeing similar issues with
> other devices if it was.

Yeah, I have several other xHCI controllers connected to this system,
and I've never experienced this issue with any of them. If the problem
was a POWER-specific DMA map/unmap race I would expect to be having
problems with those controllers as well.

> > I also assume we're not
> > going to be able to convince ASMedia to both fix the bug in their
> > firmware and release the details on how to flash it from Linux, so I
> > guess we'll just have to figure out how to make the driver talk to the
> > controller in a way that avoids triggering the bad DMA write. As
> > before, I decided to try a little kernel hacking of my own before
> > sending this email, and tried separately enabling the
> > XHCI_BROKEN_STREAMS and XHCI_ASMEDIA_MODIFY_FLOWCONTROL quirks in an
> > attempt to fix this. As you might expect since you're reading this
> > message, neither of those quirks fixed the issue, nor did they even
> > make the transfers last any longer before failing.
> >
> > So now I've reached the limits of my understanding, and I need some
> > help devising a fix. If anyone has any comments to that effect, or any
> > questions about my hardware configuration, testing methodology, etc.,
> > please don't hesitate to air them. Also, if anyone needs me to perform
> > additional tests, or collect more log information, I'd be happy to do
> > that as well.
>
> I started writing a tool a while ago to use the internal trace bus to
> log incoming TLPs. Something like that might allow you to get a better
> idea what the faulting access pattern is, but you would still need to
> find a way to mitigate the issue. I'm not all that familiar with USB3
> so I'm not much help on that front.

Oh, interesting, I remember seeing the trace registers in the PHB4
spec, but I wasn't sure how to access them without writing a 

[powerpc:merge] BUILD SUCCESS 3f68564f1f5aca55654fda237fc01495bf050ce9

2020-08-01 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git merge
branch HEAD: 3f68564f1f5aca55654fda237fc01495bf050ce9  Automatic merge of 'master', 'next' and 'fixes' (2020-07-31 22:52)

elapsed time: 1494m

configs tested: 69
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

arm      defconfig
arm64    allyesconfig
arm64    defconfig
arm      allyesconfig
arm      allmodconfig
xtensa   xip_kc705_defconfig
arm      versatile_defconfig
arm      ezx_defconfig
ia64     allmodconfig
ia64     defconfig
ia64     allyesconfig
m68k     allmodconfig
m68k     defconfig
m68k     allyesconfig
nios2    defconfig
nds32    defconfig
nios2    allyesconfig
csky     defconfig
alpha    defconfig
alpha    allyesconfig
xtensa   allyesconfig
h8300    allyesconfig
arc      defconfig
sh       allmodconfig
parisc   defconfig
s390     allyesconfig
parisc   allyesconfig
s390     defconfig
arc      allyesconfig
nds32    allnoconfig
c6x      allyesconfig
i386     allyesconfig
sparc    allyesconfig
sparc    defconfig
i386     defconfig
mips     allyesconfig
mips     allmodconfig
powerpc  defconfig
powerpc  allyesconfig
powerpc  allmodconfig
powerpc  allnoconfig
i386     randconfig-a005-20200731
i386     randconfig-a004-20200731
i386     randconfig-a006-20200731
i386     randconfig-a002-20200731
i386     randconfig-a001-20200731
i386     randconfig-a003-20200731
x86_64   randconfig-a015-20200731
x86_64   randconfig-a014-20200731
x86_64   randconfig-a016-20200731
x86_64   randconfig-a012-20200731
x86_64   randconfig-a013-20200731
x86_64   randconfig-a011-20200731
i386     randconfig-a016-20200731
i386     randconfig-a012-20200731
i386     randconfig-a014-20200731
i386     randconfig-a015-20200731
i386     randconfig-a011-20200731
i386     randconfig-a013-20200731
riscv    allyesconfig
riscv    allnoconfig
riscv    defconfig
riscv    allmodconfig
x86_64   rhel
x86_64   allyesconfig
x86_64   rhel-7.6-kselftests
x86_64   defconfig
x86_64   rhel-8.3
x86_64   kexec

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


Re: [PATCH 06/15] powerpc: fadamp: simplify fadump_reserve_crash_area()

2020-08-01 Thread Hari Bathini

On 01/08/20 3:48 pm, Mike Rapoport wrote:
> On Thu, Jul 30, 2020 at 10:15:13PM +1000, Michael Ellerman wrote:
> > Mike Rapoport  writes:
> > > From: Mike Rapoport 
> > >
> > > fadump_reserve_crash_area() reserves memory from a specified base address
> > > till the end of the RAM.
> > >
> > > Replace iteration through the memblock.memory with a single call to
> > > memblock_reserve() with appropriate  that will take care of proper memory
> >                                      ^
> >                                      parameters?
> > > reservation.
> > >
> > > Signed-off-by: Mike Rapoport 
> > > ---
> > >  arch/powerpc/kernel/fadump.c | 20 +---
> > >  1 file changed, 1 insertion(+), 19 deletions(-)
> >
> > I think this looks OK to me, but I don't have a setup to test it easily.
> > I've added Hari to Cc who might be able to.
> >
> > But I'll give you an ack in the hope that it works :)
>
> Actually, I did some digging in the git log and the traversal was added
> there on purpose by the commit b71a693d3db3 ("powerpc/fadump: exclude
> memory holes while reserving memory in second kernel")

I was about to comment on the same :)
memblock_reserve() was being used until we ran into the issue talked
about in the above commit...

> Presuming this is still required I'm going to drop this patch and will

Yeah, it is still required..

> simply replace for_each_memblock() with for_each_mem_range() in v2.

Sounds right.

> > Acked-by: Michael Ellerman 
> >
> > > diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> > > index 78ab9a6ee6ac..2446a61e3c25 100644
> > > --- a/arch/powerpc/kernel/fadump.c
> > > +++ b/arch/powerpc/kernel/fadump.c
> > > @@ -1658,25 +1658,7 @@ int __init fadump_reserve_mem(void)
> > >  /* Preserve everything above the base address */
> > >  static void __init fadump_reserve_crash_area(u64 base)
> > >  {
> > > -   struct memblock_region *reg;
> > > -   u64 mstart, msize;
> > > -
> > > -   for_each_memblock(memory, reg) {
> > > -   mstart = reg->base;
> > > -   msize  = reg->size;
> > > -
> > > -   if ((mstart + msize) < base)
> > > -   continue;
> > > -
> > > -   if (mstart < base) {
> > > -   msize -= (base - mstart);
> > > -   mstart = base;
> > > -   }
> > > -
> > > -   pr_info("Reserving %lluMB of memory at %#016llx for preserving crash data",
> > > -   (msize >> 20), mstart);
> > > -   memblock_reserve(mstart, msize);
> > > -   }
> > > +   memblock_reserve(base, memblock_end_of_DRAM() - base);
> > >  }
> > >
> > >  unsigned long __init arch_reserved_kernel_pages(void)
> > > --
> > > 2.26.2

Thanks
Hari


Re: [PATCH 06/15] powerpc: fadamp: simplify fadump_reserve_crash_area()

2020-08-01 Thread Mike Rapoport
On Thu, Jul 30, 2020 at 10:15:13PM +1000, Michael Ellerman wrote:
> Mike Rapoport  writes:
> > From: Mike Rapoport 
> >
> > fadump_reserve_crash_area() reserves memory from a specified base address
> > till the end of the RAM.
> >
> > Replace iteration through the memblock.memory with a single call to
> > memblock_reserve() with appropriate  that will take care of proper memory
>  ^
>  parameters?
> > reservation.
> >
> > Signed-off-by: Mike Rapoport 
> > ---
> >  arch/powerpc/kernel/fadump.c | 20 +---
> >  1 file changed, 1 insertion(+), 19 deletions(-)
> 
> I think this looks OK to me, but I don't have a setup to test it easily.
> I've added Hari to Cc who might be able to.
> 
> But I'll give you an ack in the hope that it works :)

Actually, I did some digging in the git log and the traversal was added
there on purpose by the commit b71a693d3db3 ("powerpc/fadump: exclude
memory holes while reserving memory in second kernel")
Presuming this is still required I'm going to drop this patch and will
simply replace for_each_memblock() with for_each_mem_range() in v2.
 
> Acked-by: Michael Ellerman 
> 
> 
> > diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> > index 78ab9a6ee6ac..2446a61e3c25 100644
> > --- a/arch/powerpc/kernel/fadump.c
> > +++ b/arch/powerpc/kernel/fadump.c
> > @@ -1658,25 +1658,7 @@ int __init fadump_reserve_mem(void)
> >  /* Preserve everything above the base address */
> >  static void __init fadump_reserve_crash_area(u64 base)
> >  {
> > -   struct memblock_region *reg;
> > -   u64 mstart, msize;
> > -
> > -   for_each_memblock(memory, reg) {
> > -   mstart = reg->base;
> > -   msize  = reg->size;
> > -
> > -   if ((mstart + msize) < base)
> > -   continue;
> > -
> > -   if (mstart < base) {
> > -   msize -= (base - mstart);
> > -   mstart = base;
> > -   }
> > -
> > -   pr_info("Reserving %lluMB of memory at %#016llx for preserving crash data",
> > -   (msize >> 20), mstart);
> > -   memblock_reserve(mstart, msize);
> > -   }
> > +   memblock_reserve(base, memblock_end_of_DRAM() - base);
> >  }
> >  
> >  unsigned long __init arch_reserved_kernel_pages(void)
> > -- 
> > 2.26.2

-- 
Sincerely yours,
Mike.


Re: [PATCH v2 15/16] powerpc/powernv/sriov: Make single PE mode a per-BAR setting

2020-08-01 Thread Nathan Chancellor
On Wed, Jul 22, 2020 at 04:57:14PM +1000, Oliver O'Halloran wrote:
> Using single PE BARs to map an SR-IOV BAR is really a choice about what
> strategy to use when mapping a BAR. It doesn't make much sense for this to
> be a global setting since a device might have one large BAR which needs to
> be mapped with single PE windows and another smaller BAR that can be mapped
> with a regular segmented window. Make the segmented vs single decision a
> per-BAR setting and clean up the logic that decides which mode to use.
> 
> Signed-off-by: Oliver O'Halloran 
> ---
> v2: Dropped unused total_vfs variables in pnv_pci_ioda_fixup_iov_resources()
> Dropped bar_no from pnv_pci_iov_resource_alignment()
> Minor re-wording of comments.
> ---
>  arch/powerpc/platforms/powernv/pci-sriov.c | 131 ++---
>  arch/powerpc/platforms/powernv/pci.h   |  11 +-
>  2 files changed, 73 insertions(+), 69 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c b/arch/powerpc/platforms/powernv/pci-sriov.c
> index ce8ad6851d73..76215d01405b 100644
> --- a/arch/powerpc/platforms/powernv/pci-sriov.c
> +++ b/arch/powerpc/platforms/powernv/pci-sriov.c
> @@ -146,21 +146,17 @@
>  static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>  {
>   struct pnv_phb *phb = pci_bus_to_pnvhb(pdev->bus);
> - const resource_size_t gate = phb->ioda.m64_segsize >> 2;
>   struct resource *res;
>   int i;
> - resource_size_t size, total_vf_bar_sz;
> + resource_size_t vf_bar_sz;
>   struct pnv_iov_data *iov;
> - int mul, total_vfs;
> + int mul;
>  
>   iov = kzalloc(sizeof(*iov), GFP_KERNEL);
>   if (!iov)
>   goto disable_iov;
>   pdev->dev.archdata.iov_data = iov;
> -
> - total_vfs = pci_sriov_get_totalvfs(pdev);
>   mul = phb->ioda.total_pe_num;
> - total_vf_bar_sz = 0;
>  
>   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>   res = &pdev->resource[i + PCI_IOV_RESOURCES];
> @@ -173,50 +169,50 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>   goto disable_iov;
>   }
>  
> - total_vf_bar_sz += pci_iov_resource_size(pdev,
> - i + PCI_IOV_RESOURCES);
> + vf_bar_sz = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>  
>   /*
> -  * If bigger than quarter of M64 segment size, just round up
> -  * power of two.
> +  * Generally, one segmented M64 BAR maps one IOV BAR. However,
> +  * if a VF BAR is too large we end up wasting a lot of space.
> +  * If each VF needs more than 1/4 of the default m64 segment
> +  * then each VF BAR should be mapped in single-PE mode to reduce
> +  * the amount of space required. This does however limit the
> +  * number of VFs we can support.
>*
> -  * Generally, one M64 BAR maps one IOV BAR. To avoid conflict
> -  * with other devices, IOV BAR size is expanded to be
> -  * (total_pe * VF_BAR_size).  When VF_BAR_size is half of M64
> -  * segment size , the expanded size would equal to half of the
> -  * whole M64 space size, which will exhaust the M64 Space and
> -  * limit the system flexibility.  This is a design decision to
> -  * set the boundary to quarter of the M64 segment size.
> +  * The 1/4 limit is arbitrary and can be tweaked.
>*/
> - if (total_vf_bar_sz > gate) {
> - mul = roundup_pow_of_two(total_vfs);
> - dev_info(&pdev->dev,
> - "VF BAR Total IOV size %llx > %llx, roundup to %d VFs\n",
> - total_vf_bar_sz, gate, mul);
> - iov->m64_single_mode = true;
> - break;
> - }
> - }
> + if (vf_bar_sz > (phb->ioda.m64_segsize >> 2)) {
> + /*
> +  * On PHB3, the minimum size alignment of M64 BAR in
> +  * single mode is 32MB. If this VF BAR is smaller than
> +  * 32MB, but still too large for a segmented window
> +  * then we can't map it and need to disable SR-IOV for
> +  * this device.
> +  */
> + if (vf_bar_sz < SZ_32M) {
> + pci_err(pdev, "VF BAR%d: %pR can't be mapped in single PE mode\n",
> + i, res);
> + goto disable_iov;
> + }
>  
> - for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
> - res = &pdev->resource[i + PCI_IOV_RESOURCES];
> - if (!res->flags || res->parent)
> + iov->m64_single_mode[i] = true;
>   continue;
> + }
>  
> -