Re: ASMedia ASM2142 USB host controller tries to DMA to address zero when doing bulk reads from multiple devices
On Wed, Jul 29, 2020 at 8:22 AM Oliver O'Halloran wrote:
>
> On Tue, Jul 21, 2020 at 3:51 PM Forest Crossman wrote:
> >
> > Hello, again!
> >
> > After fixing the issue in my previous thread using this patch[1], I
> > decided to do some stress-testing of the controller to make sure it
> > could handle my intended workloads and that there were no further DMA
> > address issues that would need to be fixed. Unfortunately, it looks
> > like there's still more work to be done: when I try to do long bulk
> > reads from multiple devices simultaneously, eventually the host
> > controller sends a DMA write to address zero, which then triggers EEH
> > in my POWER9 system, causing the controller card to get hotplug-reset,
> > which of course kills the disk-reading processes. For more details on
> > the EEH errors, you can see my kernel's EEH message log[2].
>
> Take the logged address with a grain of salt. If an error occurs while
> translating the DMA address the PHB logs all zeros as the "DMA
> Address" because it only keeps around the bits that it needs to fetch
> the next level of the TCE table. The EEH dump says the error is due to
> a TCE permission mismatch, so odds are the ASMedia controller is
> writing to an address that's already been DMA unmapped, hence the
> logged address being zeros.

Interesting, that's good to know. I saw that the RXE_TCE_FESR register
had the "TCE Response Page Access Error" bit set, and had originally
assumed that just meant the DMA to address zero was triggering that
error since it wasn't in a mapped page, but after reading that bit's
description again I think I understand it now.

> Sorry, I probably should have mentioned that quirk in the last mail.
>
> > The results of the various tests I performed are listed below.
> >
> > Test results (all failures are due to DMA writes to address zero, all
> > hubs are USB 3.0/3.1 Gen1 only, and all disks are accessed via the
> > usb-storage driver):
> > - Reading simultaneously from two or more disks behind a hub connected
> >   to one port on the host controller:
> >   - FAIL after 20-50 GB of data transferred for each device.
> > - Reading simultaneously from two disks, each connected directly to
> >   one port on the host controller:
> >   - FAIL after about 800 GB of data transferred for each device.
> > - Reading from one disk behind a hub connected to one port on the host
> >   controller:
> >   - OK for at least 2.7 TB of data transferred (I didn't test the
> >     whole 8 TB disk).
> > - Writing simultaneously to two FL2000 dongles (using osmo-fl2k's
> >   "fl2k_test"), each connected directly to one port on the host
> >   controller:
> >   - OK, was able to write several dozen terabytes to each device over
> >     the course of a little over 21 hours.
> >
> > Seeing how simultaneous writes to multiple devices and reads from
> > single devices both seem to work fine, I assume that means this is
> > being caused by some race condition in the host controller firmware
> > when it responds to multiple read requests.
>
> Most likely. It's possible it's a platform specific race with DMA
> map/unmap too, but I think we would be seeing similar issues with
> other devices if it was.

Yeah, I have several other xHCI controllers connected to this system,
and I've never experienced this issue with any of them. If the problem
was a POWER-specific DMA map/unmap race I would expect to be having
problems with those controllers as well.

> > I also assume we're not going to be able to convince ASMedia to both
> > fix the bug in their firmware and release the details on how to flash
> > it from Linux, so I guess we'll just have to figure out how to make
> > the driver talk to the controller in a way that avoids triggering the
> > bad DMA write. As before, I decided to try a little kernel hacking of
> > my own before sending this email, and tried separately enabling the
> > XHCI_BROKEN_STREAMS and XHCI_ASMEDIA_MODIFY_FLOWCONTROL quirks in an
> > attempt to fix this. As you might expect since you're reading this
> > message, neither of those quirks fixed the issue, nor did they even
> > make the transfers last any longer before failing.
> >
> > So now I've reached the limits of my understanding, and I need some
> > help devising a fix. If anyone has any comments to that effect, or any
> > questions about my hardware configuration, testing methodology, etc.,
> > please don't hesitate to air them. Also, if anyone needs me to perform
> > additional tests, or collect more log information, I'd be happy to do
> > that as well.
>
> I started writing a tool a while ago to use the internal trace bus to
> log incoming TLPs. Something like that might allow you to get a better
> idea what the faulting access pattern is, but you would still need to
> find a way to mitigate the issue. I'm not all that familiar with USB3
> so I'm not much help on that front.

Oh, interesting, I remember seeing the trace registers in the PHB4
spec, but I wasn't sure how to access them without writing a
[powerpc:merge] BUILD SUCCESS 3f68564f1f5aca55654fda237fc01495bf050ce9
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git merge
branch HEAD: 3f68564f1f5aca55654fda237fc01495bf050ce9  Automatic merge of 'master', 'next' and 'fixes' (2020-07-31 22:52)

elapsed time: 1494m

configs tested: 69
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

arm                            defconfig
arm64                          allyesconfig
arm64                          defconfig
arm                            allyesconfig
arm                            allmodconfig
xtensa                         xip_kc705_defconfig
arm                            versatile_defconfig
arm                            ezx_defconfig
ia64                           allmodconfig
ia64                           defconfig
ia64                           allyesconfig
m68k                           allmodconfig
m68k                           defconfig
m68k                           allyesconfig
nios2                          defconfig
nds32                          defconfig
nios2                          allyesconfig
csky                           defconfig
alpha                          defconfig
alpha                          allyesconfig
xtensa                         allyesconfig
h8300                          allyesconfig
arc                            defconfig
sh                             allmodconfig
parisc                         defconfig
s390                           allyesconfig
parisc                         allyesconfig
s390                           defconfig
arc                            allyesconfig
nds32                          allnoconfig
c6x                            allyesconfig
i386                           allyesconfig
sparc                          allyesconfig
sparc                          defconfig
i386                           defconfig
mips                           allyesconfig
mips                           allmodconfig
powerpc                        defconfig
powerpc                        allyesconfig
powerpc                        allmodconfig
powerpc                        allnoconfig
i386                           randconfig-a005-20200731
i386                           randconfig-a004-20200731
i386                           randconfig-a006-20200731
i386                           randconfig-a002-20200731
i386                           randconfig-a001-20200731
i386                           randconfig-a003-20200731
x86_64                         randconfig-a015-20200731
x86_64                         randconfig-a014-20200731
x86_64                         randconfig-a016-20200731
x86_64                         randconfig-a012-20200731
x86_64                         randconfig-a013-20200731
x86_64                         randconfig-a011-20200731
i386                           randconfig-a016-20200731
i386                           randconfig-a012-20200731
i386                           randconfig-a014-20200731
i386                           randconfig-a015-20200731
i386                           randconfig-a011-20200731
i386                           randconfig-a013-20200731
riscv                          allyesconfig
riscv                          allnoconfig
riscv                          defconfig
riscv                          allmodconfig
x86_64                         rhel
x86_64                         allyesconfig
x86_64                         rhel-7.6-kselftests
x86_64                         defconfig
x86_64                         rhel-8.3
x86_64                         kexec

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org
Re: [PATCH 06/15] powerpc: fadamp: simplify fadump_reserve_crash_area()
On 01/08/20 3:48 pm, Mike Rapoport wrote:
> On Thu, Jul 30, 2020 at 10:15:13PM +1000, Michael Ellerman wrote:
> > Mike Rapoport writes:
> > > From: Mike Rapoport
> > >
> > > fadump_reserve_crash_area() reserves memory from a specified base
> > > address till the end of the RAM.
> > >
> > > Replace iteration through the memblock.memory with a single call to
> > > memblock_reserve() with appropriate that will take care of proper memory
> >                                      ^
> >                                      parameters?
> > > reservation.
> > >
> > > Signed-off-by: Mike Rapoport
> > > ---
> > >  arch/powerpc/kernel/fadump.c | 20 +---
> > >  1 file changed, 1 insertion(+), 19 deletions(-)
> >
> > I think this looks OK to me, but I don't have a setup to test it easily.
> > I've added Hari to Cc who might be able to.
> >
> > But I'll give you an ack in the hope that it works :)
>
> Actually, I did some digging in the git log and the traversal was added
> there on purpose by the commit b71a693d3db3 ("powerpc/fadump: exclude
> memory holes while reserving memory in second kernel")

I was about to comment on the same :)
memblock_reserve() was being used until we ran into the issue talked
about in the above commit...

> Presuming this is still required I'm going to drop this patch and will

Yeah, it is still required..

> simply replace for_each_memblock() with for_each_mem_range() in v2.

Sounds right.

> > Acked-by: Michael Ellerman
> >
> > > diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> > > index 78ab9a6ee6ac..2446a61e3c25 100644
> > > --- a/arch/powerpc/kernel/fadump.c
> > > +++ b/arch/powerpc/kernel/fadump.c
> > > @@ -1658,25 +1658,7 @@ int __init fadump_reserve_mem(void)
> > >  /* Preserve everything above the base address */
> > >  static void __init fadump_reserve_crash_area(u64 base)
> > >  {
> > > -	struct memblock_region *reg;
> > > -	u64 mstart, msize;
> > > -
> > > -	for_each_memblock(memory, reg) {
> > > -		mstart = reg->base;
> > > -		msize = reg->size;
> > > -
> > > -		if ((mstart + msize) < base)
> > > -			continue;
> > > -
> > > -		if (mstart < base) {
> > > -			msize -= (base - mstart);
> > > -			mstart = base;
> > > -		}
> > > -
> > > -		pr_info("Reserving %lluMB of memory at %#016llx for preserving crash data",
> > > -			(msize >> 20), mstart);
> > > -		memblock_reserve(mstart, msize);
> > > -	}
> > > +	memblock_reserve(base, memblock_end_of_DRAM() - base);
> > >  }
> > >
> > >  unsigned long __init arch_reserved_kernel_pages(void)
> > > --
> > > 2.26.2

Thanks
Hari
Re: [PATCH 06/15] powerpc: fadamp: simplify fadump_reserve_crash_area()
On Thu, Jul 30, 2020 at 10:15:13PM +1000, Michael Ellerman wrote:
> Mike Rapoport writes:
> > From: Mike Rapoport
> >
> > fadump_reserve_crash_area() reserves memory from a specified base address
> > till the end of the RAM.
> >
> > Replace iteration through the memblock.memory with a single call to
> > memblock_reserve() with appropriate that will take care of proper memory
>                                      ^
>                                      parameters?
> > reservation.
> >
> > Signed-off-by: Mike Rapoport
> > ---
> >  arch/powerpc/kernel/fadump.c | 20 +---
> >  1 file changed, 1 insertion(+), 19 deletions(-)
>
> I think this looks OK to me, but I don't have a setup to test it easily.
> I've added Hari to Cc who might be able to.
>
> But I'll give you an ack in the hope that it works :)

Actually, I did some digging in the git log and the traversal was added
there on purpose by the commit b71a693d3db3 ("powerpc/fadump: exclude
memory holes while reserving memory in second kernel")

Presuming this is still required I'm going to drop this patch and will
simply replace for_each_memblock() with for_each_mem_range() in v2.

> Acked-by: Michael Ellerman
>
> > diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> > index 78ab9a6ee6ac..2446a61e3c25 100644
> > --- a/arch/powerpc/kernel/fadump.c
> > +++ b/arch/powerpc/kernel/fadump.c
> > @@ -1658,25 +1658,7 @@ int __init fadump_reserve_mem(void)
> >  /* Preserve everything above the base address */
> >  static void __init fadump_reserve_crash_area(u64 base)
> >  {
> > -	struct memblock_region *reg;
> > -	u64 mstart, msize;
> > -
> > -	for_each_memblock(memory, reg) {
> > -		mstart = reg->base;
> > -		msize = reg->size;
> > -
> > -		if ((mstart + msize) < base)
> > -			continue;
> > -
> > -		if (mstart < base) {
> > -			msize -= (base - mstart);
> > -			mstart = base;
> > -		}
> > -
> > -		pr_info("Reserving %lluMB of memory at %#016llx for preserving crash data",
> > -			(msize >> 20), mstart);
> > -		memblock_reserve(mstart, msize);
> > -	}
> > +	memblock_reserve(base, memblock_end_of_DRAM() - base);
> >  }
> >
> >  unsigned long __init arch_reserved_kernel_pages(void)
> > --
> > 2.26.2

--
Sincerely yours,
Mike.
Re: [PATCH v2 15/16] powerpc/powernv/sriov: Make single PE mode a per-BAR setting
On Wed, Jul 22, 2020 at 04:57:14PM +1000, Oliver O'Halloran wrote:
> Using single PE BARs to map an SR-IOV BAR is really a choice about what
> strategy to use when mapping a BAR. It doesn't make much sense for this to
> be a global setting since a device might have one large BAR which needs to
> be mapped with single PE windows and another smaller BAR that can be mapped
> with a regular segmented window. Make the segmented vs single decision a
> per-BAR setting and clean up the logic that decides which mode to use.
>
> Signed-off-by: Oliver O'Halloran
> ---
> v2: Dropped unused total_vfs variables in pnv_pci_ioda_fixup_iov_resources()
>     Dropped bar_no from pnv_pci_iov_resource_alignment()
>     Minor re-wording of comments.
> ---
>  arch/powerpc/platforms/powernv/pci-sriov.c | 131 ++---
>  arch/powerpc/platforms/powernv/pci.h       |  11 +-
>  2 files changed, 73 insertions(+), 69 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c b/arch/powerpc/platforms/powernv/pci-sriov.c
> index ce8ad6851d73..76215d01405b 100644
> --- a/arch/powerpc/platforms/powernv/pci-sriov.c
> +++ b/arch/powerpc/platforms/powernv/pci-sriov.c
> @@ -146,21 +146,17 @@
>  static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>  {
>  	struct pnv_phb *phb = pci_bus_to_pnvhb(pdev->bus);
> -	const resource_size_t gate = phb->ioda.m64_segsize >> 2;
>  	struct resource *res;
>  	int i;
> -	resource_size_t size, total_vf_bar_sz;
> +	resource_size_t vf_bar_sz;
>  	struct pnv_iov_data *iov;
> -	int mul, total_vfs;
> +	int mul;
>
>  	iov = kzalloc(sizeof(*iov), GFP_KERNEL);
>  	if (!iov)
>  		goto disable_iov;
>  	pdev->dev.archdata.iov_data = iov;
> -
> -	total_vfs = pci_sriov_get_totalvfs(pdev);
>  	mul = phb->ioda.total_pe_num;
> -	total_vf_bar_sz = 0;
>
>  	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>  		res = &pdev->resource[i + PCI_IOV_RESOURCES];
>
> @@ -173,50 +169,50 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>  			goto disable_iov;
>  		}
>
> -		total_vf_bar_sz += pci_iov_resource_size(pdev,
> -				i + PCI_IOV_RESOURCES);
> +		vf_bar_sz = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>
>  		/*
> -		 * If bigger than quarter of M64 segment size, just round up
> -		 * power of two.
> +		 * Generally, one segmented M64 BAR maps one IOV BAR. However,
> +		 * if a VF BAR is too large we end up wasting a lot of space.
> +		 * If each VF needs more than 1/4 of the default m64 segment
> +		 * then each VF BAR should be mapped in single-PE mode to reduce
> +		 * the amount of space required. This does however limit the
> +		 * number of VFs we can support.
>  		 *
> -		 * Generally, one M64 BAR maps one IOV BAR. To avoid conflict
> -		 * with other devices, IOV BAR size is expanded to be
> -		 * (total_pe * VF_BAR_size). When VF_BAR_size is half of M64
> -		 * segment size , the expanded size would equal to half of the
> -		 * whole M64 space size, which will exhaust the M64 Space and
> -		 * limit the system flexibility. This is a design decision to
> -		 * set the boundary to quarter of the M64 segment size.
> +		 * The 1/4 limit is arbitrary and can be tweaked.
>  		 */
> -		if (total_vf_bar_sz > gate) {
> -			mul = roundup_pow_of_two(total_vfs);
> -			dev_info(&pdev->dev,
> -				"VF BAR Total IOV size %llx > %llx, roundup to %d VFs\n",
> -				total_vf_bar_sz, gate, mul);
> -			iov->m64_single_mode = true;
> -			break;
> -		}
> -	}
> +		if (vf_bar_sz > (phb->ioda.m64_segsize >> 2)) {
> +			/*
> +			 * On PHB3, the minimum size alignment of M64 BAR in
> +			 * single mode is 32MB. If this VF BAR is smaller than
> +			 * 32MB, but still too large for a segmented window
> +			 * then we can't map it and need to disable SR-IOV for
> +			 * this device.
> +			 */
> +			if (vf_bar_sz < SZ_32M) {
> +				pci_err(pdev, "VF BAR%d: %pR can't be mapped in single PE mode\n",
> +					i, res);
> +				goto disable_iov;
> +			}
>
> -	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
> -		res = &pdev->resource[i + PCI_IOV_RESOURCES];
> -		if (!res->flags || res->parent)
> +			iov->m64_single_mode[i] = true;
>  			continue;
> +		}
>
> -