Re: [PATCH 00/13] Modernize Loongson64 Machine
Hi, Jiaxun,

1. To describe the CPU I prefer "loongson" to "ls", because "ls" is confusing, and in the future we will use ls2h/ls7a to describe Loongson's bridges.

2. I think it is better to use loongson64c/loongson64g than loongson2ef/loongson64. As we discussed, we will use PRID_IMP_LOONGSON_64C/PRID_IMP_LOONGSON_64G to describe 0x6300/0xc000.

Huacai

-- Original --
From: "Jiaxun Yang"
Date: Tue, Aug 27, 2019 04:52 PM
To: "linux-mips"
Cc: "chenhc"; "paul.burton"; "tglx"; "jason"; "maz"; "linux-kernel"; "robh+dt"; "mark.rutland"; "devicetree"; "Jiaxun Yang"
Subject: [PATCH 00/13] Modernize Loongson64 Machine

Loongson has a long history of contributing code to the mainline kernel. In recent years, however, they have focused on maintaining a kernel by themselves rather than contributing their code to the community. The kernel has been progressing rapidly too, so their code sat in mainline for a long period without proper maintenance and became outdated. This patchset brings modern DeviceTree and irqchip support to Loongson64 machines, and leaves Loongson 2E/2F alone since they are too legacy to touch.
Jiaxun Yang (13):
  MIPS: Loongson64: Rename CPU TYPES
  MIPS: Loongson64: Separate loongson2ef/loongson64 code
  MAINTAINERS: Fix entries for new loongson64 path
  irqchip: Add driver for Loongson-3 I/O interrupt controller
  dt-bindings: interrupt-controller: Add Loongson-3 IOINTC
  irqchip: Add driver for Loongson-3 HyperTransport interrupt controller
  dt-bindings: interrupt-controller: Add Loongson-3 HTINTC
  irqchip: i8259: Add plat-poll support
  irqchip: mips-cpu: Convert to simple domain
  MIPS: Loongson64: Drop legacy IRQ code
  dt-bindings: mips: Add loongson cpus & boards
  MIPS: Loongson64: Add generic dts
  MIPS: Loongson64: Load built-in dtbs

 .../loongson,ls3-htintc.yaml                  |  53 +
 .../loongson,ls3-iointc.yaml                  |  61 +
 .../bindings/mips/loongson/cpus.yaml          |  38 +++
 .../bindings/mips/loongson/devices.yaml       |  64 ++
 MAINTAINERS                                   |   9 +-
 arch/mips/Kbuild.platforms                    |   1 +
 arch/mips/Kconfig                             |  83 +--
 arch/mips/boot/dts/Makefile                   |   1 +
 arch/mips/boot/dts/loongson/Makefile          |   8 +
 arch/mips/boot/dts/loongson/ls3-2nodes.dtsi   |   8 +
 arch/mips/boot/dts/loongson/ls3-4nodes.dtsi   |  15 ++
 arch/mips/boot/dts/loongson/ls3-cpus.dtsi     | 150
 arch/mips/boot/dts/loongson/ls3-gs464.dtsi    |  18 ++
 arch/mips/boot/dts/loongson/ls3-gs464e.dtsi   |  18 ++
 .../boot/dts/loongson/ls3-rs780e-pch.dtsi     |  35 +++
 arch/mips/boot/dts/loongson/ls3a-package.dtsi |  59 +
 .../boot/dts/loongson/ls3a1000_780e_1way.dts  |  12 +
 .../boot/dts/loongson/ls3a1000_780e_2way.dts  |  13 ++
 .../boot/dts/loongson/ls3a1000_780e_4way.dts  |  13 ++
 .../boot/dts/loongson/ls3a2000_780e_1way.dts  |  12 +
 .../boot/dts/loongson/ls3a2000_780e_2way.dts  |  13 ++
 .../boot/dts/loongson/ls3a2000_780e_4way.dts  |  13 ++
 .../boot/dts/loongson/ls3a3000_780e_1way.dts  |  12 +
 .../boot/dts/loongson/ls3a3000_780e_2way.dts  |  13 ++
 .../boot/dts/loongson/ls3a3000_780e_4way.dts  |  13 ++
 arch/mips/boot/dts/loongson/ls3b-package.dtsi |  59 +
 .../mips/boot/dts/loongson/ls3b_780e_1way.dts |  13 ++
 .../mips/boot/dts/loongson/ls3b_780e_2way.dts |  13 ++
 arch/mips/include/asm/bootinfo.h              |   1 -
 arch/mips/include/asm/cop2.h                  |   2 +-
 arch/mips/include/asm/cpu-type.h              |   6 +-
 arch/mips/include/asm/cpu.h                   |   4 +-
 arch/mips/include/asm/hazards.h               |   2 +-
 arch/mips/include/asm/io.h                    |   2 +-
 arch/mips/include/asm/irqflags.h              |   2 +-
 .../mach-loongson2ef/cpu-feature-overrides.h  |  45
 .../cs5536/cs5536.h                           |   0
 .../cs5536/cs5536_mfgpt.h                     |   0
 .../cs5536/cs5536_pci.h                       |   0
 .../cs5536/cs5536_vsm.h                       |   0
 .../loongson2ef.h}                            |  31 +--
 .../machine.h                                 |   6 -
 .../mc146818rtc.h                             |   5 +-
 .../mem.h                                     |   6 +-
 arch/mips/include/asm/mach-loongson2ef/pci.h  |  43
 .../include/asm/mach-loongson2ef/spaces.h     |  10 +
 .../asm/mach-loongson64/builtin_dtbs.h        |  26 +++
 .../mach-loongson64/cpu-feature-overrides.h   |   3 -
 arch/mips/include/asm/mach-loongson64/irq.h   |   6 +-
 .../asm/mach-loongson64/kernel-entry-init.h   |  74 --
 .../include/asm/mach-loongson64/loongson64.h  |  50
 .../mips/include/asm/mach-loongson64/mmzone.h |  16 --
 arch/mips/include/asm/mach-loongson64/pci.h   |  41 +---
 .../include/asm/mach-loongson64/workarounds.h |   4 +-
 arch/mips/include/asm/module.h                |   8 +-
 arch/mips/include/asm/pgtable-bits.h          |   2 +-
Re: Re: Re: Re: [PATCH 4.9 81/96] MIPS: Loongson: Introduce and use loongson_llsc_mb()
Hi, Greg,

The patch for 4.9 (and below 4.9) is here: https://patchwork.linux-mips.org/patch/21375/

Huacai

-- Original --
From: "陈华才"
Date: Thu, Mar 14, 2019 06:55 AM
To: "gregkh"
Cc: "linux-kernel"; "stable"; "huangpei"; "Paul Burton"; "Ralf Baechle"; "ambrosehua"; "Steven J . Hill"; "linux-mips"; "Fuxin Zhang"; "wuzhangjin"; "Li Xuefeng"; "Xu Chenghua"; "Sasha Levin"
Subject: Re: Re: Re: [PATCH 4.9 81/96] MIPS: Loongson: Introduce and use loongson_llsc_mb()

Hi, Greg,

Only 4.9 (and below) needs to modify spinlock.h; 4.14 and 4.19 do not need this because they have been converted to qspinlock. And sorry for my poor reply, because I can only use a mobile phone now.

--- Original Mail ---
From: "Greg Kroah-Hartman"
Sent: Thursday, March 14, 2019, 4:58 AM
To: "陈华才"
Subject: Re: [PATCH 4.9 81/96] MIPS: Loongson: Introduce and use loongson_llsc_mb()

On Wed, Mar 13, 2019 at 09:17:15PM +0800, 陈华才 wrote:
> Hi, Greg,
>
> 4.9 needs to modify spinlock.h, please wait for my patch.
>
> --- Original Mail ---
> From: "Greg Kroah-Hartman"
> Sent: Wednesday, March 13, 2019, 1:10 AM
> To: "linux-kernel"
> Subject: [PATCH 4.9 81/96] MIPS: Loongson: Introduce and use loongson_llsc_mb()
>
> 4.9-stable review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> [ Upstream commit e02e07e3127d8aec1f4bcdfb2fc52a2d99b4859e ]
>
> On the Loongson-2G/2H/3A/3B there is a hardware flaw that makes ll/sc and
> lld/scd very weakly ordered. We should add sync instructions "before
> each ll/lld" and "at the branch-target between ll/sc" to work around it.
> Otherwise, this flaw will occasionally cause deadlock (e.g. when doing
> a heavy load test with LTP).
>
> Below is the explanation from the CPU designer:
>
> "For the Loongson 3 family, when a memory access instruction (load, store,
> or prefetch) is executed between the execution of LL and SC, the
> success or failure of SC is not predictable.
> Although a programmer would
> not insert memory access instructions between LL and SC, the memory
> instructions before LL in program order may be dynamically executed
> between the execution of LL/SC, so a memory fence (SYNC) is needed
> before LL/LLD to avoid this situation.
>
> Since Loongson-3A R2 (3A2000), we have improved our hardware design to
> handle this case. But we later deduced a rare circumstance in which some
> speculatively executed memory instructions, due to branch misprediction
> between LL/SC, still fall into the above case, so a memory fence (SYNC)
> at the branch-target (if its target is not between LL/SC) is needed for
> Loongson 3A1000, 3B1500, 3A2000 and 3A3000.
>
> Our processor is continually evolving and we aim to remove all these
> workaround-SYNCs around LL/SC for new-come processors."
>
> Here is an example:
>
> Both cpu1 and cpu2 simultaneously run atomic_add by 1 on the same atomic
> var. This bug causes the 'sc' run by both cpus (in atomic_add) to succeed
> at the same time ('sc' returns 1), and the variable is sometimes only
> *added by 1*, which is wrong and unacceptable (it should be added by 2).
>
> Why disable fix-loongson3-llsc in the compiler?
> Because the compiler fix will cause problems in the kernel's __ex_table
> section.
>
> This patch fixes all the cases in the kernel, but:
>
> +. the fix at the end of futex_atomic_cmpxchg_inatomic() is for the
> branch-target of 'bne'; there are other cases where smp_mb__before_llsc()
> and smp_llsc_mb() fix the ll and the branch-target coincidently, such as
> atomic_sub_if_positive/cmpxchg/xchg, just like this one.
>
> +. Loongson 3 does not support CONFIG_EDAC_ATOMIC_SCRUB, so no need to
> touch edac.h
>
> +. local_ops and cmpxchg_local should not be affected by this bug since
> only the owner can write.
>
> +. mips_atomic_set for syscall.c is deprecated and rarely used, just let
> it go
>
> Signed-off-by: Huacai Chen
> Signed-off-by: Huang Pei
> [paul.bur...@mips.com:
>   - Simplify the addition of -mno-fix-loongson3-llsc to cflags, and add
>     a comment describing why it's there.
>   - Make loongson_llsc_mb() a no-op when
>     CONFIG_CPU_LOONGSON3_WORKAROUNDS=n, rather than a compiler memory
>     barrier.
>   - Add a comment describing the bug & how loongson_llsc_mb() helps
>     in asm/barrier.h.]
> Signed-off-by: Paul Burton
> Cc: Ralf Baechle
> Cc: ambrose...@gmail.com
> Cc: Steven J . Hill
> Cc: linux-m...@linux-mips.org
> Cc: Fuxin Zhang
> Cc: Zhangjin Wu
> Cc: Li Xuefeng
> Cc: Xu Chenghua
> Signed-off-by: Sasha Levin
> ---
>  arch/mips/Kconfig               | 15 ++
>  arch/mips/include/asm/atomic.h  |  6 ++
>  arch/mips/include/asm/barrier.h | 36 +
>  arch/mips/include/asm/bitops.h  |  5 +
>  arch/mips/include/asm/futex.h   |  3 +++
>  arch/mips/include/asm/pgtable.h |  2 ++
>  arch/mips/loongson64/Platform   | 23 +
>  arch/mips/mm/tlbex.c            | 10 +
>  8 files changed, 100 insertions(+)

Ok, I will go drop this from all stable queues now, thanks!

greg k-h
Re: [PATCH V3 1/5] genirq/affinity: don't mark 'affd' as const
My lemote.com email is too weak, so I'll stop top-posting; this time I've cc'ed my gmail.

I've tested: this patch can fix the nvme problem, but it can't be applied to 4.19 because of different context. And I still think my original solution (genirq/affinity: Assign default affinity to pre/post vectors) is correct; there may be similar problems besides nvme.

Huacai

-- Original --
From: "Thomas Gleixner"
Date: Thu, Feb 14, 2019 09:31 PM
To: "陈华才"
Cc: "Keith Busch"; "Bjorn Helgaas"; "Jens Axboe"; "Sagi Grimberg"; "linux-pci"; "LKML"; "linux-nvme"; "Ming Lei"; "linux-block"; "Christoph Hellwig"
Subject: Re: [PATCH V3 1/5] genirq/affinity: don't mark 'affd' as const

On Thu, 14 Feb 2019, 陈华才 wrote:

Please do not top post

> I'll test next week, but 4.19 has the same problem, how to fix that for 4.19?

By applying the very same patch perhaps?

Thanks,

	tglx
Re: [PATCH V3 1/5] genirq/affinity: don't mark 'affd' as const
I'll test next week, but 4.19 has the same problem; how do we fix that for 4.19?

Huacai

-- Original --
From: "Thomas Gleixner"
Date: Thu, Feb 14, 2019 04:50 PM
To: "Keith Busch"
Cc: "Bjorn Helgaas"; "Jens Axboe"; "Sagi Grimberg"; "linux-pci"; "LKML"; "linux-nvme"; "Ming Lei"; "linux-block"; "Christoph Hellwig"; "Huacai Chen"
Subject: Re: [PATCH V3 1/5] genirq/affinity: don't mark 'affd' as const

On Wed, 13 Feb 2019, Keith Busch wrote:

Cc+ Huacai Chen

> On Wed, Feb 13, 2019 at 10:41:55PM +0100, Thomas Gleixner wrote:
> > Btw, while I have your attention: an issue popped up recently related
> > to that affinity logic.
> >
> > The current implementation fails when:
> >
> > 	/*
> > 	 * If there aren't any vectors left after applying the pre/post
> > 	 * vectors don't bother with assigning affinity.
> > 	 */
> > 	if (nvecs == affd->pre_vectors + affd->post_vectors)
> > 		return NULL;
> >
> > Now the discussion arose that in that case the affinity sets are not
> > allocated and filled in for the pre/post vectors, but somehow the
> > underlying device still works and later on triggers the warning in the
> > blk-mq code because the MSI entries do not have affinity information
> > attached.
> >
> > Sure, we could make that work, but there are several issues:
> >
> >  1) irq_create_affinity_masks() has another reason to return NULL:
> >     memory allocation fails.
> >
> >  2) Does it make sense at all?
> >
> > Right now the PCI allocator ignores the NULL return and proceeds without
> > setting any affinities. As a consequence nothing is managed and everything
> > happens to work.
> >
> > But that it happens to work is more by chance than by design, and the
> > warning is bogus if this is an expected mode of operation.
> >
> > We should address these points in some way.
>
> Ah, yes, that's a mistake in the nvme driver. It is assuming IO queues are
> always on managed interrupts, but that's not true when only 1 vector
> could be allocated.
This should be an appropriate fix to the warning:

Looks correct. Chen, can you please test that?

> ---
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 022ea1ee63f8..f2ccebe1c926 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -506,7 +506,7 @@ static int nvme_pci_map_queues(struct blk_mq_tag_set *set)
>  		 * affinity), so use the regular blk-mq cpu mapping
>  		 */
>  		map->queue_offset = qoff;
> -		if (i != HCTX_TYPE_POLL)
> +		if (i != HCTX_TYPE_POLL && dev->num_vecs > 1)
>  			blk_mq_pci_map_queues(map, to_pci_dev(dev->dev), offset);
>  		else
>  			blk_mq_map_queues(map);
> --
Re: [PATCH] genirq/affinity: Assign default affinity to pre/post vectors
Huacai Chen
Jiangsu Zhongke Menglan Electronic Technology Co., Ltd. / Independent Security Business Unit / Software Department
Menglan Village, Yushan Town, Changshu, Jiangsu

-- Original --
From: "Thomas Gleixner"
Date: Wed, Jan 16, 2019 05:26 PM
To: "陈华才"
Cc: "linux-kernel"; "Fuxin Zhang"; "wuzhangjin"; "stable"; "Christoph Hellwig"; "Michael Hernandez"
Subject: Re: [PATCH] genirq/affinity: Assign default affinity to pre/post vectors

Chen,

please do not top-post and use line breaks around 78 chars.

On Wed, 16 Jan 2019, 陈华才 wrote:
> I'm not removing all return NULL of irq_create_affinity_masks(), so the ..

Moved content to the place where it belongs so the context is preserved.

> -- Original --
> From: "Thomas Gleixner"
> Date: Wed, Jan 16, 2019 03:10 AM
> To: "Huacai Chen"
> Cc: "linux-kernel"; "Fuxin Zhang"; "wuzhangjin"; "stable"; "Christoph Hellwig"; "Michael Hernandez"
> Subject: Re: [PATCH] genirq/affinity: Assign default affinity to pre/post vectors

And please configure your e-mail client to NOT copy the full headers into the reply.

> > On Mon, 31 Dec 2018, Huacai Chen wrote:
> > > > Generally, irq_create_affinity_masks() assigns default affinity to pre/
> > > > post vectors correctly. However, it ignores the case that there are only
> > > > pre/post vectors (when nvecs == affd->pre_vectors + affd->post_vectors)
> > > > and returns NULL. This case usually happens when nvecs = 1 (e.g. in the
> > > > nvme driver when MSI-X is unavailable and it falls back to MSI) and will
> > > > trigger the warning in pci_irq_get_affinity(). This patch fixes the
> > > > corner case.
> > >
> > > Errm. This is just wrong. When this function returns NULL, then it has
> > > failed and the caller or any subsequent code is not supposed to use the
> > > result.
> > >
> > > The function can return NULL for other reasons, e.g. when the memory
> > > allocation failed. How are you going to duct tape that one?
> >
> > I'm not removing all return NULL of irq_create_affinity_masks(), so the
> > memory allocation failure still returns NULL. I just handle the case that
> > there are not enough irq vectors. E.g.
in the nvme driver, the caller may call
> > irq_create_affinity_masks() with nvecs=1, pre_vectors=1, post_vectors=0. In
> > this case, the only vector's default affinity assignment is skipped.

> I did not say that you removed all NULL returns. I said that this function
> can return NULL for other reasons, and then the same situation will happen.
>
> If the masks pointer returned is NULL, then the calling code or any
> subsequent usage needs to handle it properly. Yes, I understand that this
> change makes the warning go away for that particular case, but that's not
> making it any more correct.

Hi, Thomas,

I don't think "nvecs == affd->pre_vectors + affd->post_vectors" is an ERROR, so to the caller it should be distinguishable from "return NULL for other reasons". If the caller falls back from MSI-X to MSI, it is probably "nvecs=1, pre_vectors=1, post_vectors=0". The caller can work perfectly if the pre/post vectors are filled with the default affinity.

> Thanks,
> 	tglx
Re: [PATCH] genirq/affinity: Assign default affinity to pre/post vectors
Hi, Thomas,

I'm not removing all return NULL of irq_create_affinity_masks(), so memory allocation failure still returns NULL. I just handle the case that there are not enough irq vectors. E.g. in the nvme driver, the caller may call irq_create_affinity_masks() with nvecs=1, pre_vectors=1, post_vectors=0. In this case, the only vector's default affinity assignment is skipped.

Huacai

-- Original --
From: "Thomas Gleixner"
Date: Wed, Jan 16, 2019 03:10 AM
To: "Huacai Chen"
Cc: "linux-kernel"; "Fuxin Zhang"; "wuzhangjin"; "stable"; "Christoph Hellwig"; "Michael Hernandez"
Subject: Re: [PATCH] genirq/affinity: Assign default affinity to pre/post vectors

On Mon, 31 Dec 2018, Huacai Chen wrote:

> Generally, irq_create_affinity_masks() assigns default affinity to pre/
> post vectors correctly. However, it ignores the case that there are only
> pre/post vectors (when nvecs == affd->pre_vectors + affd->post_vectors)
> and returns NULL. This case usually happens when nvecs = 1 (e.g. in the
> nvme driver when MSI-X is unavailable and it falls back to MSI) and will
> trigger the warning in pci_irq_get_affinity(). This patch fixes the
> corner case.

Errm. This is just wrong. When this function returns NULL, then it has failed and the caller or any subsequent code is not supposed to use the result.

The function can return NULL for other reasons, e.g. when the memory allocation failed. How are you going to duct tape that one?

Thanks,

	tglx
Re: [[PATCH]] mips: Fix switch to NO_BOOTMEM for SGI-IP27/loongons3 NUMA
Hi,

It seems the patch below can solve many problems after the switch to NO_BOOTMEM, because the memory allocation behavior is more similar to before.

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 070234b..7a449d9 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -839,6 +839,7 @@ static void __init arch_mem_init(char **cmdline_p)
 	/* call board setup routine */
 	plat_mem_setup();
+	memblock_set_bottom_up(true);
 
 	/*
 	 * Make sure all kernel memory is in the maps. The "UP" and
diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index 0f852e1..15e103c 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -2260,10 +2260,8 @@ void __init trap_init(void)
 		unsigned long size = 0x200 + VECTORSPACING*64;
 		phys_addr_t ebase_pa;
 
-		memblock_set_bottom_up(true);
 		ebase = (unsigned long)
 			memblock_alloc_from(size, 1 << fls(size), 0);
-		memblock_set_bottom_up(false);
 
 		/*
 		 * Try to ensure ebase resides in KSeg0 if possible.
@@ -2307,6 +2305,7 @@ void __init trap_init(void)
 	if (board_ebase_setup)
 		board_ebase_setup();
 	per_cpu_trap_init(true);
+	memblock_set_bottom_up(false);
 
 	/*
 	 * Copy the generic exception handlers to their final destination.

-- Original --
From: "Mike Rapoport"
Date: Fri, Nov 9, 2018 02:01 AM
To: "Thomas Bogendoerfer"
Cc: "Ralf Baechle"; "Paul Burton"; "James Hogan"; "Huacai Chen"; "linux-mips"; "linux-kernel"; "rppt"
Subject: Re: [[PATCH]] mips: Fix switch to NO_BOOTMEM for SGI-IP27/loongons3 NUMA

On November 8, 2018 6:52:17 PM GMT+02:00, Thomas Bogendoerfer wrote:
> On Thu, 8 Nov 2018 18:18:23 +0200 Mike Rapoport wrote:
>
> > On Thu, Nov 08, 2018 at 03:44:28PM +0100, Thomas Bogendoerfer wrote:
> > > Commit bcec54bf3118 ("mips: switch to NO_BOOTMEM") broke SGI-IP27
> > > and NUMA enabled loongson3 by doing memblock_set_current_limit()
> > > before max_low_pfn has been evaluated. Both platforms need to do the
> > > memblock_set_current_limit() in platform specific code. For
> > > consistency the call to memblock_set_current_limit() is moved
> > > to the common bootmem_init(), where max_low_pfn is calculated
> > > for non NUMA enabled platforms.
> > [..]
> >
> > As for SGI-IP27, the initialization of max_low_pfn as late as in
> > paging_init() seems to be broken, because its value is used in
> > arch_mem_init() and in finalize_initrd() anyway.
>
> well, the patch is tested on real hardware and the first caller of
> a memblock_alloc* function is in a function called by
> free_area_init_nodes().

Then, apparently, I've missed something else. The Onyx2 I worked on has been dead for a couple of years now ;-)

> > AFAIU, both platforms set max_low_pfn to the last available pfn, so it
> > seems we can simply do
> >
> > 	max_low_pfn = PFN_PHYS(memblock_end_of_DRAM())

Should have been PHYS_PFN, sorry.

> > in the prom_meminit() function for both platforms and drop the loop
> > evaluating max_low_pfn in paging_init().
>
> sounds like a better plan. I'll prepare a new patch.
>
> Thomas.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: [PATCH] cacheinfo: Keep the old value if of_property_read_u32 fails
Hi, Sudeep,

I use MIPS, and there is no "size" file in /sys/devices/system/cpu/cpuX/cache/indexX/ after your patch, because the DT node only has "next-level-cache = <>;" and has no "size" information.

Huacai

-- Original --
From: "Sudeep Holla"
Date: Thu, Oct 18, 2018 05:15 PM
To: "Huacai Chen"
Cc: "Greg Kroah-Hartman"; "Rafael J . Wysocki"; "LKML"; "zhangfx"; "wuzhangjin"; "Sudeep Holla"
Subject: Re: [PATCH] cacheinfo: Keep the old value if of_property_read_u32 fails

Hi Huacai,

On Thu, Oct 18, 2018 at 09:28:11AM +0800, Huacai Chen wrote:
> Hi, Sudeep,
>
> Please see this call-graph:
>
> static int detect_cache_attributes(unsigned int cpu)
>     --> ret = populate_cache_leaves(cpu);
>     --> ret = cache_shared_cpu_map_setup(cpu);
>         --> cache_setup_of_node() --> cache_of_set_props() -->
>             cache_size()/cache_nr_sets()
>
> populate_cache_leaves() is arch-specific. All archs except arm64
> provide initial values for cacheinfo::size and cacheinfo::number_of_sets.
>
> Related files:
> arch/nds32/kernel/cacheinfo.c
> arch/mips/kernel/cacheinfo.c

Only the above 2 could be affected, but I want to be sure. I wasn't aware of the MIPS arch setting values elsewhere; I assumed DT, my bad.

You have still not answered my question: which arch are you facing the issue on? Or are you proposing this by code inspection? The changes look fine, but I want to be sure the issue you are seeing is in the above architectures.

-- 
Regards,
Sudeep
Re: [PATCH] MIPS: Change definition of cpu_relax() for Loongson-3
Hi, Paul,

SFB can improve memory bandwidth by as much as 30%, and we are planning to enable SFB by default. So we want to control cpu_relax() under CONFIG_CPU_LOONGSON3, not under CONFIG_LOONGSON3_ENHANCEMENT.

Huacai

-- Original --
From: "Paul Burton"
Date: Fri, Jul 20, 2018 05:15 AM
To: "Huacai Chen"
Cc: "Ralf Baechle"; "James Hogan"; "linux-mips"; "Fuxin Zhang"; "wuzhangjin"; "stable"; "Alan Stern"; "Andrea Parri"; "Will Deacon"; "Peter Zijlstra"; "Boqun Feng"; "Nicholas Piggin"; "David Howells"; "Jade Alglave"; "Luc Maranget"; "Paul E. McKenney"; "Akira Yokosawa"; "LKML"
Subject: Re: [PATCH] MIPS: Change definition of cpu_relax() for Loongson-3

Hi Huacai,

On Wed, Jul 18, 2018 at 09:15:46AM +0800, Huacai Chen wrote:
> >> diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
> >> index af34afb..a8c4a3a 100644
> >> --- a/arch/mips/include/asm/processor.h
> >> +++ b/arch/mips/include/asm/processor.h
> >> @@ -386,7 +386,17 @@ unsigned long get_wchan(struct task_struct *p);
> >>  #define KSTK_ESP(tsk) (task_pt_regs(tsk)->regs[29])
> >>  #define KSTK_STATUS(tsk) (task_pt_regs(tsk)->cp0_status)
> >>
> >> +#ifdef CONFIG_CPU_LOONGSON3
> >> +/*
> >> + * Loongson-3's SFB (Store-Fill-Buffer) may get starved when stuck in a read
> >> + * loop. Since spin loops of any kind should have a cpu_relax() in them, force
> >> + * a Store-Fill-Buffer flush from cpu_relax() such that any pending writes will
> >> + * become available as expected.
> >> + */
> >
> > I think "may starve writes" or "may queue writes indefinitely" would be
> > clearer than "may get starved".
>
> Do I need to change the comment and resend? Or will you change the comment
> and merge it?

I'm happy to fix up the comment - but have a couple more questions.
Looking into the history, would it be fair to say that this is only a problem after commit 1e820da3c9af ("MIPS: Loongson-3: Introduce CONFIG_LOONGSON3_ENHANCEMENT"), when CONFIG_LOONGSON3_ENHANCEMENT=y, which adds code to enable the SFB? If so, would it make sense to use CONFIG_LOONGSON3_ENHANCEMENT to select the use of smp_mb()?

How much performance gain does enabling the SFB give you? Would it be reasonable to just disable it, rather than using this workaround?

Thanks,
    Paul
Re: [PATCH 4.9 03/32] MIPS: Use async IPIs for arch_trigger_cpumask_backtrace()
Hi, Greg,

I have resent the patches and cc'ed the stable list. This patch is needed for 4.4.y but needs restructuring, as Paul said:

    Strictly speaking the faulty MIPS implementation can be traced further
    back to commit 856839b76836 ("MIPS: Add arch_trigger_all_cpu_backtrace()
    function") in v3.19

Huacai

-- Original --
From: "Greg Kroah-Hartman"
Date: Tue, Jul 17, 2018 03:20 PM
To: "陈华才"
Cc: "linux-kernel"; "stable"; "Paul Burton"; "James Hogan"
Subject: Re: [PATCH 4.9 03/32] MIPS: Use async IPIs for arch_trigger_cpumask_backtrace()

On Tue, Jul 17, 2018 at 02:53:38PM +0800, 陈华才 wrote:
> Hi, Greg,
>
> I have backported and tested it to 4.9 and 4.4:
> https://patchwork.linux-mips.org/patch/19862/
> https://patchwork.linux-mips.org/patch/19863/

I can't do anything with random web links. Please properly cc: the stable list with the backports and I can take them from there.

And are you sure they are needed for 4.4.y?

thanks,

greg k-h
Re: [PATCH 4.9 03/32] MIPS: Use async IPIs for arch_trigger_cpumask_backtrace()
Hi, Greg, I have backported and tested it to 4.9 and 4.4: https://patchwork.linux-mips.org/patch/19862/ https://patchwork.linux-mips.org/patch/19863/ Huacai -- Original -- From: "Greg Kroah-Hartman"; Date: Tue, Jul 17, 2018 02:34 AM To: "陈华才"; Cc: "linux-kernel"; "stable"; "Paul Burton"; "James Hogan"; Subject: Re: [PATCH 4.9 03/32] MIPS: Use async IPIs for arch_trigger_cpumask_backtrace() On Mon, Jul 16, 2018 at 12:46:21PM +0200, Greg Kroah-Hartman wrote: > On Mon, Jul 16, 2018 at 05:46:12PM +0800, 陈华才 wrote: > > Just change "call_single_data_t" to "struct call_single_data" and > > everything is OK. > > Ok, I've done that now, thanks. And I messed it up. So I've just dropped it entirely. Can someone send me a properly backported patch please? thanks, greg k-h
Re: [PATCH 4.9 03/32] MIPS: Use async IPIs for arch_trigger_cpumask_backtrace()
Just change "call_single_data_t" to "struct call_single_data" and everything is OK. Huacai -- Original -- From: "Greg Kroah-Hartman"; Date: Mon, Jul 16, 2018 05:40 PM To: "陈华才"; Cc: "linux-kernel"; "stable"; "Paul Burton"; "James Hogan"; "Ralf Baechle"; "linux-mips"; Subject: Re: [PATCH 4.9 03/32] MIPS: Use async IPIs for arch_trigger_cpumask_backtrace() On Mon, Jul 16, 2018 at 05:29:05PM +0800, 陈华才 wrote: > Hi, Greg, > > kernel-4.9 doesn't have call_single_data_t, we should use struct > call_single_data instead. Can you send me a patch to merge with this one with that change so that I know I get it right? thanks, greg k-h
Re: [PATCH 4.9 03/32] MIPS: Use async IPIs for arch_trigger_cpumask_backtrace()
Hi, Greg, kernel-4.9 doesn't have call_single_data_t, we should use struct call_single_data instead. Huacai -- Original -- From: "Greg Kroah-Hartman"; Date: Mon, Jul 16, 2018 03:36 PM To: "linux-kernel"; Cc: "Greg Kroah-Hartman"; "stable"; "Paul Burton"; "James Hogan"; "Ralf Baechle"; "Huacai Chen"; "linux-mips"; Subject: [PATCH 4.9 03/32] MIPS: Use async IPIs for arch_trigger_cpumask_backtrace() 4.9-stable review patch. If anyone has any objections, please let me know. -- From: Paul Burton commit b63e132b6433a41cf311e8bc382d33fd2b73b505 upstream. The current MIPS implementation of arch_trigger_cpumask_backtrace() is broken because it attempts to use synchronous IPIs despite the fact that it may be run with interrupts disabled. This means that when arch_trigger_cpumask_backtrace() is invoked, for example by the RCU CPU stall watchdog, we may: - Deadlock due to use of synchronous IPIs with interrupts disabled, causing the CPU that's attempting to generate the backtrace output to hang itself. - Not succeed in generating the desired output from remote CPUs. 
- Produce warnings about this from smp_call_function_many(), for example: [42760.526910] INFO: rcu_sched detected stalls on CPUs/tasks: [42760.535755] 0-...!: (1 GPs behind) idle=ade/140/0 softirq=526944/526945 fqs=0 [42760.547874] 1-...!: (0 ticks this GP) idle=e4a/140/0 softirq=547885/547885 fqs=0 [42760.559869] (detected by 2, t=2162 jiffies, g=266689, c=266688, q=33) [42760.568927] [ cut here ] [42760.576146] WARNING: CPU: 2 PID: 1216 at kernel/smp.c:416 smp_call_function_many+0x88/0x20c [42760.587839] Modules linked in: [42760.593152] CPU: 2 PID: 1216 Comm: sh Not tainted 4.15.4-00373-gee058bb4d0c2 #2 [42760.603767] Stack : 8e09bd20 8e09bd20 8e09bd20 fff0 0007 0006 8e09bca8 [42760.616937] 95b2b379 95b2b379 807a0080 0007 81944518 018a 0032 [42760.630095] 0030 8000 806eca74 0009 8017e2b8 01a0 [42760.643169] 0002 8e09baa4 0008 808b8008 86d69080 8e09bca0 [42760.656282] 8e09ad50 805e20aa 8017e2b8 0009 801070ca [42760.669424] ... [42760.673919] Call Trace: [42760.678672] [<27fde568>] show_stack+0x70/0xf0 [42760.685417] [<84751641>] dump_stack+0xaa/0xd0 [42760.692188] [<699d671c>] __warn+0x80/0x92 [42760.698549] [<68915d41>] warn_slowpath_null+0x28/0x36 [42760.705912] [] smp_call_function_many+0x88/0x20c [42760.713696] [<6bbdfc2a>] arch_trigger_cpumask_backtrace+0x30/0x4a [42760.722216] [] rcu_dump_cpu_stacks+0x6a/0x98 [42760.729580] [<796e7629>] rcu_check_callbacks+0x672/0x6ac [42760.737476] [<059b3b43>] update_process_times+0x18/0x34 [42760.744981] [<6eb94941>] tick_sched_handle.isra.5+0x26/0x38 [42760.752793] [<478d3d70>] tick_sched_timer+0x1c/0x50 [42760.759882] [] __hrtimer_run_queues+0xc6/0x226 [42760.767418] [] hrtimer_interrupt+0x88/0x19a [42760.775031] [<6765a19e>] gic_compare_interrupt+0x2e/0x3a [42760.782761] [<0558bf5f>] handle_percpu_devid_irq+0x78/0x168 [42760.790795] [<90c11ba2>] generic_handle_irq+0x1e/0x2c [42760.798117] [<1b6d462c>] gic_handle_local_int+0x38/0x86 [42760.805545] [] gic_irq_dispatch+0xa/0x14 [42760.812534] [<90c11ba2>] 
generic_handle_irq+0x1e/0x2c [42760.820086] [] do_IRQ+0x16/0x20 [42760.826274] [<9aef3ce6>] plat_irq_dispatch+0x62/0x94 [42760.833458] [<6a94b53c>] except_vec_vi_end+0x70/0x78 [42760.840655] [<22284043>] smp_call_function_many+0x1ba/0x20c [42760.848501] [<54022b58>] smp_call_function+0x1e/0x2c [42760.855693] [] flush_tlb_mm+0x2a/0x98 [42760.862730] [<0844cdd0>] tlb_flush_mmu+0x1c/0x44 [42760.869628] [] arch_tlb_finish_mmu+0x26/0x3e [42760.877021] [<1aeaaf74>] tlb_finish_mmu+0x18/0x66 [42760.883907] [] exit_mmap+0x76/0xea [42760.890428] [] mmput+0x80/0x11a [42760.896632] [] do_exit+0x1f4/0x80c [42760.903158] [] do_group_exit+0x20/0x7e [42760.909990] [<13fa8d54>] __wake_up_parent+0x0/0x1e [42760.917045] [<46cf89d0>] smp_call_function_many+0x1a2/0x20c [42760.924893] [<8c21a93b>] syscall_common+0x14/0x1c [42760.931765] ---[ end trace 02aa09da9dc52a60 ]--- [42760.938342] [ cut here ] [42760.945311] WARNING: CPU: 2 PID: 1216 at kernel/smp.c:291 smp_call_function_single+0xee/0xf8 ... This patch switches MIPS' arch_trigger_cpumask_backtrace() to use async IPIs & smp_call_function_single_async() in order to resolve this problem. We ensure use of the pre-allocated call_single_data_t structures is serialized by maintaining a cpumask indicating that they're busy, and refusing to attempt to send an IPI when a CPU's bit is set in this mask. This should only happen if
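The serialization described in the last paragraph — one pre-allocated per-CPU slot guarded by a "busy" cpumask, with the IPI skipped while the bit is still set — might be modeled in portable C11 roughly like this (hypothetical names and a simplified 64-CPU bitmask, not the kernel code):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One "busy" bit per CPU; a set bit means that CPU's pre-allocated
 * call-single-data slot is still in flight from a previous backtrace
 * IPI, so we must not reuse it. (Supports up to 64 CPUs here.) */
static atomic_ulong backtrace_busy;

bool try_claim_csd(int cpu)
{
	unsigned long bit = 1UL << cpu;

	/* Atomically set the bit; the old value tells us whether some
	 * earlier IPI to this CPU has not finished yet. */
	return !(atomic_fetch_or(&backtrace_busy, bit) & bit);
}

void release_csd(int cpu)
{
	/* Called once the target CPU has dumped its backtrace. */
	atomic_fetch_and(&backtrace_busy, ~(1UL << cpu));
}
```

A caller would attempt `try_claim_csd()` before sending the async IPI and simply skip the CPU when it returns false, which is exactly the "refusing to attempt to send an IPI" behavior the commit message describes.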
Re: [PATCH V2] MIPS: implement smp_cond_load_acquire() for Loongson-3
Hi, Peter, I'm afraid you have missed something. Firstly, our previous conclusion (READ_ONCE() needs a barrier to avoid 'reads prioritised over writes') is totally wrong. So defining cpu_relax() as smp_mb() like ARM11MPCore is incorrect, even if it can 'solve' Loongson's problem. Secondly, I think the real problem is like this: 1, CPU0 sets the lock to 0, then does something; 2, While CPU0 is doing something, CPU1 sets the flag to 1 with WRITE_ONCE(), and then waits for the lock to become 1 with a READ_ONCE() loop; 3, After CPU0 completes its work, it waits for the flag to become 1, and then sets the lock to 1; 4, When the lock becomes 1, CPU1 leaves the READ_ONCE() loop. Without SFB, everything is OK. But with SFB, in step 2 the READ_ONCE() loop comes right after the WRITE_ONCE(), which leaves the flag cached in the SFB (and thus invisible to other CPUs) forever, so both CPU0 and CPU1 wait forever. I don't think this is a hardware bug; by design, the SFB is flushed to the L1 cache in three cases: 1, the data in the SFB is full (forms a complete cache line); 2, there is a subsequent read access in the same cache line; 3, a 'sync' instruction is executed. In this case, there is no other memory access (read or write) between the WRITE_ONCE() and the READ_ONCE() loop, so Case 1 and Case 2 will not happen, and the only way to make the flag visible is wbflush (wbflush is sync in Loongson's case). I think this problem not only happens on Loongson, but will happen on other CPUs which have a write buffer (unless the write buffer has a 4th case to be flushed). Huacai -- Original -- From: "Peter Zijlstra"; Date: Tue, Jul 10, 2018 06:54 PM To: "Huacai Chen"; Cc: "Paul Burton"; "Ralf Baechle"; "James Hogan"; "linux-mips"; "Fuxin Zhang"; "wuzhangjin"; "stable"; "Alan Stern"; "Andrea Parri"; "Will Deacon"; "Boqun Feng"; "Nicholas Piggin"; "David Howells"; "Jade Alglave"; "Luc Maranget"; "Paul E.
McKenney"; "Akira Yokosawa"; "LKML"; Subject: Re: [PATCH V2] MIPS: implement smp_cond_load_acquire() for Loongson-3 On Tue, Jul 10, 2018 at 11:36:37AM +0200, Peter Zijlstra wrote: > So now explain why the cpu_relax() hack that arm did doesn't work for > you? So below is the patch I think you want; if not explain in detail how this is wrong. diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h index af34afbc32d9..e59773de6528 100644 --- a/arch/mips/include/asm/processor.h +++ b/arch/mips/include/asm/processor.h @@ -386,7 +386,17 @@ unsigned long get_wchan(struct task_struct *p); #define KSTK_ESP(tsk) (task_pt_regs(tsk)->regs[29]) #define KSTK_STATUS(tsk) (task_pt_regs(tsk)->cp0_status) +#ifdef CONFIG_CPU_LOONGSON3 +/* + * Loongson-3 has a CPU bug where the store buffer gets starved when stuck in a + * read loop. Since spin loops of any kind should have a cpu_relax() in them, + * force a store-buffer flush from cpu_relax() such that any pending writes + * will become available as expected. + */ +#define cpu_relax() smp_mb() +#else #define cpu_relax() barrier() +#endif /* * Return_address is a replacement for __builtin_return_address(count)
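Huacai's four-step flag/lock handshake from the reply above can be written out as a small C11 program (a hypothetical sketch with invented names; with seq_cst atomics the handshake always completes, and the SFB failure mode described in the mail corresponds to the flag store staying buffered so step 3 never observes it):

```c
#include <stdatomic.h>
#include <pthread.h>

static atomic_int flag, lock;

/* CPU1's side: publish the flag, then spin until CPU0 raises the lock. */
static void *cpu1_side(void *arg)
{
	(void)arg;
	atomic_store(&flag, 1);            /* step 2: WRITE_ONCE(flag, 1) */
	while (atomic_load(&lock) == 0)    /* step 2: READ_ONCE() loop    */
		;
	return NULL;                       /* step 4: saw lock == 1       */
}

/* CPU0's side, driving the whole handshake. */
int run_handshake(void)
{
	pthread_t t;

	atomic_store(&lock, 0);            /* step 1: clear the lock      */
	if (pthread_create(&t, NULL, cpu1_side, NULL) != 0)
		return -1;
	while (atomic_load(&flag) == 0)    /* step 3: wait for the flag   */
		;
	atomic_store(&lock, 1);            /* step 3: then raise the lock */
	pthread_join(t, NULL);
	return 1;
}
```

Note that CPU1 performs no read in the flag's cache line and no sync between its store and its read loop, which is exactly why none of the three SFB flush conditions listed in the mail would fire.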
Re: [PATCH] MIPS: implement smp_cond_load_acquire() for Loongson-3
Hi, Peter and Paul, Loongson-3's Store Fill Buffer is nearly the same as your "Store Buffer", and it increases the memory ordering weakness. So, smp_cond_load_acquire() only needs a __smp_mb() before the loop, not after every READ_ONCE(). In other words, the following code is just OK: #define smp_cond_load_acquire(ptr, cond_expr) \ ({ \ typeof(ptr) __PTR = (ptr); \ typeof(*ptr) VAL; \ __smp_mb(); \ for (;;) { \ VAL = READ_ONCE(*__PTR); \ if (cond_expr) \ break; \ cpu_relax(); \ } \ __smp_mb(); \ VAL; \ }) The __smp_mb() before the loop is used to avoid "reads prioritised over writes", which is caused by the SFB's weak ordering and is similar to ARM11MPCore (mentioned by Will Deacon). Huacai -- Original -- From: "Peter Zijlstra"; Date: Tue, Jun 19, 2018 03:22 PM To: "陈华才"; Cc: "Paul Burton"; "Ralf Baechle"; "James Hogan"; "linux-mips"; "Fuxin Zhang"; "wuzhangjin"; "Huacai Chen"; "stable"; "Alan Stern"; "Andrea Parri"; "Will Deacon"; "Boqun Feng"; "Nicholas Piggin"; "David Howells"; "Jade Alglave"; "Luc Maranget"; "Paul E. McKenney"; "Akira Yokosawa"; "linux-kernel"; Subject: Re: [PATCH] MIPS: implement smp_cond_load_acquire() for Loongson-3 On Tue, Jun 19, 2018 at 02:40:14PM +0800, 陈华才 wrote: > Hi, Paul, > > First of all, could you please check why linux-mips reject e-mails > from lemote.com? Of course I can send e-mails by gmail, but my gmail > can't receive e-mails from linux-mips since March, 2018. Could you please learn to use email? No top posting and wrap lines at 78 chars. > I have already read Documentation/memory-barriers.txt, but I don't > think we should define a smp_read_barrier_depends() for Loongson-3. > Because Loongson-3's behavior isn't like Alpha, and in syntax, this is > not a data-dependent issue. Agreed, this is not a data-dependency issue. > There is no document about Loongson-3's SFB.
In my opinion, SFB looks > like the L0 cache but sometimes it is out of cache-coherent mechanism > (L1 cache's cross-core coherency is maintained by hardware, but not > always true for SFB). smp_mb() is needed for smp_cond_load_acquire(), > but not every READ_ONCE(). Linux does _NOT_ support non-coherent SMP. If your system is not fully coherent, you're out of luck. But please, explain in excruciating detail what exactly you need that smp_mb for. If, like I posited in my previous email, it is to ensure remote store buffer flushes, then your machine is terminally broken.
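For comparison, the barrier-before-the-loop variant Huacai proposes could be modeled with C11 atomics and GCC/Clang statement expressions like this (a hypothetical sketch for illustration, not the kernel macro):

```c
#include <stdatomic.h>

/* Barrier-before-the-loop variant: a full fence first, so any store
 * buffered by this CPU is flushed before the read loop starts, and an
 * acquire fence afterwards for the usual acquire semantics. Uses the
 * GNU statement-expression extension, like the kernel macro it mimics. */
#define smp_cond_load_acquire_sketch(ptr, cond_expr)              \
({                                                                \
	__auto_type __PTR = (ptr);                                \
	__typeof__(atomic_load(__PTR)) VAL;                       \
	atomic_thread_fence(memory_order_seq_cst); /* pre-loop */ \
	for (;;) {                                                \
		VAL = atomic_load_explicit(__PTR,                 \
					   memory_order_relaxed); \
		if (cond_expr)                                    \
			break;                                    \
	}                                                         \
	atomic_thread_fence(memory_order_acquire); /* post-loop */ \
	VAL;                                                      \
})
```

As in the kernel version, `cond_expr` is evaluated with the freshly loaded value bound to `VAL`, so a caller writes e.g. `smp_cond_load_acquire_sketch(&x, VAL != 0)`.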
Re: [PATCH] MIPS: implement smp_cond_load_acquire() for Loongson-3
Hi, Paul, First of all, could you please check why linux-mips rejects e-mails from lemote.com? Of course I can send e-mails by gmail, but my gmail can't receive e-mails from linux-mips since March, 2018. I have already read Documentation/memory-barriers.txt, but I don't think we should define a smp_read_barrier_depends() for Loongson-3, because Loongson-3's behavior isn't like Alpha's, and syntactically this is not a data-dependency issue. There is no document about Loongson-3's SFB. In my opinion, the SFB looks like an L0 cache, but sometimes it is outside the cache-coherence mechanism (the L1 cache's cross-core coherency is maintained by hardware, but that is not always true for the SFB). smp_mb() is needed for smp_cond_load_acquire(), but not after every READ_ONCE(). Huacai -- Original -- From: "Paul Burton"; Date: Tue, Jun 19, 2018 02:51 AM To: "Huacai Chen"; Cc: "Ralf Baechle"; "James Hogan"; "linux-mips"; "Fuxin Zhang"; "wuzhangjin"; "Huacai Chen"; "stable"; "Alan Stern"; "Andrea Parri"; "Will Deacon"; "Peter Zijlstra"; "Boqun Feng"; "Nicholas Piggin"; "David Howells"; "Jade Alglave"; "Luc Maranget"; "Paul E. McKenney"; "Akira Yokosawa"; "linux-kernel"; Subject: Re: [PATCH] MIPS: implement smp_cond_load_acquire() for Loongson-3 Hi Huacai, On Fri, Jun 15, 2018 at 02:07:38PM +0800, Huacai Chen wrote: > After commit 7f56b58a92aaf2c ("locking/mcs: Use smp_cond_load_acquire() > in MCS spin loop") Loongson-3 fails to boot. This is because Loongson-3 > has SFB (Store Fill Buffer) and READ_ONCE() may get an old value in a > tight loop. So in smp_cond_load_acquire() we need a __smp_mb() after > every READ_ONCE(). Thanks - modifying smp_cond_load_acquire() is a step better than modifying arch_mcs_spin_lock_contended() to avoid it, but I'm still not sure we've reached the root of the problem. If tight loops using READ_ONCE() are at fault then what's special about smp_cond_load_acquire()? Could other such loops not hit the same problem?
Is the scenario you encounter the same as that outlined in the "DATA DEPENDENCY BARRIERS (HISTORICAL)" section of Documentation/memory-barriers.txt by any chance? If so then perhaps it would be better to implement smp_read_barrier_depends(), or just raw read_barrier_depends() depending upon how your SFB functions. Is there any public documentation describing the behaviour of the store fill buffer in Loongson-3? Part of the problem is that I'm still not sure what's actually happening in your system - it would be helpful to have further information in the commit message about what actually happens. For example if you could walk us through an example of the problem step by step in the style of the diagrams you'll see in Documentation/memory-barriers.txt then I think that would help us to see what the best solution here is. I've copied the LKMM maintainers in case they have further input. Thanks, Paul > This patch introduce a Loongson-specific smp_cond_load_acquire(). And > it should be backported to as early as linux-4.5, in which release the > smp_cond_acquire() is introduced. 
> > Cc: sta...@vger.kernel.org > Signed-off-by: Huacai Chen > --- > arch/mips/include/asm/barrier.h | 17 + > 1 file changed, 17 insertions(+) > > diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h > index a5eb1bb..4ea384d 100644 > --- a/arch/mips/include/asm/barrier.h > +++ b/arch/mips/include/asm/barrier.h > @@ -222,6 +222,23 @@ > #define __smp_mb__before_atomic() __smp_mb__before_llsc() > #define __smp_mb__after_atomic() smp_llsc_mb() > > +#ifdef CONFIG_CPU_LOONGSON3 > +/* Loongson-3 need a __smp_mb() after READ_ONCE() here */ > +#define smp_cond_load_acquire(ptr, cond_expr)\ > +({ \ > + typeof(ptr) __PTR = (ptr); \ > + typeof(*ptr) VAL; \ > + for (;;) { \ > + VAL = READ_ONCE(*__PTR); \ > + __smp_mb(); \ > + if (cond_expr) \ > + break; \ > + cpu_relax(); \ > + } \ > + VAL; \ > +}) > +#endif /* CONFIG_CPU_LOONGSON3 */ > + > #include > > #endif /* __ASM_BARRIER_H */ > -- > 2.7.0 >
Re: Regression with arm in next with stack protector
Allwinner devices are OK, Exynos devices (except 4210) are OK. -- Original -- From: "Tony Lindgren"<t...@atomide.com>; Date: Mon, Mar 26, 2018 11:37 PM To: "陈华才"<che...@lemote.com>; Cc: "Andrew Morton"<a...@linux-foundation.org>; "Fabio Estevam"<feste...@gmail.com>; "Stephen Rothwell"<s...@canb.auug.org.au>; "Rich Felker"<dal...@libc.org>; "Russell King"<li...@arm.linux.org.uk>; "Yoshinori Sato"<ys...@users.sourceforge.jp>; "linux-kernel"<linux-kernel@vger.kernel.org>; "Ralf Baechle"<r...@linux-mips.org>; "linux-omap"<linux-o...@vger.kernel.org>; "James Hogan"<james.ho...@mips.com>; "moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE"<linux-arm-ker...@lists.infradead.org>; Subject: Re: Regression with arm in next with stack protector * 陈华才 <che...@lemote.com> [180326 06:59]: > Hi, Tony and Fabio, > > Could you please upload your kernel binary to somewhere for me? I don't > understand why some ARM boards is OK while others are broken. Well the kernel I'm testing is just current Linux next cross compiled omap2plus_defconfig kernel. I do have few more config options enabled like LOCKDEP and DEBUG_ATOMIC_SLEEP, but I doubt they matter here :) Then I'm using gcc-7.3.0 and binutils-2.30 built with the buildall.git scripts: git://git.infradead.org/users/segher/buildall.git If you still need binaries, let me know. Do you know which arm devices are working with your patch? Regards, Tony
Re: Regression with arm in next with stack protector
Hi, Tony and Fabio,

Could you please upload your kernel binary somewhere for me? I don't understand why some ARM boards are OK while others are broken.

Huacai

-- Original --
From: "Andrew Morton"; Date: Sat, Mar 24, 2018 03:15 AM
To: "Fabio Estevam" ; Cc: "Tony Lindgren" ; "Huacai Chen" ; "Stephen Rothwell" ; "Rich Felker" ; "Russell King" ; "Yoshinori Sato" ; "linux-kernel" ; "Ralf Baechle" ; "linux-omap" ; "James Hogan" ; "moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE" ;
Subject: Re: Regression with arm in next with stack protector

On Fri, 23 Mar 2018 15:45:27 -0300 Fabio Estevam wrote:
> On Fri, Mar 23, 2018 at 3:14 PM, Tony Lindgren wrote:
> > Hi,
> >
> > Looks like commit 5638790dadae ("zboot: fix stack protector in
> > compressed boot phase") breaks booting on arm.
>
> Yes, almost all imx are broken in linux-next due to this commit:
> https://kernelci.org/soc/imx/job/next/kernel/next-20180323/

Thanks, I dropped it and I expect Stephen will also do that.
Re: [PATCH V9 1/4] dma-mapping: Rework dma_get_cache_alignment()
But in b44_init(), there are no device instances.

-- Original --
From: "Christoph Hellwig"; Date: Fri, Nov 10, 2017 08:30 PM
To: "Huacai Chen" ; Cc: "Christoph Hellwig" ; "Marek Szyprowski" ; "Robin Murphy" ; "Andrew Morton" ; "Fuxin Zhang" ; "linux-kernel" ; "Ralf Baechle" ; "James Hogan" ; "linux-mips" ; "James E . J . Bottomley" ; "Martin K . Petersen" ; "linux-scsi" ; "stable" ; "Michael S . Tsirkin" ; "Pawel Osciak" ; "Kyungmin Park" ; "Michael Chan" ; "Benjamin Herrenschmidt" ; "Ivan Mikhaylov" ; "Tariq Toukan" ; "Andy Gross" ; "Mark A . Greer" ; "Robert Baldyga" ;
Subject: Re: [PATCH V9 1/4] dma-mapping: Rework dma_get_cache_alignment()

> diff --git a/drivers/net/ethernet/broadcom/b44.c b/drivers/net/ethernet/broadcom/b44.c
> index a1125d1..2f6ffe5 100644
> --- a/drivers/net/ethernet/broadcom/b44.c
> +++ b/drivers/net/ethernet/broadcom/b44.c
> @@ -2344,6 +2344,10 @@ static int b44_init_one(struct ssb_device *sdev,
> 	struct net_device *dev;
> 	struct b44 *bp;
> 	int err;
> +	unsigned int dma_desc_align_size = dma_get_cache_alignment(sdev->dma_dev);
> +
> +	/* Setup paramaters for syncing RX/TX DMA descriptors */
> +	dma_desc_sync_size = max_t(unsigned int, dma_desc_align_size, sizeof(struct dma_desc));
>
> 	instance++;
>
> @@ -2587,12 +2591,8 @@ static inline void b44_pci_exit(void)
>
>  static int __init b44_init(void)
>  {
> -	unsigned int dma_desc_align_size = dma_get_cache_alignment();
> 	int err;
>
> -	/* Setup paramaters for syncing RX/TX DMA descriptors */
> -	dma_desc_sync_size = max_t(unsigned int, dma_desc_align_size, sizeof(struct dma_desc));
> -

This looks wrong - you override a global variable for each probed device.
Re: [PATCH V9 1/4] dma-mapping: Rework dma_get_cache_alignment()
Only patch 4 can be merged to stable, please ignore cc-stable in the rest. -- Original -- From: "Christoph Hellwig"<h...@lst.de>; Date: Fri, Nov 3, 2017 01:14 PM To: "陈华才"<che...@lemote.com>; Cc: "Marek Szyprowski"<m.szyprow...@samsung.com>; "Christoph Hellwig"<h...@lst.de>; "Robin Murphy"<robin.mur...@arm.com>; "Andrew Morton"<a...@linux-foundation.org>; "Fuxin Zhang"<zhan...@lemote.com>; "linux-kernel"<linux-kernel@vger.kernel.org>; "Ralf Baechle"<r...@linux-mips.org>; "JamesHogan"<james.ho...@imgtec.com>; "linux-mips"<linux-m...@linux-mips.org>; "James E . J .Bottomley"<j...@linux.vnet.ibm.com>; "Martin K . Petersen"<martin.peter...@oracle.com>; "linux-scsi"<linux-s...@vger.kernel.org>; "stable"<sta...@vger.kernel.org>; "Michael S . Tsirkin"<m...@redhat.com>; "Pawel Osciak"<pa...@osciak.com>; "Kyungmin Park"<kyungmin.p...@samsung.com>; "Michael Chan"<michael.c...@broadcom.com>; "Benjamin Herrenschmidt"<b...@kernel.crashing.org>; "Ivan Mikhaylov"<i...@ru.ibm.com>; "Tariq Toukan"<tar...@mellanox.com>; "Andy Gross"<agr...@codeaurora.org>; "Mark A . Greer"<mgr...@animalcreek.com>; "RobertBaldyga"<r.bald...@hackerion.com>; Subject: Re: [PATCH V9 1/4] dma-mapping: Rework dma_get_cache_alignment() I can queue 1 up in the dma-mapping tree, and if I get reviews for the mips and scsi bits I'd be happy to queue those up as well. But I think you'd be better off moving patches 3 and 4 to the front without the dma_get_cache_alignment prototype change so that they can be merged to stable.
Re: [PATCH V9 1/4] dma-mapping: Rework dma_get_cache_alignment()
Why this is still un-merged? Should I remove the cc-stable and resend this series? Huacai -- Original -- From: "陈华才"<che...@lemote.com>; Date: Thu, Oct 26, 2017 02:33 PM To: "Marek Szyprowski"<m.szyprow...@samsung.com>; "Christoph Hellwig"<h...@lst.de>; Cc: "Robin Murphy"<robin.mur...@arm.com>; "Andrew Morton"<a...@linux-foundation.org>; "Fuxin Zhang"<zhan...@lemote.com>; "linux-kernel"<linux-kernel@vger.kernel.org>; "Ralf Baechle"<r...@linux-mips.org>; "JamesHogan"<james.ho...@imgtec.com>; "linux-mips"<linux-m...@linux-mips.org>; "James E . J .Bottomley"<j...@linux.vnet.ibm.com>; "Martin K . Petersen"<martin.peter...@oracle.com>; "linux-scsi"<linux-s...@vger.kernel.org>; "stable"<sta...@vger.kernel.org>; "Michael S . Tsirkin"<m...@redhat.com>; "Pawel Osciak"<pa...@osciak.com>; "Kyungmin Park"<kyungmin.p...@samsung.com>; "Michael Chan"<michael.c...@broadcom.com>; "Benjamin Herrenschmidt"<b...@kernel.crashing.org>; "Ivan Mikhaylov"<i...@ru.ibm.com>; "Tariq Toukan"<tar...@mellanox.com>; "Andy Gross"<agr...@codeaurora.org>; "Mark A . Greer"<mgr...@animalcreek.com>; "RobertBaldyga"<r.bald...@hackerion.com>; Subject: Re: [PATCH V9 1/4] dma-mapping: Rework dma_get_cache_alignment() Maybe my first version is suitable for stable. Huacai -- Original -- From: "Marek Szyprowski"<m.szyprow...@samsung.com>; Date: Wed, Oct 25, 2017 03:21 PM To: "陈华才"<che...@lemote.com>; "Christoph Hellwig"<h...@lst.de>; Cc: "Robin Murphy"<robin.mur...@arm.com>; "Andrew Morton"<a...@linux-foundation.org>; "Fuxin Zhang"<zhan...@lemote.com>; "linux-kernel"<linux-kernel@vger.kernel.org>; "Ralf Baechle"<r...@linux-mips.org>; "JamesHogan"<james.ho...@imgtec.com>; "linux-mips"<linux-m...@linux-mips.org>; "James E . J .Bottomley"<j...@linux.vnet.ibm.com>; "Martin K . Petersen"<martin.peter...@oracle.com>; "linux-scsi"<linux-s...@vger.kernel.org>; "stable"<sta...@vger.kernel.org>; "Michael S . 
Tsirkin"<m...@redhat.com>; "Pawel Osciak"<pa...@osciak.com>; "Kyungmin Park"<kyungmin.p...@samsung.com>; "Michael Chan"<michael.c...@broadcom.com>; "Benjamin Herrenschmidt"<b...@kernel.crashing.org>; "Ivan Mikhaylov"<i...@ru.ibm.com>; "Tariq Toukan"<tar...@mellanox.com>; "Andy Gross"<agr...@codeaurora.org>; "Mark A . Greer"<mgr...@animalcreek.com>; "RobertBaldyga"<r.bald...@hackerion.com>; Subject: Re: [PATCH V9 1/4] dma-mapping: Rework dma_get_cache_alignment() Hi Huacai, On 2017-10-25 03:22, 陈华才 wrote: > Hi, Marek > > Patch3 is needed for stable, but Patch3 depend on Patch1 and Patch2. Then patch #3 has to be reworked. First change scsi to align the block queue to dma_get_cache_alignment(). This will be safe in all cases and it will not hurt memory usage that much. Such version can be applied first and sent to stable without any dependencies. Please also describe deeply why such change is needed and what issues can be observed without it, on which systems. Then as an optimization add support for per-device cache_alignment (patches #1 and #2). I'm still not convinced that it makes sense to align DMA structures to values less than L1 cache line size. It might hurt performance, because cache coherency has its cost and it is also relevant to multi-core/smp access to any objects that are in the same l1 cache line. Memory savings that might be the results of such lower alignment are probably negligible. > > Huacai > > > -- Original -- > From: "Marek Szyprowski"<m.szyprow...@samsung.com>; > Date: Tue, Oct 24, 2017 09:30 PM > To: "Huacai Chen"<che...@lemote.com>; "Christoph Hellwig"<h...@lst.de>; > Cc: "Robin Murphy"<robin.mur...@arm.com>; "Andrew > Morton"<a...@linux-foundation.org>; "Fuxin Zhang"<zhan...@lemote.com>; > "linux-kernel"<linux-kernel@vger.kernel.org>; "Ralf > Baechle"<r...@linux-mips.org>; "JamesHogan"<james.ho...@imgtec.com>; > "linux-mips"<linux-m...@linux-mips.org>; "James E . J > .Bottomley"
Re: [PATCH V9 1/4] dma-mapping: Rework dma_get_cache_alignment()
Maybe my first version is suitable for stable. Huacai -- Original -- From: "Marek Szyprowski"<m.szyprow...@samsung.com>; Date: Wed, Oct 25, 2017 03:21 PM To: "陈华才"<che...@lemote.com>; "Christoph Hellwig"<h...@lst.de>; Cc: "Robin Murphy"<robin.mur...@arm.com>; "Andrew Morton"<a...@linux-foundation.org>; "Fuxin Zhang"<zhan...@lemote.com>; "linux-kernel"<linux-kernel@vger.kernel.org>; "Ralf Baechle"<r...@linux-mips.org>; "JamesHogan"<james.ho...@imgtec.com>; "linux-mips"<linux-m...@linux-mips.org>; "James E . J .Bottomley"<j...@linux.vnet.ibm.com>; "Martin K . Petersen"<martin.peter...@oracle.com>; "linux-scsi"<linux-s...@vger.kernel.org>; "stable"<sta...@vger.kernel.org>; "Michael S . Tsirkin"<m...@redhat.com>; "Pawel Osciak"<pa...@osciak.com>; "Kyungmin Park"<kyungmin.p...@samsung.com>; "Michael Chan"<michael.c...@broadcom.com>; "Benjamin Herrenschmidt"<b...@kernel.crashing.org>; "Ivan Mikhaylov"<i...@ru.ibm.com>; "Tariq Toukan"<tar...@mellanox.com>; "Andy Gross"<agr...@codeaurora.org>; "Mark A . Greer"<mgr...@animalcreek.com>; "RobertBaldyga"<r.bald...@hackerion.com>; Subject: Re: [PATCH V9 1/4] dma-mapping: Rework dma_get_cache_alignment() Hi Huacai, On 2017-10-25 03:22, 陈华才 wrote: > Hi, Marek > > Patch3 is needed for stable, but Patch3 depend on Patch1 and Patch2. Then patch #3 has to be reworked. First change scsi to align the block queue to dma_get_cache_alignment(). This will be safe in all cases and it will not hurt memory usage that much. Such version can be applied first and sent to stable without any dependencies. Please also describe deeply why such change is needed and what issues can be observed without it, on which systems. Then as an optimization add support for per-device cache_alignment (patches #1 and #2). I'm still not convinced that it makes sense to align DMA structures to values less than L1 cache line size. 
It might hurt performance, because cache coherency has its cost and it is also relevant to multi-core/smp access to any objects that are in the same l1 cache line. Memory savings that might be the results of such lower alignment are probably negligible. > > Huacai > > > -- Original -- > From: "Marek Szyprowski"<m.szyprow...@samsung.com>; > Date: Tue, Oct 24, 2017 09:30 PM > To: "Huacai Chen"<che...@lemote.com>; "Christoph Hellwig"<h...@lst.de>; > Cc: "Robin Murphy"<robin.mur...@arm.com>; "Andrew > Morton"<a...@linux-foundation.org>; "Fuxin Zhang"<zhan...@lemote.com>; > "linux-kernel"<linux-kernel@vger.kernel.org>; "Ralf > Baechle"<r...@linux-mips.org>; "JamesHogan"<james.ho...@imgtec.com>; > "linux-mips"<linux-m...@linux-mips.org>; "James E . J > .Bottomley"<j...@linux.vnet.ibm.com>; "Martin K . > Petersen"<martin.peter...@oracle.com>; > "linux-scsi"<linux-s...@vger.kernel.org>; "stable"<sta...@vger.kernel.org>; > "Michael S . Tsirkin"<m...@redhat.com>; "Pawel Osciak"<pa...@osciak.com>; > "Kyungmin Park"<kyungmin.p...@samsung.com>; "Michael > Chan"<michael.c...@broadcom.com>; "Benjamin > Herrenschmidt"<b...@kernel.crashing.org>; "Ivan Mikhaylov"<i...@ru.ibm.com>; > "Tariq Toukan"<tar...@mellanox.com>; "Andy Gross"<agr...@codeaurora.org>; > "Mark A . Greer"<mgr...@animalcreek.com>; "Robert > Baldyga"<r.bald...@hackerion.com>; > Subject: Re: [PATCH V9 1/4] dma-mapping: Rework dma_get_cache_alignment() > > > Hi Huacai, > > On 2017-10-23 09:12, Huacai Chen wrote: >> Make dma_get_cache_alignment() to accept a 'dev' argument. As a result, >> it can return different alignments due to different devices' I/O cache >> coherency. >> >> Currently, ARM/ARM64 and MIPS support coherent & noncoherent devices >> co-exist. This may be extended in the future, so add a new function >> pointer (i.e, get_cache_alignment) in 'struct dma_map_ops' as a generic >> solution. >> >> Cc: sta...@vger.kernel.org > I don't think this change should go to stable. > >> Cc: Michael S. 
Tsirkin <m...@redhat.com> >> Cc: Pawel Osciak
Re: [PATCH V9 1/4] dma-mapping: Rework dma_get_cache_alignment()
Hi, Marek Patch3 is needed for stable, but Patch3 depend on Patch1 and Patch2. Huacai -- Original -- From: "Marek Szyprowski"; Date: Tue, Oct 24, 2017 09:30 PM To: "Huacai Chen" ; "Christoph Hellwig" ; Cc: "Robin Murphy" ; "Andrew Morton" ; "Fuxin Zhang" ; "linux-kernel" ; "Ralf Baechle" ; "JamesHogan" ; "linux-mips" ; "James E . J .Bottomley" ; "Martin K . Petersen" ; "linux-scsi" ; "stable" ; "Michael S . Tsirkin" ; "Pawel Osciak" ; "Kyungmin Park" ; "Michael Chan" ; "Benjamin Herrenschmidt" ; "Ivan Mikhaylov" ; "Tariq Toukan" ; "Andy Gross" ; "Mark A . Greer" ; "Robert Baldyga" ; Subject: Re: [PATCH V9 1/4] dma-mapping: Rework dma_get_cache_alignment() Hi Huacai, On 2017-10-23 09:12, Huacai Chen wrote: > Make dma_get_cache_alignment() to accept a 'dev' argument. As a result, > it can return different alignments due to different devices' I/O cache > coherency. > > Currently, ARM/ARM64 and MIPS support coherent & noncoherent devices > co-exist. This may be extended in the future, so add a new function > pointer (i.e, get_cache_alignment) in 'struct dma_map_ops' as a generic > solution. > > Cc: sta...@vger.kernel.org I don't think this change should go to stable. > Cc: Michael S. Tsirkin > Cc: Pawel Osciak > Cc: Marek Szyprowski > Cc: Kyungmin Park > Cc: Michael Chan > Cc: Benjamin Herrenschmidt > Cc: Ivan Mikhaylov > Cc: Tariq Toukan > Cc: Andy Gross > Cc: Mark A. 
Greer > Cc: Robert Baldyga > Cc: Marek Szyprowski > Signed-off-by: Huacai Chen > --- > drivers/infiniband/hw/mthca/mthca_main.c | 2 +- > drivers/media/v4l2-core/videobuf2-dma-contig.c | 2 +- > drivers/net/ethernet/broadcom/b44.c| 8 +- > drivers/net/ethernet/ibm/emac/core.c | 32 +++-- > drivers/net/ethernet/ibm/emac/core.h | 14 +- > drivers/net/ethernet/mellanox/mlx4/main.c | 2 +- > drivers/spi/spi-qup.c | 4 +- > drivers/tty/serial/mpsc.c | 179 > + > drivers/tty/serial/samsung.c | 14 +- > include/linux/dma-mapping.h| 17 ++- For videobuf2-dma-contig, serial/samsung and dma-mapping.h: Acked-by: Marek Szyprowski > 10 files changed, 150 insertions(+), 124 deletions(-) > > diff --git a/drivers/infiniband/hw/mthca/mthca_main.c > b/drivers/infiniband/hw/mthca/mthca_main.c > index e36a9bc..078fe8d 100644 > --- a/drivers/infiniband/hw/mthca/mthca_main.c > +++ b/drivers/infiniband/hw/mthca/mthca_main.c > @@ -416,7 +416,7 @@ static int mthca_init_icm(struct mthca_dev *mdev, > > /* CPU writes to non-reserved MTTs, while HCA might DMA to reserved > mtts */ > mdev->limits.reserved_mtts = ALIGN(mdev->limits.reserved_mtts * > mdev->limits.mtt_seg_size, > -dma_get_cache_alignment()) / > mdev->limits.mtt_seg_size; > + > dma_get_cache_alignment(>pdev->dev)) / mdev->limits.mtt_seg_size; > > mdev->mr_table.mtt_table = mthca_alloc_icm_table(mdev, > init_hca->mtt_base, > > mdev->limits.mtt_seg_size, > diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c > b/drivers/media/v4l2-core/videobuf2-dma-contig.c > index 9f389f3..1f6a9b7 100644 > --- a/drivers/media/v4l2-core/videobuf2-dma-contig.c > +++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c > @@ -484,7 +484,7 @@ static void *vb2_dc_get_userptr(struct device *dev, > unsigned long vaddr, > int ret = 0; > struct sg_table *sgt; > unsigned long contig_size; > - unsigned long dma_align = dma_get_cache_alignment(); > + unsigned long dma_align = dma_get_cache_alignment(dev); > > /* Only cache aligned DMA transfers are reliable */ > if 
(!IS_ALIGNED(vaddr | size, dma_align)) { > diff --git a/drivers/net/ethernet/broadcom/b44.c > b/drivers/net/ethernet/broadcom/b44.c > index a1125d1..2f6ffe5 100644 > --- a/drivers/net/ethernet/broadcom/b44.c > +++ b/drivers/net/ethernet/broadcom/b44.c > @@ -2344,6 +2344,10 @@
Re: [PATCH V9 1/4] dma-mapping: Rework dma_get_cache_alignment()
Hi, Marek,

Patch3 is needed for stable, but it depends on Patch1 and Patch2.

Huacai

-- Original --
From: "Marek Szyprowski";
Date: Tue, Oct 24, 2017 09:30 PM
To: "Huacai Chen"; "Christoph Hellwig";
Cc: "Robin Murphy"; "Andrew Morton"; "Fuxin Zhang"; "linux-kernel"; "Ralf Baechle"; "James Hogan"; "linux-mips"; "James E. J. Bottomley"; "Martin K. Petersen"; "linux-scsi"; "stable"; "Michael S. Tsirkin"; "Pawel Osciak"; "Kyungmin Park"; "Michael Chan"; "Benjamin Herrenschmidt"; "Ivan Mikhaylov"; "Tariq Toukan"; "Andy Gross"; "Mark A. Greer"; "Robert Baldyga";
Subject: Re: [PATCH V9 1/4] dma-mapping: Rework dma_get_cache_alignment()

Hi Huacai,

On 2017-10-23 09:12, Huacai Chen wrote:
> Make dma_get_cache_alignment() accept a 'dev' argument. As a result, it can return different alignments due to different devices' I/O cache coherency.
>
> Currently, ARM/ARM64 and MIPS support coherent & noncoherent devices co-existing. This may be extended in the future, so add a new function pointer (i.e., get_cache_alignment) in 'struct dma_map_ops' as a generic solution.
>
> Cc: sta...@vger.kernel.org

I don't think this change should go to stable.

> Cc: Michael S. Tsirkin > Cc: Pawel Osciak > Cc: Marek Szyprowski > Cc: Kyungmin Park > Cc: Michael Chan > Cc: Benjamin Herrenschmidt > Cc: Ivan Mikhaylov > Cc: Tariq Toukan > Cc: Andy Gross > Cc: Mark A. Greer > Cc: Robert Baldyga > Cc: Marek Szyprowski
> Signed-off-by: Huacai Chen
> ---
> drivers/infiniband/hw/mthca/mthca_main.c | 2 +-
> drivers/media/v4l2-core/videobuf2-dma-contig.c | 2 +-
> drivers/net/ethernet/broadcom/b44.c | 8 +-
> drivers/net/ethernet/ibm/emac/core.c | 32 +++--
> drivers/net/ethernet/ibm/emac/core.h | 14 +-
> drivers/net/ethernet/mellanox/mlx4/main.c | 2 +-
> drivers/spi/spi-qup.c | 4 +-
> drivers/tty/serial/mpsc.c | 179 +
> drivers/tty/serial/samsung.c | 14 +-
> include/linux/dma-mapping.h | 17 ++-

For videobuf2-dma-contig, serial/samsung and dma-mapping.h:

Acked-by: Marek Szyprowski

> 10 files changed, 150 insertions(+), 124 deletions(-)
>
> diff --git a/drivers/infiniband/hw/mthca/mthca_main.c b/drivers/infiniband/hw/mthca/mthca_main.c
> index e36a9bc..078fe8d 100644
> --- a/drivers/infiniband/hw/mthca/mthca_main.c
> +++ b/drivers/infiniband/hw/mthca/mthca_main.c
> @@ -416,7 +416,7 @@ static int mthca_init_icm(struct mthca_dev *mdev,
>
> /* CPU writes to non-reserved MTTs, while HCA might DMA to reserved mtts */
> mdev->limits.reserved_mtts = ALIGN(mdev->limits.reserved_mtts * mdev->limits.mtt_seg_size,
> - dma_get_cache_alignment()) / mdev->limits.mtt_seg_size;
> + dma_get_cache_alignment(&mdev->pdev->dev)) / mdev->limits.mtt_seg_size;
>
> mdev->mr_table.mtt_table = mthca_alloc_icm_table(mdev, init_hca->mtt_base, mdev->limits.mtt_seg_size,
> diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c b/drivers/media/v4l2-core/videobuf2-dma-contig.c
> index 9f389f3..1f6a9b7 100644
> --- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
> +++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
> @@ -484,7 +484,7 @@ static void *vb2_dc_get_userptr(struct device *dev, unsigned long vaddr,
> int ret = 0;
> struct sg_table *sgt;
> unsigned long contig_size;
> - unsigned long dma_align = dma_get_cache_alignment();
> + unsigned long dma_align = dma_get_cache_alignment(dev);
>
> /* Only cache aligned DMA transfers are reliable */
> if (!IS_ALIGNED(vaddr | size, dma_align)) {
> diff --git a/drivers/net/ethernet/broadcom/b44.c b/drivers/net/ethernet/broadcom/b44.c
> index a1125d1..2f6ffe5 100644
> --- a/drivers/net/ethernet/broadcom/b44.c
> +++ b/drivers/net/ethernet/broadcom/b44.c
> @@ -2344,6 +2344,10 @@ static int b44_init_one(struct ssb_device *sdev,
> struct net_device *dev;
> struct b44 *bp;
> int err;
> + unsigned int dma_desc_align_size = dma_get_cache_alignment(sdev->dma_dev);
> +
> + /* Setup paramaters for syncing RX/TX DMA descriptors */
> + dma_desc_sync_size = max_t(unsigned int, dma_desc_align_size, sizeof(struct dma_desc));
>
> instance++;
>
> @@ -2587,12 +2591,8 @@ static inline void b44_pci_exit(void)
>
> static int __init b44_init(void)
> {
> - unsigned int dma_desc_align_size = dma_get_cache_alignment();
> int err;
>
> - /* Setup paramaters for syncing RX/TX DMA descriptors */
> - dma_desc_sync_size = max_t(unsigned int, dma_desc_align_size, sizeof(struct dma_desc));
> -
> err = b44_pci_init();
> if (err)
> return err;
> diff --git
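The core idea of the series quoted above — that the safe DMA alignment depends on whether a particular device is cache-coherent — can be modeled in a few lines of plain C. This is a userspace sketch, not kernel code: `struct device`, the `coherent` flag, and the value 128 are stand-ins for the kernel's `struct device`, a `plat_device_is_coherent()`-style check, and the arch's real `ARCH_DMA_MINALIGN`.

```c
#include <assert.h>
#include <stddef.h>

#define ARCH_DMA_MINALIGN 128   /* stand-in; the real value is arch-specific */

struct device {
    int coherent;   /* models a plat_device_is_coherent(dev) check */
};

/* Sketch of the reworked dma_get_cache_alignment(dev): a coherent
 * device needs no cacheline padding, while a non-coherent (or unknown,
 * i.e. NULL) device gets the conservative arch minimum. */
static int dev_dma_cache_alignment(const struct device *dev)
{
    if (dev && dev->coherent)
        return 1;
    return ARCH_DMA_MINALIGN;
}
```

Note the NULL case deliberately matches the series' compatibility rule: existing callers that pass no device keep getting the conservative `ARCH_DMA_MINALIGN` answer.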
Re: [PATCH V8 5/5] libata: Align DMA buffer to dma_get_cache_alignment()
Hi, Matt,

I found that 4ee34ea3a12396f35b26d90a094c75db ("libata: Align ata_device's id on a cacheline") resolves everything: the size of id[ATA_ID_WORDS] is already a multiple of the cacheline size, and devslp_timing doesn't need to be aligned. So in V9 of this series I will drop this patch. The reason I had problems before is that I was using linux-4.4.

Huacai

-- Original --
From: "Matt Redfearn";
Date: Thu, Oct 19, 2017 03:52 PM
To: "Tejun Heo"; "Huacai Chen";
Cc: "Christoph Hellwig"; "Marek Szyprowski"; "Robin Murphy"; "Andrew Morton"; "Fuxin Zhang"; "linux-kernel"; "Ralf Baechle"; "James Hogan"; "linux-mips"; "James E. J. Bottomley"; "Martin K. Petersen"; "linux-scsi"; "linux-ide"; "stable";
Subject: Re: [PATCH V8 5/5] libata: Align DMA buffer to dma_get_cache_alignment()

On 18/10/17 14:03, Tejun Heo wrote:
> On Tue, Oct 17, 2017 at 04:05:42PM +0800, Huacai Chen wrote:
>> In non-coherent DMA mode, the kernel uses cache flushing operations to maintain I/O coherency, so in ata_do_dev_read_id() the DMA buffer should be aligned to ARCH_DMA_MINALIGN. Otherwise, if a DMA buffer and a kernel structure share the same cache line, and the kernel structure has dirty data, cache_invalidate (no writeback) will cause data corruption.
>>
>> Cc: sta...@vger.kernel.org
>> Signed-off-by: Huacai Chen
>> ---
>> drivers/ata/libata-core.c | 15 +--
>> 1 file changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
>> index ee4c1ec..e134955 100644
>> --- a/drivers/ata/libata-core.c
>> +++ b/drivers/ata/libata-core.c
>> @@ -1833,8 +1833,19 @@ static u32 ata_pio_mask_no_iordy(const struct ata_device *adev)
>> unsigned int ata_do_dev_read_id(struct ata_device *dev,
>> struct ata_taskfile *tf, u16 *id)
>> {
>> - return ata_exec_internal(dev, tf, NULL, DMA_FROM_DEVICE,
>> - id, sizeof(id[0]) * ATA_ID_WORDS, 0);
>> + u16 *devid;
>> + int res, size = sizeof(u16) * ATA_ID_WORDS;
>> +
>> + if (IS_ALIGNED((unsigned long)id, dma_get_cache_alignment(&dev->tdev)))
>> + res = ata_exec_internal(dev, tf, NULL, DMA_FROM_DEVICE, id, size, 0);
>> + else {
>> + devid = kmalloc(size, GFP_KERNEL);
>> + res = ata_exec_internal(dev, tf, NULL, DMA_FROM_DEVICE, devid, size, 0);
>> + memcpy(id, devid, size);
>> + kfree(devid);
>> + }
>> +
>> + return res;

> Hmm... I think it'd be a lot better to ensure that the buffers are aligned properly to begin with. There are only two buffers which are used for id reading - ata_port->sector_buf and ata_device->id. Both are embedded arrays but making them separately allocated aligned buffers shouldn't be difficult.
>
> Thanks.

FWIW, I agree that the buffers used for DMA should be split out from the structure. We ran into this problem on MIPS last year; 4ee34ea3a12396f35b26d90a094c75db95080baa ("libata: Align ata_device's id on a cacheline") partially fixed it, but likely should have also cacheline-aligned the following devslp_timing in the struct, such that we guarantee that members of the struct not used for DMA do not share the same cacheline as the DMA buffer. Not having this means that architectures such as MIPS, which in some cases have to perform manual invalidation of the DMA buffer, can clobber valid adjacent data if it is in the same cacheline.

Thanks,
Matt
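The layout issue Matt describes can be made concrete with `offsetof`. This is a rough userspace model, not the real `struct ata_device`: `flags` is a hypothetical non-DMA member, and 64 is an assumed cacheline size (the real value is arch-specific).

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE 64          /* assumed line size; arch-specific in reality */
#define ATA_ID_WORDS 256      /* as in libata: 256 u16s = 512 bytes */

/* Model of the problem: if the DMA target ('id') starts mid cache
 * line, the member before it shares the first line, so invalidating
 * the buffer's lines throws that member's dirty data away. */
struct dev_shared {
    char flags;                       /* non-DMA state... */
    unsigned short id[ATA_ID_WORDS];  /* ...shares its first cache line */
};

/* The fix discussed in the thread (commit 4ee34ea3...): align the
 * buffer itself. Because 512 bytes is a whole number of lines, members
 * placed after 'id' then start on a fresh line too - which is why
 * Huacai can drop his extra patch. */
struct dev_aligned {
    char flags;
    unsigned short id[ATA_ID_WORDS] __attribute__((aligned(CACHELINE)));
};
```

The third assertion below is exactly Huacai's observation that `sizeof(id)` is "already aligned": once the start of the buffer is on a line boundary, its end is too.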
Re: [PATCH V8 4/5] libsas: Align SMP req/resp to dma_get_cache_alignment()
Hi, Marek,

Yes, I know that in include/linux/slab.h there is

#define KMALLOC_MIN_SIZE ARCH_DMA_MINALIGN

But I observed that the req/resp isn't aligned to ARCH_DMA_MINALIGN, and I don't know why. The problem I experienced is: in non-coherent mode, mvsas with an expander cannot detect any disks.

Huacai

-- Original --
From: "Marek Szyprowski";
Date: Tue, Oct 17, 2017 07:55 PM
To: "Huacai Chen"; "Christoph Hellwig";
Cc: "Robin Murphy"; "Andrew Morton"; "Fuxin Zhang"; "linux-kernel"; "Ralf Baechle"; "James Hogan"; "linux-mips"; "James E. J. Bottomley"; "Martin K. Petersen"; "linux-scsi"; "Tejun Heo"; "linux-ide"; "stable";
Subject: Re: [PATCH V8 4/5] libsas: Align SMP req/resp to dma_get_cache_alignment()

Hi Huacai,

On 2017-10-17 10:05, Huacai Chen wrote:
> In non-coherent DMA mode, the kernel uses cache flushing operations to maintain I/O coherency, so libsas's SMP request/response should be aligned to ARCH_DMA_MINALIGN. Otherwise, if a DMA buffer and a kernel structure share the same cache line, and the kernel structure has dirty data, cache_invalidate (no writeback) will cause data corruption.
>
> Cc: sta...@vger.kernel.org
> Signed-off-by: Huacai Chen
> ---
> drivers/scsi/libsas/sas_expander.c | 93 +++---
> 1 file changed, 57 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
> index 6b4fd23..124a44b 100644
> --- a/drivers/scsi/libsas/sas_expander.c
> +++ b/drivers/scsi/libsas/sas_expander.c
> @@ -164,17 +164,17 @@ static int smp_execute_task(struct domain_device *dev, void *req, int req_size,
>
> /* -- Allocations -- */
>
> -static inline void *alloc_smp_req(int size)
> +static inline void *alloc_smp_req(int size, int align)
> {
> - u8 *p = kzalloc(size, GFP_KERNEL);
> + u8 *p = kzalloc(ALIGN(size, align), GFP_KERNEL);

If I remember correctly, the kernel guarantees that each kmalloced buffer is always at least aligned to the CPU cache line, so the CPU cache can be invalidated on the allocated buffer without corrupting anything else. Taking this into account, I wonder if the above change makes sense. Have you experienced any problems without this change?

> if (p)
> p[0] = SMP_REQUEST;
> return p;
> }
>
> -static inline void *alloc_smp_resp(int size)
> +static inline void *alloc_smp_resp(int size, int align)
> {
> - return kzalloc(size, GFP_KERNEL);
> + return kzalloc(ALIGN(size, align), GFP_KERNEL);

Same as above.

> }
>
> static char sas_route_char(struct domain_device *dev, struct ex_phy *phy)
> @@ -403,15 +403,17 @@ static int sas_ex_phy_discover_helper(struct domain_device *dev, u8 *disc_req,
> int sas_ex_phy_discover(struct domain_device *dev, int single)
> {
> struct expander_device *ex = &dev->ex_dev;
> - int res = 0;
> + int res = 0, align;
> u8 *disc_req;
> u8 *disc_resp;
>
> - disc_req = alloc_smp_req(DISCOVER_REQ_SIZE);
> + align = dma_get_cache_alignment(&dev->phy->dev);
> +
> + disc_req = alloc_smp_req(DISCOVER_REQ_SIZE, align);
> if (!disc_req)
> return -ENOMEM;
>
> - disc_resp = alloc_smp_resp(DISCOVER_RESP_SIZE);
> + disc_resp = alloc_smp_resp(DISCOVER_RESP_SIZE, align);
> if (!disc_resp) {
> kfree(disc_req);
> return -ENOMEM;
> @@ -480,14 +482,15 @@ static int sas_ex_general(struct domain_device *dev)
> {
> u8 *rg_req;
> struct smp_resp *rg_resp;
> - int res;
> - int i;
> + int i, res, align;
>
> - rg_req = alloc_smp_req(RG_REQ_SIZE);
> + align = dma_get_cache_alignment(&dev->phy->dev);
> +
> + rg_req = alloc_smp_req(RG_REQ_SIZE, align);
> if (!rg_req)
> return -ENOMEM;
>
> - rg_resp = alloc_smp_resp(RG_RESP_SIZE);
> + rg_resp = alloc_smp_resp(RG_RESP_SIZE, align);
> if (!rg_resp) {
> kfree(rg_req);
> return -ENOMEM;
> @@ -552,13 +555,15 @@ static int sas_ex_manuf_info(struct domain_device *dev)
> {
> u8 *mi_req;
> u8 *mi_resp;
> - int res;
> + int res, align;
>
> - mi_req = alloc_smp_req(MI_REQ_SIZE);
> + align = dma_get_cache_alignment(&dev->phy->dev);
> +
> + mi_req = alloc_smp_req(MI_REQ_SIZE, align);
> if (!mi_req)
> return -ENOMEM;
>
> - mi_resp = alloc_smp_resp(MI_RESP_SIZE);
> + mi_resp = alloc_smp_resp(MI_RESP_SIZE, align);
> if (!mi_resp) {
> kfree(mi_req);
> return -ENOMEM;
> @@ -593,13 +598,15 @@ int sas_smp_phy_control(struct domain_device *dev, int phy_id,
> {
> u8 *pc_req;
> u8 *pc_resp;
> - int res;
> + int res, align;
> +
> + align = dma_get_cache_alignment(&dev->phy->dev);
>
> - pc_req = alloc_smp_req(PC_REQ_SIZE);
> + pc_req = alloc_smp_req(PC_REQ_SIZE, align);
> if (!pc_req)
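The patch above pads each kzalloc size with the kernel's ALIGN() macro so a short SMP frame cannot end mid cache line and share its tail line with the next allocation. A userspace restatement of that allocator (the macro arithmetic matches the kernel's ALIGN(); `calloc` stands in for `kzalloc`, and the SMP_REQUEST value 0x40 is taken from the SAS frame type):

```c
#include <assert.h>
#include <stdlib.h>

#define SMP_REQUEST 0x40  /* SMP request frame type, as set by libsas */

/* Round x up to a multiple of a (a must be a power of two) - the same
 * arithmetic as the kernel's ALIGN() macro used in the quoted patch. */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

/* Userspace sketch of the patched alloc_smp_req(): the allocation is
 * padded to 'align' so a short frame cannot share its last cache line
 * with an unrelated object. */
static unsigned char *alloc_smp_req_sketch(size_t size, size_t align)
{
    unsigned char *p = calloc(1, ALIGN_UP(size, align));
    if (p)
        p[0] = SMP_REQUEST;
    return p;
}
```

Note that with `align == 1` (the coherent-device case) the size is unchanged, so the patch costs nothing where padding is unnecessary — though, as Marek points out, kmalloc's minimum alignment may already make the padding redundant.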
Re: [PATCH V8 5/5] libata: Align DMA buffer to dma_get_cache_alignment()
Hi, Sergei,

Without this patch, in non-coherent mode, ata_do_set_mode() returns with "no PIO support", and disk detection fails.

Huacai

-- Original --
From: "Sergei Shtylyov";
Date: Tue, Oct 17, 2017 05:43 PM
To: "Huacai Chen"; "Christoph Hellwig";
Cc: "Marek Szyprowski"; "Robin Murphy"; "Andrew Morton"; "Fuxin Zhang"; "linux-kernel"; "Ralf Baechle"; "James Hogan"; "linux-mips"; "James E. J. Bottomley"; "Martin K. Petersen"; "linux-scsi"; "Tejun Heo"; "linux-ide"; "stable";
Subject: Re: [PATCH V8 5/5] libata: Align DMA buffer to dma_get_cache_alignment()

On 10/17/2017 11:05 AM, Huacai Chen wrote:
> In non-coherent DMA mode, the kernel uses cache flushing operations to maintain I/O coherency, so in ata_do_dev_read_id() the DMA buffer should be aligned to ARCH_DMA_MINALIGN. Otherwise, if a DMA buffer and a kernel structure share the same cache line, and the kernel structure has dirty data, cache_invalidate (no writeback) will cause data corruption.
>
> Cc: sta...@vger.kernel.org
> Signed-off-by: Huacai Chen
> ---
> drivers/ata/libata-core.c | 15 +--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index ee4c1ec..e134955 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -1833,8 +1833,19 @@ static u32 ata_pio_mask_no_iordy(const struct ata_device *adev)
> unsigned int ata_do_dev_read_id(struct ata_device *dev,
> struct ata_taskfile *tf, u16 *id)
> {
> - return ata_exec_internal(dev, tf, NULL, DMA_FROM_DEVICE,
> - id, sizeof(id[0]) * ATA_ID_WORDS, 0);
> + u16 *devid;
> + int res, size = sizeof(u16) * ATA_ID_WORDS;
> +
> + if (IS_ALIGNED((unsigned long)id, dma_get_cache_alignment(&dev->tdev)))
> + res = ata_exec_internal(dev, tf, NULL, DMA_FROM_DEVICE, id, size, 0);
> + else {
> + devid = kmalloc(size, GFP_KERNEL);
> + res = ata_exec_internal(dev, tf, NULL, DMA_FROM_DEVICE, devid, size, 0);
> + memcpy(id, devid, size);
> + kfree(devid);
> + }
> +
> + return res;

This function is called only for the PIO mode commands, so I doubt this is necessary...

MBR, Sergei
Re: [PATCH V7 1/2] dma-mapping: Rework dma_get_cache_alignment()
Hi, Christoph,

Can I put the declaration in asm/dma-coherence.h? And last time you said it is OK to pass a NULL to dma_get_cache_alignment() and cc all driver maintainers; I have done so.

Huacai

-- Original --
From: "Christoph Hellwig";
Date: Mon, Sep 25, 2017 08:51 PM
To: "Huacai Chen";
Cc: "Christoph Hellwig"; "Marek Szyprowski"; "Robin Murphy"; "Andrew Morton"; "Fuxin Zhang"; "linux-kernel"; "Ralf Baechle"; "James Hogan"; "linux-mips"; "James E. J. Bottomley"; "Martin K. Petersen"; "linux-scsi"; "Roland Dreier"; "Pawel Osciak"; "Kyungmin Park"; "Michael Chan"; "Benjamin Herrenschmidt"; "Ivan Mikhaylov"; "Tariq Toukan"; "Andy Gross"; "Mark A. Greer"; "Robert Baldyga"; "stable";
Subject: Re: [PATCH V7 1/2] dma-mapping: Rework dma_get_cache_alignment()

> index aba7138..e2c5d9e 100644
> --- a/arch/mips/include/asm/dma-mapping.h
> +++ b/arch/mips/include/asm/dma-mapping.h
> @@ -39,4 +39,6 @@ static inline void arch_setup_dma_ops(struct device *dev, u64 dma_base,
> #endif
> }
>
> +int mips_get_cache_alignment(struct device *dev);

All the other MIPS generic DMA helpers are prefixed mips_dma_*, so it might make sense to follow that. Also please don't add arch-local helpers to asm/dma-mapping.h - this is a header used by linux/dma-mapping.h and should not contain implementation details if avoidable.

> + dma_get_cache_alignment(NULL)) / mdev->limits.mtt_seg_size;

As said before - please don't pass NULL to this function but the proper device, which would be &mdev->pdev->dev in this case, for example.
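The piece Christoph is reviewing is the dispatch between the generic helper and the per-arch hook wired into `struct dma_map_ops`. Its shape can be sketched in userspace C (field and struct names are modeled loosely on the patch, not the real kernel headers):

```c
#include <assert.h>
#include <stddef.h>

#define ARCH_DMA_MINALIGN 128  /* stand-in for the arch constant */

struct device;

/* Minimal model of the new dma_map_ops hook from the series. */
struct dma_map_ops {
    int (*get_cache_alignment)(struct device *dev);
};

struct device {
    const struct dma_map_ops *ops;
};

static int coherent_alignment(struct device *dev)
{
    (void)dev;
    return 1;   /* e.g. a platform that knows this device is coherent */
}

/* Dispatch as the reworked dma_get_cache_alignment() would: use the
 * per-arch hook when present, fall back to the conservative constant. */
static int get_cache_alignment_sketch(struct device *dev)
{
    if (dev && dev->ops && dev->ops->get_cache_alignment)
        return dev->ops->get_cache_alignment(dev);
    return ARCH_DMA_MINALIGN;
}
```

Falling back to `ARCH_DMA_MINALIGN` rather than 1 when no hook is installed is the safe default: over-aligning on a coherent device only wastes a little memory, while under-aligning on a non-coherent one corrupts data.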
Re: [PATCH V7 1/2] dma-mapping: Rework dma_get_cache_alignment()
Hi, Robin,

Can ARM/ARM64 use the same implementation as MIPS, or is it OK if I only do the MIPS part?

Huacai

-- Original --
From: "Robin Murphy";
Date: Mon, Sep 25, 2017 08:57 PM
To: "Huacai Chen"; "Christoph Hellwig";
Cc: "Marek Szyprowski"; "Andrew Morton"; "Fuxin Zhang"; "linux-kernel"; "Ralf Baechle"; "James Hogan"; "linux-mips"; "James E. J. Bottomley"; "Martin K. Petersen"; "linux-scsi"; "Roland Dreier"; "Pawel Osciak"; "Kyungmin Park"; "Michael Chan"; "Benjamin Herrenschmidt"; "Ivan Mikhaylov"; "Tariq Toukan"; "Andy Gross"; "Mark A. Greer"; "Robert Baldyga"; "stable";
Subject: Re: [PATCH V7 1/2] dma-mapping: Rework dma_get_cache_alignment()

On 25/09/17 10:46, Huacai Chen wrote:
> Make dma_get_cache_alignment() accept a 'dev' argument. As a result, it can return different alignments due to different devices' I/O cache coherency.
>
> Currently, MIPS is the only architecture which supports coherent & non-coherent devices co-existing. This may be changed in the future, so add a new get_cache_alignment() function pointer in 'struct dma_map_ops' as a generic solution.

FWIW, ARM and arm64 have also supported per-device coherency for quite some time.

> For compatibility (always return ARCH_DMA_MINALIGN), make all existing callers pass a NULL dev argument to dma_get_cache_alignment().
>
> Cc: sta...@vger.kernel.org
> Signed-off-by: Huacai Chen
> ---
> arch/mips/cavium-octeon/dma-octeon.c | 3 ++-
> arch/mips/include/asm/dma-mapping.h | 2 ++
> arch/mips/loongson64/common/dma-swiotlb.c | 1 +
> arch/mips/mm/dma-default.c | 11 ++-
> arch/mips/netlogic/common/nlm-dma.c | 3 ++-
> drivers/infiniband/hw/mthca/mthca_main.c | 2 +-
> drivers/media/v4l2-core/videobuf2-dma-contig.c | 2 +-
> drivers/net/ethernet/broadcom/b44.c | 2 +-
> drivers/net/ethernet/ibm/emac/core.h | 2 +-
> drivers/net/ethernet/mellanox/mlx4/main.c | 2 +-
> drivers/spi/spi-qup.c | 4 ++--
> drivers/tty/serial/mpsc.c | 16
> drivers/tty/serial/samsung.c | 14 +++---
> include/linux/dma-mapping.h | 17 -
> 14 files changed, 51 insertions(+), 30 deletions(-)

I think it might be neater to split this into two patches - one making the treewide prototype change, then introducing the .get_cache_alignment callback separately - but that's only my personal preference. Otherwise (and modulo Christoph's comments), I'd say we're nearly there.

Thanks,
Robin.

> diff --git a/arch/mips/cavium-octeon/dma-octeon.c b/arch/mips/cavium-octeon/dma-octeon.c
> index c64bd87..7978237 100644
> --- a/arch/mips/cavium-octeon/dma-octeon.c
> +++ b/arch/mips/cavium-octeon/dma-octeon.c
> @@ -324,7 +324,8 @@ static struct octeon_dma_map_ops _octeon_pci_dma_map_ops = {
> .sync_sg_for_cpu = swiotlb_sync_sg_for_cpu,
> .sync_sg_for_device = octeon_dma_sync_sg_for_device,
> .mapping_error = swiotlb_dma_mapping_error,
> - .dma_supported = swiotlb_dma_supported
> + .dma_supported = swiotlb_dma_supported,
> + .get_cache_alignment = mips_get_cache_alignment
> },
> };
>
> diff --git a/arch/mips/include/asm/dma-mapping.h b/arch/mips/include/asm/dma-mapping.h
> index aba7138..e2c5d9e 100644
> --- a/arch/mips/include/asm/dma-mapping.h
> +++ b/arch/mips/include/asm/dma-mapping.h
> @@ -39,4 +39,6 @@ static inline void arch_setup_dma_ops(struct device *dev, u64 dma_base,
> #endif
> }
>
> +int mips_get_cache_alignment(struct device *dev);
> +
> #endif /* _ASM_DMA_MAPPING_H */
> diff --git a/arch/mips/loongson64/common/dma-swiotlb.c b/arch/mips/loongson64/common/dma-swiotlb.c
> index 34486c1..09cb8a4 100644
> --- a/arch/mips/loongson64/common/dma-swiotlb.c
> +++ b/arch/mips/loongson64/common/dma-swiotlb.c
> @@ -119,6 +119,7 @@ static const struct dma_map_ops loongson_dma_map_ops = {
> .sync_sg_for_device = loongson_dma_sync_sg_for_device,
> .mapping_error = swiotlb_dma_mapping_error,
> .dma_supported = loongson_dma_supported,
> + .get_cache_alignment = mips_get_cache_alignment
> };
>
> void __init plat_swiotlb_setup(void)
> diff --git a/arch/mips/mm/dma-default.c b/arch/mips/mm/dma-default.c
> index c01bd20..c49987e 100644
> --- a/arch/mips/mm/dma-default.c
> +++ b/arch/mips/mm/dma-default.c
> @@ -394,6 +394,14 @@ void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
>
> EXPORT_SYMBOL(dma_cache_sync);
>
> +int mips_get_cache_alignment(struct device *dev)
> +{
> + if (plat_device_is_coherent(dev))
> + return 1;
> + else
> + return ARCH_DMA_MINALIGN;
> +}
> +
> static const struct dma_map_ops mips_default_dma_map_ops = {
> .alloc = mips_dma_alloc_coherent,
> .free = mips_dma_free_coherent,
> @@
Re: [PATCH V6 1/3] dma-mapping: Introduce device_is_coherent() as a helper
Hi, Robin, Before 2.6.36 dma_get_cache_alignment was arch-dependent, and it was unified in commit 4565f0170dfc849b3629c27d7 ("dma-mapping: unify dma_get_cache_alignment implementations"). Should we revert to the old implementation? Huacai -- Original -- From: "Robin Murphy"; Date: Thu, Sep 21, 2017 06:47 PM To: "Huacai Chen" ; "Christoph Hellwig" ; Cc: "Marek Szyprowski" ; "Andrew Morton" ; "Fuxin Zhang" ; "linux-kernel" ; "James E . J . Bottomley" ; "Martin K . Petersen" ; "linux-scsi" ; "stable" ; Subject: Re: [PATCH V6 1/3] dma-mapping: Introduce device_is_coherent() as a helper On 19/09/17 09:52, Huacai Chen wrote: > We will use device_is_coherent() as a helper function, which will be > used in the next patch. > > There is a MIPS-specific plat_device_is_coherent(), but we need a more > generic solution, so add and use a new function pointer in dma_map_ops. I think we're heading in the right direction with the series, but I still don't like this patch. I can pretty much guarantee that driver authors *will* abuse a generic device_is_coherent() API to mean "I can skip other DMA API calls and just use virt_to_phys()". I think it would be far better to allow architectures to provide their own override of dma_get_cache_alignment(), and let the coherency detail remain internal to the relevant arch implementations. [...] > @@ -697,6 +698,15 @@ static inline void *dma_zalloc_coherent(struct device > *dev, size_t size, > } > > #ifdef CONFIG_HAS_DMA > +static inline int device_is_coherent(struct device *dev) > +{ > + const struct dma_map_ops *ops = get_dma_ops(dev); > + if (ops && ops->device_is_coherent) > + return ops->device_is_coherent(dev); > + else > + return 1; /* compatible behavior */ That is also quite scary - if someone now adds a new dma_get_cache_alignment() call and dutifully passes a non-NULL device, they will now get back an alignment of 1 on all non-coherent platforms except MIPS: hello data corruption. Robin. 
> +} > + > static inline int dma_get_cache_alignment(void) > { > #ifdef ARCH_DMA_MINALIGN >
Re: [PATCH V6 2/3] dma-mapping: Rework dma_get_cache_alignment() function
Hi, Christoph, I have changed dma_get_cache_alignment's return value, and I don't know whether those drivers want to return ARCH_DMA_MINALIGN unconditionally. So I pass a NULL for those drivers, in order to keep their old behavior. Huacai -- Original -- From: "Christoph Hellwig"; Date: Tue, Sep 19, 2017 11:02 PM To: "Huacai Chen" ; Cc: "Christoph Hellwig" ; "Marek Szyprowski" ; "Robin Murphy" ; "Andrew Morton" ; "Fuxin Zhang" ; "linux-kernel" ; "James E . J . Bottomley" ; "Martin K . Petersen" ; "linux-scsi" ; "stable" ; Subject: Re: [PATCH V6 2/3] dma-mapping: Rework dma_get_cache_alignment() function > mdev->limits.reserved_mtts = ALIGN(mdev->limits.reserved_mtts * > mdev->limits.mtt_seg_size, > -dma_get_cache_alignment()) / > mdev->limits.mtt_seg_size; > +dma_get_cache_alignment(NULL)) / > mdev->limits.mtt_seg_size; > > mdev->mr_table.mtt_table = mthca_alloc_icm_table(mdev, > init_hca->mtt_base, Please pass the actually relevant struct device for each call.
Re: [V5, 2/3] mm: dmapool: Align to ARCH_DMA_MINALIGN in non-coherent DMA mode
Oh, I know, I've made a mistake, dmapool doesn't need to change. Huacai -- Original -- From: "Christoph Hellwig"; Date: Mon, Sep 18, 2017 11:51 PM To: "Robin Murphy" ; Cc: "Huacai Chen" ; "Andrew Morton" ; "Fuxin Zhang" ; "linux-mm" ; "linux-kernel" ; "stable" ; Subject: Re: [V5, 2/3] mm: dmapool: Align to ARCH_DMA_MINALIGN in non-coherent DMA mode On Mon, Sep 18, 2017 at 10:44:54AM +0100, Robin Murphy wrote: > On 18/09/17 05:22, Huacai Chen wrote: > > In non-coherent DMA mode, kernel uses cache flushing operations to > > maintain I/O coherency, so the dmapool objects should be aligned to > > ARCH_DMA_MINALIGN. Otherwise, it will cause data corruption, at least > > on MIPS: > > > > Step 1, dma_map_single > > Step 2, cache_invalidate (no writeback) > > Step 3, dma_from_device > > Step 4, dma_unmap_single > > This is a massive red warning flag for the whole series, because DMA > pools don't work like that. At best, this will do nothing, and at worst > it is papering over egregious bugs elsewhere. Streaming mappings of > coherent allocations means completely broken code. Oh, I hadn't even seen that part. Yes, dma coherent (and pool) allocations must never be used for streaming mappings. I wish we'd have some debug infrastructure to warn on such uses.
Re: [PATCH V5 3/3] scsi: Align queue to ARCH_DMA_MINALIGN in non-coherent DMA mode
Hi, Christoph, I don't think dma_get_cache_alignment is the "absolute minimum alignment" in all cases. At least on MIPS/Loongson, if we use I/O coherent mode (Cached DMA mode), aligning the block queue to 4 bytes is enough. If we align the block queue to dma_get_cache_alignment in I/O coherent mode, there is a performance loss because we cannot use zero-copy in most cases (user buffers are usually not aligned). Huacai -- Original -- From: "Christoph Hellwig"; Date: Mon, Sep 18, 2017 01:20 PM To: "Huacai Chen" ; Cc: "James E . J . Bottomley" ; "Martin K . Petersen" ; "Andrew Morton" ; "Fuxin Zhang" ; "linux-scsi" ; "linux-kernel" ; "stable" ; Subject: Re: [PATCH V5 3/3] scsi: Align queue to ARCH_DMA_MINALIGN in non-coherent DMA mode Please send all patches in the series to the same to and cc lists. On Mon, Sep 18, 2017 at 12:22:54PM +0800, Huacai Chen wrote: > In non-coherent DMA mode, kernel uses cache flushing operations to > maintain I/O coherency, so scsi's block queue should be aligned to > ARCH_DMA_MINALIGN. Otherwise, it will cause data corruption, at least > on MIPS: > > Step 1, dma_map_single > Step 2, cache_invalidate (no writeback) > Step 3, dma_from_device > Step 4, dma_unmap_single > > If a DMA buffer and a kernel structure share the same cache line, and if > the kernel structure has dirty data, cache_invalidate (no writeback) > will cause data loss. And as said before we must _always_ align to dma_get_cache_alignment. This is even documented in Documentation/DMA-API.txt: -- snip -- int dma_get_cache_alignment(void) Returns the processor cache alignment. This is the absolute minimum alignment *and* width that you must observe when either mapping memory or doing partial flushes. -- snip -- > + if (device_is_coherent(dev)) > + blk_queue_dma_alignment(q, 0x04 - 1); > + else > + blk_queue_dma_alignment(q, dma_get_cache_alignment() - 1); So as said before this should become something like: blk_queue_dma_alignment(q, max(0x04, dma_get_cache_alignment()) - 1);
Re: [PATCH V5 2/3] mm: dmapool: Align to ARCH_DMA_MINALIGN in non-coherent DMA mode
Hi, Christoph, Maybe you missed something. 1, pool_alloc_page() uses dma_alloc_coherent() to allocate pool pages, and of course these pages are aligned to ARCH_DMA_MINALIGN. 2, dma_pool_alloc() is the element allocator, but it doesn't use dma_alloc_coherent(). Elements only align to pool->size, but pool->size is usually less than ARCH_DMA_MINALIGN. 3, ARCH_DMA_MINALIGN is now only used in several drivers, no dma_ops use it. Huacai -- Original -- From: "Christoph Hellwig"; Date: Mon, Sep 18, 2017 01:22 PM To: "Huacai Chen" ; Cc: "Andrew Morton" ; "Fuxin Zhang" ; "linux-mm" ; "linux-kernel" ; "stable" ; Subject: Re: [PATCH V5 2/3] mm: dmapool: Align to ARCH_DMA_MINALIGN in non-coherent DMA mode The dmapool code uses dma_alloc_coherent to allocate each element, and dma_alloc_coherent must align to ARCH_DMA_MINALIGN already. If your implementation doesn't do that it needs to be fixed.
Re: [PATCH V3 2/3] mm: dmapool: Align to ARCH_DMA_MINALIGN in non-coherent DMA mode
Hi, Andrew, It will cause data corruption, at least on MIPS: step 1, dma_map_single step 2, cache_invalidate (no writeback) step 3, dma_from_device step 4, dma_unmap_single If a DMA buffer and a kernel structure share the same cache line, and if the kernel structure has dirty data, cache_invalidate (no writeback) may cause data loss. Huacai -- Original -- From: "Andrew Morton"; Date: Thu, Sep 14, 2017 05:52 AM To: "Huacai Chen" ; Cc: "Fuxin Zhang" ; "linux-mm" ; "linux-kernel" ; "stable" ; Subject: Re: [PATCH V3 2/3] mm: dmapool: Align to ARCH_DMA_MINALIGN in non-coherent DMA mode On Wed, 13 Sep 2017 17:20:51 +0800 Huacai Chen wrote: > In non-coherent DMA mode, kernel uses cache flushing operations to > maintain I/O coherency, so the dmapool objects should be aligned to > ARCH_DMA_MINALIGN. What are the user-visible effects of this bug?
Re: [PATCH 2/2] scsi: Align queue to ARCH_DMA_MINALIGN in non-coherent DMA mode
Hi, Christoph, I think we cannot modify dma_get_cache_alignment(), because existing callers may want to unconditionally return ARCH_DMA_MINALIGN. Huacai -- Original -- From: "Christoph Hellwig"; Date: Mon, Sep 11, 2017 03:39 PM To: "Huacai Chen" ; Cc: "James E . J . Bottomley" ; "Martin K . Petersen" ; "Fuxin Zhang" ; "linux-scsi" ; "linux-kernel" ; "stable" ; Subject: Re: [PATCH 2/2] scsi: Align queue to ARCH_DMA_MINALIGN in non-coherent DMA mode > + if (plat_device_is_coherent(dev)) We can't just call platform device code. We'll need a proper DMA API call for this. > + blk_queue_dma_alignment(q, 0x04 - 1); > + else > + blk_queue_dma_alignment(q, dma_get_cache_alignment() - 1); Which we already have with dma_get_cache_alignment, except that it doesn't take a struct device pointer and doesn't call into dma_map ops. So please add a struct device argument to dma_get_cache_alignment, and let it call into dma_map_ops where needed. With that you can replace the above with: blk_queue_dma_alignment(q, max(0x04U, dma_get_cache_alignment(dev)) - 1);
Re: mips builds failing in v3.18.38 and v4.1.29
Hi, I have already told Sasha that "MIPS: Reserve nosave data for hibernation" should not be backported to 4.1/3.18. But he has no response. Huacai -- Original -- From: "Guenter Roeck"; Date: Mon, Aug 1, 2016 07:57 AM To: "Sasha Levin" ; Cc: "stable" ; "linux-kernel@vger.kernel.org" ; "Huacai Chen" ; Subject: mips builds failing in v3.18.38 and v4.1.29 Sorry I didn't notice earlier - my 3.18/4.1 build scripts were broken. arch/mips/kernel/setup.c: In function 'arch_mem_init': arch/mips/kernel/setup.c:689:2: error: implicit declaration of function 'reserve_bootmem_region' Offending commit is 1dd0964204277108e ("MIPS: Reserve nosave data for hibernation"). It fixes a bug which does not exist in 3.18 / 4.1, in turn causing all mips builds in 3.18 and 4.1 to fail. Guenter
Re: next: fuloong2e qemu boot failure due to 'MIPS: Loongson: Add Loongson-3A R2 basic support'
Hi, Could you please remove the line "#define cpu_hwrena_impl_bits 0xc000" in arch/mips/include/asm/mach-loongson64/cpu-feature-overrides.h and try again? Thanks. Huacai -- Original -- From: "Guenter Roeck"; Date: Wed, Apr 20, 2016 10:54 AM To: "Huacai Chen" ; Cc: "Ralf Baechle" ; "linux-mips" ; "linux-next" ; "linux-kernel" ; Subject: next: fuloong2e qemu boot failure due to 'MIPS: Loongson: Add Loongson-3A R2 basic support' Hi, qemu fails to boot in -next for machine fulong2e with configuration fuloong2e_defconfig. Bisect points to commit 'MIPS: Loongson: Add Loongson-3A R2 basic support'. qemu hangs in boot, after displaying "Inode-cache hash table entries: 16384 (order: 3, 131072 bytes)". Bisect log is attached. Guenter --- # bad: [1bd7a2081d2c7b096f75aa934658e404ccaba5fd] Add linux-next specific files for 20160418 # good: [bf16200689118d19de1b8d2a3c314fc21f5dc7bb] Linux 4.6-rc3 git bisect start 'HEAD' 'v4.6-rc3' # bad: [493ac92ff65ec4c4cd4c43870e778760a012951d] Merge remote-tracking branch 'ipvs-next/master' git bisect bad 493ac92ff65ec4c4cd4c43870e778760a012951d # bad: [20ca3ae9c517eee9b2f1bd0fb2a06e2d14153792] Merge remote-tracking branch 'btrfs-kdave/for-next' git bisect bad 20ca3ae9c517eee9b2f1bd0fb2a06e2d14153792 # good: [c454e65fb9ade11d0f84718d06a6888e2c92804d] Merge remote-tracking branch 'omap/for-next' git bisect good c454e65fb9ade11d0f84718d06a6888e2c92804d # good: [6f5c70fb9b4fc0534157bfa40cea9b402e6f2506] Merge remote-tracking branch 'microblaze/next' git bisect good 6f5c70fb9b4fc0534157bfa40cea9b402e6f2506 # bad: [7f053cd68fd14243c8f202b4086d7dd75c409e6f] MIPS: Loongson-3: Introduce CONFIG_LOONGSON3_ENHANCEMENT git bisect bad 7f053cd68fd14243c8f202b4086d7dd75c409e6f # good: [e9aacdd7f0b66c4ace17e5950c48e7cc61a253c8] MIPS: Allow RIXI to be used on non-R2 or R6 cores git bisect good e9aacdd7f0b66c4ace17e5950c48e7cc61a253c8 # good: [d1e8b9a8dc6c7fa9add5dfa7083e035ce037e56d] MAINTAINERS: add Loongson1 architecture entry git bisect good 
d1e8b9a8dc6c7fa9add5dfa7083e035ce037e56d # good: [13ff6275bb389c3669082d3ef8483592a31eb0ea] MIPS: Fix siginfo.h to use strict posix types git bisect good 13ff6275bb389c3669082d3ef8483592a31eb0ea # good: [66e74bdd51e617023fa2e79a807b704fb3eed8aa] MIPS: Enable ptrace hw watchpoints on MIPS R6 git bisect good 66e74bdd51e617023fa2e79a807b704fb3eed8aa # good: [f7cabc2dac8adf5986dbc700584bc3b8fe493d4d] MIPS: Loongson-3: Adjust irq dispatch to speedup processing git bisect good f7cabc2dac8adf5986dbc700584bc3b8fe493d4d # bad: [4978c8477e96fb9e9d870d8f42328dcabf1a65e9] MIPS: Loongson-3: Set cache flush handlers to cache_noop git bisect bad 4978c8477e96fb9e9d870d8f42328dcabf1a65e9 # bad: [04a35922c1dac1b4864b8b366a37474e9e51d8c0] MIPS: Loongson: Add Loongson-3A R2 basic support git bisect bad 04a35922c1dac1b4864b8b366a37474e9e51d8c0 # first bad commit: [04a35922c1dac1b4864b8b366a37474e9e51d8c0] MIPS: Loongson: Add Loongson-3A R2 basic support
Re: [PATCH] mips: Fix console output for Fulong2e system
Acked-by: Huacai Chen -- Original -- From: "Guenter Roeck"; Date: Fri, Aug 7, 2015 01:57 PM To: "Ralf Baechle"; Cc: "Huacai Chen"; "linux-mips"; "linux-kernel"; "Guenter Roeck"; Subject: [PATCH] mips: Fix console output for Fulong2e system Commit 3adeb2566b9b ("MIPS: Loongson: Improve LEFI firmware interface") made the number of UARTs dynamic if LEFI_FIRMWARE_INTERFACE is configured. Unfortunately, it did not initialize the number of UARTs if LEFI_FIRMWARE_INTERFACE is not configured. As a result, the Fulong2e system has no console. Fixes: 3adeb2566b9b ("MIPS: Loongson: Improve LEFI firmware interface") Cc: Huacai Chen Signed-off-by: Guenter Roeck --- Never mind my earlier e-mail, I figured it out. Should be a candidate for stable (v3.19+, ie v4.1 in practice). arch/mips/loongson64/common/env.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/mips/loongson64/common/env.c b/arch/mips/loongson64/common/env.c index f6c44dd332e2..d6d07ad56180 100644 --- a/arch/mips/loongson64/common/env.c +++ b/arch/mips/loongson64/common/env.c @@ -64,6 +64,9 @@ void __init prom_init_env(void) } if (memsize == 0) memsize = 256; + + loongson_sysconf.nr_uarts = 1; + pr_info("memsize=%u, highmemsize=%u\n", memsize, highmemsize); #else struct boot_params *boot_p; -- 2.1.4
Re: [PATCH -next] MIPS: Remove duplicated include from numa.c
Reviewed-by: Huacai Chen -- Original -- From: "weiyj_lk"; Date: Tue, Aug 12, 2014 08:26 PM To: "Ralf Baechle"; "Huacai Chen"; Cc: "Wei Yongjun"; "linux-mips"; "linux-kernel"; Subject: [PATCH -next] MIPS: Remove duplicated include from numa.c From: Wei Yongjun Remove duplicated include. Signed-off-by: Wei Yongjun --- arch/mips/loongson/loongson-3/numa.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/mips/loongson/loongson-3/numa.c b/arch/mips/loongson/loongson-3/numa.c index ca025a6..37ed184 100644 --- a/arch/mips/loongson/loongson-3/numa.c +++ b/arch/mips/loongson/loongson-3/numa.c @@ -24,8 +24,6 @@ #include <asm/page.h> #include <asm/pgalloc.h> #include <asm/sections.h> -#include <linux/bootmem.h> -#include <linux/init.h> #include <linux/irq.h> #include <asm/bootinfo.h> #include <asm/mc146818-time.h>
Re: [PATCH] genirq: Avoid NULL OOPS in irq handling
> On Wed, 30 Oct 2013, "陈华才" wrote: > >> I use a Loongson-3 (MIPS-series CPU) machine, there is a serial port >> integrated in the CPU (but it is buggy), and it uses handle_percpu_irq() >> as the irq handler. Maybe I should move the checking into >> handle_percpu_irq()? > > Why is a device interrupt using handle_percpu_irq()? It seems that IRQs delivered directly to the MIPS CPU (those without an interrupt controller) are all handled by handle_percpu_irq(). I will try to override the handler in arch-specific code and keep the code in kernel/irq as is. Thanks. > > Thanks, > > tglx -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] genirq: Avoid NULL OOPS in irq handling
I use a Loongson-3 (MIPS-series CPU) machine. There is a serial port
integrated in the CPU (but it is buggy), and it uses handle_percpu_irq()
as the irq handler. Maybe I should move the check into
handle_percpu_irq()?

Huacai

> On Sat, 28 Sep 2013, Huacai Chen wrote:
>
>> Some devices (e.g. serial port) set up an irq handler at dev open and
>> free it at dev close. So, sometimes there is no irqaction for a specific
>> irq. But some buggy devices may send irqs at any time. This patch avoids
>> the NULL OOPS when no irqaction is registered.
>
> All callers except the real per cpu interrupts are checking whether
> there is a valid action before calling. And serial ports are not
> routed to real per cpu interrupts. Can you provide more detailed
> information about the problem you are trying to solve please?
>
> Thanks,
>
> tglx
>
>> Signed-off-by: Huacai Chen
>> ---
>>  kernel/irq/handle.c | 4 ++++
>>  1 files changed, 4 insertions(+), 0 deletions(-)
>>
>> diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
>> index 131ca17..1c78e69 100644
>> --- a/kernel/irq/handle.c
>> +++ b/kernel/irq/handle.c
>> @@ -135,6 +135,9 @@ handle_irq_event_percpu(struct irq_desc *desc, struct irqaction *action)
>>  	irqreturn_t retval = IRQ_NONE;
>>  	unsigned int flags = 0, irq = desc->irq_data.irq;
>>
>> +	if (!action)
>> +		goto out;
>> +
>>  	do {
>>  		irqreturn_t res;
>>
>> @@ -174,6 +177,7 @@ handle_irq_event_percpu(struct irq_desc *desc, struct irqaction *action)
>>
>>  	add_interrupt_randomness(irq, flags);
>>
>> +out:
>>  	if (!noirqdebug)
>>  		note_interrupt(irq, desc, retval);
>>  	return retval;
>> --
>> 1.7.7.3
Re: [PATCH V2 01/14] MIPS: Build uasm-generated code only once to avoid CPU Hotplug problem
I'm sorry, this is the only patch, please ignore [01/14].

> Currently, clear_page()/copy_page() are generated by the micro-assembler
> dynamically. But they are unavailable until uasm_resolve_relocs() has
> finished, because jump labels are illegal before that. Since these
> functions are shared by every CPU, we call build_clear_page()/
> build_copy_page() only once, at boot time. Without this patch, programs
> will get random memory corruption (segmentation fault, bus error, etc.)
> during CPU hotplug (e.g. one CPU is using clear_page() while another is
> generating it in cpu_cache_init()).
>
> For similar reasons we modify build_tlb_refill_handler()'s invocation.
>
> V2:
> 1, Rework the code so that CPU#0 can be brought online/offline.
> 2, Introduce the cpu_has_local_ebase feature, since some types of MIPS
> CPU need a per-CPU tlb_refill_handler().
>
> Signed-off-by: Huacai Chen
> Signed-off-by: Hongbing Hu
> ---
>  arch/mips/include/asm/cpu-features.h          |  3 +++
>  .../asm/mach-loongson/cpu-feature-overrides.h |  1 +
>  arch/mips/mm/page.c                           | 10 ++++++++++
>  arch/mips/mm/tlbex.c                          | 10 ++++++++--
>  4 files changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/arch/mips/include/asm/cpu-features.h b/arch/mips/include/asm/cpu-features.h
> index c507b93..1204408 100644
> --- a/arch/mips/include/asm/cpu-features.h
> +++ b/arch/mips/include/asm/cpu-features.h
> @@ -110,6 +110,9 @@
>  #ifndef cpu_has_pindexed_dcache
>  #define cpu_has_pindexed_dcache (cpu_data[0].dcache.flags & MIPS_CACHE_PINDEX)
>  #endif
> +#ifndef cpu_has_local_ebase
> +#define cpu_has_local_ebase 1
> +#endif
>
>  /*
>   * I-Cache snoops remote store. This only matters on SMP. Some multiprocessors
> diff --git a/arch/mips/include/asm/mach-loongson/cpu-feature-overrides.h b/arch/mips/include/asm/mach-loongson/cpu-feature-overrides.h
> index 1a05d85..8eec8e2 100644
> --- a/arch/mips/include/asm/mach-loongson/cpu-feature-overrides.h
> +++ b/arch/mips/include/asm/mach-loongson/cpu-feature-overrides.h
> @@ -57,5 +57,6 @@
>  #define cpu_has_vint 0
>  #define cpu_has_vtag_icache 0
>  #define cpu_has_watch 1
> +#define cpu_has_local_ebase 0
>
>  #endif /* __ASM_MACH_LOONGSON_CPU_FEATURE_OVERRIDES_H */
> diff --git a/arch/mips/mm/page.c b/arch/mips/mm/page.c
> index 8e666c5..6a39c01 100644
> --- a/arch/mips/mm/page.c
> +++ b/arch/mips/mm/page.c
> @@ -247,6 +247,11 @@ void __cpuinit build_clear_page(void)
>  	struct uasm_label *l = labels;
>  	struct uasm_reloc *r = relocs;
>  	int i;
> +	static atomic_t run_once = ATOMIC_INIT(0);
> +
> +	if (atomic_xchg(&run_once, 1)) {
> +		return;
> +	}
>
>  	memset(labels, 0, sizeof(labels));
>  	memset(relocs, 0, sizeof(relocs));
> @@ -389,6 +394,11 @@ void __cpuinit build_copy_page(void)
>  	struct uasm_label *l = labels;
>  	struct uasm_reloc *r = relocs;
>  	int i;
> +	static atomic_t run_once = ATOMIC_INIT(0);
> +
> +	if (atomic_xchg(&run_once, 1)) {
> +		return;
> +	}
>
>  	memset(labels, 0, sizeof(labels));
>  	memset(relocs, 0, sizeof(relocs));
> diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
> index 1c8ac49..4a8b294 100644
> --- a/arch/mips/mm/tlbex.c
> +++ b/arch/mips/mm/tlbex.c
> @@ -2161,8 +2161,11 @@ void __cpuinit build_tlb_refill_handler(void)
>  	case CPU_TX3922:
>  	case CPU_TX3927:
>  #ifndef CONFIG_MIPS_PGD_C0_CONTEXT
> -	build_r3000_tlb_refill_handler();
> +	if (cpu_has_local_ebase)
> +		build_r3000_tlb_refill_handler();
>  	if (!run_once) {
> +		if (!cpu_has_local_ebase)
> +			build_r3000_tlb_refill_handler();
>  		build_r3000_tlb_load_handler();
>  		build_r3000_tlb_store_handler();
>  		build_r3000_tlb_modify_handler();
> @@ -2191,9 +2194,12 @@ void __cpuinit build_tlb_refill_handler(void)
>  		build_r4000_tlb_load_handler();
>  		build_r4000_tlb_store_handler();
>  		build_r4000_tlb_modify_handler();
> +		if (!cpu_has_local_ebase)
> +			build_r4000_tlb_refill_handler();
>  		run_once++;
>  	}
> -	build_r4000_tlb_refill_handler();
> +	if (cpu_has_local_ebase)
> +		build_r4000_tlb_refill_handler();
>  	}
>  }
> --
> 1.7.7.3

--
Jiangsu Zhongke Menglan Electronic Technology Co., Ltd., Software Dept.
Huacai Chen
E-mail: che...@lemote.com
Web: http://www.lemote.com/
Add: Menglan Industrial Park, Yushan Town, Changshu, Jiangsu
Re: [PATCH V8 00/13] MIPS: Add Loongson-3 based machines support
John, for the patch 3 rework, please tell me whether my understanding is
correct (make cpu_has_coherent_cache a runtime option rather than a config
option)? Thank you very much.

> Hi,
>
> ok, i dropped the series for now from my tree
>
> John
Re: [PATCH V8 00/13] MIPS: Add Loongson-3 based machines support
Hi, John,

Compiling fails because the __dev* prefixes should be removed due to
upstream changes. You said that patch 3 needs to be reworked, but I don't
know how to improve it... Could you please tell me which part is unsound?
Maybe you mean I should make cpu_has_coherent_cache a runtime value rather
than a config option, as follows?

1, remove CONFIG_CPU_SUPPORTS_COHERENT_CACHE
2, in arch/mips/include/asm/mach-loongson/cpu-feature-overrides.h
   #define cpu_has_coherent_cache 1
3, in arch/mips/include/asm/cpu.h
   #define MIPS_CPU_COHERENT_CACHE 0x0008
4, in arch/mips/include/asm/cpu-features.h
   #ifndef cpu_has_coherent_cache
   #define cpu_has_coherent_cache (cpu_data[0].options & MIPS_CPU_INCLUSIVE_CACHES)
   #endif

Besides, the SMP code has a bug to fix (IPI sending), and patch 3 and
patch 6 need to be updated. So I think a V9 is needed :(

> On 25/01/13 01:15, 陈华才 wrote:
>> ok, I'll prepare v9 of this series in these days.
>
> Please don't send v9
>
> read my mail and compile / runtime test the tree please
>
> only patch 3 needs to be reworked and an update for "MIPS: Loongson
> 3: Add HT-linked PCI support." needs to be made
>
> John
>
>>>> Huacai Chen(13):
>>>>    MIPS: Loongson: Add basic Loongson-3 definition.
>>>>    MIPS: Loongson: Add basic Loongson-3 CPU support.
>>>>    MIPS: Loongson: Introduce and use cpu_has_coherent_cache feature.
>>>>    MIPS: Loongson 3: Add Lemote-3A machtypes definition.
>>>>    MIPS: Loongson: Add UEFI-like firmware interface support.
>>>>    MIPS: Loongson 3: Add HT-linked PCI support.
>>>>    MIPS: Loongson 3: Add IRQ init and dispatch support.
>>>>    MIPS: Loongson 3: Add serial port support.
>>>>    MIPS: Loongson: Add swiotlb to support big memory (>4GB).
>>>>    MIPS: Loongson: Add Loongson-3 Kconfig options.
>>>>    MIPS: Loongson 3: Add Loongson-3 SMP support.
>>>>    MIPS: Loongson 3: Add CPU hotplug support.
>>>>    MIPS: Loongson: Add a Loongson-3 default config file.
>>>>
>>>> Signed-off-by: Huacai Chen
>>>> Signed-off-by: Hongliang Tao
>>>> Signed-off-by: Hua Yan
>>>> ---
>>>
>>> Hi,
>>>
>>> I have added all patches apart from 3/13 to my queue.
>>>
>>> I believe "MIPS: Loongson: Introduce and use cpu_has_coherent_cache
>>> feature." should be rewritten in a saner way.
>>>
>>> Please compile and runtime test the tree before I send it to Ralf
>>> --> http://git.linux-mips.org/?p=john/linux-john.git;a=shortlog;h=refs/heads/mips-next-3.9
>>>
>>> I cleaned up a few minor whitespace errors while merging.
>>>
>>> http://patchwork.linux-mips.org/patch/4547/ has a few comments. Please
>>> prepare a patch asap to address those so i can fold it into the series.
>>>
>>> John
Re: [PATCH V8 00/13] MIPS: Add Loongson-3 based machines support
ok, I'll prepare v9 of this series in these days.

>> Huacai Chen(13):
>>    MIPS: Loongson: Add basic Loongson-3 definition.
>>    MIPS: Loongson: Add basic Loongson-3 CPU support.
>>    MIPS: Loongson: Introduce and use cpu_has_coherent_cache feature.
>>    MIPS: Loongson 3: Add Lemote-3A machtypes definition.
>>    MIPS: Loongson: Add UEFI-like firmware interface support.
>>    MIPS: Loongson 3: Add HT-linked PCI support.
>>    MIPS: Loongson 3: Add IRQ init and dispatch support.
>>    MIPS: Loongson 3: Add serial port support.
>>    MIPS: Loongson: Add swiotlb to support big memory (>4GB).
>>    MIPS: Loongson: Add Loongson-3 Kconfig options.
>>    MIPS: Loongson 3: Add Loongson-3 SMP support.
>>    MIPS: Loongson 3: Add CPU hotplug support.
>>    MIPS: Loongson: Add a Loongson-3 default config file.
>>
>> Signed-off-by: Huacai Chen
>> Signed-off-by: Hongliang Tao
>> Signed-off-by: Hua Yan
>> ---
>
> Hi,
>
> I have added all patches apart from 3/13 to my queue.
>
> I believe "MIPS: Loongson: Introduce and use cpu_has_coherent_cache
> feature." should be rewritten in a saner way.
>
> Please compile and runtime test the tree before I send it to Ralf
> --> http://git.linux-mips.org/?p=john/linux-john.git;a=shortlog;h=refs/heads/mips-next-3.9
>
> I cleaned up a few minor whitespace errors while merging.
>
> http://patchwork.linux-mips.org/patch/4547/ has a few comments. Please
> prepare a patch asap to address those so i can fold it into the series.
>
> John
Re: [RFC][PATCH] sched: Fix a deadlock of cpu-hotplug
> On Wed, 2012-10-24 at 20:34 +0800, 陈华才 wrote:
>> I see, this is an arch-specific bug, sorry for my carelessness and thank
>> you for your tips.
>
> What arch are you using? And what exactly did the arch do wrong? Most of
> the code involved seems to be common code.
>
> Going by c0_compare_interrupt, this is some MIPS device.

Yes, I'm using MIPS. In a place where local_irq_save()/local_irq_restore()
should be used, I used local_irq_disable()/local_irq_enable() by mistake.
Re: [RFC][PATCH] sched: Fix a deadlock of cpu-hotplug
> On Wed, 2012-10-24 at 17:25 +0800, Huacai Chen wrote:
>> We found poweroff sometimes fails on our computers, so we have the
>> lock debug options configured. Then, when we do poweroff or take a
>> cpu down via cpu-hotplug, the kernel complains as below. To resolve
>> this, we modify sched_ttwu_pending() to disable local irqs when
>> acquiring rq->lock.
>>
>> [   83.066406] =================================
>> [   83.066406] [ INFO: inconsistent lock state ]
>> [   83.066406] 3.5.0-3.lemote #428 Not tainted
>> [   83.066406] ---------------------------------
>> [   83.066406] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
>> [   83.066406] migration/1/7 [HC0[0]:SC0[0]:HE1:SE1] takes:
>> [   83.066406]  (&rq->lock){?.-.-.}, at: [<802585ac>] sched_ttwu_pending+0x64/0x98
>> [   83.066406] {IN-HARDIRQ-W} state was registered at:
>> [   83.066406]   [<8027c9ac>] __lock_acquire+0x80c/0x1cc0
>> [   83.066406]   [<8027e3d0>] lock_acquire+0x60/0x9c
>> [   83.066406]   [<8074ba04>] _raw_spin_lock+0x3c/0x50
>> [   83.066406]   [<8025a2fc>] scheduler_tick+0x48/0x178
>> [   83.066406]   [<8023b334>] update_process_times+0x54/0x70
>> [   83.066406]   [<80277568>] tick_handle_periodic+0x2c/0x9c
>> [   83.066406]   [<8020a818>] c0_compare_interrupt+0x8c/0x94
>> [   83.066406]   [<8029ec8c>] handle_irq_event_percpu+0x7c/0x248
>> [   83.066406]   [<802a2774>] handle_percpu_irq+0x8c/0xc0
>> [   83.066406]   [<8029e2c8>] generic_handle_irq+0x48/0x58
>> [   83.066406]   [<80205c04>] do_IRQ+0x18/0x24
>> [   83.066406]   [<802016e4>] mach_irq_dispatch+0xe4/0x124
>> [   83.066406]   [<80203ca0>] ret_from_irq+0x0/0x4
>> [   83.066406]   [<8022d114>] console_unlock+0x3e8/0x4c0
>> [   83.066406]   [<811ff0d0>] con_init+0x370/0x398
>> [   83.066406]   [<811fe3e0>] console_init+0x34/0x50
>> [   83.066406]   [<811e4844>] start_kernel+0x2f8/0x4e0
>> [   83.066406] irq event stamp: 971
>> [   83.066406] hardirqs last enabled at (971): [<8021c384>] local_flush_tlb_all+0x134/0x17c
>> [   83.066406] hardirqs last disabled at (970): [<8021c298>] local_flush_tlb_all+0x48/0x17c
>> [   83.066406] softirqs last enabled at (0): [<802298a4>] copy_process+0x510/0x117c
>> [   83.066406] softirqs last disabled at (0): [<  (null)>] (null)
>> [   83.066406]
>> [   83.066406] other info that might help us debug this:
>> [   83.066406]  Possible unsafe locking scenario:
>> [   83.066406]
>> [   83.066406]        CPU0
>> [   83.066406]        ----
>> [   83.066406]   lock(&rq->lock);
>> [   83.066406]   <Interrupt>
>> [   83.066406]     lock(&rq->lock);
>> [   83.066406]
>> [   83.066406]  *** DEADLOCK ***
>> [   83.066406]
>> [   83.066406] no locks held by migration/1/7.
>> [   83.066406]
>> [   83.066406] stack backtrace:
>> [   83.066406] Call Trace:
>> [   83.066406] [<80747544>] dump_stack+0x8/0x34
>> [   83.066406] [<8027ba04>] print_usage_bug+0x2ec/0x314
>> [   83.066406] [<8027be28>] mark_lock+0x3fc/0x774
>> [   83.066406] [<8027ca48>] __lock_acquire+0x8a8/0x1cc0
>> [   83.066406] [<8027e3d0>] lock_acquire+0x60/0x9c
>> [   83.066406] [<8074ba04>] _raw_spin_lock+0x3c/0x50
>> [   83.066406] [<802585ac>] sched_ttwu_pending+0x64/0x98
>> [   83.066406] [<80745ff4>] migration_call+0x10c/0x2e0
>> [   83.066406] [<80253110>] notifier_call_chain+0x44/0x94
>> [   83.066406] [<8022eae0>] __cpu_notify+0x30/0x5c
>> [   83.066406] [<8072b598>] take_cpu_down+0x5c/0x70
>> [   83.066406] [<80299ba4>] stop_machine_cpu_stop+0x104/0x1e8
>> [   83.066406] [<802997cc>] cpu_stopper_thread+0x110/0x1ac
>> [   83.066406] [<8024c940>] kthread+0x88/0x90
>> [   83.066406] [<80205ee4>] kernel_thread_helper+0x10/0x18
>
> Weird, that's from a CPU_DYING call, I thought those were with IRQs
> disabled.
>
> Look at how __stop_machine() calls the function with IRQs disabled for !
> stop_machine_initialized or !SMP. Also stop_machine_cpu_stop() seems to
> disable interrupts, so how do we end up calling take_cpu_down() with
> IRQs enabled?
>
> That simply doesn't make any sense.

I see, this is an arch-specific bug, sorry for my carelessness and thank
you for your tips.

>> Signed-off-by: Huacai Chen
>> ---
>>  kernel/sched/core.c | 5 +++--
>>  1 files changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 36e2666..703754a 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -1468,9 +1468,10 @@ static void sched_ttwu_pending(void)
>>  {
>>  	struct rq *rq = this_rq();
>>  	struct llist_node *llist = llist_del_all(&rq->wake_list);
>> +	unsigned long flags;
>>  	struct task_struct *p;
>>
>> -	raw_spin_lock(&rq->lock);
>> +	raw_spin_lock_irqsave(&rq->lock, flags);
>>
>>  	while (llist) {
>>  		p = llist_entry(llist, struct task_struct, wake_entry);
>> @@ -1478,7 +1479,7 @@ static void sched_ttwu_pending(void)
>>  		ttwu_do_activate(rq, p, 0);
>>  	}
>>
>> -	raw_spin_unlock(&rq->lock);
>> +	raw_spin_unlock_irqrestore(&rq->lock, flags);
>>  }
>>
>>  void scheduler_ipi(void)
>
> That's wrong though, you add the cost to the common case instead of the
> hardly ever ran hotplug case.
Re: [RFC][PATCH] sched: Fix a deadlock of cpu-hotplug
On Wed, 2012-10-24 at 17:25 +0800, Huacai Chen wrote: We found poweroff sometimes fails on our computers, so we have the lock debug options configured. Then, when we do poweroff or take a cpu down via cpu-hotplug, kernel complain as below. To resove this, we modify sched_ttwu_pending(), disable the local irq when acquire rq-lock. [ 83.066406] = [ 83.066406] [ INFO: inconsistent lock state ] [ 83.066406] 3.5.0-3.lemote #428 Not tainted [ 83.066406] - [ 83.066406] inconsistent {IN-HARDIRQ-W} - {HARDIRQ-ON-W} usage. [ 83.066406] migration/1/7 [HC0[0]:SC0[0]:HE1:SE1] takes: [ 83.066406] (rq-lock){?.-.-.}, at: [802585ac] sched_ttwu_pending+0x64/0x98 [ 83.066406] {IN-HARDIRQ-W} state was registered at: [ 83.066406] [8027c9ac] __lock_acquire+0x80c/0x1cc0 [ 83.066406] [8027e3d0] lock_acquire+0x60/0x9c [ 83.066406] [8074ba04] _raw_spin_lock+0x3c/0x50 [ 83.066406] [8025a2fc] scheduler_tick+0x48/0x178 [ 83.066406] [8023b334] update_process_times+0x54/0x70 [ 83.066406] [80277568] tick_handle_periodic+0x2c/0x9c [ 83.066406] [8020a818] c0_compare_interrupt+0x8c/0x94 [ 83.066406] [8029ec8c] handle_irq_event_percpu+0x7c/0x248 [ 83.066406] [802a2774] handle_percpu_irq+0x8c/0xc0 [ 83.066406] [8029e2c8] generic_handle_irq+0x48/0x58 [ 83.066406] [80205c04] do_IRQ+0x18/0x24 [ 83.066406] [802016e4] mach_irq_dispatch+0xe4/0x124 [ 83.066406] [80203ca0] ret_from_irq+0x0/0x4 [ 83.066406] [8022d114] console_unlock+0x3e8/0x4c0 [ 83.066406] [811ff0d0] con_init+0x370/0x398 [ 83.066406] [811fe3e0] console_init+0x34/0x50 [ 83.066406] [811e4844] start_kernel+0x2f8/0x4e0 [ 83.066406] irq event stamp: 971 [ 83.066406] hardirqs last enabled at (971): [8021c384] local_flush_tlb_all+0x134/0x17c [ 83.066406] hardirqs last disabled at (970): [8021c298] local_flush_tlb_all+0x48/0x17c [ 83.066406] softirqs last enabled at (0): [802298a4] copy_process+0x510/0x117c [ 83.066406] softirqs last disabled at (0): [ (null)] (null) [ 83.066406] [ 83.066406] other info that might help us debug this: [ 83.066406] 
Possible unsafe locking scenario: [ 83.066406] [ 83.066406]CPU0 [ 83.066406] [ 83.066406] lock(rq-lock); [ 83.066406] Interrupt [ 83.066406] lock(rq-lock); [ 83.066406] [ 83.066406] *** DEADLOCK *** [ 83.066406] [ 83.066406] no locks held by migration/1/7. [ 83.066406] [ 83.066406] stack backtrace: [ 83.066406] Call Trace: [ 83.066406] [80747544] dump_stack+0x8/0x34 [ 83.066406] [8027ba04] print_usage_bug+0x2ec/0x314 [ 83.066406] [8027be28] mark_lock+0x3fc/0x774 [ 83.066406] [8027ca48] __lock_acquire+0x8a8/0x1cc0 [ 83.066406] [8027e3d0] lock_acquire+0x60/0x9c [ 83.066406] [8074ba04] _raw_spin_lock+0x3c/0x50 [ 83.066406] [802585ac] sched_ttwu_pending+0x64/0x98 [ 83.066406] [80745ff4] migration_call+0x10c/0x2e0 [ 83.066406] [80253110] notifier_call_chain+0x44/0x94 [ 83.066406] [8022eae0] __cpu_notify+0x30/0x5c [ 83.066406] [8072b598] take_cpu_down+0x5c/0x70 [ 83.066406] [80299ba4] stop_machine_cpu_stop+0x104/0x1e8 [ 83.066406] [802997cc] cpu_stopper_thread+0x110/0x1ac [ 83.066406] [8024c940] kthread+0x88/0x90 [ 83.066406] [80205ee4] kernel_thread_helper+0x10/0x18 Weird, that's from a CPU_DYING call, I thought those were with IRQs disabled. Look at how __stop_machine() calls the function with IRQs disabled for ! stop_machine_initialized or !SMP. Also stop_machine_cpu_stop() seems to disabled interrupts, so how do we end up calling take_cpu_down() with IRQs enabled? That simply doesn't make any sense. I see, this is an arch-specific bug, sorry for my carelessness and thank you for your tips. 
Signed-off-by: Huacai Chen <che...@lemote.com>
---
 kernel/sched/core.c | 5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 36e2666..703754a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1468,9 +1468,10 @@ static void sched_ttwu_pending(void)
 {
 	struct rq *rq = this_rq();
 	struct llist_node *llist = llist_del_all(&rq->wake_list);
+	unsigned long flags;
 	struct task_struct *p;
 
-	raw_spin_lock(&rq->lock);
+	raw_spin_lock_irqsave(&rq->lock, flags);
 
 	while (llist) {
 		p = llist_entry(llist, struct task_struct, wake_entry);
@@ -1478,7 +1479,7 @@ static void sched_ttwu_pending(void)
 		ttwu_do_activate(rq, p, 0);
 	}
 
-	raw_spin_unlock(&rq->lock);
+	raw_spin_unlock_irqrestore(&rq->lock, flags);
Re: [RFC][PATCH] sched: Fix a deadlock of cpu-hotplug
On Wed, 2012-10-24 at 20:34 +0800, 陈华才 wrote:
>> I see, this is an arch-specific bug, sorry for my carelessness and
>> thank you for your tips.
>
> What arch are you using? And what exactly did the arch do wrong? Most of
> the code involved seems to be common code.
>
> Going by c0_compare_interrupt, this is some MIPS device.

Yes, I'm using MIPS. In a place where local_irq_save()/local_irq_restore()
should be used, I used local_irq_disable()/local_irq_enable() by mistake.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Seems like "sched: Add missing call to calc_load_exit_idle()" should be reverted in 3.5 branch
So I still think that "sched: Add missing call to calc_load_exit_idle()"
should be reverted in the 3.5 branch...

> On 10/06/2012 01:23 AM, Peter Zijlstra wrote:
>> On Fri, 2012-10-05 at 10:10 -0700, Jonathan Nieder wrote:
>>> Peter Zijlstra wrote:
>>>> On Thu, 2012-10-04 at 15:27 -0700, Greg Kroah-Hartman wrote:
>>>>> I'm puzzled as well. Any ideas if I should do anything here or not?
>>>>
>>>> So I think the current v3.5.5 code is fine.
>>>
>>> Now I'm puzzled. You wrote:
>>>
>>> | However, since we don't restart the tick, we won't be sampling load muck
>>> | and calling calc_load_exit_idle() from there is bound to confuse state.
>>>
>>> Doesn't that mean 900404e5d201 "sched: Add missing call to
>>> calc_load_exit_idle()" which is part of 3.5.5 was problematic? Or
>>> did I just miscount the number of "not"s?
>>
>> Argh, yeah, so now I've managed to confuse everyone I'm afraid.
>>
>> You are right, v3.5.5 has one calc_load_exit_idle() too many, the one in
>> tick_nohz_update_jiffies() needs to go.
>>
>> Sorry.. I got entirely confused figuring out wth happened with 3.6.
>
> High loadavg was reported with v3.6, and I just checked the upstream
> code, which puzzled many people. Sorry for that~
Re: Seems like "sched: Add missing call to calc_load_exit_idle()" should be reverted in 3.5 branch
My opinion: the original patch "sched/nohz: Rewrite and fix load-avg
computation -- again" was designed for the 3.5 branch, where
calc_load_exit_idle() is called directly in tick_nohz_idle_exit(). So the
patch can be fully applied to 3.5 and needs no fix (no call is missing
there), but it cannot be fully applied to 3.6 (because the code was split)
and does need the fix.

> On Thu, Oct 04, 2012 at 08:31:59PM +0200, Peter Zijlstra wrote:
>> On Thu, 2012-10-04 at 10:46 -0700, Greg Kroah-Hartman wrote:
>>> On Thu, Oct 04, 2012 at 12:11:01PM +0800, Huacai Chen wrote:
>>>> Hi, Greg
>>>>
>>>> I found that Linux-3.5.5 accepted the commit "sched: Add missing call
>>>> to calc_load_exit_idle()", but I think this isn't needed. Because
>>>> "5167e8d5417b sched/nohz: Rewrite and fix load-avg computation --
>>>> again not fully applied" is true for the 3.6 branch, but not for the
>>>> 3.5 branch.
>>>
>>> But 5167e8d5417b is in 3.5, so shouldn't this commit still be
>>> necessary?
>>>
>>>> In the 3.5 branch, calc_load_exit_idle() is already called in
>>>> tick_nohz_idle_exit(), so it doesn't need to be called in
>>>> tick_nohz_update_jiffies() again. In the 3.6 branch, some code of
>>>> tick_nohz_idle_exit() was split into tick_nohz_restart_sched_tick()
>>>> and calc_load_exit_idle() went missing by accident, so the commit
>>>> "sched: Add missing call to calc_load_exit_idle()" is needed.
>>>
>>> So this really should be dropped from 3.5? Charles, Peter, Ingo, any
>>> thoughts here?
>>
>> Bah, lots of code movement there recently.. let me try and untangle all
>> that afresh.. /me checks out v3.5.5.
>>
>> OK, assuming ->tick_stopped means what the label says, then we only want
>> to call calc_load_enter_idle() when it flips to 1 and
>> calc_load_exit_idle() when it flips back to 0, such that when an actual
>> tick happens it's got correct state.
>> Now the actual patch "5167e8d5417b sched/nohz: Rewrite and fix load-avg
>> computation -- again not fully applied" modifies
>> tick_nohz_restart_sched_tick(), which doesn't appear to exist in v3.5.5,
>> and the patch fobbed it into tick_nohz_update_jiffies(), which is called
>> from interrupt entry when nohz-idle so that the interrupt (and possible
>> trailing softirq) see a valid jiffies count.
>>
>> However, since we don't restart the tick, we won't be sampling load
>> muck, and calling calc_load_exit_idle() from there is bound to confuse
>> state.
>>
>> I hope.. damn this code ;-)
>>
>> I can't find wtf went wrong either; the initial patch 5167e8d5417bf5c
>> contains both hunks, but in that case the fixup 749c8814f0 doesn't make
>> sense, nor can I find anything in merge commits using:
>>
>> git log -S calc_load_exit_idle kernel/time/tick-sched.c
>>
>> /me puzzled
>
> I'm puzzled as well. Any ideas if I should do anything here or not?
>
> greg k-h

--
Jiangsu Zhongke Menglan Electronic Technology Co., Ltd. (Lemote), Software Dept.
Chen Huacai
E-mail: che...@lemote.com
Web: http://www.lemote.com/
Add: Menglan Industrial Park, Yushan Town, Changshu, Jiangsu Province