Re: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()
On 9/1/20 7:23 AM, Heinrich Schuchardt wrote: > On 01.09.20 13:14, Sean Anderson wrote: >> On 9/1/20 6:51 AM, Heinrich Schuchardt wrote: >>> When resetting the sipeed_maix_bitm_defconfig without the patch I see a >>> crash: >>> >>> => reset >>> resetting ... >>> Unhandled exception: Illegal instruction >>> EPC: RA: 821a TVAL: 0002d947c509 >>> ### ERROR ### Please RESET the board ### >> >> Hm, interesting. I don't see that error. What commit are you testing with? > > origin/master 23e333a5c083a000d0cabc > > sipeed_maix_bitm_defconfig plus > > CONFIG_DEBUG_UART=y > CONFIG_DEBUG_UART_SIFIVE=y > CONFIG_DEBUG_UART_BASE=0x3800 > CONFIG_DEBUG_UART_CLOCK=39000 > > on a Maixduino. Ok, I can reproduce that crash, but only with the debug uart enabled. I suspect it happens every time, but we only get an error with the debug uart. I enabled CONFIG_SHOW_REGS, but there was some interference between the output and the (next) instance of U-Boot. > > Best regards > > Heinrich >> >>> >>> Objdump shows that it is related to the secondary_hart_loop: >>> >>> j secondary_hart_loop >>> 821a: b7edj 8204 >>> It is actually related to the previous instruction 8216: 56e000efjal ra,8784 secondary_hart_loop is never reached because handle_ipi jumps to 0x0. I decided to do a bit more experimentation with the following patch diff --git i/arch/riscv/lib/smp.c w/arch/riscv/lib/smp.c index ac22136314..8be320f6ae 100644 --- i/arch/riscv/lib/smp.c +++ w/arch/riscv/lib/smp.c @@ -54,6 +54,10 @@ static int send_ipi_many(struct ipi_data *ipi, int wait) gd->arch.ipi[reg].arg0 = ipi->arg0; gd->arch.ipi[reg].arg1 = ipi->arg1; + printf("sending %lx(%lx, %lx) to hart %u\n", ipi->addr, + ipi->arg0, ipi->arg1, reg); + + __smp_mb(); ret = riscv_send_ipi(reg); if (ret) { pr_err("Cannot send IPI to hart %d\n", reg); @@ -77,15 +81,23 @@ void handle_ipi(ulong hart) { int ret; void (*smp_function)(ulong hart, ulong arg0, ulong arg1); + int ipi; if (hart >= CONFIG_NR_CPUS) return; + riscv_get_ipi(hart, ); __smp_mb(); smp_function = (void (*)(ulong, ulong, ulong))gd->arch.ipi[hart].addr; invalidate_icache_all(); + printf("calling %p(%lx, %lx) on hart %lu, ipi = %d\n", smp_function, + gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1, hart, ipi); + + if (!ipi) + return; + /* * Clear the IPI to acknowledge the request before jumping to the * requested function. With this patch I get the following output => reset resetting ... calling (0, 0) on hart 1, ipi = 0 calling (0, 0) on hart 1, ipi = 0 calling (0, 0) on hart 1, ipi = 0 calling (0, 0) on hart 1, ipi = 0 calling (0, 0) on hart 1, ipi = 1 U-Boot 2020.10-rc3-00033-g23e333a5c0-dirty (Sep 01 2020 - 09:55:43 -0400) DRAM: 8 MiB sending 805cc1fa(80589040, 8058cdd0) to hart 1 In:serial@3800 Out: serial@3800 Err: serial@3800 => There is no "calling" line after the "sending" line because hart 1 is hung from jumping to 0x0. >From this log, it is clear that something other than smp_call_function is sending an ipi, and, further, that there is a delay between when the IRQ_M_SOFT bit is asserted and when the clint has a pending interrupt. To further investigate this, I enabled debug output. This causes some garbling in the console, as both harts try and write at the same time. To alleviate this, I applied the following patch which adds locking to puts and putc (NB this only works on K210) diff --git i/common/console.c w/common/console.c index 8e50af1f9d..31622df9b3 100644 --- i/common/console.c +++ w/common/console.c @@ -515,7 +515,35 @@ static inline void pre_console_puts(const char *s) {} static inline void print_pre_console_buffer(int flushpoint) {} #endif -void putc(const char c) +/* Static location which will survive across resets */ +u32 *console_spinlock = (void *)0x8040; + +void console_lock(void) +{ + u32 tmp = 1; + + /* +* We only ever write 1 to the lock, so if anything else is there then +* this is the first access to the spinlock, and we have thus acquired +* it. This will not work if the spinlock is ever (un)initialized to +* exactly 1, but I'll take my chances :) +*/ + while (tmp == 1) + asm volatile ("amoswap.w.aq %0, %0, %1" + : "+r" (tmp), "+A" (console_spinlock) + : + : "memory"); +} + +void console_unlock(void) +{ + asm volatile ("amoswap.w.rl x0, x0, %0" + : "+A"
Re: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()
On 01.09.20 13:14, Sean Anderson wrote: > On 9/1/20 6:51 AM, Heinrich Schuchardt wrote: >> On 8/18/20 11:00 AM, Heinrich Schuchardt wrote: >>> On 11.08.20 12:32, Sean Anderson wrote: On 8/11/20 3:50 AM, Heinrich Schuchardt wrote: > On 11.08.20 08:20, Rick Chen wrote: >> Hi Heinrich >> >>> From: Heinrich Schuchardt [mailto:xypron.g...@gmx.de] >>> Sent: Tuesday, August 11, 2020 11:57 AM >>> To: Rick Jian-Zhi Chen(陳建志) >>> Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel >>> Schwierzeck; u-boot@lists.denx.de; Heinrich Schuchardt >>> Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi() >>> >>> At least on the Kendryte K210: >>> >>> gd->arch.available_harts= 0x0003 arch.ipi[0].addr= >>> gd->0x arch.ipi[0].arg0= 0x >>> gd->arch.ipi[0].arg1= 0x arch.ipi[1].addr= >>> gd->0x arch.ipi[1].arg0= 0x >>> gd->arch.ipi[1].arg1= 0x >>> >>> We should not jump to 0x0 to handle an interrupt. >> >> Can you explain why K210 be affected by it ? > > I have been running U-Boot on the MaixDuino. > > Without this patch secondary_hart_loop() is reached only once. With the > patch it is reached several thousand times. Hm, interesting. To me, this is a symptom of something else going terribly wrong. I originally had this check in place so that it would be easier to detect these sorts of errors. I don't think this is the correct fix, however. We should really try and find the root cause of the bug. > I would not expect NULL to contain any code that should be executed by > the secondary hart. See doc/board/sipeed/maix.rst: > > Address Size Description > == = === > 0x 0x1000 debug > 0x1000 0x1000 rom > 0x0200 0xC000 clint > >> >>> >>> Signed-off-by: Heinrich Schuchardt >>> --- >>> arch/riscv/lib/smp.c | 2 ++ >>> 1 file changed, 2 insertions(+) >>> >>> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index >>> ac22136314..d725fa32e8 100644 >>> --- a/arch/riscv/lib/smp.c >>> +++ b/arch/riscv/lib/smp.c >>> @@ -96,6 +96,8 @@ void handle_ipi(ulong hart) >>> return; >>> } >>> >>> + if (!smp_function) >>> + return; >>> smp_function(hart, gd->arch.ipi[hart].arg0, >>> gd->arch.ipi[hart].arg1); } >> >> I remember Sean add this check in >> [v10,14/21] riscv: Clean up IPI initialization code >> . And I ask him to remove. >> https://patchwork.ozlabs.org/project/uboot/patch/20200503024637.327733-15-sean...@gmail.com/ > > Your comment was: "Please remove the sanity check. If zero address is > the intended jump point, then system will work abnormal." > > The only place where gd->arch.ipi[hart].addr is set is: > > arch/riscv/lib/smp.c:53 send_ipi_many(): > gd->arch.ipi[reg].addr = ipi->addr; > > send_ipi_many() is only called in smp_call_function(). > > So the line > > smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1); > > can only work if smp_function() has been called before this point at any > time. The start code only calls it in spl_secondary_hart_stack_gd_setup(). Can you retry with [1]? I think Bin Meng removed a call to sbi_init_ipi in [2] as part of a larger revert. The actual revert-of-revert is in [3], though it really should be split out into its own patch. The original commit making these changes is [4]. >>> >>> Patch series [1] makes not difference. You still have >>> >>> gd->arch.ipi[0].addr= 0x >>> gd->arch.ipi[1].addr= 0x >>> >>> and the secondary hart jumps to NULL and never returns. >>> Note that the situation before [4] was that the IPI got initialized by the first hart to call any IPI function. If that hart was not the boot hart, Bad Things started to happen (e.g. race conditions, memory corruption, etc). In that patch, I moved the initialization to its own function so we would not have any race conditions and instead have (easier-to-debug imo) calls to handle_ipi with bogus arguments. Of course, when everything is working properly, we should get neither of these. [1] https://patchwork.ozlabs.org/project/uboot/list/?series=193047 [2] https://patchwork.ozlabs.org/project/uboot/patch/1595225827-23674-1-git-send-email-bmeng...@gmail.com/ [3] https://patchwork.ozlabs.org/project/uboot/patch/20200729095636.1077054-5-sean...@gmail.com/ [4] https://patchwork.ozlabs.org/project/uboot/patch/20200521161503.384823-15-sean...@gmail.com/ > Why do we call handle_ipi() on the secondary
Re: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()
On 9/1/20 6:51 AM, Heinrich Schuchardt wrote: > On 8/18/20 11:00 AM, Heinrich Schuchardt wrote: >> On 11.08.20 12:32, Sean Anderson wrote: >>> On 8/11/20 3:50 AM, Heinrich Schuchardt wrote: On 11.08.20 08:20, Rick Chen wrote: > Hi Heinrich > >> From: Heinrich Schuchardt [mailto:xypron.g...@gmx.de] >> Sent: Tuesday, August 11, 2020 11:57 AM >> To: Rick Jian-Zhi Chen(陳建志) >> Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel >> Schwierzeck; u-boot@lists.denx.de; Heinrich Schuchardt >> Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi() >> >> At least on the Kendryte K210: >> >> gd->arch.available_harts= 0x0003 arch.ipi[0].addr= >> gd->0x arch.ipi[0].arg0= 0x >> gd->arch.ipi[0].arg1= 0x arch.ipi[1].addr= >> gd->0x arch.ipi[1].arg0= 0x >> gd->arch.ipi[1].arg1= 0x >> >> We should not jump to 0x0 to handle an interrupt. > > Can you explain why K210 be affected by it ? I have been running U-Boot on the MaixDuino. Without this patch secondary_hart_loop() is reached only once. With the patch it is reached several thousand times. >>> >>> Hm, interesting. To me, this is a symptom of something else going >>> terribly wrong. I originally had this check in place so that it would >>> be easier to detect these sorts of errors. I don't think this is the >>> correct fix, however. We should really try and find the root cause of >>> the bug. >>> I would not expect NULL to contain any code that should be executed by the secondary hart. See doc/board/sipeed/maix.rst: Address Size Description == = === 0x 0x1000 debug 0x1000 0x1000 rom 0x0200 0xC000 clint > >> >> Signed-off-by: Heinrich Schuchardt >> --- >> arch/riscv/lib/smp.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index >> ac22136314..d725fa32e8 100644 >> --- a/arch/riscv/lib/smp.c >> +++ b/arch/riscv/lib/smp.c >> @@ -96,6 +96,8 @@ void handle_ipi(ulong hart) >> return; >> } >> >> + if (!smp_function) >> + return; >> smp_function(hart, gd->arch.ipi[hart].arg0, >> gd->arch.ipi[hart].arg1); } > > I remember Sean add this check in > [v10,14/21] riscv: Clean up IPI initialization code > . And I ask him to remove. > https://patchwork.ozlabs.org/project/uboot/patch/20200503024637.327733-15-sean...@gmail.com/ Your comment was: "Please remove the sanity check. If zero address is the intended jump point, then system will work abnormal." The only place where gd->arch.ipi[hart].addr is set is: arch/riscv/lib/smp.c:53 send_ipi_many(): gd->arch.ipi[reg].addr = ipi->addr; send_ipi_many() is only called in smp_call_function(). So the line smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1); can only work if smp_function() has been called before this point at any time. The start code only calls it in spl_secondary_hart_stack_gd_setup(). >>> >>> Can you retry with [1]? I think Bin Meng removed a call to sbi_init_ipi >>> in [2] as part of a larger revert. The actual revert-of-revert is in >>> [3], though it really should be split out into its own patch. The >>> original commit making these changes is [4]. >> >> Patch series [1] makes not difference. You still have >> >> gd->arch.ipi[0].addr= 0x >> gd->arch.ipi[1].addr= 0x >> >> and the secondary hart jumps to NULL and never returns. >> >>> >>> Note that the situation before [4] was that the IPI got initialized by >>> the first hart to call any IPI function. If that hart was not the boot >>> hart, Bad Things started to happen (e.g. race conditions, memory >>> corruption, etc). In that patch, I moved the initialization to its own >>> function so we would not have any race conditions and instead have >>> (easier-to-debug imo) calls to handle_ipi with bogus arguments. Of >>> course, when everything is working properly, we should get neither of >>> these. >>> >>> [1] https://patchwork.ozlabs.org/project/uboot/list/?series=193047 >>> [2] >>> https://patchwork.ozlabs.org/project/uboot/patch/1595225827-23674-1-git-send-email-bmeng...@gmail.com/ >>> [3] >>> https://patchwork.ozlabs.org/project/uboot/patch/20200729095636.1077054-5-sean...@gmail.com/ >>> [4] >>> https://patchwork.ozlabs.org/project/uboot/patch/20200521161503.384823-15-sean...@gmail.com/ >>> Why do we call handle_ipi() on the secondary hart at all? >>> >>> Presumably to handle the IPI it got sent? Sorry, I'm confused as to what >>> you're getting at. >>> >> >> I cannot see anything to be done
Re: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()
On 8/18/20 11:00 AM, Heinrich Schuchardt wrote: On 11.08.20 12:32, Sean Anderson wrote: On 8/11/20 3:50 AM, Heinrich Schuchardt wrote: On 11.08.20 08:20, Rick Chen wrote: Hi Heinrich From: Heinrich Schuchardt [mailto:xypron.g...@gmx.de] Sent: Tuesday, August 11, 2020 11:57 AM To: Rick Jian-Zhi Chen(陳建志) Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel Schwierzeck; u-boot@lists.denx.de; Heinrich Schuchardt Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi() At least on the Kendryte K210: gd->arch.available_harts= 0x0003 arch.ipi[0].addr= gd->0x arch.ipi[0].arg0= 0x gd->arch.ipi[0].arg1= 0x arch.ipi[1].addr= gd->0x arch.ipi[1].arg0= 0x gd->arch.ipi[1].arg1= 0x We should not jump to 0x0 to handle an interrupt. Can you explain why K210 be affected by it ? I have been running U-Boot on the MaixDuino. Without this patch secondary_hart_loop() is reached only once. With the patch it is reached several thousand times. Hm, interesting. To me, this is a symptom of something else going terribly wrong. I originally had this check in place so that it would be easier to detect these sorts of errors. I don't think this is the correct fix, however. We should really try and find the root cause of the bug. I would not expect NULL to contain any code that should be executed by the secondary hart. See doc/board/sipeed/maix.rst: AddressSize Description == = === 0x 0x1000debug 0x1000 0x1000rom 0x0200 0xC000clint Signed-off-by: Heinrich Schuchardt --- arch/riscv/lib/smp.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index ac22136314..d725fa32e8 100644 --- a/arch/riscv/lib/smp.c +++ b/arch/riscv/lib/smp.c @@ -96,6 +96,8 @@ void handle_ipi(ulong hart) return; } + if (!smp_function) + return; smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1); } I remember Sean add this check in [v10,14/21] riscv: Clean up IPI initialization code . And I ask him to remove. https://patchwork.ozlabs.org/project/uboot/patch/20200503024637.327733-15-sean...@gmail.com/ Your comment was: "Please remove the sanity check. If zero address is the intended jump point, then system will work abnormal." The only place where gd->arch.ipi[hart].addr is set is: arch/riscv/lib/smp.c:53 send_ipi_many(): gd->arch.ipi[reg].addr = ipi->addr; send_ipi_many() is only called in smp_call_function(). So the line smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1); can only work if smp_function() has been called before this point at any time. The start code only calls it in spl_secondary_hart_stack_gd_setup(). Can you retry with [1]? I think Bin Meng removed a call to sbi_init_ipi in [2] as part of a larger revert. The actual revert-of-revert is in [3], though it really should be split out into its own patch. The original commit making these changes is [4]. Patch series [1] makes not difference. You still have gd->arch.ipi[0].addr= 0x gd->arch.ipi[1].addr= 0x and the secondary hart jumps to NULL and never returns. Note that the situation before [4] was that the IPI got initialized by the first hart to call any IPI function. If that hart was not the boot hart, Bad Things started to happen (e.g. race conditions, memory corruption, etc). In that patch, I moved the initialization to its own function so we would not have any race conditions and instead have (easier-to-debug imo) calls to handle_ipi with bogus arguments. Of course, when everything is working properly, we should get neither of these. [1] https://patchwork.ozlabs.org/project/uboot/list/?series=193047 [2] https://patchwork.ozlabs.org/project/uboot/patch/1595225827-23674-1-git-send-email-bmeng...@gmail.com/ [3] https://patchwork.ozlabs.org/project/uboot/patch/20200729095636.1077054-5-sean...@gmail.com/ [4] https://patchwork.ozlabs.org/project/uboot/patch/20200521161503.384823-15-sean...@gmail.com/ Why do we call handle_ipi() on the secondary hart at all? Presumably to handle the IPI it got sent? Sorry, I'm confused as to what you're getting at. I cannot see anything to be done by a secondary hart in case of a software interrupt. Best regards Heinrich When resetting the sipeed_maix_bitm_defconfig without the patch I see a crash: => reset resetting ... Unhandled exception: Illegal instruction EPC: RA: 821a TVAL: 0002d947c509 ### ERROR ### Please RESET the board ### Objdump shows that it is related to the secondary_hart_loop: j secondary_hart_loop 821a: b7edj 8204 This crash is avoided with this patch. How do we get forward? Best regards Heinrich Where do you expect it to jump if U-Boot is the
Re: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()
On 18.08.20 11:14, Bin Meng wrote: > Hi Heinrich, > > On Tue, Aug 11, 2020 at 11:57 AM Heinrich Schuchardt > wrote: >> >> At least on the Kendryte K210: >> >> gd->arch.available_harts= 0x0003 >> gd->arch.ipi[0].addr= 0x >> gd->arch.ipi[0].arg0= 0x >> gd->arch.ipi[0].arg1= 0x >> gd->arch.ipi[1].addr= 0x >> gd->arch.ipi[1].arg0= 0x >> gd->arch.ipi[1].arg1= 0x >> >> We should not jump to 0x0 to handle an interrupt. >> >> Signed-off-by: Heinrich Schuchardt >> --- >> arch/riscv/lib/smp.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c >> index ac22136314..d725fa32e8 100644 >> --- a/arch/riscv/lib/smp.c >> +++ b/arch/riscv/lib/smp.c >> @@ -96,6 +96,8 @@ void handle_ipi(ulong hart) >> return; >> } >> >> + if (!smp_function) >> + return; > > Can you trace down to why gd->arch.ipi[0].addr is set to zero? I > looked at all places that call smp_call_function() and > gd->arch.ipi[0].addr should not be zero, unless something is seriously > wrong on your board. arch/riscv/cpu/start.S:122: jal board_init_f_init_reserve board_init_f_init_reserve(): memset(gd_ptr, '\0', sizeof(*gd)); Which code did you expect to set another value on the Kendryte K210? Best regards Heinrich > > Regards, > Bin >
Re: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()
On 8/18/20 5:00 AM, Heinrich Schuchardt wrote: > On 11.08.20 12:32, Sean Anderson wrote: >> On 8/11/20 3:50 AM, Heinrich Schuchardt wrote: >>> On 11.08.20 08:20, Rick Chen wrote: Hi Heinrich > From: Heinrich Schuchardt [mailto:xypron.g...@gmx.de] > Sent: Tuesday, August 11, 2020 11:57 AM > To: Rick Jian-Zhi Chen(陳建志) > Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel > Schwierzeck; u-boot@lists.denx.de; Heinrich Schuchardt > Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi() > > At least on the Kendryte K210: > > gd->arch.available_harts= 0x0003 arch.ipi[0].addr= > gd->0x arch.ipi[0].arg0= 0x > gd->arch.ipi[0].arg1= 0x arch.ipi[1].addr= > gd->0x arch.ipi[1].arg0= 0x > gd->arch.ipi[1].arg1= 0x > > We should not jump to 0x0 to handle an interrupt. Can you explain why K210 be affected by it ? >>> >>> I have been running U-Boot on the MaixDuino. >>> >>> Without this patch secondary_hart_loop() is reached only once. With the >>> patch it is reached several thousand times. >> >> Hm, interesting. To me, this is a symptom of something else going >> terribly wrong. I originally had this check in place so that it would >> be easier to detect these sorts of errors. I don't think this is the >> correct fix, however. We should really try and find the root cause of >> the bug. >> >>> I would not expect NULL to contain any code that should be executed by >>> the secondary hart. See doc/board/sipeed/maix.rst: >>> >>> AddressSize Description >>> == = === >>> 0x 0x1000debug >>> 0x1000 0x1000rom >>> 0x0200 0xC000clint >>> > > Signed-off-by: Heinrich Schuchardt > --- > arch/riscv/lib/smp.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index > ac22136314..d725fa32e8 100644 > --- a/arch/riscv/lib/smp.c > +++ b/arch/riscv/lib/smp.c > @@ -96,6 +96,8 @@ void handle_ipi(ulong hart) > return; > } > > + if (!smp_function) > + return; > smp_function(hart, gd->arch.ipi[hart].arg0, > gd->arch.ipi[hart].arg1); } I remember Sean add this check in [v10,14/21] riscv: Clean up IPI initialization code . And I ask him to remove. https://patchwork.ozlabs.org/project/uboot/patch/20200503024637.327733-15-sean...@gmail.com/ >>> >>> Your comment was: "Please remove the sanity check. If zero address is >>> the intended jump point, then system will work abnormal." >>> >>> The only place where gd->arch.ipi[hart].addr is set is: >>> >>> arch/riscv/lib/smp.c:53 send_ipi_many(): >>> gd->arch.ipi[reg].addr = ipi->addr; >>> >>> send_ipi_many() is only called in smp_call_function(). >>> >>> So the line >>> >>> smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1); >>> >>> can only work if smp_function() has been called before this point at any >>> time. The start code only calls it in spl_secondary_hart_stack_gd_setup(). >> >> Can you retry with [1]? I think Bin Meng removed a call to sbi_init_ipi >> in [2] as part of a larger revert. The actual revert-of-revert is in >> [3], though it really should be split out into its own patch. The >> original commit making these changes is [4]. > > Patch series [1] makes not difference. You still have > > gd->arch.ipi[0].addr= 0x > gd->arch.ipi[1].addr= 0x > > and the secondary hart jumps to NULL and never returns. Hm, then there is a code path where an IPI gets triggered by something other than opensbi. > >> >> Note that the situation before [4] was that the IPI got initialized by >> the first hart to call any IPI function. If that hart was not the boot >> hart, Bad Things started to happen (e.g. race conditions, memory >> corruption, etc). In that patch, I moved the initialization to its own >> function so we would not have any race conditions and instead have >> (easier-to-debug imo) calls to handle_ipi with bogus arguments. Of >> course, when everything is working properly, we should get neither of >> these. >> >> [1] https://patchwork.ozlabs.org/project/uboot/list/?series=193047 >> [2] >> https://patchwork.ozlabs.org/project/uboot/patch/1595225827-23674-1-git-send-email-bmeng...@gmail.com/ >> [3] >> https://patchwork.ozlabs.org/project/uboot/patch/20200729095636.1077054-5-sean...@gmail.com/ >> [4] >> https://patchwork.ozlabs.org/project/uboot/patch/20200521161503.384823-15-sean...@gmail.com/ >> >>> Why do we call handle_ipi() on the secondary hart at all? >> >> Presumably to handle the IPI it got sent? Sorry, I'm confused as to what >> you're getting at. >> > > I cannot see anything to be done by a secondary hart in case of a > software interrupt. Isn't it supposed to
Re: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()
Hi Heinrich, On Tue, Aug 11, 2020 at 11:57 AM Heinrich Schuchardt wrote: > > At least on the Kendryte K210: > > gd->arch.available_harts= 0x0003 > gd->arch.ipi[0].addr= 0x > gd->arch.ipi[0].arg0= 0x > gd->arch.ipi[0].arg1= 0x > gd->arch.ipi[1].addr= 0x > gd->arch.ipi[1].arg0= 0x > gd->arch.ipi[1].arg1= 0x > > We should not jump to 0x0 to handle an interrupt. > > Signed-off-by: Heinrich Schuchardt > --- > arch/riscv/lib/smp.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c > index ac22136314..d725fa32e8 100644 > --- a/arch/riscv/lib/smp.c > +++ b/arch/riscv/lib/smp.c > @@ -96,6 +96,8 @@ void handle_ipi(ulong hart) > return; > } > > + if (!smp_function) > + return; Can you trace down to why gd->arch.ipi[0].addr is set to zero? I looked at all places that call smp_call_function() and gd->arch.ipi[0].addr should not be zero, unless something is seriously wrong on your board. Regards, Bin
Re: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()
On 11.08.20 12:32, Sean Anderson wrote: > On 8/11/20 3:50 AM, Heinrich Schuchardt wrote: >> On 11.08.20 08:20, Rick Chen wrote: >>> Hi Heinrich >>> From: Heinrich Schuchardt [mailto:xypron.g...@gmx.de] Sent: Tuesday, August 11, 2020 11:57 AM To: Rick Jian-Zhi Chen(陳建志) Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel Schwierzeck; u-boot@lists.denx.de; Heinrich Schuchardt Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi() At least on the Kendryte K210: gd->arch.available_harts= 0x0003 arch.ipi[0].addr= gd->0x arch.ipi[0].arg0= 0x gd->arch.ipi[0].arg1= 0x arch.ipi[1].addr= gd->0x arch.ipi[1].arg0= 0x gd->arch.ipi[1].arg1= 0x We should not jump to 0x0 to handle an interrupt. >>> >>> Can you explain why K210 be affected by it ? >> >> I have been running U-Boot on the MaixDuino. >> >> Without this patch secondary_hart_loop() is reached only once. With the >> patch it is reached several thousand times. > > Hm, interesting. To me, this is a symptom of something else going > terribly wrong. I originally had this check in place so that it would > be easier to detect these sorts of errors. I don't think this is the > correct fix, however. We should really try and find the root cause of > the bug. > >> I would not expect NULL to contain any code that should be executed by >> the secondary hart. See doc/board/sipeed/maix.rst: >> >> AddressSize Description >> == = === >> 0x 0x1000debug >> 0x1000 0x1000rom >> 0x0200 0xC000clint >> >>> Signed-off-by: Heinrich Schuchardt --- arch/riscv/lib/smp.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index ac22136314..d725fa32e8 100644 --- a/arch/riscv/lib/smp.c +++ b/arch/riscv/lib/smp.c @@ -96,6 +96,8 @@ void handle_ipi(ulong hart) return; } + if (!smp_function) + return; smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1); } >>> >>> I remember Sean add this check in >>> [v10,14/21] riscv: Clean up IPI initialization code >>> . And I ask him to remove. >>> https://patchwork.ozlabs.org/project/uboot/patch/20200503024637.327733-15-sean...@gmail.com/ >> >> Your comment was: "Please remove the sanity check. If zero address is >> the intended jump point, then system will work abnormal." >> >> The only place where gd->arch.ipi[hart].addr is set is: >> >> arch/riscv/lib/smp.c:53 send_ipi_many(): >> gd->arch.ipi[reg].addr = ipi->addr; >> >> send_ipi_many() is only called in smp_call_function(). >> >> So the line >> >> smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1); >> >> can only work if smp_function() has been called before this point at any >> time. The start code only calls it in spl_secondary_hart_stack_gd_setup(). > > Can you retry with [1]? I think Bin Meng removed a call to sbi_init_ipi > in [2] as part of a larger revert. The actual revert-of-revert is in > [3], though it really should be split out into its own patch. The > original commit making these changes is [4]. Patch series [1] makes not difference. You still have gd->arch.ipi[0].addr= 0x gd->arch.ipi[1].addr= 0x and the secondary hart jumps to NULL and never returns. > > Note that the situation before [4] was that the IPI got initialized by > the first hart to call any IPI function. If that hart was not the boot > hart, Bad Things started to happen (e.g. race conditions, memory > corruption, etc). In that patch, I moved the initialization to its own > function so we would not have any race conditions and instead have > (easier-to-debug imo) calls to handle_ipi with bogus arguments. Of > course, when everything is working properly, we should get neither of > these. > > [1] https://patchwork.ozlabs.org/project/uboot/list/?series=193047 > [2] > https://patchwork.ozlabs.org/project/uboot/patch/1595225827-23674-1-git-send-email-bmeng...@gmail.com/ > [3] > https://patchwork.ozlabs.org/project/uboot/patch/20200729095636.1077054-5-sean...@gmail.com/ > [4] > https://patchwork.ozlabs.org/project/uboot/patch/20200521161503.384823-15-sean...@gmail.com/ > >> Why do we call handle_ipi() on the secondary hart at all? > > Presumably to handle the IPI it got sent? Sorry, I'm confused as to what > you're getting at. > I cannot see anything to be done by a secondary hart in case of a software interrupt. Best regards Heinrich >> Where do you expect it to jump if U-Boot is the first binary loaded? >> >> Even if spl_secondary_hart_stack_gd_setup() has initialized the jump >> address to secondary_hart_relocate(), do we want the secondary hart to >> execute it again and again? > > --Sean >
Re: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()
On 8/11/20 3:50 AM, Heinrich Schuchardt wrote: > On 11.08.20 08:20, Rick Chen wrote: >> Hi Heinrich >> >>> From: Heinrich Schuchardt [mailto:xypron.g...@gmx.de] >>> Sent: Tuesday, August 11, 2020 11:57 AM >>> To: Rick Jian-Zhi Chen(陳建志) >>> Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel Schwierzeck; >>> u-boot@lists.denx.de; Heinrich Schuchardt >>> Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi() >>> >>> At least on the Kendryte K210: >>> >>> gd->arch.available_harts= 0x0003 arch.ipi[0].addr= >>> gd->0x arch.ipi[0].arg0= 0x >>> gd->arch.ipi[0].arg1= 0x arch.ipi[1].addr= >>> gd->0x arch.ipi[1].arg0= 0x >>> gd->arch.ipi[1].arg1= 0x >>> >>> We should not jump to 0x0 to handle an interrupt. >> >> Can you explain why K210 be affected by it ? > > I have been running U-Boot on the MaixDuino. > > Without this patch secondary_hart_loop() is reached only once. With the > patch it is reached several thousand times. Hm, interesting. To me, this is a symptom of something else going terribly wrong. I originally had this check in place so that it would be easier to detect these sorts of errors. I don't think this is the correct fix, however. We should really try and find the root cause of the bug. > I would not expect NULL to contain any code that should be executed by > the secondary hart. See doc/board/sipeed/maix.rst: > > AddressSize Description > == = === > 0x 0x1000debug > 0x1000 0x1000rom > 0x0200 0xC000clint > >> >>> >>> Signed-off-by: Heinrich Schuchardt >>> --- >>> arch/riscv/lib/smp.c | 2 ++ >>> 1 file changed, 2 insertions(+) >>> >>> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index >>> ac22136314..d725fa32e8 100644 >>> --- a/arch/riscv/lib/smp.c >>> +++ b/arch/riscv/lib/smp.c >>> @@ -96,6 +96,8 @@ void handle_ipi(ulong hart) >>> return; >>> } >>> >>> + if (!smp_function) >>> + return; >>> smp_function(hart, gd->arch.ipi[hart].arg0, >>> gd->arch.ipi[hart].arg1); } >> >> I remember Sean add this check in >> [v10,14/21] riscv: Clean up IPI initialization code >> . And I ask him to remove. >> https://patchwork.ozlabs.org/project/uboot/patch/20200503024637.327733-15-sean...@gmail.com/ > > Your comment was: "Please remove the sanity check. If zero address is > the intended jump point, then system will work abnormal." > > The only place where gd->arch.ipi[hart].addr is set is: > > arch/riscv/lib/smp.c:53 send_ipi_many(): > gd->arch.ipi[reg].addr = ipi->addr; > > send_ipi_many() is only called in smp_call_function(). > > So the line > > smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1); > > can only work if smp_function() has been called before this point at any > time. The start code only calls it in spl_secondary_hart_stack_gd_setup(). Can you retry with [1]? I think Bin Meng removed a call to sbi_init_ipi in [2] as part of a larger revert. The actual revert-of-revert is in [3], though it really should be split out into its own patch. The original commit making these changes is [4]. Note that the situation before [4] was that the IPI got initialized by the first hart to call any IPI function. If that hart was not the boot hart, Bad Things started to happen (e.g. race conditions, memory corruption, etc). In that patch, I moved the initialization to its own function so we would not have any race conditions and instead have (easier-to-debug imo) calls to handle_ipi with bogus arguments. Of course, when everything is working properly, we should get neither of these. [1] https://patchwork.ozlabs.org/project/uboot/list/?series=193047 [2] https://patchwork.ozlabs.org/project/uboot/patch/1595225827-23674-1-git-send-email-bmeng...@gmail.com/ [3] https://patchwork.ozlabs.org/project/uboot/patch/20200729095636.1077054-5-sean...@gmail.com/ [4] https://patchwork.ozlabs.org/project/uboot/patch/20200521161503.384823-15-sean...@gmail.com/ > Why do we call handle_ipi() on the secondary hart at all? Presumably to handle the IPI it got sent? Sorry, I'm confused as to what you're getting at. > Where do you expect it to jump if U-Boot is the first binary loaded? > > Even if spl_secondary_hart_stack_gd_setup() has initialized the jump > address to secondary_hart_relocate(), do we want the secondary hart to > execute it again and again? --Sean
Re: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()
On 11.08.20 08:20, Rick Chen wrote: > Hi Heinrich > >> From: Heinrich Schuchardt [mailto:xypron.g...@gmx.de] >> Sent: Tuesday, August 11, 2020 11:57 AM >> To: Rick Jian-Zhi Chen(陳建志) >> Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel Schwierzeck; >> u-boot@lists.denx.de; Heinrich Schuchardt >> Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi() >> >> At least on the Kendryte K210: >> >> gd->arch.available_harts= 0x0003 arch.ipi[0].addr= >> gd->0x arch.ipi[0].arg0= 0x >> gd->arch.ipi[0].arg1= 0x arch.ipi[1].addr= >> gd->0x arch.ipi[1].arg0= 0x >> gd->arch.ipi[1].arg1= 0x >> >> We should not jump to 0x0 to handle an interrupt. > > Can you explain why K210 be affected by it ? I have been running U-Boot on the MaixDuino. Without this patch secondary_hart_loop() is reached only once. With the patch it is reached several thousand times. I would not expect NULL to contain any code that should be executed by the secondary hart. See doc/board/sipeed/maix.rst: AddressSize Description == = === 0x 0x1000debug 0x1000 0x1000rom 0x0200 0xC000clint > >> >> Signed-off-by: Heinrich Schuchardt >> --- >> arch/riscv/lib/smp.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index >> ac22136314..d725fa32e8 100644 >> --- a/arch/riscv/lib/smp.c >> +++ b/arch/riscv/lib/smp.c >> @@ -96,6 +96,8 @@ void handle_ipi(ulong hart) >> return; >> } >> >> + if (!smp_function) >> + return; >> smp_function(hart, gd->arch.ipi[hart].arg0, >> gd->arch.ipi[hart].arg1); } > > I remember Sean add this check in > [v10,14/21] riscv: Clean up IPI initialization code > . And I ask him to remove. > https://patchwork.ozlabs.org/project/uboot/patch/20200503024637.327733-15-sean...@gmail.com/ Your comment was: "Please remove the sanity check. If zero address is the intended jump point, then system will work abnormal." The only place where gd->arch.ipi[hart].addr is set is: arch/riscv/lib/smp.c:53 send_ipi_many(): gd->arch.ipi[reg].addr = ipi->addr; send_ipi_many() is only called in smp_call_function(). So the line smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1); can only work if smp_function() has been called before this point at any time. The start code only calls it in spl_secondary_hart_stack_gd_setup(). Why do we call handle_ipi() on the secondary hart at all? Where do you expect it to jump if U-Boot is the first binary loaded? Even if spl_secondary_hart_stack_gd_setup() has initialized the jump address to secondary_hart_relocate(), do we want the secondary hart to execute it again and again? Best regards Heinrich
Re: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()
Hi Heinrich > From: Heinrich Schuchardt [mailto:xypron.g...@gmx.de] > Sent: Tuesday, August 11, 2020 11:57 AM > To: Rick Jian-Zhi Chen(陳建志) > Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel Schwierzeck; > u-boot@lists.denx.de; Heinrich Schuchardt > Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi() > > At least on the Kendryte K210: > > gd->arch.available_harts= 0x0003 arch.ipi[0].addr= > gd->0x arch.ipi[0].arg0= 0x > gd->arch.ipi[0].arg1= 0x arch.ipi[1].addr= > gd->0x arch.ipi[1].arg0= 0x > gd->arch.ipi[1].arg1= 0x > > We should not jump to 0x0 to handle an interrupt. Can you explain why K210 be affected by it ? > > Signed-off-by: Heinrich Schuchardt > --- > arch/riscv/lib/smp.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index > ac22136314..d725fa32e8 100644 > --- a/arch/riscv/lib/smp.c > +++ b/arch/riscv/lib/smp.c > @@ -96,6 +96,8 @@ void handle_ipi(ulong hart) > return; > } > > + if (!smp_function) > + return; > smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1); > } I remember Sean add this check in [v10,14/21] riscv: Clean up IPI initialization code . And I ask him to remove. https://patchwork.ozlabs.org/project/uboot/patch/20200503024637.327733-15-sean...@gmail.com/ Thanks, Rick > > -- > 2.28.0