Re: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()

2020-09-01 Thread Sean Anderson
On 9/1/20 7:23 AM, Heinrich Schuchardt wrote:
> On 01.09.20 13:14, Sean Anderson wrote:
>> On 9/1/20 6:51 AM, Heinrich Schuchardt wrote:
>>> When resetting the sipeed_maix_bitm_defconfig without the patch I see a
>>> crash:
>>>
>>> => reset
>>> resetting ...
>>> Unhandled exception: Illegal instruction
>>> EPC:  RA: 821a TVAL: 0002d947c509
>>> ### ERROR ### Please RESET the board ###
>>
>> Hm, interesting. I don't see that error. What commit are you testing with?
> 
> origin/master 23e333a5c083a000d0cabc
> 
> sipeed_maix_bitm_defconfig plus
> 
> CONFIG_DEBUG_UART=y
> CONFIG_DEBUG_UART_SIFIVE=y
> CONFIG_DEBUG_UART_BASE=0x3800
> CONFIG_DEBUG_UART_CLOCK=39000
> 
> on a Maixduino.

Ok, I can reproduce that crash, but only with the debug uart enabled. I
suspect it happens every time, but we only get an error with the debug
uart. I enabled CONFIG_SHOW_REGS, but there was some interference
between the output and the (next) instance of U-Boot. 

> 
> Best regards
> 
> Heinrich
>>
>>>
>>> Objdump shows that it is related to the secondary_hart_loop:
>>>
>>> j   secondary_hart_loop
>>> 821a:   b7edj   8204
>>> 

It is actually related to the previous instruction

8216:   56e000efjal ra,8784 

secondary_hart_loop is never reached because handle_ipi jumps to 0x0.

I decided to do a bit more experimentation with the following patch

diff --git i/arch/riscv/lib/smp.c w/arch/riscv/lib/smp.c
index ac22136314..8be320f6ae 100644
--- i/arch/riscv/lib/smp.c
+++ w/arch/riscv/lib/smp.c
@@ -54,6 +54,10 @@ static int send_ipi_many(struct ipi_data *ipi, int wait)
gd->arch.ipi[reg].arg0 = ipi->arg0;
gd->arch.ipi[reg].arg1 = ipi->arg1;
 
+   printf("sending %lx(%lx, %lx) to hart %u\n", ipi->addr,
+  ipi->arg0, ipi->arg1, reg);
+
+   __smp_mb();
ret = riscv_send_ipi(reg);
if (ret) {
pr_err("Cannot send IPI to hart %d\n", reg);
@@ -77,15 +81,23 @@ void handle_ipi(ulong hart)
 {
int ret;
void (*smp_function)(ulong hart, ulong arg0, ulong arg1);
+   int ipi;
 
if (hart >= CONFIG_NR_CPUS)
return;
+   riscv_get_ipi(hart, );
 
__smp_mb();
 
smp_function = (void (*)(ulong, ulong, ulong))gd->arch.ipi[hart].addr;
invalidate_icache_all();
 
+   printf("calling %p(%lx, %lx) on hart %lu, ipi = %d\n", smp_function,
+  gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1, hart, ipi);
+
+   if (!ipi)
+   return;
+
/*
 * Clear the IPI to acknowledge the request before jumping to the
 * requested function.

With this patch I get the following output

=> reset
resetting ...
calling (0, 0) on hart 1, ipi = 0
calling (0, 0) on hart 1, ipi = 0
calling (0, 0) on hart 1, ipi = 0
calling (0, 0) on hart 1, ipi = 0
calling (0, 0) on hart 1, ipi = 1


U-Boot 2020.10-rc3-00033-g23e333a5c0-dirty (Sep 01 2020 - 09:55:43 -0400)

DRAM:  8 MiB
sending 805cc1fa(80589040, 8058cdd0) to hart 1
In:serial@3800
Out:   serial@3800
Err:   serial@3800
=>

There is no "calling" line after the "sending" line because hart 1 is
hung from jumping to 0x0.

>From this log, it is clear that something other than smp_call_function
is sending an ipi, and, further, that there is a delay between when the
IRQ_M_SOFT bit is asserted and when the clint has a pending interrupt.

To further investigate this, I enabled debug output. This causes some
garbling in the console, as both harts try and write at the same time.
To alleviate this, I applied the following patch which adds locking to
puts and putc (NB this only works on K210)


diff --git i/common/console.c w/common/console.c
index 8e50af1f9d..31622df9b3 100644
--- i/common/console.c
+++ w/common/console.c
@@ -515,7 +515,35 @@ static inline void pre_console_puts(const char *s) {}
 static inline void print_pre_console_buffer(int flushpoint) {}
 #endif
 
-void putc(const char c)
+/* Static location which will survive across resets */
+u32 *console_spinlock = (void *)0x8040;
+
+void console_lock(void)
+{
+   u32 tmp = 1;
+
+   /*
+* We only ever write 1 to the lock, so if anything else is there then
+* this is the first access to the spinlock, and we have thus acquired
+* it. This will not work if the spinlock is ever (un)initialized to
+* exactly 1, but I'll take my chances :)
+*/
+   while (tmp == 1)
+   asm volatile ("amoswap.w.aq %0, %0, %1"
+ : "+r" (tmp), "+A" (console_spinlock)
+ :
+ : "memory");
+}
+
+void console_unlock(void)
+{
+   asm volatile ("amoswap.w.rl x0, x0, %0"
+ : "+A" 

Re: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()

2020-09-01 Thread Heinrich Schuchardt
On 01.09.20 13:14, Sean Anderson wrote:
> On 9/1/20 6:51 AM, Heinrich Schuchardt wrote:
>> On 8/18/20 11:00 AM, Heinrich Schuchardt wrote:
>>> On 11.08.20 12:32, Sean Anderson wrote:
 On 8/11/20 3:50 AM, Heinrich Schuchardt wrote:
> On 11.08.20 08:20, Rick Chen wrote:
>> Hi Heinrich
>>
>>> From: Heinrich Schuchardt [mailto:xypron.g...@gmx.de]
>>> Sent: Tuesday, August 11, 2020 11:57 AM
>>> To: Rick Jian-Zhi Chen(陳建志)
>>> Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel 
>>> Schwierzeck; u-boot@lists.denx.de; Heinrich Schuchardt
>>> Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()
>>>
>>> At least on the Kendryte K210:
>>>
>>> gd->arch.available_harts= 0x0003 arch.ipi[0].addr=
>>> gd->0x arch.ipi[0].arg0= 0x
>>> gd->arch.ipi[0].arg1= 0x arch.ipi[1].addr=
>>> gd->0x arch.ipi[1].arg0= 0x
>>> gd->arch.ipi[1].arg1= 0x
>>>
>>> We should not jump to 0x0 to handle an interrupt.
>>
>> Can you explain why K210 be affected by it ?
>
> I have been running U-Boot on the MaixDuino.
>
> Without this patch secondary_hart_loop() is reached only once. With the
> patch it is reached several thousand times.

 Hm, interesting. To me, this is a symptom of something else going
 terribly wrong. I originally had this check in place so that it would
 be easier to detect these sorts of errors. I don't think this is the
 correct fix, however. We should really try and find the root cause of
 the bug.

> I would not expect NULL to contain any code that should be executed by
> the secondary hart. See doc/board/sipeed/maix.rst:
>
> Address    Size  Description
> == = ===
> 0x 0x1000    debug
> 0x1000 0x1000    rom
> 0x0200 0xC000    clint
>
>>
>>>
>>> Signed-off-by: Heinrich Schuchardt 
>>> ---
>>>   arch/riscv/lib/smp.c | 2 ++
>>>   1 file changed, 2 insertions(+)
>>>
>>> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index 
>>> ac22136314..d725fa32e8 100644
>>> --- a/arch/riscv/lib/smp.c
>>> +++ b/arch/riscv/lib/smp.c
>>> @@ -96,6 +96,8 @@ void handle_ipi(ulong hart)
>>>  return;
>>>  }
>>>
>>> +   if (!smp_function)
>>> +   return;
>>>  smp_function(hart, gd->arch.ipi[hart].arg0, 
>>> gd->arch.ipi[hart].arg1);  }
>>
>> I remember Sean add this check in
>> [v10,14/21] riscv: Clean up IPI initialization code
>> . And I ask him to remove.
>> https://patchwork.ozlabs.org/project/uboot/patch/20200503024637.327733-15-sean...@gmail.com/
>
> Your comment was: "Please remove the sanity check. If zero address is
> the intended jump point, then system will work abnormal."
>
> The only place where gd->arch.ipi[hart].addr is set is:
>
> arch/riscv/lib/smp.c:53 send_ipi_many():
> gd->arch.ipi[reg].addr = ipi->addr;
>
> send_ipi_many() is only called in smp_call_function().
>
> So the line
>
> smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);
>
> can only work if smp_function() has been called before this point at any
> time. The start code only calls it in spl_secondary_hart_stack_gd_setup().

 Can you retry with [1]? I think Bin Meng removed a call to sbi_init_ipi
 in [2] as part of a larger revert. The actual revert-of-revert is in
 [3], though it really should be split out into its own patch. The
 original commit making these changes is [4].
>>>
>>> Patch series [1] makes not difference. You still have
>>>
>>> gd->arch.ipi[0].addr= 0x
>>> gd->arch.ipi[1].addr= 0x
>>>
>>> and the secondary hart jumps to NULL and never returns.
>>>

 Note that the situation before [4] was that the IPI got initialized by
 the first hart to call any IPI function. If that hart was not the boot
 hart, Bad Things started to happen (e.g. race conditions, memory
 corruption, etc). In that patch, I moved the initialization to its own
 function so we would not have any race conditions and instead have
 (easier-to-debug imo) calls to handle_ipi with bogus arguments. Of
 course, when everything is working properly, we should get neither of
 these.

 [1] https://patchwork.ozlabs.org/project/uboot/list/?series=193047
 [2] 
 https://patchwork.ozlabs.org/project/uboot/patch/1595225827-23674-1-git-send-email-bmeng...@gmail.com/
 [3] 
 https://patchwork.ozlabs.org/project/uboot/patch/20200729095636.1077054-5-sean...@gmail.com/
 [4] 
 https://patchwork.ozlabs.org/project/uboot/patch/20200521161503.384823-15-sean...@gmail.com/

> Why do we call handle_ipi() on the secondary 

Re: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()

2020-09-01 Thread Sean Anderson
On 9/1/20 6:51 AM, Heinrich Schuchardt wrote:
> On 8/18/20 11:00 AM, Heinrich Schuchardt wrote:
>> On 11.08.20 12:32, Sean Anderson wrote:
>>> On 8/11/20 3:50 AM, Heinrich Schuchardt wrote:
 On 11.08.20 08:20, Rick Chen wrote:
> Hi Heinrich
>
>> From: Heinrich Schuchardt [mailto:xypron.g...@gmx.de]
>> Sent: Tuesday, August 11, 2020 11:57 AM
>> To: Rick Jian-Zhi Chen(陳建志)
>> Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel 
>> Schwierzeck; u-boot@lists.denx.de; Heinrich Schuchardt
>> Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()
>>
>> At least on the Kendryte K210:
>>
>> gd->arch.available_harts= 0x0003 arch.ipi[0].addr=
>> gd->0x arch.ipi[0].arg0= 0x
>> gd->arch.ipi[0].arg1= 0x arch.ipi[1].addr=
>> gd->0x arch.ipi[1].arg0= 0x
>> gd->arch.ipi[1].arg1= 0x
>>
>> We should not jump to 0x0 to handle an interrupt.
>
> Can you explain why K210 be affected by it ?

 I have been running U-Boot on the MaixDuino.

 Without this patch secondary_hart_loop() is reached only once. With the
 patch it is reached several thousand times.
>>>
>>> Hm, interesting. To me, this is a symptom of something else going
>>> terribly wrong. I originally had this check in place so that it would
>>> be easier to detect these sorts of errors. I don't think this is the
>>> correct fix, however. We should really try and find the root cause of
>>> the bug.
>>>
 I would not expect NULL to contain any code that should be executed by
 the secondary hart. See doc/board/sipeed/maix.rst:

 Address    Size  Description
 == = ===
 0x 0x1000    debug
 0x1000 0x1000    rom
 0x0200 0xC000    clint

>
>>
>> Signed-off-by: Heinrich Schuchardt 
>> ---
>>   arch/riscv/lib/smp.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index 
>> ac22136314..d725fa32e8 100644
>> --- a/arch/riscv/lib/smp.c
>> +++ b/arch/riscv/lib/smp.c
>> @@ -96,6 +96,8 @@ void handle_ipi(ulong hart)
>>  return;
>>  }
>>
>> +   if (!smp_function)
>> +   return;
>>  smp_function(hart, gd->arch.ipi[hart].arg0, 
>> gd->arch.ipi[hart].arg1);  }
>
> I remember Sean add this check in
> [v10,14/21] riscv: Clean up IPI initialization code
> . And I ask him to remove.
> https://patchwork.ozlabs.org/project/uboot/patch/20200503024637.327733-15-sean...@gmail.com/

 Your comment was: "Please remove the sanity check. If zero address is
 the intended jump point, then system will work abnormal."

 The only place where gd->arch.ipi[hart].addr is set is:

 arch/riscv/lib/smp.c:53 send_ipi_many():
 gd->arch.ipi[reg].addr = ipi->addr;

 send_ipi_many() is only called in smp_call_function().

 So the line

 smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);

 can only work if smp_function() has been called before this point at any
 time. The start code only calls it in spl_secondary_hart_stack_gd_setup().
>>>
>>> Can you retry with [1]? I think Bin Meng removed a call to sbi_init_ipi
>>> in [2] as part of a larger revert. The actual revert-of-revert is in
>>> [3], though it really should be split out into its own patch. The
>>> original commit making these changes is [4].
>>
>> Patch series [1] makes not difference. You still have
>>
>> gd->arch.ipi[0].addr= 0x
>> gd->arch.ipi[1].addr= 0x
>>
>> and the secondary hart jumps to NULL and never returns.
>>
>>>
>>> Note that the situation before [4] was that the IPI got initialized by
>>> the first hart to call any IPI function. If that hart was not the boot
>>> hart, Bad Things started to happen (e.g. race conditions, memory
>>> corruption, etc). In that patch, I moved the initialization to its own
>>> function so we would not have any race conditions and instead have
>>> (easier-to-debug imo) calls to handle_ipi with bogus arguments. Of
>>> course, when everything is working properly, we should get neither of
>>> these.
>>>
>>> [1] https://patchwork.ozlabs.org/project/uboot/list/?series=193047
>>> [2] 
>>> https://patchwork.ozlabs.org/project/uboot/patch/1595225827-23674-1-git-send-email-bmeng...@gmail.com/
>>> [3] 
>>> https://patchwork.ozlabs.org/project/uboot/patch/20200729095636.1077054-5-sean...@gmail.com/
>>> [4] 
>>> https://patchwork.ozlabs.org/project/uboot/patch/20200521161503.384823-15-sean...@gmail.com/
>>>
 Why do we call handle_ipi() on the secondary hart at all?
>>>
>>> Presumably to handle the IPI it got sent? Sorry, I'm confused as to what
>>> you're getting at.
>>>
>>
>> I cannot see anything to be done 

Re: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()

2020-09-01 Thread Heinrich Schuchardt

On 8/18/20 11:00 AM, Heinrich Schuchardt wrote:

On 11.08.20 12:32, Sean Anderson wrote:

On 8/11/20 3:50 AM, Heinrich Schuchardt wrote:

On 11.08.20 08:20, Rick Chen wrote:

Hi Heinrich


From: Heinrich Schuchardt [mailto:xypron.g...@gmx.de]
Sent: Tuesday, August 11, 2020 11:57 AM
To: Rick Jian-Zhi Chen(陳建志)
Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel Schwierzeck; 
u-boot@lists.denx.de; Heinrich Schuchardt
Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()

At least on the Kendryte K210:

gd->arch.available_harts= 0x0003 arch.ipi[0].addr=
gd->0x arch.ipi[0].arg0= 0x
gd->arch.ipi[0].arg1= 0x arch.ipi[1].addr=
gd->0x arch.ipi[1].arg0= 0x
gd->arch.ipi[1].arg1= 0x

We should not jump to 0x0 to handle an interrupt.


Can you explain why K210 be affected by it ?


I have been running U-Boot on the MaixDuino.

Without this patch secondary_hart_loop() is reached only once. With the
patch it is reached several thousand times.


Hm, interesting. To me, this is a symptom of something else going
terribly wrong. I originally had this check in place so that it would
be easier to detect these sorts of errors. I don't think this is the
correct fix, however. We should really try and find the root cause of
the bug.


I would not expect NULL to contain any code that should be executed by
the secondary hart. See doc/board/sipeed/maix.rst:

AddressSize  Description
== = ===
0x 0x1000debug
0x1000 0x1000rom
0x0200 0xC000clint





Signed-off-by: Heinrich Schuchardt 
---
  arch/riscv/lib/smp.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index 
ac22136314..d725fa32e8 100644
--- a/arch/riscv/lib/smp.c
+++ b/arch/riscv/lib/smp.c
@@ -96,6 +96,8 @@ void handle_ipi(ulong hart)
 return;
 }

+   if (!smp_function)
+   return;
 smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);  
}


I remember Sean add this check in
[v10,14/21] riscv: Clean up IPI initialization code
. And I ask him to remove.
https://patchwork.ozlabs.org/project/uboot/patch/20200503024637.327733-15-sean...@gmail.com/


Your comment was: "Please remove the sanity check. If zero address is
the intended jump point, then system will work abnormal."

The only place where gd->arch.ipi[hart].addr is set is:

arch/riscv/lib/smp.c:53 send_ipi_many():
gd->arch.ipi[reg].addr = ipi->addr;

send_ipi_many() is only called in smp_call_function().

So the line

smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);

can only work if smp_function() has been called before this point at any
time. The start code only calls it in spl_secondary_hart_stack_gd_setup().


Can you retry with [1]? I think Bin Meng removed a call to sbi_init_ipi
in [2] as part of a larger revert. The actual revert-of-revert is in
[3], though it really should be split out into its own patch. The
original commit making these changes is [4].


Patch series [1] makes not difference. You still have

gd->arch.ipi[0].addr= 0x
gd->arch.ipi[1].addr= 0x

and the secondary hart jumps to NULL and never returns.



Note that the situation before [4] was that the IPI got initialized by
the first hart to call any IPI function. If that hart was not the boot
hart, Bad Things started to happen (e.g. race conditions, memory
corruption, etc). In that patch, I moved the initialization to its own
function so we would not have any race conditions and instead have
(easier-to-debug imo) calls to handle_ipi with bogus arguments. Of
course, when everything is working properly, we should get neither of
these.

[1] https://patchwork.ozlabs.org/project/uboot/list/?series=193047
[2] 
https://patchwork.ozlabs.org/project/uboot/patch/1595225827-23674-1-git-send-email-bmeng...@gmail.com/
[3] 
https://patchwork.ozlabs.org/project/uboot/patch/20200729095636.1077054-5-sean...@gmail.com/
[4] 
https://patchwork.ozlabs.org/project/uboot/patch/20200521161503.384823-15-sean...@gmail.com/


Why do we call handle_ipi() on the secondary hart at all?


Presumably to handle the IPI it got sent? Sorry, I'm confused as to what
you're getting at.



I cannot see anything to be done by a secondary hart in case of a
software interrupt.

Best regards

Heinrich


When resetting the sipeed_maix_bitm_defconfig without the patch I see a
crash:

=> reset
resetting ...
Unhandled exception: Illegal instruction
EPC:  RA: 821a TVAL: 0002d947c509
### ERROR ### Please RESET the board ###

Objdump shows that it is related to the secondary_hart_loop:

j   secondary_hart_loop
821a:   b7edj   8204


This crash is avoided with this patch.

How do we get forward?

Best regards

Heinrich




Where do you expect it to jump if U-Boot is the 

Re: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()

2020-08-18 Thread Heinrich Schuchardt
On 18.08.20 11:14, Bin Meng wrote:
> Hi Heinrich,
>
> On Tue, Aug 11, 2020 at 11:57 AM Heinrich Schuchardt  
> wrote:
>>
>> At least on the Kendryte K210:
>>
>> gd->arch.available_harts= 0x0003
>> gd->arch.ipi[0].addr= 0x
>> gd->arch.ipi[0].arg0= 0x
>> gd->arch.ipi[0].arg1= 0x
>> gd->arch.ipi[1].addr= 0x
>> gd->arch.ipi[1].arg0= 0x
>> gd->arch.ipi[1].arg1= 0x
>>
>> We should not jump to 0x0 to handle an interrupt.
>>
>> Signed-off-by: Heinrich Schuchardt 
>> ---
>>  arch/riscv/lib/smp.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c
>> index ac22136314..d725fa32e8 100644
>> --- a/arch/riscv/lib/smp.c
>> +++ b/arch/riscv/lib/smp.c
>> @@ -96,6 +96,8 @@ void handle_ipi(ulong hart)
>> return;
>> }
>>
>> +   if (!smp_function)
>> +   return;
>
> Can you trace down to why gd->arch.ipi[0].addr is set to zero? I
> looked at all places that call smp_call_function() and
> gd->arch.ipi[0].addr should not be zero, unless something is seriously
> wrong on your board.

arch/riscv/cpu/start.S:122: jal board_init_f_init_reserve

board_init_f_init_reserve():
memset(gd_ptr, '\0', sizeof(*gd));

Which code did you expect to set another value on the Kendryte K210?

Best regards

Heinrich

>
> Regards,
> Bin
>



Re: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()

2020-08-18 Thread Sean Anderson


On 8/18/20 5:00 AM, Heinrich Schuchardt wrote:
> On 11.08.20 12:32, Sean Anderson wrote:
>> On 8/11/20 3:50 AM, Heinrich Schuchardt wrote:
>>> On 11.08.20 08:20, Rick Chen wrote:
 Hi Heinrich

> From: Heinrich Schuchardt [mailto:xypron.g...@gmx.de]
> Sent: Tuesday, August 11, 2020 11:57 AM
> To: Rick Jian-Zhi Chen(陳建志)
> Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel 
> Schwierzeck; u-boot@lists.denx.de; Heinrich Schuchardt
> Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()
>
> At least on the Kendryte K210:
>
> gd->arch.available_harts= 0x0003 arch.ipi[0].addr=
> gd->0x arch.ipi[0].arg0= 0x
> gd->arch.ipi[0].arg1= 0x arch.ipi[1].addr=
> gd->0x arch.ipi[1].arg0= 0x
> gd->arch.ipi[1].arg1= 0x
>
> We should not jump to 0x0 to handle an interrupt.

 Can you explain why K210 be affected by it ?
>>>
>>> I have been running U-Boot on the MaixDuino.
>>>
>>> Without this patch secondary_hart_loop() is reached only once. With the
>>> patch it is reached several thousand times.
>>
>> Hm, interesting. To me, this is a symptom of something else going
>> terribly wrong. I originally had this check in place so that it would
>> be easier to detect these sorts of errors. I don't think this is the
>> correct fix, however. We should really try and find the root cause of
>> the bug.
>>
>>> I would not expect NULL to contain any code that should be executed by
>>> the secondary hart. See doc/board/sipeed/maix.rst:
>>>
>>> AddressSize  Description
>>> == = ===
>>> 0x 0x1000debug
>>> 0x1000 0x1000rom
>>> 0x0200 0xC000clint
>>>

>
> Signed-off-by: Heinrich Schuchardt 
> ---
>  arch/riscv/lib/smp.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index 
> ac22136314..d725fa32e8 100644
> --- a/arch/riscv/lib/smp.c
> +++ b/arch/riscv/lib/smp.c
> @@ -96,6 +96,8 @@ void handle_ipi(ulong hart)
> return;
> }
>
> +   if (!smp_function)
> +   return;
> smp_function(hart, gd->arch.ipi[hart].arg0, 
> gd->arch.ipi[hart].arg1);  }

 I remember Sean add this check in
 [v10,14/21] riscv: Clean up IPI initialization code
 . And I ask him to remove.
 https://patchwork.ozlabs.org/project/uboot/patch/20200503024637.327733-15-sean...@gmail.com/
>>>
>>> Your comment was: "Please remove the sanity check. If zero address is
>>> the intended jump point, then system will work abnormal."
>>>
>>> The only place where gd->arch.ipi[hart].addr is set is:
>>>
>>> arch/riscv/lib/smp.c:53 send_ipi_many():
>>> gd->arch.ipi[reg].addr = ipi->addr;
>>>
>>> send_ipi_many() is only called in smp_call_function().
>>>
>>> So the line
>>>
>>> smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);
>>>
>>> can only work if smp_function() has been called before this point at any
>>> time. The start code only calls it in spl_secondary_hart_stack_gd_setup().
>>
>> Can you retry with [1]? I think Bin Meng removed a call to sbi_init_ipi
>> in [2] as part of a larger revert. The actual revert-of-revert is in
>> [3], though it really should be split out into its own patch. The
>> original commit making these changes is [4].
> 
> Patch series [1] makes not difference. You still have
> 
> gd->arch.ipi[0].addr= 0x
> gd->arch.ipi[1].addr= 0x
> 
> and the secondary hart jumps to NULL and never returns.

Hm, then there is a code path where an IPI gets triggered by something
other than opensbi.

> 
>>
>> Note that the situation before [4] was that the IPI got initialized by
>> the first hart to call any IPI function. If that hart was not the boot
>> hart, Bad Things started to happen (e.g. race conditions, memory
>> corruption, etc). In that patch, I moved the initialization to its own
>> function so we would not have any race conditions and instead have
>> (easier-to-debug imo) calls to handle_ipi with bogus arguments. Of
>> course, when everything is working properly, we should get neither of
>> these.
>>
>> [1] https://patchwork.ozlabs.org/project/uboot/list/?series=193047
>> [2] 
>> https://patchwork.ozlabs.org/project/uboot/patch/1595225827-23674-1-git-send-email-bmeng...@gmail.com/
>> [3] 
>> https://patchwork.ozlabs.org/project/uboot/patch/20200729095636.1077054-5-sean...@gmail.com/
>> [4] 
>> https://patchwork.ozlabs.org/project/uboot/patch/20200521161503.384823-15-sean...@gmail.com/
>>
>>> Why do we call handle_ipi() on the secondary hart at all?
>>
>> Presumably to handle the IPI it got sent? Sorry, I'm confused as to what
>> you're getting at.
>>
> 
> I cannot see anything to be done by a secondary hart in case of a
> software interrupt.

Isn't it supposed to 

Re: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()

2020-08-18 Thread Bin Meng
Hi Heinrich,

On Tue, Aug 11, 2020 at 11:57 AM Heinrich Schuchardt  wrote:
>
> At least on the Kendryte K210:
>
> gd->arch.available_harts= 0x0003
> gd->arch.ipi[0].addr= 0x
> gd->arch.ipi[0].arg0= 0x
> gd->arch.ipi[0].arg1= 0x
> gd->arch.ipi[1].addr= 0x
> gd->arch.ipi[1].arg0= 0x
> gd->arch.ipi[1].arg1= 0x
>
> We should not jump to 0x0 to handle an interrupt.
>
> Signed-off-by: Heinrich Schuchardt 
> ---
>  arch/riscv/lib/smp.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c
> index ac22136314..d725fa32e8 100644
> --- a/arch/riscv/lib/smp.c
> +++ b/arch/riscv/lib/smp.c
> @@ -96,6 +96,8 @@ void handle_ipi(ulong hart)
> return;
> }
>
> +   if (!smp_function)
> +   return;

Can you trace down to why gd->arch.ipi[0].addr is set to zero? I
looked at all places that call smp_call_function() and
gd->arch.ipi[0].addr should not be zero, unless something is seriously
wrong on your board.

Regards,
Bin


Re: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()

2020-08-18 Thread Heinrich Schuchardt
On 11.08.20 12:32, Sean Anderson wrote:
> On 8/11/20 3:50 AM, Heinrich Schuchardt wrote:
>> On 11.08.20 08:20, Rick Chen wrote:
>>> Hi Heinrich
>>>
 From: Heinrich Schuchardt [mailto:xypron.g...@gmx.de]
 Sent: Tuesday, August 11, 2020 11:57 AM
 To: Rick Jian-Zhi Chen(陳建志)
 Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel 
 Schwierzeck; u-boot@lists.denx.de; Heinrich Schuchardt
 Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()

 At least on the Kendryte K210:

 gd->arch.available_harts= 0x0003 arch.ipi[0].addr=
 gd->0x arch.ipi[0].arg0= 0x
 gd->arch.ipi[0].arg1= 0x arch.ipi[1].addr=
 gd->0x arch.ipi[1].arg0= 0x
 gd->arch.ipi[1].arg1= 0x

 We should not jump to 0x0 to handle an interrupt.
>>>
>>> Can you explain why K210 be affected by it ?
>>
>> I have been running U-Boot on the MaixDuino.
>>
>> Without this patch secondary_hart_loop() is reached only once. With the
>> patch it is reached several thousand times.
>
> Hm, interesting. To me, this is a symptom of something else going
> terribly wrong. I originally had this check in place so that it would
> be easier to detect these sorts of errors. I don't think this is the
> correct fix, however. We should really try and find the root cause of
> the bug.
>
>> I would not expect NULL to contain any code that should be executed by
>> the secondary hart. See doc/board/sipeed/maix.rst:
>>
>> AddressSize  Description
>> == = ===
>> 0x 0x1000debug
>> 0x1000 0x1000rom
>> 0x0200 0xC000clint
>>
>>>

 Signed-off-by: Heinrich Schuchardt 
 ---
  arch/riscv/lib/smp.c | 2 ++
  1 file changed, 2 insertions(+)

 diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index 
 ac22136314..d725fa32e8 100644
 --- a/arch/riscv/lib/smp.c
 +++ b/arch/riscv/lib/smp.c
 @@ -96,6 +96,8 @@ void handle_ipi(ulong hart)
 return;
 }

 +   if (!smp_function)
 +   return;
 smp_function(hart, gd->arch.ipi[hart].arg0, 
 gd->arch.ipi[hart].arg1);  }
>>>
>>> I remember Sean add this check in
>>> [v10,14/21] riscv: Clean up IPI initialization code
>>> . And I ask him to remove.
>>> https://patchwork.ozlabs.org/project/uboot/patch/20200503024637.327733-15-sean...@gmail.com/
>>
>> Your comment was: "Please remove the sanity check. If zero address is
>> the intended jump point, then system will work abnormal."
>>
>> The only place where gd->arch.ipi[hart].addr is set is:
>>
>> arch/riscv/lib/smp.c:53 send_ipi_many():
>> gd->arch.ipi[reg].addr = ipi->addr;
>>
>> send_ipi_many() is only called in smp_call_function().
>>
>> So the line
>>
>> smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);
>>
>> can only work if smp_function() has been called before this point at any
>> time. The start code only calls it in spl_secondary_hart_stack_gd_setup().
>
> Can you retry with [1]? I think Bin Meng removed a call to sbi_init_ipi
> in [2] as part of a larger revert. The actual revert-of-revert is in
> [3], though it really should be split out into its own patch. The
> original commit making these changes is [4].

Patch series [1] makes not difference. You still have

gd->arch.ipi[0].addr= 0x
gd->arch.ipi[1].addr= 0x

and the secondary hart jumps to NULL and never returns.

>
> Note that the situation before [4] was that the IPI got initialized by
> the first hart to call any IPI function. If that hart was not the boot
> hart, Bad Things started to happen (e.g. race conditions, memory
> corruption, etc). In that patch, I moved the initialization to its own
> function so we would not have any race conditions and instead have
> (easier-to-debug imo) calls to handle_ipi with bogus arguments. Of
> course, when everything is working properly, we should get neither of
> these.
>
> [1] https://patchwork.ozlabs.org/project/uboot/list/?series=193047
> [2] 
> https://patchwork.ozlabs.org/project/uboot/patch/1595225827-23674-1-git-send-email-bmeng...@gmail.com/
> [3] 
> https://patchwork.ozlabs.org/project/uboot/patch/20200729095636.1077054-5-sean...@gmail.com/
> [4] 
> https://patchwork.ozlabs.org/project/uboot/patch/20200521161503.384823-15-sean...@gmail.com/
>
>> Why do we call handle_ipi() on the secondary hart at all?
>
> Presumably to handle the IPI it got sent? Sorry, I'm confused as to what
> you're getting at.
>

I cannot see anything to be done by a secondary hart in case of a
software interrupt.

Best regards

Heinrich

>> Where do you expect it to jump if U-Boot is the first binary loaded?
>>
>> Even if spl_secondary_hart_stack_gd_setup() has initialized the jump
>> address to secondary_hart_relocate(), do we want the secondary hart to
>> execute it again and again?
>
> --Sean
>



Re: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()

2020-08-11 Thread Sean Anderson
On 8/11/20 3:50 AM, Heinrich Schuchardt wrote:
> On 11.08.20 08:20, Rick Chen wrote:
>> Hi Heinrich
>>
>>> From: Heinrich Schuchardt [mailto:xypron.g...@gmx.de]
>>> Sent: Tuesday, August 11, 2020 11:57 AM
>>> To: Rick Jian-Zhi Chen(陳建志)
>>> Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel Schwierzeck; 
>>> u-boot@lists.denx.de; Heinrich Schuchardt
>>> Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()
>>>
>>> At least on the Kendryte K210:
>>>
>>> gd->arch.available_harts= 0x0003 arch.ipi[0].addr=
>>> gd->0x arch.ipi[0].arg0= 0x
>>> gd->arch.ipi[0].arg1= 0x arch.ipi[1].addr=
>>> gd->0x arch.ipi[1].arg0= 0x
>>> gd->arch.ipi[1].arg1= 0x
>>>
>>> We should not jump to 0x0 to handle an interrupt.
>>
>> Can you explain why K210 be affected by it ?
> 
> I have been running U-Boot on the MaixDuino.
> 
> Without this patch secondary_hart_loop() is reached only once. With the
> patch it is reached several thousand times.

Hm, interesting. To me, this is a symptom of something else going
terribly wrong. I originally had this check in place so that it would
be easier to detect these sorts of errors. I don't think this is the
correct fix, however. We should really try and find the root cause of
the bug.

> I would not expect NULL to contain any code that should be executed by
> the secondary hart. See doc/board/sipeed/maix.rst:
> 
> AddressSize  Description
> == = ===
> 0x 0x1000debug
> 0x1000 0x1000rom
> 0x0200 0xC000clint
> 
>>
>>>
>>> Signed-off-by: Heinrich Schuchardt 
>>> ---
>>>  arch/riscv/lib/smp.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index 
>>> ac22136314..d725fa32e8 100644
>>> --- a/arch/riscv/lib/smp.c
>>> +++ b/arch/riscv/lib/smp.c
>>> @@ -96,6 +96,8 @@ void handle_ipi(ulong hart)
>>> return;
>>> }
>>>
>>> +   if (!smp_function)
>>> +   return;
>>> smp_function(hart, gd->arch.ipi[hart].arg0, 
>>> gd->arch.ipi[hart].arg1);  }
>>
>> I remember Sean add this check in
>> [v10,14/21] riscv: Clean up IPI initialization code
>> . And I ask him to remove.
>> https://patchwork.ozlabs.org/project/uboot/patch/20200503024637.327733-15-sean...@gmail.com/
> 
> Your comment was: "Please remove the sanity check. If zero address is
> the intended jump point, then system will work abnormal."
> 
> The only place where gd->arch.ipi[hart].addr is set is:
> 
> arch/riscv/lib/smp.c:53 send_ipi_many():
> gd->arch.ipi[reg].addr = ipi->addr;
> 
> send_ipi_many() is only called in smp_call_function().
> 
> So the line
> 
> smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);
> 
> can only work if smp_function() has been called before this point at any
> time. The start code only calls it in spl_secondary_hart_stack_gd_setup().

Can you retry with [1]? I think Bin Meng removed a call to sbi_init_ipi
in [2] as part of a larger revert. The actual revert-of-revert is in
[3], though it really should be split out into its own patch. The
original commit making these changes is [4].

Note that the situation before [4] was that the IPI got initialized by
the first hart to call any IPI function. If that hart was not the boot
hart, Bad Things started to happen (e.g. race conditions, memory
corruption, etc). In that patch, I moved the initialization to its own
function so we would not have any race conditions and instead have
(easier-to-debug imo) calls to handle_ipi with bogus arguments. Of
course, when everything is working properly, we should get neither of
these.

[1] https://patchwork.ozlabs.org/project/uboot/list/?series=193047
[2] 
https://patchwork.ozlabs.org/project/uboot/patch/1595225827-23674-1-git-send-email-bmeng...@gmail.com/
[3] 
https://patchwork.ozlabs.org/project/uboot/patch/20200729095636.1077054-5-sean...@gmail.com/
[4] 
https://patchwork.ozlabs.org/project/uboot/patch/20200521161503.384823-15-sean...@gmail.com/

> Why do we call handle_ipi() on the secondary hart at all?

Presumably to handle the IPI it got sent? Sorry, I'm confused as to what
you're getting at.

> Where do you expect it to jump if U-Boot is the first binary loaded?
> 
> Even if spl_secondary_hart_stack_gd_setup() has initialized the jump
> address to secondary_hart_relocate(), do we want the secondary hart to
> execute it again and again?

--Sean



Re: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()

2020-08-11 Thread Heinrich Schuchardt
On 11.08.20 08:20, Rick Chen wrote:
> Hi Heinrich
>
>> From: Heinrich Schuchardt [mailto:xypron.g...@gmx.de]
>> Sent: Tuesday, August 11, 2020 11:57 AM
>> To: Rick Jian-Zhi Chen(陳建志)
>> Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel Schwierzeck; 
>> u-boot@lists.denx.de; Heinrich Schuchardt
>> Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()
>>
>> At least on the Kendryte K210:
>>
>> gd->arch.available_harts= 0x0003 arch.ipi[0].addr=
>> gd->0x arch.ipi[0].arg0= 0x
>> gd->arch.ipi[0].arg1= 0x arch.ipi[1].addr=
>> gd->0x arch.ipi[1].arg0= 0x
>> gd->arch.ipi[1].arg1= 0x
>>
>> We should not jump to 0x0 to handle an interrupt.
>
> Can you explain why K210 be affected by it ?

I have been running U-Boot on the MaixDuino.

Without this patch secondary_hart_loop() is reached only once. With the
patch it is reached several thousand times.

I would not expect NULL to contain any code that should be executed by
the secondary hart. See doc/board/sipeed/maix.rst:

AddressSize  Description
== = ===
0x 0x1000debug
0x1000 0x1000rom
0x0200 0xC000clint

>
>>
>> Signed-off-by: Heinrich Schuchardt 
>> ---
>>  arch/riscv/lib/smp.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index 
>> ac22136314..d725fa32e8 100644
>> --- a/arch/riscv/lib/smp.c
>> +++ b/arch/riscv/lib/smp.c
>> @@ -96,6 +96,8 @@ void handle_ipi(ulong hart)
>> return;
>> }
>>
>> +   if (!smp_function)
>> +   return;
>> smp_function(hart, gd->arch.ipi[hart].arg0, 
>> gd->arch.ipi[hart].arg1);  }
>
> I remember Sean add this check in
> [v10,14/21] riscv: Clean up IPI initialization code
> . And I ask him to remove.
> https://patchwork.ozlabs.org/project/uboot/patch/20200503024637.327733-15-sean...@gmail.com/

Your comment was: "Please remove the sanity check. If zero address is
the intended jump point, then system will work abnormal."

The only place where gd->arch.ipi[hart].addr is set is:

arch/riscv/lib/smp.c:53 send_ipi_many():
gd->arch.ipi[reg].addr = ipi->addr;

send_ipi_many() is only called in smp_call_function().

So the line

smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);

can only work if smp_function() has been called before this point at any
time. The start code only calls it in spl_secondary_hart_stack_gd_setup().

Why do we call handle_ipi() on the secondary hart at all?

Where do you expect it to jump if U-Boot is the first binary loaded?

Even if spl_secondary_hart_stack_gd_setup() has initialized the jump
address to secondary_hart_relocate(), do we want the secondary hart to
execute it again and again?

Best regards

Heinrich


Re: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()

2020-08-11 Thread Rick Chen
Hi Heinrich

> From: Heinrich Schuchardt [mailto:xypron.g...@gmx.de]
> Sent: Tuesday, August 11, 2020 11:57 AM
> To: Rick Jian-Zhi Chen(陳建志)
> Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel Schwierzeck; 
> u-boot@lists.denx.de; Heinrich Schuchardt
> Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()
>
> At least on the Kendryte K210:
>
> gd->arch.available_harts= 0x0003 arch.ipi[0].addr=
> gd->0x arch.ipi[0].arg0= 0x
> gd->arch.ipi[0].arg1= 0x arch.ipi[1].addr=
> gd->0x arch.ipi[1].arg0= 0x
> gd->arch.ipi[1].arg1= 0x
>
> We should not jump to 0x0 to handle an interrupt.

Can you explain why K210 be affected by it ?

>
> Signed-off-by: Heinrich Schuchardt 
> ---
>  arch/riscv/lib/smp.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index 
> ac22136314..d725fa32e8 100644
> --- a/arch/riscv/lib/smp.c
> +++ b/arch/riscv/lib/smp.c
> @@ -96,6 +96,8 @@ void handle_ipi(ulong hart)
> return;
> }
>
> +   if (!smp_function)
> +   return;
> smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1); 
>  }

I remember Sean add this check in
[v10,14/21] riscv: Clean up IPI initialization code
. And I ask him to remove.
https://patchwork.ozlabs.org/project/uboot/patch/20200503024637.327733-15-sean...@gmail.com/

Thanks,
Rick

>
> --
> 2.28.0