Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

2020-03-28 Thread Sebastian Gottschall
i can exclude neon code for dd-wrt in dropbear if it helps. but it would be 
better to nail down the problem. otherwise other programs would 
likely be affected too


On 28.03.2020 at 21:06, Horshack wrote:
As a postscript, I was able to refine the logic to produce the 
corrupted result almost instantaneously. I'm also able to get it to 
fail with an all-zero input dataset and a bitwise OR operation instead 
of the original squaring multiplication operations, which allows me to 
see what actual corrupted loads are. The result is very interesting - 
sometimes the corrupted data is valid ARM instructions, other times 
valid kernel-space addresses, so it seems clear this is an addressing 
problem. Also interesting is how I'll see just one or a few corrupted 
words, which implies the corruption is in the interface between DCACHE 
and the processor rather than errant fetch of a line into DCACHE from 
memory (otherwise the entire DCACHE line would hold corrupt data). You 
can see a sample of the failure output here: 
https://github.com/horshack-dpreview/ipq8065-sqrbug/blob/master/SampleFailures.txt 



Finally, to exclude any possibility the issue is related to possible 
kernel code running and corrupting register sets/memory (such as an 
interrupt routine), I ported the test to a kernel module and ran the 
logic within a local_irq_disable() block, which disables both 
preemption and interrupts on the core. Still fails. I created a 
separate repository for the kernel module version here: 
https://github.com/horshack-dpreview/ipq8065-sqrbug-driver 




*From:* Horshack
*Sent:* Tuesday, March 24, 2020 9:25 PM
*To:* Sebastian Gottschall ; 
dropbear@ucc.asn.au 
*Subject:* Re: SSH key exchange fails 30-70% of the time on Netgear 
X4S R7800
I excluded context switches as a possible culprit by looping until a 
corruption happened for which no context switches occurred while the 
test was running (i.e., at the start of the test I would save the # of 
involuntary/voluntary context switches from /proc//status, then 
check those counts again after the failure; if they were different I 
restarted the test and kept looping until a failure happened in which 
the ctx switch counts were the same).



*From:* dropbear-bounces+horshack=live@ucc.asn.au on behalf of Sebastian Gottschall

*Sent:* Tuesday, March 24, 2020 9:13 PM
*To:* dropbear@ucc.asn.au 
*Subject:* Re: SSH key exchange fails 30-70% of the time on Netgear 
X4S R7800


if the corruption is caused by a context switch, the problem may be 
caused by the kernel.

try the following: disable "CONFIG_KERNEL_MODE_NEON"
in the kernel config. this will disable some of the kernel's crypto assembly code

On 24.03.2020 at 16:11, Matt Johnston wrote:

Good work narrowing down a test case there.
That's an interesting finding - I guess it might be worth posting on 
OpenWRT lists/forum to try to find other testers.
Could it be power related if the tight multiplication loop is 
stressing it somehow? It doesn't seem to be using Neon 
instructions for anything apart from loads/stores though - is there 
something that the compiler should be doing when mixing Neon and non-Neon 
operations?


Cheers,
Matt

(Your emails got held up being over 100kB, I've trimmed the reply 
below and let them through. Apologies to everyone for the stale old 
one that got let through with them just now, I wasn't looking closely)


On Tue 24/3/2020, at 11:23 am, Horshack wrote:


I was able to isolate the issue to just a handful of assembly 
instructions within fast_s_mp_sqr(), related to the squaring loop. I 
broke that code out into a separate utility that reproduces the 
issue within a few seconds. The failure is somewhat sensitive to the 
data pattern and very sensitive to timing, indicating a likely 
memory/data path issue within my particular router. I'm guessing 
it's the IPQ8065 and not the SDRAM because I can get it to fail with 
a tiny data set that easily fits within DCACHE. I can alter the frequency 
of the failure with a single ARM memory barrier instruction, which 
at first implied a superscalar data ordering condition but the 
memory barrier also alters the timing through the DCACHE so that is 
likely the effect it's having. I was able to exclude the VFP/Neon 
register corruption as the cause with some test code. I also 
excluded any context switch-specific issue by measuring the # of 
context switches in /proc//status and catching a failure where 
no switches had occurred. I also modified the affinity so the 
utility runs on just one processor to rule out a specific core 
having the issue.


I put the source and binary of my utility on github - if anyone on 
this mailing list has 

Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

2020-03-28 Thread Horshack
As a postscript, I was able to refine the logic to produce the corrupted result 
almost instantaneously. I'm also able to get it to fail with an all-zero input 
dataset and a bitwise OR operation instead of the original squaring 
multiplication operations, which allows me to see what actual corrupted loads 
are. The result is very interesting - sometimes the corrupted data is valid ARM 
instructions, other times valid kernel-space addresses, so it seems clear this 
is an addressing problem. Also interesting is how I'll see just one or a few 
corrupted words, which implies the corruption is in the interface between 
DCACHE and the processor rather than errant fetch of a line into DCACHE from 
memory (otherwise the entire DCACHE line would hold corrupt data). You can see 
a sample of the failure output here: 
https://github.com/horshack-dpreview/ipq8065-sqrbug/blob/master/SampleFailures.txt
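
The reason an all-zero dataset with an OR is so revealing can be sketched in a few lines. With zero input, any bit that ends up set in the result must have come from a bad load, so the accumulated value is the corrupted data itself rather than a scrambled product. This is only an illustration of the detection logic in Python, not the assembly loop from the repository; the names `or_reduce` and `find_corrupt_words`, the 32-bit little-endian word size, and the 1024-byte buffer are all assumptions made for the example.

```python
import struct

WORD = 4  # assumption: 32-bit words, as on a 32-bit ARM core


def or_reduce(buf: bytes) -> int:
    """OR together every 32-bit little-endian word in buf."""
    acc = 0
    for (word,) in struct.iter_unpack("<I", buf):
        acc |= word
    return acc


def find_corrupt_words(buf: bytes):
    """Return (byte_offset, value) for each non-zero word.

    For an all-zero dataset every entry returned is, by definition,
    a word that did not read back as expected.
    """
    return [
        (index * WORD, word)
        for index, (word,) in enumerate(struct.iter_unpack("<I", buf))
        if word != 0
    ]


if __name__ == "__main__":
    data = bytes(1024)  # all-zero input, small enough to stay in cache
    print(hex(or_reduce(data)), find_corrupt_words(data))
```

A squaring loop would hide the bad value inside a large product, whereas here the offending word and its offset can be printed directly and compared against known instruction encodings or address ranges.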

Finally, to exclude any possibility the issue is related to possible kernel 
code running and corrupting register sets/memory (such as an interrupt 
routine), I ported the test to a kernel module and ran the logic within a 
local_irq_disable() block, which disables both preemption and interrupts on the 
core. Still fails. I created a separate repository for the kernel module 
version here: https://github.com/horshack-dpreview/ipq8065-sqrbug-driver


From: Horshack
Sent: Tuesday, March 24, 2020 9:25 PM
To: Sebastian Gottschall ; dropbear@ucc.asn.au 

Subject: Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

I excluded context switches as a possible culprit by looping until a corruption 
happened for which no context switches occurred while the test was running (i.e., 
at the start of the test I would save the # of involuntary/voluntary context 
switches from /proc//status, then check those counts again after the 
failure; if they were different I restarted the test and kept looping until a 
failure happened in which the ctx switch counts were the same).
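
The context-switch check described above could look roughly like the sketch below. It is not the code from the repository. It assumes a Linux system where the process's status file (here `/proc/self/status`) exposes the `voluntary_ctxt_switches` and `nonvoluntary_ctxt_switches` fields; `run_test` is a hypothetical callable that returns True when a corruption was observed, and the function names and `max_attempts` limit are invented for the example.

```python
import re


def parse_ctxt_switches(status_text: str):
    """Return (voluntary, nonvoluntary) counts from status-file text."""
    vol = re.search(r"^voluntary_ctxt_switches:\s+(\d+)", status_text, re.M)
    nonvol = re.search(r"^nonvoluntary_ctxt_switches:\s+(\d+)", status_text, re.M)
    if vol is None or nonvol is None:
        raise ValueError("context switch counters not found")
    return int(vol.group(1)), int(nonvol.group(1))


def read_ctxt_switches(path="/proc/self/status"):
    """Read the counters for the current process (Linux only)."""
    with open(path) as f:
        return parse_ctxt_switches(f.read())


def run_until_clean_failure(run_test, max_attempts=1000,
                            read_counts=read_ctxt_switches):
    """Repeat run_test() until it fails with unchanged counters.

    Returns True if a failure occurred during a run in which no
    context switch was recorded, False if max_attempts ran out.
    """
    for _ in range(max_attempts):
        before = read_counts()
        failed = run_test()
        after = read_counts()
        if failed and before == after:
            return True
    return False
```

A failure caught this way cannot be blamed on register state being saved or restored incorrectly across a switch, since none took place during that run.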


From: dropbear-bounces+horshack=live@ucc.asn.au on behalf of Sebastian Gottschall
Sent: Tuesday, March 24, 2020 9:13 PM
To: dropbear@ucc.asn.au 
Subject: Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800


if the corruption is caused by a context switch, the problem may be caused by 
the kernel.
try the following: disable "CONFIG_KERNEL_MODE_NEON"
in the kernel config. this will disable some of the kernel's crypto assembly code
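
Before rebuilding, it may help to confirm whether the option is actually set in the running kernel. The sketch below is one way to do that, under the assumption that the kernel was built with CONFIG_IKCONFIG_PROC so that `/proc/config.gz` exists; many router firmwares omit it, in which case the build tree's `.config` would have to be checked instead. The function names are made up for the example.

```python
import gzip


def kconfig_enabled(config_text: str, option: str) -> bool:
    """True if the option is set to y or m in kernel config text.

    Disabled options appear as '# CONFIG_FOO is not set' and
    therefore never match.
    """
    wanted = (f"{option}=y", f"{option}=m")
    return any(line.strip() in wanted for line in config_text.splitlines())


def running_kernel_has(option: str, path="/proc/config.gz") -> bool:
    """Check the running kernel's config (requires CONFIG_IKCONFIG_PROC)."""
    with gzip.open(path, "rt") as f:
        return kconfig_enabled(f.read(), option)
```

For example, `running_kernel_has("CONFIG_KERNEL_MODE_NEON")` should return False once the option has been turned off and the new kernel is booted.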

On 24.03.2020 at 16:11, Matt Johnston wrote:
Good work narrowing down a test case there.
That's an interesting finding - I guess it might be worth posting on OpenWRT 
lists/forum to try to find other testers.
Could it be power related if the tight multiplication loop is stressing it 
somehow? It doesn't seem to be using Neon instructions for anything apart 
from loads/stores though - is there something that the compiler should be doing 
when mixing Neon and non-Neon operations?

Cheers,
Matt

(Your emails got held up being over 100kB, I've trimmed the reply below and let 
them through. Apologies to everyone for the stale old one that got let through 
with them just now, I wasn't looking closely)

On Tue 24/3/2020, at 11:23 am, Horshack <horsh...@live.com> wrote:

I was able to isolate the issue to just a handful of assembly instructions 
within fast_s_mp_sqr(), related to the squaring loop. I broke that code out 
into a separate utility that reproduces the issue within a few seconds. The 
failure is somewhat sensitive to the data pattern and very sensitive to timing, 
indicating a likely memory/data path issue within my particular router. I'm 
guessing it's the IPQ8065 and not the SDRAM because I can get it to fail with a 
tiny data set that easily fits within DCACHE. I can alter the frequency of the 
failure with a single ARM memory barrier instruction, which at first implied a 
superscalar data ordering condition but the memory barrier also alters the 
timing through the DCACHE so that is likely the effect it's having. I was able 
to exclude the VFP/Neon register corruption as the cause with some test code. I 
also excluded any context switch-specific issue by measuring the # of context 
switches in /proc//status and catching a failure where no switches had 
occurred. I also modified the affinity so the utility runs on just one 
processor to rule out a specific core having the issue.
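
Pinning the test to one core at a time can be sketched as below. This is an illustration, not the code in the repository. It relies on `os.sched_setaffinity` and `os.sched_getaffinity`, which Python provides on Linux only; `run_test` is a hypothetical callable standing in for the reproduction loop, and `run_on_each_cpu` is a name invented for the example.

```python
import os


def run_on_each_cpu(run_test):
    """Run run_test() once per allowed CPU, pinned to that CPU.

    Returns {cpu_number: result}. The original affinity mask is
    restored afterwards, even if run_test() raises.
    """
    original = os.sched_getaffinity(0)  # 0 means the calling process
    results = {}
    try:
        for cpu in sorted(original):
            os.sched_setaffinity(0, {cpu})
            results[cpu] = run_test()
    finally:
        os.sched_setaffinity(0, original)
    return results
```

If the failure shows up on every core in the returned mapping, a single defective core becomes an unlikely explanation.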

I put the source and binary of my utility on github - if anyone on this mailing 
list has this model router can you give it a try if possible? You only need the 
ipq8065-sqrbug (binary) and run-ipq8065-sqrbug.sh (script). Here's the link to 
the repository: https://github.com/horshack-dpreview/ipq8065-sqrbug



From: Horshack <horsh...@live.com>
Sent: Saturday, March 21, 2020 7:54 AM
To: dropbear@ucc.asn.au
Subject: SSH key exchange fails 30-70% of the time on