Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

2020-03-24 Thread Sebastian Gottschall
If the corruption is caused by a context switch, the problem can be caused by the kernel. Try the following: disable "CONFIG_KERNEL_MODE_NEON" in the kernel config. This will disable some of the kernel's crypto assembly code. On 24.03.2020 at 16:11, Matt Johnston wrote: Good work narrowing down a
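Before rebuilding, it may help to confirm whether the running kernel actually has this option enabled. A minimal Python sketch, assuming either that the kernel exposes its configuration as /proc/config.gz (only the case when it was built with CONFIG_IKCONFIG_PROC=y) or that you have the build's .config file; the helper names are hypothetical:

```python
import gzip
import re

def kconfig_option_state(config_text, option):
    """Return the option's assigned value (e.g. 'y'), 'unset', or 'absent'."""
    set_re = re.compile(rf"^{re.escape(option)}=(.*)$")
    unset_line = f"# {option} is not set"   # Kconfig's marker for a disabled bool
    for line in config_text.splitlines():
        line = line.strip()
        if line == unset_line:
            return "unset"
        m = set_re.match(line)
        if m:
            return m.group(1)
    return "absent"

def read_config(path="/proc/config.gz"):
    """Read a kernel config, transparently decompressing .gz files."""
    if path.endswith(".gz"):
        with gzip.open(path, "rt") as f:
            return f.read()
    with open(path) as f:
        return f.read()
```

For example, `kconfig_option_state(read_config(), "CONFIG_KERNEL_MODE_NEON")` returns `"y"` if the option is on. Turning it off in a .config normally means replacing that line with `# CONFIG_KERNEL_MODE_NEON is not set` and rebuilding; options that depend on it may change when the configuration is regenerated.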

Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

2020-03-24 Thread Horshack
I excluded context switches as a possible culprit by looping until a corruption occurred during a run in which no context switches took place (i.e., at the start of the test I would save the number of involuntary/voluntary context switches from /proc//status, then check those counts again
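The snapshot-and-compare approach described above can be sketched as follows. This is a hedged illustration, not the poster's code: it reads the voluntary_ctxt_switches and nonvoluntary_ctxt_switches fields from /proc/self/status (an assumption standing in for the test process's own status file), and `test_fn` is a hypothetical callable that runs one test iteration and returns True when it detects corruption.

```python
# Field names as they appear in Linux per-process status files under /proc.
CTXT_FIELDS = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")

def parse_ctxt_switches(status_text):
    """Extract the two context-switch counters from status-file text."""
    counts = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in CTXT_FIELDS:
            counts[key] = int(value.strip())
    return counts

def read_own_ctxt_switches():
    # Assumption: the test runs in this process, so /proc/self/status applies.
    with open("/proc/self/status") as f:
        return parse_ctxt_switches(f.read())

def find_clean_corruption(test_fn, max_tries=10000, read_counts=read_own_ctxt_switches):
    """Repeat test_fn until it reports corruption with unchanged counters.

    Returns the attempt number, or None if no run qualified.
    """
    for attempt in range(1, max_tries + 1):
        before = read_counts()
        corrupted = test_fn()
        after = read_counts()
        if corrupted and before == after:
            return attempt
    return None
```

These counters only record scheduler context switches for the task; time the CPU spends in interrupt handlers does not increment them.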

Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

2020-03-24 Thread Sebastian Gottschall
How can you make sure that no context switch is happening if the kernel itself uses NEON instructions? By stopping the kernel? That is fairly impossible. Check whether this option is enabled, and disable it to make sure the kernel does not make use of NEON instructions. On 25.03.2020 at 05:25

Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

2020-03-24 Thread Horshack
I was able to isolate the issue to just a handful of assembly instructions within fast_s_mp_sqr(), related to the squaring loop. I broke that code out into a separate utility that reproduces the issue within a few seconds. The failure is somewhat sensitive to the data pattern and very sensitive
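For readers unfamiliar with the loop in question, here is a hedged Python model of column-wise (Comba) squaring in the style of libtommath's fast_s_mp_sqr(): 28-bit digits with a 64-bit accumulator, which is assumed here to match libtommath's default on 32-bit ARM. It is not the poster's isolated utility, and Python will not exercise the NEON code path; it only shows what each accumulator step computes and how a reproducer can check every result against an independent reference (Python's own integers).

```python
import random

DIGIT_BIT = 28                      # assumed libtommath digit size on 32-bit ARM
MASK = (1 << DIGIT_BIT) - 1
WORD_MAX = (1 << 64) - 1            # models a 64-bit mp_word accumulator

def to_digits(x, n):
    """Split x into n little-endian DIGIT_BIT-sized digits."""
    return [(x >> (DIGIT_BIT * i)) & MASK for i in range(n)]

def from_digits(digits):
    return sum(d << (DIGIT_BIT * i) for i, d in enumerate(digits))

def comba_square(a):
    """Column-wise squaring: output column k sums a[i]*a[j] over i + j == k."""
    n = len(a)
    out = [0] * (2 * n)
    carry = 0                         # high part carried into the next column
    for k in range(2 * n):
        i = max(0, k - (n - 1))
        j = k - i
        w = 0
        while i < j:                  # cross products with i < j, counted once...
            w += a[i] * a[j]
            i += 1
            j -= 1
        w += w                        # ...then doubled, since a[i]*a[j] == a[j]*a[i]
        if i == j:                    # even column: add the square term a[k/2]^2
            w += a[i] * a[i]
        w += carry
        if w > WORD_MAX:
            raise OverflowError("column sum does not fit a 64-bit mp_word")
        out[k] = w & MASK
        carry = w >> DIGIT_BIT
    return out

def find_mismatch(trials, n, seed=1):
    """Square random n-digit values; return (trial, x) on a mismatch, else None."""
    rng = random.Random(seed)
    for t in range(trials):
        x = rng.getrandbits(DIGIT_BIT * n)
        if from_digits(comba_square(to_digits(x, n))) != x * x:
            return t, x
    return None
```

On healthy hardware `find_mismatch(200, 16)` returns None; a C port of the same compare loop running on the affected router is what would actually stress the compiled code.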

SSH key exchange fails 30-70% of the time on Netgear X4S R7800

2020-03-24 Thread Horshack
Hi, I have a strange issue on my Netgear X4S R7800. Running either DD-WRT or OpenWrt, approximately 30-70% of my SSH login attempts fail. For OpenSSH clients, the reported error is "error in libcrypto". For the PuTTY client, the error is more descriptive: "Signature from server's host key is
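To put a number on a failure rate like the one quoted here, a small loop can repeat non-interactive logins and count failures. A minimal sketch, assuming an OpenSSH client on the PATH, key-based authentication already set up, and the router's host key already in known_hosts (BatchMode=yes makes ssh fail instead of prompting); the function names are hypothetical:

```python
import subprocess

def ssh_attempt(host, timeout=20):
    """Run one non-interactive login; True if the remote command succeeded."""
    try:
        proc = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", host, "true"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return proc.returncode == 0

def failure_rate(attempt, n=50):
    """Call attempt() n times and return the fraction of failures."""
    failures = sum(1 for _ in range(n) if not attempt())
    return failures / n
```

Usage would look like `failure_rate(lambda: ssh_attempt("root@router.lan"), n=100)`, where the host string is a placeholder. Every non-zero exit counts as a failure, so unrelated network or authentication errors are included too.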

Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

2020-03-24 Thread Horshack
I have one of the failure paths isolated down to a single corrupt 64-bit word in memory, which required a significant amount of code instrumentation to achieve. I implemented a code execution history buffer that gets filled at various checkpoints within s_mp_exptmod() and some of the modules
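The poster's history buffer lives in the C code around s_mp_exptmod(). The Python sketch below only illustrates the general idea: a fixed-size ring of checkpoint records that keeps the most recent events, plus a helper that locates the first point where a failing run diverges from a good one. All names here are hypothetical.

```python
from collections import deque

class HistoryBuffer:
    """Keeps the most recent `capacity` checkpoint records, oldest first."""

    def __init__(self, capacity=64):
        self._entries = deque(maxlen=capacity)  # old records drop off the front
        self._seq = 0

    def mark(self, checkpoint, value=None):
        """Record that `checkpoint` was reached, optionally with a value."""
        self._entries.append((self._seq, checkpoint, value))
        self._seq += 1

    def dump(self):
        return list(self._entries)

def first_divergence(good, bad):
    """Return the first pair of records that differ between two runs, or None."""
    for g, b in zip(good, bad):
        if g != b:
            return g, b
    return None
```

Comparing dumps only lines up cleanly when both runs executed the same number of checkpoints, so the sequence number in each record is worth checking first.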

Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

2020-03-24 Thread Horshack
Disassembly of fast_s_mp_sqr() and other libtommath functions reveals that GCC is using the ARM NEON SIMD instructions and registers for calculations involving libtommath's mp_word scalar type. Based on the 64-bit word corruption I see, I'm guessing the SIMD registers aren't being
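One way to check a build for this is to scan `objdump -d` output for instructions that touch the NEON/VFP register file. A rough Python sketch, assuming 32-bit ARM disassembly in GNU objdump's usual layout (a `<symbol>:` header per function, functions separated by blank lines). A hit shows the d/q registers are in use; on its own it does not distinguish VFP scalar code from NEON SIMD.

```python
import re

# 32-bit ARM: doubleword registers d0-d31 and quadword registers q0-q15.
NEON_REG = re.compile(r"\b(?:d(?:[12]?[0-9]|3[01])|q(?:1[0-5]|[0-9]))\b")

def neon_lines(disassembly, symbol=None):
    """Return disassembly lines that reference a d or q register.

    If `symbol` is given, only lines inside that function are scanned.
    """
    hits = []
    inside = symbol is None
    for line in disassembly.splitlines():
        if symbol is not None:
            if line.rstrip().endswith(f"<{symbol}>:"):
                inside = True
                continue
            if inside and not line.strip():
                inside = False        # objdump separates functions with a blank line
                continue
        if inside and NEON_REG.search(line):
            hits.append(line.strip())
    return hits
```

Feed it the output of the target toolchain's objdump on an unstripped build, passing `fast_s_mp_sqr` as the symbol, to list the instructions in that function that use these registers.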

Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

2020-03-24 Thread Matt Johnston
Good work narrowing down a test case there. That's an interesting finding; I guess it might be worth posting on the OpenWrt lists/forum to try to find other testers. Could it be power-related, if the tight multiplication loop is stressing it somehow? It doesn't seem to be using the Neon instruction for