Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

2020-03-28 Thread Sebastian Gottschall
i can exclude neon code for dd-wrt in dropbear if it helps. but would be 
greater to nail down the problem. otherwise other programms would be 
likelly affected too


Am 28.03.2020 um 21:06 schrieb Horshack ‪‬:
As a postscript, I was able to refine the logic to produce the 
corrupted result almost instantaneously. I'm also able to get it to 
fail with an all-zero input dataset and a bitwise OR operation instead 
of the original squaring multiplication operations, which allows me to 
see what actual corrupted loads are. The result is very interesting - 
sometimes the corrupted data is valid ARM instructions, other times 
valid kernel-space addresses, so it seems clear this is an addressing 
problem. Also interesting is how I'll see just one or a few corrupted 
words, which implies the corruption is in the interface between DCACHE 
and the processor rather than errant fetch of a line into DCACHE from 
memory (otherwise the entire DCACHE line would hold corrupt data). You 
can see a sample of the failure output here: 
https://github.com/horshack-dpreview/ipq8065-sqrbug/blob/master/SampleFailures.txt 
<https://github.com/horshack-dpreview/ipq8065-sqrbug/blob/master/SampleFailures.txt>


Finally, to exclude any possibility the issue is related to possible 
kernel code running and corrupting register sets/memory (such as an 
interrupt routine), I ported the test to a kernel module and ran the 
logic within a local_irq_disable() block, which disables both 
preemption and interrupts on the core. Still fails. I created a 
separate repository for the kernel module version here: 
https://github.com/horshack-dpreview/ipq8065-sqrbug-driver 
<https://github.com/horshack-dpreview/ipq8065-sqrbug-driver>



*From:* Horshack ‪‬ 
*Sent:* Tuesday, March 24, 2020 9:25 PM
*To:* Sebastian Gottschall ; 
dropbear@ucc.asn.au 
*Subject:* Re: SSH key exchange fails 30-70% of the time on Netgear 
X4S R7800
I excluded context switches as a possible culprit by looping until a 
corruption happened for which no context switches occurred while the 
test was running (ie, at the start of the test I would save the # of 
involuntary/voluntary context switches from /proc//status, then 
check those counts again after the failure - if they were different I 
restarted the test and kept looping until a failure happened in which 
the ctx switch counts were the same.



*From:* dropbear-bounces+horshack=live@ucc.asn.au 
 on behalf of Sebastian 
Gottschall 

*Sent:* Tuesday, March 24, 2020 9:13 PM
*To:* dropbear@ucc.asn.au 
*Subject:* Re: SSH key exchange fails 30-70% of the time on Netgear 
X4S R7800


if the corruption is caused by a context switch the problem can be 
caused by the kernel.

try the following and disable "CONFIG_KERNEL_MODE_NEON"
in the kernel config. this will disable some kernel crypto assembly code

Am 24.03.2020 um 16:11 schrieb Matt Johnston:

Good work narrowing down a test case there.
That's an interesting finding - I guess it might be worth posting on 
OpenWRT lists/forum to try find other testers.
Could it be power related if the tight multiplication loop is 
stressing it somehow? It doesn't seem to be using the Neon 
instruction for anything apart from loads/stores though - is there 
something that the compiler should be doing mixing Neon and non-Neon 
operations?


Cheers,
Matt

(Your emails got held up being over 100kB, I've trimmed the reply 
below and let them through. Apologies to everyone for the stale old 
one that got let through with them just now, I wasn't looking closely)


On Tue 24/3/2020, at 11:23 am, Horshack ‪‬ <mailto:horsh...@live.com>> wrote:


I was able to isolate the issue to just a handful of assembly 
instructions within fast_s_mp_sqr(), related to the squaring loop. I 
broke that code out into a separate utility that reproduces the 
issue within a few seconds. The failure is somewhat sensitive to the 
data pattern and very sensitive to timing, indicating a likely 
memory/data path issue within my particular router. I'm guessing 
it's the IPQ8065 and not the SDRAM because I can get it to fail with 
a tiny data set easily fits within DCACHE. I can alter the frequency 
of the failure with a single ARM memory barrier instruction, which 
at first implied a superscalar data ordering condition but the 
memory barrier also alters the timing through the DCACHE so that is 
likely the effect it's having. I was able to exclude the VFP/Neon 
register corruption as the cause with some test code. I also 
excluded any context switch-speciifc issue by measuring the # of 
context switches in /proc//status and catching a failure where 
no switches had occurred. I also modified the affinity so the 
utility runs on just one processor to rule out a specific core 
having the issue.


I put the source and binary of my utility on github - if a

Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

2020-03-28 Thread Horshack ‪‬
As a postscript, I was able to refine the logic to produce the corrupted result 
almost instantaneously. I'm also able to get it to fail with an all-zero input 
dataset and a bitwise OR operation instead of the original squaring 
multiplication operations, which allows me to see what actual corrupted loads 
are. The result is very interesting - sometimes the corrupted data is valid ARM 
instructions, other times valid kernel-space addresses, so it seems clear this 
is an addressing problem. Also interesting is how I'll see just one or a few 
corrupted words, which implies the corruption is in the interface between 
DCACHE and the processor rather than errant fetch of a line into DCACHE from 
memory (otherwise the entire DCACHE line would hold corrupt data). You can see 
a sample of the failure output here: 
https://github.com/horshack-dpreview/ipq8065-sqrbug/blob/master/SampleFailures.txt

Finally, to exclude any possibility the issue is related to possible kernel 
code running and corrupting register sets/memory (such as an interrupt 
routine), I ported the test to a kernel module and ran the logic within a 
local_irq_disable() block, which disables both preemption and interrupts on the 
core. Still fails. I created a separate repository for the kernel module 
version here: https://github.com/horshack-dpreview/ipq8065-sqrbug-driver


From: Horshack ‪‬ 
Sent: Tuesday, March 24, 2020 9:25 PM
To: Sebastian Gottschall ; dropbear@ucc.asn.au 

Subject: Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

I excluded context switches as a possible culprit by looping until a corruption 
happened for which no context switches occurred while the test was running (ie, 
at the start of the test I would save the # of involuntary/voluntary context 
switches from /proc//status, then check those counts again after the 
failure - if they were different I restarted the test and kept looping until a 
failure happened in which the ctx switch counts were the same.


From: dropbear-bounces+horshack=live@ucc.asn.au 
 on behalf of Sebastian 
Gottschall 
Sent: Tuesday, March 24, 2020 9:13 PM
To: dropbear@ucc.asn.au 
Subject: Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800


if the corruption is caused by a context switch the problem can be caused by 
the kernel.
try the following and disable "CONFIG_KERNEL_MODE_NEON"
in the kernel config. this will disable some kernel crypto assembly code

Am 24.03.2020 um 16:11 schrieb Matt Johnston:
Good work narrowing down a test case there.
That's an interesting finding - I guess it might be worth posting on OpenWRT 
lists/forum to try find other testers.
Could it be power related if the tight multiplication loop is stressing it 
somehow? It doesn't seem to be using the Neon instruction for anything apart 
from loads/stores though - is there something that the compiler should be doing 
mixing Neon and non-Neon operations?

Cheers,
Matt

(Your emails got held up being over 100kB, I've trimmed the reply below and let 
them through. Apologies to everyone for the stale old one that got let through 
with them just now, I wasn't looking closely)

On Tue 24/3/2020, at 11:23 am, Horshack ‪‬ 
mailto:horsh...@live.com>> wrote:

I was able to isolate the issue to just a handful of assembly instructions 
within fast_s_mp_sqr(), related to the squaring loop. I broke that code out 
into a separate utility that reproduces the issue within a few seconds. The 
failure is somewhat sensitive to the data pattern and very sensitive to timing, 
indicating a likely memory/data path issue within my particular router. I'm 
guessing it's the IPQ8065 and not the SDRAM because I can get it to fail with a 
tiny data set easily fits within DCACHE. I can alter the frequency of the 
failure with a single ARM memory barrier instruction, which at first implied a 
superscalar data ordering condition but the memory barrier also alters the 
timing through the DCACHE so that is likely the effect it's having. I was able 
to exclude the VFP/Neon register corruption as the cause with some test code. I 
also excluded any context switch-speciifc issue by measuring the # of context 
switches in /proc//status and catching a failure where no switches had 
occurred. I also modified the affinity so the utility runs on just one 
processor to rule out a specific core having the issue.

I put the source and binary of my utility on github - if anyone on this mailing 
list has this model router can you give it a try if possible? You only need the 
ipq8065-sqrbug (binary) and run-ipq8065-sqrbug.sh (script). Here's the link to 
the repository: https://github.com/horshack-dpreview/ipq8065-sqrbug



From: Horshack ‪‬ mailto:horsh...@live.com>>
Sent: Saturday, March 21, 2020 7:54 AM
To: dropbear@ucc.asn.au<mailto:dropbear@ucc.asn.au> 
mailto:dropbear@ucc.asn.au>>
Subject: SSH key

Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

2020-03-24 Thread Sebastian Gottschall
how can you make sure that no context switch is happening if the kernel 
uses neon instructions by itself? by stopping the kernel?


this is faily impossible. check if this option is on, and disable it to 
make sure that the kernel does not make use of neon instructions



Am 25.03.2020 um 05:25 schrieb Horshack ‪‬:
I excluded context switches as a possible culprit by looping until a 
corruption happened for which no context switches occurred while the 
test was running (ie, at the start of the test I would save the # of 
involuntary/voluntary context switches from /proc//status, then 
check those counts again after the failure - if they were different I 
restarted the test and kept looping until a failure happened in which 
the ctx switch counts were the same.



*From:* dropbear-bounces+horshack=live@ucc.asn.au 
 on behalf of Sebastian 
Gottschall 

*Sent:* Tuesday, March 24, 2020 9:13 PM
*To:* dropbear@ucc.asn.au 
*Subject:* Re: SSH key exchange fails 30-70% of the time on Netgear 
X4S R7800


if the corruption is caused by a context switch the problem can be 
caused by the kernel.

try the following and disable "CONFIG_KERNEL_MODE_NEON"
in the kernel config. this will disable some kernel crypto assembly code

Am 24.03.2020 um 16:11 schrieb Matt Johnston:

Good work narrowing down a test case there.
That's an interesting finding - I guess it might be worth posting on 
OpenWRT lists/forum to try find other testers.
Could it be power related if the tight multiplication loop is 
stressing it somehow? It doesn't seem to be using the Neon 
instruction for anything apart from loads/stores though - is there 
something that the compiler should be doing mixing Neon and non-Neon 
operations?


Cheers,
Matt

(Your emails got held up being over 100kB, I've trimmed the reply 
below and let them through. Apologies to everyone for the stale old 
one that got let through with them just now, I wasn't looking closely)


On Tue 24/3/2020, at 11:23 am, Horshack ‪‬ <mailto:horsh...@live.com>> wrote:


I was able to isolate the issue to just a handful of assembly 
instructions within fast_s_mp_sqr(), related to the squaring loop. I 
broke that code out into a separate utility that reproduces the 
issue within a few seconds. The failure is somewhat sensitive to the 
data pattern and very sensitive to timing, indicating a likely 
memory/data path issue within my particular router. I'm guessing 
it's the IPQ8065 and not the SDRAM because I can get it to fail with 
a tiny data set easily fits within DCACHE. I can alter the frequency 
of the failure with a single ARM memory barrier instruction, which 
at first implied a superscalar data ordering condition but the 
memory barrier also alters the timing through the DCACHE so that is 
likely the effect it's having. I was able to exclude the VFP/Neon 
register corruption as the cause with some test code. I also 
excluded any context switch-speciifc issue by measuring the # of 
context switches in /proc//status and catching a failure where 
no switches had occurred. I also modified the affinity so the 
utility runs on just one processor to rule out a specific core 
having the issue.


I put the source and binary of my utility on github - if anyone on 
this mailing list has this model router can you give it a try if 
possible? You only need the ipq8065-sqrbug (binary) and 
run-ipq8065-sqrbug.sh (script). Here's the link to the 
repository:https://github.com/horshack-dpreview/ipq8065-sqrbug 
<https://github.com/horshack-dpreview/ipq8065-sqrbug>




*From:*Horshack ‪‬ mailto:horsh...@live.com>>
*Sent:*Saturday, March 21, 2020 7:54 AM
*To:*dropbear@ucc.asn.au 
<mailto:dropbear@ucc.asn.au><mailto:dropbear@ucc.asn.au>>
*Subject:*SSH key exchange fails 30-70% of the time on Netgear X4S 
R7800

Including mailing list for my last two messages below...

Begin forwarded message:


*From:*Horshack ‪‬ mailto:horsh...@live.com>>
*Date:*March 21, 2020 at 7:35:18 AM PDT
*To:*Matt Johnston mailto:m...@ucc.asn.au>>
*Cc:*"dropbear@ucc.asn.au <mailto:dropbear@ucc.asn.au>" 
mailto:dropbear@ucc.asn.au>>
*Subject:**Re:  SSH key exchange fails 30-70% of the time on 
Netgear X4S R7800*



Disassembly of fast_s_mp_sqr() and other libtommath functions 
reveals gcc is utilizing the arm NEON SIMD instructions and 
registers for calculations involved with libtommath's mp_word 
scalar. Based on the 64-bit word corruption I see I'm guessing the 
SIMD registers aren't being preserved/restored properly somewhere, 
probably during a context switch, specifically s16–s31 (d8–d15, 
q4–q7), which AAPCS says must be preserved and which I see being 
used in the disassembly of fast_s_mp_sqr(). I'lll write some test 
code later today to see if this is the case, 

Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

2020-03-24 Thread Horshack ‪‬
I excluded context switches as a possible culprit by looping until a corruption 
happened for which no context switches occurred while the test was running (ie, 
at the start of the test I would save the # of involuntary/voluntary context 
switches from /proc//status, then check those counts again after the 
failure - if they were different I restarted the test and kept looping until a 
failure happened in which the ctx switch counts were the same.


From: dropbear-bounces+horshack=live@ucc.asn.au 
 on behalf of Sebastian 
Gottschall 
Sent: Tuesday, March 24, 2020 9:13 PM
To: dropbear@ucc.asn.au 
Subject: Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800


if the corruption is caused by a context switch the problem can be caused by 
the kernel.
try the following and disable "CONFIG_KERNEL_MODE_NEON"
in the kernel config. this will disable some kernel crypto assembly code

Am 24.03.2020 um 16:11 schrieb Matt Johnston:
Good work narrowing down a test case there.
That's an interesting finding - I guess it might be worth posting on OpenWRT 
lists/forum to try find other testers.
Could it be power related if the tight multiplication loop is stressing it 
somehow? It doesn't seem to be using the Neon instruction for anything apart 
from loads/stores though - is there something that the compiler should be doing 
mixing Neon and non-Neon operations?

Cheers,
Matt

(Your emails got held up being over 100kB, I've trimmed the reply below and let 
them through. Apologies to everyone for the stale old one that got let through 
with them just now, I wasn't looking closely)

On Tue 24/3/2020, at 11:23 am, Horshack ‪‬ 
mailto:horsh...@live.com>> wrote:

I was able to isolate the issue to just a handful of assembly instructions 
within fast_s_mp_sqr(), related to the squaring loop. I broke that code out 
into a separate utility that reproduces the issue within a few seconds. The 
failure is somewhat sensitive to the data pattern and very sensitive to timing, 
indicating a likely memory/data path issue within my particular router. I'm 
guessing it's the IPQ8065 and not the SDRAM because I can get it to fail with a 
tiny data set easily fits within DCACHE. I can alter the frequency of the 
failure with a single ARM memory barrier instruction, which at first implied a 
superscalar data ordering condition but the memory barrier also alters the 
timing through the DCACHE so that is likely the effect it's having. I was able 
to exclude the VFP/Neon register corruption as the cause with some test code. I 
also excluded any context switch-speciifc issue by measuring the # of context 
switches in /proc//status and catching a failure where no switches had 
occurred. I also modified the affinity so the utility runs on just one 
processor to rule out a specific core having the issue.

I put the source and binary of my utility on github - if anyone on this mailing 
list has this model router can you give it a try if possible? You only need the 
ipq8065-sqrbug (binary) and run-ipq8065-sqrbug.sh (script). Here's the link to 
the repository: https://github.com/horshack-dpreview/ipq8065-sqrbug



From: Horshack ‪‬ mailto:horsh...@live.com>>
Sent: Saturday, March 21, 2020 7:54 AM
To: dropbear@ucc.asn.au<mailto:dropbear@ucc.asn.au> 
mailto:dropbear@ucc.asn.au>>
Subject: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

Including mailing list for my last two messages below...

Begin forwarded message:

From: Horshack ‪‬ mailto:horsh...@live.com>>
Date: March 21, 2020 at 7:35:18 AM PDT
To: Matt Johnston mailto:m...@ucc.asn.au>>
Cc: "dropbear@ucc.asn.au<mailto:dropbear@ucc.asn.au>" 
mailto:dropbear@ucc.asn.au>>
Subject: Re:  SSH key exchange fails 30-70% of the time on Netgear X4S R7800


Disassembly of fast_s_mp_sqr() and other libtommath functions reveals gcc is 
utilizing the arm NEON SIMD instructions and registers for calculations 
involved with libtommath's mp_word scalar. Based on the 64-bit word corruption 
I see I'm guessing the SIMD registers aren't being preserved/restored properly 
somewhere, probably during a context switch, specifically s16–s31 (d8–d15, 
q4–q7), which AAPCS says must be preserved and which I see being used in the 
disassembly of fast_s_mp_sqr(). I'lll write some test code later today to see 
if this is the case, and if so, try to track down where and why the registers 
aren't being preserved.


From: Horshack ‪‬ mailto:horsh...@live.com>>
Sent: Saturday, March 21, 2020 1:11 AM
To: Matt Johnston mailto:m...@ucc.asn.au>>
Cc: dropbear@ucc.asn.au<mailto:dropbear@ucc.asn.au> 
mailto:dropbear@ucc.asn.au>>
Subject: Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

I have one of the failure paths isolated down to a single corrupt 64-bit word 
in memory, which required a significa

Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

2020-03-24 Thread Sebastian Gottschall
if the corruption is caused by a context switch the problem can be 
caused by the kernel.

try the following and disable "CONFIG_KERNEL_MODE_NEON"
in the kernel config. this will disable some kernel crypto assembly code

Am 24.03.2020 um 16:11 schrieb Matt Johnston:

Good work narrowing down a test case there.
That's an interesting finding - I guess it might be worth posting on 
OpenWRT lists/forum to try find other testers.
Could it be power related if the tight multiplication loop is 
stressing it somehow? It doesn't seem to be using the Neon instruction 
for anything apart from loads/stores though - is there something that 
the compiler should be doing mixing Neon and non-Neon operations?


Cheers,
Matt

(Your emails got held up being over 100kB, I've trimmed the reply 
below and let them through. Apologies to everyone for the stale old 
one that got let through with them just now, I wasn't looking closely)


On Tue 24/3/2020, at 11:23 am, Horshack ‪‬ <mailto:horsh...@live.com>> wrote:


I was able to isolate the issue to just a handful of assembly 
instructions within fast_s_mp_sqr(), related to the squaring loop. I 
broke that code out into a separate utility that reproduces the issue 
within a few seconds. The failure is somewhat sensitive to the data 
pattern and very sensitive to timing, indicating a likely memory/data 
path issue within my particular router. I'm guessing it's the IPQ8065 
and not the SDRAM because I can get it to fail with a tiny data set 
easily fits within DCACHE. I can alter the frequency of the failure 
with a single ARM memory barrier instruction, which at first implied 
a superscalar data ordering condition but the memory barrier also 
alters the timing through the DCACHE so that is likely the effect 
it's having. I was able to exclude the VFP/Neon register corruption 
as the cause with some test code. I also excluded any context 
switch-speciifc issue by measuring the # of context switches in 
/proc//status and catching a failure where no switches had 
occurred. I also modified the affinity so the utility runs on just 
one processor to rule out a specific core having the issue.


I put the source and binary of my utility on github - if anyone on 
this mailing list has this model router can you give it a try if 
possible? You only need the ipq8065-sqrbug (binary) and 
run-ipq8065-sqrbug.sh (script). Here's the link to the 
repository:https://github.com/horshack-dpreview/ipq8065-sqrbug 
<https://github.com/horshack-dpreview/ipq8065-sqrbug>




*From:*Horshack ‪‬ mailto:horsh...@live.com>>
*Sent:*Saturday, March 21, 2020 7:54 AM
*To:*dropbear@ucc.asn.au 
<mailto:dropbear@ucc.asn.au><mailto:dropbear@ucc.asn.au>>

*Subject:*SSH key exchange fails 30-70% of the time on Netgear X4S R7800
Including mailing list for my last two messages below...

Begin forwarded message:


*From:*Horshack ‪‬ mailto:horsh...@live.com>>
*Date:*March 21, 2020 at 7:35:18 AM PDT
*To:*Matt Johnston mailto:m...@ucc.asn.au>>
*Cc:*"dropbear@ucc.asn.au <mailto:dropbear@ucc.asn.au>" 
mailto:dropbear@ucc.asn.au>>
*Subject:**Re:  SSH key exchange fails 30-70% of the time on Netgear 
X4S R7800*



Disassembly of fast_s_mp_sqr() and other libtommath functions 
reveals gcc is utilizing the arm NEON SIMD instructions and 
registers for calculations involved with libtommath's mp_word 
scalar. Based on the 64-bit word corruption I see I'm guessing the 
SIMD registers aren't being preserved/restored properly somewhere, 
probably during a context switch, specifically s16–s31 (d8–d15, 
q4–q7), which AAPCS says must be preserved and which I see being 
used in the disassembly of fast_s_mp_sqr(). I'lll write some test 
code later today to see if this is the case, and if so, try to track 
down where and why the registers aren't being preserved.



*From:*Horshack ‪‬ mailto:horsh...@live.com>>
*Sent:*Saturday, March 21, 2020 1:11 AM
*To:*Matt Johnston mailto:m...@ucc.asn.au>>
*Cc:*dropbear@ucc.asn.au <mailto:dropbear@ucc.asn.au> 
mailto:dropbear@ucc.asn.au>>
*Subject:*Re: SSH key exchange fails 30-70% of the time on Netgear 
X4S R7800
I have one of the failure paths isolated down to a single corrupt 
64-bit word in memory, which required a significant amount of code 
instrumentation to achieve. I implemented a code execution history 
buffer that gets filled at various checkpoints within s_mp_exptmod() 
and some of the modules called by it. To facilitate this history 
mechanism I packaged all of s_mp_exptmod()'s local variables inside 
a structure , which consists of saving the local scalar vars in 
addition to crc32's of all the mp_int data structures with a 
separate crc32 of the mp_int.dp payload (data). When a failure 
occurs, ie one or more of the three back-to-back debu

Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

2020-03-24 Thread Matt Johnston
Good work narrowing down a test case there.
That's an interesting finding - I guess it might be worth posting on OpenWRT 
lists/forum to try find other testers.
Could it be power related if the tight multiplication loop is stressing it 
somehow? It doesn't seem to be using the Neon instruction for anything apart 
from loads/stores though - is there something that the compiler should be doing 
mixing Neon and non-Neon operations?

Cheers,
Matt

(Your emails got held up being over 100kB, I've trimmed the reply below and let 
them through. Apologies to everyone for the stale old one that got let through 
with them just now, I wasn't looking closely)

> On Tue 24/3/2020, at 11:23 am, Horshack ‪‬  wrote:
> 
> I was able to isolate the issue to just a handful of assembly instructions 
> within fast_s_mp_sqr(), related to the squaring loop. I broke that code out 
> into a separate utility that reproduces the issue within a few seconds. The 
> failure is somewhat sensitive to the data pattern and very sensitive to 
> timing, indicating a likely memory/data path issue within my particular 
> router. I'm guessing it's the IPQ8065 and not the SDRAM because I can get it 
> to fail with a tiny data set easily fits within DCACHE. I can alter the 
> frequency of the failure with a single ARM memory barrier instruction, which 
> at first implied a superscalar data ordering condition but the memory barrier 
> also alters the timing through the DCACHE so that is likely the effect it's 
> having. I was able to exclude the VFP/Neon register corruption as the cause 
> with some test code. I also excluded any context switch-speciifc issue by 
> measuring the # of context switches in /proc//status and catching a 
> failure where no switches had occurred. I also modified the affinity so the 
> utility runs on just one processor to rule out a specific core having the 
> issue.
> 
> I put the source and binary of my utility on github - if anyone on this 
> mailing list has this model router can you give it a try if possible? You 
> only need the ipq8065-sqrbug (binary) and run-ipq8065-sqrbug.sh (script). 
> Here's the link to the repository: 
> https://github.com/horshack-dpreview/ipq8065-sqrbug 
> <https://github.com/horshack-dpreview/ipq8065-sqrbug>
> 
> 
> From: Horshack ‪‬ mailto:horsh...@live.com>>
> Sent: Saturday, March 21, 2020 7:54 AM
> To: dropbear@ucc.asn.au <mailto:dropbear@ucc.asn.au>  <mailto:dropbear@ucc.asn.au>>
> Subject: SSH key exchange fails 30-70% of the time on Netgear X4S R7800
>  
> Including mailing list for my last two messages below...
> 
> Begin forwarded message:
> 
>> From: Horshack ‪‬ 
>> Date: March 21, 2020 at 7:35:18 AM PDT
>> To: Matt Johnston 
>> Cc: "dropbear@ucc.asn.au" 
>> Subject: Re:  SSH key exchange fails 30-70% of the time on Netgear X4S R7800
>> 
>> 
>> Disassembly of fast_s_mp_sqr() and other libtommath functions reveals gcc is 
>> utilizing the arm NEON SIMD instructions and registers for calculations 
>> involved with libtommath's mp_word scalar. Based on the 64-bit word 
>> corruption I see I'm guessing the SIMD registers aren't being 
>> preserved/restored properly somewhere, probably during a context switch, 
>> specifically s16–s31 (d8–d15, q4–q7), which AAPCS says must be preserved and 
>> which I see being used in the disassembly of fast_s_mp_sqr(). I'lll write 
>> some test code later today to see if this is the case, and if so, try to 
>> track down where and why the registers aren't being preserved.
>> 
>> From: Horshack ‪‬ 
>> Sent: Saturday, March 21, 2020 1:11 AM
>> To: Matt Johnston 
>> Cc: dropbear@ucc.asn.au 
>> Subject: Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800
>>  
>> I have one of the failure paths isolated down to a single corrupt 64-bit 
>> word in memory, which required a significant amount of code instrumentation 
>> to achieve. I implemented a code execution history buffer that gets filled 
>> at various checkpoints within s_mp_exptmod() and some of the modules called 
>> by it. To facilitate this history mechanism I packaged all of 
>> s_mp_exptmod()'s local variables inside a structure , which consists of 
>> saving the local scalar vars in addition to crc32's of all the mp_int data 
>> structures with a separate crc32 of the mp_int.dp payload (data). When a 
>> failure occurs, ie one or more of the three back-to-back debug invocations 
>> of s_mp_exptmod yields a mismatching signed key result, I  dump out the 
>> history elements for each of the invocations to determine the first code 
>> checkpoint where failing invocation departed from the known correct 
>> invocation.

*snipped*




Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

2020-03-24 Thread Horshack ‪‬
Disassembly of fast_s_mp_sqr() and other libtommath functions reveals gcc is 
utilizing the arm NEON SIMD instructions and registers for calculations 
involved with libtommath's mp_word scalar. Based on the 64-bit word corruption 
I see I'm guessing the SIMD registers aren't being preserved/restored properly 
somewhere, probably during a context switch, specifically s16–s31 (d8–d15, 
q4–q7), which AAPCS says must be preserved and which I see being used in the 
disassembly of fast_s_mp_sqr(). I'lll write some test code later today to see 
if this is the case, and if so, try to track down where and why the registers 
aren't being preserved.


From: Horshack ‪‬ 
Sent: Saturday, March 21, 2020 1:11 AM
To: Matt Johnston 
Cc: dropbear@ucc.asn.au 
Subject: Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

I have one of the failure paths isolated down to a single corrupt 64-bit word 
in memory, which required a significant amount of code instrumentation to 
achieve. I implemented a code execution history buffer that gets filled at 
various checkpoints within s_mp_exptmod() and some of the modules called by it. 
To facilitate this history mechanism I packaged all of s_mp_exptmod()'s local 
variables inside a structure , which consists of saving the local scalar vars 
in addition to crc32's of all the mp_int data structures with a separate crc32 
of the mp_int.dp payload (data). When a failure occurs, ie one or more of the 
three back-to-back debug invocations of s_mp_exptmod yields a mismatching 
signed key result, I  dump out the history elements for each of the invocations 
to determine the first code checkpoint where failing invocation departed from 
the known correct invocation.

Here's a sample capture demonstrating.

Format is event #:, source code line #, crc32 of local scalars, crc32 of mp_int 
structures (minus dp field), and crc32 of all the mp_int dp data payloads. In 
this sample, the crc32 of the dp data payload is different, which causes all 
subsequent crc32's for the remainder of the invocation to be difference since 
the data propagates through all the subsequent calculations performed in the 
routine.

1554: line=0492, crcLocalVars=6a08573e, crcMpIntNoDp=ab967993, 
crcMpIntDp=ded4078e crcRes=2554be5b, 0021 0005 0016 0002 0061 0003 0001
1555: line=0488, crcLocalVars=7dc8fe2c, crcMpIntNoDp=ad3e197a, 
crcMpIntDp=e71d5c11 crcRes=5ef59250, 0021 0005 0016 0002 0061 0004 0001
1556: line=2049, crcLocalVars=7dc8fe2c, crcMpIntNoDp=ad3e197a, 
crcMpIntDp=e71d5c11 crcRes=5ef59250, 0021 0005 0016 0002 0061 0004 0001
1557: line=2062, crcLocalVars=7dc8fe2c, crcMpIntNoDp=ab967993, 
crcMpIntDp=21b13223 crcRes=a43fde70, 0021 0005 0016 0002 0061 0004 0001
1558: line=0492, crcLocalVars=7dc8fe2c, crcMpIntNoDp=ab967993, 
crcMpIntDp=21b13223 crcRes=a43fde70, 0021 0005 0016 0002 0061 0004 0001
1559: line=0501, crcLocalVars=7a3e1d2a, crcMpIntNoDp=ad3e197a, 
crcMpIntDp=7691624d crcRes=6d1388bc, 0021 0005 0016 0002 0061 0005 0001

1554: line=0492, crcLocalVars=6a08573e, crcMpIntNoDp=ab967993, 
crcMpIntDp=ded4078e crcRes=2554be5b, 0021 0005 0016 0002 0061 0003 0001
1555: line=0488, crcLocalVars=7dc8fe2c, crcMpIntNoDp=ad3e197a, 
crcMpIntDp=e71d5c11 crcRes=5ef59250, 0021 0005 0016 0002 0061 0004 0001
1556: line=2049, crcLocalVars=7dc8fe2c, crcMpIntNoDp=ad3e197a, 
crcMpIntDp=e71d5c11 crcRes=5ef59250, 0021 0005 0016 0002 0061 0004 0001
1557: line=2062, crcLocalVars=7dc8fe2c, crcMpIntNoDp=ab967993, 
crcMpIntDp=a2639ce4 crcRes=74f7dec6, 0021 0005 0016 0002 0061 0004 0001
1558: line=0492, crcLocalVars=7dc8fe2c, crcMpIntNoDp=ab967993, 
crcMpIntDp=a2639ce4 crcRes=74f7dec6, 0021 0005 0016 0002 0061 0004 0001
1559: line=0501, crcLocalVars=7a3e1d2a, crcMpIntNoDp=ad3e197a, 
crcMpIntDp=5e3343d2 crcRes=517ed1b0, 0021 0005 0016 0002 0061 0005 0001

I initially found the failure occurs at seemingly random places, affected 
mostly by the variances of code/data placement between builds, which also 
affects the frequency of failure. Through a lot of trial and error I was able 
to tease the failure down to one of the simplest code paths (fast_s_mp_sqr), 
which required balancing debug code placement to keep the movement of the 
failure in control. fast_s_mp_sqr() does only basic arithmetic and is easy to 
follow. I haven't yet determined if the corrupt data is pre-calculation or 
post-calculation due to the limits of how much data I can snapshot in the 
history buffer. Nevertheless I expanded the history mechanism to snapshot the 
specific mp_int that usually is corrupted via this path (s_mp_exptmod's local 
res structure).

Here is correct vs corrupt mp_init at the specific execution point where it 
departs from the previous correction invocation. The data fields prefixed by : 
are the actual content of the mp_int - I've highlighted the mismatching crc32's 
and the mismatching 64-bit word:

Correct invocation:
8057: line=2062, crcLocalVars=1d8f10b6, crcMpIntNoDp=80a0f0a7, 
crcMpIntDp=e92a3e1f

Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

2020-03-24 Thread Horshack ‪‬
 nonaliasing data cache, PIPT instruction cache
[0.00] OF: fdt: Machine model: Netgear Nighthawk X4S R7800
[0.00] Memory policy: Data cache writealloc
[0.00] On node 0 totalpages: 122880
[0.00] free_area_init_node: node 0, pgdat c0a27880, node_mem_map 
dda39000
[0.00]   Normal zone: 960 pages used for memmap
[0.00]   Normal zone: 0 pages reserved
[0.00]   Normal zone: 122880 pages, LIFO batch:31
[0.00] random: get_random_bytes called from 0xc09008dc with crng_init=0
[0.00] percpu: Embedded 15 pages/cpu s29388 r8192 d23860 u61440


From: Matt Johnston 
Sent: Friday, March 20, 2020 3:50 AM
To: Horshack ‪‬ 
Subject: Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

Hi,

That's an interesting failure. You should be able to disable SMP if you set 
maxcpus=1 as a kernel boot argument - not sure where you would set that for 
your device though.
I guess the other option is that a kernel syscall somewhere is clobbering 
registers, disabling SMP wouldn't avoid that...
Which kernel is it running, and what's the CPU (/proc/cpuinfo)?

Cheers,
Matt


On Fri 20/3/2020, at 3:28 pm, Horshack ‪‬ 
mailto:horsh...@live.com>> wrote:

Update - I have isolated the intermittent issue down to the interchangeable 
functions s_mp_exptmod_fast() and s_mp_exptmod() - by default 
s_mp_exptmod_fast() is compiled instead of s_mp_exptmod() 
[BN_MP_EXPTMOD_FAST_C] but both functions intermittently fail and I decided to 
use s_mp_exptmod() as my focus because it's slightly simpler.

s_mp_exptmod() is called indirectly by rsa.c::buf_put_rsa_sign()'s call to 
mp_exptmod(). For the intermittent failing case if I call mp_exptmod() / 
s_mp_exptmod() immediately again with the same source mp_int structures it 
yields the correct data. Example - debug code bolded:

DEF_MP_INT(rsa_s_backup);
DEF_MP_INT(rsa_s_backup_2);

mp_copy (_s, _s_backup);
mp_copy (_s, _s_backup_2);

if (mp_exptmod(_tmp1, key->d, key->n, _s) != MP_OKAY) {
dropbear_exit("RSA error");
}
if (mp_exptmod(_tmp1, key->d, key->n, _s_backup) != MP_OKAY) {
dropbear_exit("RSA error");
}
if (mp_exptmod(_tmp1, key->d, key->n, _s_backup_2) != MP_OKAY) {
dropbear_exit("RSA error");
}
printf("after mp_exptmod\n");
dump_mp_int("rsa_s", _s);
dump_mp_int("rsa_s_backup", _s_backup);
dump_mp_int("rsa_s_backup_2", _s_backup_2);
comp_mp_int("rsa_s", "rsa_s_backup", _s, _s_backup);
comp_mp_int("rsa_s_backup", "rsa_s_backup_2", _s_backup, 
_s_backup_2);
mp_clear(_s_backup);
mp_clear(_s_backup_2);

Sample output from a failure, which contains the first portion of each 
mp_int->dp. Bolded text has wrong data:

after mp_exptmod
rsa_s [0xbef6c358]:
    4a 00 00 00 c0 00 00 00 00 00 00 00 30 e1 8f 00  J...0...
rsa_s->dp [0x008fe130]:
    05 fb c0 0f 68 91 ff 0a 9f 05 57 0b 35 a2 bd 05  h.W.5...
  0010  57 ec a0 0b 34 3c b1 0f fa 8b b5 08 ed aa 9c 04  W...4<..
  0020  7e 88 bb 04 12 42 51 05 9a 6d 7d 0a 98 ef 12 0c  ~BQ..m}.
  0030  76 e0 f4 0f ea 89 d7 0c 87 b0 76 03 12 a1 2d 0e  v.v...-.
  0040  d7 3c df 06 0f 54 92 04 23 90.<...T..#.
rsa_s_backup [0xbef6c398]:
    4a 00 00 00 c0 00 00 00 00 00 00 00 00 d8 8f 00  J...
rsa_s_backup->dp [0x008fd800]:
    ec 9f a0 01 d4 8e e8 07 c3 ae df 0b 45 61 e6 06  Ea..
  0010  a1 99 59 03 d7 49 24 02 50 a6 ac 0a de a2 5c 0d  ..Y..I$.P.\.
  0020  cb b7 3c 05 33 cb da 08 28 10 f2 04 14 69 d6 07  ..<.3...(i..
  0030  8c 8e a5 04 f5 fc 92 0c ba 88 d9 04 71 b4 b2 08  q...
  0040  bc 4f c7 0d de 73 f9 06 0d bf.O...s
rsa_s_backup_2 [0xbef6c3a8]:
    4a 00 00 00 c0 00 00 00 00 00 00 00 e0 d1 8f 00  J...
rsa_s_backup_2->dp [0x008fd1e0]:
    ec 9f a0 01 d4 8e e8 07 c3 ae df 0b 45 61 e6 06  Ea..
  0010  a1 99 59 03 d7 49 24 02 50 a6 ac 0a de a2 5c 0d  ..Y..I$.P.\.
  0020  cb b7 3c 05 33 cb da 08 28 10 f2 04 14 69 d6 07  ..<.3...(i..
  0030  8c 8e a5 04 f5 fc 92 0c ba 88 d9 04 71 b4 b2 08  q...
  0040  bc 4f c7 0d de 73 f9 06 0d bf.O...s
rsa_s and rsa_s_backup differ

Sometimes it's the second or third call that yields the incorrect data. In this 
instance it was the second call:
after mp_exptmod
rsa_s [0xbe9a6358]:
    4a 00 00 00 c0 00 00 00 00 00 00 00 30 c1 40 02  J...0.@.
rsa_s->dp [0x0240c130]:
    25 b9 db 00 ec 62 00 0d 80 2d b0 0d 00 13 d3 06  %b...-..
  0010  3f ec 8b 0a af 5d e9 03 2d f4 4b 0c 6c 3c 72 08  ?]..-.K.ldp [0x0240b800]:
    df 86 0c 0a 6c 2f 68 09 f9 a1 37 01 26 02 e7 0b  l/h...7.&...
  0010  69 5c b8 0e 0b 95 3a 0d 26 24 00 0e 97 

Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

2020-03-24 Thread Horshack ‪‬
I was able to isolate the issue to just a handful of assembly instructions 
within fast_s_mp_sqr(), related to the squaring loop. I broke that code out 
into a separate utility that reproduces the issue within a few seconds. The 
failure is somewhat sensitive to the data pattern and very sensitive to timing, 
indicating a likely memory/data path issue within my particular router. I'm 
guessing it's the IPQ8065 and not the SDRAM because I can get it to fail with a 
tiny data set easily fits within DCACHE. I can alter the frequency of the 
failure with a single ARM memory barrier instruction, which at first implied a 
superscalar data ordering condition but the memory barrier also alters the 
timing through the DCACHE so that is likely the effect it's having. I was able 
to exclude the VFP/Neon register corruption as the cause with some test code. I 
also excluded any context switch-speciifc issue by measuring the # of context 
switches in /proc//status and catching a failure where no switches had 
occurred. I also modified the affinity so the utility runs on just one 
processor to rule out a specific core having the issue.

I put the source and binary of my utility on github - if anyone on this mailing 
list has this model router can you give it a try if possible? You only need the 
ipq8065-sqrbug (binary) and run-ipq8065-sqrbug.sh (script). Here's the link to 
the repository: https://github.com/horshack-dpreview/ipq8065-sqrbug



From: Horshack ‪‬ 
Sent: Saturday, March 21, 2020 7:54 AM
To: dropbear@ucc.asn.au 
Subject: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

Including mailing list for my last two messages below...

Begin forwarded message:

From: Horshack ‪‬ 
Date: March 21, 2020 at 7:35:18 AM PDT
To: Matt Johnston 
Cc: "dropbear@ucc.asn.au" 
Subject: Re:  SSH key exchange fails 30-70% of the time on Netgear X4S R7800


Disassembly of fast_s_mp_sqr() and other libtommath functions reveals gcc is 
utilizing the arm NEON SIMD instructions and registers for calculations 
involved with libtommath's mp_word scalar. Based on the 64-bit word corruption 
I see I'm guessing the SIMD registers aren't being preserved/restored properly 
somewhere, probably during a context switch, specifically s16–s31 (d8–d15, 
q4–q7), which AAPCS says must be preserved and which I see being used in the 
disassembly of fast_s_mp_sqr(). I'lll write some test code later today to see 
if this is the case, and if so, try to track down where and why the registers 
aren't being preserved.


From: Horshack ‪‬ 
Sent: Saturday, March 21, 2020 1:11 AM
To: Matt Johnston 
Cc: dropbear@ucc.asn.au 
Subject: Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

I have one of the failure paths isolated down to a single corrupt 64-bit word 
in memory, which required a significant amount of code instrumentation to 
achieve. I implemented a code execution history buffer that gets filled at 
various checkpoints within s_mp_exptmod() and some of the modules called by it. 
To facilitate this history mechanism I packaged all of s_mp_exptmod()'s local 
variables inside a structure , which consists of saving the local scalar vars 
in addition to crc32's of all the mp_int data structures with a separate crc32 
of the mp_int.dp payload (data). When a failure occurs, ie one or more of the 
three back-to-back debug invocations of s_mp_exptmod yields a mismatching 
signed key result, I  dump out the history elements for each of the invocations 
to determine the first code checkpoint where failing invocation departed from 
the known correct invocation.

Here's a sample capture demonstrating.

Format is event #:, source code line #, crc32 of local scalars, crc32 of mp_int 
structures (minus dp field), and crc32 of all the mp_int dp data payloads. In 
this sample, the crc32 of the dp data payload is different, which causes all 
subsequent crc32's for the remainder of the invocation to be difference since 
the data propagates through all the subsequent calculations performed in the 
routine.

1554: line=0492, crcLocalVars=6a08573e, crcMpIntNoDp=ab967993, 
crcMpIntDp=ded4078e crcRes=2554be5b, 0021 0005 0016 0002 0061 0003 0001
1555: line=0488, crcLocalVars=7dc8fe2c, crcMpIntNoDp=ad3e197a, 
crcMpIntDp=e71d5c11 crcRes=5ef59250, 0021 0005 0016 0002 0061 0004 0001
1556: line=2049, crcLocalVars=7dc8fe2c, crcMpIntNoDp=ad3e197a, 
crcMpIntDp=e71d5c11 crcRes=5ef59250, 0021 0005 0016 0002 0061 0004 0001
1557: line=2062, crcLocalVars=7dc8fe2c, crcMpIntNoDp=ab967993, 
crcMpIntDp=21b13223 crcRes=a43fde70, 0021 0005 0016 0002 0061 0004 0001
1558: line=0492, crcLocalVars=7dc8fe2c, crcMpIntNoDp=ab967993, 
crcMpIntDp=21b13223 crcRes=a43fde70, 0021 0005 0016 0002 0061 0004 0001
1559: line=0501, crcLocalVars=7a3e1d2a, crcMpIntNoDp=ad3e197a, 
crcMpIntDp=7691624d crcRes=6d1388bc, 0021 0005 0016 0002 0061 0005 0001

1554: line=0492, crcLocalVars=6a08573e, crcMpIntNoD

Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

2020-03-20 Thread Horshack ‪‬
Update - I have isolated the intermittent issue down to the interchangeable 
functions s_mp_exptmod_fast() and s_mp_exptmod() - by default 
s_mp_exptmod_fast() is compiled instead of s_mp_exptmod() 
[BN_MP_EXPTMOD_FAST_C] but both functions intermittently fail and I decided to 
use s_mp_exptmod() as my focus because it's slightly simpler.

s_mp_exptmod() is called indirectly by rsa.c::buf_put_rsa_sign()'s call to 
mp_exptmod(). For the intermittent failing case if I call mp_exptmod() / 
s_mp_exptmod() immediately again with the same source mp_int structures it 
yields the correct data. Example - debug code bolded:

DEF_MP_INT(rsa_s_backup);
DEF_MP_INT(rsa_s_backup_2);

mp_copy (_s, _s_backup);
mp_copy (_s, _s_backup_2);

if (mp_exptmod(_tmp1, key->d, key->n, _s) != MP_OKAY) {
dropbear_exit("RSA error");
}
if (mp_exptmod(_tmp1, key->d, key->n, _s_backup) != MP_OKAY) {
dropbear_exit("RSA error");
}
if (mp_exptmod(_tmp1, key->d, key->n, _s_backup_2) != MP_OKAY) {
dropbear_exit("RSA error");
}
printf("after mp_exptmod\n");
dump_mp_int("rsa_s", _s);
dump_mp_int("rsa_s_backup", _s_backup);
dump_mp_int("rsa_s_backup_2", _s_backup_2);
comp_mp_int("rsa_s", "rsa_s_backup", _s, _s_backup);
comp_mp_int("rsa_s_backup", "rsa_s_backup_2", _s_backup, 
_s_backup_2);
mp_clear(_s_backup);
mp_clear(_s_backup_2);

Sample output from a failure, which contains the first portion of each 
mp_int->dp. Bolded text has wrong data:

after mp_exptmod
rsa_s [0xbef6c358]:
    4a 00 00 00 c0 00 00 00 00 00 00 00 30 e1 8f 00  J...0...
rsa_s->dp [0x008fe130]:
    05 fb c0 0f 68 91 ff 0a 9f 05 57 0b 35 a2 bd 05  h.W.5...
  0010  57 ec a0 0b 34 3c b1 0f fa 8b b5 08 ed aa 9c 04  W...4<..
  0020  7e 88 bb 04 12 42 51 05 9a 6d 7d 0a 98 ef 12 0c  ~BQ..m}.
  0030  76 e0 f4 0f ea 89 d7 0c 87 b0 76 03 12 a1 2d 0e  v.v...-.
  0040  d7 3c df 06 0f 54 92 04 23 90.<...T..#.
rsa_s_backup [0xbef6c398]:
    4a 00 00 00 c0 00 00 00 00 00 00 00 00 d8 8f 00  J...
rsa_s_backup->dp [0x008fd800]:
    ec 9f a0 01 d4 8e e8 07 c3 ae df 0b 45 61 e6 06  Ea..
  0010  a1 99 59 03 d7 49 24 02 50 a6 ac 0a de a2 5c 0d  ..Y..I$.P.\.
  0020  cb b7 3c 05 33 cb da 08 28 10 f2 04 14 69 d6 07  ..<.3...(i..
  0030  8c 8e a5 04 f5 fc 92 0c ba 88 d9 04 71 b4 b2 08  q...
  0040  bc 4f c7 0d de 73 f9 06 0d bf.O...s
rsa_s_backup_2 [0xbef6c3a8]:
    4a 00 00 00 c0 00 00 00 00 00 00 00 e0 d1 8f 00  J...
rsa_s_backup_2->dp [0x008fd1e0]:
    ec 9f a0 01 d4 8e e8 07 c3 ae df 0b 45 61 e6 06  Ea..
  0010  a1 99 59 03 d7 49 24 02 50 a6 ac 0a de a2 5c 0d  ..Y..I$.P.\.
  0020  cb b7 3c 05 33 cb da 08 28 10 f2 04 14 69 d6 07  ..<.3...(i..
  0030  8c 8e a5 04 f5 fc 92 0c ba 88 d9 04 71 b4 b2 08  q...
  0040  bc 4f c7 0d de 73 f9 06 0d bf.O...s
rsa_s and rsa_s_backup differ

Sometimes it's the second or third call that yields the incorrect data. In this 
instance it was the second call:
after mp_exptmod
rsa_s [0xbe9a6358]:
    4a 00 00 00 c0 00 00 00 00 00 00 00 30 c1 40 02  J...0.@.
rsa_s->dp [0x0240c130]:
    25 b9 db 00 ec 62 00 0d 80 2d b0 0d 00 13 d3 06  %b...-..
  0010  3f ec 8b 0a af 5d e9 03 2d f4 4b 0c 6c 3c 72 08  ?]..-.K.ldp [0x0240b800]:
    df 86 0c 0a 6c 2f 68 09 f9 a1 37 01 26 02 e7 0b  l/h...7.&...
  0010  69 5c b8 0e 0b 95 3a 0d 26 24 00 0e 97 6f dc 0b  i\:.&$...o..
  0020  64 95 ed 0a c0 75 53 03 66 3d ff 0b 26 4b ce 09  duS.f=..
  0030  89 12 d2 03 9b 9b 0b 09 19 2c 5a 00 2c 99 fc 0b  .,Z.,...
  0040  ea ad 61 09 38 e1 6a 0a 49 a5..a.8.j.I.
rsa_s_backup_2 [0xbe9a63a8]:
    4a 00 00 00 c0 00 00 00 00 00 00 00 e0 b1 40 02  J.@.
rsa_s_backup_2->dp [0x0240b1e0]:
    25 b9 db 00 ec 62 00 0d 80 2d b0 0d 00 13 d3 06  %b...-..
  0010  3f ec 8b 0a af 5d e9 03 2d f4 4b 0c 6c 3c 72 08  ?]..-.K.l
Sent: Thursday, March 19, 2020 7:11 AM
To: Matt Johnston 
Cc: dropbear@ucc.asn.au 
Subject: Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

Thanks Matt, I'll give that a shot when I get a build environment set up for 
the server-side/openwrt.

I also plan to look at the RSA blinding logic in buf_put_rsa_sign(). 
Considering the intermittency of  the issue I'm thinking the issue has some 
correlation or dependency to the random data generated or transformed by that 
logic. Crypto is well outside my core competency so it'll be slow-going.

________
From: Matt Johnston 
Sent: Thursday, March 19, 2020 7:04 AM
To: Horshack ‪‬ 
Cc: dropbear@ucc.asn.au 
Subject: Re: SSH key excha

Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

2020-03-19 Thread Horshack ‪‬
Thanks Matt, I'll give that a shot when I get a build environment set up for 
the server-side/openwrt.

I also plan to look at the RSA blinding logic in buf_put_rsa_sign(). 
Considering the intermittency of  the issue I'm thinking the issue has some 
correlation or dependency to the random data generated or transformed by that 
logic. Crypto is well outside my core competency so it'll be slow-going.


From: Matt Johnston 
Sent: Thursday, March 19, 2020 7:04 AM
To: Horshack ‪‬ 
Cc: dropbear@ucc.asn.au 
Subject: Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

Hi,

The first thing I'd try would be to build with -O0 compilation flags to rule 
out compiler optimisations doing something strange.

Cheers,
Matt


On Thu 19/3/2020, at 3:42 pm, Horshack ‪‬ 
mailto:horsh...@live.com>> wrote:

Update - I cloned and built the dbclient source so I could enable the debug 
tracing facility to get more information about the 'Bad hostkey signature'. The 
intermittent failure is detected in recv_msg_kexdh_reply() -> buf_rsa_verify() 
-> mp_cmd(). If I bypass the buf_rsa_verify() call then the session proceeds 
normally without issue, which indicates everything else in the key exchange is 
working 100% of the time. I'll dig deeper to see why the signed host key sent 
by the server is wrong.


From: Horshack ‪‬
Sent: Wednesday, March 18, 2020 9:36 AM
To: dropbear@ucc.asn.au<mailto:dropbear@ucc.asn.au> 
mailto:dropbear@ucc.asn.au>>
Subject: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

Hi,

I have a strange issue on my Netgear X4S R7800. Running either DD-WRT or 
OpenWrt, approximately 30-70% of my SSH login attempts fail. For OpenSSH 
clients the error reported is "error in libcrypto". For the PuTTY client the 
error is more descriptive - "Signature from server's host key is invalid". The 
failure occurs even when using the OpenSSH client built in to OpenWrt itself 
(ie, SSH'ing into the router from the router via an existing remote SSH 
session).

The failure appears to be at the tail end of the key exchange, before 
authentication. I've tried varying the cipher (aes128-ctr / aes256-ctr), the 
MAC (hmac-sha1 / hmac-sha2-256), and the key exchange algo (curve25519-sha256 / 
curve25519-sha...@libssh.org<mailto:curve25519-sha...@libssh.org> / 
diffie-hellman-group14-sha256 / diffie-hellman-group14-sha1) but the 
intermittent failure still occurs. The frequency of failure is about the same 
for all these configuration options except for diffie-hellman-group14-sha256, 
which fails much more frequently - it sometimes takes hundreds of attempts to 
succeed. Perhaps that will provide a clue to the underlying cause.

Once an SSH login succeeds the connection is stable. However if I initiate a 
manual rekey operation via ~R then the key re-exchange fails. The router is 
otherwise very stable with no noticeable issues.

I'm an embedded firmware engineer but have never worked on DD-WRT/OpenWrt 
firmware or dropbear. I have a conceptual understanding of the key exchange 
algo but haven't looked at the actual code of any implementation including 
Dropbear's. I'm seek ideas on how to troubleshoot this issue. Considering the 
problem is intermittent I'm thinking it's some variant in the key 
generation/exchange algorithm that's failing due to some issue with the router, 
or a more remote possibility, an issue with the Dropbear implementation.

Here are pastebin links to the PuTTY full debug logs (w/raw data dumps) for 
both the failure and success cases:
Failure Case: https://pastebin.com/MS2BtFmW
Success Case: https://pastebin.com/c4j66Ga9

The only message I see from dropbear for a failed connection attempt is:

authpriv.info<http://authpriv.info> dropbear[15948]: Child connection from 
192.168.1.249:54819
authpriv.info<http://authpriv.info> dropbear[15948]: Exit before auth: 
Disconnect received


Thanks!



Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

2020-03-19 Thread Matt Johnston
Hi,

The first thing I'd try would be to build with -O0 compilation flags to rule 
out compiler optimisations doing something strange. 

Cheers,
Matt


> On Thu 19/3/2020, at 3:42 pm, Horshack ‪‬  wrote:
> 
> Update - I cloned and built the dbclient source so I could enable the debug 
> tracing facility to get more information about the 'Bad hostkey signature'. 
> The intermittent failure is detected in recv_msg_kexdh_reply() -> 
> buf_rsa_verify() -> mp_cmd(). If I bypass the buf_rsa_verify() call then the 
> session proceeds normally without issue, which indicates everything else in 
> the key exchange is working 100% of the time. I'll dig deeper to see why the 
> signed host key sent by the server is wrong.
> 
> From: Horshack ‪‬
> Sent: Wednesday, March 18, 2020 9:36 AM
> To: dropbear@ucc.asn.au 
> Subject: SSH key exchange fails 30-70% of the time on Netgear X4S R7800
>  
> Hi,
> 
> I have a strange issue on my Netgear X4S R7800. Running either DD-WRT or 
> OpenWrt, approximately 30-70% of my SSH login attempts fail. For OpenSSH 
> clients the error reported is "error in libcrypto". For the PuTTY client the 
> error is more descriptive - "Signature from server's host key is invalid". 
> The failure occurs even when using the OpenSSH client built in to OpenWrt 
> itself (ie, SSH'ing into the router from the router via an existing remote 
> SSH session).
> 
> The failure appears to be at the tail end of the key exchange, before 
> authentication. I've tried varying the cipher (aes128-ctr / aes256-ctr), the 
> MAC (hmac-sha1 / hmac-sha2-256), and the key exchange algo (curve25519-sha256 
> / curve25519-sha...@libssh.org / diffie-hellman-group14-sha256 / 
> diffie-hellman-group14-sha1) but the intermittent failure still occurs. The 
> frequency of failure is about the same for all these configuration options 
> except for diffie-hellman-group14-sha256, which fails much more frequently - 
> it sometimes takes hundreds of attempts to succeed. Perhaps that will provide 
> a clue to the underlying cause.
> 
> Once an SSH login succeeds the connection is stable. However if I initiate a 
> manual rekey operation via ~R then the key re-exchange fails. The router is 
> otherwise very stable with no noticeable issues.
> 
> I'm an embedded firmware engineer but have never worked on DD-WRT/OpenWrt 
> firmware or dropbear. I have a conceptual understanding of the key exchange 
> algo but haven't looked at the actual code of any implementation including 
> Dropbear's. I'm seek ideas on how to troubleshoot this issue. Considering the 
> problem is intermittent I'm thinking it's some variant in the key 
> generation/exchange algorithm that's failing due to some issue with the 
> router, or a more remote possibility, an issue with the Dropbear 
> implementation.
> 
> Here are pastebin links to the PuTTY full debug logs (w/raw data dumps) for 
> both the failure and success cases:
> Failure Case: https://pastebin.com/MS2BtFmW 
> Success Case: https://pastebin.com/c4j66Ga9 
> 
> The only message I see from dropbear for a failed connection attempt is:
> 
> authpriv.info dropbear[15948]: Child connection from 192.168.1.249:54819
> authpriv.info dropbear[15948]: Exit before auth: Disconnect received
> 
> 
> Thanks!