I can exclude the NEON code for DD-WRT in Dropbear if it helps, but it
would be better to nail down the problem. Otherwise other programs would
likely be affected too.
On 28.03.2020 at 21:06, Horshack wrote:
As a postscript, I was able to refine the logic to produce the
corrupted result almost instantaneously. I'm also able to get it to
fail with an all-zero input dataset and a bitwise OR operation instead
of the original squaring multiplication operations, which allows me to
see what the actual corrupted loads are. The result is very interesting -
sometimes the corrupted data is valid ARM instructions, other times
valid kernel-space addresses, so it seems clear this is an addressing
problem. Also interesting is how I'll see just one or a few corrupted
words, which implies the corruption is in the interface between DCACHE
and the processor rather than an errant fetch of a line into DCACHE from
memory (otherwise the entire DCACHE line would hold corrupt data). You
can see a sample of the failure output here:
https://github.com/horshack-dpreview/ipq8065-sqrbug/blob/master/SampleFailures.txt
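The all-zero, bitwise-OR variant described above can be sketched in plain C. This is only an illustration of the idea, not the code from the linked repository: the buffer size `WORDS` and the function names are hypothetical, and the original test works at the level of specific assembly instructions, which this sketch does not reproduce. The point is that with an all-zero input the only correct accumulated value is 0, so any set bit in the result is exactly the data a bad load returned.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WORDS 64 /* hypothetical size, small enough to stay in the data cache */

/* OR every word of buf into an accumulator, `passes` times over.
 * The volatile qualifier forces a real load on every iteration. */
static uint32_t or_accumulate(const volatile uint32_t *buf, size_t n,
                              unsigned long passes)
{
    uint32_t acc = 0;

    for (unsigned long p = 0; p < passes; p++)
        for (size_t i = 0; i < n; i++)
            acc |= buf[i];
    return acc;
}

/* Run the test on an all-zero buffer. Returns 0 when every load was
 * correct, otherwise the stray bits that some load delivered. */
static uint32_t run_zero_or_test(unsigned long passes)
{
    static uint32_t buf[WORDS];

    memset(buf, 0, sizeof buf);
    return or_accumulate(buf, WORDS, passes);
}
```

On healthy hardware `run_zero_or_test()` returns 0 for any number of passes. A non-zero return value can be printed in hex and compared against instruction encodings or kernel addresses, as done in the sample failure output.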
Finally, to exclude any possibility the issue is related to possible
kernel code running and corrupting register sets/memory (such as an
interrupt routine), I ported the test to a kernel module and ran the
logic within a local_irq_disable() block, which disables both
preemption and interrupts on the core. Still fails. I created a
separate repository for the kernel module version here:
https://github.com/horshack-dpreview/ipq8065-sqrbug-driver
------------------------------------------------------------------------
*From:* Horshack <horsh...@live.com>
*Sent:* Tuesday, March 24, 2020 9:25 PM
*To:* Sebastian Gottschall <s.gottsch...@dd-wrt.com>;
dropbear@ucc.asn.au <dropbear@ucc.asn.au>
*Subject:* Re: SSH key exchange fails 30-70% of the time on Netgear
X4S R7800
I excluded context switches as a possible culprit by looping until a
corruption happened during which no context switches occurred while the
test was running. That is, at the start of the test I would save the
number of involuntary/voluntary context switches from /proc/<pid>/status,
then check those counts again after the failure. If they were different, I
restarted the test and kept looping until a failure happened in which
the context switch counts were the same.
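A minimal sketch of that check, assuming a Linux /proc filesystem that exposes the `voluntary_ctxt_switches` and `nonvoluntary_ctxt_switches` fields in the status file. The struct and function names are hypothetical and this is not the code from the test utility.

```c
#include <stdio.h>

struct ctxt_counts {
    unsigned long voluntary;
    unsigned long nonvoluntary;
};

/* Parse one line of /proc/<pid>/status. Returns 1 or 2 when the line
 * held the voluntary or nonvoluntary counter, 0 for any other line. */
static int parse_ctxt_line(const char *line, struct ctxt_counts *out)
{
    if (sscanf(line, "voluntary_ctxt_switches: %lu", &out->voluntary) == 1)
        return 1;
    if (sscanf(line, "nonvoluntary_ctxt_switches: %lu", &out->nonvoluntary) == 1)
        return 2;
    return 0;
}

/* Read both counters for the calling process. Returns 0 on success,
 * -1 if the file could not be opened or a counter was missing. */
static int read_ctxt_counts(struct ctxt_counts *out)
{
    char line[256];
    int found = 0;
    FILE *f = fopen("/proc/self/status", "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f))
        found |= parse_ctxt_line(line, out);
    fclose(f);
    return found == 3 ? 0 : -1;
}

/* True when no context switch of either kind happened between snapshots. */
static int ctxt_unchanged(const struct ctxt_counts *a,
                          const struct ctxt_counts *b)
{
    return a->voluntary == b->voluntary &&
           a->nonvoluntary == b->nonvoluntary;
}
```

The intended use is to call `read_ctxt_counts()` once before the test loop and once after a failure is detected, and to count the failure only when `ctxt_unchanged()` is true for the two snapshots.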
------------------------------------------------------------------------
*From:* dropbear-bounces+horshack=live....@ucc.asn.au
<dropbear-bounces+horshack=live....@ucc.asn.au> on behalf of Sebastian
Gottschall <s.gottsch...@dd-wrt.com>
*Sent:* Tuesday, March 24, 2020 9:13 PM
*To:* dropbear@ucc.asn.au <dropbear@ucc.asn.au>
*Subject:* Re: SSH key exchange fails 30-70% of the time on Netgear
X4S R7800
If the corruption is caused by a context switch, the problem may be
caused by the kernel.
Try disabling "CONFIG_KERNEL_MODE_NEON" in the kernel config.
This will disable some of the kernel's crypto assembly code.
On 24.03.2020 at 16:11, Matt Johnston wrote:
Good work narrowing down a test case there.
That's an interesting finding - I guess it might be worth posting on
the OpenWrt lists/forum to try to find other testers.
Could it be power related if the tight multiplication loop is
stressing it somehow? It doesn't seem to be using Neon
instructions for anything apart from loads/stores though - is there
something that the compiler should be doing when mixing Neon and
non-Neon operations?
Cheers,
Matt
(Your emails got held up being over 100kB, I've trimmed the reply
below and let them through. Apologies to everyone for the stale old
one that got let through with them just now, I wasn't looking closely)
On Tue 24/3/2020, at 11:23 am, Horshack <horsh...@live.com> wrote:
I was able to isolate the issue to just a handful of assembly
instructions within fast_s_mp_sqr(), related to the squaring loop. I
broke that code out into a separate utility that reproduces the
issue within a few seconds. The failure is somewhat sensitive to the
data pattern and very sensitive to timing, indicating a likely
memory/data path issue within my particular router. I'm guessing
it's the IPQ8065 and not the SDRAM because I can get it to fail with
a tiny data set that easily fits within DCACHE. I can alter the
frequency of the failure with a single ARM memory barrier instruction,
which at first implied a superscalar data ordering condition, but the
memory barrier also alters the timing through the DCACHE, so that is
likely the effect it's having. I was able to exclude VFP/Neon
register corruption as the cause with some test code. I also
excluded any context-switch-specific issue by measuring the number of
context switches in /proc/<pid>/status and catching a failure where
no switches had occurred. I also modified the affinity so the
utility runs on just one processor to rule out a specific core
having the issue.
I put the source and binary of my utility on GitHub. If anyone on
this mailing list has this router model, could you give it a try?
You only need the ipq8065-sqrbug (binary) and
run-ipq8065-sqrbug.sh (script). Here's the link to the
repository: https://github.com/horshack-dpreview/ipq8065-sqrbug
------------------------------------------------------------------------
*From:* Horshack <horsh...@live.com>
*Sent:* Saturday, March 21, 2020 7:54 AM
*To:* dropbear@ucc.asn.au <dropbear@ucc.asn.au>
*Subject:* SSH key exchange fails 30-70% of the time on Netgear X4S
R7800
Including mailing list for my last two messages below...
Begin forwarded message:
*From:* Horshack <horsh...@live.com>
*Date:* March 21, 2020 at 7:35:18 AM PDT
*To:* Matt Johnston <m...@ucc.asn.au>
*Cc:* "dropbear@ucc.asn.au" <dropbear@ucc.asn.au>
*Subject:* Re: SSH key exchange fails 30-70% of the time on
Netgear X4S R7800
Disassembly of fast_s_mp_sqr() and other libtommath functions
reveals gcc is utilizing the ARM NEON SIMD instructions and
registers for calculations involved with libtommath's mp_word
scalar. Based on the 64-bit word corruption I see, I'm guessing the
SIMD registers aren't being preserved/restored properly somewhere,
probably during a context switch, specifically s16–s31 (d8–d15,
q4–q7), which AAPCS says must be preserved and which I see being
used in the disassembly of fast_s_mp_sqr(). I'll write some test
code later today to see if this is the case, and if so, try to
track down where and why the registers aren't being preserved.
------------------------------------------------------------------------
*From:* Horshack <horsh...@live.com>
*Sent:* Saturday, March 21, 2020 1:11 AM
*To:* Matt Johnston <m...@ucc.asn.au>
*Cc:* dropbear@ucc.asn.au <dropbear@ucc.asn.au>
*Subject:* Re: SSH key exchange fails 30-70% of the time on Netgear
X4S R7800
I have one of the failure paths isolated down to a single corrupt
64-bit word in memory, which required a significant amount of code
instrumentation to achieve. I implemented a code execution history
buffer that gets filled at various checkpoints within
s_mp_exptmod() and some of the functions called by it. To facilitate
this history mechanism I packaged all of s_mp_exptmod()'s local
variables inside a structure. The history consists of saving the local
scalar vars in addition to crc32's of all the mp_int data
structures, with a separate crc32 of the mp_int.dp payload (data).
When a failure occurs, i.e. one or more of the three back-to-back
debug invocations of s_mp_exptmod yields a mismatching signed key
result, I dump out the history elements for each of the
invocations to determine the first code checkpoint where the failing
invocation departed from the known correct invocation.
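The checkpoint-history approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the instrumentation actually added to libtommath: the `history` and `checkpoint` structures, the `HIST_MAX` capacity, and all function names are hypothetical, and each entry here records only a checkpoint id and one CRC-32 of a data buffer rather than the full set of local variables.

```c
#include <stddef.h>
#include <stdint.h>

#define HIST_MAX 64 /* hypothetical capacity of the history buffer */

struct checkpoint {
    int      id;       /* which checkpoint in the code was reached */
    uint32_t data_crc; /* CRC-32 of the data payload at that point */
};

struct history {
    size_t            count;
    struct checkpoint entries[HIST_MAX];
};

/* Bitwise CRC-32 using the common reflected polynomial 0xEDB88320. */
static uint32_t crc32_buf(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Append one checkpoint; entries beyond HIST_MAX are silently dropped. */
static void hist_record(struct history *h, int id,
                        const void *data, size_t len)
{
    if (h->count < HIST_MAX) {
        h->entries[h->count].id = id;
        h->entries[h->count].data_crc = crc32_buf(data, len);
        h->count++;
    }
}

/* Index of the first checkpoint where two runs disagree,
 * or -1 when both histories are identical. */
static long hist_first_divergence(const struct history *a,
                                  const struct history *b)
{
    size_t n = a->count < b->count ? a->count : b->count;

    for (size_t i = 0; i < n; i++)
        if (a->entries[i].id != b->entries[i].id ||
            a->entries[i].data_crc != b->entries[i].data_crc)
            return (long)i;
    return a->count == b->count ? -1 : (long)n;
}
```

Each invocation fills its own `struct history`. After a mismatching result, `hist_first_divergence()` applied to a known-good run and the failing run gives the earliest checkpoint at which the data differed, which narrows the search to the code between that checkpoint and the previous one.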
*snipped*