Update - I have narrowed the intermittent issue down to the interchangeable
functions s_mp_exptmod_fast() and s_mp_exptmod(). By default
s_mp_exptmod_fast() is compiled instead of s_mp_exptmod()
[BN_MP_EXPTMOD_FAST_C], but both functions fail intermittently, so I chose
s_mp_exptmod() as my focus because it's slightly simpler.

s_mp_exptmod() is called indirectly through the mp_exptmod() call in
rsa.c::buf_put_rsa_sign(). In the failing case, if I immediately call
mp_exptmod()/s_mp_exptmod() again with the same source mp_int structures, it
yields the correct data. Example, including my debug code:

    /* Debug: two extra output mp_ints for the repeated runs */
    DEF_MP_INT(rsa_s_backup);
    DEF_MP_INT(rsa_s_backup_2);

    mp_copy(&rsa_s, &rsa_s_backup);
    mp_copy(&rsa_s, &rsa_s_backup_2);

    /* Same inputs three times; all three results should be identical */
    if (mp_exptmod(&rsa_tmp1, key->d, key->n, &rsa_s) != MP_OKAY) {
        dropbear_exit("RSA error");
    }
    if (mp_exptmod(&rsa_tmp1, key->d, key->n, &rsa_s_backup) != MP_OKAY) {
        dropbear_exit("RSA error");
    }
    if (mp_exptmod(&rsa_tmp1, key->d, key->n, &rsa_s_backup_2) != MP_OKAY) {
        dropbear_exit("RSA error");
    }

    /* Dump and compare the three results */
    printf("after mp_exptmod\n");
    dump_mp_int("rsa_s", &rsa_s);
    dump_mp_int("rsa_s_backup", &rsa_s_backup);
    dump_mp_int("rsa_s_backup_2", &rsa_s_backup_2);
    comp_mp_int("rsa_s", "rsa_s_backup", &rsa_s, &rsa_s_backup);
    comp_mp_int("rsa_s_backup", "rsa_s_backup_2",
                &rsa_s_backup, &rsa_s_backup_2);

    /* Release the debug buffers */
    mp_clear(&rsa_s_backup);
    mp_clear(&rsa_s_backup_2);

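dump_mp_int() and comp_mp_int() are not shown above. A minimal sketch of what a comparison helper could look like, reporting where two values first diverge rather than only whether they differ. demo_mp_int is a hypothetical stand-in whose fields mirror libtommath's mp_int (used, alloc, sign, dp), and all names here are illustrative, not the actual helpers:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for mp_int; 32-bit digits as on this target. */
typedef uint32_t demo_digit;
typedef struct {
    int used;   /* number of digits in use */
    int alloc;  /* number of digits allocated */
    int sign;   /* 0 = non-negative */
    demo_digit *dp;
} demo_mp_int;

/* Returns -1 if a and b hold the same value. Otherwise returns the index of
   the first differing digit, or min(used) if only the length or sign differ.
   Only the first "used" digits carry the value, so only those are compared. */
static int demo_first_diff(const demo_mp_int *a, const demo_mp_int *b)
{
    assert(a != NULL && b != NULL);
    int n = a->used < b->used ? a->used : b->used;
    for (int i = 0; i < n; i++) {
        if (a->dp[i] != b->dp[i])
            return i;
    }
    if (a->used != b->used || a->sign != b->sign)
        return n;
    return -1;
}

/* Prints a line in the same spirit as the output in this thread. */
static void demo_comp_mp_int(const char *na, const char *nb,
                             const demo_mp_int *a, const demo_mp_int *b)
{
    int i = demo_first_diff(a, b);
    if (i < 0)
        printf("%s and %s match\n", na, nb);
    else
        printf("%s and %s differ (first at digit %d)\n", na, nb, i);
}
```

Knowing which digit diverges first is one more data point for correlating a bad result with a particular stage of the exponentiation.
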
Sample output from a failure, showing each mp_int structure followed by the
first portion of its dp array. In this instance rsa_s holds the wrong data:

after mp_exptmod
rsa_s [0xbef6c358]:
  0000  4a 00 00 00 c0 00 00 00 00 00 00 00 30 e1 8f 00  J...........0...
rsa_s->dp [0x008fe130]:
  0000  05 fb c0 0f 68 91 ff 0a 9f 05 57 0b 35 a2 bd 05  ....h.....W.5...
  0010  57 ec a0 0b 34 3c b1 0f fa 8b b5 08 ed aa 9c 04  W...4<..........
  0020  7e 88 bb 04 12 42 51 05 9a 6d 7d 0a 98 ef 12 0c  ~....BQ..m}.....
  0030  76 e0 f4 0f ea 89 d7 0c 87 b0 76 03 12 a1 2d 0e  v.........v...-.
  0040  d7 3c df 06 0f 54 92 04 23 90                    .<...T..#.
rsa_s_backup [0xbef6c398]:
  0000  4a 00 00 00 c0 00 00 00 00 00 00 00 00 d8 8f 00  J...............
rsa_s_backup->dp [0x008fd800]:
  0000  ec 9f a0 01 d4 8e e8 07 c3 ae df 0b 45 61 e6 06  ............Ea..
  0010  a1 99 59 03 d7 49 24 02 50 a6 ac 0a de a2 5c 0d  ..Y..I$.P.....\.
  0020  cb b7 3c 05 33 cb da 08 28 10 f2 04 14 69 d6 07  ..<.3...(....i..
  0030  8c 8e a5 04 f5 fc 92 0c ba 88 d9 04 71 b4 b2 08  ............q...
  0040  bc 4f c7 0d de 73 f9 06 0d bf                    .O...s....
rsa_s_backup_2 [0xbef6c3a8]:
  0000  4a 00 00 00 c0 00 00 00 00 00 00 00 e0 d1 8f 00  J...............
rsa_s_backup_2->dp [0x008fd1e0]:
  0000  ec 9f a0 01 d4 8e e8 07 c3 ae df 0b 45 61 e6 06  ............Ea..
  0010  a1 99 59 03 d7 49 24 02 50 a6 ac 0a de a2 5c 0d  ..Y..I$.P.....\.
  0020  cb b7 3c 05 33 cb da 08 28 10 f2 04 14 69 d6 07  ..<.3...(....i..
  0030  8c 8e a5 04 f5 fc 92 0c ba 88 d9 04 71 b4 b2 08  ............q...
  0040  bc 4f c7 0d de 73 f9 06 0d bf                    .O...s....
rsa_s and rsa_s_backup differ

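The 16-byte header line in each dump can be decoded to confirm what it holds. A sketch, assuming the 32-bit little-endian ARM layout of mp_int (used, alloc, sign, dp); the struct and function names are illustrative:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble a 32-bit little-endian word from four dumped bytes. */
static uint32_t le32(const uint8_t *b)
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Assumed header layout: int used; int alloc; int sign; mp_digit *dp; */
struct dumped_header {
    uint32_t used, alloc, sign, dp;
};

static struct dumped_header decode_header(const uint8_t hdr[16])
{
    struct dumped_header h;
    h.used  = le32(hdr);
    h.alloc = le32(hdr + 4);
    h.sign  = le32(hdr + 8);
    h.dp    = le32(hdr + 12);
    return h;
}

/* Returns 1 if every complete 32-bit word in buf is below 2^bits. */
static int words_fit(const uint8_t *buf, size_t len, unsigned bits)
{
    assert(bits > 0 && bits < 32);
    for (size_t i = 0; i + 4 <= len; i += 4) {
        if (le32(buf + i) >> bits != 0)
            return 0;
    }
    return 1;
}
```

Applied to the first rsa_s header this gives used = 74, alloc = 192, sign = 0 and dp = 0x008fe130, matching the rsa_s->dp address line. Every dumped digit also fits in 28 bits, which appears consistent with libtommath's default 28-bit digits on 32-bit targets (74 * 28 = 2072 bits, enough for a 2048-bit modulus). Each dp dump is 0x4a = 74 bytes long, so it covers roughly the first 18 of the 74 digits.
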
Sometimes it's the second or third call that yields the incorrect data. In this 
instance it was the second call:
after mp_exptmod
rsa_s [0xbe9a6358]:
  0000  4a 00 00 00 c0 00 00 00 00 00 00 00 30 c1 40 02  J...........0.@.
rsa_s->dp [0x0240c130]:
  0000  25 b9 db 00 ec 62 00 0d 80 2d b0 0d 00 13 d3 06  %....b...-......
  0010  3f ec 8b 0a af 5d e9 03 2d f4 4b 0c 6c 3c 72 08  ?....]..-.K.l<r.
  0020  5d 52 6a 08 21 4c dd 01 a2 59 1a 03 33 16 97 0f  ]Rj.!L...Y..3...
  0030  c7 69 c2 08 0b 61 d6 03 b9 86 fc 01 27 15 c8 0c  .i...a......'...
  0040  dd 03 b1 04 78 c7 9f 0f d8 9c                    ....x.....
rsa_s_backup [0xbe9a6398]:
  0000  4a 00 00 00 c0 00 00 00 00 00 00 00 00 b8 40 02  J.............@.
rsa_s_backup->dp [0x0240b800]:
  0000  df 86 0c 0a 6c 2f 68 09 f9 a1 37 01 26 02 e7 0b  ....l/h...7.&...
  0010  69 5c b8 0e 0b 95 3a 0d 26 24 00 0e 97 6f dc 0b  i\....:.&$...o..
  0020  64 95 ed 0a c0 75 53 03 66 3d ff 0b 26 4b ce 09  d....uS.f=..&K..
  0030  89 12 d2 03 9b 9b 0b 09 19 2c 5a 00 2c 99 fc 0b  .........,Z.,...
  0040  ea ad 61 09 38 e1 6a 0a 49 a5                    ..a.8.j.I.
rsa_s_backup_2 [0xbe9a63a8]:
  0000  4a 00 00 00 c0 00 00 00 00 00 00 00 e0 b1 40 02  J.............@.
rsa_s_backup_2->dp [0x0240b1e0]:
  0000  25 b9 db 00 ec 62 00 0d 80 2d b0 0d 00 13 d3 06  %....b...-......
  0010  3f ec 8b 0a af 5d e9 03 2d f4 4b 0c 6c 3c 72 08  ?....]..-.K.l<r.
  0020  5d 52 6a 08 21 4c dd 01 a2 59 1a 03 33 16 97 0f  ]Rj.!L...Y..3...
  0030  c7 69 c2 08 0b 61 d6 03 b9 86 fc 01 27 15 c8 0c  .i...a......'...
  0040  dd 03 b1 04 78 c7 9f 0f d8 9c                    ....x.....
rsa_s and rsa_s_backup differ

I have heavily instrumented s_mp_exptmod(), but due to the complexity of the
calculations it performs it's proving very difficult to narrow down the issue.
What I can tell so far is that the failure point within s_mp_exptmod() varies
from instance to instance. That is odd, because the only thing that could
differ between my three back-to-back invocations is the memory allocations
(buffer locations) triggered by mp_exptmod(), and even those usually get the
same buffer addresses. I tried various scaffolding code on the core memory
allocation routines to catch any buffer overruns/overwrites the logic might be
performing, including padding each allocation with a large block of bytes, but
the intermittent failure still occurs. The behavior almost looks as if the
execution context (i.e., the processor registers) is being corrupted, because
the failure point moves around the routine's logic from one failure to the
next: sometimes I see an early-stage mp_int structure with the wrong data,
sometimes one that has undergone many transformations, all within
s_mp_exptmod().

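A variant of the padding approach that also detects writes into the padding: fill guard regions before and after each block with a known byte pattern and check them before freeing. A minimal sketch; guard_malloc(), guard_check() and guard_free() are illustrative names, wiring them into the allocation routines already instrumented is left as an assumption, and code that grows buffers would also need a matching realloc wrapper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Block layout: [HDR_LEN header holding the size][GUARD_LEN guard]
   [user data][GUARD_LEN guard]. HDR_LEN + GUARD_LEN is a multiple of 16,
   so the user pointer keeps malloc's usual alignment. */
#define HDR_LEN 16
#define GUARD_LEN 64
#define GUARD_BYTE 0xA5

_Static_assert(sizeof(size_t) <= HDR_LEN, "header too small for size_t");

void *guard_malloc(size_t size)
{
    unsigned char *raw = malloc(HDR_LEN + GUARD_LEN + size + GUARD_LEN);
    if (raw == NULL)
        return NULL;
    memcpy(raw, &size, sizeof size);                 /* remember the size */
    unsigned char *user = raw + HDR_LEN + GUARD_LEN;
    memset(user - GUARD_LEN, GUARD_BYTE, GUARD_LEN); /* front guard */
    memset(user + size, GUARD_BYTE, GUARD_LEN);      /* back guard */
    return user;
}

/* Returns 1 if both guard regions around p are intact, 0 otherwise. */
int guard_check(const void *p)
{
    assert(p != NULL);
    const unsigned char *user = p;
    size_t size;
    memcpy(&size, user - GUARD_LEN - HDR_LEN, sizeof size);
    const unsigned char *front = user - GUARD_LEN;
    const unsigned char *back = user + size;
    for (size_t i = 0; i < GUARD_LEN; i++) {
        if (front[i] != GUARD_BYTE || back[i] != GUARD_BYTE)
            return 0;
    }
    return 1;
}

/* Checks the guards, aborts loudly on corruption, then releases the block. */
void guard_free(void *p)
{
    if (p == NULL)
        return;
    if (!guard_check(p)) {
        fprintf(stderr, "guard_free: heap guard corrupted around %p\n", p);
        abort();
    }
    free((unsigned char *)p - GUARD_LEN - HDR_LEN);
}
```

If the guards stay intact on runs where the result is still wrong, that is further evidence against a heap overrun inside the bignum code.
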
Do you know if OpenWrt has any way to disable SMP at runtime, or a method or
technique to provide a critical section around a block of code to prevent any
preemptive task switches?

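One way to take SMP out of the picture for a single process (not something suggested in this thread) is to pin it to one CPU with Linux's sched_setaffinity(). This only stops the process from running on the other core; it does not disable preemption or interrupts, which user space cannot do. Booting with the kernel's maxcpus=1 (or nosmp) parameter is the system-wide equivalent, if the bootloader lets you change the command line. pin_to_cpu() and first_allowed_cpu() are illustrative names:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Restrict the calling thread (pid 0) to a single CPU. Returns 0 on success,
   -1 on failure with errno set by sched_setaffinity(). */
static int pin_to_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof set, &set);
}

/* Lowest-numbered CPU the calling thread is currently allowed to run on,
   or -1 if the affinity mask cannot be read. */
static int first_allowed_cpu(void)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &set))
            return cpu;
    }
    return -1;
}
```

Because a child created with fork() inherits its parent's affinity mask, pinning early in the server's startup (a hypothetical placement) would keep each forked connection handler on the same core.
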
________________________________
From: Horshack <horsh...@live.com>
Sent: Thursday, March 19, 2020 7:11 AM
To: Matt Johnston <m...@ucc.asn.au>
Cc: dropbear@ucc.asn.au <dropbear@ucc.asn.au>
Subject: Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

Thanks Matt, I'll give that a shot when I get a build environment set up for
the server side (OpenWrt).

I also plan to look at the RSA blinding logic in buf_put_rsa_sign().
Given how intermittent the issue is, I suspect it correlates with, or depends
on, the random data generated or transformed by that logic. Crypto is well
outside my core competency, so it'll be slow going.

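Standard RSA blinding computes s = r^-1 * (em * r^e)^d mod n for a random r; because (r^e)^d = r mod n, the random factor cancels (check buf_put_rsa_sign() for the exact steps it uses). A toy sketch with textbook numbers (p = 61, q = 53, so n = 3233, e = 17, d = 2753); the function names are illustrative and 64-bit arithmetic stands in for libtommath. It shows that a correct implementation returns the same signature for any blinding value r, so the random data changes the intermediate values mp_exptmod() operates on, not the expected result:

```c
#include <assert.h>
#include <stdint.h>

/* Square-and-multiply modular exponentiation; n < 2^32 keeps products in 64 bits. */
static uint64_t modexp(uint64_t base, uint64_t exp, uint64_t n)
{
    assert(n > 1 && n <= UINT32_MAX);
    uint64_t result = 1;
    base %= n;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % n;
        base = (base * base) % n;
        exp >>= 1;
    }
    return result;
}

/* Inverse of a mod n via the extended Euclidean algorithm; 0 if none exists. */
static uint64_t modinv(uint64_t a, uint64_t n)
{
    int64_t t = 0, new_t = 1;
    int64_t r = (int64_t)n, new_r = (int64_t)(a % n);
    while (new_r != 0) {
        int64_t q = r / new_r;
        int64_t tmp = t - q * new_t;
        t = new_t;
        new_t = tmp;
        tmp = r - q * new_r;
        r = new_r;
        new_r = tmp;
    }
    if (r != 1)
        return 0;
    if (t < 0)
        t += (int64_t)n;
    return (uint64_t)t;
}

/* Unblinded signature: s = em^d mod n. */
static uint64_t sign_plain(uint64_t em, uint64_t d, uint64_t n)
{
    return modexp(em, d, n);
}

/* Blinded signature: s = r^-1 * (em * r^e)^d mod n. */
static uint64_t sign_blinded(uint64_t em, uint64_t e, uint64_t d,
                             uint64_t n, uint64_t r)
{
    uint64_t r_inv = modinv(r, n);
    assert(r_inv != 0);                                   /* r must be coprime to n */
    uint64_t em_blind = (em % n) * modexp(r, e, n) % n;   /* em * r^e */
    uint64_t s_blind = modexp(em_blind, d, n);            /* = s * r mod n */
    return s_blind * r_inv % n;                           /* strip r */
}
```

With em = 2790, both sign_plain() and sign_blinded() return 65 whether r is 7 or 1234.
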
________________________________
From: Matt Johnston <m...@ucc.asn.au>
Sent: Thursday, March 19, 2020 7:04 AM
To: Horshack <horsh...@live.com>
Cc: dropbear@ucc.asn.au <dropbear@ucc.asn.au>
Subject: Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

Hi,

The first thing I'd try would be to build with -O0 compilation flags to rule 
out compiler optimisations doing something strange.

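Because a package build can add its own optimisation flags, it may be worth confirming that -O0 actually took effect. A minimal sketch relying on the __OPTIMIZE__ macro, which GCC and Clang predefine at -O1, -O2, -O3 and -Os and leave undefined at -O0; the function names are illustrative:

```c
#include <assert.h>
#include <stdio.h>

/* 1 if this translation unit was compiled with optimisation, 0 at -O0. */
static int built_with_optimization(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}

/* Log the result once, e.g. at startup, to confirm which flags were used. */
static void report_optimization(void)
{
    printf("optimisation %s\n", built_with_optimization() ? "ON" : "OFF (-O0)");
}
```
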
Cheers,
Matt


On Thu 19/3/2020, at 3:42 pm, Horshack <horsh...@live.com> wrote:

Update - I cloned and built the dbclient source so I could enable the debug
tracing facility and get more information about the 'Bad hostkey signature'
error. The intermittent failure is detected in recv_msg_kexdh_reply() ->
buf_rsa_verify() -> mp_cmp(). If I bypass the buf_rsa_verify() call, the
session proceeds normally, which indicates everything else in the key exchange
is working 100% of the time. I'll dig deeper to see why the host key signature
sent by the server is wrong.

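At its core, the client-side check that fails is textbook RSA verification: raise the server's signature s to the public exponent e mod n and compare the result with the encoded message em that the client builds itself (for ssh-rsa the real code first pads the hash with PKCS#1 v1.5, which is omitted here). A toy sketch with the textbook key n = 3233, e = 17; rsa_verify_toy() is an illustrative name and its final equality plays the role of the comparison in buf_rsa_verify():

```c
#include <assert.h>
#include <stdint.h>

/* Square-and-multiply modular exponentiation; n < 2^32 keeps products in 64 bits. */
static uint64_t modexp(uint64_t base, uint64_t exp, uint64_t n)
{
    assert(n > 1 && n <= UINT32_MAX);
    uint64_t result = 1;
    base %= n;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % n;
        base = (base * base) % n;
        exp >>= 1;
    }
    return result;
}

/* Toy RSA verification: accept s only if s^e mod n equals the expected em. */
static int rsa_verify_toy(uint64_t s, uint64_t em, uint64_t e, uint64_t n)
{
    return modexp(s, e, n) == em % n;
}
```

Any error in the server's s, even a single flipped bit (65 becoming 64), makes the comparison fail, while the rest of the key exchange is unaffected.
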
________________________________
From: Horshack
Sent: Wednesday, March 18, 2020 9:36 AM
To: dropbear@ucc.asn.au <dropbear@ucc.asn.au>
Subject: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

Hi,

I have a strange issue on my Netgear X4S R7800. Running either DD-WRT or
OpenWrt, approximately 30-70% of my SSH login attempts fail. For OpenSSH
clients the error reported is "error in libcrypto". For the PuTTY client the
error is more descriptive: "Signature from server's host key is invalid". The
failure occurs even when using the OpenSSH client built into OpenWrt itself
(i.e., SSH'ing into the router from the router via an existing remote SSH
session).

The failure appears to be at the tail end of the key exchange, before
authentication. I've tried varying the cipher (aes128-ctr / aes256-ctr), the
MAC (hmac-sha1 / hmac-sha2-256), and the key exchange algorithm
(curve25519-sha256 / curve25519-sha...@libssh.org /
diffie-hellman-group14-sha256 / diffie-hellman-group14-sha1), but the
intermittent failure still occurs. The failure rate is about the same for all
these configurations except diffie-hellman-group14-sha256, which fails much
more frequently; it sometimes takes hundreds of attempts to succeed. Perhaps
that will provide a clue to the underlying cause.

Once an SSH login succeeds, the connection is stable. However, if I initiate a
manual rekey via ~R, the key re-exchange fails. The router is otherwise very
stable with no noticeable issues.

I'm an embedded firmware engineer but have never worked on DD-WRT/OpenWrt
firmware or Dropbear. I have a conceptual understanding of the key exchange
algorithm but haven't looked at the actual code of any implementation,
including Dropbear's. I'm seeking ideas on how to troubleshoot this issue.
Since the problem is intermittent, I suspect something that varies between
attempts in the key generation/exchange is failing due to some issue with the
router or, a more remote possibility, an issue with the Dropbear
implementation.

Here are pastebin links to the PuTTY full debug logs (w/raw data dumps) for 
both the failure and success cases:
Failure Case: https://pastebin.com/MS2BtFmW
Success Case: https://pastebin.com/c4j66Ga9

The only message I see from dropbear for a failed connection attempt is:

authpriv.info dropbear[15948]: Child connection from 192.168.1.249:54819
authpriv.info dropbear[15948]: Exit before auth: Disconnect received


Thanks!
