Thanks Matt, I'll give that a shot when I get a build environment set up for the server-side/openwrt.
I also plan to look at the RSA blinding logic in buf_put_rsa_sign(). Given how intermittent the issue is, I suspect it depends on the random data generated or transformed by that logic. Crypto is well outside my core competency so it'll be slow-going.

________________________________
From: Matt Johnston <m...@ucc.asn.au>
Sent: Thursday, March 19, 2020 7:04 AM
To: Horshack <horsh...@live.com>
Cc: dropbear@ucc.asn.au <dropbear@ucc.asn.au>
Subject: Re: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

Hi,

The first thing I'd try would be to build with -O0 compilation flags to rule out compiler optimisations doing something strange.

Cheers,
Matt

On Thu 19/3/2020, at 3:42 pm, Horshack <horsh...@live.com> wrote:

Update - I cloned and built the dbclient source so I could enable the debug tracing facility to get more information about the 'Bad hostkey signature' error. The intermittent failure is detected in recv_msg_kexdh_reply() -> buf_rsa_verify() -> mp_cmp(). If I bypass the buf_rsa_verify() call then the session proceeds normally without issue, which indicates everything else in the key exchange is working 100% of the time. I'll dig deeper to see why the signature the server sends with its host key is wrong.

________________________________
From: Horshack
Sent: Wednesday, March 18, 2020 9:36 AM
To: dropbear@ucc.asn.au <dropbear@ucc.asn.au>
Subject: SSH key exchange fails 30-70% of the time on Netgear X4S R7800

Hi,

I have a strange issue on my Netgear X4S R7800. Running either DD-WRT or OpenWrt, approximately 30-70% of my SSH login attempts fail. For OpenSSH clients the error reported is "error in libcrypto". For the PuTTY client the error is more descriptive - "Signature from server's host key is invalid".
The failure occurs even when using the OpenSSH client built in to OpenWrt itself (i.e., SSH'ing into the router from the router via an existing remote SSH session). The failure appears to be at the tail end of the key exchange, before authentication.

I've tried varying the cipher (aes128-ctr / aes256-ctr), the MAC (hmac-sha1 / hmac-sha2-256), and the key exchange algo (curve25519-sha256 / curve25519-sha...@libssh.org / diffie-hellman-group14-sha256 / diffie-hellman-group14-sha1) but the intermittent failure still occurs. The frequency of failure is about the same for all these configuration options except for diffie-hellman-group14-sha256, which fails much more frequently - it sometimes takes hundreds of attempts to succeed. Perhaps that will provide a clue to the underlying cause.

Once an SSH login succeeds the connection is stable. However, if I initiate a manual rekey operation via ~R then the key re-exchange fails. The router is otherwise very stable with no noticeable issues.

I'm an embedded firmware engineer but have never worked on DD-WRT/OpenWrt firmware or Dropbear. I have a conceptual understanding of the key exchange algo but haven't looked at the actual code of any implementation, including Dropbear's. I'm seeking ideas on how to troubleshoot this issue. Considering the problem is intermittent, I'm thinking some variable element of the key generation/exchange algorithm is failing due to an issue with the router or, a more remote possibility, an issue with the Dropbear implementation.
Here are pastebin links to the PuTTY full debug logs (w/raw data dumps) for both the failure and success cases:

Failure Case: https://pastebin.com/MS2BtFmW
Success Case: https://pastebin.com/c4j66Ga9

The only message I see from dropbear for a failed connection attempt is:

authpriv.info dropbear[15948]: Child connection from 192.168.1.249:54819
authpriv.info dropbear[15948]: Exit before auth: Disconnect received

Thanks!
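
For anyone following the RSA blinding angle mentioned at the top of this thread, here is a minimal Python sketch of textbook RSA signing with blinding and the matching verification step. It is not Dropbear's code: the toy primes, the function names sign_blinded() and verify(), and the absence of padding are all assumptions made for illustration. The point is that a random blinding factor r is used on every signature, yet the result should be identical for every r, so a fault anywhere in the blinded arithmetic would show up as an intermittent "invalid signature" on the client.

```python
import secrets
from math import gcd

# Toy RSA key built from two well-known Mersenne primes (illustration only;
# real host keys are 2048+ bits and signatures use padding, omitted here).
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)


def sign_blinded(m, r=None):
    """Textbook RSA signature of integer m, computed with blinding.

    r is the random blinding factor; a fresh one is drawn per call unless
    the caller supplies one (hypothetical hook for experiments).
    """
    if r is None:
        while True:
            r = secrets.randbelow(N - 2) + 2
            if gcd(r, N) == 1:  # r must be invertible mod N
                break
    r_inv = pow(r, -1, N)
    blinded = (pow(r, E, N) * m) % N   # m * r^e mod n
    s_blinded = pow(blinded, D, N)     # (m * r^e)^d = m^d * r mod n
    return (s_blinded * r_inv) % N     # strip r, leaving m^d mod n


def verify(m, s):
    """Check s^e mod n against m, the comparison a client performs."""
    return pow(s, E, N) == m % N


if __name__ == "__main__":
    msg = 123456789
    sig = sign_blinded(msg)
    print("valid signature verifies:", verify(msg, sig))
    print("one flipped bit verifies:", verify(msg, sig ^ 1))
```

Because the blinding factor cancels out, every run should produce the same signature as the unblinded pow(msg, D, N). A result that differs from run to run would point at the arithmetic or the hardware executing it rather than at the random input itself.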