Hi,
         I am a software engineer at Alcatel-Lucent. Our product uses
dropbear v071 on Linux version 3.4.24. Most of the time it works perfectly,
but recently we hit a problem: sometimes a dropbear child process uses nearly
100% CPU (we run on a single-core ARM1176). After investigating, I found that
it is caused by the way KEX_REKEY_TIMEOUT is handled.

KEX_REKEY_TIMEOUT is defined as 8 hours. This means that when a session lasts
longer than 8 hours, the server and client exchange keys again for security
reasons. The timestamp of the last key exchange is recorded in the variable
"ses.kexstate.lastkextime".

The child dropbear process chooses the "timeout" parameter of "select" by
calling "select_timeout". That function checks the timeout events
KEX_REKEY_TIMEOUT, AUTH_TIMEOUT and keepalive_secs. If one of these timeouts
has already expired, "update_timeout" yields a negative value, and
"select_timeout" then clamps it to zero like this:

/* clamp negative timeouts to zero - event has already triggered */

         return MAX(timeout, 0);
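
To make the clamping concrete, here is a minimal sketch of the computation as
I understand it. It is a simplified model, not the Dropbear source: the
"sketch_" names and the explicit "now" and "lastkextime" parameters are my own
assumptions for illustration, and only the rekey timeout is modelled.

```c
#include <limits.h>

/* Stand-in for KEX_REKEY_TIMEOUT (8 hours, in seconds). Hypothetical name. */
#define SKETCH_KEX_REKEY_TIMEOUT (3600L * 8)

/* Lower *timeout to the seconds remaining until last_event + limit.
 * The remaining time is negative once the deadline has passed. */
static void sketch_update_timeout(long limit, long now, long last_event,
                                  long *timeout)
{
    if (last_event > 0 && limit > 0) {
        long remaining = last_event + limit - now;
        if (remaining < *timeout) {
            *timeout = remaining;
        }
    }
}

/* Return the select() timeout in seconds, clamped so it is never negative. */
static long sketch_select_timeout(long now, long lastkextime)
{
    long timeout = LONG_MAX;
    sketch_update_timeout(SKETCH_KEX_REKEY_TIMEOUT, now, lastkextime, &timeout);
    /* clamp negative timeouts to zero - event has already triggered */
    return timeout > 0 ? timeout : 0;
}
```

In this model, every call made after the deadline returns zero for as long as
"lastkextime" stays unchanged.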

   If "select_timeout" returns zero, the next "select" call (in "session_loop")
returns immediately. The loop then checks the timeout events like this:

/* check for auth timeout, rekeying required etc */

                   checktimeouts();

   In the function "checktimeouts", when it finds that the timeout has been
reached or that too much data has been sent, it sends an SSH_MSG_KEXINIT
message to the peer. Normally this message triggers a new key exchange.
However, when a network problem prevents the peer from receiving the message,
the bug occurs: the timestamp ses.kexstate.lastkextime is only updated through
the call sequence "switch_keys"-->"kexinitialise", and that sequence is driven
by incoming SSH messages, either SSH_MSG_KEXDH_INIT or SSH_MSG_NEWKEYS. When
no SSH message is received, the timestamp never changes, so "select_timeout"
keeps returning zero and the child dropbear process spins in an endless
"select" loop with a zero timeout caused by KEX_REKEY_TIMEOUT.

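The loop described above can be sketched as a small simulation. This is only a
model under my own assumptions: the struct, the "sketch_" function names and
the simulated clock are hypothetical, the key exchange is treated as
instantaneous when the peer is reachable, and no real sockets or Dropbear code
are involved.

```c
/* Hypothetical model of one session; not Dropbear's real data structures. */
struct sketch_session {
    long now;          /* simulated monotonic clock, in seconds */
    long lastkextime;  /* time of the last completed key exchange */
    long rekey_limit;  /* stands in for KEX_REKEY_TIMEOUT */
};

/* Seconds until the rekey deadline, clamped to zero once it has passed. */
static long sketch_timeout(const struct sketch_session *s)
{
    long remaining = s->lastkextime + s->rekey_limit - s->now;
    return remaining > 0 ? remaining : 0;
}

/* Run `iterations` passes of a select-style loop with no incoming traffic.
 * Returns how many passes woke up with a zero timeout (busy iterations). */
static long sketch_count_zero_wakeups(struct sketch_session *s,
                                      long iterations, int peer_reachable)
{
    long zero_wakeups = 0;
    for (long i = 0; i < iterations; i++) {
        long timeout = sketch_timeout(s);
        if (timeout == 0) {
            zero_wakeups++;
        }
        /* With no traffic, select() sleeps for the whole timeout. */
        s->now += timeout;
        if (s->now - s->lastkextime >= s->rekey_limit) {
            /* checktimeouts() would send SSH_MSG_KEXINIT here. Only a
             * reply from the peer leads to the timestamp being reset. */
            if (peer_reachable) {
                s->lastkextime = s->now;
            }
        }
    }
    return zero_wakeups;
}
```

With a reachable peer the model never wakes up with a zero timeout. With an
unreachable peer, every pass after the first expiry is a zero-timeout wakeup
and the simulated clock stops advancing, which corresponds to the 100% CPU
usage we observe.
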
>      So there is a very simple way to reproduce this bug: first define
> KEX_REKEY_TIMEOUT as small as possible (I set it to 8 seconds), then start
> an ssh session so that the child dropbear process is forked, then unplug the
> network cable. After 8 seconds the child dropbear process will use 100% CPU.
> Could you kindly check it? Thanks.
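
For the reproduction, the only source change I mean is the value of the
macro. The fragment below is a sketch of that change; which header defines
KEX_REKEY_TIMEOUT depends on the source tree, so please treat the location as
something to look up rather than something stated here.

```c
/* Reproduction only: shorten the rekey interval from 8 hours to 8 seconds. */
#define KEX_REKEY_TIMEOUT 8
```

After rebuilding with this value, start a session and cut the network link as
described above.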



Best regards
