Re: a bug detected in dropbear v071

2016-05-17 Thread Thomas De Schampheleire
On Thu, May 12, 2016 at 5:21 PM, Matt Johnston  wrote:
> On Wed 11/5/2016, at 11:55 pm, Thomas De Schampheleire 
>  wrote:
>>>
>>> I expect the next release will be in perhaps a month's
>>> time - it could be longer though.
>>
>> Is there a certain strategy with respect to timing of releases? Could
>> you describe it?
>>
>> It seems a long time to me to wait a month before releasing a bug fix
>> of this type (100% CPU load). Meanwhile we can of course apply your
>> patch explicitly, but other users may be experiencing the same and may
>> not be aware of this fix.
>
> Hi Thomas,
>
> Releases usually occur once sufficient new CHANGES items have accumulated, 
> around a dozen or so is the trend. So far since 2016.73 there are about 5. 
> For the next release I intend to sort out being able to build without sha1, 
> it also needs some more thorough testing of the #ifdef->#if changes.
>
> If there's an important fix then I'll sometimes make a smaller release. How 
> frequently have you seen the 100% CPU rekey issue? As far as I can tell the 
> bug's been present since 2007 with no other reports, which is why I was 
> leaving it for the next release.

Thanks for the feedback.
The issue was seen consistently in a specific validation scenario, but
other than that we do not see it indeed.
I have applied the patch for now, and we will update to the new
release when it's ready.

Thanks for your support,
Thomas


Re: a bug detected in dropbear v071

2016-05-12 Thread Matt Johnston
On Wed 11/5/2016, at 11:55 pm, Thomas De Schampheleire 
 wrote:
>> 
>> I expect the next release will be in perhaps a month's
>> time - it could be longer though.
> 
> Is there a certain strategy with respect to timing of releases? Could
> you describe it?
> 
> It seems a long time to me to wait a month before releasing a bug fix
> of this type (100% CPU load). Meanwhile we can of course apply your
> patch explicitly, but other users may be experiencing the same and may
> not be aware of this fix.

Hi Thomas,

Releases usually occur once sufficient new CHANGES items have accumulated, 
around a dozen or so is the trend. So far since 2016.73 there are about 5. For 
the next release I intend to sort out being able to build without sha1, it also 
needs some more thorough testing of the #ifdef->#if changes.

If there's an important fix then I'll sometimes make a smaller release. How 
frequently have you seen the 100% CPU rekey issue? As far as I can tell the 
bug's been present since 2007 with no other reports, which is why I was leaving 
it for the next release.

Cheers,
Matt

Re: a bug detected in dropbear v071

2016-05-11 Thread Thomas De Schampheleire
Hi Matt,

On Wed, May 11, 2016 at 5:33 PM, Matt Johnston  wrote:
> Hi,
>
> I expect the next release will be in perhaps a month's
> time - it could be longer though.

Is there a certain strategy with respect to timing of releases? Could
you describe it?

It seems a long time to me to wait a month before releasing a bug fix
of this type (100% CPU load). Meanwhile we can of course apply your
patch explicitly, but other users may be experiencing the same and may
not be aware of this fix.

Thanks,
Thomas


Re: a bug detected in dropbear v071

2016-05-11 Thread Matt Johnston
Hi,

I expect the next release will be in perhaps a month's
time - it could be longer though.

Cheers,
Matt

On Tue, May 10, 2016 at 08:21:35AM +, ZHANG Hui P wrote:
> Hi,
>  I have verified this commit and it works well. When can we get a formal
>  release that includes this commit?
> Thanks.
> 
> From: Matt Johnston [mailto:m...@ucc.asn.au]
> Sent: 29 April 2016 23:18
> To: ZHANG Hui P
> Cc: dropbear@ucc.asn.au
> Subject: Re: a bug detected in dropbear v071
> 
> Hi,
> 
> I think this problem should be solved by the commit
> https://secure.ucc.asn.au/hg/dropbear/rev/432b0a030fd6
> 
> Thank you for the detailed report.
> 
> Cheers,
> Matt
> 
> 
> On Wed 20/4/2016, at 2:44 pm, ZHANG Hui P
> <hui.p.zh...@alcatel-sbell.com.cn> wrote:
> 
> Hi,
>
> I am a software engineer at Alcatel-Lucent. In our product we use
> dropbear v071 on Linux version 3.4.24. Most of the time it works
> perfectly, but recently we hit a problem: sometimes a dropbear child
> process occupies nearly 100% CPU (we use a single-core ARM1176). After
> investigating, I found that it is caused by a misuse of
> KEX_REKEY_TIMEOUT.
>
> KEX_REKEY_TIMEOUT is defined as 8 hours. That means that when a session
> lasts more than 8 hours, the server and client re-exchange their keys
> for security reasons. The timestamp of the last key exchange is recorded
> in the variable "ses.kexstate.lastkextime".
>
> The child dropbear process determines the "timeout" parameter of the
> "select" function by calling "select_timeout", which checks timeout
> events such as KEX_REKEY_TIMEOUT, AUTH_TIMEOUT and keepalive_secs. If a
> timeout has occurred, the "update_timeout" function returns a negative
> value, and "select_timeout" then clamps it to zero:
>
>     /* clamp negative timeouts to zero - event has already triggered */
>     return MAX(timeout, 0);
>
> If "select_timeout" returns zero, the next "select" call (in
> "session_loop") returns immediately. The loop then checks the timeout
> events:
>
>     /* check for auth timeout, rekeying required etc */
>     checktimeouts();
>
> When "checktimeouts" finds that the timeout has been reached or that too
> much data has been sent, it sends an SSH_MSG_KEXINIT message to the
> peer. Normally this message triggers a new key exchange. However, when a
> network problem prevents the peer from receiving the message, the bug
> occurs: the timestamp ses.kexstate.lastkextime is only updated by the
> call sequence "switch_keys" --> "kexinitialise", and unfortunately that
> sequence is driven by incoming SSH messages, either SSH_MSG_KEXDH_INIT
> or SSH_MSG_NEWKEYS. When no SSH message is received, the child dropbear
> process spins in an endless loop, calling "select" with the zero timeout
> caused by KEX_REKEY_TIMEOUT.
>
> So there is a very simple way to reproduce this bug: first define
> KEX_REKEY_TIMEOUT as small as possible (I set it to 8 seconds), then
> start an SSH session so that the child dropbear process is forked. Then
> unplug the network cable; after 8 seconds the child dropbear process
> will occupy 100% CPU. Could you kindly check it? Thanks.
>
> Best regards
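
The busy loop described in the quoted report can be modelled in a few
lines of C. This is only a rough sketch, not dropbear's code: the struct
kex_state and all four function names are hypothetical stand-ins, and
guarded_select_timeout shows one possible way to avoid the spin, which
may or may not match what commit 432b0a030fd6 actually does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of the session state. */
struct kex_state {
    long lastkextime;  /* time of the last completed key exchange (seconds) */
    bool sentkexinit;  /* true once our KEXINIT is sent and still unanswered */
};

/* Seconds until the rekey deadline, clamped to zero like MAX(timeout, 0). */
static long rekey_select_timeout(const struct kex_state *kex, long now,
                                 long rekey_timeout)
{
    assert(rekey_timeout > 0);
    long remaining = kex->lastkextime + rekey_timeout - now;
    return remaining > 0 ? remaining : 0;
}

/* Models the rekey check: once the deadline has passed, KEXINIT is sent.
 * lastkextime is deliberately left alone, because in the report it only
 * changes when the peer answers. */
static void check_rekey(struct kex_state *kex, long now, long rekey_timeout)
{
    if (!kex->sentkexinit && now - kex->lastkextime >= rekey_timeout) {
        kex->sentkexinit = true;
    }
}

/* Counts how many of `iterations` loop passes would hand select() a zero
 * timeout while the clock stays at `now` and the peer never replies. */
static int zero_timeout_iterations(struct kex_state *kex, long now,
                                   long rekey_timeout, int iterations)
{
    int spins = 0;
    for (int i = 0; i < iterations; i++) {
        if (rekey_select_timeout(kex, now, rekey_timeout) == 0) {
            spins++;  /* select() would return immediately */
        }
        check_rekey(kex, now, rekey_timeout);
    }
    return spins;
}

/* Possible mitigation (an assumption, not the confirmed fix): ignore the
 * rekey deadline while a KEXINIT is outstanding and fall back to some
 * other, non-zero timeout supplied by the caller. */
static long guarded_select_timeout(const struct kex_state *kex, long now,
                                   long rekey_timeout, long fallback)
{
    if (kex->sentkexinit) {
        return fallback;
    }
    return rekey_select_timeout(kex, now, rekey_timeout);
}
```

With lastkextime at 0, an 8 second rekey timeout and the clock at 20,
every pass through zero_timeout_iterations sees a zero timeout, which is
the 100% CPU case from the report. Once sentkexinit is set,
guarded_select_timeout returns the fallback value instead, so the loop
would block in select() again.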


Re: a bug detected in dropbear v071

2016-04-29 Thread Matt Johnston
Hi,

I think this problem should be solved by the commit
https://secure.ucc.asn.au/hg/dropbear/rev/432b0a030fd6 


Thank you for the detailed report.

Cheers,
Matt


> On Wed 20/4/2016, at 2:44 pm, ZHANG Hui P  
> wrote:
> 
> Hi,
>
> I am a software engineer at Alcatel-Lucent. In our product we use
> dropbear v071 on Linux version 3.4.24. Most of the time it works
> perfectly, but recently we hit a problem: sometimes a dropbear child
> process occupies nearly 100% CPU (we use a single-core ARM1176). After
> investigating, I found that it is caused by a misuse of
> KEX_REKEY_TIMEOUT.
>
> KEX_REKEY_TIMEOUT is defined as 8 hours. That means that when a session
> lasts more than 8 hours, the server and client re-exchange their keys
> for security reasons. The timestamp of the last key exchange is recorded
> in the variable "ses.kexstate.lastkextime".
>
> The child dropbear process determines the "timeout" parameter of the
> "select" function by calling "select_timeout", which checks timeout
> events such as KEX_REKEY_TIMEOUT, AUTH_TIMEOUT and keepalive_secs. If a
> timeout has occurred, the "update_timeout" function returns a negative
> value, and "select_timeout" then clamps it to zero:
>
>     /* clamp negative timeouts to zero - event has already triggered */
>     return MAX(timeout, 0);
>
> If "select_timeout" returns zero, the next "select" call (in
> "session_loop") returns immediately. The loop then checks the timeout
> events:
>
>     /* check for auth timeout, rekeying required etc */
>     checktimeouts();
>
> When "checktimeouts" finds that the timeout has been reached or that too
> much data has been sent, it sends an SSH_MSG_KEXINIT message to the
> peer. Normally this message triggers a new key exchange. However, when a
> network problem prevents the peer from receiving the message, the bug
> occurs: the timestamp ses.kexstate.lastkextime is only updated by the
> call sequence "switch_keys" --> "kexinitialise", and unfortunately that
> sequence is driven by incoming SSH messages, either SSH_MSG_KEXDH_INIT
> or SSH_MSG_NEWKEYS. When no SSH message is received, the child dropbear
> process spins in an endless loop, calling "select" with the zero timeout
> caused by KEX_REKEY_TIMEOUT.
>
> So there is a very simple way to reproduce this bug: first define
> KEX_REKEY_TIMEOUT as small as possible (I set it to 8 seconds), then
> start an SSH session so that the child dropbear process is forked. Then
> unplug the network cable; after 8 seconds the child dropbear process
> will occupy 100% CPU. Could you kindly check it? Thanks.
>
> Best regards



Re: a bug detected in dropbear v071

2016-04-26 Thread Matt Johnston
Hi Thomas,

Hui's analysis looks right, I'll try and test it myself later this week. (Sorry, 
replied privately).

Cheers,
Matt

On 25 April 2016 11:15:58 pm AWST, Thomas De Schampheleire 
 wrote:
>ZHANG Hui P  alcatel-sbell.com.cn> writes:
>
>> 
>> 
>> 
>> Hi,
>>
>> I am a software engineer at Alcatel-Lucent. In our product we use
>> dropbear v071 on Linux version 3.4.24. Most of the time it works
>> perfectly, but recently we hit a problem: sometimes a dropbear child
>> process occupies nearly 100% CPU (we use a single-core ARM1176). After
>> investigating, I found that it is caused by a misuse of
>> KEX_REKEY_TIMEOUT.
>>
>> KEX_REKEY_TIMEOUT is defined as 8 hours. That means that when a session
>> lasts more than 8 hours, the server and client re-exchange their keys
>> for security reasons. The timestamp of the last key exchange is recorded
>> in the variable "ses.kexstate.lastkextime".
>>
>> The child dropbear process determines the "timeout" parameter of the
>> "select" function by calling "select_timeout", which checks timeout
>> events such as KEX_REKEY_TIMEOUT, AUTH_TIMEOUT and keepalive_secs. If a
>> timeout has occurred, the "update_timeout" function returns a negative
>> value, and "select_timeout" then clamps it to zero:
>>
>>     /* clamp negative timeouts to zero - event has already triggered */
>>     return MAX(timeout, 0);
>>
>> If "select_timeout" returns zero, the next "select" call (in
>> "session_loop") returns immediately. The loop then checks the timeout
>> events:
>>
>>     /* check for auth timeout, rekeying required etc */
>>     checktimeouts();
>>
>> When "checktimeouts" finds that the timeout has been reached or that too
>> much data has been sent, it sends an SSH_MSG_KEXINIT message to the
>> peer. Normally this message triggers a new key exchange. However, when a
>> network problem prevents the peer from receiving the message, the bug
>> occurs: the timestamp ses.kexstate.lastkextime is only updated by the
>> call sequence "switch_keys" --> "kexinitialise", and unfortunately that
>> sequence is driven by incoming SSH messages, either SSH_MSG_KEXDH_INIT
>> or SSH_MSG_NEWKEYS. When no SSH message is received, the child dropbear
>> process spins in an endless loop, calling "select" with the zero timeout
>> caused by KEX_REKEY_TIMEOUT.
>>
>> So there is a very simple way to reproduce this bug: first define
>> KEX_REKEY_TIMEOUT as small as possible (I set it to 8 seconds), then
>> start an SSH session so that the child dropbear process is forked. Then
>> unplug the network cable; after 8 seconds the child dropbear process
>> will occupy 100% CPU. Could you kindly check it? Thanks.
>>
>
>Any feedback regarding this reported issue?
>
>Thanks,
>Thomas



Re: a bug detected in dropbear v071

2016-04-25 Thread Thomas De Schampheleire
ZHANG Hui P  alcatel-sbell.com.cn> writes:

> 
> 
> 
> Hi,
>
> I am a software engineer at Alcatel-Lucent. In our product we use
> dropbear v071 on Linux version 3.4.24. Most of the time it works
> perfectly, but recently we hit a problem: sometimes a dropbear child
> process occupies nearly 100% CPU (we use a single-core ARM1176). After
> investigating, I found that it is caused by a misuse of
> KEX_REKEY_TIMEOUT.
>
> KEX_REKEY_TIMEOUT is defined as 8 hours. That means that when a session
> lasts more than 8 hours, the server and client re-exchange their keys
> for security reasons. The timestamp of the last key exchange is recorded
> in the variable "ses.kexstate.lastkextime".
>
> The child dropbear process determines the "timeout" parameter of the
> "select" function by calling "select_timeout", which checks timeout
> events such as KEX_REKEY_TIMEOUT, AUTH_TIMEOUT and keepalive_secs. If a
> timeout has occurred, the "update_timeout" function returns a negative
> value, and "select_timeout" then clamps it to zero:
>
>     /* clamp negative timeouts to zero - event has already triggered */
>     return MAX(timeout, 0);
>
> If "select_timeout" returns zero, the next "select" call (in
> "session_loop") returns immediately. The loop then checks the timeout
> events:
>
>     /* check for auth timeout, rekeying required etc */
>     checktimeouts();
>
> When "checktimeouts" finds that the timeout has been reached or that too
> much data has been sent, it sends an SSH_MSG_KEXINIT message to the
> peer. Normally this message triggers a new key exchange. However, when a
> network problem prevents the peer from receiving the message, the bug
> occurs: the timestamp ses.kexstate.lastkextime is only updated by the
> call sequence "switch_keys" --> "kexinitialise", and unfortunately that
> sequence is driven by incoming SSH messages, either SSH_MSG_KEXDH_INIT
> or SSH_MSG_NEWKEYS. When no SSH message is received, the child dropbear
> process spins in an endless loop, calling "select" with the zero timeout
> caused by KEX_REKEY_TIMEOUT.
>
> So there is a very simple way to reproduce this bug: first define
> KEX_REKEY_TIMEOUT as small as possible (I set it to 8 seconds), then
> start an SSH session so that the child dropbear process is forked. Then
> unplug the network cable; after 8 seconds the child dropbear process
> will occupy 100% CPU. Could you kindly check it? Thanks.
>

Any feedback regarding this reported issue?

Thanks,
Thomas