Re: embedded dropbear (more)...

2013-04-16 Thread Ed Sutter

Fabrizio,
Don't ignore CPU horsepower needs.
Ed

Hmm interesting... now, 77K is kind of 'at reach'...
It depends on the chip I finalize the project with, but probably
with some help from external RAM and flash I might give it a shot.

Thanks a lot for your reports!
Fabrizio


On Mon, Apr 15, 2013 at 5:30 PM, Ed Sutter ed.sut...@alcatel-lucent.com wrote:


One correction...
I realized that I reported my high-water mark with my allocator
in 'trace' mode.  This significantly screws up the allocation sizes in
runtime.  After rebuilding with that turned off, the high-water mark
that I get is around 77K.

Hi,
Just to put a few things in perspective regarding the likelihood of this
working in a really small embedded system...

Regarding memory...
It really depends on just how small you need to be...

One session looks like it uses upwards of 2100 malloc calls. Long-term
fragmentation from one session to the next is not an issue simply because I
have a dedicated heap, which I flush at the end of each session; however,
my heap analytics show that the high-water level is under 200K of heap.
I'm hoping that some of these allocations can be replaced with stack-based
arrays, but I haven't looked into that much yet.

Regarding speed...
No real data here, other than to say that I'm on a ~450 MHz PowerPC (no FPU)
and it seems to be fine.


Ed








Re: embedded dropbear...

2013-04-16 Thread Matt Johnston
Hi,

I'm pretty sure there'd be interest in such a port, even if
there are no immediate takers. I guess it depends how much
effort you want to put in - a separate tarball (or hg
branch, for ease of merging future versions) might be enough for other
people to get going. It doesn't sound like the changes would
be _too_ intrusive, so could probably live in the main tree.
One concern would be avoiding it breaking from other changes
- would it be easy enough to build the embedded variant
targeting a normal Linux-type platform?

A few comments inline below.

   no fork(), no exec(), no pipes, etc..  This includes no use of
 fprintf(...) as well.

Missing fprintf() might make the code a bit messier - did
you encounter other uses than for logging/error messages?

 - I loosely base this on the no-inetd option; and I
 heavily chopped away at things in options.h (hopefully
 without breaking anything).
 - Since there is no shell, this simply hooks to an internal command
 line processor.
 - Currently the server is built to run as if the following command
 line were invoked:
   ./dropbear -s -F -b yada yada -r dropbear_rsa_host_key
   and since I do have an FS, I created the dropbear_rsa_host_key
 file using dropbearkey on my host machine, and simply
 copied it to my embedded system's FS for now.  The need
 for the FS could easily be eliminated.

A good source of random values is pretty important for
SSH security. If there are say 16 bytes of good random
values written at manufacturing, that could be read in as
input then saved out at Dropbear startup and occasionally
during operation (reusing the same seed twice would be very
bad). The write_urandom() would work for writing a value
back. A flash write per boot isn't great, but hard to see a
better way without random number generation hardware.
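
The ordering described above (read the stored seed, write a fresh one back, and only then use anything derived from the old one) can be sketched roughly as follows. This is a minimal illustration, not Dropbear code: seed_ops, seed_rollover(), ram_load(), ram_store() and toy_derive() are hypothetical names, the RAM-backed storage stands in for flash, and toy_derive() is a placeholder with no security value that a real port would replace with a cryptographic hash.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SEED_LEN 16

/* Hypothetical hooks supplied by the port: persistent storage plus a
 * one-way derivation function (a real port would use a proper hash). */
typedef struct {
    int  (*load)(unsigned char seed[SEED_LEN], void *arg);
    int  (*store)(const unsigned char seed[SEED_LEN], void *arg);
    void (*derive)(const unsigned char in[SEED_LEN], unsigned char tag,
                   unsigned char out[SEED_LEN]);
    void *arg;
} seed_ops;

/* Read the stored seed, persist its successor, then hand out a value
 * derived from the old seed. Because the successor is written before the
 * session seed is returned, a reboot never reuses the same stored seed. */
int seed_rollover(const seed_ops *ops, unsigned char session_seed[SEED_LEN])
{
    unsigned char stored[SEED_LEN];
    unsigned char next[SEED_LEN];

    assert(ops != NULL && session_seed != NULL);
    if (ops->load(stored, ops->arg) != 0)
        return -1;
    ops->derive(stored, 0x01, next);          /* tag 1: next stored seed */
    if (ops->store(next, ops->arg) != 0)
        return -1;                            /* do not use an unsaved seed */
    ops->derive(stored, 0x02, session_seed);  /* tag 2: this boot's seed */
    memset(stored, 0, sizeof stored);
    memset(next, 0, sizeof next);
    return 0;
}

/* Demo storage: 'arg' points at SEED_LEN bytes of RAM standing in for flash. */
int ram_load(unsigned char seed[SEED_LEN], void *arg)
{
    memcpy(seed, arg, SEED_LEN);
    return 0;
}

int ram_store(const unsigned char seed[SEED_LEN], void *arg)
{
    memcpy(arg, seed, SEED_LEN);
    return 0;
}

/* PLACEHOLDER ONLY: trivially invertible, NOT cryptographic. It exists so
 * the sketch runs without pulling in a hash implementation. */
void toy_derive(const unsigned char in[SEED_LEN], unsigned char tag,
                unsigned char out[SEED_LEN])
{
    size_t i;
    for (i = 0; i < SEED_LEN; i++)
        out[i] = (unsigned char)(in[i] ^ tag ^ (unsigned char)(i * 31u));
}
```

Calling seed_rollover() twice against the same storage yields two different session seeds, because the stored value has already moved on by the second call. Any hardware entropy that is available would be mixed in on top of this.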

 DETAILS:
 My build puts the two math directories into a library, and
 then builds the server using portions of ~25 of the ~65 .c
 files that are in the main dropbear directory.

Did you have to change much in the libtom libraries? I'm
planning to merge in tomsfastmath support (using the ltc_mp
descriptor to keep libtommath working as a fallback), that
might help performance as well by reducing malloc()s.

 I simulate interaction with a shell by intercepting incoming
 characters in common_recv_msg_channel_data().  Each line
 of text is simply passed to a command line parser.  While
 that command line is being processed, all output from that
 embedded command is sent through the function
 ssh_putchar():
 
 static void
 ssh_putcharc(struct Channel *channel, char c)
 {
     CHECKCLEARTOWRITE();
     buf_putbyte(ses.writepayload, SSH_MSG_CHANNEL_DATA);
     buf_putint(ses.writepayload, channel->remotechan); /* recipient channel */
     buf_putint(ses.writepayload, 1);                   /* data length: one byte */
     buf_putbyte(ses.writepayload, c);
     encrypt_packet();
 }

You may as well write out at least 12 bytes at once (I
think), since encrypt_packet pads out to MIN_PACKET_LEN (=16) 
with at least 4 bytes of padding.
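
One way to batch output along these lines is a small line buffer in front of the packet writer. The sketch below is only an illustration under assumptions: out_buf, out_putchar(), out_flush(), out_count() and the 64-byte size are hypothetical, and the flush callback stands in for whatever builds a single SSH_MSG_CHANNEL_DATA packet from the buffered bytes.

```c
#include <assert.h>
#include <stddef.h>

#define OUT_BUF_SIZE 64   /* arbitrary size chosen for the sketch */

/* Callback that sends 'len' bytes as one packet (hypothetical hook). */
typedef void (*out_flush_fn)(const char *data, size_t len, void *arg);

typedef struct {
    char         data[OUT_BUF_SIZE];
    size_t       len;
    out_flush_fn flush;
    void        *arg;
} out_buf;

void out_init(out_buf *b, out_flush_fn flush, void *arg)
{
    assert(b != NULL && flush != NULL);
    b->len = 0;
    b->flush = flush;
    b->arg = arg;
}

/* Send whatever is buffered as a single packet. */
void out_flush(out_buf *b)
{
    if (b->len > 0) {
        b->flush(b->data, b->len, b->arg);
        b->len = 0;
    }
}

/* Buffer one character; flush on newline or when the buffer is full. */
void out_putchar(out_buf *b, char c)
{
    b->data[b->len++] = c;
    if (c == '\n' || b->len == OUT_BUF_SIZE)
        out_flush(b);
}

/* Demo callback: counts packets and bytes instead of sending anything. */
typedef struct {
    int    packets;
    size_t bytes;
} out_stats;

void out_count(const char *data, size_t len, void *arg)
{
    out_stats *st = arg;
    (void)data;
    st->packets++;
    st->bytes += len;
}
```

The command processor would call out_putchar() for each output character and out_flush() once when the command finishes, so a prompt without a trailing newline still goes out.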

 and one other important thing...
 At the bottom of encrypt_packet(), I call write_packet() so that the data
 is immediately pushed out the socket.

That sounds fine.

 SUMMARY:
 That's about it in a nutshell.  The two big gotchas with this were issues
 that would not necessarily be important in a process-based environment:
 
 1. The use of dropbear_exit() for errors requires the use of setjmp/longjmp
    because it's in a thread that needs to clean up properly.
 2. The heap is clean when exits are clean; but things get messy in a lot of
    exception cases; hence, the need for a dropbear-specific heap which
    allows me to force a clean heap when the session ends (simulating the
    cleanup that is automatically done when the process exits).

It'd need a close look over releasing any resources other
than memory allocations, though there probably aren't many
things. libtom* might make use of some static variables.
Dropbear has a small number that can be fixed.

Cheers,
Matt


Re: embedded dropbear...

2013-04-16 Thread Ed Sutter

Matt,
Answers embedded...
Ed

Hi,

I'm pretty sure there'd be interest in such a port, even if
there are no immediate takers. I guess it depends how much
effort you want to put in - a separate tarball (or hg
branch, for ease of merging future versions) might be enough for other
people to get going. It doesn't sound like the changes would
be _too_ intrusive, so could probably live in the main tree.
One concern would be avoiding it breaking from other changes
- would it be easy enough to build the embedded variant
targeting a normal Linux-type platform?

Yea, that shouldn't be too hard to do.  The first level of change would be to
use threads instead of fork, and also not spawn a shell, just interface to a
dumb command interpreter that users could replace with something
project-specific.  My plan is to use this in my uCon tool, so I'll have to
go down this path anyway, so maybe that would be the first step toward
integrating.

A few comments inline below.

   no fork(), no exec(), no pipes, etc..  This includes no use of
fprintf(...) as well.

Missing fprintf() might make the code a bit messier - did
you encounter other uses than for logging/error messages?

No, mostly just error messages...



- I loosely base this on the no-inetd option; and I
heavily chopped away at things in options.h (hopefully
without breaking anything).
- Since there is no shell, this simply hooks to an internal command
line processor.
- Currently the server is built to run as if the following command
line were invoked:
   ./dropbear -s -F -b yada yada -r dropbear_rsa_host_key
   and since I do have an FS, I created the dropbear_rsa_host_key
file using dropbearkey on my host machine, and simply
copied it to my embedded system's FS for now.  The need
for the FS could easily be eliminated.

A good source of random values is pretty important for
SSH security. If there are say 16 bytes of good random
values written at manufacturing, that could be read in as
input then saved out at Dropbear startup and occasionally
during operation (reusing the same seed twice would be very
bad). The write_urandom() would work for writing a value
back. A flash write per boot isn't great, but hard to see a
better way without random number generation hardware.

Yea I have a few system-specific ways to get a reasonably random value
out of my hardware, so that's not a problem; however, it's also not portable.

DETAILS:
My build puts the two math directories into a library, and
then builds the server using portions of ~25 of the ~65 .c
files that are in the main dropbear directory.

Did you have to change much in the libtom libraries? I'm
planning to merge in tomsfastmath support (using the ltc_mp
descriptor to keep libtommath working as a fallback), that
might help performance as well by reducing malloc()s.

I don't think I changed anything there except for the malloc defines.
For all the dropbear code (I include libtom stuff in this) I replaced all uses
of malloc (m_malloc, malloc, XMALLOC) with DB_MALLOC (same applies
to calloc/realloc/free) throughout.  I had to do this so that I could easily
redefine all calls to malloc to pass __FILE__ and __LINE__ for debugging.
The point here is that it would be nice if ALL use of malloc used the same
name.
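
A single allocation name that can be switched to a tracing variant could be set up along these lines. This is a sketch under assumptions rather than the macros used in this port: DB_HEAP_DEBUG, db_malloc_dbg(), db_free_dbg(), DB_FREE and the db_last_* variables are hypothetical; only the DB_MALLOC name comes from the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Normally set by the build system; defined here so the sketch is complete. */
#define DB_HEAP_DEBUG 1

/* Most recent call site, kept only so there is something to inspect. */
const char *db_last_file = NULL;
int         db_last_line = 0;

/* Tracing allocator: records who asked, then defers to malloc(). A real
 * port would call its dedicated heap here instead. */
void *db_malloc_dbg(size_t n, const char *file, int line)
{
    db_last_file = file;
    db_last_line = line;
    return malloc(n);
}

void db_free_dbg(void *p, const char *file, int line)
{
    db_last_file = file;
    db_last_line = line;
    free(p);
}

#ifdef DB_HEAP_DEBUG
/* Every call site passes its own location without being edited again. */
#define DB_MALLOC(n) db_malloc_dbg((n), __FILE__, __LINE__)
#define DB_FREE(p)   db_free_dbg((p), __FILE__, __LINE__)
#else
#define DB_MALLOC(n) malloc(n)
#define DB_FREE(p)   free(p)
#endif
```

With every allocation in the tree going through one name, switching between the plain and the tracing build is a one-line change in a header.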




I simulate interaction with a shell by intercepting incoming
characters in common_recv_msg_channel_data().  Each line
of text is simply passed to a command line parser.  While
that command line is being processed, all output from that
embedded command is sent through the function
ssh_putchar():

static void
ssh_putcharc(struct Channel *channel, char c)
{
    CHECKCLEARTOWRITE();
    buf_putbyte(ses.writepayload, SSH_MSG_CHANNEL_DATA);
    buf_putint(ses.writepayload, channel->remotechan); /* recipient channel */
    buf_putint(ses.writepayload, 1);                   /* data length: one byte */
    buf_putbyte(ses.writepayload, c);
    encrypt_packet();
}

You may as well write out at least 12 bytes at once (I
think), since encrypt_packet pads out to MIN_PACKET_LEN (=16)
with at least 4 bytes of padding.

Yea, understood, I buffer up when I know I can.  This is just worst case.
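
For a sense of what the worst case costs, the packet size can be worked out from the framing rules in RFC 4253 section 6: a 4-byte length, a 1-byte padding length, the payload, and at least 4 bytes of padding, rounded up to the cipher block size. The helper names below are hypothetical, and the sketch ignores the MAC, compression, and cipher modes that leave the length field out of the padded region.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes on the wire (excluding the MAC) for one packet, following the
 * framing in RFC 4253 section 6. block_size is the cipher block size,
 * with 8 as the minimum the RFC allows. */
size_t ssh_packet_size(size_t payload_len, size_t block_size)
{
    size_t total;

    assert(block_size >= 8);
    total = 4 + 1 + payload_len;  /* packet_length + padding_length + payload */
    total += 4;                   /* at least four bytes of padding */
    total = (total + block_size - 1) / block_size * block_size;
    if (total < 16)
        total = 16;               /* minimum packet size */
    return total;
}

/* Payload of SSH_MSG_CHANNEL_DATA carrying n data bytes:
 * message type (1) + recipient channel (4) + data length (4) + data (n). */
size_t channel_data_payload(size_t n)
{
    return 1 + 4 + 4 + n;
}
```

By this arithmetic, with a 16-byte block a single character already produces a 32-byte packet, and so does any write of up to 14 characters; the 15th pushes it to 48. Buffering a short line therefore costs nothing extra on the wire.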



and one other important thing...
At the bottom of encrypt_packet(), I call write_packet() so that the data
is immediately pushed out the socket.

That sounds fine.


SUMMARY:
That's about it in a nutshell.  The two big gotchas with this were issues
that would not necessarily be important in a process-based environment:

1. The use of dropbear_exit() for errors requires the use of setjmp/longjmp
   because it's in a thread that needs to clean up properly.
2. The heap is clean when exits are clean; but things get messy in a lot of
   exception cases; hence, the need for a dropbear-specific heap which
   allows me to force a clean heap when the session ends (simulating the
   cleanup that is automatically done when the process exits).

It'd need a close look over releasing any resources other
than memory allocations, though there probably aren't many
things. libtom* might make use of some static 

Re: embedded dropbear (more)...

2013-04-15 Thread Ed Sutter

One correction...
I realized that I reported my high-water mark with my allocator
in 'trace' mode.  This significantly screws up the allocation sizes in
runtime.  After rebuilding with that turned off, the high-water mark
that I get is around 77K.

Hi,
Just to put a few things in perspective regarding the likelihood of this
working in a really small embedded system...

Regarding memory...
It really depends on just how small you need to be...

One session looks like it uses upwards of 2100 malloc calls. Long-term
fragmentation from one session to the next is not an issue simply because I
have a dedicated heap, which I flush at the end of each session; however,
my heap analytics show that the high-water level is under 200K of heap.
I'm hoping that some of these allocations can be replaced with stack-based
arrays, but I haven't looked into that much yet.
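
A per-session heap with a high-water mark and an end-of-session flush can be sketched as below. This is not the allocator described above: session_heap, sess_malloc(), sess_free() and sess_flush() are hypothetical names, and the sketch layers its bookkeeping on the C library malloc() where a real embedded port would carve blocks out of a dedicated memory region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every block so the session can find it later.
 * The union forces the payload that follows to be suitably aligned. */
typedef union blk_hdr {
    struct {
        union blk_hdr *prev;
        union blk_hdr *next;
        size_t         size;   /* payload bytes requested */
    } s;
    long double align_ld;
    long long   align_ll;
    void       *align_p;
} blk_hdr;

typedef struct {
    blk_hdr *head;        /* list of blocks still allocated */
    size_t   live;        /* payload bytes currently allocated */
    size_t   high_water;  /* largest value 'live' has reached */
} session_heap;

void *sess_malloc(session_heap *h, size_t n)
{
    blk_hdr *b;

    assert(h != NULL);
    if (n > (size_t)-1 - sizeof *b)
        return NULL;                       /* size would overflow */
    b = malloc(sizeof *b + n);
    if (b == NULL)
        return NULL;
    b->s.size = n;
    b->s.prev = NULL;
    b->s.next = h->head;
    if (h->head != NULL)
        h->head->s.prev = b;
    h->head = b;
    h->live += n;
    if (h->live > h->high_water)
        h->high_water = h->live;
    return b + 1;                          /* payload follows the header */
}

void sess_free(session_heap *h, void *p)
{
    blk_hdr *b;

    if (p == NULL)
        return;
    b = (blk_hdr *)p - 1;
    if (b->s.prev != NULL)
        b->s.prev->s.next = b->s.next;
    else
        h->head = b->s.next;
    if (b->s.next != NULL)
        b->s.next->s.prev = b->s.prev;
    h->live -= b->s.size;
    free(b);
}

/* Release everything the session left behind; returns the bytes reclaimed. */
size_t sess_flush(session_heap *h)
{
    size_t leaked = 0;

    while (h->head != NULL) {
        blk_hdr *b = h->head;
        h->head = b->s.next;
        leaked += b->s.size;
        free(b);
    }
    h->live = 0;
    return leaked;
}
```

The value returned by sess_flush() is a cheap leak report: zero after a clean exit, non-zero after the messy exception paths.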

Regarding speed...
No real data here, other than to say that I'm on a ~450 MHz PowerPC (no FPU)
and it seems to be fine.


Ed






Re: embedded dropbear

2013-04-12 Thread Ed Sutter

Great explanation Rob,
Thanks much..
Ed

On 04/11/2013 04:56:54 PM, Ed Sutter wrote:

Hi,
I managed to get dropbear-ssh running under a uC/OS-II thread.
Obviously had to do a lot of hacking to make this work, and
I'm sure it's not the most efficient way of doing it.

Not being an ssh/cryptography wizard by any stretch of the
imagination, I have two questions that may be trivial...

1..
Because I'm not on a Unix-ish system, I don't have any of
the /proc stuff or /dev/urandom for seedrandom().  Is it
essential that this function have *that* much random
input?  How does this affect the security of the connection?


If you can predict the random seed, you can decrypt the entire 
connection. All the other cryptography is based on exchanging 
unguessable numbers in both directions.


Public key cryptography does a one-way mathematical trick on a really 
big number to split it into two smaller numbers, so that each one is 
the antidote to the other's poison. You scramble a message with one, 
you need the OTHER to unscramble it. (You can't undo it with the one 
that created it, that's the clever bit.)


You keep one of this pair of numbers secret (doesn't matter which, 
they're symmetrical) as your private key and give the other out as 
your public key. Anybody can use your public key to send you a message 
which only you can read with your private key. And anyone can read 
messages you send with your private key but only you could have sent 
them. So in one direction it provides authentication, in the other it 
provides privacy.


When you want a bidirectional connection providing both, each side 
produces a pair of keys (four keys total), then exchanges one and
keeps the other. Then you encrypt each packet TWICE, with your private 
key and with the other guy's public key. At the other end they decrypt 
with your public key (so the message could only have been created with 
your private key, so it came from you) and their own private key (so 
only they can read it). Doesn't matter what order the two 
encryption/decryptions occur in as long as both sides agree.


Public key cryptography is really computationally expensive (I.E. 
slow) so what they do is exchange symmetrical keys with it, which are 
another unguessable secret number that's much faster to use, but which 
requires both sides to know the _same_ unguessable secret. (The poison 
is its own antidote, the key that encrypted is also the key that 
undoes it. Simpler/faster math that way, but it means you need an 
established relationship to use it.)


The rest of the connection is then encrypted with the symmetric keys. 
(Well, it generates and exchanges fresh symmetric keys every once in a 
while so that listeners won't have TOO much of the same kind of data 
to try various clever attacks with to guess that key.) The public key 
cryptography is just used to establish and verify the connection at 
the start.



2..
I essentially hard-coded the -r option (ssh server) to use a
pre-established rsa_host_key file.  Should this file be built
once for a given system, and then reused or is this something
that should be recreated each time the server is started?


The host key uniquely identifies the host. It's what gives you the
"host key has changed!" warning when somebody reinstalls the server.


Otherwise, anybody could intercept the connection, insert their own
ssh server, have you log into it, forward the credentials you use to
the other server to log into that, and pass data through in both
directions while logging all of it. This is called a "man in the
middle" attack, and you prevent it by giving each server a unique way
to identify itself that only it knows. (Basically you encrypt a packet
at it using its public key, and it decrypts it using its private key 
and sends back the correct response based on its contents.)


And that's cryptography 101. :)

Rob




Re: embedded dropbear

2013-04-11 Thread Rob Landley

On 04/11/2013 04:56:54 PM, Ed Sutter wrote:

Hi,
I managed to get dropbear-ssh running under a uC/OS-II thread.
Obviously had to do a lot of hacking to make this work, and
I'm sure it's not the most efficient way of doing it.

Not being an ssh/cryptography wizard by any stretch of the
imagination, I have two questions that may be trivial...

1..
Because I'm not on a Unix-ish system, I don't have any of
the /proc stuff or /dev/urandom for seedrandom().  Is it
essential that this function have *that* much random
input?  How does this affect the security of the connection?


If you can predict the random seed, you can decrypt the entire  
connection. All the other cryptography is based on exchanging  
unguessable numbers in both directions.


Public key cryptography does a one-way mathematical trick on a really  
big number to split it into two smaller numbers, so that each one is  
the antidote to the other's poison. You scramble a message with one,  
you need the OTHER to unscramble it. (You can't undo it with the one  
that created it, that's the clever bit.)


You keep one of this pair of numbers secret (doesn't matter which,  
they're symmetrical) as your private key and give the other out as your  
public key. Anybody can use your public key to send you a message which  
only you can read with your private key. And anyone can read messages  
you send with your private key but only you could have sent them. So in  
one direction it provides authentication, in the other it provides  
privacy.


When you want a bidirectional connection providing both, each side  
produces a pair of keys (four keys total), then exchanges one and
keeps the other. Then you encrypt each packet TWICE, with your private  
key and with the other guy's public key. At the other end they decrypt  
with your public key (so the message could only have been created with  
your private key, so it came from you) and their own private key (so  
only they can read it). Doesn't matter what order the two  
encryption/decryptions occur in as long as both sides agree.


Public key cryptography is really computationally expensive (I.E. slow)  
so what they do is exchange symmetrical keys with it, which are another  
unguessable secret number that's much faster to use, but which requires  
both sides to know the _same_ unguessable secret. (The poison is its  
own antidote, the key that encrypted is also the key that undoes it.  
Simpler/faster math that way, but it means you need an established  
relationship to use it.)


The rest of the connection is then encrypted with the symmetric keys.  
(Well, it generates and exchanges fresh symmetric keys every once in a  
while so that listeners won't have TOO much of the same kind of data to  
try various clever attacks with to guess that key.) The public key  
cryptography is just used to establish and verify the connection at the  
start.



2..
I essentially hard-coded the -r option (ssh server) to use a
pre-established rsa_host_key file.  Should this file be built
once for a given system, and then reused or is this something
that should be recreated each time the server is started?


The host key uniquely identifies the host. It's what gives you the
"host key has changed!" warning when somebody reinstalls the server.


Otherwise, anybody could intercept the connection, insert their own ssh
server, have you log into it, forward the credentials you use to the
other server to log into that, and pass data through in both directions
while logging all of it. This is called a "man in the middle" attack,
and you prevent it by giving each server a unique way to identify
itself that only it knows. (Basically you encrypt a packet at it using
its public key, and it decrypts it using its private key and sends back  
the correct response based on its contents.)


And that's cryptography 101. :)

Rob