Re: embedded dropbear (more)...
Fabrizio,

Don't ignore CPU horsepower needs.

Ed

> Hmm, interesting... now, 77K is kind of 'within reach'...
> Depending on the chip I finalize the project with, and probably with some help from some external RAM/flash, I might give it a shot.
> Thanks a lot for your reports!
> Fabrizio
>
> On Mon, Apr 15, 2013 at 5:30 PM, Ed Sutter <ed.sut...@alcatel-lucent.com> wrote:
>
>> One correction...
>> I realized that I reported my high-water mark with my allocator in 'trace' mode. This significantly inflates the allocation sizes at runtime. After rebuilding with that turned off, the high-water mark that I get is around 77K.
>>
>>> Hi,
>>> Just to put a few things in perspective regarding the likelihood of this working in a really small embedded system...
>>>
>>> Regarding memory...
>>> It really depends on just how small you need to be... One session looks like it uses upwards of 2100 malloc calls. Long-term fragmentation from one session to the next is not an issue, simply because I have a dedicated heap which I flush at the end of each session; however, my heap analytics show that the high-water level is under 200K of heap. I'm hoping that some of these allocations can be replaced with stack-based arrays, but I haven't looked into that much yet.
>>>
>>> Regarding speed...
>>> No real data here, other than to say that I'm on a ~450 MHz PowerPC (no FPU) and it seems to be fine.
>>>
>>> Ed
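The dedicated heap that gets flushed at the end of each session could look roughly like the sketch below. This is not Ed's allocator (which isn't shown in the thread): it is a plain bump allocator with no per-block free, so its high-water mark counts every byte requested during a session rather than peak live bytes, and a real port would also need free/realloc. The names `sess_malloc`, `sess_heap_flush`, `sess_heap_high_water` and the 96K `SESS_HEAP_SIZE` are made up for illustration, and the alignment handling assumes a C11 compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-session heap; size is an assumed budget, not a measured one. */
#define SESS_HEAP_SIZE (96 * 1024)

static _Alignas(max_align_t) unsigned char sess_heap[SESS_HEAP_SIZE];
static size_t sess_used;        /* bytes handed out in the current session */
static size_t sess_high_water;  /* largest sess_used seen since boot */

/* Hand out the next aligned chunk, or NULL when the session heap is full. */
void *sess_malloc(size_t n)
{
    const size_t align = _Alignof(max_align_t);   /* always a power of two */
    size_t need;
    void *p;

    if (n > SIZE_MAX - (align - 1))
        return NULL;                              /* rounding would overflow */
    need = (n + align - 1) & ~(align - 1);        /* round up to alignment */
    if (need > SESS_HEAP_SIZE - sess_used)
        return NULL;

    p = &sess_heap[sess_used];
    sess_used += need;
    if (sess_used > sess_high_water)
        sess_high_water = sess_used;
    return p;
}

/* Drop everything at once, like the cleanup a process exit gives for free. */
void sess_heap_flush(void)
{
    sess_used = 0;
}

size_t sess_heap_high_water(void)
{
    return sess_high_water;
}
```

A port along these lines would call `sess_heap_flush()` once when the session thread finishes, whether the session ended cleanly or not.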
Re: embedded dropbear...
Hi,

I'm pretty sure there'd be interest in such a port, even if there are no immediate takers. I guess it depends how much effort you want to put in - a separate tarball (or hg branch, for ease of merging future versions) might be enough for other people to get going. It doesn't sound like the changes would be _too_ intrusive, so it could probably live in the main tree. One concern would be avoiding it breaking from other changes - would it be easy enough to build the embedded variant targeting a normal Linux-type platform?

A few comments inline below.

> no fork(), no exec(), no pipes, etc. This includes no use of fprintf(...) as well.

Missing fprintf() might make the code a bit messier - did you encounter other uses than for logging/error messages?

> - I loosely based this on the no-inetd option; and I heavily chopped away at things in options.h (hopefully without breaking anything).
> - Since there is no shell, this simply hooks to an internal command line processor.
> - Currently the server is built to run as if the following command line were invoked:
>
>     ./dropbear -s -F -b yada yada -r dropbear_rsa_host_key
>
>   and since I do have an FS, I created the dropbear_rsa_host_key file using dropbearkey on my host machine, and simply copied it to my embedded system's FS for now. The need for the FS could easily be eliminated.

A good source of random values is pretty important for SSH security. If there are, say, 16 bytes of good random values written at manufacturing, that could be read in as input, then saved out at Dropbear startup and occasionally during operation (reusing the same seed twice would be very bad). The write_urandom() function would work for writing a value back. A flash write per boot isn't great, but it's hard to see a better way without random number generation hardware.

> DETAILS:
> My build puts the two math directories into a library, and then builds the server using portions of ~25 of the ~65 .c files that are in the main dropbear directory.

Did you have to change much in the libtom libraries? I'm planning to merge in tomsfastmath support (using the ltc_mp descriptor to keep libtommath working as a fallback); that might help performance as well by reducing malloc()s.

> I simulate interaction with a shell by intercepting incoming characters in common_recv_msg_channel_data(). Each line of text is simply passed to a command line parser. While that command line is being processed, all output from that embedded command is sent through the function ssh_putchar():
>
>     static void
>     ssh_putcharc(struct Channel *channel, char c)
>     {
>         CHECKCLEARTOWRITE();
>         buf_putbyte(ses.writepayload, SSH_MSG_CHANNEL_DATA);
>         buf_putint(ses.writepayload, channel->remotechan);
>         buf_putint(ses.writepayload, 1);
>         buf_putbyte(ses.writepayload, c);
>         encrypt_packet();
>     }

You may as well write out at least 12 bytes at once (I think), since encrypt_packet pads out to MIN_PACKET_LEN (=16) with at least 4 bytes of padding.

> and one other important thing...
> At the bottom of encrypt_packet(), I call write_packet() so that the data is immediately pushed out the socket.

That sounds fine.

> SUMMARY:
> That's about it in a nutshell. The two big gotchas with this were issues that would not necessarily be important in a process-based environment:
> 1. The use of dropbear_exit() for errors requires the use of setjmp/longjmp because it's in a thread that needs to clean up properly.
> 2. The heap is clean when exits are clean; but things get messy in a lot of exception cases; hence, the need for a dropbear-specific heap which allows me to force a clean heap when the session ends (simulating the cleanup that is automatically done when the process exits).

It'd need a close look over releasing any resources other than memory allocations, though there probably aren't many things. libtom* might make use of some static variables. Dropbear has a small number that can be fixed.

Cheers,
Matt
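The first gotcha in the summary (dropbear_exit() inside a thread, handled with setjmp/longjmp) can be sketched as follows. This is only a guess at the general shape, not the code from Ed's port: `session_exit()` stands in for dropbear_exit(), and `run_session()`, `session_body()` and `cleanup_count` are hypothetical names. A real port would keep one jmp_buf per session thread instead of a single static one.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf session_exit_env;        /* assumed: one per session thread */
static volatile int session_exit_code;  /* set just before jumping back */
int cleanup_count;                      /* counts cleanup passes, for illustration */

/* Stand-in for dropbear_exit(): unwind to run_session() instead of exit(). */
static void session_exit(int code)
{
    session_exit_code = code;
    longjmp(session_exit_env, 1);
}

/* Placeholder for the real session loop; 'fail' simulates a fatal error. */
static void session_body(int fail)
{
    if (fail)
        session_exit(3);
    /* ... normal session work would go here ... */
}

/* Thread entry point: runs one session and always reaches the cleanup. */
int run_session(int fail)
{
    session_exit_code = 0;
    if (setjmp(session_exit_env) == 0) {
        /* Direct return from setjmp(): run the session. */
        session_body(fail);
    }
    /* Reached on a clean return and after longjmp() alike:
     * close the socket, flush the session heap, and so on. */
    cleanup_count++;
    return session_exit_code;
}
```

Calling setjmp() only as the controlling expression of the `if` keeps the sketch within the contexts the C standard permits, which is why the exit code travels through a separate variable rather than through setjmp()'s return value.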
Re: embedded dropbear...
Matt,

Answers embedded...

Ed

> Hi,
> I'm pretty sure there'd be interest in such a port, even if there are no immediate takers. I guess it depends how much effort you want to put in - a separate tarball (or hg branch, for ease of merging future versions) might be enough for other people to get going. It doesn't sound like the changes would be _too_ intrusive, so it could probably live in the main tree. One concern would be avoiding it breaking from other changes - would it be easy enough to build the embedded variant targeting a normal Linux-type platform?

Yea, that shouldn't be too hard to do. The first level of change would be to use threads instead of fork, and also not spawn a shell, just interface to a dumb command interpreter that users could replace with something project specific. My plan is to use this in my uCon tool, so I'll have to go down this path anyway, so maybe that would be the first step toward integrating.

> A few comments inline below.
>
>> no fork(), no exec(), no pipes, etc. This includes no use of fprintf(...) as well.
>
> Missing fprintf() might make the code a bit messier - did you encounter other uses than for logging/error messages?

No, mostly just error messages...

>> - I loosely based this on the no-inetd option; and I heavily chopped away at things in options.h (hopefully without breaking anything).
>> - Since there is no shell, this simply hooks to an internal command line processor.
>> - Currently the server is built to run as if the following command line were invoked:
>>
>>     ./dropbear -s -F -b yada yada -r dropbear_rsa_host_key
>>
>>   and since I do have an FS, I created the dropbear_rsa_host_key file using dropbearkey on my host machine, and simply copied it to my embedded system's FS for now. The need for the FS could easily be eliminated.
>
> A good source of random values is pretty important for SSH security. If there are, say, 16 bytes of good random values written at manufacturing, that could be read in as input, then saved out at Dropbear startup and occasionally during operation (reusing the same seed twice would be very bad). The write_urandom() function would work for writing a value back. A flash write per boot isn't great, but it's hard to see a better way without random number generation hardware.

Yea, I have a few system-specific ways to get a reasonably random value out of my hardware, so that's not a problem; however, it's also not portable.

>> DETAILS:
>> My build puts the two math directories into a library, and then builds the server using portions of ~25 of the ~65 .c files that are in the main dropbear directory.
>
> Did you have to change much in the libtom libraries? I'm planning to merge in tomsfastmath support (using the ltc_mp descriptor to keep libtommath working as a fallback); that might help performance as well by reducing malloc()s.

I don't think I changed anything there except for the malloc defines. For all the dropbear code (I include the libtom stuff in this) I replaced all uses of malloc (m_malloc, malloc, XMALLOC) with DB_MALLOC (same applies to calloc/realloc/free) throughout. I had to do this so that I could easily redefine all calls to malloc to pass __FILE__ and __LINE__ for debugging. The point here is that it would be nice if ALL uses of malloc used the same name.

>> I simulate interaction with a shell by intercepting incoming characters in common_recv_msg_channel_data(). Each line of text is simply passed to a command line parser. While that command line is being processed, all output from that embedded command is sent through the function ssh_putchar():
>>
>>     static void
>>     ssh_putcharc(struct Channel *channel, char c)
>>     {
>>         CHECKCLEARTOWRITE();
>>         buf_putbyte(ses.writepayload, SSH_MSG_CHANNEL_DATA);
>>         buf_putint(ses.writepayload, channel->remotechan);
>>         buf_putint(ses.writepayload, 1);
>>         buf_putbyte(ses.writepayload, c);
>>         encrypt_packet();
>>     }
>
> You may as well write out at least 12 bytes at once (I think), since encrypt_packet pads out to MIN_PACKET_LEN (=16) with at least 4 bytes of padding.

Yea, understood, I buffer up when I know I can. This is just the worst case.

>> and one other important thing...
>> At the bottom of encrypt_packet(), I call write_packet() so that the data is immediately pushed out the socket.
>
> That sounds fine.
>
>> SUMMARY:
>> That's about it in a nutshell. The two big gotchas with this were issues that would not necessarily be important in a process-based environment:
>> 1. The use of dropbear_exit() for errors requires the use of setjmp/longjmp because it's in a thread that needs to clean up properly.
>> 2. The heap is clean when exits are clean; but things get messy in a lot of exception cases; hence, the need for a dropbear-specific heap which allows me to force a clean heap when the session ends (simulating the cleanup that is automatically done when the process exits).
>
> It'd need a close look over releasing any resources other than memory allocations, though there probably aren't many things. libtom* might make use of some static
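For readers wondering what the single DB_MALLOC name buys, here is a minimal sketch of redefining it to pass __FILE__ and __LINE__. Ed's actual macros aren't shown in the thread, so `db_malloc_dbg`, `db_free_dbg`, `DB_FREE`, `DB_MALLOC_TRACE` and the bookkeeping variables are all assumptions, and the sketch sits on top of the libc allocator rather than a dedicated session heap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DB_MALLOC_TRACE 1          /* hypothetical build switch */

const char *db_last_file;          /* call site of the most recent allocation */
int db_last_line;
long db_live_allocs;               /* allocations not yet freed */

/* Debug allocator: remembers who asked, then defers to malloc(). */
void *db_malloc_dbg(size_t n, const char *file, int line)
{
    void *p = malloc(n);

    db_last_file = file;
    db_last_line = line;
    if (p != NULL)
        db_live_allocs++;
    return p;
}

/* Debug free: keeps the live count in step with db_malloc_dbg(). */
void db_free_dbg(void *p)
{
    if (p != NULL)
        db_live_allocs--;
    free(p);
}

#if DB_MALLOC_TRACE
/* Every call site reports its own file and line without any source edits. */
#define DB_MALLOC(n) db_malloc_dbg((n), __FILE__, __LINE__)
#define DB_FREE(p)   db_free_dbg(p)
#else
#define DB_MALLOC(n) malloc(n)
#define DB_FREE(p)   free(p)
#endif
```

With every allocation going through one name, switching between the traced and untraced builds (as in the high-water-mark correction elsewhere in this thread) is a one-line change.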
Re: embedded dropbear
Great explanation Rob,
Thanks much..
Ed

> On 04/11/2013 04:56:54 PM, Ed Sutter wrote:
>> Hi,
>> I managed to get dropbear-ssh running under a uC/OS-II thread. Obviously had to do a lot of hacking to make this work, and I'm sure it's not the most efficient way of doing it. Not being an ssh/cryptography wizard by any stretch of the imagination, I have two questions that may be trivial...
>>
>> 1. Because I'm not on a Unix-ish system, I don't have any of the /proc stuff or /dev/urandom for seedrandom(). Is it essential that this function have *that* much random input? How does this affect the security of the connection?
>
> If you can predict the random seed, you can decrypt the entire connection.
>
> All the other cryptography is based on exchanging unguessable numbers in both directions.
>
> Public key cryptography does a one-way mathematical trick on a really big number to split it into two smaller numbers, so that each one is the antidote to the other's poison. You scramble a message with one, you need the OTHER to unscramble it. (You can't undo it with the one that created it, that's the clever bit.) You keep one of this pair of numbers secret (doesn't matter which, they're symmetrical) as your private key and give the other out as your public key.
>
> Anybody can use your public key to send you a message which only you can read with your private key. And anyone can read messages you send with your private key, but only you could have sent them. So in one direction it provides authentication, in the other it provides privacy.
>
> When you want a bidirectional connection providing both, each side produces a pair of keys (four keys total), then exchanges one and keeps the other. Then you encrypt each packet TWICE, with your private key and with the other guy's public key. At the other end they decrypt with your public key (so the message could only have been created with your private key, so it came from you) and their own private key (so only they can read it). Doesn't matter what order the two encryption/decryptions occur in as long as both sides agree.
>
> Public key cryptography is really computationally expensive (i.e. slow), so what they do is exchange symmetrical keys with it, which are another unguessable secret number that's much faster to use, but which requires both sides to know the _same_ unguessable secret. (The poison is its own antidote, the key that encrypted is also the key that undoes it. Simpler/faster math that way, but it means you need an established relationship to use it.)
>
> The rest of the connection is then encrypted with the symmetric keys. (Well, it generates and exchanges fresh symmetric keys every once in a while so that listeners won't have TOO much of the same kind of data to try various clever attacks with to guess that key.) The public key cryptography is just used to establish and verify the connection at the start.
>
>> 2. I essentially hard-coded the -r option (ssh server) to use a pre-established rsa_host_key file. Should this file be built once for a given system and then reused, or is this something that should be recreated each time the server is started?
>
> The host key uniquely identifies the host. It's what gives you the "host key has changed!" warning when somebody reinstalls the server.
>
> Otherwise, anybody could intercept the connection, insert their own ssh server, have you log into it, forward the credentials you use to the other server to log into that, and pass data through in both directions while logging all of it. This is called a man-in-the-middle attack, and you prevent it by giving each server a unique way to identify itself that only it knows. (Basically you encrypt a packet to it using its public key, and it decrypts it using its private key and sends back the correct response based on its contents.)
>
> And that's cryptography 101. :)
>
> Rob
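Rob's first point (a predictable seed gives away the whole connection) can be illustrated with a toy example. The sketch below uses a small xorshift generator purely for demonstration; it is not cryptographic and is not what Dropbear uses, and the names `toy_next`, `toy_session_key` and `attacker_recovers_key` are made up. It only shows that anyone who knows or guesses the seed derives exactly the same "secret" bytes as the device.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy xorshift32 step: deterministic, NOT a cryptographic generator. */
static uint32_t toy_next(uint32_t *state)
{
    uint32_t x = *state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Derive 16 "secret" bytes from nothing but the seed. */
void toy_session_key(uint32_t seed, uint8_t key[16])
{
    uint32_t state = seed ? seed : 1u;   /* xorshift must not start at zero */
    int i;

    for (i = 0; i < 16; i += 4) {
        uint32_t r = toy_next(&state);
        memcpy(&key[i], &r, sizeof r);
    }
}

/* Returns 1 when an attacker's seed guess reproduces the device's key. */
int attacker_recovers_key(uint32_t device_seed, uint32_t guess)
{
    uint8_t device_key[16];
    uint8_t attacker_key[16];

    toy_session_key(device_seed, device_key);
    toy_session_key(guess, attacker_key);
    return memcmp(device_key, attacker_key, sizeof device_key) == 0;
}
```

A device that boots with a fixed or low-entropy seed (a constant, a tick counter with only a few possible values) leaves an attacker with only a handful of guesses to try, which is why the seed needs real unpredictability behind it.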