Re: slow logins -- some data for comparison

2013-06-01 Thread Matt Johnston
Hi William,

That's puzzling. I wonder if the hotspot is different on the Microblaze device 
versus what you see with valgrind - googling I came across a setup that sounds 
similar to yours [1], they thought the SDRAM interface was the bottleneck. Do 
you know what kind of memory setup your CPU has? Perhaps OpenSSL is better at 
keeping everything in registers, or something like that. It could be worth 
asking on the libtom google group if anyone has ideas?

[1] bottom of 
https://groups.google.com/forum/#!msg/sci.crypt/3pg1ngSaQpc/Tr-gcbqxfvEJ
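
For reference, the loop shape that callgrind flags can be sketched in a few
lines. This is a minimal digit-by-digit Montgomery reduction in Python, not
the actual fp_montgomery_reduce or fast_mp_montgomery_reduce source. The
28-bit digit width and the helper names (to_digits, from_digits) are
assumptions chosen only for illustration.

```python
# Illustrative sketch only: hypothetical digit width and helper names.
DIGIT_BITS = 28
BASE = 1 << DIGIT_BITS
MASK = BASE - 1


def to_digits(x, count):
    """Split a non-negative integer into `count` little-endian digits."""
    return [(x >> (DIGIT_BITS * i)) & MASK for i in range(count)]


def from_digits(digits):
    """Recombine little-endian digits into an integer."""
    return sum(d << (DIGIT_BITS * i) for i, d in enumerate(digits))


def montgomery_reduce(t, n, num_digits):
    """Return t * R^-1 mod n, with R = BASE**num_digits.

    Requires n odd, n < R and 0 <= t < n * R.
    """
    rho = (-pow(n, -1, BASE)) % BASE          # -1/n mod BASE (Python 3.8+)
    n_digits = to_digits(n, num_digits)
    t_digits = to_digits(t, 2 * num_digits + 1)

    for i in range(num_digits):               # outer loop: one digit of t
        mu = (t_digits[i] * rho) & MASK       # multiple of n that clears digit i
        carry = 0
        for j in range(num_digits):           # inner loop: the hot spot
            s = t_digits[i + j] + mu * n_digits[j] + carry
            t_digits[i + j] = s & MASK
            carry = s >> DIGIT_BITS
        k = i + num_digits
        while carry:                          # push the leftover carry upward
            s = t_digits[k] + carry
            t_digits[k] = s & MASK
            carry = s >> DIGIT_BITS
            k += 1

    result = from_digits(t_digits[num_digits:])   # exact division by R
    if result >= n:                               # single final subtraction
        result -= n
    return result
```

Each pass of the inner loop loads a digit, does one multiply-accumulate and
stores the digit back, so a slow memory path would be felt on every
iteration, whichever math library supplies the loop.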

I'd be keen to see your modifications to integrate them into the main tree - 
for more normal slow CPUs I think tomsfastmath will be worthwhile.

Cheers,
Matt



On Sat 1/6/2013, at 5:15 am, William Welch bvwe...@gmail.com wrote:

 Greetings,
 
 I completed a first cut of the fast math version -- sadly, it did not help 
 much.  I went back to my linux PC and did some testing with 
 valgrind/callgrind -- and, according to callgrind, for both the tommath and 
 the fast math versions, the hot spots are in tight loops inside of 
 fp_montgomery_reduce and fast_mp_montgomery_reduce. The 2nd hot spot is in 
 fast_s_mp_sqr and the similar routine in fast math.
 
 I am pondering my next step -- I guess I will study those inner loops and see 
 why they are so slow on the microblaze -- but I am a bit confused by the fact 
 that openssh / sshd completes in 5 to 10 seconds on the same system...
 
 If there is interest in these modifications to dropbear for fast math, let me 
 know and I will send them or put them online.  The changes are pretty clean 
 -- just a couple of things as mentioned previously.
 
 Suggestions welcome!
 
 William
 
 
 
 On Sat, May 25, 2013 at 10:19 AM, Matt Johnston m...@ucc.asn.au wrote:
 I'd start from 2013.58. It's mostly a case of search/replace
 of function calls, though mp_init is a bit different I
 think (it allocates whereas the straight libtommath version
 doesn't?). Take a look at
 https://secure.ucc.asn.au/hg/dropbear/file/75509065db53/ecdsa.c#l169
 - the ltc_mp variable is set up at
 https://secure.ucc.asn.au/hg/dropbear/file/75509065db53/crypto_desc.c#l71
 so it could be set to tfm_desc instead.
 
 Tomsfastmath 0.12 would be best from libtom.org
 
 Cheers,
 Matt
 
 
 
 On Sat, May 25, 2013 at 10:01:16AM -0500, William Welch wrote:
  Thank you for your reply.
 
  If I were to attempt to add support for tomsfastmath, using ltc_mp as you
  described, which version of dropbear should I start from?  And where should
  I obtain the tomsfastmath library?
 
  Thank you,
 
  William
 
 
 
  On Sat, May 25, 2013 at 3:41 AM, Matt Johnston m...@ucc.asn.au wrote:
 
   Hi,
  
   I think the solution is to use tomsfastmath instead. There was a patched
   version posted a while ago on this list. Eventually I'd like to have
   Dropbear able to build against either tomsfastmath (for speed) or
   libtommath (for portability) using the ltc_mp mechanism in libtomcrypt.
  
   There's also ECC support nearly complete in the 'ecc' mercurial branch.
   That's a few times faster than normal kexdh. It adds around 30kB to binary
   size on x86. That should make it into the next Dropbear release, though
   it will only help with recent OpenSSH peers.
  
   Matt
  
  
   William Welch bvwe...@gmail.com wrote:
  
   Greetings,
  
   First -- thank you for dropbear!  I have enjoyed using dropbear on
   various smallish systems for years now!
  
   But I have a problem with a specific system -- admittedly it is rather
   slow -- only 50 BogoMips according to the linux kernel. It is a
   Microblaze.
  
   I use the Buildroot system for many different routers and other small
   systems here.  I have compared different versions of dropbear, against
   openssh.
  
   My issue is with the server mode -- sshd --  I note that on dropbear 0.52
   (which I happen to run on other routers here), I can connect from my
   ubuntu or mac, to dropbear sshd, in about 45 seconds.  This is having
   disabled the RSA host key, and already generated the DSS host key.  But
   on more recent versions of dropbear, e.g. 2013.58, several minutes elapse
   without a connection.
  
   In contrast, switching to openssh in buildroot, and also disabling the
   RSA host key, connection time is 5 to 10 seconds!  Unfortunately,
   openssh has a huge 'footprint' in the flash filesystem that I would
   rather avoid.
  
   The issue seems to be in the key exchange (I can watch this by doing
   'ssh -v' from my client connection).  Meanwhile, running 'top' on my
   Microblaze shows near 100% cpu used.  The debug message is: expecting
   SSH2_MSG_KEXDH_REPLY
  
   Buildroot has the gnu cross tool chain set to 'optimize for size' in all
   cases.
  
   Suggestions welcome!
  
   thank you,
  
   William
  
  
 



Re: slow logins -- some data for comparison

2013-06-01 Thread William Welch
Hi Matt,

Thank you for your interest.  I put the modified files, here:
https://github.com/bvwelch/dropbear
If you prefer a fork, or other approach, let me know and I will revise the
upload.

I can go back to my Microblaze and capture the dropbear trace output, and
probably could run gprof on the Microblaze as well.

Since this all started by selecting dropbear versus openssh from the
typical Buildroot, that means that the same toolchain is used, so my guess
is that openssh just has a different algorithm.  By the way I tried various
versions of gcc toolchains.

I did notice that if I disabled group14, I saw about the same login time as
dropbear 0.52 -- 45 seconds, and if I switched to fastmath, the time dropped
to 30 seconds.
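
A rough way to see why group14 costs so much more: diffie-hellman-group1
uses a 1024-bit modulus and group14 a 2048-bit one. With schoolbook
multiplication each modular multiply is about four times the work at double
the width, and if the exponent doubles in length as well the whole
exponentiation approaches eight times the work. The sketch below just times
Python's built-in pow() at both sizes. It is only an illustration: the
moduli are random odd numbers, not the real group primes, the full-length
exponents are an assumption, and the function name time_modexp is made up
for this example. Absolute times on a PC say nothing about the Microblaze;
only the ratio is of interest.

```python
import random
import time


def time_modexp(bits, rounds=5, seed=1):
    """Best-of-`rounds` wall time for one modular exponentiation.

    Uses a random odd modulus of the given bit length as a stand-in for a
    DH group prime, and a random full-length exponent (both assumptions).
    """
    rng = random.Random(seed)
    modulus = rng.getrandbits(bits) | (1 << (bits - 1)) | 1   # top bit set, odd
    best = float("inf")
    for _ in range(rounds):
        exponent = rng.getrandbits(bits) | (1 << (bits - 1))  # full-length exponent
        start = time.perf_counter()
        pow(2, exponent, modulus)                             # generator g = 2
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    t1024 = time_modexp(1024)
    t2048 = time_modexp(2048)
    print(f"1024-bit: {t1024 * 1e3:.3f} ms")
    print(f"2048-bit: {t2048 * 1e3:.3f} ms")
    print(f"ratio:    {t2048 / t1024:.1f}x")
```

A ratio of that order would be consistent with 45 seconds turning into
several minutes once group14 is negotiated.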

But the 'footprint' of dropbear, built with tomsfastmath, is large, since
it has a lot of unrolled loops. It may be about as large as openssh.

I will check out your other suggestions.

Thank you,

William


On Saturday, June 1, 2013, Matt Johnston wrote:

 Hi William,

 That's puzzling. I wonder if the hotspot is different on the Microblaze
 device versus what you see with valgrind - googling I came across a setup
 that sounds similar to yours [1], they thought the SDRAM interface was the
 bottleneck. Do you know what kind of memory setup your CPU has? Perhaps
 OpenSSL is better at keeping everything in registers, or something like
 that. It could be worth asking on the libtom google group if anyone has
 ideas?

 [1] bottom of
 https://groups.google.com/forum/#!msg/sci.crypt/3pg1ngSaQpc/Tr-gcbqxfvEJ

 I'd be keen to see your modifications to integrate them into the main tree
 - for more normal slow CPUs I think tomsfastmath will be worthwhile.

 Cheers,
 Matt



 On Sat 1/6/2013, at 5:15 am, William Welch bvwe...@gmail.com wrote:

  Greetings,
 
  I completed a first cut of the fast math version -- sadly, it did not
  help much.  I went back to my linux PC and did some testing with
  valgrind/callgrind -- and, according to callgrind, for both the tommath
  and the fast math versions, the hot spots are in tight loops inside of
  fp_montgomery_reduce and fast_mp_montgomery_reduce. The 2nd hot spot is
  in fast_s_mp_sqr and the similar routine in fast math.
 
  I am pondering my next step -- I guess I will study those inner loops
  and see why they are so slow on the microblaze -- but I am a bit confused
  by the fact that openssh / sshd completes in 5 to 10 seconds on the same
  system...
 
  If there is interest in these modifications to dropbear for fast math,
  let me know and I will send them or put them online.  The changes are
  pretty clean -- just a couple of things as mentioned previously.
 
  Suggestions welcome!
 
  William
 
 
 
  On Sat, May 25, 2013 at 10:19 AM, Matt Johnston m...@ucc.asn.au wrote:
  I'd start from 2013.58. It's mostly a case of search/replace
  of function calls, though mp_init is a bit different I
  think (it allocates whereas the straight libtommath version
  doesn't?). Take a look at
  https://secure.ucc.asn.au/hg/dropbear/file/75509065db53/ecdsa.c#l169
  - the ltc_mp variable is set up at
  https://secure.ucc.asn.au/hg/dropbear/file/75509065db53/crypto_desc.c#l71
  so it could be set to tfm_desc instead.
 
  Tomsfastmath 0.12 would be best from libtom.org
 
  Cheers,
  Matt
 
 
 
  On Sat, May 25, 2013 at 10:01:16AM -0500, William Welch wrote:
   Thank you for your reply.
  
   If I were to attempt to add support for tomsfastmath, using ltc_mp as
   you described, which version of dropbear should I start from?  And
   where should I obtain the tomsfastmath library?
  
   Thank you,
  
   William
  
  
  
   On Sat, May 25, 2013 at 3:41 AM, Matt Johnston m...@ucc.asn.au wrote:
  
    Hi,
   
    I think the solution is to use tomsfastmath instead. There was a
    patched version posted a while ago on this list. Eventually I'd like to
    have Dropbear able to build against either tomsfastmath (for speed) or
    libtommath (for portability) using the ltc_mp mechanism in libtomcrypt.
   
    There's also ECC support nearly complete in the 'ecc' mercurial branch.
    That's a few times faster than normal kexdh. It adds around 30kB to
    binary size on x86. That should make it into the next Dropbear release,
    though only will help for recent OpenSSH peers.
   
    Matt
   
   
    William Welch bvwe...@gmail.com wrote:
   
    Greetings,
   
    First -- thank you for dropbear!  I have enjoyed using dropbear on
    various smallish systems for years now!
   
    But I have a problem with a specific system -- admittedly it is rather
    slow -- only 50 BogoMips according to the linux kernel. It is a
    Microblaze.
   
    I use the Buildroot system for many different routers and other small
    systems here.  I have compared different versions of dropbear, against
    openssh.
   
    My issue is with the server mode -- sshd --  I note that on dropbear
    0.52 (which I happen to run on other routers here), I can connect from
    my ubuntu or mac, to