Re: PPC bn_div_words routine rewrite

2005-07-08 Thread Andy Polyakov

Please do not use previously mentioned routine, it missed 1 corner
case where 32=num_bits_word(d)

Revised routine that passes (cd test; make bntest).  


Does it mean that the previous version didn't actually pass the test? I mean, 
if it did on your CPU but not on mine, we could probably learn something 
else about the ways PPC can be implemented...



All I had to do is add one more instruction to the routine.

Please test on your ppc32 machines.

Once we are all happy,


Is this your agenda? Make everybody happy:-):-):-) Good luck:-):-):-)

it's a matter of adding the core dump at the beginning.  
Thus you have a fast,


32*(div latency + mul latency) is fast? If I call BN_bn2dec in a loop, it 
spins 4 times slower than with the current implementation. Well, at least on 
the computer I have access to...
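A side note on the latency point: in David's loop the partial dividend i_h is always less than 2*d, so each quotient bit is 0 or 1. In principle the divwu/mullw pair per iteration could therefore be replaced by a compare and a conditional subtract (classic restoring division). The C sketch below illustrates that alternative, assuming a 32-bit word modelled as uint32_t; the function name is hypothetical, the technique was not proposed in this thread, and whether it is faster than the current implementation on any particular PPC core has not been measured here.

```c
#include <assert.h>
#include <stdint.h>

/* Restoring (shift-and-subtract) division of the 64-bit value h:l by d.
 * Hypothetical alternative: one quotient bit per step, no divide or
 * multiply in the loop. Precondition, as for bn_div_words: h < d. */
static uint32_t div_words_restoring(uint32_t h, uint32_t l, uint32_t d)
{
    uint32_t r = h;   /* running remainder, always < d */
    uint32_t q = 0;   /* quotient bits collected so far */

    assert(h < d);    /* also rules out d == 0 */
    for (int i = 0; i < 32; i++) {
        uint32_t top = r >> 31;       /* bit about to be shifted out of r */
        r = (r << 1) | (l >> 31);     /* bring down the next bit of l */
        l <<= 1;
        /* The true partial remainder is top:r < 2*d, so the next quotient
         * bit is 1 exactly when top:r >= d. */
        if (top || r >= d) {
            r -= d;                   /* wraps to the right value if top */
            q = (q << 1) | 1u;
        } else {
            q <<= 1;
        }
    }
    return q;
}
```

For example, div_words_restoring(0, 35, 10) yields 3, and div_words_restoring(0x80000000, 0, 0xFFFFFFFF) yields 0x80000000.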



easy to understand, predictable bn_div_words, as
opposed to that monster in 0.9.8.


Hostility again? Are you saying that nobody understands the current 
implementation and that it produces unpredictable results? I disagree:-)



Other architectures will benefit if this C function is used in bn_asm.c


How? And which architectures exactly? Virtually all 32-bit 
architectures, including PPC32, opt for 
(BN_ULONG)(((((BN_ULLONG)h)<<BN_BITS2)|l)/(BN_ULLONG)d). A.
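For reference, the double-width expression quoted above can be written as a self-contained C sketch. BN_ULONG and BN_ULLONG are modelled here as uint32_t and uint64_t (an assumption matching a 32-bit build with BN_LLONG), and the function name is illustrative. On 32-bit targets the compiler typically turns the 64-by-32 division into a libgcc helper call rather than a single instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Double-width division of h:l by d; BN_ULONG ~ uint32_t and
 * BN_ULLONG ~ uint64_t are assumptions for this sketch. */
static uint32_t div_words_llong(uint32_t h, uint32_t l, uint32_t d)
{
    assert(d != 0 && h < d);   /* otherwise the quotient does not fit */
    return (uint32_t)((((uint64_t)h << 32) | l) / d);
}
```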

__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   [EMAIL PROTECTED]


Re: PPC bn_div_words routine rewrite

2005-07-08 Thread David Ho
Forgive my lack of knowledge of your existing code.  But is it really
designed with optimization in mind?  What was the driving force behind
the C function?

If it is optimized, what is the time required?

I must admit I jumped way too early to the "fast" conclusion, because I
never really had speed in mind.  As I explained, my goal is to make it
easy to understand.  If it has any performance advantage, it is purely
a side effect. (You never answered my comment about performance in my
last email, so I can only guess what the design intent was for your
code.)

I mean, if you choose to optimize my code for speed, it's perfectly
doable, and I have full confidence that anyone else who has read this email
thread can do it.  But again, I have no idea how much time you spent
on your routine, so I guess I should refrain from dissing it.  My
mistake once again.

What else will you be teaching me today? =)

David



Re: PPC bn_div_words routine rewrite

2005-07-07 Thread David Ho
Please do not use previously mentioned routine, it missed 1 corner
case where 32=num_bits_word(d)

Revised routine that passes (cd test; make bntest).  
All I had to do is add one more instruction to the routine.

Please test on your ppc32 machines.

Once we are all happy, it's a matter of adding the core dump at the beginning.  
Thus you have a fast, easy to understand, predictable bn_div_words, as
opposed to that monster in 0.9.8.

#
#   Handcrafted version of bn_div_words
#
#   r3 = h
#   r4 = l
#   r5 = d

cmplwi  0,r5,0              # compare r5 and 0
bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div1  # proceed if d!=0
li      r3,-1               # d=0 return -1
bclr    BO_ALWAYS,CR0_LT
.Lppcasm_div1:
cmplwi  0,r3,0              # compare r3 and 0
bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div2  # proceed if h != 0
divwu   r3,r4,r5            # ret_q = l/d
bclr    BO_ALWAYS,CR0_LT    # return result in r3
.Lppcasm_div2:
divwu   r9,r3,r5            # i_q = h/d
mullw   r10,r9,r5           # i_r = h - (i_q*d)
subf    r10,r10,r3
mr      r3,r9               # ret_q = i_q
.Lppcasm_set_ctr:
li      r12,32              # ctr = bitsizeof(d)
mtctr   r12
.Lppcasm_div_loop:
addc    r4,r4,r4            # l = l << 1 -> i_carry
adde    r11,r10,r10         # i_h = (i_r << 1) | i_carry
divwu   r9,r11,r5           # i_q = i_h/d
addze   r9,r9               # i_q += CA (bit shifted out of i_r); very important! - DKWH
mullw   r10,r9,r5           # i_r = i_h - (i_q*d)
subf    r10,r10,r11
add     r3,r3,r3            # ret_q = ret_q << 1 | i_q
add     r3,r3,r9
bc      BO_dCTR_NZERO,CR0_EQ,.Lppcasm_div_loop
.Lppc_div_end:
bclr    BO_ALWAYS,CR0_LT    # return result in r3
.long   0x00000000
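The key change from the first version is the addze: adde doubles i_r and leaves the bit shifted out of bit 31 in the carry bit XER[CA], and addze adds that bit to the quotient digit. The following C model mirrors the routine instruction by instruction, with XER[CA] represented by a variable ca. It is an illustrative sketch (the function name is hypothetical), not code from the thread, and unlike the assembly it asserts the h < d precondition instead of silently returning a wrong value.

```c
#include <assert.h>
#include <stdint.h>

/* Instruction-level C model of the routine above; `ca` stands in for
 * XER[CA]. Register names follow the assembly. */
static uint32_t div_words_ppc_model(uint32_t h, uint32_t l, uint32_t d)
{
    uint32_t r3, r4 = l, r9, r10, r11, ca;
    uint64_t sum;

    if (d == 0) return 0xffffffffu;            /* li r3,-1 */
    if (h == 0) return l / d;                  /* divwu r3,r4,r5 */
    assert(h < d);                             /* quotient must fit */

    r9 = h / d;                                /* divwu r9,r3,r5 */
    r10 = h - r9 * d;                          /* mullw + subf */
    r3 = r9;                                   /* mr r3,r9 */

    for (int ctr = 32; ctr > 0; ctr--) {       /* mtctr ... bc BO_dCTR_NZERO */
        sum = (uint64_t)r4 + r4;               /* addc r4,r4,r4 */
        r4 = (uint32_t)sum;
        ca = (uint32_t)(sum >> 32);            /* top bit of l */
        sum = (uint64_t)r10 + r10 + ca;        /* adde r11,r10,r10 */
        r11 = (uint32_t)sum;
        ca = (uint32_t)(sum >> 32);            /* bit lost from i_r */
        r9 = r11 / d;                          /* divwu r9,r11,r5 */
        r9 = r9 + ca;                          /* addze r9,r9 */
        r10 = r11 - r9 * d;                    /* mullw + subf (mod 2^32) */
        r3 = r3 + r3 + r9;                     /* add; add */
    }
    return r3;
}
```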


Regards,
David


On 7/5/05, Peter Waltenberg [EMAIL PROTECTED] wrote:
  
 Thanks for finding and fixing this.  Particularly for finding and fixing it
 before 0.9.8 hit the streets.

 Peter

 Peter Waltenberg
 Architect
 IBM Crypto for C Team
 IBM/Tivoli Gold Coast Office

 Andy Polyakov [EMAIL PROTECTED]
 Sent by: [EMAIL PROTECTED]
 06/07/2005 07:49 AM
 Please respond to openssl-dev
 To: openssl-dev@openssl.org
 cc: [EMAIL PROTECTED]
 Subject: Re: PPC bn_div_words routine rewrite

  Okay, having actually done what Andy suggested, i.e. the one-liner fix
   in the assembly code, bn_div_words returns the correct results.
  
  Note that the final version, the one committed to all relevant OpenSSL 
  branches a couple of days ago and the one which actually made it into the 
  just-released 0.9.8, is a bit different from the originally suggested 
  one-line fix, see for example
 http://cvs.openssl.org/chngview?cn=14199.
  
   At this point, my conclusion is, up to openssl-0.9.8-beta6,  the ppc32
   bn_div_words routine generated from crypto/bn/ppc.pl is still busted.
  
  Yes. Though it should be noted that 0.9.8 was inadvertently avoiding the 
  bug condition. Recall that original problem report was for 0.9.7.
  
   Why do you signal an overflow condition when it appears functions that
   call bn_div_words do not check for overflow conditions?
  
  That's a question for IBM. When they submitted the code, I 
  explicitly asked what would be an appropriate way to generate a *fatal* 
  condition at that point, i.e. one which would result in a core dump, and 
  the answer was a division by 0 instruction. At that time I had no access 
  to any PPC machine and had to just go with it. So it came as a 
  surprise that division by 0 does not raise an exception, but silently 
  returns an implementation-specific value... A.
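A note on the fatal-condition question: for a 64-by-32 division of h:l by d, the quotient fits in one 32-bit word exactly when h < d, and because h is unsigned that same test also excludes d == 0. A single unsigned comparison is therefore enough to detect both the overflow and the divide-by-zero case. The sketch below shows the idea in C with a hypothetical wrapper that calls abort(); it is an illustration only and does not reflect how ppc.pl actually handles the condition.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* h < d  <=>  the quotient of h:l / d fits in 32 bits (and d != 0). */
static int div_words_args_ok(uint32_t h, uint32_t d)
{
    return h < d;
}

/* Hypothetical checked wrapper: fail loudly instead of relying on a
 * divide-by-zero instruction, which PowerPC does not trap on. */
static uint32_t div_words_checked(uint32_t h, uint32_t l, uint32_t d)
{
    uint64_t q;

    if (!div_words_args_ok(h, d))
        abort();
    q = (((uint64_t)h << 32) | l) / d;
    assert(q <= UINT32_MAX);   /* guaranteed by h < d */
    return (uint32_t)q;
}
```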


Re: PPC bn_div_words routine rewrite

2005-07-07 Thread David Ho
C function corresponding to the assembly routine below, provided to
ease review of the assembly.
Other architectures will benefit if this C function is used in bn_asm.c

Regards,
David

unsigned long div_words (unsigned long h, 
 unsigned long l,
 unsigned long d)
{
  unsigned long i_h; /* intermediate dividend */
  unsigned long i_q; /* quotient of i_h/d */
  unsigned long i_r; /* remainder of i_h/d */

  unsigned long i_cntr;
  unsigned long i_carry;
  unsigned long i_overflow;

  unsigned long ret_q; /* return quotient */

  /* cannot divide by zero (assumes 32-bit unsigned long, as on ppc32) */
  if (d == 0) return 0xffffffff;

  /* do simple 32-bit divide */
  if (h == 0) return l/d;
 
  i_q = h/d;
  i_r = h - (i_q*d);
  ret_q = i_q;

  i_cntr = 32;

  while (i_cntr--)
  {
    i_carry = (l & 0x80000000) ? 1:0;
    l = l << 1;

    i_overflow = (i_r & 0x80000000) ? 1:0;
    i_h = (i_r << 1) | i_carry;
    i_q = i_h/d;
    i_q = i_q + i_overflow;
    i_r = i_h - (i_q*d);

    ret_q = (ret_q << 1) | i_q;

  }

  return ret_q;
}
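Because unsigned long is 64 bits on LP64 hosts, the routine above only behaves as intended where unsigned long is 32 bits. One way to exercise it on any host is to transcribe it with uint32_t and compare it against plain 64-bit division. The sketch below does that; the function names and the xorshift32 input generator are illustrative assumptions, not part of the thread, and every other sample forces the top bit of d so that the 32 == num_bits_word(d) corner case is covered.

```c
#include <assert.h>
#include <stdint.h>

/* David's revised routine transcribed with fixed-width types. */
static uint32_t div_words_v2(uint32_t h, uint32_t l, uint32_t d)
{
    uint32_t i_h, i_q, i_r, i_carry, i_overflow, ret_q;

    if (d == 0) return 0xffffffffu;
    if (h == 0) return l / d;

    i_q = h / d;
    i_r = h - i_q * d;
    ret_q = i_q;
    for (int i = 0; i < 32; i++) {
        i_carry = l >> 31;            /* top bit of l */
        l <<= 1;
        i_overflow = i_r >> 31;       /* bit lost by i_r << 1 */
        i_h = (i_r << 1) | i_carry;
        i_q = i_h / d + i_overflow;
        i_r = i_h - i_q * d;          /* wraps modulo 2^32 */
        ret_q = (ret_q << 1) | i_q;
    }
    return ret_q;
}

/* Reproducible pseudo-random inputs (state must be nonzero). */
static uint32_t xorshift32(uint32_t *s)
{
    *s ^= *s << 13;
    *s ^= *s >> 17;
    *s ^= *s << 5;
    return *s;
}

/* Count disagreements with 64-bit division over n samples with h < d. */
static int count_mismatches(uint32_t seed, int n)
{
    int bad = 0;

    assert(seed != 0);
    for (int k = 0; k < n; k++) {
        uint32_t d = xorshift32(&seed) | 1u;   /* nonzero divisor */
        if (k & 1)
            d |= 0x80000000u;                  /* top bit of d set */
        uint32_t h = xorshift32(&seed) % d;    /* h < d */
        uint32_t l = xorshift32(&seed);
        uint32_t want = (uint32_t)((((uint64_t)h << 32) | l) / d);
        if (div_words_v2(h, l, d) != want)
            bad++;
    }
    return bad;
}
```

A return value of 0 from count_mismatches means every sample agreed with the reference.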


Re: PPC bn_div_words routine rewrite

2005-07-05 Thread Andy Polyakov

Let's start the week off with less hostility and more productive
criticism on this topic.


If you want productivity, then provide real evidence in the form of a stack 
backtrace at the segmentation violation point, disassembly output in the 
vicinity of the segmentation violation point and 'info registers' output at 
the same point. As for hostility, I leave it without comment, as you apparently 
can outrank anybody in that area:-)



But you're apparently right about a bug being present in PPC assembler.



So you are saying there is a bug in the GCC assembler? How confident
are you in that?  Is the first correct step to examine the assembly
code for errors before jumping to any conclusion that the GCC
assembler is bad?


Did I say GCC assembler? I said PPC assembler, which refers to 
crypto/bn/asm/ppc.pl.



This is a rewrite of the bn_div_words routine for the PowerPC arch,
tested on a MPC8xx processor.


Well, the suggested routine apparently sends ssh-keygen into an end-less loop
on the PPC-based 32-bit system I have access to... 



If you care to read the C function I supplied, or if you don't believe
it: if you understand ppc 32-bit instructions, as specified in the
PowerPC Microprocessor Family: Programming Environments for 32-Bit
Microprocessors, you will not be able to find a condition that makes
my routine go into an end-less loop, unless you messed up badly
somewhere.


I didn't say that the suggested routine goes into an end-less loop, but that 
it *apparently* sends ssh-keygen into an end-less loop. I made no claims 
about which routine exactly loops, and I even admit that I don't know 
for sure if it was in fact an end-less loop, because I chose to kill 
the process after a couple of minutes. Note that normally it takes just 
a few seconds on the machine I've tested on, so a couple of minutes is 
essentially unacceptable and by all means *appears* as an end-less loop.



In summary, what I am trying to provide the community is an
alternative to ... the current implementation of which is
very questionable.


crypto/bn/asm/ppc.pl distributed with OpenSSL was designed for and 
explicitly tested by IBM under 32- and 64-bit PPC Linux, 32- and 64-bit 
AIX, as well as 32-bit MacOS X. Special care was taken to make sure that 
none of the ABIs/calling conventions used by the above-listed platforms is 
violated, so that the module can be safely invoked by compiler-generated 
code for the above-mentioned OSes. Afterwards there were reports that it was 
successfully used on an unspecified [in the report] embedded PPC-based 
platform. Despite this, on Friday I could personally confirm on 32-bit 
MacOS X that there admittedly was a bug in ppc.pl, which manifests as a 
failure to generate a sane decimal ASCII representation of a BIGNUM, which 
is exactly the kind of operation taking place when you run ssh-keygen -t 
rsa1 [it should be noted that decimal ASCII is unfortunately not covered by 
the 'make test_bn' suite]. But under no circumstances was a segmentation 
violation observed. At the same time I could personally confirm that, if 
pasted into osx32_ppc.s, the suggested implementation induces a 'make test_bn' 
failure on 32-bit MacOS X. In particular, test/bntest terminates with


print "test BN_sqr\n"
-FF554CAEAE * -FF554CAEAE - FEAB0B30019BBA80FE44
Square test failed!
1

while test/exptest reports:

BN_mod_exp_recp() problems
14482:error:03082065:bignum routines:BN_div_recp:bad reciprocal:bn_recp.c:194:


For me that's reason enough to become sceptical of the submission and to 
conduct trouble-shooting of my own. The currently available ppc.pl was verified 
to pass 'make test_bn' on 32-bit MacOS X and 32- and 64-bit AIX [tested by 
IBM], as well as to generate a correct decimal ASCII representation on the 
mentioned platforms. If it doesn't work for you, then submit the information 
listed at the beginning of this letter. A.



Fwd: PPC bn_div_words routine rewrite

2005-07-05 Thread David Ho
This is the second confirmed report of the same problem on the ppc8xx.

After reading my email, I must say I was the unfriendly one, and I
apologize for that.

More debugging evidence to come.  

-- Forwarded message --
From: Murch, Christopher [EMAIL PROTECTED]
Date: Jul 1, 2005 9:46 AM
Subject: RE: PPC bn_div_words routine rewrite
To: David Ho [EMAIL PROTECTED]


David,
I had observed the same issue on ppc 8xx machines after upgrading to the asm
version of the BN routines.  Thank you very much for your work for the fix.
My question is, do you have high confidence in the other new asm ppc BN
routines after observing this issue, or do you think they might have similar
problems?
Thanks.
Chris

-Original Message-
From: David Ho [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 30, 2005 6:22 PM
To: openssl-dev@openssl.org; [EMAIL PROTECTED]
Subject: Re: PPC bn_div_words routine rewrite


The reason I had to redo this routine, in case anyone is wondering, is
that ssh-keygen segfaults when this assembly routine returns junk
to the BN_div_word function. On a ppc, if you issue the command

ssh-keygen -t rsa1 -f /etc/ssh/ssh_host_key -N ""

the program craps out when it tries to write the public key in ASCII
decimal.

Regards,
David

On 6/30/05, David Ho [EMAIL PROTECTED] wrote:
 Hi all,

 This is a rewrite of the bn_div_words routine for the PowerPC arch,
 tested on a MPC8xx processor.
 I initially thought there might be a small mistake in the code that
 requires a one-line change, but it turns out I had to redo the
 routine.
 I guess this routine is not called very often, as I see that most other
 routines are hand-crafted, whereas this routine is compiled from a C
 function that apparently has not gone through a whole lot of testing.

 I wrote a C function to confirm correctness of the code.

 unsigned long div_words (unsigned long h,
  unsigned long l,
  unsigned long d)
 {
   unsigned long i_h; /* intermediate dividend */
   unsigned long i_q; /* quotient of i_h/d */
   unsigned long i_r; /* remainder of i_h/d */

   unsigned long i_cntr;
   unsigned long i_carry;

   unsigned long ret_q; /* return quotient */

   /* cannot divide by zero (assumes 32-bit unsigned long, as on ppc32) */
   if (d == 0) return 0xffffffff;

   /* do simple 32-bit divide */
   if (h == 0) return l/d;

   i_q = h/d;
   i_r = h - (i_q*d);
   ret_q = i_q;

   i_cntr = 32;

   while (i_cntr--)
   {
     i_carry = (l & 0x80000000) ? 1:0;
     l = l << 1;

     i_h = (i_r << 1) | i_carry;
     i_q = i_h/d;
     i_r = i_h - (i_q*d);

     ret_q = (ret_q << 1) | i_q;
   }

   return ret_q;
 }
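Why this first version misses a corner case: i_r is always less than d, so i_r << 1 can lose a bit only when i_r >= 2^31, which requires d > 2^31, i.e. a d with its top bit set (num_bits_word(d) == 32). The sketch below transcribes the loop with uint32_t (names are illustrative) next to a 64-bit reference. With h = 0x80000000, l = 0 and d = 0xFFFFFFFF, the first iteration shifts i_r = 0x80000000 to 0, so the routine returns 0 while the correct quotient is 0x80000000; for d below 2^31 the two agree.

```c
#include <assert.h>
#include <stdint.h>

/* First (6/30) loop with uint32_t: no overflow term, so the bit
 * shifted out of i_r is silently dropped. */
static uint32_t div_words_v1(uint32_t h, uint32_t l, uint32_t d)
{
    uint32_t i_h, i_q, i_r, ret_q;

    if (d == 0) return 0xffffffffu;
    if (h == 0) return l / d;

    i_q = h / d;
    i_r = h - i_q * d;
    ret_q = i_q;
    for (int i = 0; i < 32; i++) {
        uint32_t i_carry = l >> 31;
        l <<= 1;
        i_h = (i_r << 1) | i_carry;   /* loses bit 31 of i_r */
        i_q = i_h / d;
        i_r = i_h - i_q * d;
        ret_q = (ret_q << 1) | i_q;
    }
    return ret_q;
}

/* Reference: plain 64-by-32 division, valid when h < d. */
static uint32_t div_words_ref(uint32_t h, uint32_t l, uint32_t d)
{
    assert(d != 0 && h < d);
    return (uint32_t)((((uint64_t)h << 32) | l) / d);
}
```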


 Then I handcrafted the routine in PPC assembly.
 The result is a 26-line assembly routine that is easy to understand and
 predictable, as opposed to an 81-liner that I am still trying to
 decipher...
 If anyone is interested in incorporating this routine into the openssl
 code, I'll be happy to assist.
 At this point I think I will be taking a bit of a break from this 3-day
 debugging/fixing marathon.

 Regards,
 David Ho


 #
 #   Handcrafted version of bn_div_words
 #
 #   r3 = h
 #   r4 = l
 #   r5 = d

 cmplwi  0,r5,0              # compare r5 and 0
 bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div1  # proceed if d!=0
 li      r3,-1               # d=0 return -1
 bclr    BO_ALWAYS,CR0_LT
 .Lppcasm_div1:
 cmplwi  0,r3,0              # compare r3 and 0
 bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div2  # proceed if h != 0
 divwu   r3,r4,r5            # ret_q = l/d
 bclr    BO_ALWAYS,CR0_LT    # return result in r3
 .Lppcasm_div2:
 divwu   r9,r3,r5            # i_q = h/d
 mullw   r10,r9,r5           # i_r = h - (i_q*d)
 subf    r10,r10,r3
 mr      r3,r9               # ret_q = i_q
 .Lppcasm_set_ctr:
 li      r12,32              # ctr = bitsizeof(d)
 mtctr   r12
 .Lppcasm_div_loop:
 addc    r4,r4,r4            # l = l << 1 -> i_carry
 adde    r11,r10,r10         # i_h = (i_r << 1) | i_carry
 divwu   r9,r11,r5           # i_q = i_h/d
 mullw   r10,r9,r5           # i_r = i_h - (i_q*d)
 subf    r10,r10,r11
 add     r3,r3,r3            # ret_q = ret_q << 1 | i_q
 add     r3,r3,r9
 bc      BO_dCTR_NZERO,CR0_EQ,.Lppcasm_div_loop
 .Lppc_div_end:
 bclr    BO_ALWAYS,CR0_LT    # return result in r3
 .long   0x00000000

___
Linuxppc-embedded mailing list
[EMAIL PROTECTED]
https://ozlabs.org/mailman/listinfo/linuxppc-embedded


Re: PPC bn_div_words routine rewrite

2005-07-05 Thread David Ho
First pass debugging results from gdb on ppc8xx.  Executing ssh-keygen
with following arguments.

(gdb) show args
Argument list to give program being debugged when it is started is
"-t rsa1 -f /etc/ssh/ssh_host_key -N """.

Program received signal SIGSEGV, Segmentation fault.
BN_bn2dec (a=0x1002d9f0) at bn_print.c:136
136 *lp=BN_div_word(t,BN_DEC_CONV);

(gdb) i r
r0     0x0         0
r1     0x7fffd580  2147472768
r2     0x30012868  805382248
r3     0x80000000  2147483648
r4     0xfef33fc   267334652
r5     0x25        37
r6     0xfccdef8   265084664
r7     0x7fffd4c0  2147472576
r8     0xfbad2887  4222429319
r9     0x84044022  2214871074
r10    0x0         0
r11    0x2         2
r12    0xfef2054   267329620
r13    0x10030bc8  268635080
r14    0x0         0
r15    0x0         0
r16    0x0         0
r17    0x0         0
r18    0x0         0
r19    0x0         0
r20    0x0         0
r21    0x0         0
r22    0x0         0
r23    0x64        100
r24    0x5         5
r25    0x1002d438  268620856
r26    0x1002d9f0  268622320
r27    0x1002c578  268617080
r28    0x1         1
r29    0x10031000  268636160
r30    0xffbf7d0   268171216
r31    0x1002d9f0  268622320
pc     0xfef2058   267329624
ps     0xd032      53298
cr     0x24044022  604258338
lr     0xfef2054   267329620
ctr    0xfccefa0   265088928
xer    0x20000000  536870912
fpscr  0x0         0
vscr   0x0         0
vrsave 0x0         0

(gdb) p/x $pc
$1 = 0xfef2058

0x0fef2058 <BN_bn2dec+472>: stw r3,0(r29)

(gdb) x 0x10031000
0x10031000: Cannot access memory at address 0x10031000

Re: PPC bn_div_words routine rewrite

2005-07-05 Thread David Ho
I can tell you with certainty, with reference to the function
BN_bn2dec, that lp is a pointer which is incremented within the while
loop around bn_print.c:136.  Because the test BN_is_zero(t) is always
false, the pointer goes off into the stratosphere, hence the segfault
on ppc8xx.
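The failure mode can be reproduced in miniature. The sketch below is a simplified, hypothetical model of a BN_bn2dec-style loop, not OpenSSL's code: a little-endian array of 32-bit words is repeatedly divided in place by a word, in the spirit of BN_div_word, until it becomes zero. With a div_words that always returns 0x80000000 (mimicking the divide-by-zero result reported for ppc8xx), the value never reaches zero; unlike the loop described above, this model checks the output capacity and returns -1 instead of writing past the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*div_words_fn)(uint32_t h, uint32_t l, uint32_t d);

/* Correct 64-by-32 division (assumes h < d). */
static uint32_t dw_good(uint32_t h, uint32_t l, uint32_t d)
{
    return (uint32_t)((((uint64_t)h << 32) | l) / d);
}

/* Hypothetical stand-in for the broken routine: always 0x80000000. */
static uint32_t dw_broken(uint32_t h, uint32_t l, uint32_t d)
{
    (void)h; (void)l; (void)d;
    return 0x80000000u;
}

/* Divide a[0..n-1] (least significant word first) in place by w and
 * return the remainder; simplified, no normalisation step. */
static uint32_t div_word(uint32_t *a, size_t n, uint32_t w, div_words_fn dw)
{
    uint32_t rem = 0;

    assert(w != 0);
    for (size_t i = n; i-- > 0;) {
        uint32_t l = a[i];
        uint32_t q = dw(rem, l, w);
        rem = l - q * w;            /* modulo 2^32 */
        a[i] = q;
    }
    return rem;
}

static int is_zero(const uint32_t *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] != 0)
            return 0;
    return 1;
}

/* Emit base-w digits, least significant first, until a is zero.
 * Returns the digit count, or -1 if out[] would overflow. */
static int to_digits(uint32_t *a, size_t n, uint32_t w, div_words_fn dw,
                     uint32_t *out, size_t cap)
{
    size_t k = 0;

    while (!is_zero(a, n)) {
        if (k == cap)
            return -1;              /* value never reached zero */
        out[k++] = div_word(a, n, w, dw);
    }
    return (int)k;
}
```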

More analysis to come.


Re: PPC bn_div_words routine rewrite

2005-07-05 Thread David Ho
Let's take the first call to BN_div_word from BN_bn2dec as an example. The
parameters passed to BN_div_word are (a=35, w=10) (decimal
numbers).  It then calls bn_div_words with (h=0, l=35,
d=10).  If you examine the code in linux_ppc32.s, it exits
early because h is 0, and the routine returns the result of a divide by 0,
which is undefined according to the manual.  In the case of ppc8xx the result
is 0x80000000.  So this is the return value from bn_div_words, as seen
in register r3.

So what happens next is that BN_div_word updates a (its first parameter) with
the result (0x80000000) and returns 23 as the remainder of the
division. So a never becomes zero, and hence the BN_is_zero test is
always false.  The problem shows up the very first time bn_div_words
is used.
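The arithmetic behind that remainder can be checked directly. In a BN_div_word-style step the remainder is recovered as l - q*w modulo 2^32. With q = 0x80000000 and w = 10, q*w = 0x500000000, whose low 32 bits are zero, so the remainder is simply l = 35 = 0x23; this would be consistent with the 23 above being a hexadecimal value. The helper below is a hypothetical, simplified model of that single step, not OpenSSL's code.

```c
#include <assert.h>
#include <stdint.h>

/* Result of one BN_div_word-style step on a single word (illustrative). */
struct div_step {
    uint32_t new_word;   /* value stored back into the BIGNUM word */
    uint32_t remainder;  /* value BN_div_word would return */
};

/* l: current word, w: divisor, q: quotient reported by bn_div_words. */
static struct div_step div_word_step(uint32_t l, uint32_t w, uint32_t q)
{
    struct div_step s;

    assert(w != 0);
    s.new_word = q;
    s.remainder = l - q * w;    /* uint32_t arithmetic wraps modulo 2^32 */
    return s;
}
```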

The next thing I naturally did was to fix the case h=0,
which you can quite easily do with the native divwu instruction.  Lo
and behold, I was once again disappointed when h is not equal to 0.

More to come...



Re: PPC bn_div_words routine rewrite

2005-07-05 Thread Andy Polyakov

Take the first call to BN_div_word from BN_bn2dec, for example: the
parameters passed to BN_div_word are (a=35, w=10) (decimal
numbers). It then calls bn_div_words with (h=0, l=35,
d=10). If you examine the code in linux_ppc32.s, it exits
early because h is 0: the routine returns the result of a divide by 0, which is
undefined according to the manual. In the case of ppc8xx the result
is 0x80000000.


And on the PPC machine I have access to, it returns 0. This explains
why I never experienced any SEGV, only sparse decimal
output. It also explains why the BN_is_zero condition was never met on your
system, so you hit the sbrk(0) limit and suffered the penalty. However! Note
that the updated routine,
http://cvs.openssl.org/getfile/openssl/crypto/bn/asm/ppc.pl?v=1.4, never
issues a divide by 0 [it traps instead, but the condition is never met now
when called from BN_div_words] and it does return the correct answer for me.
Can you confirm that the updated subroutine really doesn't work for you? And
if so, how does the problem manifest? Still SEGV? At the same point?
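The call sequence discussed in this exchange can be sketched in C. This is a simplified model, not the OpenSSL source: `div_word_model` and `div_words_ref` are hypothetical names, `a[0]` is taken to be the least significant word, and `div_words_ref` is a 64/32 reference division that uses a 64-bit type. The model walks the words from most to least significant and feeds each remainder back in as the next high word, so the first bn_div_words-style call always sees h = 0, the input that reached the divide-by-0 instruction in the old routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference: quotient of the 64-bit value h:l by d. Requires h < d. */
static uint32_t div_words_ref(uint32_t h, uint32_t l, uint32_t d)
{
    assert(d != 0 && h < d);
    return (uint32_t)((((uint64_t)h << 32) | l) / d);
}

/* Simplified model of BN_div_word: a[0] is the least significant word and
   top is the word count. Divides a in place by w and returns the remainder.
   If first_h is non-NULL it receives the h passed on the first call. */
static uint32_t div_word_model(uint32_t *a, size_t top, uint32_t w, uint32_t *first_h)
{
    uint32_t ret = 0;                       /* running remainder starts at 0 */

    assert(w != 0);
    for (size_t i = top; i-- > 0;) {
        uint32_t l = a[i];
        uint32_t q;
        if (i + 1 == top && first_h != NULL)
            *first_h = ret;                 /* always 0 on the first call */
        q = div_words_ref(ret, l, w);
        ret = l - q * w;                    /* low word of h:l - q*w, which is < w */
        a[i] = q;
    }
    return ret;
}
```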


It should be pointed out that the bug in ppc.pl is encountered only in the 0.9.7
context, as 0.9.8 avoids it by normalizing the divisor [and adjusting the
dividend accordingly]. BTW, I can confirm that 0.9.7 produces correct
decimal ASCII with your routine [but no luck with make test_bn], but in the
0.9.8 context the decimal printout comes out truncated [not sparse with some
significant digits here and there, but truncated] if your routine is
pasted in. A.
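The normalization mentioned above works because multiplying divisor and dividend by the same power of two leaves the quotient unchanged. A sketch under that assumption (`clz32` and `div_words_normalized` are hypothetical names, and the final 64-bit division only stands in for the real word-level algorithm): d is shifted left until its top bit is set, and h:l is shifted by the same amount. After this step the divisor handed to the division always has all 32 bits significant.

```c
#include <assert.h>
#include <stdint.h>

/* Count leading zero bits of a non-zero 32-bit word (portable loop). */
static int clz32(uint32_t x)
{
    int n = 0;

    assert(x != 0);
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Hypothetical illustration: normalize d so its top bit is set, shift the
   dividend h:l by the same amount, then divide. Requires h < d. */
static uint32_t div_words_normalized(uint32_t h, uint32_t l, uint32_t d, uint32_t *nd)
{
    int s = clz32(d);

    assert(h < d);
    if (s != 0) {
        d <<= s;
        h = (h << s) | (l >> (32 - s));   /* h < d, so h << s loses no bits */
        l <<= s;
    }
    *nd = d;                              /* normalized divisor, top bit set */
    return (uint32_t)((((uint64_t)h << 32) | l) / d);
}
```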

__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   [EMAIL PROTECTED]


Re: PPC bn_div_words routine rewrite

2005-07-05 Thread David Ho
This is the second confirmed report of the same problem on the ppc8xx.
   
After reading my email, I must say I was the unfriendly one, and I
apologize for that.
   
More debugging evidence to come.
   
-- Forwarded message --
From: Murch, Christopher [EMAIL PROTECTED]
Date: Jul 1, 2005 9:46 AM
Subject: RE: PPC bn_div_words routine rewrite
To: David Ho [EMAIL PROTECTED]
   
   
David,
I had observed the same issue on ppc 8xx machines after upgrading to
the asm version of the BN routines. Thank you very much for your work on the
fix.
My question is: do you have high confidence in the other new asm ppc BN
routines after observing this issue, or do you think they might have
similar problems?
Thanks.
Chris
   
-Original Message-
From: David Ho [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 30, 2005 6:22 PM
To: openssl-dev@openssl.org; [EMAIL PROTECTED]
Subject: Re: PPC bn_div_words routine rewrite
   
   
The reason I had to redo this routine, in case anyone is wondering, is
that ssh-keygen segfaults when this assembly routine returns junk
to the BN_div_word function. On a ppc, if you issue the command

ssh-keygen -t rsa1 -f /etc/ssh/ssh_host_key -N ""

the program craps out when it tries to write the public key in ASCII
decimal.
   
Regards,
David
   

Re: PPC bn_div_words routine rewrite

2005-07-05 Thread Andy Polyakov

Okay, having actually done what Andy suggested, i.e. the one-liner fix
in the assembly code, bn_div_words returns the correct results.


Note that the final version, the one committed to all relevant OpenSSL
branches a couple of days ago and the one that actually made it into the
just-released 0.9.8, is a bit different from the originally suggested one-line
fix; see for example http://cvs.openssl.org/chngview?cn=14199.



At this point, my conclusion is that, up to openssl-0.9.8-beta6, the ppc32
bn_div_words routine generated from crypto/bn/asm/ppc.pl is still busted.


Yes. Though it should be noted that 0.9.8 was inadvertently avoiding the
bug condition. Recall that the original problem report was for 0.9.7.



Why do you signal an overflow condition when it appears that the functions that
call bn_div_words do not check for overflow conditions?


That's a question for IBM. When they submitted the code, I
explicitly asked what would be the appropriate way to generate a *fatal*
condition at that point, i.e. one that would result in a core dump, and
the answer was a division-by-0 instruction. At that time I had no access
to any PPC machine and just had to go with it. Now it actually came as a
surprise that division by 0 does not raise an exception, but silently
returns an implementation-specific value... A.
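For comparison, in C a fatal condition can be raised explicitly instead of relying on what a divide instruction does with a zero divisor. A minimal sketch, assuming the caller contract is h < d so that the quotient fits in one word (`div_words_or_die` is a hypothetical name, and the 64-bit division stands in for a double-width approach): `abort()` raises SIGABRT, whose default action terminates the process and normally leaves a core dump when core dumps are enabled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical checked 64/32 division: abort() on a zero divisor or on a
   quotient that would not fit in 32 bits (h >= d), instead of relying on
   an implementation-specific divide-by-zero result. */
static uint32_t div_words_or_die(uint32_t h, uint32_t l, uint32_t d)
{
    uint64_t n, q;

    if (d == 0 || h >= d)
        abort();                          /* fatal regardless of the CPU */
    n = ((uint64_t)h << 32) | l;
    q = n / d;
    assert(q <= UINT32_MAX);              /* guaranteed by h < d */
    assert(q * d <= n && n - q * d < d);  /* remainder lies in [0, d) */
    return (uint32_t)q;
}
```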



Re: PPC bn_div_words routine rewrite

2005-07-01 Thread Andy Polyakov

The reason I had to redo this routine, in case anyone is wondering, is
that ssh-keygen segfaults when this assembly routine returns junk
to the BN_div_word function. On a ppc, if you issue the command

ssh-keygen -t rsa1 -f /etc/ssh/ssh_host_key -N ""

the program craps out when it tries to write the public key in ASCII decimal.


It would help if you provided evidence such as a debugger stack trace and
program output. The provided description makes no sense. It seg-faults when the
routine returns junk to BN_div_word? A seg-fault [segmentation violation]
can occur when you write something to memory, and nothing gets written to
memory upon result return. BN_div_word does write to memory, but I fail
to see how a bogus value could possibly trigger a seg-fault. The only
possibility is that the assembler doesn't follow the ABI convention and corrupts
registers which the caller is using/expects to be preserved by the callee.
There are several PPC ABI flavors in use, but OpenSSL routines were
designed to be ABI-neutral. Well, neutrality really means the common
denominator of the ABI specs examined at the moment of coding, so there is
a window of opportunity that it won't be neutral to a future ABI, but is
that really the case? Does your system use some newly designed PPC ABI? You
never mentioned what your system is...


But you're apparently right about a bug being present in the PPC assembler.
I too have got an insane [with very few significant digits] decimal
printout of the public key generated by ssh-keygen...



This is a rewrite of the bn_div_words routine for the PowerPC arch,
tested on a MPC8xx processor.


Well, the suggested routine apparently sends ssh-keygen into an endless loop
on the PPC-based 32-bit system I have access to... And (cd test; make
test_bn) fails early in BN_sqr... And test/exptest fails miserably with
bad reciprocal...



I initially thought there is maybe a small mistake in the code that
requires a one-liner change


But apparently this is the case! Please verify the following:

--- crypto/bn/asm/ppc.pl.orig	2004-04-28 00:05:50.0 +0200
+++ crypto/bn/asm/ppc.pl	2005-07-01 18:58:21.105656512 +0200
@@ -1717,7 +1717,7 @@
 	li	r9,1			# r9=1
 	$SHL	r10,r9,r8		# r9<<=r8
 	$UCMP	0,r3,r10		#
-	bc	BO_IF,CR0_GT,Lppcasm_div2	#or if (h > (1<<r8))
+	bc	BO_IF_NOT,CR0_GT,Lppcasm_div2	#or if (h > (1<<r8))
 	$UDIV	r3,r3,r0		#if not assert(0) divide by 0!
 					#that's how we signal overflow
 	bclr	BO_ALWAYS,CR0_LT	#return. NEVER REACHED.

A.
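The intent of the patched branch can be spelled out in C. This sketch assumes, as the comments in the hunk suggest, that r8 holds BN_num_bits_word(d) and that the routine should go on to divide only when h <= (1 << r8); `num_bits_word`, `proceeds_fixed` and `proceeds_buggy` are hypothetical names. It shows why the original BO_IF branch sent the common h = 0 call into the divide-by-0 path. The case where d has all 32 bits significant is simply treated as passing, since the hunk does not show how the surrounding code handles it (and shifting a 32-bit value by 32 is undefined in C).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of significant bits in d, like BN_num_bits_word (0 for d == 0). */
static int num_bits_word(uint32_t d)
{
    int n = 0;

    while (d != 0) {
        d >>= 1;
        n++;
    }
    return n;
}

/* Patched logic (BO_IF_NOT): go on to divide when h <= (1 << bits). */
static bool proceeds_fixed(uint32_t h, uint32_t d)
{
    int bits = num_bits_word(d);

    assert(d != 0);                       /* d == 0 is handled earlier */
    if (bits == 32)
        return true;                      /* assumption: no check in this case */
    return !(h > ((uint32_t)1 << bits));
}

/* Original logic (BO_IF): go on only when h > (1 << bits), so small h
   fell through to the divide-by-0 "overflow signal". */
static bool proceeds_buggy(uint32_t h, uint32_t d)
{
    int bits = num_bits_word(d);

    assert(d != 0);
    if (bits == 32)
        return true;                      /* same assumption as above */
    return h > ((uint32_t)1 << bits);
}
```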


PPC bn_div_words routine rewrite

2005-06-30 Thread David Ho
Hi all, 

This is a rewrite of the bn_div_words routine for the PowerPC arch,
tested on an MPC8xx processor.
I initially thought there might be a small mistake in the code that
requires a one-line change, but it turns out I had to redo the
routine.
I guess this routine is not called very often, as I see that most other
routines are hand-crafted, whereas this routine is compiled from a C
function that apparently has not gone through a whole lot of testing.

I wrote a C function to confirm correctness of the code.

unsigned long div_words (unsigned long h, 
 unsigned long l,
 unsigned long d)
{
  unsigned long i_h; /* intermediate dividend */
  unsigned long i_q; /* quotient of i_h/d */
  unsigned long i_r; /* remainder of i_h/d */

  unsigned long i_cntr;
  unsigned long i_carry;

  unsigned long ret_q; /* return quotient */

  /* cannot divide by zero */
  if (d == 0) return 0xffffffff;

  /* do simple 32-bit divide */
  if (h == 0) return l/d;
 
  i_q = h/d;
  i_r = h - (i_q*d);
  ret_q = i_q;

  i_cntr = 32;

  while (i_cntr--)
  {
    i_carry = (l & 0x80000000) ? 1:0;
    l = l << 1;

    i_h = (i_r << 1) | i_carry;
    i_q = i_h/d;
    i_r = i_h - (i_q*d);

    ret_q = (ret_q << 1) | i_q;
  }

  return ret_q;
}
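The C function can be checked against a 64-bit reference. In this sketch, which assumes 32-bit words (`uint32_t` stands in for `unsigned long`) and uses hypothetical names, `div_words_orig` transliterates the loop above. When d >= 0x80000000, the remainder i_r can be 2^31 or larger, so `(i_r << 1) | i_carry` drops its top bit and the quotient comes out wrong. `div_words_fixed` is one possible correction, not necessarily the one adopted in this thread: it keeps the shifted-out bit and subtracts d whenever the 33-bit intermediate value is at least d.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit reference: low 32 bits of the quotient of h:l by d. */
static uint32_t div_words_ref64(uint32_t h, uint32_t l, uint32_t d)
{
    assert(d != 0);
    return (uint32_t)((((uint64_t)h << 32) | l) / d);
}

/* Transliteration of the posted div_words loop, with 32-bit words. */
static uint32_t div_words_orig(uint32_t h, uint32_t l, uint32_t d)
{
    uint32_t i_h, i_q, i_r, ret_q;

    if (d == 0) return 0xffffffffu;
    if (h == 0) return l / d;

    i_q = h / d;
    i_r = h - i_q * d;
    ret_q = i_q;
    for (int i = 0; i < 32; i++) {
        uint32_t i_carry = (l & 0x80000000u) ? 1u : 0u;
        l <<= 1;
        i_h = (i_r << 1) | i_carry;    /* bit 31 of i_r is lost here */
        i_q = i_h / d;
        i_r = i_h - i_q * d;
        ret_q = (ret_q << 1) | i_q;
    }
    return ret_q;
}

/* Possible correction: keep the bit shifted out of the remainder. */
static uint32_t div_words_fixed(uint32_t h, uint32_t l, uint32_t d)
{
    uint32_t r, q = 0;

    if (d == 0) return 0xffffffffu;
    r = h % d;                         /* as in the original, high quotient bits are dropped */
    for (int i = 0; i < 32; i++) {
        uint32_t top = r >> 31;        /* bit 32 of the 33-bit value top:r */
        r = (r << 1) | (l >> 31);
        l <<= 1;
        q <<= 1;
        if (top || r >= d) {           /* top:r >= d, so this quotient bit is 1 */
            r -= d;                    /* wraps to the correct remainder when top == 1 */
            q |= 1u;
        }
    }
    return q;
}
```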


Then I handcrafted the routine in PPC assembly.
The result is a 26-line assembly routine that is easy to understand and
predictable, as opposed to an 81-liner that I am still trying to
decipher...
If anyone is interested in incorporating this routine into the openssl
code, I'll be happy to assist.
At this point I think I will be taking a bit of a break from this 3-day
debugging/fixing marathon.

Regards, 
David Ho


#
#   Handcrafted version of bn_div_words
#
#   r3 = h
#   r4 = l
#   r5 = d

cmplwi  0,r5,0                      # compare r5 and 0
bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div1  # proceed if d!=0
li      r3,-1                       # d=0 return -1
bclr    BO_ALWAYS,CR0_LT
.Lppcasm_div1:
cmplwi  0,r3,0                      # compare r3 and 0
bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div2  # proceed if h != 0
divwu   r3,r4,r5                    # ret_q = l/d
bclr    BO_ALWAYS,CR0_LT            # return result in r3
.Lppcasm_div2:
divwu   r9,r3,r5                    # i_q = h/d
mullw   r10,r9,r5                   # i_r = h - (i_q*d)
subf    r10,r10,r3
mr      r3,r9                       # ret_q = i_q
.Lppcasm_set_ctr:
li      r12,32                      # ctr = bitsizeof(d)
mtctr   r12
.Lppcasm_div_loop:
addc    r4,r4,r4                    # l = l << 1, old bit 31 -> CA (i_carry)
adde    r11,r10,r10                 # i_h = (i_r << 1) | i_carry
divwu   r9,r11,r5                   # i_q = i_h/d
mullw   r10,r9,r5                   # i_r = i_h - (i_q*d)
subf    r10,r10,r11
add     r3,r3,r3                    # ret_q = ret_q << 1 | i_q
add     r3,r3,r9
bc      BO_dCTR_NZERO,CR0_EQ,.Lppcasm_div_loop
.Lppc_div_end:
bclr    BO_ALWAYS,CR0_LT            # return result in r3
.long   0x00000000
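To relate the assembly to the C function, the loop can be modeled in C with an explicit carry flag. This is a sketch, not generated code: `addc`, `adde` and `div_loop_model` are hypothetical helpers, and `ca` stands for the XER[CA] bit. `addc` moves the old top bit of l into CA, and `adde` consumes it as i_carry. The model also makes visible that the carry produced by `adde r11,r10,r10` (bit 32 of i_r << 1) is overwritten by the next `addc` without being used, which is the same bit the C version drops.

```c
#include <assert.h>
#include <stdint.h>

/* addc rD,rA,rB: rD = rA + rB, CA = carry out of bit 31. */
static uint32_t addc(uint32_t a, uint32_t b, unsigned *ca)
{
    uint64_t s = (uint64_t)a + b;
    *ca = (unsigned)(s >> 32);
    return (uint32_t)s;
}

/* adde rD,rA,rB: rD = rA + rB + CA, CA = carry out of bit 31. */
static uint32_t adde(uint32_t a, uint32_t b, unsigned *ca)
{
    uint64_t s = (uint64_t)a + b + *ca;
    *ca = (unsigned)(s >> 32);
    return (uint32_t)s;
}

/* Model of .Lppcasm_div2 through the loop; r3 = h, r4 = l, r5 = d. */
static uint32_t div_loop_model(uint32_t r3, uint32_t r4, uint32_t r5)
{
    unsigned ca = 0;
    uint32_t r9, r10, r11;

    assert(r3 != 0 && r5 != 0);           /* h == 0 and d == 0 return earlier */
    r9 = r3 / r5;                         /* divwu r9,r3,r5 */
    r10 = r3 - r9 * r5;                   /* mullw r10,r9,r5; subf r10,r10,r3 */
    r3 = r9;                              /* mr r3,r9 */
    for (int ctr = 32; ctr > 0; ctr--) {  /* mtctr r12 ... bc BO_dCTR_NZERO */
        r4 = addc(r4, r4, &ca);           /* l <<= 1, old bit 31 -> CA */
        r11 = adde(r10, r10, &ca);        /* i_h = 2*i_r + CA; new CA is never read */
        r9 = r11 / r5;                    /* divwu r9,r11,r5 */
        r10 = r11 - r9 * r5;              /* mullw r10,r9,r5; subf r10,r10,r11 */
        r3 = r3 + r3 + r9;                /* add r3,r3,r3; add r3,r3,r9 */
    }
    return r3;
}
```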


PPC bn_div_words routine rewrite - resend

2005-06-30 Thread David Ho
Hi all,

This is a rewrite of the bn_div_words routine for the PowerPC arch, tested
on an MPC8xx processor.
I initially thought there might be a small mistake in the code that
requires a one-line change, but it turns out I had to redo the routine.
I guess this routine is not called very often, as I see that most other
routines are hand-crafted, whereas this routine is compiled from a C
function that apparently has not gone through a whole lot of testing.

I wrote a C function to confirm correctness of the code.

unsigned long div_words (unsigned long h,
 unsigned long l,
 unsigned long d)
{
  unsigned long i_h; /* intermediate dividend */
  unsigned long i_q; /* quotient of i_h/d */
  unsigned long i_r; /* remainder of i_h/d */

  unsigned long i_cntr;
  unsigned long i_carry;

  unsigned long ret_q; /* return quotient */

  /* cannot divide by zero */
  if (d == 0) return 0xffffffff;

  /* do simple 32-bit divide */
  if (h == 0) return l/d;

  i_q = h/d;
  i_r = h - (i_q*d);
  ret_q = i_q;

  i_cntr = 32;

  while (i_cntr--)
  {
    i_carry = (l & 0x80000000) ? 1:0;
    l = l << 1;

    i_h = (i_r << 1) | i_carry;
    i_q = i_h/d;
    i_r = i_h - (i_q*d);

    ret_q = (ret_q << 1) | i_q;
  }

  return ret_q;
}


Then I handcrafted the routine in PPC assembly.
The result is a 26-line assembly routine that is easy to understand and predictable,
as opposed to an 81-liner that I am still trying to decipher...
If anyone is interested in incorporating this routine into the openssl code,
I'll be happy to assist.
At this point I think I will be taking a bit of a break from this 3-day
debugging/fixing marathon.

Regards,
David Ho


#
#   Handcrafted version of bn_div_words
#
#   r3 = h
#   r4 = l
#   r5 = d

cmplwi  0,r5,0                      # compare r5 and 0
bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div1  # proceed if d!=0
li      r3,-1                       # d=0 return -1
bclr    BO_ALWAYS,CR0_LT
.Lppcasm_div1:
cmplwi  0,r3,0                      # compare r3 and 0
bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div2  # proceed if h != 0
divwu   r3,r4,r5                    # ret_q = l/d
bclr    BO_ALWAYS,CR0_LT            # return result in r3
.Lppcasm_div2:
divwu   r9,r3,r5                    # i_q = h/d
mullw   r10,r9,r5                   # i_r = h - (i_q*d)
subf    r10,r10,r3
mr      r3,r9                       # ret_q = i_q
.Lppcasm_set_ctr:
li      r12,32                      # ctr = bitsizeof(d)
mtctr   r12
.Lppcasm_div_loop:
addc    r4,r4,r4                    # l = l << 1, old bit 31 -> CA (i_carry)
adde    r11,r10,r10                 # i_h = (i_r << 1) | i_carry
divwu   r9,r11,r5                   # i_q = i_h/d
mullw   r10,r9,r5                   # i_r = i_h - (i_q*d)
subf    r10,r10,r11
add     r3,r3,r3                    # ret_q = ret_q << 1 | i_q
add     r3,r3,r9
bc      BO_dCTR_NZERO,CR0_EQ,.Lppcasm_div_loop
.Lppc_div_end:
bclr    BO_ALWAYS,CR0_LT            # return result in r3
.long   0x00000000


Re: PPC bn_div_words routine rewrite

2005-06-30 Thread David Ho
The reason I had to redo this routine, in case anyone is wondering, is
that ssh-keygen segfaults when this assembly routine returns junk
to the BN_div_word function. On a ppc, if you issue the command

ssh-keygen -t rsa1 -f /etc/ssh/ssh_host_key -N ""

the program craps out when it tries to write the public key in ASCII decimal.

Regards,
David 
