Re: PPC bn_div_words routine rewrite
> Please do not use previously mentioned routine, it missed 1 corner case where 32=num_bits_word(d) Revised routine that passes (cd test; make bntest).

Does it mean that the previous version didn't actually pass the test? I mean, if it did on your CPU but not on mine, we could probably learn something else about the ways PPC can be implemented...

> All I had to do is add one more instruction to the routine. Please test on your ppc32 machines. Once we are all happy,

Is this your agenda? Make everybody happy:-):-):-) Good luck:-):-):-)

> it's a matter of adding the core dump at the beginning. Thus you have a fast,

32*(div latency + mul latency) is fast? If I call BN_bn2dec in a loop, it runs 4 times slower than with the current implementation. Well, at least on the computer I have access to...

> easy to understand, predictable bn_div_words, as opposed to that monster in 0.9.8.

Hostility again? Are you saying that nobody understands the current implementation and that it produces unpredictable results? I disagree:-)

> Other architectures will benefit if this C function is used in bn_asm.c

How? And which architectures exactly? Virtually all 32-bit architectures, including PPC32, opt for (BN_ULONG)(((((BN_ULLONG)h)<<BN_BITS2)|l)/(BN_ULLONG)d).

A.
__ OpenSSL Project http://www.openssl.org Development Mailing List openssl-dev@openssl.org Automated List Manager [EMAIL PROTECTED]
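For readers unfamiliar with the expression Andy cites, here is a minimal sketch of double-word division through a 64-bit intermediate. The names word_t, dword_t, WORD_BITS and div_words_ref are illustrative stand-ins (assumed 32-bit and 64-bit widths), not OpenSSL's own identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for BN_ULONG / BN_ULLONG on a 32-bit build (assumed widths). */
typedef uint32_t word_t;
typedef uint64_t dword_t;
#define WORD_BITS 32

/* Quotient of the two-word value (h:l) divided by d, truncated to one word.
 * The caller must ensure d != 0; the quotient fits in one word when h < d. */
static word_t div_words_ref(word_t h, word_t l, word_t d)
{
    assert(d != 0);
    return (word_t)((((dword_t)h << WORD_BITS) | l) / (dword_t)d);
}
```

On a compiler with a native 64-bit type this leaves the choice of instructions to the compiler, which is presumably why it is the common default; whether it beats hand-written assembly on a given PPC core is not something this sketch can show.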
Re: PPC bn_div_words routine rewrite
Forgive my lack of knowledge of your existing code. But is it really designed with optimization in mind? What was the driving force for the C function? If it is optimized, what is the time required?

I jumped way too early to the "fast" conclusion, I must admit, because I really never had speed in mind. As I explained, my goal is to make the routine easy to understand. If it has any performance advantage, that is purely a side effect. (You never answered my comment about performance in my last email, so I can only guess what the design intent was for your code.)

I mean, if you choose to optimize my code for speed, it's perfectly doable, and I have full confidence that anyone else who has read this email thread can do it. But again, I have no idea how much time you spent on your routine, so I guess I should refrain from dissing it. My mistake once again.

What else will you be teaching me today? =)

David
Re: PPC bn_div_words routine rewrite
Please do not use the previously mentioned routine; it missed 1 corner case where 32=num_bits_word(d).

Here is a revised routine that passes (cd test; make bntest). All I had to do is add one more instruction to the routine. Please test on your ppc32 machines. Once we are all happy, it's a matter of adding the core dump at the beginning. Thus you have a fast, easy to understand, predictable bn_div_words, as opposed to that monster in 0.9.8.

    #
    #       Handcrafted version of bn_div_words
    #
    #       r3 = h
    #       r4 = l
    #       r5 = d

            cmplwi  0,r5,0                  # compare r5 and 0
            bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div1  # proceed if d!=0
            li      r3,-1                   # d=0 return -1
            bclr    BO_ALWAYS,CR0_LT
    .Lppcasm_div1:
            cmplwi  0,r3,0                  # compare r3 and 0
            bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div2  # proceed if h != 0
            divwu   r3,r4,r5                # ret_q = l/d
            bclr    BO_ALWAYS,CR0_LT        # return result in r3
    .Lppcasm_div2:
            divwu   r9,r3,r5                # i_q = h/d
            mullw   r10,r9,r5               # i_r = h - (i_q*d)
            subf    r10,r10,r3
            mr      r3,r9                   # ret_q = i_q
    .Lppcasm_set_ctr:
            li      r12,32                  # ctr = bitsizeof(d)
            mtctr   r12
    .Lppcasm_div_loop:
            addc    r4,r4,r4                # l = l << 1 -> i_carry
            adde    r11,r10,r10             # i_h = (i_r << 1) | i_carry
            divwu   r9,r11,r5               # i_q = i_h/d
            addze   r9,r9                   # very important! - DKWH
            mullw   r10,r9,r5               # i_r = i_h - (i_q*d)
            subf    r10,r10,r11
            add     r3,r3,r3                # ret_q = ret_q << 1 | i_q
            add     r3,r3,r9
            bc      BO_dCTR_NZERO,CR0_EQ,.Lppcasm_div_loop
    .Lppc_div_end:
            bclr    BO_ALWAYS,CR0_LT        # return result in r3
            .long   0x00000000

Regards,
David

On 7/5/05, Peter Waltenberg [EMAIL PROTECTED] wrote:

Thanks for finding and fixing this. Particularly for finding and fixing it before 0.9.8 hit the streets.

Peter

Peter Waltenberg
Architect
IBM Crypto for C Team
IBM/Tivoli Gold Coast Office

Andy Polyakov [EMAIL PROTECTED]
Sent by: [EMAIL PROTECTED]
06/07/2005 07:49 AM
Please respond to openssl-dev
To: openssl-dev@openssl.org
cc: [EMAIL PROTECTED]
Subject: Re: PPC bn_div_words routine rewrite

> Okay, having actually done what Andy suggested, i.e. the one liner fix in the assembly code, bn_div_words returns the correct results.

Note that the final version, the one committed to all relevant OpenSSL branches a couple of days ago and the one which actually made it into the just released 0.9.8, is a bit different from the originally suggested one-line fix, see for example http://cvs.openssl.org/chngview?cn=14199.

> At this point, my conclusion is, up to openssl-0.9.8-beta6, the ppc32 bn_div_words routine generated from crypto/bn/ppc.pl is still busted.

Yes. Though it should be noted that 0.9.8 was inadvertently avoiding the bug condition. Recall that the original problem report was for 0.9.7.

> Why do you signal an overflow condition when it appears functions that call bn_div_words do not check for overflow conditions?

That's a question for IBM. At the time they submitted the code, I explicitly asked what would be an appropriate way to generate a *fatal* condition at that point, i.e. one which would result in a core dump, and it came out as a division by 0 instruction. At that time I had no access to any PPC machine and had to just go with it. Now it actually came as a surprise that division by 0 does not raise an exception, but silently returns an implementation-specific value...

A.
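The corner case David mentions (a divisor that uses all 32 bits) can be illustrated with a small C sketch. It assumes 32-bit words and uses a hypothetical helper name, div_words_bitserial. The use_carry flag switches on the step that the added addze instruction performs, namely folding the bit shifted out of the running remainder back into the quotient digit.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-serial division of the two-word value (h:l) by d.
 * use_carry != 0 models the revised routine (with addze);
 * use_carry == 0 models the earlier routine, which drops the bit
 * shifted out of the running remainder. */
static uint32_t div_words_bitserial(uint32_t h, uint32_t l, uint32_t d,
                                    int use_carry)
{
    assert(d != 0);
    uint32_t rem = h % d;   /* running remainder, always < d when correct */
    uint32_t quo = h / d;   /* high quotient bits; shifted out over 32 steps */

    for (int i = 0; i < 32; i++) {
        uint32_t overflow = rem >> 31;           /* bit lost by rem << 1 */
        uint32_t next = (rem << 1) | (l >> 31);  /* bring down next bit of l */
        l <<= 1;

        uint32_t digit = next / d;
        if (use_carry)
            digit += overflow;                   /* what addze contributes */

        rem = next - digit * d;                  /* wraps modulo 2^32 by design */
        quo = (quo << 1) | digit;
    }
    return quo;
}
```

With h = 0x80000000, l = 0 and d = 0xFFFFFFFF, the remainder's top bit is set on the first step. The variant without the carry returns 0, while the variant with it returns 0x80000000, which matches a 64-bit reference division.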
Re: PPC bn_div_words routine rewrite
C function corresponding to the revised assembly routine I posted on 7/7. It's provided to ease review of the assembly. Other architectures will benefit if this C function is used in bn_asm.c

Regards,
David

    /* assumes a 32-bit unsigned long, as on ppc32 */
    unsigned long div_words(unsigned long h, unsigned long l, unsigned long d)
    {
        unsigned long i_h;        /* intermediate dividend */
        unsigned long i_q;        /* quotient of i_h/d */
        unsigned long i_r;        /* remainder of i_h/d */
        unsigned long i_cntr;
        unsigned long i_carry;
        unsigned long i_overflow;
        unsigned long ret_q;      /* return quotient */

        /* cannot divide by zero */
        if (d == 0) return 0xffffffff;

        /* do simple 32-bit divide */
        if (h == 0) return l/d;

        i_q = h/d;
        i_r = h - (i_q*d);
        ret_q = i_q;

        i_cntr = 32;
        while (i_cntr--)
        {
            i_carry = (l & 0x80000000) ? 1:0;
            l = l << 1;

            /* bit shifted out of i_r; this is the addze in the assembly */
            i_overflow = (i_r & 0x80000000) ? 1:0;
            i_h = (i_r << 1) | i_carry;
            i_q = i_h/d;
            i_q = i_q + i_overflow;
            i_r = i_h - (i_q*d);

            ret_q = (ret_q << 1) | i_q;
        }
        return ret_q;
    }
Re: PPC bn_div_words routine rewrite
> Let's start the week off with less hostility and more productive criticism on this topic.

If you want productivity, then provide real evidence in the form of a stack backtrace at the segmentation violation point, disassembler output in the vicinity of the segmentation violation point, and 'info registers' output at the same point. As for hostility, I leave it without comment, as you apparently can outrank anybody in that area:-)

>> But you're apparently right about a bug being present in PPC assembler.
>
> So you are saying there is a bug in the GCC assembler? How confident are you in that? Is the first correct step to examine the assembly code for errors before jumping to any conclusion that the GCC assembler is bad?

Did I say GCC assembler? I said PPC assembler, which refers to crypto/bn/asm/ppc.pl.

>>> This is a rewrite of the bn_div_words routine for the PowerPC arch, tested on a MPC8xx processor.
>>
>> Well, suggested routine apparently sends ssh-keygen on the PPC-based 32-bit system I have access to to an end-less loop...
>
> If you care to read the c function I supplied or if you don't believe it: If you understand ppc 32-bit instructions, as specified in the PowerPC Microprocessor Family: Programming Environments for 32-Bit Microprocessors. My routine would not be able to find a condition that will make it go into an end-less loop, unless you messed up bad somewhere.

I didn't say that the suggested routine goes into an end-less loop, but that it *apparently* sends ssh-keygen into an end-less loop. I made no claims about which routine exactly loops, and I even admit that I don't know for sure if it was in fact an end-less loop, because I chose to kill the process after a couple of minutes. Note that normally it takes just a few seconds on the machine I tested on, so a couple of minutes is essentially unacceptable and by all means *appears* as an end-less loop.

> In summary, what I am trying to provide the community is an alternative to ... the current implementation of which is very questionable.

crypto/bn/asm/ppc.pl distributed with OpenSSL was designed for and explicitly tested by IBM under 32- and 64-bit PPC Linux, 32- and 64-bit AIX, as well as 32-bit MacOS X. Special care was taken to make sure that none of the ABIs/calling conventions used by the above listed platforms are violated, so that the module can be safely invoked by compiler-generated code for the above mentioned OSes. Afterwards there were reports that it was successfully used on an unspecified [in report] embedded PPC-based platform.

Despite this, on Friday I could personally confirm on 32-bit MacOS X that there admittedly was a bug in ppc.pl, which manifests as failure to generate a sane decimal ASCII presentation of a BIGNUM, which is exactly the kind of operation taking place when you run ssh-keygen -t rsa1 [it should be noted that decimal ASCII is unfortunately not covered by the 'make test_bn' suite]. But under no circumstances was a segmentation violation observed.

At the same time I could personally confirm that, if pasted into osx32_ppc.s, the suggested implementation induces a 'make test_bn' failure on 32-bit MacOS X. In particular test/bntest terminates with

    print test BN_sqr\n
    -FF554CAEAE * -FF554CAEAE - FEAB0B30019BBA80FE44
    Square test failed!
    1

while test/exptest:

    BN_mod_exp_recp() problems
    14482:error:03082065:bignum routines:BN_div_recp:bad reciprocal:bn_recp.c:194:

For me these are enough reasons to become sceptical about the submission and to conduct trouble-shooting of my own. The currently available ppc.pl was verified to pass 'make test_bn' on 32-bit MacOS X and 32- and 64-bit AIX [tested by IBM], as well as to generate correct decimal ASCII presentation on the mentioned platforms. If it doesn't work for you, then submit the information listed at the beginning of this letter.

A.
Fwd: PPC bn_div_words routine rewrite
This is the second confirmed report of the same problem on the ppc8xx. After rereading my email, I must say I was the unfriendly one; I apologize for that. More debugging evidence to come.

---------- Forwarded message ----------
From: Murch, Christopher [EMAIL PROTECTED]
Date: Jul 1, 2005 9:46 AM
Subject: RE: PPC bn_div_words routine rewrite
To: David Ho [EMAIL PROTECTED]

David,

I had observed the same issue on ppc 8xx machines after upgrading to the asm version of the BN routines. Thank you very much for your work on the fix. My question is, do you have high confidence in the other new asm ppc BN routines after observing this issue, or do you think they might have similar problems?

Thanks.
Chris

-----Original Message-----
From: David Ho [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 30, 2005 6:22 PM
To: openssl-dev@openssl.org; [EMAIL PROTECTED]
Subject: Re: PPC bn_div_words routine rewrite

The reason I had to redo this routine, in case anyone is wondering, is that ssh-keygen segfaults when this assembly routine returns junk to the BN_div_word function. On a ppc, if you issue the command

    ssh-keygen -t rsa1 -f /etc/ssh/ssh_host_key -N ""

the program craps out when it tries to write the public key in ascii decimal.

Regards,
David

On 6/30/05, David Ho [EMAIL PROTECTED] wrote:

Hi all,

This is a rewrite of the bn_div_words routine for the PowerPC arch, tested on a MPC8xx processor. I initially thought there might be a small mistake in the code requiring a one-liner change, but it turns out I have to redo the routine. I guess this routine is not called very often, as I see that most other routines are hand-crafted, whereas this routine is compiled from a C function that apparently has not gone through a whole lot of testing.

I wrote a C function to confirm correctness of the code.

    unsigned long div_words(unsigned long h, unsigned long l, unsigned long d)
    {
        unsigned long i_h;      /* intermediate dividend */
        unsigned long i_q;      /* quotient of i_h/d */
        unsigned long i_r;      /* remainder of i_h/d */
        unsigned long i_cntr;
        unsigned long i_carry;
        unsigned long ret_q;    /* return quotient */

        /* cannot divide by zero */
        if (d == 0) return 0xffffffff;

        /* do simple 32-bit divide */
        if (h == 0) return l/d;

        i_q = h/d;
        i_r = h - (i_q*d);
        ret_q = i_q;

        i_cntr = 32;
        while (i_cntr--)
        {
            i_carry = (l & 0x80000000) ? 1:0;
            l = l << 1;

            i_h = (i_r << 1) | i_carry;
            i_q = i_h/d;
            i_r = i_h - (i_q*d);

            ret_q = (ret_q << 1) | i_q;
        }
        return ret_q;
    }

Then I handcrafted the routine in PPC assembly. The result is a 26-line assembly routine that is easy to understand and predictable, as opposed to an 81-liner that I am still trying to decipher... If anyone is interested in incorporating this routine into the openssl code, I'll be happy to assist. At this point I think I will be taking a bit of a break from this 3-day debugging/fixing marathon.

Regards,
David Ho

    #
    #       Handcrafted version of bn_div_words
    #
    #       r3 = h
    #       r4 = l
    #       r5 = d

            cmplwi  0,r5,0                  # compare r5 and 0
            bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div1  # proceed if d!=0
            li      r3,-1                   # d=0 return -1
            bclr    BO_ALWAYS,CR0_LT
    .Lppcasm_div1:
            cmplwi  0,r3,0                  # compare r3 and 0
            bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div2  # proceed if h != 0
            divwu   r3,r4,r5                # ret_q = l/d
            bclr    BO_ALWAYS,CR0_LT        # return result in r3
    .Lppcasm_div2:
            divwu   r9,r3,r5                # i_q = h/d
            mullw   r10,r9,r5               # i_r = h - (i_q*d)
            subf    r10,r10,r3
            mr      r3,r9                   # ret_q = i_q
    .Lppcasm_set_ctr:
            li      r12,32                  # ctr = bitsizeof(d)
            mtctr   r12
    .Lppcasm_div_loop:
            addc    r4,r4,r4                # l = l << 1 -> i_carry
            adde    r11,r10,r10             # i_h = (i_r << 1) | i_carry
            divwu   r9,r11,r5               # i_q = i_h/d
            mullw   r10,r9,r5               # i_r = i_h - (i_q*d)
            subf    r10,r10,r11
            add     r3,r3,r3                # ret_q = ret_q << 1 | i_q
            add     r3,r3,r9
            bc      BO_dCTR_NZERO,CR0_EQ,.Lppcasm_div_loop
    .Lppc_div_end:
            bclr    BO_ALWAYS,CR0_LT        # return result in r3
            .long   0x00000000

___ Linuxppc-embedded mailing list [EMAIL PROTECTED] https://ozlabs.org/mailman/listinfo/linuxppc-embedded
Re: PPC bn_div_words routine rewrite
First pass debugging results from gdb on ppc8xx. Executing ssh-keygen with following arguments.

    (gdb) show args
    Argument list to give program being debugged when it is started is "-t rsa1 -f /etc/ssh/ssh_host_key -N """.

    Program received signal SIGSEGV, Segmentation fault.
    BN_bn2dec (a=0x1002d9f0) at bn_print.c:136
    136             *lp=BN_div_word(t,BN_DEC_CONV);
    (gdb) i r
    r0             0x0          0
    r1             0x7fffd580   2147472768
    r2             0x30012868   805382248
    r3             0x80000000   2147483648
    r4             0xfef33fc    267334652
    r5             0x25         37
    r6             0xfccdef8    265084664
    r7             0x7fffd4c0   2147472576
    r8             0xfbad2887   4222429319
    r9             0x84044022   2214871074
    r10            0x0          0
    r11            0x2          2
    r12            0xfef2054    267329620
    r13            0x10030bc8   268635080
    r14            0x0          0
    r15            0x0          0
    r16            0x0          0
    r17            0x0          0
    r18            0x0          0
    r19            0x0          0
    r20            0x0          0
    r21            0x0          0
    r22            0x0          0
    r23            0x64         100
    r24            0x5          5
    r25            0x1002d438   268620856
    r26            0x1002d9f0   268622320
    r27            0x1002c578   268617080
    r28            0x1          1
    r29            0x10031000   268636160
    r30            0xffbf7d0    268171216
    r31            0x1002d9f0   268622320
    pc             0xfef2058    267329624
    ps             0xd032       53298
    cr             0x24044022   604258338
    lr             0xfef2054    267329620
    ctr            0xfccefa0    265088928
    xer            0x20000000   536870912
    fpscr          0x0          0
    vscr           0x0          0
    vrsave         0x0          0
    (gdb) p/x $pc
    $1 = 0xfef2058
    0x0fef2058 BN_bn2dec+472:       stw     r3,0(r29)
    (gdb) x 0x10031000
    0x10031000:     Cannot access memory at address 0x10031000
Re: PPC bn_div_words routine rewrite
I can tell you with certainty what happens in the function BN_bn2dec: lp is a pointer, and it is incremented on every pass of the while loop around bn_print.c:136. Because the test BN_is_zero(t) is always false, the loop never terminates and the pointer goes off into the stratosphere, hence the segfault on ppc8xx. More analysis to come.
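The runaway pointer described in this message can be sketched in a few lines of C. This is a toy model and not the code in bn_print.c: a single 32-bit value stands in for the BIGNUM, and to_chunks, good_step and bad_step are hypothetical names. A capacity guard is added so the sketch reports the overrun instead of writing past the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One division step: divide *value by base in place, return the remainder. */
typedef uint32_t (*div_step_fn)(uint32_t *value, uint32_t base);

/* Correct step. */
static uint32_t good_step(uint32_t *value, uint32_t base)
{
    uint32_t rem = *value % base;
    *value /= base;
    return rem;
}

/* Faulty step modelled on the report: the "quotient" left behind is a
 * fixed non-zero junk value, so *value never reaches zero. */
static uint32_t bad_step(uint32_t *value, uint32_t base)
{
    (void)base;
    uint32_t rem = *value;
    *value = 0x80000000u;
    return rem;
}

/* Emit base-`base` chunks of value, least significant first.
 * Returns the number of chunks written, or cap + 1 if the loop would
 * have stored past the end of `out` (guard added for this sketch). */
static size_t to_chunks(uint32_t value, uint32_t base, div_step_fn step,
                        uint32_t *out, size_t cap)
{
    size_t n = 0;
    while (value != 0) {              /* plays the role of !BN_is_zero(t) */
        if (n == cap)
            return cap + 1;           /* would overrun: pointer runs away */
        out[n++] = step(&value, base);
    }
    return n;
}
```

With good_step the loop stops after as many chunks as the value has digits. With bad_step it only stops because of the guard, which is the point at which an unguarded loop would start storing outside its buffer.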
Re: PPC bn_div_words routine rewrite
Let's take the first call to BN_div_word from BN_bn2dec as an example. The parameters passed to BN_div_word are (a=35, w=10) (decimal numbers). It then calls bn_div_words with (h=0, l=35, d=10). If you examine the code in linux_ppc32.s, it exits early because h is 0, and the routine returns the result of a divide by 0, which is undefined according to the manual. In the case of ppc8xx the result is 0x80000000. So this is the return value from bn_div_words, as seen in register r3.

What happens next is that BN_div_word overwrites a (the 1st parameter) with the result (0x80000000) and returns 23 as the remainder of the division. So a never becomes zero, and hence the test for BN_is_zero is always false. The routine fails the very first time bn_div_words is used.

The next thing I did, naturally, was to fix the case h=0, which is quite easy to do with the native divwu instruction. Lo and behold, I was once again disappointed: the routine also fails when h is not equal to 0. More to come...
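The way a junk quotient propagates through the word-by-word division can be sketched as follows. This is a simplified model under stated assumptions (32-bit words, a little-endian word array, hypothetical names div_word, div_words_ok and div_words_junk) and is not a copy of OpenSSL's BN_div_word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*div_words_fn)(uint32_t h, uint32_t l, uint32_t d);

/* Reference two-word by one-word division. */
static uint32_t div_words_ok(uint32_t h, uint32_t l, uint32_t d)
{
    return (uint32_t)((((uint64_t)h << 32) | l) / d);
}

/* Models the behaviour reported on the MPC8xx: a fixed junk quotient. */
static uint32_t div_words_junk(uint32_t h, uint32_t l, uint32_t d)
{
    (void)h; (void)l; (void)d;
    return 0x80000000u;
}

/* Divide the little-endian word array a[0..n-1] by w in place and
 * return the remainder, delegating each step to f. */
static uint32_t div_word(uint32_t *a, size_t n, uint32_t w, div_words_fn f)
{
    assert(w != 0);
    uint32_t rem = 0;
    for (size_t i = n; i-- > 0; ) {
        uint32_t l = a[i];
        uint32_t q = f(rem, l, w);
        rem = l - q * w;   /* wraps modulo 2^32 */
        a[i] = q;          /* the quotient word overwrites the input */
    }
    return rem;
}
```

With the reference division, 35 / 10 leaves the quotient 3 and the remainder 5. With the junk quotient, the stored word becomes 0x80000000 and the remainder comes out as 35, which is 0x23 in hex and may be the "23" quoted in the message above.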
Re: PPC bn_div_words routine rewrite
> Let's take first call to BN_div_word for example from BN_bn2dec, the parameter being passed to BN_div_word is (a=35, w=10) (decimal numbers). It then calls the bn_div_words with (h=0, l=35, d=10) if you examine the code in linux_ppc32.s it will exit early on because h is 0. the routine returns a divide by 0, which is undefined according to the manual. In the case of ppc8xx the result is 0x80000000.

And on the PPC machine I have access to it returns 0. This explains why I never experienced any SEGV, but got sparse decimal output instead. And it does explain why the BN_is_zero condition is never met on your system and you hit the sbrk(0) limit and suffer the penalty.

However! Note that the updated routine, http://cvs.openssl.org/getfile/openssl/crypto/bn/asm/ppc.pl?v=1.4, never issues a divide by 0 [it traps instead, but the condition is never met now when called from BN_div_word] and it does return the correct answer for me. Can you really confirm that the updated subroutine doesn't work for you? And if so, how does the problem manifest? Still SEGV? At the same point?

It should be pointed out that the bug in ppc.pl is encountered only in 0.9.7 context, as 0.9.8 avoids it by normalizing the divisor [and adjusting the dividend accordingly]. BTW, I can confirm that 0.9.7 produces correct decimal ASCII with your routine [but no luck with make test_bn], but in 0.9.8 context the decimal printout comes out truncated [not sparse with some significant digits here and there, but truncated] if your routine is pasted in.

A.
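The normalization Andy refers to can be illustrated on a single two-word dividend. The sketch below is an assumption-laden illustration with hypothetical names (leading_zeros, div_words_normalized); OpenSSL applies the same idea to the whole BIGNUM, which is not reproduced here. Shifting divisor and dividend left by the same amount leaves the quotient unchanged and scales the remainder, so the divisor seen by the low-level routine always has its top bit set.

```c
#include <assert.h>
#include <stdint.h>

/* Number of leading zero bits in a non-zero 32-bit word. */
static int leading_zeros(uint32_t w)
{
    int n = 0;
    while (!(w & 0x80000000u)) {
        w <<= 1;
        n++;
    }
    return n;
}

/* (h:l) / d computed after shifting divisor and dividend left by the
 * same amount. Requires d != 0 and h < d, so the shifted dividend
 * still fits in two words. The true remainder is stored in *rem. */
static uint32_t div_words_normalized(uint32_t h, uint32_t l, uint32_t d,
                                     uint32_t *rem)
{
    assert(d != 0 && h < d);
    int s = leading_zeros(d);
    uint64_t num = (((uint64_t)h << 32) | l) << s;  /* adjusted dividend */
    uint64_t den = (uint64_t)d << s;                /* top bit now set */

    *rem = (uint32_t)((num % den) >> s);            /* undo the scaling */
    return (uint32_t)(num / den);
}
```

For example, dividing 35 by 10 shifts both operands left by 28 bits and still yields quotient 3 and remainder 5.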
Re: PPC bn_div_words routine rewrite
This is the second confirmed report of the same problem on the ppc8xx. After reading my email, I must say I was the unfriendly one, and I apologize for that. More debugging evidence to come.

---------- Forwarded message ----------
From: Murch, Christopher [EMAIL PROTECTED]
Date: Jul 1, 2005 9:46 AM
Subject: RE: PPC bn_div_words routine rewrite
To: David Ho [EMAIL PROTECTED]

David,

I had observed the same issue on ppc 8xx machines after upgrading to the asm version of the BN routines. Thank you very much for your work on the fix. My question is, do you have high confidence in the other new asm ppc BN routines after observing this issue, or do you think they might have similar problems?

Thanks.
Chris

-----Original Message-----
From: David Ho [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 30, 2005 6:22 PM
To: openssl-dev@openssl.org; [EMAIL PROTECTED]
Subject: Re: PPC bn_div_words routine rewrite

The reason I had to redo this routine, in case anyone is wondering, is because ssh-keygen segfaults when this assembly routine returns junk to the BN_div_word function. On a ppc, if you issue the command

    ssh-keygen -t rsa1 -f /etc/ssh/ssh_host_key -N

The program craps out when it tries to write the public key in ascii decimal.

Regards,
David
Re: PPC bn_div_words routine rewrite
> Okay, having actually done what Andy suggested, i.e. the one-liner fix in the assembly code, bn_div_words returns the correct results.

Note that the final version, the one committed to all relevant OpenSSL branches a couple of days ago and the one which actually made it into the just-released 0.9.8, is a bit different from the originally suggested one-line fix; see for example http://cvs.openssl.org/chngview?cn=14199.

> At this point, my conclusion is, up to openssl-0.9.8-beta6, the ppc32 bn_div_words routine generated from crypto/bn/ppc.pl is still busted.

Yes. Though it should be noted that 0.9.8 was inadvertently avoiding the bug condition. Recall that the original problem report was for 0.9.7.

> Why do you signal an overflow condition when it appears functions that call bn_div_words do not check for overflow conditions?

That's a question for IBM. At the time they submitted the code, I explicitly asked what would be an appropriate way to generate a *fatal* condition at that point, i.e. one which would result in a core dump, and it came out as a division by 0 instruction. At that time I had no access to any PPC machine and had to just go with it. Now it actually came as a surprise that division by 0 does not raise an exception, but silently returns an implementation-specific value...

A.
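The point about signalling a fatal condition can be illustrated in C. The following is only a sketch, not the ppc.pl code: quotient_fits and checked_div_words are hypothetical names, 32-bit words are assumed, and the precondition h < d is a simplification chosen here so that the quotient is guaranteed to fit in one word. Instead of relying on a divide instruction to fault, the check is explicit and abort() produces the core dump.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical predicate: the quotient of the two-word value h:l
 * divided by d fits in one 32-bit word exactly when d != 0 and h < d.
 */
static int quotient_fits(uint32_t h, uint32_t d)
{
    return d != 0 && h < d;
}

/*
 * Hypothetical guarded divide: terminate the process explicitly on
 * a bad call rather than executing a divide by zero, whose result
 * is implementation-specific on PPC according to this thread.
 */
static uint32_t checked_div_words(uint32_t h, uint32_t l, uint32_t d)
{
    if (!quotient_fits(h, d))
        abort();                       /* fatal: core dump, no junk result */
    return (uint32_t)((((uint64_t)h << 32) | l) / d);
}
```

With this shape a caller that passes d=0 or an overflowing dividend stops at the call site, rather than carrying a meaningless quotient into later code.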
Re: PPC bn_div_words routine rewrite
> The reason I had to redo this routine, in case anyone is wondering, is because ssh-keygen segfaults when this assembly routine returns junk to the BN_div_word function. On a ppc, if you issue the command ssh-keygen -t rsa1 -f /etc/ssh/ssh_host_key -N The program craps out when it tries to write the public key in ascii decimal.

It would help if you provided evidence such as a debugger stack trace and program output. The provided description makes no sense. Seg-faults when the routine returns junk to BN_div_word? A seg-fault [segmentation violation] can occur when you write something to memory, and nothing gets written to memory upon result return. BN_div_word does write to memory, but I fail to see how a bogus value could possibly trigger a seg-fault. The only possibility is that the assembler doesn't follow the ABI convention and corrupts registers which the caller is using/expects to be preserved by the callee. There are several PPC ABI flavors in use, but OpenSSL routines were designed ABI-neutral. Well, neutrality really means a common denominator for the ABI specs examined at the moment of coding, so there is a window of opportunity that it won't be neutral to a future ABI, but is that really the case? That your system uses some newly designed PPC ABI? You never mentioned what your system is...

But you're apparently right about a bug being present in the PPC assembler. I too have got an insane [with very few significant digits] decimal printout of the public key generated by ssh-keygen...

> This is a rewrite of the bn_div_words routine for the PowerPC arch, tested on a MPC8xx processor.

Well, the suggested routine apparently sends ssh-keygen on the PPC-based 32-bit system I have access to into an endless loop... And (cd test; make test_bn) fails early in BN_sqr... And test/exptest fails miserably with bad reciprocal...

> I initially thought there is maybe a small mistake in the code that requires a one-liner change

But apparently this appears to be the case! Please verify the following:

--- crypto/bn/asm/ppc.pl.orig	2004-04-28 00:05:50.0 +0200
+++ crypto/bn/asm/ppc.pl	2005-07-01 18:58:21.105656512 +0200
@@ -1717,7 +1717,7 @@
 	li	r9,1			# r9=1
 	$SHL	r10,r9,r8		# r9<<=r8
 	$UCMP	0,r3,r10		#
-	bc	BO_IF,CR0_GT,Lppcasm_div2	#or if (h > (1<<r8))
+	bc	BO_IF_NOT,CR0_GT,Lppcasm_div2	#or if (h > (1<<r8))
 	$UDIV	r3,r3,r0		#if not assert(0) divide by 0!
 					#that's how we signal overflow
 	bclr	BO_ALWAYS,CR0_LT	#return. NEVER REACHED.

A.
PPC bn_div_words routine rewrite
Hi all,

This is a rewrite of the bn_div_words routine for the PowerPC arch, tested on a MPC8xx processor. I initially thought there is maybe a small mistake in the code that requires a one-liner change but it turns out I have to redo the routine. I guess this routine is not called very often as I see that most other routines are hand-crafted, whereas this routine is compiled from a C function that apparently has not gone through a whole lot of testing.

I wrote a C function to confirm correctness of the code.

    unsigned long div_words (unsigned long h, unsigned long l, unsigned long d)
    {
        unsigned long i_h;     /* intermediate dividend */
        unsigned long i_q;     /* quotient of i_h/d */
        unsigned long i_r;     /* remainder of i_h/d */
        unsigned long i_cntr;
        unsigned long i_carry;
        unsigned long ret_q;   /* return quotient */

        /* cannot divide by zero */
        if (d == 0) return 0xFFFFFFFF;

        /* do simple 32-bit divide */
        if (h == 0) return l/d;

        i_q = h/d;
        i_r = h - (i_q*d);
        ret_q = i_q;

        i_cntr = 32;
        while (i_cntr--) {
            i_carry = (l & 0x80000000) ? 1:0;
            l = l << 1;
            i_h = (i_r << 1) | i_carry;
            i_q = i_h/d;
            i_r = i_h - (i_q*d);
            ret_q = (ret_q << 1) | i_q;
        }
        return ret_q;
    }

Then I handcrafted the routine in PPC assembly. The result is a 26-line assembly routine that is easy to understand and predictable, as opposed to an 81-liner that I am still trying to decipher... If anyone is interested in incorporating this routine into the openssl code I'll be happy to assist. At this point I think I will be taking a bit of a break from this 3-day debugging/fixing marathon.

Regards,
David Ho

    #
    #       Handcrafted version of bn_div_words
    #
    #       r3 = h
    #       r4 = l
    #       r5 = d

            cmplwi  0,r5,0                  # compare r5 and 0
            bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div1  # proceed if d!=0
            li      r3,-1                   # d=0 return -1
            bclr    BO_ALWAYS,CR0_LT
    .Lppcasm_div1:
            cmplwi  0,r3,0                  # compare r3 and 0
            bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div2  # proceed if h != 0
            divwu   r3,r4,r5                # ret_q = l/d
            bclr    BO_ALWAYS,CR0_LT        # return result in r3
    .Lppcasm_div2:
            divwu   r9,r3,r5                # i_q = h/d
            mullw   r10,r9,r5               # i_r = h - (i_q*d)
            subf    r10,r10,r3
            mr      r3,r9                   # ret_q = i_q
    .Lppcasm_set_ctr:
            li      r12,32                  # ctr = bitsizeof(d)
            mtctr   r12
    .Lppcasm_div_loop:
            addc    r4,r4,r4                # l = l << 1 -> i_carry
            adde    r11,r10,r10             # i_h = (i_r << 1) | i_carry
            divwu   r9,r11,r5               # i_q = i_h/d
            mullw   r10,r9,r5               # i_r = i_h - (i_q*d)
            subf    r10,r10,r11
            add     r3,r3,r3                # ret_q = ret_q << 1 | i_q
            add     r3,r3,r9
            bc      BO_dCTR_NZERO,CR0_EQ,.Lppcasm_div_loop
    .Lppc_div_end:
            bclr    BO_ALWAYS,CR0_LT        # return result in r3
            .long   0x
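One way to check a routine like this is to compare it against the double-width division that, as noted elsewhere in the thread, most 32-bit builds use. The sketch below is not part of the original post: it restates the posted C function with fixed-width types under the hypothetical name div_words_posted, and ref_div_words and same_result are likewise names invented for this example. It assumes h < d so that the true quotient fits in 32 bits.

```c
#include <stdint.h>

/* Reference: (h:l) / d computed in 64 bits. Assumes d != 0 and h < d. */
static uint32_t ref_div_words(uint32_t h, uint32_t l, uint32_t d)
{
    return (uint32_t)((((uint64_t)h << 32) | l) / d);
}

/* The posted bit-by-bit routine, restated with 32-bit types. */
static uint32_t div_words_posted(uint32_t h, uint32_t l, uint32_t d)
{
    uint32_t i_h, i_q, i_r, i_carry, ret_q;
    int i;

    if (d == 0) return 0xFFFFFFFFu;      /* cannot divide by zero */
    if (h == 0) return l / d;            /* simple 32-bit divide */

    i_q = h / d;
    i_r = h - i_q * d;
    ret_q = i_q;

    for (i = 0; i < 32; i++) {
        i_carry = (l & 0x80000000u) ? 1 : 0;
        l <<= 1;
        /* If the top bit of i_r is set, it is shifted out and lost here. */
        i_h = (i_r << 1) | i_carry;
        i_q = i_h / d;
        i_r = i_h - i_q * d;
        ret_q = (ret_q << 1) | i_q;
    }
    return ret_q;
}

/* Returns 1 when both routines agree on the given operands. */
static int same_result(uint32_t h, uint32_t l, uint32_t d)
{
    return div_words_posted(h, l, d) == ref_div_words(h, l, d);
}
```

For example, same_result(1, 0, 3) holds (both give 0x55555555), while same_result(0x80000000, 0, 0x80000001) does not: the reference quotient is 0xFFFFFFFE, but the remainder's top bit is shifted out in the loop. This matches the corner case with a full 32-bit divisor that was reported later in the thread.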
Re: PPC bn_div_words routine rewrite
The reason I had to redo this routine, in case anyone is wondering, is because ssh-keygen segfaults when this assembly routine returns junk to the BN_div_word function. On a ppc, if you issue the command

    ssh-keygen -t rsa1 -f /etc/ssh/ssh_host_key -N

The program craps out when it tries to write the public key in ascii decimal.

Regards,
David