Here is the C function corresponding to the assembly routine below. It is provided to ease review of the assembly. Other architectures would also benefit if this C function were used in bn_asm.c.

Regards,
David

/* Note: the masks and the loop count assume a 32-bit unsigned long. */
unsigned long div_words (unsigned long h, unsigned long l, unsigned long d)
{
  unsigned long i_h;        /* intermediate dividend */
  unsigned long i_q;        /* quotient of i/d */
  unsigned long i_r;        /* remainder of i/d */

  unsigned long i_cntr;
  unsigned long i_carry;
  unsigned long i_overflow;

  unsigned long ret_q;      /* return quotient */

  /* cannot divide by zero */
  if (d == 0) return 0xffffffff;

  /* do simple 32-bit divide */
  if (h == 0) return l/d;

  i_q = h/d;
  i_r = h - (i_q*d);
  ret_q = i_q;

  i_cntr = 32;

  while (i_cntr--)
  {
    i_carry = (l & 0x80000000) ? 1:0;
    l = l << 1;

    /* bit shifted out of i_r; it must be added back into the quotient bit */
    i_overflow = (i_r & 0x80000000) ? 1:0;
    i_h = (i_r << 1) | i_carry;

    i_q = i_h/d;
    i_q = i_q + i_overflow;
    i_r = i_h - (i_q*d);

    ret_q = (ret_q << 1) | i_q;
  }

  return ret_q;
}


On 7/7/05, David Ho <[EMAIL PROTECTED]> wrote:
> Please do not use previously mentioned routine, it missed 1 corner
> case where 32=num_bits_word(d)
>
> Revised routine that passes (cd test; make bntest).
> All I had to do is add one more instruction to the routine.
>
> Please test on your ppc32 machines.
>
> Once we are all happy, it's a matter of adding the core dump at the beginning.
> Thus you have a fast, easy to understand, predictable bn_div_words, as
> opposed to that monster in 0.9.8.
>
> #
> #       Handcrafted version of bn_div_words
> #
> #       r3 = h
> #       r4 = l
> #       r5 = d
>
>         cmplwi  0,r5,0                  # compare r5 and 0
>         bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div1  # proceed if d!=0
>         li      r3,-1                   # d=0 return -1
>         bclr    BO_ALWAYS,CR0_LT
> .Lppcasm_div1:
>         cmplwi  0,r3,0                  # compare r3 and 0
>         bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div2  # proceed if h != 0
>         divwu   r3,r4,r5                # ret_q = l/d
>         bclr    BO_ALWAYS,CR0_LT        # return result in r3
> .Lppcasm_div2:
>         divwu   r9,r3,r5                # i_q = h/d
>         mullw   r10,r9,r5               # i_r = h - (i_q*d)
>         subf    r10,r10,r3
>         mr      r3,r9                   # ret_q = i_q
> .Lppcasm_set_ctr:
>         li      r12,32                  # ctr = bitsizeof(d)
>         mtctr   r12
> .Lppcasm_div_loop:
>         addc    r4,r4,r4                # l = l << 1 -> i_carry
>         adde    r11,r10,r10             # i_h = (i_r << 1) | i_carry
>         divwu   r9,r11,r5               # i_q = i_h/d
>         addze   r9,r9                   # very important! - DKWH
>         mullw   r10,r9,r5               # i_r = i_h - (i_q*d)
>         subf    r10,r10,r11
>         add     r3,r3,r3                # ret_q = ret_q << 1 | i_q
>         add     r3,r3,r9
>         bc      BO_dCTR_NZERO,CR0_EQ,.Lppcasm_div_loop
> .Lppc_div_end:
>         bclr    BO_ALWAYS,CR0_LT        # return result in r3
>         .long   0x00000000
>
>
> Regards,
> David
>
>
> On 7/5/05, Peter Waltenberg <[EMAIL PROTECTED]> wrote:
> >
> > Thanks for finding and fixing this. Particularly for finding and fixing it
> > before 0.9.8 hit the streets.
> >
> > Peter
> >
> > Peter Waltenberg
> > Architect
> > IBM Crypto for C Team
> > IBM/Tivoli Gold Coast Office
> >
> >
> > Andy Polyakov <[EMAIL PROTECTED]>
> > Sent by: [EMAIL PROTECTED]
> >
> > 06/07/2005 07:49 AM
> >
> > Please respond to
> > openssl-dev
> >
> > To openssl-dev@openssl.org
> > cc [EMAIL PROTECTED]
> > Subject Re: PPC bn_div_words routine rewrite
> >
> >
> > > Okay, having actually did what Andy suggested, i.e. the one liner fix
> > > in the assembly code, bn_div_words returns the correct results.
> >
> > Note that the final version, one committed to all relevant OpenSSL
> > branches since couple of days ago and one which actually made to just
> > released 0.9.8, is a bit different from originally suggested one-line
> > fix, see for example
> > http://cvs.openssl.org/chngview?cn=14199.
> >
> > > At this point, my conclusion is, up to openssl-0.9.8-beta6, the ppc32
> > > bn_div_words routine generated from crypto/bn/ppc.pl is still busted.
> >
> > Yes. Though it should be noted that 0.9.8 was inadvertently avoiding the
> > bug condition. Recall that original problem report was for 0.9.7.
> >
> > > Why do you signal an overflow condition when it appears functions that
> > > call bn_div_words do not check for overflow conditions?
> >
> > That's question to IBM. By the time they submitted the code, I've
> > explicitly asked what would be appropriate way to generate *fatal*
> > condition at that point, i.e. one which would result in a core dump, and
> > it came out as division by 0 instruction. By that time I had no access
> > to any PPC machine and had to just go with it. Now it actually came as
> > surprise that division by 0 does not raise an exception, but silently
> > returns implementation-specific value...
> >
> > A.
> > ______________________________________________________________________
> > OpenSSL Project                                 http://www.openssl.org
> > Development Mailing List                       openssl-dev@openssl.org
> > Automated List Manager                           [EMAIL PROTECTED]
> >
>
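
For anyone who wants to sanity-check the shift-and-divide loop in the C routine at the top of this message without a ppc32 machine, here is a minimal sketch. It restates the routine with fixed-width types and compares it against a plain 64-bit division. The names div_words32, ref_div and div_words32_mismatches are hypothetical and not part of OpenSSL. The sketch assumes a compiler that provides uint64_t. The sample values are an arbitrary handful of edge cases, not an exhaustive test.

```c
#include <assert.h>
#include <stdint.h>

/* Restatement of the posted routine using uint32_t, so the 32-bit
 * wrap-around it relies on holds regardless of sizeof(unsigned long). */
uint32_t div_words32(uint32_t h, uint32_t l, uint32_t d)
{
    uint32_t i_h, i_q, i_r, ret_q;
    int i;

    if (d == 0) return 0xffffffffu;   /* same sentinel as the posted code */
    if (h == 0) return l / d;         /* simple 32-bit divide */

    i_q = h / d;
    i_r = h - i_q * d;
    ret_q = i_q;

    for (i = 0; i < 32; i++) {
        uint32_t carry = l >> 31;      /* bit shifted out of l */
        uint32_t overflow = i_r >> 31; /* bit shifted out of i_r */

        l <<= 1;
        i_h = (i_r << 1) | carry;
        i_q = i_h / d + overflow;      /* the step addze performs */
        i_r = i_h - i_q * d;           /* wraps mod 2^32 on overflow */
        ret_q = (ret_q << 1) | i_q;
    }
    return ret_q;
}

/* Reference: low 32 bits of the 64-bit quotient (h:l) / d. */
uint32_t ref_div(uint32_t h, uint32_t l, uint32_t d)
{
    uint64_t n = ((uint64_t)h << 32) | l;

    assert(d != 0);
    return (uint32_t)(n / d);
}

/* Counts disagreements between the two over a small set of edge values. */
int div_words32_mismatches(void)
{
    static const uint32_t v[] = {
        0u, 1u, 2u, 3u, 0x7fffffffu, 0x80000000u, 0x80000001u,
        0xfffffffeu, 0xffffffffu, 0x12345678u, 0xdeadbeefu
    };
    const int n = (int)(sizeof v / sizeof v[0]);
    int a, b, c, bad = 0;

    for (a = 0; a < n; a++) {
        if (v[a] == 0) continue;       /* d == 0 is handled separately */
        for (b = 0; b < n; b++)
            for (c = 0; c < n; c++)
                if (div_words32(v[b], v[c], v[a]) != ref_div(v[b], v[c], v[a]))
                    bad++;
    }
    return bad;
}
```

Calling div_words32_mismatches() from a small test driver should return 0 if the loop behaves as intended. Dropping the "+ overflow" term is a quick way to see which inputs the addze instruction in the assembly protects: with it removed, divisors that have the top bit set should start to disagree with the reference.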