Re: [U-Boot] [PATCH v3] fsl_ddr: Don't use full 64-bit divides on 32-bit PowerPC

2011-04-13 Thread Wolfgang Denk
Dear Kyle Moffett,

In message <1300202627-7245-1-git-send-email-kyle.d.moff...@boeing.com> you 
wrote:
> The current FreeScale MPC-8xxx DDR SPD interpreter is using full 64-bit
> integer divide operations to convert between nanoseconds and DDR clock
> cycles given arbitrary DDR clock frequencies.
...
> +/* To avoid 64-bit full-divides, we factor this here */
> +#define ULL_2e12 2ULL
> +#define UL_5pow12 244140625UL
> +#define UL_2pow13 (1UL << 13)
> +
> +#define ULL_8Fs 0xULL

Unfortunately this is already in mainline.  Can you please send a
cleanup patch to fix the CamelCaps macro names?

Thanks.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: w...@denx.de
Every program has at least one bug and can be shortened by  at  least
one  instruction  --  from  which,  by induction, one can deduce that
every program can be reduced to one instruction which doesn't work.
___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot


Re: [U-Boot] [PATCH v3] fsl_ddr: Don't use full 64-bit divides on 32-bit PowerPC

2011-03-31 Thread Kumar Gala

On Mar 15, 2011, at 10:23 AM, Kyle Moffett wrote:

> The current FreeScale MPC-8xxx DDR SPD interpreter is using full 64-bit
> integer divide operations to convert between nanoseconds and DDR clock
> cycles given arbitrary DDR clock frequencies.
> 
> Since all of the inputs to this are 32-bit (nanoseconds, clock cycles,
> and DDR frequencies), we can easily restructure the computation to use
> the "do_div()" function to perform 64-bit/32-bit divide operations.
> 
> On 64-bit this change is basically a no-op, because do_div is
> implemented as a literal 64-bit divide operation and the instruction
> scheduling works out almost the same.
> 
> On 32-bit PowerPC a fully accurate 64/64 divide (__udivdi3 in libgcc) is
> over 1.1kB of code and thousands of heavily dependent cycles to compute,
> all of which is linked from libgcc.  Another 1.2kB of code comes in for
> the function __umoddi3.
> 
> It should be noted that nothing else in U-Boot or the Linux kernel seems
> to require a full 64-bit divide on my 32-bit PowerPC.
> 
> Build-and-boot-tested on the HWW-1U-1A board using DDR2 SPD detection.
> 
> Signed-off-by: Kyle Moffett 
> Acked-by: York Sun 
> Cc: Andy Fleming 
> Cc: Kumar Gala 
> 
> --
> Changelog:
> v2: Resubmitted separately from the other HWW-1U-1A patches
> v3: Rebased on the 'next' branch of git://git.denx.de/u-boot-mpc85xx.git
> 
> arch/powerpc/cpu/mpc8xxx/ddr/util.c |   56 +--
> 1 files changed, 40 insertions(+), 16 deletions(-)

applied to 85xx next

- k
___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot


[U-Boot] [PATCH v3] fsl_ddr: Don't use full 64-bit divides on 32-bit PowerPC

2011-03-15 Thread Kyle Moffett
The current FreeScale MPC-8xxx DDR SPD interpreter is using full 64-bit
integer divide operations to convert between nanoseconds and DDR clock
cycles given arbitrary DDR clock frequencies.

Since all of the inputs to this are 32-bit (nanoseconds, clock cycles,
and DDR frequencies), we can easily restructure the computation to use
the "do_div()" function to perform 64-bit/32-bit divide operations.

On 64-bit this change is basically a no-op, because do_div is
implemented as a literal 64-bit divide operation and the instruction
scheduling works out almost the same.

On 32-bit PowerPC a fully accurate 64/64 divide (__udivdi3 in libgcc) is
over 1.1kB of code and thousands of heavily dependent cycles to compute,
all of which is linked from libgcc.  Another 1.2kB of code comes in for
the function __umoddi3.

It should be noted that nothing else in U-Boot or the Linux kernel seems
to require a full 64-bit divide on my 32-bit PowerPC.

Build-and-boot-tested on the HWW-1U-1A board using DDR2 SPD detection.

Signed-off-by: Kyle Moffett 
Acked-by: York Sun 
Cc: Andy Fleming 
Cc: Kumar Gala 

--
Changelog:
v2: Resubmitted separately from the other HWW-1U-1A patches
v3: Rebased on the 'next' branch of git://git.denx.de/u-boot-mpc85xx.git

 arch/powerpc/cpu/mpc8xxx/ddr/util.c |   56 +--
 1 files changed, 40 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/cpu/mpc8xxx/ddr/util.c 
b/arch/powerpc/cpu/mpc8xxx/ddr/util.c
index b9a5a69..02908b4 100644
--- a/arch/powerpc/cpu/mpc8xxx/ddr/util.c
+++ b/arch/powerpc/cpu/mpc8xxx/ddr/util.c
@@ -8,9 +8,17 @@
 
 #include 
 #include 
+#include 
 
 #include "ddr.h"
 
+/* To avoid 64-bit full-divides, we factor this here */
+#define ULL_2e12 2ULL
+#define UL_5pow12 244140625UL
+#define UL_2pow13 (1UL << 13)
+
+#define ULL_8Fs 0xULL
+
 /*
  * Round mclk_ps to nearest 10 ps in memory controller code.
  *
@@ -20,35 +28,51 @@
  */
 unsigned int get_memory_clk_period_ps(void)
 {
-   unsigned int mclk_ps;
+   unsigned int data_rate = get_ddr_freq(0);
+   unsigned int result;
+
+   /* Round to nearest 10ps, being careful about 64-bit multiply/divide */
+   unsigned long long mclk_ps = ULL_2e12;
 
-   mclk_ps = 2ULL / get_ddr_freq(0);
-   /* round to nearest 10 ps */
-   return 10 * ((mclk_ps + 5) / 10);
+   /* Add 5*data_rate, for rounding */
+   mclk_ps += 5*(unsigned long long)data_rate;
+
+   /* Now perform the big divide, the result fits in 32-bits */
+   do_div(mclk_ps, data_rate);
+   result = mclk_ps;
+
+   /* We still need to round to 10ps */
+   return 10 * (result/10);
 }
 
 /* Convert picoseconds into DRAM clock cycles (rounding up if needed). */
 unsigned int picos_to_mclk(unsigned int picos)
 {
-   const unsigned long long ULL_2e12 = 2ULL;
-   const unsigned long long ULL_8Fs = 0xULL;
-   unsigned long long clks;
-   unsigned long long clks_temp;
+   unsigned long long clks, clks_rem;
 
+   /* Short circuit for zero picos */
if (!picos)
return 0;
 
-   clks = get_ddr_freq(0) * (unsigned long long) picos;
-   clks_temp = clks;
-   clks = clks / ULL_2e12;
-   if (clks_temp % ULL_2e12) {
+   /* First multiply the time by the data rate (32x32 => 64) */
+   clks = picos * (unsigned long long)get_ddr_freq(0);
+
+   /*
+* Now divide by 5^12 and track the 32-bit remainder, then divide
+* by 2*(2^12) using shifts (and updating the remainder).
+*/
+   clks_rem = do_div(clks, UL_5pow12);
+   clks_rem <<= 13;
+   clks_rem |= clks & (UL_2pow13-1);
+   clks >>= 13;
+
+   /* If we had a remainder, then round up */
+   if (clks_rem)
clks++;
-   }
 
-   if (clks > ULL_8Fs) {
+   /* Clamp to the maximum representable value */
+   if (clks > ULL_8Fs)
clks = ULL_8Fs;
-   }
-
return (unsigned int) clks;
 }
 
-- 
1.7.2.3

___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot