Hi, I took the time to grep through the whole kernel 2035 sources
to find division and modulo calculations. Some results:

* the division/modulo macro should have an "if divisor is 512 then shift"
  shortcut, especially for older CPUs where division is slow. We often
  divide by the sector size, but without KNOWING in advance that this
  will be 512 (because we still hope that all uses of SEC_SIZE - defined
  as 512 - will eventually be replaced by "the real sector size for this
  drive", to allow the kernel to handle other sector sizes).

* for "c = a/b, d = a%b" I found NO place where b can exceed 64k, but
  SOME places where c can exceed 64k. This means the big bit-by-bit
  division loop can be removed (let "division overflow" handle the rest).
  The places where c exceeds 64k are already special-cased: two
  32:16=16bit divisions implement them (because d always fits in 16 bits,
  two 32:16=16bit divisions are actually enough for that case!)
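For readers following along, here is a minimal C sketch of that two-step
scheme (function name and C types are mine; the real kernel would do this
with 16-bit x86 DIV instructions):

```c
#include <stdint.h>

/* Sketch of the "two 32:16=16bit divisions" trick: divide a 32-bit
 * value a by a 16-bit divisor b. Because the remainder of the first
 * step is < b, the second partial dividend (r1:lo) always yields a
 * partial quotient that fits in 16 bits, so each step maps to one
 * 32:16=16 hardware division. */
static uint32_t div32by16(uint32_t a, uint16_t b, uint16_t *rem)
{
    uint16_t hi = (uint16_t)(a >> 16);
    uint16_t lo = (uint16_t)a;
    uint16_t q1 = hi / b;                      /* high word of quotient */
    uint16_t r1 = hi % b;
    uint32_t part = ((uint32_t)r1 << 16) | lo; /* part / b < 65536 */
    uint16_t q2 = (uint16_t)(part / b);        /* low word of quotient */
    *rem = (uint16_t)(part % b);
    return ((uint32_t)q1 << 16) | q2;
}
```

E.g. dividing 1000000 by 300 gives 3333 remainder 100, and even a
quotient above 64k (0xFFFFFFFF / 3) still needs only the two steps.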

Arkady already tuned the two LBA_to_CHS variants...
Somebody else already tuned the clock driver:
        ticks = ((ticks / 59659u) << 16) + ((ticks % 59659u) << 16) / 59659u;
  ... scaling factor 1193180/6553600 = 59659/327680 = 59659/65536/5 ...
        Ticks = ((hs >> 16) * 59659u + (((hs & 0xffff) * 59659u) >> 16)) / 5;
... which also uses a simplified leap year detection (works for 1901 to
2199), too.
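As a sanity check on the second line (function and variable names are
mine): the split multiply avoids 32-bit overflow, because hs * 59659
computed directly would not fit:

```c
#include <stdint.h>

/* hs = hundredths of seconds, result = timer ticks.
 * (hs >> 16) * 59659 is the high part and ((hs & 0xffff) * 59659) >> 16
 * the low part, so their sum is exactly floor(hs * 59659 / 65536);
 * the final / 5 completes the 59659/327680 scaling, all without
 * ever needing more than 32 bits of intermediate result. */
static uint32_t hs_to_ticks(uint32_t hs)
{
    return ((hs >> 16) * 59659u + (((hs & 0xffffu) * 59659u) >> 16)) / 5u;
}
```

For example, 100 hundredths (one second) gives 18 ticks, matching the
roughly 18.2 Hz timer rate, and a full day (8640000 hundredths) gives
1573040 ticks.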
The link_fat thing might look tricky, but it only divides by 2x, 0.5x or
0.25x the sector size. It might be interesting to use a shift for that
(sector sizes are always powers of 2) if there is a nice way to quickly
get the right shift count.
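One possible way (just a sketch, names mine): compute the shift count
once when the BPB for the drive is read, then use shifts and masks
everywhere else:

```c
#include <stdint.h>

/* Return n such that (1 << n) == secsize. Sector sizes are always
 * powers of 2, so a simple loop is enough; it runs at most 15 times
 * for a 16-bit size, and only once per drive. */
static unsigned char shift_for(uint16_t secsize)
{
    unsigned char shift = 0;
    while (secsize > 1) {
        secsize >>= 1;
        shift++;
    }
    return shift;
}
```

With secshift = shift_for(512) == 9, "x / 512" becomes "x >> 9" and
"x % 512" becomes "x & 511"; dividing by 2x or 0.25x the sector size
just adjusts the shift count by 1 or 2.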

The prf.c and put_unsigned base conversion routines are two of the VERY
few places where the result of a division can be > 64k, but again, such
divisions still do not need the full bit-by-bit processing loop ;-).

The FCB code sometimes divides by the record size, but the values are
clipped to 64k in some respects there.

FAT calculations for BPB processing (cluster count, FAT size in bytes),
using cdiv(a, b) = (((a) + (b) - 1) / (b)), usually divide by sector size
based values: FAT size * 0.25, * 0.5, * 2/3 ... (Is defbpb->bpb_nsector
right .... Argh, I meant CLUSTER size of course.) So (b) always stays
below "32.1k" in the cdiv division. Right after that, you have a division
by things like 4 or the sector size; there you can get RESULTS above 64k,
but the "two step division" is still enough, no loop needed.

(Results above 64k only happen for FATs > 32 MB, which is only needed
for drives > 512 GB but which can happen for drives > 4 GB with tiny
clusters.)
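For illustration, here is the cdiv idiom with a made-up FAT16 sizing
example (the helper name is mine, not from the kernel):

```c
#include <stdint.h>

/* Ceiling division as used by the BPB calculations. */
#define cdiv(a, b) (((a) + (b) - 1) / (b))

/* Illustrative only: a FAT16 entry takes 2 bytes, so the FAT for
 * nclusters clusters occupies cdiv(nclusters * 2, secsize) sectors.
 * The divisor stays a 16-bit, sector-size based value even when the
 * result grows large. */
static uint32_t fat16_sectors(uint32_t nclusters, uint16_t secsize)
{
    return cdiv(nclusters * 2ul, (uint32_t)secsize);
}
```

E.g. 40000 clusters at 512-byte sectors need 80000 bytes of FAT, and
cdiv rounds the 156.25 sectors up to 157.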

Another thing I noted: getbpb() does no clipping of the ddt_ncyl value;
I assume it intentionally returns the result modulo 64k? No real problem,
because using > 64k disk cylinders is impossible anyway. FreeDOS will
even refuse to use > 1k disk cylinders and will use LBA instead. The
ddt_ncyl field is not READ for anything by FreeDOS itself.

Only in the standalone programs (exeflat, sys) did I find one single
really huge division:
    return x.xfs_freeclusters > (bytes / (x.xfs_clussize * x.xfs_secsize));
This can be a division by 64k if the cluster size is 64k. It could easily
be special-cased, but having an inefficient "huge div loop" in SYS does
not hurt, as long as we now know that the WORST case in the entire KERNEL
is already the one handled by the simple "two step div".
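If one ever did want to special-case it, a sketch (names mine, not the
actual SYS code) could look like this:

```c
#include <stdint.h>

/* Sketch: the only divisor that cannot fit in 16 bits is a 64 KiB
 * cluster (0x10000); that case is a plain shift by 16, and every
 * other divisor fits in 16 bits and could go through the two-step
 * 32:16 division instead of a generic bit-by-bit loop. */
static uint32_t bytes_to_clusters(uint32_t bytes, uint32_t bytes_per_cluster)
{
    if (bytes_per_cluster == 0x10000ul)
        return bytes >> 16;
    return bytes / bytes_per_cluster;
}
```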


Please correct me if I missed any place where a real "huge div loop"
would be needed. See also my notes (only temporarily online) at:
http://www.coli.uni-sb.de/~eric/kernelmodulo.txt
(88 lines, as I already removed all "boring" cases like
"... = .../constant")

Eric



_______________________________________________
Freedos-kernel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/freedos-kernel
