Dear users and developers,
This is my first mail to this mailing list, and it is also the first time
I have come back to Z80 assembly after a pause of a decade and a half...
I know I'm late, because 8x8-bit multiplication on the Z80 is "Now replaced
by a built-in for code generation, but"... I have to start
somewhere...
I'm trying to substitute the code for "__muluchar_rrx_s::" in
device/lib/z80/mulchar.s with some faster code.
If I'm not wrong, the current code takes (342+b*6) T-states (call and ret
excluded), where b is the number of bits set to "1" in the second
operand.
Mine takes (356-b). This means that the current code is faster than mine
only when the second operand is zero or a power of two (b<2), the two tie
when b=2, and mine is faster whenever the second operand has at least
three bits set to "1" (85% of the values in 0-255).
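
As a quick cross-check of the arithmetic above, here is a minimal Python
sketch that evaluates the two quoted timing formulas for every possible
second operand. The function names (popcount, t_current, t_new, compare)
are made up for this illustration; only the two formulas come from the
mail itself.

```python
def popcount(v):
    """Number of bits set to 1 in the 8-bit value v."""
    return bin(v & 0xFF).count("1")


def t_current(v):
    # Timing quoted for the current routine: 342 + 6*b T-states.
    return 342 + 6 * popcount(v)


def t_new(v):
    # Timing quoted for the proposed routine: 356 - b T-states.
    return 356 - popcount(v)


def compare():
    """Tally, over all 256 second operands, how often the proposed routine
    is faster, equal or slower, and the mean number of T-states it saves."""
    faster = tie = slower = 0
    saved = 0
    for v in range(256):
        diff = t_current(v) - t_new(v)  # positive: proposed routine wins
        saved += diff
        if diff > 0:
            faster += 1
        elif diff == 0:
            tie += 1
        else:
            slower += 1
    return faster, tie, slower, saved / 256


if __name__ == "__main__":
    print(compare())  # (219, 28, 9, 14.0)
```

219 of 256 values is about 85%, and the 9 losing cases are zero plus the
eight powers of two.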
Advantages of the new code:
- faster (most of the time), saving 14 T-states on average
- does not mess with DE
- can be easily modified to return its result in DE (or BC)
Drawbacks:
- overwrites the accumulator A
Parity:
- same memory footprint (2 bytes can be saved, at a cost of 30 T-states)
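
The footprint claim can be checked by adding up the encoded size of each
instruction in the two routines from the patch. The sketch below does
that in Python. The list names and the footprint helper are hypothetical;
the sizes are the standard Z80 encodings (for example, "rr l" is a
two-byte CB-prefixed opcode).

```python
# (instruction, encoded size in bytes), in program order.
CURRENT_ROUTINE = [
    ("ld hl,#2+1", 3), ("add hl,sp", 1), ("ld e,(hl)", 1), ("dec hl", 1),
    ("ld h,(hl)", 1), ("ld l,#0", 2), ("ld d,l", 1), ("ld b,#8", 2),
    ("add hl,hl", 1), ("jr nc", 2), ("add hl,de", 1), ("djnz", 2),
    ("ret", 1),
]

NEW_ROUTINE = [
    ("pop af", 1), ("pop hl", 1), ("push hl", 1), ("push af", 1),
    ("xor a", 1), ("ld b,#8", 2),
    ("rr l", 2), ("jr nc", 2), ("add a,h", 1), ("rra", 1), ("djnz", 2),
    ("rr l", 2), ("ld h,a", 1), ("ret", 1),
]


def footprint(routine):
    """Total size in bytes of a list of (instruction, size) pairs."""
    return sum(size for _, size in routine)


if __name__ == "__main__":
    print(footprint(CURRENT_ROUTINE), footprint(NEW_ROUTINE))  # 19 19
```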
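The patch follows (it applies to sdcc 2.9.0). Note that the diff was
generated with the two files in reverse order, so the lines marked "-"
are my new code and the lines marked "+" are the current code: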
---------8<---------8<---------8<---------8<---------8<---------8<-----
--- mulchar.s 2009-03-27 11:31:35.000000000 +0100
+++ sdcc/device/lib/z80/mulchar.s 2009-01-05 11:20:47.000000000 +0100
@@ -1,38 +1,25 @@
.area _CODE
-;; Multiply two 8-bit operands, giving a 16-bit result
-;; by Marco Bodrato, March 27, 2009, licensed GPLv2+
-;; Before:
-;; On-stack: return address, operands
-;; After:
-;; B=0
-;; HL=result
-;; A=H, F=[H=N=Cy=0,(P,S,Z, depend on L)]
-;;
-;; Timings:
-;; Total cycles needed depend on z=the number of "0" bits in L;
-;; returns after (348+z).
-;; Notes: HL can be replaced by DE (or, a bit trickier, BC)
-
+; This multiplication routine is similar to the one
+; from Rodnay Zaks, "Programming the Z80".
+
; Now replaced by a builtin for code generation, but
; still called from some asm files in this directory.
__muluchar_rrx_s::
- pop af
- pop hl ; Load operands
- push hl ; and recover stack
- push af
- ;; registers H and L now store the two operands
- xor a
+ ld hl, #2+1
+ add hl, sp
+ ld e, (hl)
+ dec hl
+ ld h, (hl)
+ ld l, #0
+ ld d, l
ld b, #8
muluchar_rrx_s_loop:
- rr l
+ add hl, hl
jr nc, muluchar_rrx_s_noadd
- add a, h
+ add hl, de
muluchar_rrx_s_noadd:
- rra
djnz muluchar_rrx_s_loop
- rr l ; result is in AL, now...
- ld h, a
ret
; operands have different sign
---------8<---------8<---------8<---------8<---------8<---------8<-----
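
For readers who want to check the two loops without a Z80 at hand, here
is a rough register-level model in Python. It is only a sketch under
stated assumptions: the function names and the (multiplier, multiplicand)
argument order are made up for this illustration, the operands are passed
directly instead of being read from the stack, and the T-state costs are
the standard documented Z80 timings (call and ret excluded, as in the
figures above).

```python
def mul_current(multiplier, multiplicand):
    """Model of the current routine: HL = multiplier << 8, DE = multiplicand.
    Returns (HL, T-states)."""
    # ld hl,#3 / add hl,sp / ld e,(hl) / dec hl / ld h,(hl) / ld l,#0 /
    # ld d,l / ld b,#8
    t = 10 + 11 + 7 + 6 + 7 + 7 + 4 + 7
    hl, de = multiplier << 8, multiplicand
    for i in range(8):
        hl <<= 1                      # add hl,hl
        t += 11
        carry, hl = hl >> 16, hl & 0xFFFF
        if carry:
            t += 7                    # jr nc, not taken
            hl = (hl + de) & 0xFFFF   # add hl,de
            t += 11
        else:
            t += 12                   # jr nc, taken
        t += 13 if i < 7 else 8       # djnz (falls through on the last pass)
    return hl, t


def mul_new(multiplier, multiplicand):
    """Model of the proposed routine: L = multiplier, H = multiplicand,
    with the product built in the A:L pair. Returns (HL, T-states)."""
    # pop af / pop hl / push hl / push af / xor a / ld b,#8
    t = 10 + 10 + 11 + 11 + 4 + 7
    a, l, h, carry = 0, multiplier, multiplicand, 0
    for i in range(8):
        l, carry = (carry << 7) | (l >> 1), l & 1    # rr l
        t += 8
        if carry:
            t += 7                    # jr nc, not taken
            total = a + h             # add a,h
            a, carry = total & 0xFF, total >> 8
            t += 4
        else:
            t += 12                   # jr nc, taken
        a, carry = (carry << 7) | (a >> 1), a & 1    # rra
        t += 4
        t += 13 if i < 7 else 8       # djnz
    l, carry = (carry << 7) | (l >> 1), l & 1        # final rr l
    t += 8 + 4                        # rr l, then ld h,a
    return (a << 8) | l, t


if __name__ == "__main__":
    print(mul_current(3, 5), mul_new(3, 5))  # (15, 354) (15, 354)
```

Running both models over all 65536 operand pairs is one way to confirm
that each returns the full 16-bit product and matches the (342+b*6) and
(356-b) formulas, with b counted on the multiplier.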
Let me know if you find it interesting! If you do, I can also try to
optimise the "code generation" (but I'll need some hints) and maybe other
multiplication routines...
Regards,
Marco
--
http://bodrato.it/software/