> Thanks for reply! I am writing inline assembly in C. Strangely, this
> time it works. However, I just had another strange problem. Here is
> the code doing 1million times of multiplications, I expected it finishes
> in a couple seconds. However, it seems there is an infinite loop, and
> it never ends. Did I do anything wrong?
> register uint16_t a1 = 1;
> register uint16_t a2 = 2;
> asm volatile (
> " mov #1000, r15 \n"
> " LPI: mov #1000, r14 \n"
> " LPJ: mov %0, &0x0130 \n"
> " mov %1, &0x0138 \n"
> " mov &0x013a, %0 \n"
> " mov &0x013c, %1 \n"
> " inc %0 \n"
> " inc %1 \n"
> " dec r14 \n"
> " jnz LPJ \n"
> " dec r15 \n"
> " jnz LPI \n"
> :"+r"(a1), "+r"(a2)
> :
> );
The big problem here is that you haven't told GCC not to use r14 and r15
for a1 and a2! If it does that (check the assembly output), the loop
counters get overwritten by the multiplier results and the decrements
might never reach zero.
Try something like:
uint16_t a1 = 1, a2 = 2;
uint16_t i = 1000, j;
asm volatile ("\n"
"LPI: mov #1000,%3 \n"
"LPJ: mov %0, &0x130 \n"
" mov %1, &0x138 \n"
" mov &0x13a, %0 \n"
" mov &0x13c, %1 \n"
" inc %0 \n"
" inc %1 \n"
" dec %3 \n"
" jnz LPJ \n"
" dec %2 \n"
" jnz LPI"
: "+%r" (a1), "+r" (a2), "+r" (i), "=r" (j));
This lets GCC choose registers for i (%2) and j (%3).
I also told it (the % in "+%r") that a1 and a2 are commutative (the
order doesn't matter).
If you have any additional input operands, you may also have to add &
(earlyclobber) annotations to some output operands. Otherwise, GCC
thinks that it's perfectly fine to optimize:
int x = 5, y = 5;
asm(" add %1,%0" : "+r" (x) : "r" (y));
to
mov #5,x
add x,x
where the "x" register is used for both x and y! GCC's inline asm is
primarily designed for exactly this sort of single-instruction code
where the inputs and outputs may overlap arbitrarily.
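To make the earlyclobber point concrete, here is a small sketch of a
two-instruction asm where the output is written before the last input
is read. The name sum_via_asm is made up for illustration, and I'm
assuming a target where "mov src,dst" and "add src,dst" are valid
(MSP430, or x86 in GCC's default AT&T syntax); anything else falls back
to plain C.

```c
/* Hypothetical helper, not from the original code: returns a + b.
 * __asm__ is used instead of asm so it also builds with -std=c99. */
static unsigned sum_via_asm(unsigned a, unsigned b)
{
    unsigned out;
#if defined(__MSP430__) || defined(__x86_64__) || defined(__i386__)
    /* %0 is written by the mov before %2 is read by the add, so it
     * needs "&". Without it, GCC may put out and b in one register
     * and the add would then compute a + a. */
    __asm__("mov %1, %0\n\t"
            "add %2, %0"
            : "=&r" (out)
            : "r" (a), "r" (b)
            : "cc");
#else
    out = a + b;    /* portable fallback for other targets */
#endif
    return out;
}
```

Dropping the & may well still work by luck, depending on how the
register allocator feels that day, which is exactly why it's worth
getting the constraint right.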
Far simpler would probably be to write the loops in C and keep only
the multiplier access in asm:
uint16_t a1 = 1, a2 = 2;
uint16_t i = 1000;
do {
uint16_t j = 1000;
do {
asm("mov %2, &0x130\n\t"
"mov %3, &0x138\n\t"
"mov &0x13a, %0\n\t"
"mov &0x13c, %1\n\t"
: "=r" (a1), "=r" (a2)
: "%r" (a1), "r" (a2));
a1++;
a2++;
} while (--j);
} while (--i);
Although gcc probably won't use it, this gives it the flexibility
to use different registers for the input values than the output ones.
(There is no inherent reason that a variable has to live in a
single register for its entire life, after all.)
Gcc's inline asm is a *wonderful* thing, but it can be a bit strange to
people used to other compilers' "turn optimization off and don't mess
with my code" inline asm features. If you want that from gcc, use an
external assembly routine. With gcc, use its features to explain
exactly what you want and it can actually optimize.
E.g. don't say "give me r15 as a temporary", but rather "give me
a temporary register", and let it choose:
unsigned temp;
asm( "..." : "=&r" (temp));
Or, if you want that temp initialized to 1024, try:
unsigned temp;
asm( "..." : "=r" (temp) : "0" (1024));
For example, consider that:
asm("" : "=rm" (x) : "0" (y));
is a fancy way of writing "x = y".
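If you want to convince yourself of that, the last two snippets can be
wrapped in functions and tried on any GCC target, since the asm body is
empty. The function names here are mine, and I've used "=r" rather than
"=rm" just to keep the sketch simple.

```c
/* Hypothetical wrappers around the two constraint idioms above.
 * __asm__ is used instead of asm so it also builds with -std=c99. */

/* "0" ties the input to operand 0, so temp starts out holding 1024. */
static unsigned temp_preloaded(void)
{
    unsigned temp;
    __asm__("" : "=r" (temp) : "0" (1024u));
    return temp;
}

/* With an empty asm body, the tied operand makes this just "x = y". */
static int copy_via_asm(int y)
{
    int x;
    __asm__("" : "=r" (x) : "0" (y));
    return x;
}
```

In a real routine the empty string would of course hold the
instructions that use the temporary; the point is only that GCC does
the initialization for you, in whatever register it picked.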