Overlapping MVC where the overlap is not just 1 byte is likely to be much slower than an ordinary MVC, as it is less likely to be handled in an optimised way.
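For comparison, the overlapping technique being discussed would look something like the sketch below. It assumes P1 is a table of 104 packed fields of 4 bytes each (inferred from the lengths used in the moves further down; the actual definition is not shown here). It relies on MVC behaving as if it moves one byte at a time from left to right, so that each 4-byte element is copied from the one before it. Untested, for illustration only:

```
*        Assumed definition: P1 DS 104PL4 (416 bytes in total)
         ZAP   P1,=P'0'                  Set first element to zero
         MVC   P1+4(256),P1              Fill bytes 4 to 259
         MVC   P1+260(104*4-260),P1+256  Fill remaining 156 bytes
```

In both moves the destination starts only 4 bytes after the source, which is the kind of overlap that is less likely to get optimised handling.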
I think the fastest solution would be one ordinary MVC per 256 bytes, something like the following (using the same literal in both cases to avoid needing a second one, even though the second move is less than 256 bytes):

         MVC   P1(256),=64PL4'0'
         MVC   P1+256(104*4-256),=64PL4'0'

I had an ASSERT macro to verify assembly-time assumptions, so in this case I'd add something like:

         ASSERT L'P1,EQ,4

If you don't have an equivalent macro, you can at least use this:

         DC    0SL2(L'P1-4,4-L'P1)     Verify assumption that L'P1 is 4

[I'm retired now so the above is all from memory and has not been tested.]

Jonathan Scott
