https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52394
--- Comment #3 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Oleg Endo from comment #2)
>
> The code should actually be something like this:
>         mov.l   .L2,r2
>         bld     #0,r5
>         mov     #0,r0
>         bor.b   #5,@(0,r2)
>         bst.b   #5,@(0,r2)

Actually, this might also result in unexpected side effects when accessing
external hardware, because the volatile memory is accessed with 2x load and
1x store.  So this sequence can't really be used safely.

Moreover, it seems the code size improvements from those SH2A bitops are not
that big.  The above code is 14 bytes.  The same on non-SH2A could be:

        shlr    r5
        subc    r0,r0
        not     r0,r0
        and     #32,r0
        mov.l   .L5,r2
        mov.b   @r2,r1
        or      r0,r1
        mov.b   r1,@r2

which is 16 bytes.  And the SH2A version of that could be:

        bld     #0,r5
        mov     #0,r0
        bst     #5,r0
        mov.l   .L5,r2
        mov.b   @r2,r1
        or      r0,r1
        mov.b   r1,@r2

which is 14 bytes.  And of course, if GBR can be clobbered, it gets down to
8 bytes:

        mov.l   .L5,r2
        ldc     r2,gbr
        mov     #0,r0
        or.b    #32,@(r0,gbr)
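
The side-effect concern with the bld / bor.b / bst.b sequence can be illustrated with a small C model.  This is only a sketch under stated assumptions: the `fake_reg` type, its read-to-clear behaviour and the helper names are hypothetical and not taken from this report.  It compares a read-modify-write done with one load and one store against one done with two loads and one store, as described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical device model (an assumption for illustration only):
   a byte-wide register whose contents are cleared by every read. */
typedef struct {
    uint8_t  value;   /* current register contents */
    unsigned loads;   /* number of reads seen by the device */
    unsigned stores;  /* number of writes seen by the device */
} fake_reg;

static uint8_t reg_load(fake_reg *r)
{
    uint8_t v = r->value;
    r->loads++;
    r->value = 0;     /* read side effect: contents are cleared */
    return v;
}

static void reg_store(fake_reg *r, uint8_t v)
{
    r->stores++;
    r->value = v;
}

/* One load, one store: mov.b @r2,r1 / or r0,r1 / mov.b r1,@r2. */
static void set_bit5_single_load(fake_reg *r, unsigned x)
{
    uint8_t v = reg_load(r);
    v |= (uint8_t)((x & 1u) << 5);
    reg_store(r, v);
}

/* Two loads, one store: models bld / bor.b / bst.b on memory. */
static void set_bit5_two_loads(fake_reg *r, unsigned x)
{
    unsigned t = x & 1u;              /* bld   #0,r5          */
    t |= (reg_load(r) >> 5) & 1u;     /* bor.b: first load    */
    uint8_t v = reg_load(r);          /* bst.b: second load   */
    v = (uint8_t)((v & ~0x20u) | (t << 5));
    reg_store(r, v);                  /* bst.b: store         */
}
```

With this model, starting from 0x81 and setting bit 5, the single-load variant leaves 0xA1 in the register, while the two-load variant leaves 0x20, because the first read has already cleared the other bits before the second read happens.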