https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93808

--- Comment #22 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #21)
> 
> I think it is more by accident.  strict-alignment here should not really make a
> difference, as it is undefined even on non-strict targets.
> -fcrossjumping in this case causes the BB that contains
> __builtin_unreachable to go to an invalid basic block, which is a valid
> optimization that just happens on SH and, for some reason, not on ARM or other
> targets.  I have not looked into the code or even the RTL to double-check
> this theory, though.


But then that would mean that the code is generally undefined and might reach
the "unreachable" cases in any case.

The function 'search_nonascii' is used three times in the crashing function
'coderange_scan'.  It's a bit difficult to see what's going on in the
resulting code.  Prohibiting the inlining with...

--- "orig ng/string.c.orig"     2019-10-01 20:02:30.000000000 +0900
+++ "orig ng/string.c"  2020-02-22 17:12:02.624621304 +0900
@@ -436,7 +436,7 @@

 VALUE rb_fs;

-static inline const char *
+static const char * __attribute__((noinline))
 search_nonascii(const char *p, const char *e)
 {
     const uintptr_t *s, *t;

...of course results in more straightforward code.  In that case the
unreachable case just falls through into one of the other cases, so the code
will not crash, but it will not work correctly either.  The latter switch is
also not translated into a jump table but into a series of simple comparisons.

One thing I've noticed ...

    if (0 || e - p >= 4)

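This line (the 'UNALIGNED_WORD_ACCESS || e - p >= SIZEOF_VOIDP' test with both
macros expanded) uses a signed comparison for e - p >= 4 in the final code.
If 'p' is greater than 'e', this code does nonsense: e - p is negative, the
test fails, and the negative value is passed on to the latter 'switch (e - p)',
which has no case for it.  It might then go into the unreachable default case.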
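To make that path concrete, here is a small C model of the branch selection, assuming SIZEOF_VOIDP == 4 and no unaligned word access, as in the expanded line above.  'classify_tail' and the 'tail_path' enum are hypothetical names for illustration, not Ruby code; the function only reports which branch would run and never dereferences 'p' or 'e'.

```c
#include <assert.h>
#include <stddef.h>

/* Which part of search_nonascii's control flow a (p, e) pair selects,
   assuming SIZEOF_VOIDP == 4 and !UNALIGNED_WORD_ACCESS (hypothetical model). */
enum tail_path {
    PATH_WORD_LOOP,   /* e - p >= 4: the word-at-a-time block is entered */
    PATH_BYTE_CASES,  /* e - p in 1..3: trailing-byte cases of the switch */
    PATH_EMPTY,       /* e - p == 0: no bytes left to check */
    PATH_DEFAULT      /* anything else: "default: UNREACHABLE" */
};

/* Never reads memory; it only mirrors the signed test and the switch. */
static enum tail_path classify_tail(const char *p, const char *e)
{
    ptrdiff_t n = e - p;  /* signed result: negative whenever p > e */

    if (n >= 4)           /* the signed "e - p >= 4" comparison */
        return PATH_WORD_LOOP;

    switch (n) {
    case 3:
    case 2:
    case 1:
        return PATH_BYTE_CASES;
    case 0:
        return PATH_EMPTY;
    default:              /* only reachable here when n < 0, i.e. p > e */
        return PATH_DEFAULT;
    }
}
```

For example, with p = buf + 5 and e = buf + 2, e - p is -3, so the word-access block is skipped and the switch lands in the default branch, which is UNREACHABLE in the original.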

'p' is advanced in the calling function 'coderange_scan' like so:

   p += (ret);

if 'ret' is >= 0, where 'ret' is the return value of the function
'rb_enc_precise_mbclen', which in turn calls

   (enc)->precise_mbc_enc_len(p,e,enc)

through the macro ONIGENC_PRECISE_MBC_ENC_LEN.
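A sketch of the invariant involved: if 'ret' is ever larger than e - p, then 'p += (ret);' leaves 'p' past 'e', and a later 'search_nonascii(p, e)' call would start with a negative e - p.  'advance_checked' is a hypothetical helper, not part of Ruby; it assumes a valid character length must fit in the remaining buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical guard for the "p += (ret);" step in a coderange_scan-style
   loop. Returns the advanced pointer, or NULL when ret is negative or larger
   than the bytes left in [p, e). The check is done before the addition,
   because forming a pointer more than one past the end of the buffer is
   already undefined behaviour in C. */
static const char *advance_checked(const char *p, const char *e, int ret)
{
    ptrdiff_t left = e - p;  /* negative if p has already overshot e */

    if (ret < 0 || ret > left)
        return NULL;
    return p + ret;
}
```

Checking before adding, rather than comparing 'p' and 'e' afterwards, keeps the guard itself free of undefined pointer arithmetic.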

I see at least one encoder function that 



Adrian, could you please apply the following patch to the original string.c
file and try building and running the whole thing again with the original
compiler flags, with -fno-crossjumping, and with -O1?  Does one of the added
traps go off?


--- "orig ng/string.c.orig"     2019-10-01 20:02:30.000000000 +0900
+++ "orig ng/string.c"  2020-02-22 18:29:54.904783490 +0900
@@ -446,13 +446,15 @@
 # define NONASCII_MASK 0x80808080UL
 #endif

+if ( (intptr_t)(e-p) < 0) __builtin_trap ();
+
     if (UNALIGNED_WORD_ACCESS || e - p >= SIZEOF_VOIDP) {
 #if !UNALIGNED_WORD_ACCESS
        if ((uintptr_t)p % SIZEOF_VOIDP) {
            int l = SIZEOF_VOIDP - (uintptr_t)p % SIZEOF_VOIDP;
            p += l;
            switch (l) {
-             default: UNREACHABLE;
+             default: __builtin_trap ();
 #if SIZEOF_VOIDP > 4
              case 7: if (p[-7]&0x80) return p-7;
              case 6: if (p[-6]&0x80) return p-6;
@@ -481,7 +483,7 @@
     }

     switch (e - p) {
-      default: UNREACHABLE;
+      default: __builtin_trap ();
 #if SIZEOF_VOIDP > 4
       case 7: if (e[-7]&0x80) return e-7;
       case 6: if (e[-6]&0x80) return e-6;
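If this kind of check is useful beyond a one-off test run, the choice between UNREACHABLE and a trap can be made switchable.  The sketch below is a simplified, hypothetical three-byte version of the tail switch (the original has more cases on 64-bit targets); 'MAYBE_UNREACHABLE', 'CHECK_UNREACHABLE' and 'scan_tail' are made-up names.  __builtin_trap() and __builtin_unreachable() are GCC/Clang builtins: the first makes the "impossible" path crash deterministically, the second lets the compiler assume it never happens.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical debug switch: build with -DCHECK_UNREACHABLE to turn
   "impossible" paths into a deterministic trap instead of undefined
   behaviour. */
#ifdef CHECK_UNREACHABLE
# define MAYBE_UNREACHABLE() __builtin_trap()
#else
# define MAYBE_UNREACHABLE() __builtin_unreachable()
#endif

/* Return the first byte with the high bit set among the last e - p bytes
   (callers must guarantee 0 <= e - p <= 3), or NULL if there is none. */
static const char *scan_tail(const char *p, const char *e)
{
    if (e - p < 0)
        __builtin_trap();  /* same guard the patch adds: p must not exceed e */

    switch (e - p) {
    default:
        MAYBE_UNREACHABLE();  /* e - p >= 4 violates the caller's contract */
        /* fall through */
    case 3:
        if (e[-3] & 0x80) return e - 3;
        /* fall through */
    case 2:
        if (e[-2] & 0x80) return e - 2;
        /* fall through */
    case 1:
        if (e[-1] & 0x80) return e - 1;
        /* fall through */
    case 0:
        return NULL;
    }
}
```

Leaving CHECK_UNREACHABLE undefined keeps the optimizer hint for release builds, while a test build with it defined turns a bad (p, e) pair into an immediate SIGILL/SIGTRAP at the faulting switch instead of a jump to an arbitrary block.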
