On 12/15/2014 06:59 AM, Norihiro Tanaka wrote:
+/* True if each byte can not occur inside a multibyte character */
+static bool always_single_byte[NOTCHAR];
+
+static void
+dfaalwayssb (void)
+{
+ size_t i;
+ unsigned char const uc[] = { '\0', '\n', '\r', '.', '/' };
+ for (i = 0; i < sizeof uc / sizeof uc[0]; ++i)
+ always_single_byte[uc[i]] = true;
+}
Can't we improve this when using_utf8 () is true? In that case, every
ASCII byte is always a single-byte character. Also, the bytes 0xc0, 0xc1,
and 0xf5 through 0xff can be added to the table: they are not single-byte
characters, but they are always encoding errors, so each of them is a
character boundary as far as skip_remains_mb is concerned. This
suggests that the table 'always_single_byte' should be renamed to
something like 'always_character_boundary'.
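
Something along these lines, perhaps. This is only an untested sketch of
the idea, not a drop-in replacement for the patch: the function name
init_always_character_boundary and its 'utf8' parameter (standing in for
the result of using_utf8 ()) are made up for illustration, and NOTCHAR
is assumed to be 256.

```c
#include <stdbool.h>
#include <stddef.h>

enum { NOTCHAR = 256 };   /* assumed: one table entry per byte value */

/* True if a byte can never occur inside a multibyte character, so that
   a character boundary always falls immediately before it.  */
static bool always_character_boundary[NOTCHAR];

/* Hypothetical variant of dfaalwayssb.  UTF8 stands in for the result
   of using_utf8 ().  */
static void
init_always_character_boundary (bool utf8)
{
  size_t i;
  unsigned char const uc[] = { '\0', '\n', '\r', '.', '/' };

  for (i = 0; i < NOTCHAR; i++)
    always_character_boundary[i] = false;

  /* Bytes the original patch already treats as safe in any encoding.  */
  for (i = 0; i < sizeof uc / sizeof uc[0]; i++)
    always_character_boundary[uc[i]] = true;

  if (utf8)
    {
      /* Every ASCII byte is a complete character in UTF-8.  */
      for (i = 0; i < 0x80; i++)
        always_character_boundary[i] = true;

      /* 0xc0, 0xc1 and 0xf5..0xff never appear in valid UTF-8; each is
         an encoding error by itself, hence also a boundary.  */
      always_character_boundary[0xc0] = true;
      always_character_boundary[0xc1] = true;
      for (i = 0xf5; i <= 0xff; i++)
        always_character_boundary[i] = true;
    }
}
```

Continuation bytes (0x80 through 0xbf) and valid lead bytes (0xc2
through 0xf4) stay false, so skip_remains_mb would still take the slow
path for them.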
wint_t wc = WEOF;
+ if (always_single_byte[*p])
+ return p;
This won't assign anything to *WCP, contrary to the documented API for
skip_remains_mb. This is OK (as callers don't care) but the API
documentation should be changed to reflect the actual behavior.