Hi, wc -w doesn't seem to recognize whitespace characters with a codepoint over UCHAR_MAX (255) as word separators. For example, using the character EM SPACE U+2003:

$ printf "foo\u2003bar" | ./wc -w
1

I should get a word count of 2, but instead the space is ignored while counting words. Meanwhile, wc v9.4 gives the correct answer:

$ printf "foo\u2003bar" | wc -w
2

It looks like the regression has been introduced by [f40c6b5] and would be fixed by something like the following change:

diff --git a/src/wc.c b/src/wc.c
index f5a921534..9d456f8c0 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -528,7 +528,7 @@ wc (int fd, char const *file_x, struct fstatus *fstatus, off_t current_pos)
                           if (width > 0)
                             linepos += width;
                         }
-                      in_word2 = !iswnbspace (wide_char);
+                      in_word2 = !iswspace (wide_char) && !iswnbspace (wide_char);
                     }
 
                   /* Count words by counting word starts, i.e., each

Cheers,
-- 
Aearil
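
P.S. For anyone who wants to poke at the predicate outside of wc, here is a minimal standalone sketch of the word-start counting that the proposed change is aiming for. It is not the code from src/wc.c: is_nbspace, is_separator and count_words are hypothetical names, and the list of non-breaking spaces in is_nbspace is an assumption about what the real iswnbspace helper covers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical stand-in for the iswnbspace helper in src/wc.c.
   The set of code points below is an assumption, not copied from wc.  */
static bool
is_nbspace (wint_t wc)
{
  return wc == 0x00A0 || wc == 0x2007 || wc == 0x202F || wc == 0x2060;
}

/* A character separates words if the locale classifies it as whitespace
   OR it is one of the non-breaking spaces above.  Checking only the
   second condition is the behaviour described in this report.  */
static bool
is_separator (wint_t wc)
{
  return iswspace (wc) || is_nbspace (wc);
}

/* Count words by counting word starts: a word starts whenever a
   non-separator follows a separator (or the start of the string).  */
static size_t
count_words (const wchar_t *s)
{
  assert (s != NULL);

  size_t words = 0;
  bool in_word = false;

  for (; *s != L'\0'; s++)
    {
      bool in_word2 = !is_separator ((wint_t) *s);
      if (in_word2 && !in_word)
        words++;
      in_word = in_word2;
    }
  return words;
}
```

Whether iswspace accepts U+2003 depends on the active locale, so a caller would need setlocale (LC_ALL, "") under a UTF-8 locale before count_words (L"foo\u2003bar") can be expected to see two words. In the plain "C" locale the EM SPACE is most likely not classified as whitespace.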