Kingsley G. Morse Jr. wrote:
> Please change wc to treat "," as a word delimiter.

wc is an old command, and it behaves the way it does today because
that is how it behaved some 30 years ago when it was originally
written.  Changing it now would break a lot of legacy code.  For
example, for a long time wc was used as a "poor man's checksum".

However, if you select a locale that includes the comma in its set of
whitespace characters, then wc would count the way you want.

Please see the POSIX standards reference.

  http://www.opengroup.org/onlinepubs/009695399/utilities/wc.html

    The wc utility shall consider a word to be a non-zero-length string of
    characters delimited by white space.

  http://www.opengroup.org/onlinepubs/009695399/functions/isspace.html

    The isspace() function shall test whether c is a character of class
    space in the program's current locale; see the Base Definitions volume
    of IEEE Std 1003.1-2001, Chapter 7, Locale.

All of that basically says that isspace() is used to determine word
separators and that isspace() follows the definition of the currently
active locale.  But I don't know of any locale that sets comma to be
in the set of whitespace characters.
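
As a small illustration of the above, here is a sketch that pins the
locale to C (my choice for the example, via LC_ALL) and shows which
characters separate words.  Space, tab and newline are in the "space"
class there; comma is not.

```shell
# Space, tab, and newline are all in the "space" class, so each one
# separates words.
printf 'cat\tdog\nbird fish\n' | LC_ALL=C wc -w    # prints 4

# A comma is not in the "space" class in the C locale, so the whole
# string counts as a single word.
printf 'cat,dog\n' | LC_ALL=C wc -w                # prints 1
```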

> Here's how to duplicate the problem:
> 
>     $ echo "cat,dog" | wc -w
> 1
>     $ echo "cat dog" | wc -w
> 2

A very good small test case.  Very nice.

> A workaround is 
> 
>     $ cat file | sed 's/[[:punct:]]/ /g' | wc -w

I think your "workaround" is really the best you can do to achieve the
behavior you are looking for, because other people will still want the
original behavior.

Note that you have an extra 'cat' process there that is useless.  You
don't need it and should not use it.  The following is the same but
without the extra 'cat' process.

  $ sed 's/[[:punct:]]/ /g' < file | wc -w

You can also use 'tr' instead of 'sed' for this purpose.

  $ tr '[:punct:]' ' ' < file | wc -w
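
If you do this often, the pipeline can be wrapped in a small shell
function.  The following is only a sketch: the function names are
hypothetical ones I made up for illustration and are not part of wc
or coreutils.  The first variant treats only the comma as an extra
delimiter, which is closer to the original request.  The second
treats all punctuation as delimiters and uses the '[ *]' form in the
second set, which as far as I know is the POSIX way to pad it out to
the length of the first set.

```shell
# Hypothetical helpers; both read standard input.

# Count words, treating only commas as extra delimiters.
count_words_comma() {
    tr ',' ' ' | wc -w
}

# Count words, treating every punctuation character as a delimiter.
# '[ *]' repeats the space to match the length of the first set.
count_words_punct() {
    tr '[:punct:]' '[ *]' | wc -w
}

printf 'cat,dog\n' | count_words_comma           # prints 2
printf "it's a cat,dog\n" | count_words_comma    # prints 4
printf "it's a cat,dog\n" | count_words_punct    # prints 5
```

Note the difference in the last two lines.  Replacing all punctuation
also splits "it's" into two words, which may or may not be what you
want.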

Bob

-- 
Bob Proulx <[EMAIL PROTECTED]>
http://www.proulx.com/~bob/
CP-ASEL-IA-Tailwheel-Glider

