Kingsley G. Morse Jr. wrote:
> Please change wc to treat "," as a word delimiter.

The 'wc' command is old, and it behaves the way it does today because
that is how it behaved some 30 years ago when it was originally
written. Changing it would break a lot of legacy code. For a long
time wc was used as a "poor man's checksum", for example.

However, if you select a locale that contains comma in the set of
whitespace characters then wc would count as you want. Please see
the POSIX standards reference.

  http://www.opengroup.org/onlinepubs/009695399/utilities/wc.html

    The wc utility shall consider a word to be a non-zero-length
    string of characters delimited by white space.

  http://www.opengroup.org/onlinepubs/009695399/functions/isspace.html

    The isspace() function shall test whether c is a character of
    class space in the program's current locale; see the Base
    Definitions volume of IEEE Std 1003.1-2001, Chapter 7, Locale.

All of that basically says that isspace() is used to determine word
separators and that isspace() follows the definition of the currently
active locale. But I don't know of any locale that sets comma to be
in the set of whitespace characters.

> Here's how to duplicate the problem:
>
>   $ echo "cat,dog" | wc -w
>   1
>   $ echo "cat dog" | wc -w
>   2

A very good small test case. Very nice.

> A workaround is
>
>   $ cat file | sed 's/[[:punct:]]/ /g' | wc -w

I think your "workaround" is really the best you can do to achieve
the behavior you are looking for, because other people would want the
original behavior.

Note that you have an extra 'cat' process there that is useless. You
don't need it and should not use it. The following is the same but
without the extra 'cat' process.

  $ sed 's/[[:punct:]]/ /g' < file | wc -w

You can also use 'tr' instead of 'sed' for this purpose.

  $ tr '[:punct:]' ' ' < file | wc -w

Bob

-- 
Bob Proulx <[EMAIL PROTECTED]>
http://www.proulx.com/~bob/
CP-ASEL-IA-Tailwheel-Glider
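
The workaround discussed above can be wrapped up as small shell functions. This is only a sketch, assuming a POSIX shell. The names count_words_comma and count_words_punct are made up for illustration and are not part of coreutils. The first variant turns only commas into spaces, which matches the original request most closely. The second turns every punctuation character into a space, which also splits contractions such as "don't" into two words.

```shell
# Hypothetical helper: count words on stdin, treating "," as a delimiter.
count_words_comma() {
    # Map each comma to a space, then let wc split on whitespace as usual.
    tr ',' ' ' | wc -w
}

# Hypothetical helper: treat every punctuation character as a delimiter.
count_words_punct() {
    # Same substitution as the sed workaround in the thread.
    sed 's/[[:punct:]]/ /g' | wc -w
}

echo "cat,dog" | count_words_comma        # prints 2
echo "don't stop,go" | count_words_comma  # prints 3
echo "don't stop,go" | count_words_punct  # prints 4 ("don" and "t" are split)
```

Which variant is appropriate depends on the text being counted; neither changes what wc itself considers a word.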