Hi Roland,

I don't claim to be an I18N expert, but from a standards point of view, I believe David has correctly stated what the output should be.
The only reason I can see why the word count (-w output) could not be 17 for both files would be if one or more of the three-byte UTF-8 characters were not valid characters in the charmap for en_US.UTF-8. But if that were the problem, the exit code would have to be non-zero, and one or more diagnostic messages would have to have been written to STDERR.

If one or more of the three-byte characters were classified as white space characters, that would reduce the number of words to a value less than 17; it could not increase the count to 32.

Cheers,
Don

Roland Mainz wrote:
> Hi!
>
> ----
>
> [I've CC'ed Ienup Sung <Ienup.Sung at Sun.COM> and Masaki Katakai
> <katakai at japan.sun.com> for the i18n parts and Don Cragun
> <dcragun at sonic.net> as standards expert]
>
> During testing of a new version of the AST "wc" (not the version for
> ksh93-integration update2, this one is a newer one) we hit the issue
> below where two of the testcases show different results for AST "wc",
> GNU coreutils "wc" and Solaris /usr/bin/wc ...
> ... the question is now: What is the expected (from an i18n+standards
> point of view) output for both testcases?
>
> -------- Original Message --------
> Subject: wc broken on GNU and Solaris
> Date: Fri, 14 Aug 2009 17:57:14 -0400
> From: David Korn <dgk at research.att.com>
> To: roland.mainz at nrubsig.org
>
> I have attached two one-line files below.
>
> The file oneline1 when run with the GNU /usr/bin/wc -mwl in
> the en_US.UTF-8 shows 1 word. The Solaris version shows 17
> which is correct.
>
> However, for oneline2, the Solaris version shows 32 and the
> Gnu version shows 17 which I believe is correct.
> Ours shows 17 for both.
>
> ========================================
>
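For illustration, here is a minimal Python sketch of the word-counting rule discussed above: a word is a maximal run of non-white-space characters, and (as Don describes) an invalid multibyte sequence should produce a diagnostic on stderr and a non-zero exit status rather than being silently counted. This is not how AST, GNU or Solaris wc are implemented; `count_words` is a hypothetical name, and Python's str.isspace() is only an assumed stand-in for the en_US.UTF-8 LC_CTYPE "space" class, which may classify some characters differently.

```python
import sys


def count_words(data: bytes) -> int:
    """Count words as maximal runs of non-white-space characters.

    Assumption: str.isspace() approximates the locale's white space class.
    """
    try:
        # Strict decoding: any invalid or truncated UTF-8 sequence raises.
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Report the bad input on stderr and exit non-zero instead of guessing.
        print(f"wc: invalid multibyte sequence at byte {exc.start}",
              file=sys.stderr)
        raise SystemExit(1)

    words = 0
    in_word = False
    for ch in text:
        if ch.isspace():
            in_word = False      # white space ends the current word
        elif not in_word:
            words += 1           # first non-space character starts a new word
            in_word = True
    return words


if __name__ == "__main__":
    print(count_words(sys.stdin.buffer.read()))
```

With U+3000 (IDEOGRAPHIC SPACE, three bytes in UTF-8) between two words, count_words("日本\u3000語".encode("utf-8")) returns 2, because str.isspace() treats that character as a separator. Whether a given locale's charmap does the same is exactly the kind of classification detail that decides what -w reports.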
