On 03/21/2016 04:16 PM, Pádraig Brady wrote:
> On 21/03/16 00:59, William R. Fraser wrote:
>> When wc gets its list of files by reading from stdin, using the argument
>> '--files0-from=-', it reuses the same fstatus struct for each file.
>> The problem is that the 'wc' function checks the 'failed' member of this
>> struct, and if it is <= 0, it skips doing fstat on the file. The main loop
>> doesn't reset this value between files, so only the first file has fstat
>> done on it.
>> This can result in the 'wc' function seeking past the end of
>> subsequent files and then over-reporting their byte counts.
>> See the attached patch, which resets the fstatus struct between files
>> when reading the file list from stdin.
> Ouch. This seems to have been broken since v7.0-96-gc2e56e0.
> It would also mean a lot of redundant reading
> if the first file was significantly smaller than any later file.
> $ truncate -s1G wc.big
> $ touch wc.small
> $ printf '%s\0' wc.big wc.small | wc -c --files0-from=-
> 1073741824 wc.big
> 1073741760 wc.small
> 2147483584 total
> We'll submit a full patch in your name.
Interestingly enough, there seems to be a threshold to trigger the bug:
$ touch wc.small
$ seq 1000 > wc.big
$ printf '%s\0' wc.big wc.small | wc -c --files0-from=-
3893 wc.big
0 wc.small
3893 total
$ seq 10000 > wc.big
$ printf '%s\0' wc.big wc.small | wc -c --files0-from=-
48894 wc.big
45067 wc.small
93961 total
That's why I couldn't reproduce it this morning.
Have a nice day,
Berny