Pádraig Brady wrote:
...
> Wow that's interesting. My results are with 400MHz DDR2.
> If I do a simpler test excluding file-system and page cache
> to just show the syscall overhead, I can also see the doubling
> of throughput when going from 4KiB to 32KiB buffers:
>
> for i in $(seq 0 10); do
>   bs=$((1024*2**$i))
>   printf "%7s=" $bs
>   dd bs=$bs if=/dev/zero of=/dev/null count=$(((2*1024**3)/$bs)) 2>&1 |
>     sed -n 's/.* \([0-9.]* [GM]B\/s\)/\1/p'
> done
>    1024=484 MB/s
>    2048=857 MB/s
>    4096=1.6 GB/s
>    8192=2.4 GB/s
>   16384=3.1 GB/s
>   32768=3.6 GB/s
>   65536=3.6 GB/s
>  131072=3.8 GB/s
>  262144=3.9 GB/s
>  524288=3.9 GB/s
> 1048576=3.9 GB/s
>
> Why I only see a small increase between 4 & 32K buffers when going
> through the file-system and page cache on my kernel must be due to
> inefficiencies that have subsequently been addressed?
Interesting test. On the 2-core AMD system (1MB cache per core):

$ for i in $(seq 0 10); do
    bs=$((1024*2**$i))
    printf "%7s=" $bs
    dd bs=$bs if=/dev/zero of=/dev/null count=$(((2*1024**3)/$bs)) 2>&1 |
      sed -n 's/.* \([0-9.]* [GM]B\/s\)/\1/p'
  done
   1024=578 MB/s
   2048=1.1 GB/s
   4096=1.8 GB/s
   8192=2.6 GB/s
  16384=3.2 GB/s
  32768=4.1 GB/s
  65536=4.8 GB/s
 131072=5.2 GB/s
 262144=5.7 GB/s
 524288=5.9 GB/s
1048576=3.4 GB/s

On the 4-core Intel with 6MB cache per core and faster RAM:

   1024=1.5 GB/s
   2048=2.8 GB/s
   4096=5.0 GB/s
   8192=7.7 GB/s
  16384=10.4 GB/s
  32768=9.6 GB/s
  65536=9.9 GB/s
 131072=10.6 GB/s
 262144=10.7 GB/s
 524288=10.6 GB/s
1048576=11.2 GB/s
2097152=10.6 GB/s
4194304=9.8 GB/s
8388608=2.6 GB/s

_______________________________________________
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils
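As a rough sanity check on the syscall-overhead explanation above (my own back-of-the-envelope sketch, not from the thread): copying a fixed 2 GiB, dd issues one read() and one write() per block, so halving the block size doubles the number of syscalls while the per-call cost stays roughly constant. That is why throughput roughly doubles from 1K to 2K to 4K buffers before other limits take over.

```shell
# Syscall counts for copying 2 GiB at various block sizes,
# assuming one read() plus one write() per block (the dd default
# behaviour for a simple bs=N copy).
total=$((2*1024**3))            # 2 GiB, as in the tests above
for bs in 1024 4096 32768; do
  calls=$((2 * total / bs))     # one read + one write per block
  printf '%7d bytes/block -> %8d syscalls\n' "$bs" "$calls"
done
#    1024 bytes/block ->  4194304 syscalls
#    4096 bytes/block ->  1048576 syscalls
#   32768 bytes/block ->   131072 syscalls
```

One could confirm the counts directly with `strace -c dd bs=... if=/dev/zero of=/dev/null count=...`, which tallies read/write calls per run.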