At Robert Hyatt's prompting, I have looked more closely at
this. The degree to which a pipe (i.e., cmd1 | cmd2)
parallelizes seems to depend quite sensitively on (a) the
size of the writes/reads and (b) the amount of computation
done between the writes/reads.
I've written a small program (attached) that writes/reads a
megabyte of data and does some pointless computation for
each byte it writes. There are two parameters: "write_size",
which is the size of each write/read and "think_exponent",
which is the log-2 of the amount of computation done per
byte.
Here are the results for elapsed time and %CPU as functions
of think_exponent (1 to 12, columns) and write_size (1k to
1024k, rows):
Elapsed time (seconds):
        1     2     3     4     5     6     7     8     9    10    11    12
1k 0.08 0.13 0.20 0.36 0.67 1.28 2.45 4.85 9.67 19.58 39.76 78.84
2k 0.07 0.13 0.21 0.36 0.66 1.27 2.47 4.89 9.78 19.41 38.61 77.40
4k 0.08 0.14 0.20 0.36 0.67 1.27 2.47 4.87 9.74 19.41 38.45 77.07
8k 0.09 0.15 0.22 0.37 0.87 1.28 2.50 5.14 9.92 19.44 39.18 78.04
16k 0.10 0.16 0.21 0.46 0.72 1.36 2.54 5.00 9.87 19.78 40.06 81.95
32k 0.11 0.23 0.38 0.67 0.74 1.30 2.47 5.05 9.88 20.77 39.00 79.51
64k 0.11 0.22 0.39 0.63 0.93 1.85 2.80 5.10 10.37 22.89 40.62 79.54
128k 0.11 0.23 0.36 0.69 1.21 2.10 3.66 6.61 10.91 19.59 39.04 77.79
256k 0.10 0.20 0.35 0.48 0.80 1.39 2.55 4.99 9.81 19.57 38.74 76.93
512k 0.12 0.21 0.24 0.41 0.71 1.29 2.67 5.11 9.89 19.31 38.85 77.87
1024k 0.11 0.14 0.22 0.36 0.69 1.32 2.62 4.95 9.67 19.33 39.76 77.46
%CPU:
        1     2     3     4     5     6     7     8     9    10    11    12
1k 188 176 196 193 195 195 199 198 197 195 191 193
2k 181 186 189 191 194 196 197 197 196 197 198 197
4k 187 178 186 196 195 196 197 198 197 197 196 197
8k 159 169 180 198 148 190 194 188 194 197 195 196
16k 132 130 194 151 180 181 191 193 196 193 190 186
32k 86 105 101 104 170 190 196 190 193 184 195 192
64k 106 101 97 106 141 130 172 189 185 166 185 191
128k 100 103 105 100 106 118 133 145 176 195 196 196
256k 120 115 108 141 160 178 190 193 195 192 194 199
512k 115 114 162 167 179 193 184 188 194 198 197 197
1024k 107 163 165 192 187 189 186 195 199 194 188 197
When the pipe parallelizes well, the %CPU is close to 200%;
when it parallelizes badly, the %CPU is close to 100%.
You can see that the pipe parallelizes badly when the
write/read size is in a certain interval (e.g., 16k to 512k
for think_exponent of 4) and the amount of computation being
done is small or moderate.
Can anyone explain this? Can anyone fix it?
Again, I don't think this is latency: if I replace the line
"n = write_size" in the function write_or_read with "n = 1",
everything parallelizes well. It seems that, under certain
circumstances, the amount of data being passed and the
amount of computation being performed conspire to keep both
processes on the same processor.
Regards,
Alan
/*
 * use as:
 *   ./a.out write think_exponent write_size_in_k
 *   ./a.out read think_exponent write_size_in_k
 */
#include <unistd.h>
#include <stdlib.h>
#include <string.h>

#define NBYTES (1024L * 1024L)

long think_exponent;
size_t write_size;

void
write_or_read(char *which)
{
        unsigned char buffer[NBYTES];
        ssize_t n, m;

        n = write_size;
        do {
                if (strcmp(which, "write") == 0)
                        m = write(1, buffer, n);
                else
                        m = read(0, buffer, n);
                if (m <= 0)     /* error or EOF */
                        break;
                n -= m;
        } while (n > 0);
}

void
think(void)
{
        long i, j;
        volatile unsigned int v = 1;

        for (j = 0; j < write_size; ++j)
                for (i = 0; i < 2 << think_exponent; ++i)
                        v *= 3;
}

int
main(int argc, char **argv)
{
        long i;

        think_exponent = atol(argv[2]);
        write_size = atol(argv[3]) * 1024;
        for (i = 0; i < NBYTES / write_size; ++i) {
                write_or_read(argv[1]);
                think();
        }
        return 0;
}
--
Dr Alan Watson
Instituto de Astronomía UNAM
Linux SMP list: FIRST see FAQ at http://www.irisa.fr/prive/mentre/smp-faq/