Re: Reading from stdin significantly slower than reading file directly?

wjoe via Digitalmars-d-learn Thu, 13 Aug 2020 07:40:35 -0700

On Thursday, 13 August 2020 at 07:08:21 UTC, Jon Degenhardt wrote:

Test                          Elapsed  System   User
----                          -------  ------   ----
tsv-select -f 2,3 FILE          10.28    0.42   9.85
cat FILE | tsv-select -f 2,3    11.10    1.45  10.23
cut -f 2,3 FILE                 14.64    0.60  14.03
cat FILE | cut -f 2,3           14.36    1.03  14.19
wc -l FILE                       1.32    0.39   0.93
cat FILE | wc -l                 1.18    0.96   1.04



The TREE file:

Test                          Elapsed  System   User
----                          -------  ------   ----
tsv-select -f 2,3 FILE           3.77    0.95   2.81
cat FILE | tsv-select -f 2,3     4.54    2.65   3.28
cut -f 2,3 FILE                 17.78    1.53  16.24
cat FILE | cut -f 2,3           16.77    2.64  16.36
wc -l FILE                       1.38    0.91   0.46
cat FILE | wc -l                 2.02    2.63   0.77

Your table shows that when piping the output from one process to another, there's a lot more time spent in kernel mode. A switch from user mode to kernel mode is expensive [1]. It costs around 1000-1500 clock cycles for a call to getpid() on most systems. That's around 100 clock cycles for the actual switch and the rest is overhead.
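A rough way to see the per-call cost on your own machine is to time a tight loop of getpid() calls. This is only a sketch: the helper names `ns_per_call`, `noop` and `compare` are made up for illustration, and in Python the interpreter's own call overhead is of the same order as the system call, so the numbers are indicative at best. It also assumes the C library does not cache the PID (older glibc versions did).

```python
import os
import time


def ns_per_call(fn, calls=200_000):
    """Average wall-clock nanoseconds per call of fn over a tight loop."""
    start = time.perf_counter_ns()
    for _ in range(calls):
        fn()
    return (time.perf_counter_ns() - start) / calls


def noop():
    """Baseline: a call that never leaves user mode."""
    return None


def compare(calls=200_000):
    """Return (getpid_ns, baseline_ns); the gap hints at the kernel round trip."""
    return ns_per_call(os.getpid, calls), ns_per_call(noop, calls)


if __name__ == "__main__":
    getpid_ns, baseline_ns = compare()
    print(f"os.getpid(): {getpid_ns:.0f} ns/call, no-op: {baseline_ns:.0f} ns/call")
```

Multiplying the nanosecond figure by your clock rate in GHz gives an approximate cycle count to compare against the estimate above.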


My theory is this:

One of the reasons for the slowdown is very likely mutex un/locking, of which there is more need when multiple processes and (global) resources are involved compared to a single instance.

Another is copying buffers.

When you read a file, the data is first read into a kernel buffer which is then copied to the user space buffer, i.e. the buffer you allocated in your program (the reading part might not happen if the data is still in the cache).

If you read the file directly in your program, the data is copied once from kernel space to user space.

When you read from stdin (which is technically a file) it would seem that cat reads the file, which means a copy from kernel to user space (cat), then cat outputs that buffer to stdout (also technically a file), which is another copy, then you read from stdin in your program, which will cause another copy from stdout to stdin and finally to your allocated buffer.

Each of those steps may involve a mutex un/lock.

Also with pipes you start two programs. Starting a program takes a few ms.
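The two read paths described above can be put side by side in a small sketch. It is not how tsv-select reads its input; `count_direct` and `count_via_pipe` are hypothetical helpers, the 64 KiB chunk size is an arbitrary assumption, and a tiny Python child stands in for cat where cat is not installed. Both functions count the bytes they receive, so each can be timed on the same file.

```python
import shutil
import subprocess
import sys

CHUNK = 64 * 1024  # read size in bytes; an arbitrary choice for this sketch

# Stand-in for cat, used only when no cat executable is found on PATH.
_CAT_FALLBACK = (
    "import shutil, sys; "
    "shutil.copyfileobj(open(sys.argv[1], 'rb'), sys.stdout.buffer)"
)


def count_direct(path):
    """Read the file in this process: one kernel-to-user copy per chunk."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                return total
            total += len(chunk)


def count_via_pipe(path):
    """Read the same file through `cat FILE |`: a second process and a pipe."""
    cat = shutil.which("cat")
    cmd = [cat, path] if cat else [sys.executable, "-c", _CAT_FALLBACK, path]
    total = 0
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, bufsize=0) as proc:
        while True:
            chunk = proc.stdout.read(CHUNK)
            if not chunk:
                break
            total += len(chunk)
    if proc.returncode != 0:
        raise RuntimeError(f"child exited with status {proc.returncode}")
    return total
```

Wrapping each call in time.perf_counter() on a large file should show whether the extra process and the pipe cost anything measurable on your system.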

PS. If you do your own caching, or if you don't care about it because you just read a file sequentially once, you may benefit from opening your file with the O_DIRECT flag, which basically means that the kernel copies directly into user space buffers.
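For what it's worth, here is a minimal sketch of an O_DIRECT read, assuming Linux. O_DIRECT wants the buffer, the read length and the file offset aligned to the device's block size, so the sketch reads into a page-aligned anonymous mmap in 1 MiB steps (the size is an assumption that should cover common block sizes). `count_bytes` and `_drain` are hypothetical names. Some filesystems (tmpfs, for one) reject the flag, so the sketch falls back to a normal read in that case.

```python
import mmap
import os

BLOCK = 1 << 20  # 1 MiB: assumed to be a multiple of the device block size


def _drain(fd, buf):
    """Read fd to EOF into the reusable buffer and return the byte count."""
    total = 0
    while True:
        n = os.readv(fd, [buf])
        if n == 0:
            return total
        total += n


def count_bytes(path):
    """Return (total_bytes, used_o_direct) for the file at path."""
    buf = mmap.mmap(-1, BLOCK)  # anonymous mapping, page aligned
    try:
        o_direct = getattr(os, "O_DIRECT", 0)  # not defined on every platform
        if o_direct:
            try:
                fd = os.open(path, os.O_RDONLY | o_direct)
                try:
                    return _drain(fd, buf), True
                finally:
                    os.close(fd)
            except OSError:
                pass  # flag rejected by this filesystem: fall back below
        fd = os.open(path, os.O_RDONLY)
        try:
            return _drain(fd, buf), False
        finally:
            os.close(fd)
    finally:
        buf.close()
```

The second element of the result tells you whether the direct path was actually taken, which is worth checking before drawing conclusions from any timing.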


[1] https://en.wikipedia.org/wiki/Ring_(computer_security)
