Derek Martin writes:

> After doing some more experiments with this myself, I am too.  I realized
> later that the second example I gave still shouldn't work, and with a
> large enough input file, it doesn't.  I think I was half right; commands
> separated by a ; should work fine, but those separated by pipes should
> have issues.

Commands separated by semicolons are run sequentially.  The shell
wait()s for the first command to finish before it executes the
second, and so on.
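
Here is a minimal sketch of that, assuming a POSIX-ish shell and a
scratch directory from mktemp (the "out" file name is made up for the
example).  By the time sed starts, echo has already finished writing:

```shell
dir=$(mktemp -d)

# echo runs to completion first; only then does the shell start sed.
echo j > "$dir/junk"; sed 's/j/p/' "$dir/junk" > "$dir/out"

result=$(cat "$dir/out")
echo "$result"
```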

: echo j > junk; cat junk | sed "s/j/p/" > junk
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                            \ /
                             |           
                             |
For the rest of this conversation, let's concentrate only on this part
of the command line.  For the purposes of this thread, we can safely
assume that there exists a file called junk that contains the letter
'j'.  It could have been created days ago.  Keeping the echo in the
conversation only confuses matters.

What happens when you type this?  (assume "junk" already exists)

     cat junk | sed "s/j/p/" > junk
     
In the case of bash, the shell calls pipe() and then it calls fork().
The (first) child process then manipulates file descriptors with dup()
or dup2() so that the pipe is the new destination for stdout.  Then
this (first) child calls exec() and runs "cat junk".  As it happens,
this might start running right away, and any output is stored in the
pipe's internal buffers, which live in kernel space...
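
To see that buffering from the shell, here is a small sketch (the
one-second sleep is just an arbitrary delay I picked).  The writer is
long gone before the reader looks at the pipe, and the data is still
there waiting:

```shell
# printf writes one line and exits right away.  The reader sleeps first,
# so in the meantime the line sits in the kernel's pipe buffer.
out=$(printf 'j\n' | { sleep 1; cat; })
echo "$out"
```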

Then the parent shell opens up the file "junk" (for writing).  Then it
calls fork().  The (second) child process arranges for stdout to be
redirected towards the fd associated with the file "junk" (again, by
calling some variant of dup()).  Also, the (second) child arranges for
stdin to come from the pipe (again dup()).  Then the (second) child
process calls exec() to run the sed command.

The interesting part about all of this is the pipe, which is
maintained by the kernel.  Under the right conditions, it is possible
for the "cat" command to execute relatively quickly and output the
contents of the file "junk" into the pipe before the shell opens (and
subsequently truncates) the file "junk".  Because the contents of the
file have already been read and copied into the pipe, the fact that
the file gets truncated becomes irrelevant.
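
The truncation itself is easy to see if you take the pipe out of the
picture.  A sketch, again assuming a scratch directory from mktemp:
with no separate reader process to win the race, the redirection is
set up before sed ever runs, so sed finds an empty file.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
echo j > junk

# "> junk" is opened (and truncated) before sed is exec'd, so sed
# reads an already-empty file and writes nothing back.
sed 's/j/p/' junk > junk

size=$(wc -c < junk | tr -d ' ')
echo "size=$size"
```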

However, as the file size (of "junk") increases, the probability of
all of this "working" becomes closer and closer to zero.  And if the
file size becomes greater than the size of the pipe's buffer, "cat"
will block in write().  By the time "cat" becomes unblocked and
attempts to read from "junk" again, the file has usually been
truncated, so "cat" typically sees end-of-file early and the rest of
the original data is lost.
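
If you want to try this yourself, here is a rough sketch.  The 500000
line count is an arbitrary number I picked to make the file much
bigger than a typical pipe buffer; the size you get back can vary from
run to run and from system to system, which is rather the point.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

# Build a "junk" of 500000 lines, each holding a single 'j'.
awk 'BEGIN { for (i = 0; i < 500000; i++) print "j" }' > junk
before=$(wc -c < junk | tr -d ' ')

# The pipeline under discussion: cat races against the truncation.
cat junk | sed 's/j/p/' > junk

after=$(wc -c < junk | tr -d ' ')
echo "before=$before after=$after"
```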

So there's a race condition going on here.
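
For what it's worth, one common way to sidestep the race (not
something anybody proposed earlier in the thread, just the usual
idiom) is to never read and write the same file in one command.  A
sketch, where "junk.tmp" is a made-up temporary name:

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
echo j > junk

# Write the result under a different name, then rename it over the
# original only if sed succeeded.
sed 's/j/p/' junk > junk.tmp && mv junk.tmp junk

result=$(cat junk)
echo "$result"
```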


I've kind of simplified what's actually going on here, but this is the
general gist of it.  I hope this helps explain what's going on.

--kevin
-- 
Kevin D. Clark          |                          |
[EMAIL PROTECTED] | [EMAIL PROTECTED] |  Give me a decent UNIX
Enterasys Networks      | PGP Key Available        | and I can move the world
Durham, N.H. (USA)      |                          |

