Re: Make run in parallel mode with output redirected to a regular file can randomly drop output lines

2013-05-25 Thread Frank Heckenbach
(Sorry, I can't reply properly; I'm not subscribed, just saw the
message on the web archive.)

Stefano Lattarini wrote:

> Make run in parallel mode with output redirected to a regular file
> can randomly drop output lines

Yet another reason to use the new --output-sync feature. :)

I hadn't actually thought about this particular problem when I got
interested in the feature, but it makes sense. Perhaps I had
occasionally lost some messages before and never noticed ...

> The issue is present in all of make 3.81, make 3.82 and make
> built from latest Git.  Here is a script that demonstrates it:
>
> [...]
>
> and the following suggests it might not be easy to fix:
>
>   

As Ralf Wildenhues explains in that mail, it's not really a make
problem but the behaviour of POSIX files. As he also says, "Using
O_APPEND avoids this race", i.e. in your demo script, using ">>" (and
clearing the target file beforehand) would also fix the problem.

> But it's worth noting that the issue is not present with FreeBSD make (as
> offered by Debian package freebsd-buildutils 9.0-11); maybe the sources of
> that package might suggest how to obtain a fix after all?

I suppose it does something similar to our output-sync, i.e.
directing the output from different jobs to different temp files and
dumping it to the original stdout/stderr synchronized. Or do you
have something else in mind that make should (and could) do?

___
Bug-make mailing list
Bug-make@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-make


Re: Make run in parallel mode with output redirected to a regular file can randomly drop output lines

2013-05-26 Thread Stefano Lattarini
On 05/26/2013 05:18 AM, Frank Heckenbach wrote:
> (Sorry, I can't reply properly; I'm not subscribed, just saw the
> message on the web archive.)
> 
> Stefano Lattarini wrote:
> 
>> Make run in parallel mode with output redirected to a regular file
>> can randomly drop output lines
> 
> Yet another reason to use the new --output-sync feature. :)
> 
Definitely, but only when redirecting the output to a regular file.

> I hadn't actually thought about this particular problem when I got
> interested in the feature, but it makes sense. Perhaps I had
> occasionally lost some messages before and never noticed ...
> 
>> The issue is present in all of make 3.81, make 3.82 and make
>> built from latest Git.  Here is a script that demonstrates it:
>>
>> [...]
>>
>> and the following suggests it might not be easy to fix:
>>
>>   
> 
> As Ralf Wildenhues explains in that mail, it's not really a make
> problem but the behaviour of POSIX files. As he also says, "Using
> O_APPEND avoids this race", i.e. in your demo script, using ">>" (and
> clearing the target file beforehand) would also fix the problem.
>
I know, I just refactored the whole Automake testsuite a few days
ago to fix the issue in the same way you suggested ;-)
The issue is not at all difficult to work around, *once* you know
it's there and what its causes are.

>> But it's worth noting that the issue is not present with FreeBSD make (as
>> offered by Debian package freebsd-buildutils 9.0-11); maybe the sources of
>> that package might suggest how to obtain a fix after all?
> 
> I suppose it does something similar to our output-sync, i.e.
> directing the output from different jobs to different temp files and
> dumping it to the original stdout/stderr synchronized.
>
Yes, I think you are right.

> Or do you have something else in mind that make should (and could) do?
> 
Actually no; maybe it is *possible* to play some tricks with the
inherited file descriptors in order to mitigate or fix the issue,
but I haven't given it any real thought.  Nor, in all likelihood,
will I any time soon.  I just wanted to report this IMO subtle issue
in the hope of raising awareness of it, since AFAIK even very
experienced developers have been bitten by it.

And now that I think about it, maybe the sanest "fix" would be just
documenting the issue in the manual?

Thanks, and best regards,
  Stefano


Re: Make run in parallel mode with output redirected to a regular file can randomly drop output lines

2013-05-26 Thread Paul Smith
On Sun, 2013-05-26 at 11:14 +0200, Stefano Lattarini wrote:
> Actually no; maybe it is *possible* to play some tricks with the
> inherited file descriptors in order to mitigate or fix the issue,
> but I haven't given it any real thought.

Well, we can use fcntl() to set O_APPEND on stdout/stderr.  But I'm not
sure that's always the right thing to do.  It would work in the common
cases ("make >foo") because by the time make is invoked, the destination
file has already been truncated by the open() the shell performed.

Is there a situation where this would do the WRONG (unexpected) thing?
I guess someone could invoke make with a stdout opened without O_APPEND,
but also without O_TRUNC.  Then if we changed to O_APPEND, we'd get
different behavior.  It's very hard to think of a valid use-case for
opening a file used for stdout in this mode, however.  Maybe it's
bizarre enough to not worry about.

Might be worthwhile checking the FreeBSD code for their make, to see if
they do something like this.


Re: Make run in parallel mode with output redirected to a regular file can randomly drop output lines

2013-05-26 Thread Stefano Lattarini
On 05/26/2013 09:57 PM, Paul Smith wrote:
>
> [SNIP]
>
> Might be worthwhile checking the FreeBSD code for their make, to see if
> they do something like this.
> 
Nope, Frank was right: when run in parallel mode, FreeBSD make unconditionally
behaves like GNU make does with the '-O' option enabled (a behavior I actively
dislike, since it cannot be worked around).  And it also has several other
terrible hacks and quirks.  For more info, see:


Thanks,
  Stefano


Re: Make run in parallel mode with output redirected to a regular file can randomly drop output lines

2013-05-26 Thread Paul Smith
On Sun, 2013-05-26 at 22:05 +0200, Stefano Lattarini wrote:
> On 05/26/2013 09:57 PM, Paul Smith wrote:
> >
> > [SNIP]
> >
> > Might be worthwhile checking the FreeBSD code for their make, to see if
> > they do something like this.
> > 
> Nope, Frank was right: when run in parallel mode, FreeBSD make unconditionally
> behaves like GNU make does with the '-O' option enabled (a behavior I actively
> dislike, since it cannot be worked around).  And it also has several other
> terrible hacks and quirks.  For more info, see:
> 

Nevertheless, I do wonder whether forcing stdout/stderr into O_APPEND
mode would be worthwhile.  It would fix this problem in any event.  I'm
having a hard time coming up with a reason NOT to do it.


Re: Make run in parallel mode with output redirected to a regular file can randomly drop output lines

2013-05-26 Thread Stefano Lattarini
On 05/26/2013 10:20 PM, Paul Smith wrote:
> On Sun, 2013-05-26 at 22:05 +0200, Stefano Lattarini wrote:
>> On 05/26/2013 09:57 PM, Paul Smith wrote:
>>>
>>> [SNIP]
>>>
>>> Might be worthwhile checking the FreeBSD code for their make, to see if
>>> they do something like this.
>>>
>> Nope, Frank was right: when run in parallel mode, FreeBSD make unconditionally
>> behaves like GNU make does with the '-O' option enabled (a behavior I actively
>> dislike, since it cannot be worked around).  And it also has several other
>> terrible hacks and quirks.  For more info, see:
>> 
> 
> Nevertheless, I do wonder whether forcing stdout/stderr into O_APPEND
> mode would be worthwhile.  It would fix this problem in any event.  I'm
> having a hard time coming up with a reason NOT to do it.
> 
To be clear: I'm not opposed to the change in any way; it's just that I
don't feel comfortable enough with the area to give an explicit ACK.  If
you think the change would be worthwhile, go for it :-)  I trust your
judgment far more than mine.

Thanks,
  Stefano


Re: Make run in parallel mode with output redirected to a regular file can randomly drop output lines

2013-05-26 Thread Frank Heckenbach
Paul Smith wrote:

> Nevertheless, I do wonder whether forcing stdout/stderr into O_APPEND
> mode would be worthwhile.  It would fix this problem in any event.  I'm
> having a hard time coming up with a reason NOT to do it.

One issue, though it might seem strange that I'm the one to mention
it, is that it might be POSIX specific. How do other systems behave,
can they set O_APPEND via fcntl or otherwise, and if so, does it
guarantee non-conflicting writes? Of course, you could say an
improvement that only works on some systems (as long as it doesn't
negatively affect other systems) is better than nothing, but it
might give package maintainers a false sense of safety in keeping a
problematic way of doing things rather than modifying their packages
to do it safely right away (like Stefano just did as I understand
it). Though, if O_APPEND doesn't guarantee non-conflicting writes
*at all* (i.e., even when set on opening, as in "make >> logfile")
on some systems, it's not really safe either.

Of course, you can construct a theoretical case, such as someone
setting up a large empty file, seeking to its beginning and
expecting their make jobs to write there. I can't think of any
practical reason to do this, but as with most of the "-O" and
related discussions, I'm sure someone will tell us about such a
scenario soon. :-)

Neither of these may be a very strong argument, but my gut feeling is that
this is slightly outside of the scope of what make should do. But in
any case, I don't have a strong opinion since I'll be using
output-sync from now on anyway.

Stefano Lattarini wrote:

> And now that I think about it, maybe the sanest "fix" would be just
> documenting the issue in the manual?

I tend to agree. Perhaps a sentence or two in "Parallel Output",
mentioning that output-sync will also avoid this problem.


Re: Make run in parallel mode with output redirected to a regular file can randomly drop output lines

2013-05-27 Thread Eli Zaretskii
> Date: Mon, 27 May 2013 00:42:34 +0200
> From: Frank Heckenbach 
> Cc: bug-make@gnu.org
> 
> One issue, though it might seem strange that I'm the one to mention
> it, is that it might be POSIX specific. How do other systems behave,
> can they set O_APPEND via fcntl or otherwise

This can be done on Windows by creating a new file descriptor that has
the O_APPEND bit set, and then using dup2 to force stdout/stderr to refer
to that file descriptor.  (This is theory; I should try that and see
if it actually works.)

> and if so, does it guarantee non-conflicting writes?

Not sure I understand what you are asking here.  Can you elaborate?


Re: Make run in parallel mode with output redirected to a regular file can randomly drop output lines

2013-05-27 Thread Paul Smith
On Mon, 2013-05-27 at 20:13 +0300, Eli Zaretskii wrote:
> > and if so, does it guarantee non-conflicting writes?
> 
> Not sure I understand what you are asking here.  Can you elaborate?

The original issue reported is that if you do something like this:

make -j >make.out

and your make environment is recursive so you invoke one or more
sub-makes, your output may not just be interspersed (that is, output
from multiple jobs is mixed together) but you will actually lose
some output: it will never appear at all.

The reason is that when you have multiple processes trying to update the
same file at the same time using standard output file mode, there is a
race condition between when the output is written to the file and when
the "current offset" value is updated, where multiple processes could be
overwriting the same part of the file.

The suggested solution (not modifying make) is to use this instead:

  : >make.out # truncate the file
  make -j >> make.out

POSIX guarantees that if you open a file in O_APPEND mode, the above
race can never happen because the kernel updates the file offset as the
file is being written.

Frank's question is whether other, non-POSIX systems have the same
behavior with O_APPEND.  Of course if they don't I don't see how it
would make things worse than they are now.

What I was suggesting was having make itself reset the mode of stdout
and stderr to add O_APPEND, so that the first (most common) syntax would
work correctly.  POSIX says that you can change the mode of an open file
descriptor using fcntl().

This wouldn't hurt anything in the above case, because when the shell
opens the output file (with O_TRUNC) it will be truncated, then it will
give the FD to make and make will change the mode to O_APPEND, so the
file will still be truncated as you expect.

The only possible way this could burn someone is if they are invoking
make from a program where they've specifically opened make's
stdout/stderr without O_APPEND and without O_TRUNC, and they expect make
to start overwriting the file from the beginning rather than appending
to the end.  I cannot conceive of any situation where something like
that would be done intentionally.


Re: Make run in parallel mode with output redirected to a regular file can randomly drop output lines

2013-05-27 Thread Eli Zaretskii
> From: Paul Smith 
> Cc: Frank Heckenbach , stefano.lattar...@gmail.com,
>   bug-make@gnu.org
> Date: Mon, 27 May 2013 14:09:42 -0400
> 
> The original issue reported is that if you do something like this:
> 
> make -j >make.out
> 
> and your make environment is recursive so you invoke one or more
> sub-makes, your output may not just be interspersed (that is, output
> from multiple jobs is mixed together) but you will actually lose
> some output: it will never appear at all.
> 
> The reason is that when you have multiple processes trying to update the
> same file at the same time using standard output file mode, there is a
> race condition between when the output is written to the file and when
> the "current offset" value is updated, where multiple processes could be
> overwriting the same part of the file.

It sounds strange to me that the filesystem doesn't serialize the
writes.  Maybe I'm naive.

> POSIX guarantees that if you open a file in O_APPEND mode, the above
> race can never happen because the kernel updates the file offset as the
> file is being written.
> 
> Frank's question is whether other, non-POSIX systems have the same
> behavior with O_APPEND.

I will have to try that to know for sure.


Re: Make run in parallel mode with output redirected to a regular file can randomly drop output lines

2013-05-29 Thread Frank Heckenbach
Eli Zaretskii wrote:

> > Date: Mon, 27 May 2013 00:42:34 +0200
> > From: Frank Heckenbach 
> > Cc: bug-make@gnu.org
> > 
> > One issue, though it might seem strange that I'm the one to mention
> > it, is that it might be POSIX specific. How do other systems behave,
> > can they set O_APPEND via fcntl or otherwise
> 
> This can be done on Windows by creating a new file descriptor that has
> the O_APPEND bit set, and then using dup2 to force stdout/stderr to refer
> to that file descriptor.  (This is theory; I should try that and see
> if it actually works.)

I don't think this would work, at least on systems I know (mostly
POSIX), since we're talking about altering the flags of the
stdout/stderr given to us. We don't usually have its filename to
open it again; it may not even have a filename (e.g., it might be a
file created and deleted; or it might be a pipe, a socket, etc.), or
it might not be possible to reopen it (maybe we don't have
permissions anymore; or again sockets) ...

If Windows has a function to make a copy of a FD, whatever it is,
with new flags, this plus dup2 would be mostly equivalent to fcntl
for our purposes indeed. (Though I doubt it has one, since from what
I've seen, it generally doesn't seem to treat files, pipes, etc.
uniformly.)

Paul Smith wrote:

> POSIX guarantees that if you open a file in O_APPEND mode, the above
> race can never happen because the kernel updates the file offset as the
> file is being written.
> 
> Frank's question is whether other, non-POSIX systems have the same
> behavior with O_APPEND.  Of course if they don't I don't see how it
> would make things worse than they are now.

I don't think it would make things worse; it might just cause
package authors to ignore the issue on their end, so if what we do
works only on POSIX and they test only on POSIX, they might not
notice that there is a possible problem elsewhere.

Of course, this doesn't apply if other systems serialize writes even
without O_APPEND, and the whole discussion is moot for those
systems.

Eli Zaretskii wrote:

> > From: Paul Smith 
> >
> > The original issue reported is that if you do something like this:
> > 
> > make -j >make.out
> > 
> > and your make environment is recursive so you invoke one or more
> > sub-makes, your output may not just be interspersed (that is, output
> > from multiple jobs is mixed together) but you will actually lose
> > some output: it will never appear at all.
> > 
> > The reason is that when you have multiple processes trying to update the
> > same file at the same time using standard output file mode, there is a
> > race condition between when the output is written to the file and when
> > the "current offset" value is updated, where multiple processes could be
> > overwriting the same part of the file.
> 
> It sounds strange to me that the filesystem doesn't serialize the
> writes.  Maybe I'm naive.

I don't know the exact reasons. Perhaps it's just for efficiency, to
avoid synchronization by the OS for a rather special case, i.e.
different processes writing to the *same* file concurrently. If you
look at it this way, it smells like trouble because the question is,
how to merge the various writes. There are basically two answers:
Either the programs care about it themselves (in which case they
must cooperate, so they can also synchronize themselves), or it's
done automatically in the only sane way I can think of, i.e.
appending. Therefore POSIX makes an explicit guarantee for O_APPEND.
That's how I understand it.

In other words, you might be in trouble as soon as you duplicate a
writable FD without O_APPEND set. "make -j" of course may do just
that, if its stdout/stderr is a regular file without O_APPEND. But
it's not particular to make. Any simple program that forks another
one (perhaps just a shell script starting a background job) is in
the same situation if both programs write to stdout/stderr. So if I
understand it correctly:

% cat foo
#!/bin/sh
echo foo &
echo bar
% ./foo > bar

Whoops, undefined behaviour. (Though it seems unlikely for the
problem to actually occur in such a simple case.)

It seems the real culprit in all of these cases is redirecting
stdout/stderr to a log file with ">". Of course, people do this all
the time because they usually only think of the open-time effects of
">" vs. ">>" (i.e., truncation or not) and not about the effects
further down. Unfortunately, the shell has no easy way to open a
file with truncation and appending (which is what one really wants
here), and most people are too lazy (or not aware of the need) to do
the two-step procedure (remove and ">>").

So on an abstract level I still think make has no business messing
with the FD flags, since make is just one example in a large class
of affected programs. In practice, though, it may be a very
important example, and since we're not gonna convince everyone to
use ">>", it may indeed be the pragmatically best thing to set
O_APPEND (unless we discover actual problems w

Re: Make run in parallel mode with output redirected to a regular file can randomly drop output lines

2013-05-29 Thread Eli Zaretskii
> Date: Wed, 29 May 2013 09:32:55 +0200
> Cc: stefano.lattar...@gmail.com, bug-make@gnu.org
> From: Frank Heckenbach 
> 
> Eli Zaretskii wrote:
> 
> > > Date: Mon, 27 May 2013 00:42:34 +0200
> > > From: Frank Heckenbach 
> > > Cc: bug-make@gnu.org
> > > 
> > > One issue, though it might seem strange that I'm the one to mention
> > > it, is that it might be POSIX specific. How do other systems behave,
> > > can they set O_APPEND via fcntl or otherwise
> > 
> > This can be done on Windows by creating a new file descriptor that has
> > the O_APPEND bit set, and then using dup2 to force stdout/stderr to refer
> > to that file descriptor.  (This is theory; I should try that and see
> > if it actually works.)
> 
> I don't think this would work, at least on systems I know (mostly
> POSIX)

I was talking specifically about Windows, because that's what your
question above was about.

> since we're talking about altering the flags of the
> stdout/stderr given to us. We don't usually have its filename to
> open it again; it may not even have a filename (e.g., it might be a
> file created and deleted; or it might be a pipe, a socket, etc.), or
> it might not be possible to reopen it (maybe we don't have
> permissions anymore; or again sockets) ...

I don't need the name of the file, all I need is its file descriptor
or its Windows handle.

> If Windows has a function to make a copy of a FD, whatever it is,
> with new flags, this plus dup2 would be mostly equivalent to fcntl
> for our purposes indeed.

On Windows, file descriptors are created and maintained by the C
runtime, and they are private to the application.  Each descriptor is
an index into an array which holds the underlying Windows handle for
the file object and a bunch of flags, one of which is O_APPEND.  Those
flags are used by the Posix emulation APIs, in this case 'write', to
move the file pointer to the end of the file on each call to 'write'.
The OS knows nothing about those flags, it manipulates the file using
the handle and doesn't care about the descriptor.

There's a library function to get a handle that corresponds to file
descriptor, and another one that takes a handle and returns a new
descriptor which references that handle.  The latter function accepts
flags, including O_APPEND, to use for the file descriptor.

So the plan is:

  . get the handle that corresponds to (e.g.) stdout

  . produce a new descriptor for that handle with O_APPEND flag

  . use dup2 to replace the original stdout descriptor with this new
descriptor

> (Though I doubt it has one, since from what I've seen, it generally
> doesn't seem to treat files, pipes, etc.  uniformly.)

Windows does treat everything uniformly, just not the Posix way: every
object is referenced by a handle, which is an opaque pointer.  That
paradigm is actually broader than the Posix file descriptor paradigm:
there are objects, like events, critical sections, semaphores,
processes, etc. that are all referenced by handles, and there are APIs
that will take just about any handle and do their thing on it.  The
simplest example is CloseHandle.

> > It sounds strange to me that the filesystem doesn't serialize the
> > writes.  Maybe I'm naive.
> 
> I don't know the exact reasons. Perhaps it's just for efficiency, to
> avoid synchronization by the OS for a rather special case, i.e.
> different processes writing to the *same* file concurrently. If you
> look at it this way, it smells like trouble because the question is,
> how to merge the various writes.

I wasn't talking about synchronization or merging.  I was talking
about _losing_ some of the output, which was the issue discussed here.
My interpretation of that is that the system writes to the file using
more than a single file pointer.  And that is what sounded strange,
because I always thought that a handle that was inherited from a
parent process shares the same file pointer, and actually the whole
underlying object used for the I/O, with the parent.  If that were
true, then all the sub-makes would share the same file pointer, and we
couldn't possibly lose any output due to overwriting.  Perhaps this
information is outdated nowadays, though.


RE: Make run in parallel mode with output redirected to a regular file can randomly drop output lines

2013-05-29 Thread Martin Dorey
Frank wrote:

> the two-step procedure (remove and ">>").

Woah, *truncate* and ">>".  Removal wouldn't do the right thing for symlinks.

> That said, I'm now going back to my own programs which redirect
> stdout in forked child processes and add O_APPEND to O_TRUNC ...

Me too!


Re: Make run in parallel mode with output redirected to a regular file can randomly drop output lines

2013-05-29 Thread Frank Heckenbach
Martin Dorey wrote:

> Frank wrote:
> 
> > the two-step procedure (remove and ">>").
> 
> Woah, *truncate* and ">>".  Removal wouldn't do the right thing for symlinks.

You're right, of course, thanks!

> > That said, I'm now going back to my own programs which redirect
> > stdout in forked child processes and add O_APPEND to O_TRUNC ...
> 
> Me too!

So this discussion has had one positive effect for us already. :)


Re: Make run in parallel mode with output redirected to a regular file can randomly drop output lines

2013-05-29 Thread Frank Heckenbach
Eli Zaretskii wrote:

> > If Windows has a function to make a copy of a FD, whatever it is,
> > with new flags, this plus dup2 would be mostly equivalent to fcntl
> > for our purposes indeed.
> 
> On Windows, file descriptors are created and maintained by the C
> runtime, and they are private to the application.  Each descriptor is
> an index into an array which holds the underlying Windows handle for
> the file object and a bunch of flags, one of which is O_APPEND.  Those
> flags are used by the Posix emulation APIs, in this case 'write', to
> move the file pointer to the end of the file on each call to 'write'.
> The OS knows nothing about those flags, it manipulates the file using
> the handle and doesn't care about the descriptor.
>
> There's a library function to get a handle that corresponds to file
> descriptor, and another one that takes a handle and returns a new
> descriptor which references that handle.  The latter function accepts
> flags, including O_APPEND, to use for the file descriptor.
> 
> So the plan is:
> 
>   . get the handle that corresponds to (e.g.) stdout
> 
>   . produce a new descriptor for that handle with O_APPEND flag
> 
>   . use dup2 to replace the original stdout descriptor with this new
> descriptor

I see, this might work.

However, there may still be a problem. The trick about O_APPEND on
POSIX is that it's atomic, i.e. nothing can get between moving the
file pointer and the write, even if another process tries to write
simultaneously. So if the POSIX emulation API emulates it with two
system calls (seek and write), it wouldn't be atomic. I don't know
if it does, or whether the problem exists in the first place. So for
all I know there are 3 possibilities:

a) Writing without O_APPEND is "synchronized" (see below) already.
   No need to do anything then.

b) Not a), and writing with O_APPEND is not synchronized either
   (perhaps because the emulation layer just does two separate
   system calls). Setting O_APPEND would be pointless then.

c) Not a), but O_APPEND is synchronized (probably because the
   emulation layer can use a system call to seek and write
   atomically). Only then it would make sense to do what you
   suggest.

> > (Though I doubt it has one, since from what I've seen, it generally
> > doesn't seem to treat files, pipes, etc.  uniformly.)
> 
> Windows does treat everything uniformly, just not the Posix way:

(I meant uniform WRT file operations. I remember how hard it was for
you to implement same_stream() with a solution that works one way
for files, another way for the console and not for the null device
etc. I heard about similar issues WRT "select", which is probably
also emulated and AIUI only works for sockets (and perhaps some
other devices), whereas on POSIX it accepts any FD. But we're
digressing, after your explanation above that's not the problem
here.)

> > > It sounds strange to me that the filesystem doesn't serialize the
> > > writes.  Maybe I'm naive.
> > 
> > I don't know the exact reasons. Perhaps it's just for efficiency, to
> > avoid synchronization by the OS for a rather special case, i.e.
> > different processes writing to the *same* file concurrently. If you
> > look at it this way, it smells like trouble because the question is,
> > how to merge the various writes.
> 
> I wasn't talking about synchronization or merging.  I was talking
> about _losing_ some of the output, which was the issue discussed here.

That's the consequence of lack of synchronization. As I understand
it, both writes take place, but at the same file position because
the offset is not updated before the 2nd write takes place. So they
overwrite each other and the 1st one gets lost.

> My interpretation of that is that the system writes to the file using
> more than a single file pointer.  And that is what sounded strange,
> because I always thought that a handle that was inherited from a
> parent process shares the same file pointer, and actually the whole
> underlying object used for the I/O, with the parent.  If that were
> true, then all the sub-makes would share the same file pointer, and we
> couldn't possibly lose any output due to overwriting.  Perhaps this
> information is outdated nowadays, though.

I think it's still true. It's just a question *when* the shared file
pointer is updated, i.e. there's apparently no critical section for
write plus seek. As I said, I don't know why this is so, maybe just
for efficiency since those who need it can work around (O_APPEND or
higher level synchronization).


Re: Make run in parallel mode with output redirected to a regular file can randomly drop output lines

2013-05-30 Thread Eli Zaretskii
> Date: Thu, 30 May 2013 04:18:22 +0200
> Cc: psm...@gnu.org, stefano.lattar...@gmail.com, bug-make@gnu.org
> From: Frank Heckenbach 
> 
> >   . get the handle that corresponds to (e.g.) stdout
> > 
> >   . produce a new descriptor for that handle with O_APPEND flag
> > 
> >   . use dup2 to replace the original stdout descriptor with this new
> > descriptor
> 
> I see, this might work.
> 
> However, there may still be a problem. The trick about O_APPEND on
> POSIX is that it's atomic, i.e. nothing can get between moving the
> file pointer and the write, even if another process tries to write
> simultaneously. So if the POSIX emulation API emulates it with two
> system calls (seek and write), it wouldn't be atomic. I don't know
> if it does, or whether the problem exists in the first place.

The problem exists, but there's nothing that can be done about it, as
long as we use write/fwrite/fprintf for this: the call to 'write'
isn't atomic on Windows even without O_APPEND, because of the
text-mode translation of newlines to CR-LF pairs.

> b) Not a), and writing with O_APPEND is not synchronized either
>    (perhaps because the emulation layer just does two separate
>    system calls). Setting O_APPEND would be pointless then.

No, it's not pointless.  It makes the problem smaller.  And if the
Posix systems will do that, doing that on Windows will minimize the
number of #ifdef's, of which we have way too many already.

> > > (Though I doubt it has one, since from what I've seen, it generally
> > > doesn't seem to treat files, pipes, etc.  uniformly.)
> > 
> > Windows does treat everything uniformly, just not the Posix way:
> 
> (I meant uniform WRT file operations. I remember how hard it was for
> you to implement same_stream() with a solution that works one way
> for files, another way for the console and not for the null device
> etc.

It was not a file operation that was a problem, it was the Posix-only
concept of inodes.  Emulating a paradigm from another OS is always
hard.  I'm sure the same would happen to Posix code if it were to try
emulating native Windows ops.

> I heard about similar issues WRT "select", which is probably
> also emulated and AIUI only works for sockets (and perhaps some
> other devices), whereas on POSIX it accepts any FD.

That's because a socket is not a file on Windows.

> > I wasn't talking about synchronization or merging.  I was talking
> > about _losing_ some of the output, which was the issue discussed here.
> 
> That's the consequence of lack of synchronization.

No, it isn't.  If the same file pointer were used, there would be no
need for any synchronization, because that pointer would serialize
output by its very nature.

> I think it's still true. It's just a question *when* the shared file
> pointer is updated, i.e. there's apparently no critical section for
> write plus seek.

If there's only one pointer, this is not an issue.

Re: Make run in parallel mode with output redirected to a regular file can randomly drop output lines

2013-05-30 Thread Frank Heckenbach
Eli Zaretskii wrote:

> > From: Frank Heckenbach 
> > 
> > However, there may still be a problem. The trick about O_APPEND on
> > POSIX is that it's atomic, i.e. nothing can get between moving the
> > file pointer and the write, even if another process tries to write
> > simultaneously. So if the POSIX emulation API emulates it with two
> > system calls (seek and write), it wouldn't be atomic. I don't know
> > if it does, or whether the problem exists in the first place.
> 
> The problem exists, but there's nothing that can be done about it, as
> long as we use write/fwrite/fprintf for this: the call to 'write'
> isn't atomic on Windows even without O_APPEND, because of the
> text-mode translation of newlines to CR-LF pairs.

I didn't mean the whole write() must be atomic, just the seek and
write part. I.e., if the translation is done in user-space and then
a system call does the write and seek atomically, it might still be
OK even without O_APPEND.

> > b) Not a), and writing with O_APPEND is not synchronized either
> >    (perhaps because the emulation layer just does two separate
> >    system calls). Setting O_APPEND would be pointless then.
> 
> No, it's not pointless.  It makes the problem smaller.  And if the
> Posix systems will do that, doing that on Windows will minimize the
> number of #ifdef's, of which we have way too many already.

As I understood you, you'd have to write an emulation for
fcntl (F_SETFL). That code has to be written and maintained, and it
adds a (small) runtime overhead. Not sure if that's worth saving an
#ifdef, *if* the problem doesn't actually exist.

> > > > (Though I doubt it has one, since from what I've seen, it generally
> > > > doesn't seem to treat files, pipes, etc.  uniformly.)
> > > 
> > > Windows does treat everything uniformly, just not the Posix way:
> > 
> > I heard about similar issues WRT "select", which is probably
> > also emulated and AIUI only works for sockets (and perhaps some
> > other devices), whereas on POSIX it accepts any FD.
> 
> That's because a socket is not a file on Windows.

That's what I meant. On POSIX, stdout could be a socket. It might be
unusual for make (though I actually once had a web application that
ran make to produce various files on demand, however it returned the
files made, not make's output, but it was close ... ;-). But since
this is probably not possible on Windows (if a socket is not a file,
stdout can't be connected to one), we don't have to care about this
situation there.

> > > I wasn't talking about synchronization or merging.  I was talking
> > > about _losing_ some of the output, which was the issue discussed here.
> > 
> > That's the consequence of lack of synchronization.
> 
> No, it isn't.  If the same file pointer were used, there would be no
> need for any synchronization, because that pointer would serialize
> output by its very nature.

Not sure what you mean here. The OS is not a magic box that does
anything "by its very nature". AIUI, it may contain code roughly
like this:

void write (int fd, void *data, size_t size)
{
  if (getflags (fd) & O_APPEND)
    {
      lock_mutex (get_mutex (fd));
      off_t pos = get_size (fd);
      do_write (fd, pos, data, size);
      set_pos (fd, pos + size);
      unlock_mutex (get_mutex (fd));
    }
  else
    {
      // no mutex here!
      off_t pos = get_pos (fd);
      do_write (fd, pos, data, size);
      set_pos (fd, pos + size);
    }
}

Different code paths, different behaviour.

Re: Make run in parallel mode with output redirected to a regular file can randomly drop output lines

2013-05-30 Thread Frank Heckenbach
Martin Dorey wrote:

> > That said, I'm now going back to my own programs which redirect
> > stdout in forked child processes and add O_APPEND to O_TRUNC ...
> 
> Me too!

I just realized that this also applies to the temp files created for
output-sync, since tmpfile() doesn't set O_APPEND. The problem
doesn't exist WRT parallel make jobs, since they get different temp
files, but with jobs that fork themselves, like:

foo:
	echo foo & echo bar; wait

It might be hard to reproduce the problem with such a simple test
case, but since we just basically determined that it should be the
job of the caller to ensure stdout is "forkable", this burden falls
on make in this case. Also it's another case where output-sync would
noticeably change the visible behaviour. So I suggest this change:

--- misc.c.orig 2013-05-31 05:06:33.0 +0200
+++ misc.c  2013-05-31 05:21:34.0 +0200
@@ -18,6 +18,12 @@
 #include "dep.h"
 #include "debug.h"
 
+#ifdef HAVE_FCNTL_H
+# include <fcntl.h>
+#else
+# include <sys/file.h>
+#endif
+
 /* GNU make no longer supports pre-ANSI89 environments.  */
 
 #include <stdarg.h>
@@ -961,7 +967,7 @@
 int
 open_tmpfd ()
 {
-  int fd = -1;
+  int fd = -1, flags;
   FILE *tfile = tmpfile ();
 
   if (! tfile)
@@ -974,6 +980,12 @@
 
   fclose (tfile);
 
+  flags = fcntl (fd, F_GETFL, 0);
+  if (flags < 0)
+    pfatal_with_name ("fcntl (F_GETFL)");
+  if (fcntl (fd, F_SETFL, flags | O_APPEND) < 0)
+    pfatal_with_name ("fcntl (F_SETFL)");
+
   return fd;
 }
 

Re: Make run in parallel mode with output redirected to a regular file can randomly drop output lines

2013-05-30 Thread Eli Zaretskii
> Date: Fri, 31 May 2013 05:36:24 +0200
> Cc: psm...@gnu.org, stefano.lattar...@gmail.com, bug-make@gnu.org
> From: Frank Heckenbach 
> 
> Eli Zaretskii wrote:
> 
> > > From: Frank Heckenbach 
> > > 
> > > However, there may still be a problem. The trick about O_APPEND on
> > > POSIX is that it's atomic, i.e. nothing can get between moving the
> > > file pointer and the write, even if another process tries to write
> > > simultaneously. So if the POSIX emulation API emulates it with two
> > > system calls (seek and write), it wouldn't be atomic. I don't know
> > > if it does, or whether the problem exists in the first place.
> > 
> > The problem exists, but there's nothing that can be done about it, as
> > long as we use write/fwrite/fprintf for this: the call to 'write'
> > isn't atomic on Windows even without O_APPEND, because of the
> > text-mode translation of newlines to CR-LF pairs.
> 
> I didn't mean the whole write() must be atomic, just the seek and
> write part. I.e., if the translation is done in user-space and then
> a system call does the write and seek atomically, it might still be
> OK even without O_APPEND.

But that's what I'm telling you: each chunk of text after NL to CR-LF
conversion is written separately in a separate call to WriteFile,
which is the low-level API for file I/O.

> > > b) Not a), and writing with O_APPEND is not synchronized either
> > >    (perhaps because the emulation layer just does two separate
> > >    system calls). Setting O_APPEND would be pointless then.
> > 
> > No, it's not pointless.  It makes the problem smaller.  And if the
> > Posix systems will do that, doing that on Windows will minimize the
> > number of #ifdef's, of which we have way too many already.
> 
> As I understood you, you'd have to write an emulation for
> fcntl (F_SETFL). That code has to be written and maintained, and it
> adds a (small) runtime overhead. Not sure if that's worth saving an
> #ifdef, *if* the problem doesn't actually exist.

Given your next message about tmpfile, I will need that anyway.  And
the code is simple (assuming it works).

> > > I heard about similar issues WRT "select", which is probably
> > > also emulated and AIUI only works for sockets (and perhaps some
> > > other devices), whereas on POSIX it accepts any FD.
> > 
> > That's because a socket is not a file on Windows.
> 
> That's what I meant. On POSIX, stdout could be a socket.

Stdout can be a socket on Windows as well, because, under the hood,
they are both represented by handles.  See how the setup of child
processes is done in w32/subproc/sub_proc.c.

> > > > I wasn't talking about synchronization or merging.  I was talking
> > > > about _losing_ some of the output, which was the issue discussed here.
> > > 
> > > That's the consequence of lack of synchronization.
> > 
> > No, it isn't.  If the same file pointer were used, there would be no
> > need for any synchronization, because that pointer would serialize
> > output by its very nature.
> 
> Not sure what you mean here. The OS is not a magic box that does
> anything "by its very nature".

No magic needed when there's only one file pointer, because a single
file pointer can only be at one place at any given time.

> void write (int fd, void *data, size_t size)
> {
>   if (getflags (fd) & O_APPEND)
>     {
>       lock_mutex (get_mutex (fd));
>       off_t pos = get_size (fd);
>       do_write (fd, pos, data, size);
>       set_pos (fd, pos + size);
>       unlock_mutex (get_mutex (fd));
>     }
>   else
>     {
>       // no mutex here!
>       off_t pos = get_pos (fd);
>       do_write (fd, pos, data, size);
>       set_pos (fd, pos + size);
>     }
> }

If the 'else' clause uses a single file pointer system-wide, there's
no overwriting because the pointer is not moved between writes.

Re: Make run in parallel mode with output redirected to a regular file can randomly drop output lines

2013-05-31 Thread Frank Heckenbach
Eli Zaretskii wrote:

> > From: Frank Heckenbach 
> > 
> > Eli Zaretskii wrote:
> > 
> > > The problem exists, but there's nothing that can be done about it, as
> > > long as we use write/fwrite/fprintf for this: the call to 'write'
> > > isn't atomic on Windows even without O_APPEND, because of the
> > > text-mode translation of newlines to CR-LF pairs.
> > 
> > I didn't mean the whole write() must be atomic, just the seek and
> > write part. I.e., if the translation is done in user-space and then
> > a system call does the write and seek atomically, it might still be
> > OK even without O_APPEND.
> 
> But that's what I'm telling you: each chunk of text after NL to CR-LF
> conversion is written separately in a separate call to WriteFile,
> which is the low-level API for file I/O.

Actually you hadn't told me this before. I had assumed the data was
converted as a whole and then written in one go. Now what you say
makes sense to me.

So the original problem (losing some output) still might not exist,
but a different problem might exist (unintended line-wise mixup of
different outputs) and might not be fixed by O_APPEND.

> > As I understood you, you'd have to write an emulation for
> > fcntl (F_SETFL). That code has to be written and maintained, and it
> > adds a (small) runtime overhead. Not sure if that's worth saving an
> > #ifdef, *if* the problem doesn't actually exist.
> 
> Given your next message about tmpfile, I will need that anyway.

It's the same situation (just seen from the other end), so if you
don't need O_APPEND for stdout/stderr, you won't need it for the
tmpfiles either.

> > > > > I wasn't talking about synchronization or merging.  I was talking
> > > > > about _losing_ some of the output, which was the issue discussed here.
> > > > 
> > > > That's the consequence of lack of synchronization.
> > > 
> > > No, it isn't.  If the same file pointer were used, there would be no
> > > need for any synchronization, because that pointer would serialize
> > > output by its very nature.
> > 
> > Not sure what you mean here. The OS is not a magic box that does
> > anything "by its very nature".
> 
> No magic needed when there's only one file pointer, because a single
> file pointer can only be at one place at any given time.

Yes, but not necessarily at different places at different times:

> > void write (int fd, void *data, size_t size)
> > {
> >   if (getflags (fd) & O_APPEND)
> >     {
> >       lock_mutex (get_mutex (fd));
> >       off_t pos = get_size (fd);
> >       do_write (fd, pos, data, size);
> >       set_pos (fd, pos + size);
> >       unlock_mutex (get_mutex (fd));
> >     }
> >   else
> >     {
> >       // no mutex here!
> >       off_t pos = get_pos (fd);
> >       do_write (fd, pos, data, size);
> >       set_pos (fd, pos + size);
> >     }
> > }
> 
> If the 'else' clause uses a single file pointer system-wide, there's
> no overwriting because the pointer is not moved between writes.

I still can't follow you. Just imagine this function is run by two
different processes simultaneously with the same FD without
O_APPEND. Both fetch the current position (get_pos) and get the same
value. Then both write (do_write) at this same position, overwriting
each other. Finally, both update the file pointer (set_pos), but
again, only the 2nd one becomes effective.

That's a rather typical lack-of-synchronization situation, and a
mutex around the code would fix it, because one process wouldn't be
able to get_pos before the other one has finished and done set_pos.
If I were designing a system, I might have done it this way, but fact
is POSIX doesn't mandate it, so we can't assume it. But perhaps
Windows does so, and then this problem doesn't exist there.

Re: Make run in parallel mode with output redirected to a regular file can randomly drop output lines

2013-05-31 Thread Eli Zaretskii
> Date: Fri, 31 May 2013 16:58:21 +0200
> Cc: psm...@gnu.org, stefano.lattar...@gmail.com, bug-make@gnu.org
> From: Frank Heckenbach 
> 
> > > void write (int fd, void *data, size_t size)
> > > {
> > >   if (getflags (fd) & O_APPEND)
> > >     {
> > >       lock_mutex (get_mutex (fd));
> > >       off_t pos = get_size (fd);
> > >       do_write (fd, pos, data, size);
> > >       set_pos (fd, pos + size);
> > >       unlock_mutex (get_mutex (fd));
> > >     }
> > >   else
> > >     {
> > >       // no mutex here!
> > >       off_t pos = get_pos (fd);
> > >       do_write (fd, pos, data, size);
> > >       set_pos (fd, pos + size);
> > >     }
> > > }
> > 
> > If the 'else' clause uses a single file pointer system-wide, there's
> > no overwriting because the pointer is not moved between writes.
> 
> I still can't follow you. Just imagine this function is run by two
> different processes simultaneously with the same FD without
> O_APPEND. Both fetch the current position (get_pos) and get the same
> value. Then both write (do_write) at this same position, overwriting
> each other. Finally, both update the file pointer (set_pos), but
> again, only the 2nd one becomes effective.

There's no reason for them to call get_pos.  do_write moves the
pointer as a side effect.

Re: Make run in parallel mode with output redirected to a regular file can randomly drop output lines

2013-05-31 Thread Frank Heckenbach
Eli Zaretskii wrote:

> > From: Frank Heckenbach 
> > 
> > > > void write (int fd, void *data, size_t size)
> > > > {
> > > >   if (getflags (fd) & O_APPEND)
> > > >     {
> > > >       lock_mutex (get_mutex (fd));
> > > >       off_t pos = get_size (fd);
> > > >       do_write (fd, pos, data, size);
> > > >       set_pos (fd, pos + size);
> > > >       unlock_mutex (get_mutex (fd));
> > > >     }
> > > >   else
> > > >     {
> > > >       // no mutex here!
> > > >       off_t pos = get_pos (fd);
> > > >       do_write (fd, pos, data, size);
> > > >       set_pos (fd, pos + size);
> > > >     }
> > > > }
> > > 
> > > If the 'else' clause uses a single file pointer system-wide, there's
> > > no overwriting because the pointer is not moved between writes.
> > 
> > I still can't follow you. Just imagine this function is run by two
> > different processes simultaneously with the same FD without
> > O_APPEND. Both fetch the current position (get_pos) and get the same
> > value. Then both write (do_write) at this same position, overwriting
> > each other. Finally, both update the file pointer (set_pos), but
> > again, only the 2nd one becomes effective.
> 
> There's no reason for them to call get_pos.  do_write moves the
> pointer as a side effect.

Not sure what we're arguing here. I'm discussing a hypothetical
implementation of a POSIX-conformant system. In my hypothetical
scenario, do_write does not move the position. Sure, there are
alternatives, I never denied that.

And even if it did, how is do_write implemented? At some point, it
must retrieve the current position, write there and update the
position. If this isn't protected, the problem exists.

Maybe this goes back to what we discussed a few weeks ago WRT the
implementation of seek. It's not like seek (explicit or implicit
during writes) moves the hard disk heads which just sit there
waiting for the next write. It updates the current file position
which is just a variable in (kernel space) memory, and write uses
this variable (among other things) to decide where to place the
data. Like all shared variables, unsynchronized concurrent access
can cause problems.
