Hi Janne,
On 10/17/2011 05:30 PM, Janne Blomqvist wrote:
On Mon, Oct 17, 2011 at 15:49, Tobias Burnus<bur...@net-b.de> wrote:
This patch adds a call to _commit() on _WIN32 for the FLUSH subroutine and
the FLUSH statement. It removes the _commit from gfortran's buf_flush.
Like I argued in this message
http://gcc.gnu.org/ml/fortran/2011-10/msg00094.html, I think this is a gross
mistake.
[...]
And I think it is a mistake not to make the data available to other
processes, as indicated by the Fortran 2008 standard:
"Execution of a FLUSH statement causes data written to an external file
to be available to other processes, or causes data placed in an external
file by means other than Fortran to be available to a READ statement.
These actions are processor dependent."
Thus, I think it makes sense for FLUSH to call _commit on Windows.
If you don't want a slowdown, simply do not call FLUSH.
libgfortran should not require _commit nor fsync in any situation. Those calls
are useful for writing databases and other applications which must make data
integrity guarantees, and are prepared to pay the performance cost associated
with it. It's absolutely not something a language support library should do
unless the language spec explicitly requires such data integrity guarantees.
Well, Fortran does not need to write the data to the file; however, the
purpose of FLUSH is that I can, e.g., run execute_command_line on the
file the program has just written. That works on Unix/Linux but not on
MinGW/MinGW-w64 without a _commit (or without closing the file).
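To make the use case concrete, here is a minimal sketch of it, written in Python rather than Fortran purely for brevity. The function names are made up for illustration and this is not the libgfortran patch. The only library fact it relies on is that Python's os.fsync is documented to call _commit() on Windows. The file stays open while a child process looks at it, which is what FLUSH followed by execute_command_line amounts to.

```python
import os
import subprocess
import sys
import tempfile


def flush_for_other_processes(f):
    """Hypothetical helper mirroring the idea discussed above."""
    f.flush()  # hand the user-space buffer to the OS
    if sys.platform == "win32":
        # On Windows, os.fsync() calls the MS _commit() function.
        os.fsync(f.fileno())


def write_flush_and_check(payload: bytes) -> int:
    """Write payload, flush, and return the file size a child process sees."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            flush_for_other_processes(f)
            # The file is still open here, as it would be after FLUSH.
            child = subprocess.run(
                [sys.executable, "-c",
                 "import os, sys; print(os.path.getsize(sys.argv[1]))",
                 path],
                capture_output=True, text=True, check=True)
        return int(child.stdout.strip())
    finally:
        os.remove(path)
```

On Unix/Linux the plain flush is enough for the child to see the full size; the extra call is only made on Windows, so nothing is forced to disk elsewhere.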
That write() would be buffered on windows makes no sense to me
Why shouldn't it be buffered? Typical Windows programs open files with an
exclusive lock, and as Windows never had the pipes and the many small
programs that Unix did, having a per-file-descriptor buffer is easier to
implement, avoids multi-thread issues and is potentially faster. If a
program wants to make the data available, it can just _commit it or
close the file handle; that way one also has perfect data integrity.
And, while I'm at it, this kind of "relaxed consistency" is not
unheard of in the unix world either. Consider NFS, where data and
metadata may not be flushed to the server until fsync() or close() is
called, or the attribute cache timeout forces the writeout(?), and
thus it's possible for clients to have an inconsistent view of a file.
Well, most of the time it works well on the same system: If I call
execute_command_line, the data is up to date. The issue with NFS only
occurs if I want to access the data remotely, which is another issue. If
one wants to do that, one can use a parallel access with, e.g., HDF5 or
MPIv2 or the Coarray TS (to be written and implemented).
In both cases the remedy is the same; if this kind of consistency matters, the
user should close the file or fsync()/_commit() before expecting that the OS
metadata is consistent. I think that's a better option than sprinkling
_commit() all over the library.
No, for the required consistency, FLUSH is enough (including calling
_commit on MinGW/MinGW-w64). It makes sure that the data is still there if
the program crashes, and it makes the data available to other processes.
Only if one wants complete integrity does one need to call fsync.
However, with NFS, Lustre et al., I am not 100% sure that the data is
immediately available on all other clients after fsync returned.
So I would rather prefer my own patch from the URL above. Also, I
think it would be nice if we could get this fix into 4.6.2.
I also would like to see this fixed for 4.6.2. However, a prerequisite
is that we agree on how to implement it.
Regarding your patch: I think it does not solve the FLUSH issue. For the
file size itself, I think the patch is okay, but frankly, I think for
the performance it does not really matter which approach is taken. And I
do not like the test-suite part of your patch.
Tobias