Re: [WORK] std.file.update function

2016-10-18 Thread Patrick Schluter via Digitalmars-d

On Tuesday, 18 October 2016 at 13:51:48 UTC, R wrote:
On Monday, 19 September 2016 at 02:57:01 UTC, Chris Wright 
wrote:


You have an operating system that automatically checksums 
every file?


There are a few filesystems that keep checksums of blocks, but 
I don't see one that keeps file checksums.


ZFS and btrfs do. Whether the checksum is accessible is another story.


Re: [WORK] std.file.update function

2016-10-18 Thread R via Digitalmars-d

On Monday, 19 September 2016 at 02:57:01 UTC, Chris Wright wrote:

You have an operating system that automatically checksums every 
file?


There are a few filesystems that keep checksums of blocks, but I 
don't see one that keeps file checksums.


Re: [WORK] std.file.update function

2016-09-19 Thread Walter Bright via Digitalmars-d
One way to implement it is to open the existing file as a memory-mapped file. 
Memory-mapped files only get paged into memory as the memory is referenced. So 
if you did a memcmp(oldfile, newfile, size), it would stop once the first 
difference is found, and the rest of the file would never be read.


Also, only the changed pages of the memory-mapped file have to be written. On 
large files, this could be a big savings.


Re: [WORK] std.file.update function

2016-09-19 Thread Walter Bright via Digitalmars-d

On 9/19/2016 7:04 AM, Andrei Alexandrescu wrote:

On 09/19/2016 01:16 AM, Walter Bright wrote:

The compiler currently creates the complete object file in a buffer,
then writes the buffer to a file in one command. The reason is mostly
because the object file format isn't incremental, the beginning is
written last and the body gets backpatched as the compilation progresses.

Great. In that case, if the target .o file already exists, it should be compared
against the buffer. If identical, there should be no write and the timestamp of
the .o file should stay the same.


That's right. I was just referring to the idea of incrementally writing and 
comparing, which is a great idea for sequential file writing but likely won't 
work for the object file case. I think it is distinct enough to merit a separate 
library function. Note that we already have:


http://dlang.org/phobos/std_file.html#.write

Adding another "writeIfDifferent()" function would be a good thing. The range 
based incremental one should go into std.stdio.


Any case where writing is much more costly than reading (such as the SSD drives you 
mentioned, and the new Seagate "archival" drives), would make your technique a 
good one. It works even for memory; I've used it in code to reduce swapping, as in:


if (*p != newvalue) *p = newvalue;


I need to re-emphasize this kind of stuff is important for tooling. Many files
get recompiled to identical object files - e.g. the many innocent bystanders in
a dense dependency structure when one module changes. We also embed
documentation in source files. Being disciplined about reflecting actual changes
in the actual file operations is very helpful for tools that track file writes
and/or timestamps.


That's right.



I can't really see a compilation producing an object file where the
first half of it matches the previous object file and the second half is
different, because of the file format.


Interesting. What happens e.g. if one makes a change to a function whose
generated code is somewhere in the middle of the object file? If it doesn't
alter the call graph, doesn't the new .o file share a common prefix with the old
one?


Two things:

1. The object file starts out with a header that contains file offsets to the 
various tables and sections. Changing the size of any of the pieces in the file 
changes the header, and will likely require moving pieces around to make room.


2. Writing an object file can mean "backpatching" what was written earlier, as a 
declaration one assumed was external turns out to be internal.
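Point 1 is easy to see in a concrete format. As an illustration (ELF-64 here, not whatever DMD emits on a given target), the header's e_shoff field is a file offset to the section-header table, so growing any section forces the header itself to change:

```python
import struct

# ELF-64 header: 16-byte ident, then e_type, e_machine, e_version,
# e_entry, e_phoff, e_shoff, e_flags, and six 16-bit size/count fields.
ELF64_EHDR = "<16sHHIQQQIHHHHHH"  # 64 bytes total

def section_header_offset(header_bytes):
    """File offset of the section-header table (e_shoff)."""
    fields = struct.unpack(ELF64_EHDR, header_bytes[:64])
    assert fields[0][:4] == b"\x7fELF", "not an ELF image"
    return fields[6]  # ident, type, machine, version, entry, phoff, shoff
```

Any edit that shifts the tables moves e_shoff, which is why a changed module rarely leaves even the file's first 64 bytes intact.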




Re: [WORK] std.file.update function

2016-09-19 Thread Stefan Koch via Digitalmars-d
On Monday, 19 September 2016 at 14:04:03 UTC, Andrei Alexandrescu 
wrote:


Interesting. What happens e.g. if one makes a change to a 
function whose generated code is somewhere in the middle of the 
object file? If it doesn't alter the call graph, doesn't the 
new .o file share a common prefix with the old one?


Only if the TOC is unchanged.
There are a lot of common sections in the same order but with 
different offsets.

We would need some binary-patching method.

But I am unaware of filesystems supporting this.
Microsoft's incremental linking mechanism makes use of thunks so 
it can avoid changing the header, IIRC.


But all of this needs codegen to adapt.


Re: [WORK] std.file.update function

2016-09-19 Thread Andrei Alexandrescu via Digitalmars-d

On 09/18/2016 10:05 PM, Brad Roberts via Digitalmars-d wrote:

This is nice in the case of no changes, but problematic in the case of
some changes.  The standard write new, rename technique never has either
file in a half-right state.  The file is atomically either old or new
and nothing in between.  This can be critical.


Good point; this should also be part of the doco, or a flag with update (e.g.
Yes.atomic). Alternative: the caller may wish to rename the file prior 
to the operation and then rename it back after the operation. -- Andrei


Re: [WORK] std.file.update function

2016-09-19 Thread Andrei Alexandrescu via Digitalmars-d

On 09/19/2016 01:16 AM, Walter Bright wrote:

On 9/18/2016 8:20 AM, Andrei Alexandrescu wrote:

On 09/18/2016 11:17 AM, Andrei Alexandrescu wrote:

Simplest case is - source file is being changed, therefore a new object
file is being produced, therefore a new executable is being produced.


Forgot to mention a situation here: if you change the source code of a
module
without influencing the object file (e.g. documentation, certain style
changes,
unittests in non-unittest builds etc) there'd be no linking upon
rebuilding. --



The compiler currently creates the complete object file in a buffer,
then writes the buffer to a file in one command. The reason is mostly
because the object file format isn't incremental, the beginning is
written last and the body gets backpatched as the compilation progresses.


Great. In that case, if the target .o file already exists, it should be 
compared against the buffer. If identical, there should be no write and 
the timestamp of the .o file should stay the same.


I need to re-emphasize this kind of stuff is important for tooling. Many 
files get recompiled to identical object files - e.g. the many innocent 
bystanders in a dense dependency structure when one module changes. We 
also embed documentation in source files. Being disciplined about 
reflecting actual changes in the actual file operations is very helpful 
for tools that track file writes and/or timestamps.



I can't really see a compilation producing an object file where the
first half of it matches the previous object file and the second half is
different, because of the file format.


Interesting. What happens e.g. if one makes a change to a function whose 
generated code is somewhere in the middle of the object file? If it 
doesn't alter the call graph, doesn't the new .o file share a common 
prefix with the old one?



Interestingly, the win32 .lib format is designed for incredibly slow
floppy disks, in that updating the library need not read/write every
disk sector.

I'd love to design our own high speed formats, but then they'd be
incompatible with everybody else's.


This (and the subsequent considerations) is drifting off-topic. This is 
about getting a useful function off the ground, and sadly is 
degenerating into yet another off-topic debate leading to no progress.



Andrei


Re: [WORK] std.file.update function

2016-09-19 Thread ketmar via Digitalmars-d

On Monday, 19 September 2016 at 06:53:47 UTC, Walter Bright wrote:
Doing a linker inside DMD means that object files imported from 
other C/C++ compilers have to be correctly interpreted. I could 
do it, but I couldn't do that and continue to work on D.


yeah. there is a reason for the absence of a gazillion hobbyist 
FOSS linkers. ;-) contrary to what it may look like, correct 
linking is a really hard task, and mostly not fun to write, too. 
people usually try, and then just silently return to binutils. ;-)


Re: [WORK] std.file.update function

2016-09-19 Thread Walter Bright via Digitalmars-d

On 9/18/2016 11:33 PM, Stefan Koch wrote:

However, the maintenance burden is a bit heavy; we don't have enough manpower
as it is.


A major part of the problem (that working with Optlink has made painfully clear) 
is that although linking is conceptually a rather trivial task, the people 
who've designed the file formats have an unending love of making trivial things 
exceedingly complicated. Furthermore, the weird things about the format are 98% 
undocumented lore.


DMD still has problems generating "correct" Dwarf debug info because its 
correctness is not defined by the spec, but by lore and the idiosyncratic way 
that gcc emits it.


Doing a linker inside DMD means that object files imported from other C/C++ 
compilers have to be correctly interpreted. I could do it, but I couldn't do 
that and continue to work on D.


Re: [WORK] std.file.update function

2016-09-19 Thread Stefan Koch via Digitalmars-d

On Monday, 19 September 2016 at 05:16:37 UTC, Walter Bright wrote:


I'd love to design our own high speed formats, but then they'd 
be incompatible with everybody else's.


I'd like that as well.

I recently had a look at the ELF and COFF file formats; both 
are definitely in need of rework and a dust-off :-)


There are some nice things we could do if we had certain features 
on every platform, wrt. linking and symbol-tables.


However, the maintenance burden is a bit heavy; we don't have 
enough manpower as it is.


Re: [WORK] std.file.update function

2016-09-19 Thread Jacob Carlborg via Digitalmars-d

On 2016-09-19 07:16, Walter Bright wrote:


I'd love to design our own high speed formats, but then they'd be
incompatible with everybody else's.


You already mentioned in another post [1] that the compiler could do 
the linking as well. In that case you would need to write some form of 
linker. I suggest developing the linker as a library, supporting 
all formats DMD currently supports. The library can be used both 
directly from DMD and to build an external linker. When we have our own 
linker we could create our own format too without having to worry about 
compatibility.


I guess we need to create other tools for the new format as well, like 
object dumpers. But I assume that's a natural thing to do anyway.


Bundle that with something like musl libc and we will have our own 
complete tool chain. It would also be easier to add support for 
cross-compiling.


[1] http://forum.dlang.org/post/nrnsn7$1h3k$1...@digitalmars.com

--
/Jacob Carlborg


Re: [WORK] std.file.update function

2016-09-18 Thread Walter Bright via Digitalmars-d

On 9/18/2016 7:05 PM, Brad Roberts via Digitalmars-d wrote:

This is nice in the case of no changes, but problematic in the case of some
changes.  The standard write new, rename technique never has either file in a
half-right state.  The file is atomically either old or new and nothing in
between.  This can be critical.


As for compilation, I bet considerable speed increases could be had by never 
writing object files at all. (Not only does it save the read/write file time, 
but it saves the encoding into the object file format and decoding of that 
format.) Have the compiler do the linking directly.


dmd already does this for generating library files directly, and it's been very 
successful (although sometimes I suspect nobody has noticed(!) which is actually 
a good thing). It took surprisingly little code to make that work, though doing 
a link step would be far more work.


Re: [WORK] std.file.update function

2016-09-18 Thread Walter Bright via Digitalmars-d

On 9/18/2016 8:20 AM, Andrei Alexandrescu wrote:

On 09/18/2016 11:17 AM, Andrei Alexandrescu wrote:

Simplest case is - source file is being changed, therefore a new object
file is being produced, therefore a new executable is being produced.


Forgot to mention a situation here: if you change the source code of a module
without influencing the object file (e.g. documentation, certain style changes,
unittests in non-unittest builds etc) there'd be no linking upon rebuilding. --



The compiler currently creates the complete object file in a buffer, then writes 
the buffer to a file in one command. The reason is mostly because the object 
file format isn't incremental, the beginning is written last and the body gets 
backpatched as the compilation progresses.


I can't really see a compilation producing an object file where the first half 
of it matches the previous object file and the second half is different, because 
of the file format.


Interestingly, the win32 .lib format is designed for incredibly slow floppy 
disks, in that updating the library need not read/write every disk sector.


I'd love to design our own high speed formats, but then they'd be incompatible 
with everybody else's.


Re: [WORK] std.file.update function

2016-09-18 Thread Chris Wright via Digitalmars-d
On Mon, 19 Sep 2016 04:24:41 +1200, rikki cattermole wrote:

> On 19/09/2016 3:41 AM, Andrei Alexandrescu wrote:
>> On 9/18/16 11:24 AM, rikki cattermole wrote:
>>> On 19/09/2016 3:20 AM, Andrei Alexandrescu wrote:
>>>> On 09/18/2016 11:17 AM, Andrei Alexandrescu wrote:
>>>>> Simplest case is - source file is being changed, therefore a new
>>>>> object file is being produced, therefore a new executable is being
>>>>> produced.
>>>>
>>>> Forgot to mention a situation here: if you change the source code of
>>>> a module without influencing the object file (e.g. documentation,
>>>> certain style changes, unittests in non-unittest builds etc) there'd
>>>> be no linking upon rebuilding. -- Andrei
>>>
>>> How does this compare against doing a checksum comparison on the file?
>>
>> Favorably :o). -- Andrei
> 
> Confirmed by doing the checksum myself.
> However I have not compared against an OS-provided checksum.

You have an operating system that automatically checksums every file?


Re: [WORK] std.file.update function

2016-09-18 Thread Brad Roberts via Digitalmars-d

On 9/18/2016 8:17 AM, Andrei Alexandrescu via Digitalmars-d wrote:

There is actually an even better way at the application level. Consider
a function in std.file:

update(S, Range)(S name, Range data);

updateFile does something interesting: it opens the file "name" for
reading AND writing, then reads data from the Range _and_ the file. For
as long as the data and the contents in the file agree, it just moves
reading along. At the first difference between the data and the file
contents, starts writing the data into the file through the end of the
range.

So this makes zero writes (and leaves the "last modified time" intact)
if the file has the same content as the data. Better yet, if it so
happens that the file and the data have the same prefix, there's less
writing going on, which IIRC is faster for most filesystems. Saving on
writes happens to be particularly nice on new solid-state drives.

Who wants to take this with testing, measurements etc? It's a cool mini
project.


Andrei


This is nice in the case of no changes, but problematic in the case of 
some changes.  The standard write new, rename technique never has either 
file in a half-right state.  The file is atomically either old or new 
and nothing in between.  This can be critical.


Re: [WORK] std.file.update function

2016-09-18 Thread rikki cattermole via Digitalmars-d

On 19/09/2016 3:41 AM, Andrei Alexandrescu wrote:

On 9/18/16 11:24 AM, rikki cattermole wrote:

On 19/09/2016 3:20 AM, Andrei Alexandrescu wrote:

On 09/18/2016 11:17 AM, Andrei Alexandrescu wrote:

Simplest case is - source file is being changed, therefore a new object
file is being produced, therefore a new executable is being produced.


Forgot to mention a situation here: if you change the source code of a
module without influencing the object file (e.g. documentation, certain
style changes, unittests in non-unittest builds etc) there'd be no
linking upon rebuilding. -- Andrei


How does this compare against doing a checksum comparison on the file?


Favorably :o). -- Andrei


Confirmed by doing the checksum myself.
However I have not compared against an OS-provided checksum.



Re: [WORK] std.file.update function

2016-09-18 Thread Andrei Alexandrescu via Digitalmars-d

On 9/18/16 12:15 PM, Chris Wright wrote:

This will produce different behavior with hard links. With hard links,
the temporary file mechanism you mention will result in the old file
being accessible via the other path. With your recommended strategy, the
data accessible from both paths is updated.

That's probably acceptable, and hard links aren't used that much anyway.


Awesome, this should be part of the docs.


Obviously, if you have to overwrite large portions of the file, it's
going to be faster to just write it. This is just for cases when you can
get speedups down the line by not updating write timestamps, or when you
know a large portion of the file is unchanged and the file is cached, or
you're using a disk that sucks at writing data.


That's exactly right, and such considerations should also go in the 
function documentation. Wanna go for it?



Andrei



Re: [WORK] std.file.update function

2016-09-18 Thread Chris Wright via Digitalmars-d
This will produce different behavior with hard links. With hard links, 
the temporary file mechanism you mention will result in the old file 
being accessible via the other path. With your recommended strategy, the 
data accessible from both paths is updated.

That's probably acceptable, and hard links aren't used that much anyway.
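The hard-link difference is easy to demonstrate. A Python sketch (POSIX-only; demo and the strategy names are invented for this illustration):

```python
import os
import tempfile

def demo(strategy):
    """Make a.txt, hard-link b.txt to it, run the update strategy on
    a.txt, and report what the OTHER name sees afterwards."""
    d = tempfile.mkdtemp()
    a = os.path.join(d, "a.txt")
    b = os.path.join(d, "b.txt")
    with open(a, "wb") as f:
        f.write(b"old")
    os.link(a, b)  # b is now a second name for a's inode
    strategy(a)
    with open(b, "rb") as f:
        return f.read()

def rename_strategy(a):
    # write-new-then-rename: a fresh inode takes over the name a.txt,
    # so the hard link keeps pointing at the old contents
    tmp = a + ".tmp"
    with open(tmp, "wb") as f:
        f.write(b"new")
    os.replace(tmp, a)

def inplace_strategy(a):
    # in-place update: the shared inode itself is rewritten,
    # so both names see the new contents
    with open(a, "r+b") as f:
        f.write(b"new")
```

demo(rename_strategy) comes back b"old" while demo(inplace_strategy) comes back b"new", which is exactly the behavioral difference described above.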

Obviously, if you have to overwrite large portions of the file, it's 
going to be faster to just write it. This is just for cases when you can 
get speedups down the line by not updating write timestamps, or when you 
know a large portion of the file is unchanged and the file is cached, or 
you're using a disk that sucks at writing data.


Re: [WORK] std.file.update function

2016-09-18 Thread Andrei Alexandrescu via Digitalmars-d

On 9/18/16 11:24 AM, rikki cattermole wrote:

On 19/09/2016 3:20 AM, Andrei Alexandrescu wrote:

On 09/18/2016 11:17 AM, Andrei Alexandrescu wrote:

Simplest case is - source file is being changed, therefore a new object
file is being produced, therefore a new executable is being produced.


Forgot to mention a situation here: if you change the source code of a
module without influencing the object file (e.g. documentation, certain
style changes, unittests in non-unittest builds etc) there'd be no
linking upon rebuilding. -- Andrei


How does this compare against doing a checksum comparison on the file?


Favorably :o). -- Andrei



Re: [WORK] std.file.update function

2016-09-18 Thread Stefan Koch via Digitalmars-d
On Sunday, 18 September 2016 at 15:17:31 UTC, Andrei Alexandrescu 
wrote:
There are quite a few situations in rdmd and dmd generally when 
we compute a dependency structure over sets of files. Based on 
that, we write new files that overwrite old, obsoleted files. 
Those changes in turn trigger other dependencies to go stale so 
more building is done etc.


If so, we need it in druntime.

Introducing Phobos into ddmd is still considered a no-no.

Personally I am pretty torn; without range-specific optimizations 
in dmd, ranges incur more overhead than they should.


Re: [WORK] std.file.update function

2016-09-18 Thread rikki cattermole via Digitalmars-d

On 19/09/2016 3:20 AM, Andrei Alexandrescu wrote:

On 09/18/2016 11:17 AM, Andrei Alexandrescu wrote:

Simplest case is - source file is being changed, therefore a new object
file is being produced, therefore a new executable is being produced.


Forgot to mention a situation here: if you change the source code of a
module without influencing the object file (e.g. documentation, certain
style changes, unittests in non-unittest builds etc) there'd be no
linking upon rebuilding. -- Andrei


How does this compare against doing a checksum comparison on the file?



Re: [WORK] std.file.update function

2016-09-18 Thread Andrei Alexandrescu via Digitalmars-d

On 09/18/2016 11:17 AM, Andrei Alexandrescu wrote:

Simplest case is - source file is being changed, therefore a new object
file is being produced, therefore a new executable is being produced.


Forgot to mention a situation here: if you change the source code of a 
module without influencing the object file (e.g. documentation, certain 
style changes, unittests in non-unittest builds etc) there'd be no 
linking upon rebuilding. -- Andrei




[WORK] std.file.update function

2016-09-18 Thread Andrei Alexandrescu via Digitalmars-d
There are quite a few situations in rdmd and dmd generally when we 
compute a dependency structure over sets of files. Based on that, we 
write new files that overwrite old, obsoleted files. Those changes in 
turn trigger other dependencies to go stale so more building is done etc.


Simplest case is - source file is being changed, therefore a new object 
file is being produced, therefore a new executable is being produced. 
And it only gets more involved.


We've discussed before using a simple method to avoid unnecessary stale 
dependencies when it's possible that a certain file won't, in fact, 
change contents:


1. Do all work on the side in a separate file e.g. file.ext.tmp

2. Compare the new file with the old file file.ext

3. If they're identical, delete file.ext.tmp; otherwise, rename 
file.ext.tmp into file.ext


There is actually an even better way at the application level. Consider 
a function in std.file:


update(S, Range)(S name, Range data);

updateFile does something interesting: it opens the file "name" for 
reading AND writing, then reads data from the Range _and_ the file. For 
as long as the data and the contents in the file agree, it just moves 
reading along. At the first difference between the data and the file 
contents, starts writing the data into the file through the end of the 
range.


So this makes zero writes (and leaves the "last modified time" intact) 
if the file has the same content as the data. Better yet, if it so 
happens that the file and the data have the same prefix, there's less 
writing going on, which IIRC is faster for most filesystems. Saving on 
writes happens to be particularly nice on new solid-state drives.


Who wants to take this with testing, measurements etc? It's a cool mini 
project.



Andrei