On Fri, 2015-03-27 at 14:42 +0100, Glen Stark wrote:
> Is this planned? Has the idea already been rejected, and if so could
> you point me to the discussion so I can inform myself?

There is no formal planning around it right now, and it's not at the
top of my TODO list for GNU make.

> If it is planned, or you agree it's worth doing, how can I help? I'm
> willing to write the code if someone is willing to help me work into the
> code a little. Until now I'm only a user, not maintainer of Make, and
> would need some tips about how to fit the functionality into the overall
> design of Make. Someone to bounce ideas off, and direct questions to
> would be wonderful. If someone else is working on it already, I'd like
> to help however I can -- testing, debugging, etc.

I'm not aware of anyone working on it.

It sounds like a simple thing, but actually there are a lot of issues
that need to be considered before any implementation can be started.

The important thing to remember is that currently make is completely
stateless... or rather, it uses the filesystem to maintain its state (in
the form of modification times). Any change to a method of determining
"out-of-date-ness" such as a hash of the file content means introducing
a separate state that make has to maintain: this adds a lot of
complexity and corner cases to work through.

Before anyone can consider writing code of this magnitude, they should
familiarize themselves with the FSF's requirements for contributing to
the GNU project; you'll need to assign copyright to the FSF for the work
contributed to GNU make, which involves some legal paperwork on your
part. If your employer has rights to your work (which most do, at least
in the U.S., even if you don't do the work on the job), your employer
will have to agree as well.

On the technical side, there are various things to consider:

  * What form will the extra state be kept in? One file per directory?
    One file per target? Something else?

      * If we use one file per target things are simpler, although that
        adds up to a LOT of files in bigger builds and some platforms
        might have problems.
      * If we use one file per directory, there are lots of issues:

          * When is the file written? Every time a target is updated?
            Once at the end of the build?
          * How will make handle the state file if it's killed in the
            middle of a build?
          * How will make handle missing/corrupted state files? Will it
            fall back on modification times, or just rebuild
            everything?
          * How do we handle recursion, where multiple instances of
            make could be running in the same directory?

      * We need to consider platform-specific issues; for example on
        UNIX systems a cheap/fast method of keeping per-file metadata
        might be to make a symbolic link containing the data, but that
        won't work on Windows or VMS, etc.

  * What type of extra state will we use? My suspicion is that md5sum
    is not the best. We don't really need it: we want fingerprinting
    not a cryptographic hash. We don't even need to do de-dup so we
    won't run into the birthday paradox: we only want to know if the
    file has changed since the last time we saw it. Probably a
    straightforward, well-distributed hash like xxhash would be
    sufficient. If you combine both mod time AND the hash that's pretty
    definitive; you can probably get away with a 32-bit hash.

  * What are the performance implications? You're committing to having
    make read the entire content of every single file involved in the
    build into memory, just to decide what to update! That's definitely
    going to hurt: a simple "nothing to do" build will suffer a big
    performance penalty. In fact, in a way the fewer jobs make needs to
    run the slower it will be, since it will have to check the hash of
    every target where the mod time doesn't give an answer. Maybe the
    hashing could be done per-block instead of on the entire file so
    you could fail faster, or something. But now you're storing more
    state per target (multiple hashes per target).

  * Do we really need to hash the file? Maybe simply expanding the
    current checking is sufficient.
    For example, if in addition to mod time we also considered the size
    of the file (and maybe other things maintained by the filesystem
    like inode, for tools which don't just overwrite the same file) we
    could increase our accuracy WITHOUT resorting to a separate state
    file. Is that good enough?

  * What if people want to define their own "out-of-date-ness" test?
    Maybe someone wants to integrate with inotify, or they want to
    check the preprocessor output so that files are not considered
    changed just because a comment changes, or something.

_______________________________________________
Bug-make mailing list
Bug-make@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-make
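
To make the trade-offs above a little more concrete, here is a minimal
Python sketch of one possible "has this file changed since we last saw
it" check. It is not GNU make code and not a proposed design. Everything
in it is an assumption made for illustration: the state file name
(.make-state.json), the JSON format, the one-file-per-directory layout,
and the function names are all hypothetical. zlib.crc32 stands in for a
fast 32-bit non-cryptographic hash such as xxhash only because it ships
with Python. The sketch trusts mod time plus size when both match, reads
the file (in chunks, not all at once) only when they don't, treats a
missing or corrupted state file as "no state", and writes the state file
via a temporary file and a rename so a killed build should not leave a
half-written file behind.

```python
import json
import os
import tempfile
import zlib

STATE_FILE = ".make-state.json"  # hypothetical name, one file per directory


def fingerprint(path, chunk_size=65536):
    """Return a 32-bit CRC of the file, read in chunks to bound memory use."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc


def load_state(directory):
    """Load the state file; a missing or corrupted file yields empty state."""
    try:
        with open(os.path.join(directory, STATE_FILE)) as f:
            state = json.load(f)
        return state if isinstance(state, dict) else {}
    except (OSError, ValueError):
        return {}


def save_state(directory, state):
    """Write to a temporary file, then rename it over the old state file."""
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=STATE_FILE)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, os.path.join(directory, STATE_FILE))


def has_changed(path, state):
    """Return True if path differs from what was recorded; update state."""
    st = os.stat(path)
    key = os.path.basename(path)
    old = state.get(key)
    if not isinstance(old, dict):
        old = {}  # never seen before, or the entry is unusable
    if old.get("mtime_ns") == st.st_mtime_ns and old.get("size") == st.st_size:
        return False  # cheap filesystem check passed; the file is not read
    crc = fingerprint(path)
    changed = old.get("size") != st.st_size or old.get("crc") != crc
    state[key] = {"mtime_ns": st.st_mtime_ns, "size": st.st_size, "crc": crc}
    return changed
```

A caller would do something like state = load_state("."), call
has_changed() for each file of interest, and then save_state(".", state)
once at the end. Note what the sketch leaves open: it does nothing about
several make instances updating the same state file at once, and saving
only at the end means a killed build loses whatever it learned during
that run.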