Re: [ccache] Why not cache link commands?
On 19/09/12 13:18, Eitan Adler wrote:
>> Under what circumstances can the binary change but the build-id remain
>> the same? I'm aware of line number, and file path differences in the
>> debug info. Is there anything else?
>
> differing -frandom-seed options perhaps?

If you've changed the command line then you've already got a cache miss,
but yes, that sounds plausible.

Andrew
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache
Re: [ccache] Why not cache link commands?
On 19 September 2012 05:43, Andrew Stubbs wrote:
> On 18/09/12 22:59, Mike Frysinger wrote:
>>
>> the linker's --build-id and associated .note.gnu.build-id section. you
>> can't hash the entire object because it can change between compiles.
>> build-id lets you say "regardless of the hash of the entire object, we
>> know the content that matters is unchanged".
>
> Ah, excellent, this is the sort of detail I was looking for!
>
> My own brief experimentation shows that static libraries contain
> troublesome datestamps, but object files appear to be reproducible,
> given the same source and command line (the case ccache handles).
>
> Under what circumstances can the binary change but the build-id remain
> the same? I'm aware of line number, and file path differences in the
> debug info. Is there anything else?

differing -frandom-seed options perhaps?

--
Eitan Adler
Re: [ccache] Why not cache link commands?
On 18/09/12 22:59, Mike Frysinger wrote:
> the linker's --build-id and associated .note.gnu.build-id section. you
> can't hash the entire object because it can change between compiles.
> build-id lets you say "regardless of the hash of the entire object, we
> know the content that matters is unchanged".

Ah, excellent, this is the sort of detail I was looking for!

My own brief experimentation shows that static libraries contain
troublesome datestamps, but object files appear to be reproducible,
given the same source and command line (the case ccache handles).

Under what circumstances can the binary change but the build-id remain
the same? I'm aware of line number, and file path differences in the
debug info. Is there anything else?

Anyway, as I understand it, ccache could dump the build-id section
first, if there is one, and hash the entire binary second, if there
isn't one.

I'm a bit concerned about the build-id though. As I read it, the
build-id can't tell the difference between a stripped binary and one
with full debug, and the two certainly produce different output (OK, a
*very* smart tool could determine that, with a certain link command or
script, two different inputs are equivalent, but let's not go there). It
can't even tell the difference between an object with *only* debug.

Hashing the entire binary could lead to additional cache misses in the
case that the user has made minor, unimportant changes to the build, but
in the normal case the object file will have come from the cache anyway
so this won't be a problem.

The library datestamps problem can be got around by hashing the output
of "ar p libNAME.a" (perhaps combined with "ar t libNAME.a", just to be
safe, but certainly not with "-v"), or perhaps
"objdump -j .note.gnu.build-id -s libNAME.a" if we want to use
build-ids.

>> "-###" isn't meant to be a wildcard. That's an actual GCC option. I put
>> quotes around it because most shells would interpret the hashes as the
>> start of a comment.
>
> hmm, gotcha. it does seem to include all the necessary info. whether
> it's easy for a machine to parse across gcc versions is a diff
> question :). seems to have changed subtly over time between 3.3.6 and
> 4.7.1.

Probably true, but it ought to be possible to determine if we do
understand it, or not, and fall back to the old behaviour if not.

Andrew
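[Editorial sketch] The "ar p" / "ar t" idea above amounts to hashing the member names and member bodies of a static library while ignoring the per-member header fields that carry the datestamps. The following is a minimal pure-Python illustration of that idea, not anything ccache implements; the function name hash_archive_contents is hypothetical, and special members (symbol table, long-name table) are simply hashed as raw bytes.

```python
import hashlib
import io

AR_MAGIC = b"!<arch>\n"   # global header of the common ar format
HEADER_SIZE = 60          # fixed size of each member header


def hash_archive_contents(data: bytes) -> str:
    """Hash member names and bodies of an ar archive, skipping timestamps.

    Hypothetical helper: roughly what hashing "ar t" plus "ar p" output
    would give, computed directly from the archive bytes.
    """
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    digest = hashlib.sha256()
    stream = io.BytesIO(data)
    stream.seek(len(AR_MAGIC))
    while True:
        header = stream.read(HEADER_SIZE)
        if not header:
            break
        if len(header) != HEADER_SIZE or header[58:60] != b"`\n":
            raise ValueError("corrupt member header")
        name = header[0:16].rstrip()
        size = int(header[48:58].decode("ascii").strip())
        body = stream.read(size)
        if len(body) != size:
            raise ValueError("truncated member")
        # Only name, size and body are hashed; header bytes 16..48
        # (mtime, uid, gid, mode) are deliberately ignored.
        digest.update(name + b"\0")
        digest.update(str(size).encode("ascii") + b"\0")
        digest.update(body)
        if size % 2:
            stream.read(1)  # member data is padded to an even length
    return digest.hexdigest()
```

Two archives built from the same objects at different times should then produce the same digest, while any change to a member body changes it.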
Re: [ccache] Why not cache link commands?
On Tuesday 18 September 2012 17:07:53 Andrew Stubbs wrote:
> On 18/09/12 21:04, Mike Frysinger wrote:
> > On Tuesday 18 September 2012 08:44:29 Andrew Stubbs wrote:
> >> Clearly there are some technical challenges in doing this: we'd have to
> >> hash all the object files and libraries (a la direct mode), but those
> >> problems are surmountable, I think.
> >
> > or just re-use build-id ...
>
> Sorry, I'm probably being thick, but what do you mean?

the linker's --build-id and associated .note.gnu.build-id section. you
can't hash the entire object because it can change between compiles.
build-id lets you say "regardless of the hash of the entire object, we
know the content that matters is unchanged".

> >> The linker does not use any libraries not listed with "gcc '-###'
> >> whatever".
> >
> > mmm different gcc flags can implicitly expand into -l### or different crt
> > objects, so you can't cache linking at the compiler driver level w/out
> > re-implementing much of the guts of gcc, and even then you'd break with
> > moderately patched gcc versions.
>
> "-###" isn't meant to be a wildcard. That's an actual GCC option. I put
> quotes around it because most shells would interpret the hashes as the
> start of a comment.

hmm, gotcha. it does seem to include all the necessary info. whether
it's easy for a machine to parse across gcc versions is a diff
question :). seems to have changed subtly over time between 3.3.6 and
4.7.1.

> >> I'm also aware that it's not that interesting for many incremental
> >> builds, where the final link will always be different, but my use case
> >> is accelerating rebuilds of projects that may have many outputs, most of
> >> which are likely to be unaffected by small code changes. It's also worth
> >> noting that incremental builds are not the target use case for ccache in
> >> general.
> >
> > gold should already support incremental linking (ala build-id), so i
> > don't think that's already a fixed problem

err, typo here. s/don't//.

> As I said, the interesting use case is *not* incremental links. The
> interesting use case is accelerating "clean" builds. ccache can never
> help where genuinely new inputs are involved.

right, i was just agreeing with you and providing more details as to how
it already works today.

-mike
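[Editorial sketch] To make the build-id suggestion concrete: the .note.gnu.build-id section is an ELF note whose descriptor bytes are the ID. The Python below is only an illustration under stated assumptions. It takes the raw section bytes (however they were extracted), assumes little-endian encoding by default, and falls back to hashing the whole file when no build-id is found. The names parse_build_id and input_identity are hypothetical.

```python
import hashlib
import struct

NT_GNU_BUILD_ID = 3  # note type used for GNU build-ids


def parse_build_id(note: bytes, little_endian: bool = True):
    """Return the build-id as a hex string from raw note bytes, or None."""
    fmt = "<III" if little_endian else ">III"
    offset = 0
    while offset + 12 <= len(note):
        namesz, descsz, ntype = struct.unpack_from(fmt, note, offset)
        offset += 12
        name = note[offset:offset + namesz]
        offset += (namesz + 3) & ~3      # name is padded to 4 bytes
        desc = note[offset:offset + descsz]
        offset += (descsz + 3) & ~3      # descriptor is padded to 4 bytes
        if (ntype == NT_GNU_BUILD_ID and name == b"GNU\0"
                and len(desc) == descsz):
            return desc.hex()
    return None


def input_identity(note, whole_file: bytes) -> str:
    """Prefer the build-id; otherwise hash the entire file."""
    build_id = parse_build_id(note) if note else None
    if build_id is not None:
        return "build-id:" + build_id
    return "sha256:" + hashlib.sha256(whole_file).hexdigest()
```

The prefix on the returned string keeps the two kinds of identity from ever colliding in a cache key.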
Re: [ccache] Why not cache link commands?
On 18/09/12 21:04, Mike Frysinger wrote:
> On Tuesday 18 September 2012 08:44:29 Andrew Stubbs wrote:
>> Clearly there are some technical challenges in doing this: we'd have to
>> hash all the object files and libraries (a la direct mode), but those
>> problems are surmountable, I think.
>
> or just re-use build-id ...

Sorry, I'm probably being thick, but what do you mean?

>> The linker does not use any libraries not listed with "gcc '-###'
>> whatever".
>
> mmm different gcc flags can implicitly expand into -l### or different crt
> objects, so you can't cache linking at the compiler driver level w/out
> re-implementing much of the guts of gcc, and even then you'd break with
> moderately patched gcc versions.

"-###" isn't meant to be a wildcard. That's an actual GCC option. I put
quotes around it because most shells would interpret the hashes as the
start of a comment.

"-###" causes gcc to print the commands that it would run, including the
link line (well, collect2, but same difference). We can read that and
bypass reimplementing all of gcc.

As you say, without this feature we couldn't predict what gcc will do:
the compiler wouldn't even need to be patched if custom specs files were
used.

>> I'm also aware that it's not that interesting for many incremental
>> builds, where the final link will always be different, but my use case
>> is accelerating rebuilds of projects that may have many outputs, most of
>> which are likely to be unaffected by small code changes. It's also worth
>> noting that incremental builds are not the target use case for ccache in
>> general.
>
> gold should already support incremental linking (ala build-id), so i
> don't think that's already a fixed problem

As I said, the interesting use case is *not* incremental links. The
interesting use case is accelerating "clean" builds. ccache can never
help where genuinely new inputs are involved.

Andrew
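[Editorial sketch] Reading the "-###" output could look roughly like the Python below. It is a sketch under assumptions, not a description of ccache: the helper names are hypothetical, the driver is assumed to print its commands on stderr with shell-style quoting, and the linker step is assumed to be recognisable by a program name of collect2 or ld. When nothing recognisable is found it returns None, so a caller can fall back to running the real command uncached.

```python
import shlex
import subprocess


def driver_commands(argv):
    """Run the compiler driver with -### and return what it prints on stderr.

    argv is the original command, e.g. ["gcc", "main.o", "-o", "prog"].
    With -### the driver prints the commands instead of executing them.
    """
    result = subprocess.run(
        [argv[0], "-###", *argv[1:]],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
        check=False,
    )
    return result.stderr


def find_link_command(driver_output, linker_names=("collect2", "ld")):
    """Return the link command as a list of words, or None if not understood."""
    for line in driver_output.splitlines():
        try:
            words = shlex.split(line)
        except ValueError:
            continue  # not a shell-quoted command line; ignore it
        if words and words[0].rsplit("/", 1)[-1] in linker_names:
            return words
    return None
```

The returned word list names every object, library and crt file the linker would be given, which is the set of inputs a link cache would have to hash.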
Re: [ccache] Why not cache link commands?
On Tuesday 18 September 2012 08:44:29 Andrew Stubbs wrote:
> Clearly there are some technical challenges in doing this: we'd have to
> hash all the object files and libraries (a la direct mode), but those
> problems are surmountable, I think.

or just re-use build-id ...

> The linker does not use any libraries not listed with "gcc '-###' whatever".

mmm different gcc flags can implicitly expand into -l### or different crt
objects, so you can't cache linking at the compiler driver level w/out
re-implementing much of the guts of gcc, and even then you'd break with
moderately patched gcc versions.

> I'm also aware that it's not that interesting for many incremental
> builds, where the final link will always be different, but my use case
> is accelerating rebuilds of projects that may have many outputs, most of
> which are likely to be unaffected by small code changes. It's also worth
> noting that incremental builds are not the target use case for ccache in
> general.

gold should already support incremental linking (ala build-id), so i
don't think that's already a fixed problem

-mike
Re: [ccache] Why not cache link commands?
On 18/09/12 16:37, Justin Lebar wrote:
> ldcache would hash object files and spit out linked files. It would use
> an entirely separate cache. Its handling of command-line options would
> be entirely different. Its processing of input files would be entirely
> different. ISTM that very little would be shared.

It takes multiple input files and returns a single output file, plus
stderr. This much is the same.

An input object file is just as hashable as an input header file, you
just find them a different way. I think the manifest file would need
little or no modification.

Similarly, the output file is just as cacheable. There's probably no
need to even use a different suffix in the cache.

I've yet to get into the precise details, but I think the file discovery
mechanism would need to be abstracted out a little, but that's the
biggest change.

The command line parsing would need a once over, of course. The biggest
change there is that it's more normal to list multiple input files on
the command line, and there's no "language" to determine.

> Since this is targeting a niche use-case and is a large change to
> ccache, I'd be hesitant to take this change upstream, if I were Joel.

Right, as little churn as possible, and no extra overhead in the most
common cases.

Andrew
Re: [ccache] Why not cache link commands?
> What I'm looking for is more concrete
> roadblocks I haven't considered.

You'd basically have to rewrite all of ccache.

ccache hashes header files and spits out object files. ldcache would
hash object files and spit out linked files. It would use an entirely
separate cache. Its handling of command-line options would be entirely
different. Its processing of input files would be entirely different.
ISTM that very little would be shared.

Since this is targeting a niche use-case and is a large change to
ccache, I'd be hesitant to take this change upstream, if I were Joel.

-Justin

On Tue, Sep 18, 2012 at 11:27 AM, Andrew Stubbs wrote:
> On 18/09/12 15:31, Justin Lebar wrote:
>>>
>>> So, again, before I waste my time implementing this feature, are there
>>> any other fundamental gotchas that would prevent it ever working or
>>> ever being useful?
>>
>> On a large project with many inputs to ld, you'd have to hash a /lot/
>> of object files, increasing the overhead of ccache substantially. I
>> understand that this isn't your particular use-case, but it's the
>> common one.
>
> Yes, that's true, but those are also the most expensive link commands, so
> maybe it's not so bad.
>
> I realise that there's some risk that a cache miss can be expensive, and
> that a cache hit might be only a very little cheaper than the real link,
> but I'm prepared to take that risk. What I'm looking for is more concrete
> roadblocks I haven't considered.
>
> Incidentally, I'm also considering the possibility of caching the hashes
> and using the inode/size/mtime etc. to short-cut that process (perhaps as
> a "sloppiness" option), not only for objects, but also for sources.
>
>> If you're on Linux, have you tried the gold linker?
>
> Let's limit this discussion to what can be done with ccache, please. I
> assure you, we know about the toolchain options.
>
> Andrew
Re: [ccache] Why not cache link commands?
On 18/09/12 15:31, Justin Lebar wrote:
>> So, again, before I waste my time implementing this feature, are there
>> any other fundamental gotchas that would prevent it ever working or
>> ever being useful?
>
> On a large project with many inputs to ld, you'd have to hash a /lot/
> of object files, increasing the overhead of ccache substantially. I
> understand that this isn't your particular use-case, but it's the
> common one.

Yes, that's true, but those are also the most expensive link commands,
so maybe it's not so bad.

I realise that there's some risk that a cache miss can be expensive, and
that a cache hit might be only a very little cheaper than the real link,
but I'm prepared to take that risk. What I'm looking for is more
concrete roadblocks I haven't considered.

Incidentally, I'm also considering the possibility of caching the hashes
and using the inode/size/mtime etc. to short-cut that process (perhaps
as a "sloppiness" option), not only for objects, but also for sources.

> If you're on Linux, have you tried the gold linker?

Let's limit this discussion to what can be done with ccache, please. I
assure you, we know about the toolchain options.

Andrew
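[Editorial sketch] The inode/size/mtime short-cut mentioned above could be pictured as a small in-memory table that only re-reads a file when its stat data changes. This Python is purely illustrative; the class name StatHashCache and its rehashed counter are hypothetical, and a real implementation would need to persist the table and decide how far to trust mtime, which is why the message suggests making it a "sloppiness" option.

```python
import hashlib
import os


class StatHashCache:
    """Remember content hashes keyed by (inode, size, mtime)."""

    def __init__(self):
        self._entries = {}   # path -> (stat key, hex digest)
        self.rehashed = 0    # how many times a file was actually read

    @staticmethod
    def _stat_key(path):
        st = os.stat(path)
        return (st.st_ino, st.st_size, st.st_mtime_ns)

    def digest(self, path):
        key = self._stat_key(path)
        cached = self._entries.get(path)
        if cached is not None and cached[0] == key:
            return cached[1]  # stat data unchanged: trust the stored hash
        with open(path, "rb") as handle:
            value = hashlib.sha256(handle.read()).hexdigest()
        self.rehashed += 1
        self._entries[path] = (key, value)
        return value
```

The trade-off is the usual one: a file rewritten with identical size and timestamp would be missed, in exchange for skipping the read of every unchanged object file.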
Re: [ccache] Why not cache link commands?
> So, again, before I waste my time implementing this feature, are there any
> other fundamental gotchas that would prevent it ever working or ever being
> useful?

On a large project with many inputs to ld, you'd have to hash a /lot/
of object files, increasing the overhead of ccache substantially. I
understand that this isn't your particular use-case, but it's the
common one.

If you're on Linux, have you tried the gold linker?

-Justin

On Tue, Sep 18, 2012 at 8:44 AM, Andrew Stubbs wrote:
> Hi all, again,
>
> I've just posted about improving compile speed by caching compiler failures,
> and in the same vein I'd like to consider caching called-for-link compile
> tasks.
>
> This is partly interesting for the many small autoconf tests, but is also
> increasingly interesting for real compilations, now that
> whole-program-optimization and link-time-optimization are more available in
> GCC. Even without all this link-time compilation activity, there are some
> link operations that simply take forever, mostly due to large file sizes.
>
> Clearly there are some technical challenges in doing this: we'd have to hash
> all the object files and libraries (a la direct mode), but those problems
> are surmountable, I think. The linker does not use any libraries not listed
> with "gcc '-###' whatever".
>
> I'm also aware that it's not that interesting for many incremental builds,
> where the final link will always be different, but my use case is
> accelerating rebuilds of projects that may have many outputs, most of which
> are likely to be unaffected by small code changes. It's also worth noting
> that incremental builds are not the target use case for ccache in general.
>
> So, again, before I waste my time implementing this feature, are there any
> other fundamental gotchas that would prevent it ever working or ever being
> useful?
>
> Has anybody else ever tried to do this? Is anybody trying to do it now?
>
> Thanks
>
> Andrew
[ccache] Why not cache link commands?
Hi all, again,

I've just posted about improving compile speed by caching compiler
failures, and in the same vein I'd like to consider caching
called-for-link compile tasks.

This is partly interesting for the many small autoconf tests, but is
also increasingly interesting for real compilations, now that
whole-program-optimization and link-time-optimization are more available
in GCC. Even without all this link-time compilation activity, there are
some link operations that simply take forever, mostly due to large file
sizes.

Clearly there are some technical challenges in doing this: we'd have to
hash all the object files and libraries (a la direct mode), but those
problems are surmountable, I think. The linker does not use any
libraries not listed with "gcc '-###' whatever".

I'm also aware that it's not that interesting for many incremental
builds, where the final link will always be different, but my use case
is accelerating rebuilds of projects that may have many outputs, most of
which are likely to be unaffected by small code changes. It's also worth
noting that incremental builds are not the target use case for ccache in
general.

So, again, before I waste my time implementing this feature, are there
any other fundamental gotchas that would prevent it ever working or ever
being useful?

Has anybody else ever tried to do this? Is anybody trying to do it now?

Thanks

Andrew
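[Editorial sketch] The proposal boils down to deriving one cache key from the link command plus the contents of every object file and library it reads. A rough Python picture of that key follows. It is an assumption-laden illustration rather than ccache's actual hashing scheme, and the name link_cache_key is hypothetical. Input order is kept significant because the order of objects and libraries on a link line can affect the result.

```python
import hashlib


def link_cache_key(link_argv, input_paths):
    """Combine the link command and the content of each input into one key."""
    digest = hashlib.sha256()
    for arg in link_argv:
        digest.update(arg.encode("utf-8") + b"\0")
    digest.update(b"\0inputs\0")  # separator between arguments and inputs
    for path in input_paths:
        with open(path, "rb") as handle:
            content = hashlib.sha256(handle.read()).digest()
        # Hash the path as well as the content, so that swapping two
        # inputs produces a different key.
        digest.update(path.encode("utf-8") + b"\0" + content)
    return digest.hexdigest()
```

A cache lookup would then map this key to the stored output file and captured stderr, with any unchanged set of inputs giving a hit.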