On Thu, Feb 26, 2015 at 3:17 PM, Ben Langmuir <[email protected]> wrote:
> On Feb 26, 2015, at 2:49 PM, Richard Smith <[email protected]> wrote:
>
> On Thu, Feb 26, 2015 at 8:30 AM, Ben Langmuir <[email protected]> wrote:
>
>> On Feb 25, 2015, at 6:20 PM, Richard Smith <[email protected]> wrote:
>>
>> On Tue, Feb 24, 2015 at 7:55 PM, Ben Langmuir <[email protected]> wrote:
>>
>>> On Feb 24, 2015, at 6:57 PM, Richard Smith <[email protected]> wrote:
>>>
>>> On Mon, Feb 9, 2015 at 12:35 PM, Ben Langmuir <[email protected]> wrote:
>>>
>>>> Author: benlangmuir
>>>> Date: Mon Feb  9 14:35:13 2015
>>>> New Revision: 228604
>>>>
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=228604&view=rev
>>>> Log:
>>>> Diagnose timeouts in the LockFileManager and delete the dead lock file
>>>>
>>>> If the lock file manager times out, we should give an error rather than
>>>> silently trying to load the existing module. And delete the (presumably)
>>>> dead lock file, since it will otherwise prevent progress in future
>>>> invocations. This is unsound, since we have no way to prove that the
>>>> lock file we are deleting is the same one we timed out on, but since the
>>>> lock only exists to avoid excessive rebuilding anyway, it should be okay.
>>>> Depends on llvm r228603.
>>>
>>> This diagnostic fires a *lot* when, say, bootstrapping Clang with
>>> modules enabled at -j16. Is the timeout perhaps too short?
>>>
>>> I lowered the timeout to 1 minute in my llvm commit,
>>
>> Well, technically, it went from 8m 44s to 65.5s due to the exponential
>> backoff... :-/
>>
>> Heh, good point.
>>
>>> so if it helps, feel free to bump it back up. I'm not particularly
>>> concerned about the actual number now that we will actually honour the
>>> limit, but the old value of 5 minutes seemed really long to me.
>>
>> Really? Consider a Debug+Asserts clang building a large module (such as
>> the Clang_AST module). Requiring that to succeed in 65.5 seconds doesn't
>> seem particularly reasonable (and, indeed, some such modules do not build
>> in that environment in 65.5s on my machine, at least when it's heavily
>> loaded).
>>
>> That sucks.
>>
>> But I'm not really sure that a timeout is the right approach here at all.
>> What we want to track is whether the creating process is still alive.
>> Perhaps we should have that process periodically bump the timestamp on
>> the lock file, and assume it has died if the timestamp is too far in the
>> past? This'd mean that in the (scary) case of sharing a module cache
>> across a build farm, you'd need your machines to not have too much clock
>> skew, but that's already necessary for build systems that use timestamps
>> to determine what to rebuild, so it doesn't seem like a ridiculous
>> restriction.
>>
>> Thoughts?
>>
>> I think what you're proposing is a better way to track whether the
>> creator is alive than what we currently do (looking at the hostname+pid).
>> We could eliminate the clock-sync dependence if the waiting process set
>> up its own local timer and reset it whenever it sees the timestamp of the
>> lock file change (i.e., we don't care what the value is, only whether it
>> changes). But your suggestion is simpler and automatically handles a new
>> process waiting on a long-dead file in a good way.
>>
>> At some level I like that the timeout gives you feedback when a
>> compilation takes much longer than expected. How about this: in addition
>> to tracking process liveness, we also use a timeout. When the timeout is
>> reached, we emit a warning that the module build is taking a long time.
>> We then forcibly delete the lock file and try again to acquire the lock
>> ourselves. This has the added advantage of combating priority inversion.
>> If we reach the timeout again (maybe several times?), we emit an error
>> and fail. What do you think?
>
> For the most part, this sounds great to me. I'm not entirely convinced
> that it makes sense to remove the lock and retry if we can see that
> another process is periodically touching it.
>
> Okay, but right now any time we delete a lock file we may accidentally
> delete a valid lock because of races. I haven't found a good way to avoid
> the window between checking that a lock file is invalid and deleting that
> file (which is done by name).

So long as we're conservatively correct in the case of a race, and the
probability is low enough, I don't think this is a huge problem (IIUC, we
should get a signature mismatch in any downstream modules if this goes
wrong, and end up triggering a bunch of additional rebuilds, but we should
converge to a stable set of module files in the end). The proposed
direction reduces the probability of a race (we only delete the file and
retry if the producing end appears not to be updating it), so it seems
like an improvement.

I think this is just part of the cost we pay by providing an implicit
module building system and a transparent upgrade path; we do also provide
the ability to explicitly build modules, which avoids all these issues.
(That said, we don't yet have a non-cc1 way of building a module
explicitly, and we really should add one.)
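To make the liveness idea concrete, the waiting side, with your local-timer
refinement folded in, could look roughly like the following. This is just a
sketch written for this mail, not the real LockFileManager code; the
polling interval, the liveness window, and the helper names are all
invented:

// Waiter-side sketch: treat the lock owner as dead only if the lock
// file's mtime stops changing for a full liveness window.  We compare
// the observed mtime only against its previous value, never against our
// own clock, so machines sharing a module cache don't need synchronized
// clocks, only clocks that advance.
#include <ctime>
#include <sys/stat.h>
#include <unistd.h>

enum class WaitResult { OwnerFinished, OwnerDied };

WaitResult waitForOwner(const char *LockPath) {
  const time_t LivenessWindowSeconds = 30; // owner touches every ~5s, say
  struct stat ST;
  if (stat(LockPath, &ST) != 0)
    return WaitResult::OwnerFinished;      // lock already gone

  time_t LastSeenMTime = ST.st_mtime;
  time_t LocalDeadline = time(nullptr) + LivenessWindowSeconds;

  while (true) {
    sleep(1);                              // real code would back off
    if (stat(LockPath, &ST) != 0)
      return WaitResult::OwnerFinished;    // owner finished and unlocked

    if (ST.st_mtime != LastSeenMTime) {
      // The owner touched the file; reset our local deadline.
      LastSeenMTime = ST.st_mtime;
      LocalDeadline = time(nullptr) + LivenessWindowSeconds;
    } else if (time(nullptr) > LocalDeadline) {
      return WaitResult::OwnerDied;        // no touch for a full window
    }
  }
}

On OwnerDied we'd then do the delete-and-retry dance (with whatever retry
cap we settle on), which is where the race you mention lives.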
> I think issuing a remark after some timeout (1 minute, maybe?), and
> possibly failing after a longer timeout, may be reasonable.
>
> I was thinking a warning, but I guess -Werror makes that a bad idea.
> Failing instead of clearing the lock file is okay with me.
>
> We'd need a mechanism to turn off the second timeout (for cases where
> modules simply take a long time to build for whatever reason) and
> possibly to turn off the first timeout (to ensure we provide
> deterministic output).
>
> -Rmodule-timeout-soft
> -Wmodule-timeout (DefaultError)
>
> We could theoretically make a -W group that included both for
> convenience, although that might be weird with a remark...

We want to bail out of compilation when we hit the timeout, so a -W flag
doesn't seem like the right model to me (it would need to control more
than merely whether we emit the diagnostic). How about
-fimplicit-modules-build-timeout=<time in s>, following the usual
convention that 0 means "no limit"?

> Another question: where do you think we should touch the file? In a
> consumer of top-level decls?

An ASTConsumer could be a reasonable place. That should also let us watch
for template instantiations performed at end of TU, which are sometimes a
sizeable portion of the parsing time.
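For example, something along these lines. Again just a sketch: the rate
limit, the touch mechanism, and the way the consumer learns the lock file
path are all invented for illustration:

// Sketch of an ASTConsumer that bumps the lock file's timestamp as
// parsing makes progress.
#include "clang/AST/ASTConsumer.h"
#include "clang/AST/DeclGroup.h"
#include <ctime>
#include <string>
#include <utime.h>

class LockFileToucher : public clang::ASTConsumer {
  std::string LockPath; // invented: however we plumb the path through
  time_t LastTouch = 0;

public:
  explicit LockFileToucher(std::string Path) : LockPath(std::move(Path)) {}

  bool HandleTopLevelDecl(clang::DeclGroupRef DG) override {
    // Rate-limit the touches; one syscall per top-level decl would be
    // excessive for headers with many small decls.
    time_t Now = time(nullptr);
    if (Now - LastTouch >= 5) {
      utime(LockPath.c_str(), nullptr); // set mtime to "now"
      LastTouch = Now;
    }
    return true; // keep parsing
  }
};

The same touch could be made from whatever callbacks fire during
end-of-TU template instantiation, so the liveness signal doesn't go quiet
during that phase.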
>>>> Modified:
>>>>     cfe/trunk/include/clang/Basic/DiagnosticCommonKinds.td
>>>>     cfe/trunk/lib/Frontend/CompilerInstance.cpp
>>>>
>>>> Modified: cfe/trunk/include/clang/Basic/DiagnosticCommonKinds.td
>>>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/DiagnosticCommonKinds.td?rev=228604&r1=228603&r2=228604&view=diff
>>>> ==============================================================================
>>>> --- cfe/trunk/include/clang/Basic/DiagnosticCommonKinds.td (original)
>>>> +++ cfe/trunk/include/clang/Basic/DiagnosticCommonKinds.td Mon Feb  9 14:35:13 2015
>>>> @@ -83,6 +83,8 @@ def err_module_not_found : Error<"module
>>>>  def err_module_not_built : Error<"could not build module '%0'">, DefaultFatal;
>>>>  def err_module_lock_failure : Error<
>>>>    "could not acquire lock file for module '%0'">, DefaultFatal;
>>>> +def err_module_lock_timeout : Error<
>>>> +  "timed out waiting to acquire lock file for module '%0'">, DefaultFatal;
>>>>  def err_module_cycle : Error<"cyclic dependency in module '%0': %1">, DefaultFatal;
>>>>  def note_pragma_entered_here : Note<"#pragma entered here">;
>>>>
>>>> Modified: cfe/trunk/lib/Frontend/CompilerInstance.cpp
>>>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Frontend/CompilerInstance.cpp?rev=228604&r1=228603&r2=228604&view=diff
>>>> ==============================================================================
>>>> --- cfe/trunk/lib/Frontend/CompilerInstance.cpp (original)
>>>> +++ cfe/trunk/lib/Frontend/CompilerInstance.cpp Mon Feb  9 14:35:13 2015
>>>> @@ -1022,9 +1022,19 @@ static bool compileAndLoadModule(Compile
>>>>    case llvm::LockFileManager::LFS_Shared:
>>>>      // Someone else is responsible for building the module. Wait for them
>>>>      // to finish.
>>>> -    if (Locked.waitForUnlock() == llvm::LockFileManager::Res_OwnerDied)
>>>> +    switch (Locked.waitForUnlock()) {
>>>> +    case llvm::LockFileManager::Res_Success:
>>>> +      ModuleLoadCapabilities |= ASTReader::ARR_OutOfDate;
>>>> +      break;
>>>> +    case llvm::LockFileManager::Res_OwnerDied:
>>>>        continue; // try again to get the lock.
>>>> -    ModuleLoadCapabilities |= ASTReader::ARR_OutOfDate;
>>>> +    case llvm::LockFileManager::Res_Timeout:
>>>> +      Diags.Report(ModuleNameLoc, diag::err_module_lock_timeout)
>>>> +          << Module->Name;
>>>> +      // Clear the lock file so that future invocations can make progress.
>>>> +      Locked.unsafeRemoveLockFile();
>>>> +      return false;
>>>> +    }
>>>>      break;
>>>>    }
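For reference, under the warn-then-retry scheme discussed above, the
Res_Timeout case in this hunk might end up shaped something like the
following. A sketch only: the warning diagnostic and the
RetriedAfterTimeout flag don't exist, I'm inventing them to show the
control flow:

case llvm::LockFileManager::Res_Timeout:
  if (!RetriedAfterTimeout) {
    // First timeout: warn, clear the (apparently dead) lock, and go
    // around the loop to try to acquire the lock ourselves.
    Diags.Report(ModuleNameLoc, diag::warn_module_lock_timeout_retry)
        << Module->Name;
    Locked.unsafeRemoveLockFile();
    RetriedAfterTimeout = true;
    continue;
  }
  // Second timeout: give up, as proposed.
  Diags.Report(ModuleNameLoc, diag::err_module_lock_timeout)
      << Module->Name;
  return false;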
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
