Last night's build failure resulted in the following analysis:

--On Monday, February 13, 2006 6:00 AM -1000 Hackystat Administrator <[EMAIL PROTECTED]> wrote:

Integration build for Hackystat-7 project on 13-Feb-2006 FAILED.
  Module: hackyApp_Cgqm
  Failure Types: Compilation
  Plausible Culprit: Unknown
  How culprit is identified: Build failure reason is unclear, and no one has
  made any commit. Perhaps hackystat sensor data is incomplete, or it is
  caused by external error on the integration build box.
  Failure Messages:
    * C:\HackystatSource\hackyApp_Cgqm\src\org\hackystat\app\cgqm\testbase\ABaseRemoteProjectTestClass.java::63::cannot find symbol

The compilation failures revolve around the ProjectManager class, which Hongbing committed recently, so it looks like something Hongbing should investigate.

But a more interesting hypothesis occurs to me.

I'm wondering if the Build Analysis mechanism has become "smart enough" in the following sense: the build failures for which the mechanism can identify a culprit are basically those that our standard development process should have prevented from reaching the integration build. Conversely, the build failures for which the mechanism cannot identify a culprit are those that we generally allow as "acceptable" integration build failures.

This goes back to the following basic premise of our process: the only way to guarantee that developer-induced integration build failures are prevented would be to force everyone to do a 'freshStart all.junit' on _all_ modules prior to _every_ commit. We've said that this is too heavyweight: the cost in productivity for this kind of process (even if we could get everyone to do it) is higher than the cost in productivity for occasional integration build failures.

On the other hand, if the nightly build fails constantly, then that slows things down too much because people can't trust the repository to contain a working version.

So, our goal has been to find a "happy medium"---a level of process in which people make "reasonable" efforts to test their changes before committing which reduces the integration build failures to a level in which those failures that remain are "justifiable", because the cost of local testing to prevent these last remaining failures is too high.

So, what I'm starting to wonder is whether our build analysis mechanism has actually become a valid measure of "reasonable": in other words, if it can identify the culprit, then the culprit should have prevented the failure; but if it can't identify the culprit, then the failure was caused by a sufficiently indirect sequence of events that the daily build mechanism is the most efficient way to uncover it.
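The decision rule above can be sketched in a few lines. This is a minimal, hypothetical illustration, not Hackystat's actual API: the `BuildFailure` record and its field names are assumptions, chosen to mirror the fields in the build report quoted above.

```python
# Hypothetical sketch of the proposed rule: a nightly build failure is
# "preventable" when the analysis can name a culprit (reasonable local
# testing should have caught it), and "acceptable" when it cannot (only
# the full integration build would have exposed it).
from dataclasses import dataclass
from typing import Optional

@dataclass
class BuildFailure:
    module: str
    failure_type: str
    culprit: Optional[str]  # None when the analysis cannot identify one

def classify(failure: BuildFailure) -> str:
    """Classify a failure by whether a culprit was identified."""
    return "preventable" if failure.culprit else "acceptable"

# Today's failure: no culprit identified, so it counts as acceptable.
today = BuildFailure("hackyApp_Cgqm", "Compilation", None)
print(classify(today))  # acceptable
```

Evaluating a run of nightly builds with something like this would give a concrete count of "preventable" versus "acceptable" failures to test the hypothesis against.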

To test this, we simply need to start evaluating the daily build analysis from the culprit/no culprit perspective. For example, I would claim that today's failure is "reasonable", in that Hongbing would have had to do a full test of the entire system to catch it.

Your thoughts?

Cheers,
Philip
