Last night's build failure resulted in the following analysis:

--On Monday, February 13, 2006 6:00 AM -1000 Hackystat Administrator <[EMAIL PROTECTED]> wrote:

Integration build for Hackystat-7 project on 13-Feb-2006 FAILED.
  Module: hackyApp_Cgqm
  Failure Types: Compilation
  Plausible Culprit: Unknown
  How culprit is identified: Build failure reason is unclear, and no one has
  made any commit. Perhaps hackystat sensor data is incomplete, or it is
  caused by external error on the integration build box.
  Failure Messages:
    * C:\HackystatSource\hackyApp_Cgqm\src\org\hackystat\app\cgqm\testbase\ABaseRemoteProjectTestClass.java::63::cannot find symbol

The compilation failures revolve around the ProjectManager class, which Hongbing committed recently, so it looks like something Hongbing should investigate.

But a more interesting hypothesis occurs to me.

I'm wondering if the Build Analysis mechanism has become "smart enough" in the following sense: the build failures for which the mechanism can identify a culprit are basically those that our standard development process should have prevented from reaching the integration build. Conversely, the build failures for which the mechanism cannot identify a culprit are those that we generally allow as "acceptable" integration build failures.

This goes back to the following basic premise of our process: the only way to guarantee that developer-induced integration build failures are prevented would be to force everyone to do a 'freshStart all.junit' on _all_ modules prior to _every_ commit. We've said that this is too heavyweight: the cost in productivity for this kind of process (even if we could get everyone to do it) is higher than the cost in productivity for occasional integration build failures.

On the other hand, if the nightly build fails constantly, then that slows things down too much because people can't trust the repository to contain a working version.

So, our goal has been to find a "happy medium"---a level of process in which people make "reasonable" efforts to test their changes before committing which reduces the integration build failures to a level in which those failures that remain are "justifiable", because the cost of local testing to prevent these last remaining failures is too high.

So, what I'm starting to wonder is whether our build analysis mechanism has actually become a valid measure of "reasonable": in other words, if it can identify the culprit, then the culprit should have prevented the failure; but if it can't identify the culprit, then the failure was caused by a sufficiently indirect sequence of events that the daily build mechanism is the most efficient way to uncover it.
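The decision rule above can be sketched in a few lines. This is a minimal, hypothetical illustration, not Hackystat's actual API: the `BuildFailure` record and its field names are assumptions, chosen to mirror the fields in the build report quoted above.

```python
# Hypothetical sketch of the proposed rule: a nightly build failure is
# "preventable" when the analysis can name a culprit (reasonable local
# testing should have caught it), and "acceptable" when it cannot (only
# the full integration build would have exposed it).
from dataclasses import dataclass
from typing import Optional

@dataclass
class BuildFailure:
    module: str
    failure_type: str
    culprit: Optional[str]  # None when the analysis cannot identify one

def classify(failure: BuildFailure) -> str:
    """Classify a failure by whether a culprit was identified."""
    return "preventable" if failure.culprit else "acceptable"

# Today's failure: no culprit identified, so it counts as acceptable.
today = BuildFailure("hackyApp_Cgqm", "Compilation", None)
print(classify(today))  # acceptable
```

Evaluating a run of nightly builds with something like this would give a concrete count of "preventable" versus "acceptable" failures to test the hypothesis against.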

To test this, we simply need to start evaluating the daily build analysis from the culprit/no culprit perspective. For example, I would claim that today's failure is "reasonable", in that Hongbing would have had to do a full test of the entire system to catch it.

Your thoughts?

Cheers,
Philip
