Dave Miner wrote:
> Jean McCormack wrote:
>> In the DC meeting yesterday we discussed the future user experience
>> for the Distro Constructor. The first thing I'm looking at is the
>> ability to restart the DC build at different checkpoints or steps
>> in the process.
>>
>> There were 3 ways of specifying the restart that were considered
>> 1) The user would edit the manifest file to specify they wanted to
>>    start the build at a certain point
>> 2) a command line option
>> 3) Making the command have an interactive option
>>
>> After consulting with Frank Ludolph #2 (command line option) was
>> decided upon.
>> His suggestion was this:
>> dist_const -resume [step]
>>
>> dist_const -resume would resume the build from the failed step in
>> the previous build
>> dist_const -resume step would resume the build from the step
>> specified.
>>
>
> Generally this seems like the right sort of idea. A couple of things
> to think about in designing it:
>
> - This doesn't seem much different from different targets in a
>   makefile. Perhaps think about how to leverage make for this.

Yeah. The nice thing is it would work universally. I'll need to put a
lot more thought into this though.

>
> - The restarting I'd implemented in the live media kit used ZFS
>   snapshots for recording state. Think about how that might be
>   leveraged here.

This actually looks very much like what we want to do. A couple of
questions:
1) This obviously only works if the user specifies a ZFS dataset for
   their proto area. We aren't making ZFS a requirement for DC, are we?
2) Do ZFS snapshots take up a lot of space?

>
>
>> Some technical thoughts behind this new option:
>>
>> - In order to keep the build from having issues because the user
>>   changes the manifest between the two runs, we would not have them
>>   specify a new manifest file.
>> - The build does need to have the manifest information somehow, so
>>   my thought was that during a build we would copy the current
>>   manifest file to .step<step number>. As the step completes
>>   successfully this file would be deleted. It would then serve as a
>>   marker for the -resume case as to where to restart and would
>>   contain all the information for the restarted build.
>> - dist_const -resume step would check that the step specified is <=
>>   the failed step. Restarting at step+n is not allowed
>> - We could do some checking to make sure that the user hasn't
>>   modified .step<number> which has the potential to cause havoc in
>>   the build. Depending upon where you were in the build process,
>>   some modifications would be OK, others not.
>>   I'm not sure the extra complication is worth it. How do others
>>   feel about this?
>
> My use of ZFS snapshots in live media let me do fairly arbitrary
> things by hand when I wanted to experiment with modifications to parts
> of the image before bothering to commit them to code (I'd just rename
> snapshots and so on to get to the state I wanted). I think that
> whatever we do should allow for that sort of developer behavior.

So are you suggesting that we let the user specify a manifest that might
have been modified, and make it their responsibility to ensure they
haven't changed anything critical? Allowing that would also give them
more flexibility to experiment. I think that would be nice too.

Jean

>
>> - The messaging coming from the DC would be worded such that the
>>   user would know what step failed in the process.
>>   That's the next step in this work.
>> - the .step<number> files would be cleaned up at the start of every
>>   build and the end of every successful build.
>> - dist_const -resume doesn't make sense after a complete successful
>>   build but dist_const -resume step does. If the user has a build
>>   that completes successfully but doesn't work, they could rerun
>>   the build from any step they think is appropriate.
>>
>
> Right.
>
>> Any comments?
>>
>
> Good start. Thanks for moving on this.
>
> Dave
>
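
For reference, the resume rules quoted above could be sketched roughly as follows. This is only an illustration in Python, not DC code: the function names, the build directory argument, and the exact on-disk spelling of the .step<number> marker are assumptions, as is the choice to treat the lowest-numbered leftover marker as the failed step.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed on-disk spelling of the proposed .step<number> marker file.
MARKER_RE = re.compile(r"^\.step(\d+)$")


def find_failed_step(build_dir: Path) -> Optional[int]:
    """Return the step number of a leftover marker, or None if there is none.

    Per the proposal, a marker is removed when its step succeeds, so a
    leftover marker identifies the step that failed.
    """
    steps = []
    for entry in build_dir.iterdir():
        match = MARKER_RE.match(entry.name)
        if match and entry.is_file():
            steps.append(int(match.group(1)))
    # Assumption: if several markers exist, the earliest one is the failure.
    return min(steps) if steps else None


def resolve_resume_step(failed_step: Optional[int],
                        requested_step: Optional[int] = None) -> int:
    """Decide which step a resumed build should start from."""
    if requested_step is None:
        # Bare -resume: only meaningful if the previous build failed.
        if failed_step is None:
            raise ValueError("nothing to resume: no failed step recorded")
        return failed_step
    # -resume <step>: the requested step must be <= the failed step.
    if failed_step is not None and requested_step > failed_step:
        raise ValueError(
            f"cannot resume at step {requested_step}: "
            f"build failed at step {failed_step}"
        )
    # After a fully successful build any explicit step is accepted.
    return requested_step
```

With a leftover .step3, a bare resume resolves to step 3, an explicit step 2 is accepted, and step 4 is rejected. After a successful build (no marker), only an explicit step is accepted.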
