On Wed, 2010-06-30 at 10:16 -0700, Ian Main wrote:
> This patch is a first try at using condor as a job management system.
> This removes the usage of the 'taskomatic' utilities and replaces them
> with 'condormatic' calls that use the command line interfaces (no qmf or
> gsoap etc) to condor.
>
> On startup of the server (and any changes after running), a pile of
> 'classads' are created which define each possible startup location for
> a given set of image/hardware profiles that exist and are useable as
> well as the backend info condor needs to start an instance on the given
> provider.
>
> For each instance that you start, a job will be created in condor.
> Condor will then match the hardware profile and image to a provider and
> can then start an instance on that provider. When you stop or destroy
> that instance, the job will be removed (which isn't really how we want
> it to go but..).
>
> This patch requires that you have our custom hacked up condor installed.
> You can get this at:
>
> http://people.redhat.com/clalance/condor-dcloud
>
> Be sure to read the README. Chris has written up very good instructions
> on how to set up condor.
>
> REVISED: This patch plus the new condor fixes a number of the bugs that
> were in the previous patch. This patch adds realm matching support and
> fixes the start/stop issues we were seeing. So most things basically
> work now and I think it's generally useable. Probably the biggest
> outstanding bug for useability is that we do not keep long-running jobs
> for stateful instances.
>
> The outstanding bugs are now limited to:
>
> - To 'stop' a job in condor we should be using 'hold' instead of
>   removing the job. This is creating a few different problems.
> - We are still reaching directly to the DeltaCloud API to get a list of
>   available actions for each instance. Maybe this is fine, I'm not
>   sure.
> - Quotas are not yet implemented.
> - Classads are sync'd to condor on startup and on any changes to the
>   hardware profile and image records. However, if you restart condor
>   you won't have any classads in it to match against and your jobs will
>   fail.
> - We're still using 'on-demand' syncing of states from condor to the
>   aggregator, e.g. when you list the instances it updates the states of
>   each instance from condor at that time. There is no event logging.
> - There's no 'reboot' as yet in condor. Not sure how we'll deal with
>   that just yet.
> - We've kept the tasks model and usage but they are quasi-meaningless.
>   The task table needs to turn into an event/audit log table.
>
ACK, this works for me. A couple of minor notes we talked about, just to make sure they don't get forgotten. It would be great to update the directions Chris has with the 'yum local' bit, and add at least a comment that for dev, 'ALLOW_WRITE = *' is what you want.

Lastly, and not directly related to the patch: this will be confusing for people checking out next unless we get some docs up on the site, especially since the 'contribute' directions lean toward using 'next'. So we need some docs, or at least to do that adapter idea I have been pushing, with the default leaving it to taskomatic, and a comment saying how to enable condor.

-j
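
For reference, the adapter idea could look roughly like the sketch below. This is only an illustration in Python, not code from the patch or the aggregator: the class names, the `job_backend` setting, and the `start_instance` method are all made up for the example. The one point taken from the thread is that taskomatic stays the default and condor has to be switched on explicitly.

```python
# Hypothetical sketch of a job-backend adapter with taskomatic as the default.
# None of these names come from the patch; they are assumptions for illustration.

DEFAULT_BACKEND = "taskomatic"


class TaskomaticAdapter:
    name = "taskomatic"

    def start_instance(self, instance_id):
        # Stand-in for queuing a taskomatic task.
        return f"taskomatic: queued start task for {instance_id}"


class CondorAdapter:
    name = "condor"

    def start_instance(self, instance_id):
        # Stand-in for submitting a condor job via the command line tools.
        return f"condor: submitted job for {instance_id}"


ADAPTERS = {cls.name: cls for cls in (TaskomaticAdapter, CondorAdapter)}


def load_adapter(config):
    """Return the adapter named by config['job_backend'], defaulting to taskomatic."""
    # To enable condor, set job_backend to "condor" in the settings.
    name = config.get("job_backend", DEFAULT_BACKEND)
    try:
        return ADAPTERS[name]()
    except KeyError:
        raise ValueError(f"unknown job backend: {name!r}") from None
```

With something like this in place, a fresh checkout of next would keep using taskomatic, and the condor path would only be exercised by people who have set it up and opted in.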
