Hi Dave,

Thank you for reviewing the document.  Please see my responses inline.

On 06/09/10 09:09, Dave Miner wrote:
Karen,

A few comments that are mostly pretty high-level since others have commented on the more detailed issues I spotted already.

I think there's an over-emphasis on singletons in this design. We've discussed DOC already and I think resolved that it won't be, but I would prefer not to see singletons other than the InstallEngine. It is a natural singleton, whereas the others seem to be a case of this design dictating behavior where it isn't actually necessary. For example, if some checkpoint wants/needs to do some alternate logging for some reason, or if the application has a need for multiple logs to satisfy some other integration requirement, there's no reason not to allow use of a different logging instance.
As discussed elsewhere, I will remove any mention of the DOC being a singleton, and provide a public interface
in the engine to access the DOC.

As for logging, the engine calls logging.setLoggerClass(InstallLogger) to force all logger instances to use InstallLogger. If a component wants to use a different logger class, all logger instances will be affected. That's a restriction of Python logging.

I think I am perhaps not using the right terminology in the spec. The engine will set the InstallLogger class and then initialize an instance of the Python logger. All other install components should reference this instance initialized by the engine. If they want to initialize another instance, they can do that. That instance
will still use the InstallLogger class, but it
won't inherit any of the properties set on the instance initialized by the engine.

So, I think I will remove any mention of the logger being a singleton, and replace it with something
similar to what I said above.
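
To make that concrete, here is a rough sketch of the behavior I'm describing (the class and logger names are just illustrative):

    import logging

    class InstallLogger(logging.Logger):
        # Stand-in for the install logger class that the engine sets.
        pass

    # The engine forces every logger created after this call to be an
    # InstallLogger.
    logging.setLoggerClass(InstallLogger)

    # Logger instance initialized by the engine; other install components
    # should reference this one.
    engine_log = logging.getLogger("InstallationLogger")
    engine_log.addHandler(logging.StreamHandler())
    engine_log.setLevel(logging.INFO)

    # A separately initialized instance is still an InstallLogger, but it
    # does not pick up the handlers or level configured on the engine's
    # instance.
    other_log = logging.getLogger("SomeOtherLog")
    assert isinstance(other_log, InstallLogger)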

A further comment on the logging: why not allow the application to specify an alternate logger instance vs. the one that you would instantiate automatically? That seems more flexible than merely allowing a log level. Beyond that, I don't quite understand the reason that each checkpoint gets a "sub-logger"; what benefit does this provide?
Python has this hierarchical functionality for logging. Each checkpoint will have its own logger instance, but the checkpoint's logger instance will inherit all the properties of the "parent" logger instance. For example, if the logger instance created by the engine is called "InstallationLogger", and there's a checkpoint called "TargetDiscovery", the logger instance with the name "InstallationLogger.TargetDiscovery" will be a separate logger instance, but it will inherit all the properties of the "InstallationLogger" instance.

Each checkpoint needs a separate logger instance so we can support setting a separate
logging level for each checkpoint.
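
As a small sketch of the hierarchy (using the names from the example above):

    import logging

    parent = logging.getLogger("InstallationLogger")
    parent.addHandler(logging.StreamHandler())
    parent.setLevel(logging.INFO)

    # The checkpoint's logger is a separate instance, but its records
    # propagate to the parent's handlers, so it inherits the parent's
    # configuration by default.
    td_log = logging.getLogger("InstallationLogger.TargetDiscovery")
    td_log.info("goes through the parent's handler")

    # A logging level can be set for this checkpoint alone, without
    # touching the parent or the other checkpoints.
    td_log.setLevel(logging.DEBUG)
    td_log.debug("now emitted at DEBUG for this checkpoint only")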


Most, if not all, applications that use the engine will be privileged apps and as such should not be using /tmp for storage of any data. /var/run, please, perhaps falling back to /tmp if you find that you don't have write access there; and use TMPDIR environment variable if you need to provide flexibility to developers or support personnel.
Sounds good. Will adopt this in the spec, and change all the appropriate places.
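
Something along these lines is what I have in mind (the function name is just a placeholder):

    import os

    def select_work_dir():
        # Prefer /var/run for a privileged process, fall back to /tmp if
        # /var/run is not writable, and let TMPDIR override both.
        tmpdir = os.environ.get("TMPDIR")
        if tmpdir and os.access(tmpdir, os.W_OK):
            return tmpdir
        if os.access("/var/run", os.W_OK):
            return "/var/run"
        return "/tmp"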

In section 7.4, there's reference to an "installation target", which seems to make the engine dependent on a specific checkpoint (Target Instantiation) and elements of its schema, but you don't really say that here or list it as an imported interface. This seems to be an exception (at least in part) to the principle of the engine treating all checkpoints equally. Wouldn't it make more sense to define a method on InstallEngine that the application or a checkpoint could call to set the ZFS dataset that the engine should be using for its storage requirements?
Defining a method in the InstallEngine object sounds like a great idea.
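
Something roughly like this, perhaps (the method name below is a placeholder, not the final interface):

    class InstallEngine(object):

        def set_dataset(self, dataset_name):
            # Record the ZFS dataset the engine should use for its own
            # storage needs; callable by the application or a checkpoint.
            self._dataset = dataset_name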


I'm disappointed that the methodology for determining checkpoint progress weighting is still TBD. I'd thought this was one of the things we were trying to sort out in prototyping, but perhaps I assumed too much. When can we expect this to be specified?

The prototype confirmed that we can have each of the checkpoints report its own weight, and the engine uses these weights to normalize the progress reported by the checkpoints. We also showed in the prototype that we
can use the logger for progress reporting.

I don't have plans to work on the methodology in detail in the short term. It would involve specifying exactly which machine, with what configuration, should be used as the standard, and also providing the mapping from a performance number generated on that machine to the weight. To do this accurately, I think it would take more research and experimentation to determine what works for most cases. If we have code in the engine to accept and interpret the weights provided by checkpoints, then when we
eventually have the methodology in place, we can just change the value
returned by the get_performance_estimate() function in the checkpoints, which should have very minimal impact. In the meantime, we can have the checkpoints return
the "guess" weight like we do now.


In section 11, we seem to be eliminating the ability that DC currently has to continue in spite of errors in a particular checkpoint. Has this been discussed with the DC team?
I forgot about the ability to continue in spite of errors. That functionality should be included in the engine. I will provide an interface in the InstallEngine class for the application to indicate that it wants to continue despite errors. By default, we will not continue if there's an error.
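
One possible shape for that interface (the method and flag names below are placeholders):

    class InstallEngine(object):

        def __init__(self):
            self._checkpoints = []
            self._failed = []

        def execute_checkpoints(self, stop_on_error=True):
            # Run the registered checkpoints in order.  With
            # stop_on_error=False the engine records the failure and
            # moves on to the next checkpoint.
            for checkpoint in self._checkpoints:
                try:
                    checkpoint.execute()
                except Exception as err:
                    if stop_on_error:
                        raise
                    self._failed.append((checkpoint, err))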


Finally, a moderately out-there question that is admittedly not part of the existing requirements list: what if we wanted to make checkpoints executable in parallel at some point in the future? Would we look at a tree model, rather than linear list, for checkpoint specification, or something else? Is there anything else in this design that would hinder that possibility (or things we could easily modify now to help allow it more easily)? An existing case where I might want to do this right now is to generate USB images at the same time as an ISO in DC, rather than generating a USB from an ISO exclusively (the current behavior). It's also the case that many of the existing ICT's could probably be run in parallel since they generally wouldn't conflict.
In order to support this, we would probably need to change how checkpoints are registered, probably by adding more arguments to the register_checkpoint() function to specify which checkpoints can be executed together.

In the InstallEngine object, we currently store checkpoints as a list, in the order they are to be executed. Since Python allows storing a list within a list, for the case of executing checkpoints in parallel we can store all the checkpoints that are meant to run in parallel in a sub-list inside the "execution list". At execution time, we walk down the "execution list" and run either a single checkpoint
or a group of checkpoints.

For example, if our execution list has the following:

A, (B, C, D), E, (F, G)

First, Checkpoint A will be executed by itself.
Then, Checkpoints B, C, and D will execute at the same time.
Then, Checkpoint E will execute.
Finally, Checkpoints F and G will be executed at the same time.

So, I don't think the current design would hinder that possibility.
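
As a rough illustration of how the engine could walk such a list (the checkpoint objects and their run() method are stand-ins for the real interfaces):

    import threading

    def run_execution_list(execution_list):
        for entry in execution_list:
            if isinstance(entry, (list, tuple)):
                # A sub-list marks checkpoints that may run at the same
                # time.
                threads = [threading.Thread(target=cp.run) for cp in entry]
                for t in threads:
                    t.start()
                for t in threads:
                    t.join()
            else:
                entry.run()

    # e.g. run_execution_list([A, (B, C, D), E, (F, G)])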

Thank you again for taking the time to review the install design doc.

--Karen
