Hi Dave, Thanks for the review. Comments inline...
> Comments on the Orchestrator design, v 0.3: > > A general comment is that I think the memory management and error > return/messaging scheme needs some work to feel clean. A number of > specific API comments below reflect this, but it really is a global > issue in the design. Yes, agreed. We aren't completely set on the design of these two things, and felt like we needed to note some decisions. We will work on this more and update the design. > > page 6, section 3.0 item 12: the text notes that "[t]his is a temporary > piece of functionality...". It would be useful to note where this > functionality will ultimately reside. Overall, within this list I think > there are a number of other items which have the same property > (keyboard, locales, user accounts) and thus it might make more sense to > have a paragraph here which discusses this issue more generally. Ok, that's a good suggestion. We will add this data. > page 7, section 4.1: Perhaps I missed something, but I don't recall > support of upgrading instances installed on SVM as being on the planned > feature list, so what are we doing with this information? Also, we > should be using "GPT" here instead of "EFI", right? No, they aren't being supported. We are only keeping track of this data to let the users know why we can't upgrade their instance of Solaris. We will clarify this. It should be GPT. > page 9: structure members meant to represent sizes should be uint64_t to > help future-proof things. Also, assuming the partitionType is the > indentifier code (e.g. 0xbf for Solaris2), wouldn't it more properly be > a uint8_t? > I agree. Will change. > page 11, 4.1.2, om_initiateTargetDiscovery: the description says that > the "progress of the callback will be returned..." when it should say > "progress of the discovery will be returned...". Also, the text here > says, "If target discovery cannot be started the installation > application must exit." I think it's fundamentally incorrect for a > component to be dictating that its caller must react in any particular > way to a failure, certainly not in a fashion so drastic as "must exit". > Whenever possible, the installation application and its supporting > functions should be written to allow for the user to attempt to > re-initiate a process, not exit and tell them "type suninstall to try > again" or somesuch. Finally, how does the caller obtain details on the > reason for a failure? Ok, will modify text to be correct. As for dictating the consumer as to the behavior that must be taken in reaction to a particular error, I agree. This statement wasn't really intended to be this. It was my way of letting consumers know that a failure in this call for installation purposes is a fatal failure and that continuing with the installation process that started this needs to be stopped. But, I agree about having the install be re-initiated and that is our intent. We are not changing that behavior with Dwarf. I will make this clearer and remove the text that this should be a fatal exit. As for obtaining details for the reasons for a failure, we will log these of course as we do today. The caller will have to inform the user to look in the log, or we have to find a way to provide this data via the GUI. The callback mechanism we are still defining will have an error string/message parameter that we will fill in with more detailed data about why the failure happened. At least that is what we are currently planning. > om_getDiskInfo: what happens if it's called while discovery is still in > progress? And see comment above about dictating caller's behavior. Well.. I suppose the caller could make this call even if the discovery wasn't completed above. I would assume that if this happened and we are still doing target discovery we would fail in this call. Which implies we need additional error handling. I don't think we should allow this call to succeed if discovery is still going on. The expected ordering of calls is that the consumer will call om_initiateTargetDiscovery(), wait for their callback to be called with a signal we are done with this, and then call the other functions. > om_getDiskPartitionInfo: why not just take a DiskInfo as the first > argument? And what is meant by "A failure will be fatal in this case"? We could take DiskInfo as the first argument. And, this might be a better way to do this since we return the DiskInfo data from the previous call. The fatal failure statement isn't quite correct and I will change this. If for some reason we cannot read the partition table on an x86 disk, this is a fatal failure in the sense that the user cannot just use the disk selected and proceed on with the installation. We need at least to be able to read the partition data to use a disk for installation. But, the wording is wrong in the document. I will clarify. > om_validateDiskPartitionUpdate: if we're going to use this as a boolean > predicate, I'd suggest a name which reads that way, such as > om_isPartitionInfoValid or something of that nature. Also, > > om_getSolarisInstances: there's no way to free the space this allocates > other than to commit the installation? What happens if the user goes > back to the beginning of the installer, do we just leak that memory? No, this is incorrect in the document. This data would be freed by a call to om_releaseTargetData(). But, as you note we have work to do on the memory handling. So, expect an update on this. > > om_validateUpgradeTarget: why shouldn't this be a boolean predicate a la > the partition validation, in which case I'd suggest > om_isTargetUpgradeable or similar? This makes sense. Will change. Also, I don't really understand the > comment here about two-step validation - does this mean that > om_getSolarisInstances is only returning some concept of a "valid" > Solaris instance (which isn't defined there)? Yes, that's what this means. We can certainly define what is a valid instance and disk target in the document. Why are we doing this? Mostly for performance but also for usability. For performance we need to provide the list of of upgrade targets to the GUI for selection as quickly as possible. Much like we provide the disk data to the GUI initially for selection. After the user selects the disk/upgrade target we do additional validation on that target as well, reading partition data, etc.. So, for example, we find 6 instances of Solaris on a system and we find that all of them are within the range for upgrade, figure out if the configuration is valid(e.g. non-global zones are all ok)... The user selects that instance and we then calculate the size needed to do the upgrade. It is possible that a valid instance of Solaris can't be upgraded because the user doesn't have enough space left on the slice. Not showing this as a valid upgradeable instance seems wrong to me because it is really upgradeable but there is a space issue, which the user could manually fix. > > om_releaseTargetData: this seems to have an excessively drastic effect; > shouldn't we provide some slightly less drastic ways of freeing up > results that we don't need? I think we need to consider this for sure. In thinking about this more it seems that the consumer of this interface should have more control for freeing and regathering data they want/need. > > 4.2.1 progress reporting: what about a progress reporting interface that > can work in both a Gnome-friendly way as well as not requiring Gnome? > For the current iteration a Gnome dependency would seem fine, since > that's the only immediate consumer, but something to think about. We aren't set on using the Gnome toolkit functionality. It is just one possibility. We are looking at other ways to do this as well, and as you note perhaps a more generic way would be better. This is still TBD. > > 4.3 the list here has two #1 entries. Also, can you clarify what "User > data" here means? Will fix the numbered entires. User data == passwords, user account info. But, as I look at this I think this is an artifact of the original thought I had that the user accounts would be set with a specific call that the orchestrator provided, something like om_setUserAccount(). But, later I decided this specific call wasn't necessary nor desired and that we would do this as part of the installation process when invoked. I am going to remove the User Data piece of this section. It no longer makes sense since we have decided to do this as part of the perform_Install() processing. > 4.3.1 So where's the keyboard structure? No keyboard structure. I originally thought I would use one but it isn't required so I will remove that text. > 4.3.2 om_getInstallLanguages: there's no existing API that provides the > locale list? No, not that I could find. sysidtool does this now and there are no libspmi interfaces that get this data. > om_getKeyboardTypes: again, no existing API? And specifying what the > GUI will do here in reaction to a particular return is wrong; that > belongs in the GUI spec, not here. No api. Will change text RE: GUI reaction. No existing api, sysidtool handles this now. > > om_getAvailableLocales: See earlier comments about specifying > application behavior, but why would a null list of locales be fatal? "C" > always works, and there seems no harm in proceeding in that case so long > as the user's happy. > Well... we assumed if there were no locales on the media the media was likely corrupt. So, that is why it is listed as fatal. > om_getMinSize: The name here seems much too imprecise to describe what > it's doing. Also, it feels quite incorrect to return a random value > like 10 GB when we can't calculate something; I'd think we'd like an API > which can take a described set of software and a disk layout and > validate the two against each other. For this project the two are > basically fixed, but at least thinking in that direction here would be > nice. Ok, I will work on this interface. > om_free: this seems impossibly imprecise; how is it going to know > whether it's supposed to free a list of keyboards, a list of locales, or > something else entirely? That's a fair statement. We need to work on memory management. > 4.4 om_performInstall: here's the first mention of nvpairs, but how is > the caller supposed to create them without knowing the names to use for > the values? Are there any requirements on ordering of the pairs or > anything? We haven't gotten that far in the design. We need to specify this. nvlist was chosen because it felt like the most extensible design for the future for this function. Thanks again for the review. sarah **** > Dave
