Thank you for the reply and explanations, Sarah!

cc-ing Glenn

Glenn: please just search this message for your name... there is a
section regarding how CUD will make progress information available to VMC.

My comments are in-line below.


Joe


Sarah Jelinek wrote:
> Hi Joe,
>
> Thank you for reviewing this! Comments inline..
>
>
>>> Hi Caimaniacs,
>>>
>>> I know you have been waiting for this bestseller to hit the shelves! 
>>> :-).
>>>
>>> The Caiman team has been working on an updated architecture and we 
>>> have the architecture document ready for review. The 
>>> opensolaris-caiman project architecture page is located 
>>> here (formerly known as Caiman Unified Design):
>>>
>>> http://hub.opensolaris.org/bin/view/Project+caiman/CUD
>>>
>>> The Caiman architecture document open for review is located here:
>>>
>>> http://hub.opensolaris.org/bin/download/Project+caiman/CUD/caimanarchitecture.pdf
>>>  
>>>
>>>
>>> Please send comments/questions to caiman-discuss by 12/18/09. If you
>>> need more time, please let us know; with the holidays coming up, we
>>> may have to extend the review period.
>>>
>>> Thank you for your time.
>>>
>>> Regards,
>>> sarah
>>> _______________________________________________
>>> caiman-discuss mailing list
>>> caiman-discuss at opensolaris.org
>>> http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
>>
>>
>> This is great, Sarah! I learned a lot reading it, and I find the UML
>> Sequence Diagrams especially helpful!
>>
>> My comments are below; many of them are simply questions.
>>
>> Joe
>>
>> - - -
>>
>>
>>
>> Question/Issue:
>> ---------------
>>
>> 1.5. Booting of the installation environment from a network server.
>>
>> What does this mean? How is it different from 1.3?
>
> 1.5 talks about only the 'booting' part, that is, booting the
> microroot (installation environment). 1.3 could refer to booting from
> local media such as USB, but installing using a network IPS repo.
>
>
>
>
>>
>> Suggestion:
>> -----------
>>
>> Perhaps it would be valuable to list the application names alongside
>> the bulleted list on pages 2-3.
>
> Ok, fair point. I can add this.
>>
>> Question/Issue:
>> ---------------
>>
>> "An interactive installation application with a text user interface,
>>  targeted primarily at server systems"
>>
>> I thought we wanted to use IPS for the text installer because canned
>> bits which could be CPIOed off the media didn't address server
>> system user requirements.
>>
>> Do we need to communicate that TI will be CPIO based?
>>
>
> It will eventually be IPS based as well, at least that's my
> understanding. I believe the reason cpio was chosen initially is that
> it is somewhat easier in terms of our transfer module, etc. We should
> let users know if it will only ever be cpio based.
>
> They are working on the packages to include in the text installer
> media to ensure they are more server based. They aren't the same bits
> we are building the LiveCD with.

I hadn't known that. Thanks.

>
>>
>> Question/Issue:
>> ---------------
>>
>> "An automated installation application that installs a system without 
>> user
>>  intervention using a pre-configured or computed configuration"
>>
>> "A tool to generate a replication image of an installed system"
>>
>> What are these referring to, VMC or R&R?
>
> The first one refers to AI. The second one is Replication (the first
> R in R & R).
>
>
>> Question:
>> ---------
>>
>> p.3
>>
>> Core Engine Classes
>>
>> Do we need an IPC Core Engine Class? Or will the data store act as an
>> IPC? I don't currently know of a need for a CoreEngine IPC but I can
>> imagine it might turn out to be valuable.
>
> I assume by IPC you mean Inter Process Communication? In the case of 
> Core Engine classes the Data Collection class is intended to provide 
> the communication mechanism for getting and setting data between 
> Application classes. Is there another reason or set of reasons you 
> think we should have a specific IPC class called out?

Yes - IPC: Inter-Process Communication.

I was envisioning the possibility of multiple concurrent components,
where one could use an IPC to signal the other(s) to query the Data
Collection class for updated information.
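
To make that concrete, here's a minimal sketch of the kind of signaling
I'm imagining. The DataCollection class and its get()/set() methods are
placeholders of my own, not the document's API, and I'm using a thread
event for brevity where real IPC might be a pipe or a door:

    # Illustrative only: one component signals another to re-query the
    # shared data collection for updated information.
    import threading

    class DataCollection(object):
        """Placeholder for the Core Engine data cache."""
        def __init__(self):
            self._data = {}
            self._lock = threading.Lock()

        def set(self, key, value):
            with self._lock:
                self._data[key] = value

        def get(self, key):
            with self._lock:
                return self._data.get(key)

    data = DataCollection()
    updated = threading.Event()      # the "IPC" signal between components

    def consumer():
        updated.wait()               # block until told there is new data
        print(data.get("target_disk"))

    t = threading.Thread(target=consumer)
    t.start()
    data.set("target_disk", "c0t0d0")
    updated.set()                    # tell the other component to re-query
    t.join()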

>
>
>
>> Question/Issue:
>> ---------------
>>
>> 2.1 Execution
>>
>> Would it make sense to add "Registration / Configure" to the bullet
>> list at the top of p. 4?
>
> Yes, it would. I will add.
>
>
>
>>
>> Question/Issue:
>> ---------------
>>
>> p.4
>>
>> - Report progress to the installation application during execution
>>   of the installation sequence
>> - Manage logging of the installation sequence
>>
>> Should these be under section 2.3 instead of 2.1?
>
> No, I don't think so. The engine is the 'keeper' of the management of 
> these things, and while there is a logger and error handler, all 
> consumers must go through the engine to get access to these.


OK. I can understand that perspective too.


>
>
>
>> Question/Issue:
>> ---------------
>>
>> p.4
>>
>> - add_logger(): configures a logger to be used during execution
>
>
>
>> Should we add "change log levels"
>
> Maybe. Is it your expectation that we should be able to change log 
> levels during the execution of an installation? In general, I would 
> think we would set the log level at the call to add_logger().
>
>
>> I could see this being valuable for diagnosing failures.
>>
>> e.g.
>> CheckPoint failed.
>
>
>
>> We could change the log level and restart at CP 5.
>
> Oh... I see. Maybe; this is an interesting idea. Let me think on it a
> bit.
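
To sketch the flow I'm picturing (the ExecutionEngine here is a toy
stand-in of my own; only the standard logging calls are real):

    import logging

    class ExecutionEngine(object):
        """Toy stand-in; only the logging behavior matters here."""
        def add_logger(self, logger, level=logging.INFO):
            logger.setLevel(level)
            self.logger = logger

        def set_log_level(self, level):
            # The ability I'm suggesting: change verbosity mid-run.
            self.logger.setLevel(level)

    logging.basicConfig()
    engine = ExecutionEngine()
    log = logging.getLogger("install")

    engine.add_logger(log, level=logging.INFO)
    log.info("checkpoints 1-4 completed; checkpoint 5 failed")
    engine.set_log_level(logging.DEBUG)   # crank up verbosity
    log.debug("restarting at checkpoint 5 with debug logging")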
>
>
>> Question/Issue:
>> ---------------
>>
>> - add_logger(): configures a logger to be used during execution
>>
>> Should this be under section 2.3 instead of 2.1?
>
> answered above.
>
>>
>> Suggestion:
>> ---------------
>>
>> p.5
>>
>> 2.2 Data Collection
>>
>> I found this confusing:
>>
>> -  Provide stable, extensible management, storage,
>>
>> I had to read that sentence a couple of times before the meaning was
>> clear. Maybe it's just me. ;)
>>
>> I think removing "extensible management," might make it read better.
>> -  Provide stable, extensible management, storage,
>>
>>
>
> It is a confusing sentence, now that I look at it again :-). I will
> reword.
>
>> Question/Issue:
>> ---------------
>>
>> p.5
>>
>> - Provide the ability to translate global installation data into a
>>   valid Automated Installer (AI) Manifest.
>>
>> What does this mean? From where? Does it mean from an already
>> installed system for replication?
>
> This means that from any installation process we should be able to
> dump the data collection cache and get an AI XML manifest that can be
> used to 'replicate' that installation.
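
That's a nice property. Just to illustrate what I understand by it (the
element names below are made up; the real AI manifest schema is richer):

    # Illustrative only: dump a flat cache of install data as XML.
    import xml.etree.ElementTree as ET

    def to_ai_manifest(cache):
        root = ET.Element("auto_install")
        for key, value in sorted(cache.items()):
            ET.SubElement(root, key).text = str(value)
        return ET.tostring(root)

    cache = {"target_disk": "c0t0d0",
             "ips_repo": "http://pkg.opensolaris.org"}
    print(to_ai_manifest(cache))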
>
>
>> Question/Issue:
>> ---------------
>>
>> p.5
>>
>> All Common Application objects, as well as New Solaris Installer
>> applications will have read-write
>>
>>
>> I think it may be valuable to provide a means to limit write access...
>> Just thinking it could come in handy for future applications.
>
> You know.. I had originally proposed this as read only in some cases. 
> Then, I got talked out of that. I agree that maybe limiting write 
> access in some cases might be useful, although I haven't had the time 
> to identify those cases yet. I will make a note of this though and 
> revisit later.
>
>
>
>> Question/Issue:
>> ---------------
>>
>> p. 6
>>
>> - to_xml()
>>
>> Throughout the document XML is mentioned. Isn't that an implementation
>> detail that could change? Perhaps XML may be replaced by something else
>> in the future.
>
> Actually, I think this will be removed. Originally, the Application 
> classes, specifically the DeviceData class was going to have a 
> to_xml() method to translate the object itself into xml. But, the way 
> we are going with the data collection it isn't likely that the 
> internal format of the object will be different than the external 
> format, or if it is, it will be hidden from consumers. I will leave 
> this for now, but as I said, it isn't likely to stay. In part because 
> of the extensibility issue you raise above.
>
>
>> Would
>>
>> - to_store() or something be more generic and valuable?
>
> Yes, that's a better name. I am not sure we need a method that does 
> anything like this actually, but we are not far enough in the design 
> of the data classes to know for sure.
>
>> Also, would a from_xml() provide pulling from XML back into an object?
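
To show the storage-neutral interface I have in mind (DeviceData's
shape and the pickle format are placeholders of mine; the point is that
the representation stays hidden from consumers):

    import pickle

    class DeviceData(object):
        def __init__(self, name=None, size=None):
            self.name = name
            self.size = size

        def to_store(self):
            """Serialize for the data collection; format is hidden."""
            return pickle.dumps(self.__dict__)

        @classmethod
        def from_store(cls, blob):
            """Rebuild an object from whatever to_store() produced."""
            obj = cls()
            obj.__dict__.update(pickle.loads(blob))
            return obj

    blob = DeviceData("c0t0d0", 500).to_store()
    copy = DeviceData.from_store(blob)
    print(copy.name, copy.size)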
>
>> Question/Issue:
>> ---------------
>>
>> p.6
>>
>> 2.3 Logging and Error Reporting
>>
>> Should this be: "Logging, Error Reporting and Progress Reporting"?
>
> No, I don't think so. The engine itself manages the progress reporting 
> and each checkpoint is responsible for its own implementation of 
> progress reporting. There is no 'Progress Reporting' class.
>
>
>> Question/Issue:
>> ---------------
>>
>> p.6
>>
>> 2.3 Logging and Error Reporting
>>
>> open(), log_message(), close()
>>
>> I think better names might make the code more readable by indicating
>> these are logger APIs.
>
>
>> e.g.
>>
>> log_open(), log_message(), log_close()
>>
>
> Yes, I agree. I will change.
>
>> Question/Issue:
>> ---------------
>>
>> How will logging be made available to external components? For
>> example, VMC could benefit from logging being made available from the
>> bootable AI running in a VirtualBox client to VMC running on the VBox
>> host.
>
> So, let's chat a bit about how this might work. VMC is running, and at 
> the point in time it has booted the VM, VMC is still running as a 
> process. AI is running inside the VM, but VMC is still the overall 
> process running. VMC will have a log for sure, as it will utilize the 
> Execution Engine as part of its processing, and AI will have a logger 
> as well. Why would VMC want to use the logging from the AI process 
> running inside the VB client if it has access to its own logging?

Well, initially we didn't have a good way for the VMC user to monitor
the progress of the AI install. Glenn came up with a great way to do
that, by documenting how the user can log in to the VM and manually
monitor the AI progress, so this may be less necessary. However, having
some mechanism to communicate to the VMC user what is happening with
the AI install would still be good, especially if the user doesn't want
to log in to the VM.

We should chat and include Glenn.

>
>> Suggestion:
>> -----------
>>
>> I think a checkpoint should implement its own cleanup. Each checkpoint
>> should know what to clean up and how to clean up after itself.
>>
>> I think the Execution Class's cleanup_checkpoints() method should
>> invoke the checkpoint's cleanup entry point.
>>
>> Maybe the Execution Class could contain a register_cleanup_checkpoint()
>> method.
>
> I do think each checkpoint should clean up after itself, and that is
> the expectation. The cleanup_checkpoints() method is something called
> by the InstallerApp on the ExecutionEngine object. I am not sure in
> which scenarios this has to be called; I suppose it could be called in
> the event of an unexpected checkpoint failure or something like that.
> In general, the checkpoint should automatically clean up after itself.
>
> However, it might be useful to have the checkpoints export a cleanup()
> method that the engine calls; even if the checkpoint failed
> unexpectedly, the engine could still call this method on the
> checkpoint and hope it works.
>

Sounds good.
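
Something like this is what I had in mind, in toy form (the names and
the engine's interface here are illustrative only):

    # Each checkpoint exports cleanup(); the engine calls it on failure.
    class Checkpoint(object):
        name = "base"
        def execute(self):
            raise NotImplementedError
        def cleanup(self):
            pass                      # default: nothing to undo

    class MakeZpool(Checkpoint):
        name = "make-zpool"
        def execute(self):
            raise RuntimeError("simulated failure")
        def cleanup(self):
            print("destroying partially created zpool")

    def run(checkpoints):
        for cp in checkpoints:
            try:
                cp.execute()
            except Exception as err:
                print("checkpoint %s failed: %s" % (cp.name, err))
                cp.cleanup()          # best effort: call it and hope
                return False
        return True

    run([MakeZpool()])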


>
>>
>> Question/Issue:
>> ---------------
>>
>> Should this be changed from:
>>
>> To enable extensibility and ease of translation of data to a common
>> data storage format, the "data" classes defined in Section 3 must
>> provide the following method:
>>
>> To:
>>
>> To enable extensibility and ease of translation of data to a common
>> data storage format, the "application" classes defined in Section 3
>> must provide the following method:
>
> As I noted above, this method should be going away, but I agree that 
> this should say application classes. If kept, I will modify this 
> sentence.
>
>
>
>> Question/Issue:
>> ---------------
>>
>> p.7
>>
>> 3.1 Manifest Parser
>>
>> parse()
>>
>> Would an import() or store() also be needed, or is this functionality
>> part of what parse() will do? Is there any reason it might be valuable
>> to make this two steps: validate and import?
>>
>
> ManifestRetrieval, which isn't outlined in detail in this document (it
> is an Application Specific Class, although I am not sure it should
> be), is the object that would import() the manifest.
>
> As for parse-and-validate versus parse, validate, and store, you are
> the 2nd person who has mentioned that maybe we need multiple methods
> on this class, so I think I should take a look. It would make it a
> clearer interface to separate these out.
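
Separating them might look roughly like this (the schema handling is
elided, and the method split is only a suggestion of mine):

    import xml.etree.ElementTree as ET

    class ManifestParser(object):
        def __init__(self, path):
            self.path = path
            self.tree = None

        def validate(self):
            # Real syntactic validation would check against the AI
            # schema; this only confirms the file is well-formed XML.
            ET.parse(self.path)

        def parse(self):
            self.tree = ET.parse(self.path)

        def store(self, data_collection):
            # Flatten into the data collection cache (a dict stand-in).
            for elem in self.tree.getroot():
                data_collection[elem.tag] = elem.text

Each step could then be called, and tested, on its own: validate() at
manifest-retrieval time, parse() and store() inside the checkpoint.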
>
>
>> Question/Issue:
>> ---------------
>>
>> p.7
>>
>> 3.1 Manifest Parser
>>
>> The requirement defined for this component is:
>> - Must provide syntactic validation of user-provided manifests
>>
>> Should the requirement also include the following?
>>
>> - Storing the manifest data in the storage cache utilizing the Data
>> Collection Class.
>
> No, I don't think that that functionality, while necessary, is a 
> function of the ManifestParser itself. It has to store data, but so do 
> a lot of the other Application Classes. I don't consider the storage 
> of the data in Data Collection a requirement on those classes either.
>
>
>> Question/Issue:
>> ---------------
>>
>> p.8
>>
>> 3.2 Target Discovery
>>
>> The bullet item list contains both:
>>
>> - Partitions, both FAT and GPT
>> - Slices
>>
>> and again on p.9
>>
>> - Partition
>> - Slice
>>
>> What's the difference between a partition and a slice?
>
> Partition in this document means x86 logical partition, such as fdisk 
> and GPT. Slice is a VTOC slice.
>
>
>> Question:
>> ---------
>>
>> p.8
>>
>> - Virtual Machines hosted by a Solaris xVM Dom0
>>
>> What's this?
>
> This refers to xVM DomU's which are hosted on a Solaris xVM Dom0,
> which is Domain 0, a control domain. The reason it has to be a Solaris
> Dom0 is that our tools don't run on, say, Linux, so if the Dom0 isn't
> Solaris, then we cannot do our discovery of the xVM DomU's.
>
>
>>
>> Question/Issue:
>> ---------------
>>
>> p.9
>>
>> to_xml()
>>
>> Again, I think to_store(), or the like, might be better than "xml",
>> which is an implementation detail that could change.
>
> Yes, I agree. Am rethinking this. Expect a change in this area.
>
>
>> Question/Issue:
>> ---------------
>>
>> p.9
>>
>> DiskClass
>>
>> Could a get_dev_path() be valuable? E.g., would something which
>> returns /dev/dsk/xyx be valuable?
>>
>
> Actually, yes, I think we should. When the target instantiation runs 
> we need the dev path to use for creation of the zpool. Good catch! I 
> will add.
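
For what it's worth, I was picturing something as small as this (the
ctd naming and the class shape are my assumptions):

    class DiskClass(object):
        def __init__(self, ctd):
            self.ctd = ctd                    # e.g. "c0t0d0"

        def get_dev_path(self, slice_num=0):
            """The /dev/dsk path TI would hand to zpool create."""
            return "/dev/dsk/%ss%d" % (self.ctd, slice_num)

    print(DiskClass("c0t0d0").get_dev_path())  # /dev/dsk/c0t0d0s0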
>
>
>> Question/Issue:
>> ---------------
>>
>> p.11
>>
>> ZpoolClass
>>
>> Do we need to support more of what zpool(1m) provides?
>> e.g. create, iostat, remove, status
>
> The 'create' is handled by the LogicalTarget class 'instantiate()' 
> method. We have a 'get_status()' method, which is essentially a zpool 
> status call. The iostat I can't see us needing. It isn't likely we 
> would ever 'remove' a device from a pool during install. I can't think 
> of any use cases where that would be required. A 'destroy' might 
> happen though, and an 'import'. I will take a look at this and see if 
> we need to add more methods to support the other types of zpool things 
> we need to do.
>
>
>> Question:
>> ---------
>>
>> p.12
>>
>> LogicalVolume Class
>>
>> -  set_size()
>>
>> Won't we need to get logical volume info/data?
>
> Well.. what we need to get for this is described in the super class, 
> LogicalTarget. I don't think we need additional methods besides those 
> defined.
>
> For non-ZFS-ZVOL logical volumes, what we care about are their
> associations, so we can show which devices are in use, their names,
> their mountpoints, and that's about it. The other attributes of, say,
> an SVM volume are not interesting to us. The attributes of a ZFS ZVOL
> are described by the methods in the LogicalTarget class, and the only
> other method we need is to be able to 'set_size()' for a ZVOL we are
> creating.
>
>
>> Question/Issue:
>> ---------------
>>
>> p.13
>>
>> Dataset Class
>>
>> Do we need to support more of what zfs(1m) provides?
>
> Possibly. The ones I list are the ones I thought were required
> for processing an installation.  I took a quick glance at the methods
> offered by zfs(1m) and don't see any that look like things we would 
> ever do during an install. But, I could be wrong :-). Do you have 
> specific things in mind?


No, nothing specific... just thinking in terms of providing a fully
functional engine...
>
>
>> Question/Issue:
>> ---------------
>>
>> p.13
>>
>> Zone Class
>>
>> Do we need to support more of what zoneadm(1m) provides?
>
> No, same argument as for Dataset class. If you have specific things in 
> mind, let me know.
>
>
>> Question/Issue:
>> ---------------
>>
>> p.14
>>
>> 3.4 Transfer
>>
>> -  SVR4 packages
>>
>> Why SVR4?
>
> Because we have SVR4 package types in many customer sites. As a matter 
> of fact, the bulk of the customers we talked to have SVR4 packages for 
> their application stacks. We need to allow for this and enable them to 
> make the transition to S Next more easily.
>
>>
>> Should we account for support of virtual import of VMC output?
>
> How would we use the VMC output in a transfer phase? Can you describe 
> what you are thinking?


No, not in a transfer phase, but perhaps the core engine could provide
a mechanism to allow for a new app which could guide the import of
existing VMC output. E.g., help the user import an OVF into a new VM,
or convert an OVF to some as-yet-unidentified format and then import
it...

>
>
>> Question:
>> ---------
>>
>> p.15
>>
>> -  data archive
>>
>> What's data archive for?
>
> That's meant to represent the replication or recovery archive. It is 
> essentially a 'data' archive.
>
>
>
>
>> Question/Issue:
>> ---------------
>>
>> p.16
>>
>> Shouldn't the ManifestParser have lines directly down to the components,
>> e.g. Transfer, TD & TI?
>
> You mean in terms of the 'usage' lines? ManifestParser doesn't use 
> these  classes that I can see. It parses and stores the data in Data 
> Collection. Maybe it should have a usage line to DataCollection, as 
> should the Transfer, TargetDiscovery and TargetInstantiation.

Ah... of course. It parses the data and "stores" it in the Data
Collection... never mind! ;)

>
>> Or is Checkpointing going to provide the data from the manifest?
>
>
> Each checkpoint must store and get data from the data collection. The 
> data from the manifest is read in, parsed and stored in data 
> collection. Each subsequent checkpoint then must read data from data
> collection as it needs. Is this clear?

Perfect. I get it now. Thanks.
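
For anyone else following along, here is that flow in toy form (a plain
dict stands in for Data Collection, and the checkpoint bodies are
invented):

    data_collection = {}

    def manifest_parser_cp():
        # parsing elided; pretend the manifest named a target disk
        data_collection["target_disk"] = "c0t0d0"

    def target_instantiation_cp():
        disk = data_collection["target_disk"]   # read what was stored
        data_collection["zpool"] = "rpool on " + disk

    for checkpoint in (manifest_parser_cp, target_instantiation_cp):
        checkpoint()

    print(data_collection)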

>
>>
>> Question:
>> ---------
>>
>> p.16
>>
>> What is the flow of data collection? Is it only through the
>> ExecutionEngine?
>
>
> No, it can be direct as well. I noticed, as I mentioned above, that 
> the diagram looks as if DataCollection can only be accessed through 
> the ExecutionEngine. That isn't true. I will fix this.
>
>> Comment:
>> --------
>>
>> p. 18-23
>>
>> These diagrams are great! Very helpful.
>>
>> +-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=
>> +-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=
>>
>> Below are comments I have from an older version of the document:
>> +-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=
>> +-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=+-=
>
>
> I am assuming that your comments start with a '-'? That's how I 
> answered them anyway. If not, let me know and I will re-answer.

That's correct... sorry I didn't do a better job formatting these older 
comments...

>
>>
>> 3.2.5 Assumptions
>> -----------------
>>
>> -  We will not have one monolithic library that provides a single
>>    consumer interface.
>>
>>     - Not having one monolithic library is a good idea
>>
>>        - I think we should strive to have a common API standard across
>>     Caiman engine components.
>
> Agreed, that is the intent of this architecture.
>
>>
>>
>> - Consumers must maintain the state of their specific application data.
>>
>>     - I think the consumer should maintain state but do so by using
>>     a common engine interface. We should supply a state-holding module.
>
> Consumers are responsible for some application-specific state;
> however, as the checkpoints proceed, any state that is required for a
> checkpoint must be stored prior to running that checkpoint. So, for
> example, the interactive installers gather user data along the way,
> and the user may go back and forth and change their mind. In this
> case the interactive installation app is responsible for managing the
> state data until the user presses 'go' for the install. Then the
> state data is stored in data collection and is no longer in the hands
> of the application.
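
Understood. In toy form, the hand-off you describe (all names mine):

    data_collection = {}

    class InteractiveApp(object):
        def __init__(self):
            self.choices = {}            # app-private; user may change

        def select(self, key, value):
            self.choices[key] = value    # back-and-forth edits land here

        def go(self):
            # 'go' pressed: state leaves the app for data collection
            data_collection.update(self.choices)

    app = InteractiveApp()
    app.select("target_disk", "c0t0d0")
    app.select("target_disk", "c1t0d0")  # the user changed their mind
    app.go()
    print(data_collection)               # {'target_disk': 'c1t0d0'}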
>
>
>> - We will utilize the existing DC finalizer engine technology as a
>>    starting point for the installation driver engine. It will be
>>    re-designed to accommodate a wider consumer audience.
>>
>>     - I think leveraging the Finalizer engine is a good idea; however,
>>     fundamental changes would be needed.
>>     e.g.
>>         - Passing run-time-gathered arguments to finalizer scripts
>>         - Being able to invoke finalizer scripts from different modules.
>
> In the new architecture, I think that any run-time-gathered arguments,
> from target discovery for example, will be stored in the data
> collection, and subsequent checkpoints read the data from data
> collection. A checkpoint is like a finalizer script, except that data
> is managed in a central place: not passed, but stored and ready for
> the next checkpoint to use. This makes the 'data' the 'interface'
> between checkpoints, which helps decouple them and makes adding new
> checkpoints easier, imo.

Agreed.
>
> What do you mean by being able to invoke finalizer scripts from 
> different modules?

Simply being able to invoke them outside of how they are invoked today.

>
>
>> - Changing out the liborchestrator code completely will happen over
>> time.
>>
>>     - I think supporting a phased-in approach to replacing the
>>     libraries with the new modular components may present excessive
>>     overhead. I think it would be beneficial if the new engine framework
>>     were mostly in place. Then, once we have the engine foundation, we
>>     could migrate the adoption of the new engine, leaving some on the
>>     old model as we work the migration schedule.
>
> I have attached a document which outlines the way I see the 
> migration/transition happening, at a high level. Take a look at this 
> and let me know if you see major issues.

Perhaps we should chat about the trans_integration_plan.txt; I would
need some help understanding it all.



>
>
>
>> 3.3.2 Test Environments
>> -----------------------
>>
>>     - A well defined API to modular components will facilitate
>>     easier testing. I think testing of all of the different installers
>>     in all of the supported environments may not be as necessary if
>>     the supporting common modular components can be effectively
>>     exercised in each test environment.
>>
>> 4. Functional Description
>> -------------------------
>>
>> - The description of using callbacks in section 4 (Functional
>> Description) may not be accurate, as that approach would be
>> associated with a functional "C" solution. If we were to consider an
>> OOD approach, "callbacks" may not be correct.
>>
>> - What's the "Data collection object"? Is it logging?
>
> No, it's simply a cache to hold data.
>>
>> - I understand that the new Caiman Engine needs to provide the
>> functionality of what is currently in libO; however, I think we should
>> design what is required independent of libO and of how libO is
>> implemented today.
>
> I agree.  We are doing that.
>>
>> - We should design a consistent method of managing and passing
>> consumer data between the engine modules. The parser/validation
>> module describes some of how this will be done for user data from
>> XML. I think we should consider having a user-data-gathering engine
>> module that is independent of where the data comes from, XML or GUI.
>> Some data will also need to be gathered and acted on between engine
>> modules. E.g., target discovery finds devices, the user selects which
>> to install to, then target instantiation works on that device... A
>> single common module should be used to manage gathering and passing
>> this data. This will contribute to extensibility.
>>
>> OK I see: This is the Data Collection Object
>>
>> 5.1 Consumers.
>> --------------
>>
>> I see DC, and perhaps R&R, the same as LiveCD, TextI, AIC and VMC.
>>
>
> Yep. I agree.
>
> Thank you again for the detailed review!
>
> sarah

