jeanm wrote:
> From reading both Dave and your response I think the confusion is due
> to what we consider a checkpoint. I consider a checkpoint a single
> place in time, that is when you actually perform the zfs snapshot
> (yeah, that's implementation specific but you get the idea). You two
> seem to consider it the whole step.

Yes, basically you are correct. A checkpoint is a grouping of
functionality, such as "Target Discovery", which is executed as a whole
(unless there is a failure). An analogous checkpoint in DC is the im-pop
piece. All IPS packages are installed at one time, and as far as I know
you cannot stop in the middle of this finalizer and restart.
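Roughly, a checkpoint here is a whole step that either completes or
fails. A minimal sketch of the idea in Python (purely illustrative, not
the engine code; the names and the snapshot call are just stand-ins):

    class Checkpoint:
        """One named step, e.g. "Target Discovery", run as a whole."""
        def __init__(self, name, func):
            self.name = name
            self.func = func          # all of the step's functionality

    def record_step_done(name):
        # stand-in for whatever the engine actually records when a step
        # finishes (a snapshot, a state file, ...)
        print("completed checkpoint:", name)

    def run_checkpoint(cp):
        cp.func()                     # runs to completion or raises
        record_step_done(cp.name)     # only recorded once the whole step is done

So the "single place in time" you describe would be the record taken
after the whole step has run, not a point in the middle of it.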
sarah
****

>
> Correct?
>
> Jean
>
> Sarah Jelinek wrote:
>> Hi Jean,
>>
>>
>>> Sarah Jelinek wrote:
>>>> Hi Sue,
>>>>
>>>> Thank you for reviewing this. Comments inline..
>>>>
>>>>
>>>>> Hi Sarah,
>>>>>
>>>>> Here are my comments.
>>>>>
>>>>> 1.0 Does the distinction between 2,3 and 4,5 mean that one should
>>>>> be able to boot from one type of environment (network, media) and
>>>>> install from another?
>>>>
>>>> Yes, basically that's what this means. For example, you could boot
>>>> the text installer from local media, but install from an IPS
>>>> repository.
>>>>
>>>>
>>>>> For 10, I assume you mean the first boot after the installation is
>>>>> complete? If so, adding those words would make it clearer.
>>>>
>>>> Yes, that's what I mean. I will modify this. You are the 2nd person
>>>> to note that this sentence is confusing.
>>>>
>>>>
>>>>> The term "this project" or "the project" is used in various places
>>>>> (first referenced towards the bottom of p2). What does that term
>>>>> mean wrt this document? Everything under the Caiman umbrella? Just
>>>>> the implementation of the architecture (core engine and common app
>>>>> classes)?
>>>>
>>>> In this document it refers to just the components of this document.
>>>> But, the use of 'the project' implies it is one project and it
>>>> isn't. Let me think about how to refer to this work in a way that is
>>>> clearer.
>>>>
>>>>
>>>>> 2.1 "The engine is capable of beginning execution at the next
>>>>> unreached checkpoint". So if one has reached and stopped at
>>>>> checkpoint 1, they can continue from checkpoint 2??
>>
>> In our architecture, we will have completed the functionality that is
>> provided by checkpoint 1, and thus can start with checkpoint 2 if
>> desired.
>>>>
>>>> If one has reached and successfully run checkpoint 1, then they can
>>>> start at checkpoint 2.
>>>
>>
>>>
>>> Now both Sue and I have commented on this so I don't feel so bad
>>> poking because it means I'm not really that dense.
>>>
>>> How does this work?
>>>
>>> There are 2 ways that I see checkpoints could work: 1) you checkpoint
>>> before the functionality for the checkpoint (like DC does it), or 2)
>>> you checkpoint after the functionality for that checkpoint.
>>>
>>> Case 1: You run checkpoint 1. That means functionality between chkpt
>>> 1 and chkpt 2 has not been run, so how can you "begin" from 2?
>>
>> I am not quite sure what you mean by functionality between checkpoint
>> 1 and 2, but to clarify, when we have run checkpoint 1, we have
>> actually run that checkpoint to its completion. So, the functionality
>> that is next is in checkpoint 2.
>>
>>> picture:
>>>
>>> ----> chkpt 1
>>> some functionality for step 1
>>> ----> chkpt 2
>>> some functionality for step 2
>>>
>>> You will now miss running the functionality for step 1
>>>
>>
>> No, we won't. The engine won't start at checkpoint 2 unless checkpoint
>> 1 and the functionality provided by checkpoint 1 are completed
>> successfully. There is no gap between 1 and 2.
>>
>>> Case 2: You run checkpoint 1. That means you have run the
>>> functionality associated with chkpt 1 but not yet 2. Since chkpt 2 is
>>> after that functionality, again you miss something.
>>
>> The engine must make sure that the user never asks for running a
>> checkpoint that is > 1 past the previously run, completed and
>> successful checkpoint. In the case above you run checkpoint 1, in
>> which the functionality is fully encapsulated in that 'checkpoint'.
>> There is no functionality between 1 and 2 that won't get run.
>>
>>
>>> picture:
>>> functionality for step 1
>>> ----> chkpt 1
>>> functionality for step 2
>>> -----> chkpt 2
>>>
>>> In this case you're missing the functionality for step 2
>>>
>>> Are you considering some kind of funky overlap? Like the following
>>> picture?
>>>
>>> ---> starting checkpoint for step 1
>>> functionality for step 1
>>> ---> completion checkpoint for step 1
>>> ----> starting checkpoint for step 2
>>> functionality for step 2
>>> ---> completion checkpoint for step 2
>>>
>>> So you would be aliasing end checkpoint for step n = start checkpoint
>>> for n+1?
>>
>> No, when a user requests to run a checkpoint N, then we actually run
>> that checkpoint N, and the functionality is executed for N. So, they
>> can either re-run N, or ask to run N + 1. Which means we begin
>> execution at N + 1.
>>
>> thanks,
>> sarah
>> ****
>>
>>> Jean
>>>
>>>
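In other words, the rule the engine would enforce is simply that a
requested starting point can never be more than one past the last
successfully completed checkpoint. A rough sketch (illustrative Python
only, not the actual engine interface):

    def resume_point_ok(requested, last_completed):
        # a caller may re-run an already completed checkpoint, or start
        # at the next unreached one, but may not skip further ahead
        return 1 <= requested <= last_completed + 1

    # checkpoint 1 has run to completion:
    assert resume_point_ok(2, last_completed=1)      # begin at 2
    assert resume_point_ok(1, last_completed=1)      # or re-run 1
    assert not resume_point_ok(3, last_completed=1)  # never skip to 3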
>>>>> remove_logger: from execution->from use during execution
>>>>> Similar for remove_progress_receiver
>>>>>
>>>>
>>>> Will change.
>>>>
>>>>
>>>>> 2.2
>>>>> o to_xml(): why did you choose xml? what else was considered?
>>>>
>>>> this will be going away. Even if the internal store is XML, we won't
>>>> be exposing that to the consumers. We haven't settled on the format
>>>> of the cache and it doesn't matter what it is anyway.
>>>>
>>>>> o retrieve_data(): (nit) methods->method
>>>>
>>>> Fixed.
>>>>
>>>>> o dump_cache(), "contained in the cache in an AI manifest format".
>>>>> Maybe add a reference or footnote here for the reader who doesn't
>>>>> know what this format is?
>>>>
>>>> Sure, good idea. I will add.
>>>>
>>>>
>>>>> 2.3 Is the implementation of the architecture assumed to be in
>>>>> Python? (This section refers to the Python Logging module.) If so,
>>>>> perhaps you should explicitly state that up front somewhere.
>>>>
>>>> No, it hasn't been decided if the whole architecture will be
>>>> implemented in python. Sundar pointed out that this implementation
>>>> detail doesn't really belong in the arch doc, and he is right. I
>>>> will remove this.
>>>>
>>>>
>>>>> 3 Mentions AI and DC. Should there be references here?
>>>>
>>>> Yes. I will add.
>>>>
>>>>
>>>>> 3.1
>>>>> o add comma after: "After parsing and validating the manifest"
>>>>> o under parse(): will do -> does
>>>>
>>>> Fixed.
>>>>
>>>>> 3.2
>>>>> o Second paragraph, maybe add "in the case of x86" before disk
>>>>> partitions
>>>>
>>>> ok.
>>>>
>>>>> o (nit) passed in to the objects -> passed in to the object's
>>>>
>>>> Fixed.
>>>>
>>>>> o The See section 3.2 should reference 3.2.1.1 and 3.2.1.2,
>>>>> otherwise it points to itself.
>>>>
>>>> Fixed.
>>>>
>>>>> o Not clear to me why the seed is needed?
>>>>
>>>> Target discovery can be done on many types of targets. For example,
>>>> with installation support for both S10 and S Next in an existing
>>>> zpool, we need to discover the existing zpool, not the devices such
>>>> as disks. So, the 'seed' value that is passed in to the
>>>> TargetDiscovery class tells it what type of discovery we need it to
>>>> do. We want to make this class useful to all Caiman applications
>>>> that require target discovery.
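To make the role of the seed a bit more concrete, something along these
lines (hypothetical Python, just to illustrate the idea; the real class
and method names may well differ):

    class TargetDiscovery:
        def __init__(self, seed):
            # the seed tells the class what kind of targets to look for,
            # e.g. "disks" for a fresh install, or "zpools" when
            # installing into an existing pool
            self.seed = seed

        def discover(self):
            if self.seed == "zpools":
                return self.find_existing_zpools()
            return self.find_disks()

        def find_existing_zpools(self):
            return []   # placeholder: would query the system's pools

        def find_disks(self):
            return []   # placeholder: would query the system's disks

    # one application might do TargetDiscovery("disks").discover(),
    # while an install into an existing pool would pass "zpools".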
>>>>
>>>>
>>>>> 3.2.1.1
>>>>> o Disk Class - there are two get_assoc_slices(). One should be a
>>>>> set_assoc_slices().
>>>>
>>>> Fixed.
>>>>
>>>>> o Partition Class - should mention x86 only
>>>> ok.
>>>>> o Slice Class, what does get_flags do? What kind of flags? Maybe
>>>>> change name to be more descriptive?
>>>>>
>>>>
>>>> Probably needs to be more descriptive. It is really, in this doc,
>>>> just a way to show we have to worry about slice flags, such as ROOT
>>>> or USR settings, but the details will go into the design document
>>>> for the data modeling work.
>>>>
>>>>> 3.2.1.2
>>>>> o Logical devices - It seems odd that there is an instantiate
>>>>> method under TD. There are also several set methods which may fall
>>>>> into the same category. Do these classes have methods for both TD
>>>>> and TI? If so, the document should be more clear on that point.
>>>>
>>>> Yes, they have methods for both TD and TI, since we want to reuse
>>>> the classes. I can make this clearer. The 'instantiate()' method
>>>> will likely be renamed to ensure that it is clear what it is being
>>>> used for.
>>>>
>>>>> o Dataset Class - represents a ZFS datasets -> represents a ZFS
>>>>> dataset
>>>>
>>>> Fixed.
>>>>
>>>>> 4.1 You point to the secure wiki for a detailed class diagram.
>>>>> Since this will be an open document, shouldn't the class diagram
>>>>> be moved to osol so external users can see it?
>>>>
>>>> I will move these to opensolaris.
>>>>
>>>>>
>>>>> 4.2.1 I am seeing: "The following diagram Error: Reference source
>>>>> not found, broken into multiple...".
>>>> Yeah.. I saw that too. Not sure why. I will fix.
>>>>
>>>>> 4.2.2 The 4.2 in "Drawing 4.2 shows the sequence of Target
>>>>> Discovery" should be 4.3
>>>>
>>>> Fixed.
>>>>
>>>> Thanks again for the review.
>>>>
>>>> sarah
>>>> ****
>>>>>
>>>>> Thanks,
>>>>> Sue
>>>> _______________________________________________
>>>> caiman-discuss mailing list
>>>> caiman-discuss at opensolaris.org
>>>> http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
>>>
>
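To make the class discussion in 3.2.1.1/3.2.1.2 above a bit more
concrete: roughly, a slice carries flags such as ROOT or USR, and a
dataset has setters filled in by Target Discovery plus a separate method
used by Target Instantiation. A purely illustrative sketch in Python
(the names are placeholders, not the actual Caiman classes):

    class Slice:
        def __init__(self, name):
            self.name = name
            self.flags = set()          # e.g. {"ROOT"} or {"USR"}

        def set_flags(self, flags):     # TD records what it found
            self.flags = set(flags)

        def get_flags(self):
            return self.flags

    class Dataset:
        """Represents a ZFS dataset; used by both TD and TI."""
        def __init__(self, name):
            self.name = name
            self.mountpoint = None

        def set_mountpoint(self, mp):   # TD: record what discovery found
            self.mountpoint = mp

        def create(self):               # TI: what 'instantiate()' might become
            # placeholder: a real implementation would create the dataset
            # on the system rather than just printing
            print("zfs create", self.name)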
