jeanm wrote:
> From reading both Dave and your response I think the confusion is due
> to what we consider a checkpoint. I consider a checkpoint a single
> place in time, that is when you actually perform the zfs snapshot
> (yeah, that's implementation specific but you get the idea). You two
> seem to consider it the whole step.

Yes, basically you are correct. A checkpoint is a grouping of
functionality, such as "Target Discovery", which is executed as a whole
(unless there is a failure). An analogous checkpoint in DC is the im-pop
piece. All IPS packages are installed at one time, and as far as I know
you cannot stop in the middle of this finalizer and restart.
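Roughly, a checkpoint here is a whole step that either completes or
fails. A minimal sketch of the idea in Python (purely illustrative, not
the engine code; the names and the snapshot call are just stand-ins):

    class Checkpoint:
        """One named step, e.g. "Target Discovery", run as a whole."""
        def __init__(self, name, func):
            self.name = name
            self.func = func          # all of the step's functionality

    def record_step_done(name):
        # stand-in for whatever the engine actually records when a step
        # finishes (a snapshot, a state file, ...)
        print("completed checkpoint:", name)

    def run_checkpoint(cp):
        cp.func()                     # runs to completion or raises
        record_step_done(cp.name)     # only recorded once the whole step is done

So the "single place in time" you describe would be the record taken
after the whole step has run, not a point in the middle of it.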
sarah
****

>
> Correct?
>
> Jean
>
> Sarah Jelinek wrote:
>> Hi Jean,
>>
>>
>>> Sarah Jelinek wrote:
>>>> Hi Sue,
>>>>
>>>> Thank you for reviewing this. Comments inline..
>>>>
>>>>
>>>>> Hi Sarah,
>>>>>
>>>>> Here are my comments.
>>>>>
>>>>> 1.0 Does the distinction between 2,3 and 4,5 mean that one should
>>>>> be able to boot from one type of environment (network, media) and
>>>>> install from another?
>>>>
>>>> Yes, basically that's what this means. For example, you could boot
>>>> the text installer from local media, but install from an IPS
>>>> repository.
>>>>
>>>>
>>>>> For 10, I assume you mean the first boot after the installation is
>>>>> complete? If so, adding those words would make it clearer.
>>>>
>>>> Yes, that's what I mean. I will modify this. You are the 2nd person
>>>> to note that this sentence is confusing.
>>>>
>>>>
>>>>> The term "this project" or "the project" is used in various places
>>>>> (first referenced towards the bottom of p2). What does that term
>>>>> mean wrt this document? Everything under the Caiman umbrella? Just
>>>>> the implementation of the architecture (core engine and common app
>>>>> classes)?
>>>>
>>>> In this document it refers to just the components of this document.
>>>> But, the use of 'the project' implies it is one project and it
>>>> isn't. Let me think about how to refer to this work in a way that is
>>>> clearer.
>>>>
>>>>
>>>>> 2.1 "The engine is capable of beginning execution at the next
>>>>> unreached checkpoint". So if one has reached and stopped at
>>>>> checkpoint 1, they can continue from checkpoint 2??
>>
>> In our architecture, we will have completed the functionality that is
>> provided by checkpoint 1, and thus can start with checkpoint 2 if
>> desired.
>>>>
>>>> If one has reached and successfully run checkpoint 1, then they can
>>>> start at checkpoint 2.
>>>
>>
>>>
>>> Now both Sue and I have commented on this so I don't feel so bad
>>> poking because it means I'm not really that dense.
>>>
>>> How does this work?
>>>
>>> There are 2 ways that I see checkpoints could work: 1) you checkpoint
>>> before the functionality for the checkpoint (like DC does it), or 2)
>>> you checkpoint after the functionality for that checkpoint.
>>>
>>> Case 1: You run checkpoint 1. That means functionality between chkpt
>>> 1 and chkpt 2 has not been run, so how can you "begin" from 2?
>>
>> I am not quite sure what you mean by functionality between checkpoint
>> 1 and 2, but to clarify, when we have run checkpoint 1, we have
>> actually run that checkpoint to its completion. So, the functionality
>> that is next is in checkpoint 2.
>>
>>> picture:
>>>
>>> ----> chkpt 1
>>> some functionality for step 1
>>> ----> chkpt 2
>>> some functionality for step 2
>>>
>>> You will now miss running the functionality for step 1
>>>
>>
>> No, we won't. The engine won't start at checkpoint 2 unless checkpoint
>> 1 and the functionality provided by checkpoint 1 are completed
>> successfully. There is no gap between 1 and 2.
>>
>>> Case 2: You run checkpoint 1. That means you have run the
>>> functionality associated with chkpt 1 but not yet 2. Since chkpt 2 is
>>> after that functionality, again you miss something.
>>
>> The engine must make sure that the user never asks for running a
>> checkpoint that is > 1 past the previously run, completed and
>> successful checkpoint. In the case above you run checkpoint 1, in
>> which the functionality is fully encapsulated in that 'checkpoint'.
>> There is no functionality between 1 and 2 that won't get run.
>>
>>
>>> picture:
>>> functionality for step 1
>>> ----> chkpt 1
>>> functionality for step 2
>>> -----> chkpt 2
>>>
>>> In this case you're missing the functionality for step 2
>>>
>>> Are you considering some kind of funky overlap? Like the following
>>> picture?
>>>
>>> ---> starting checkpoint for step 1
>>> functionality for step 1
>>> ---> completion checkpoint for step 1
>>> ----> starting checkpoint for step 2
>>> functionality for step 2
>>> ---> completion checkpoint for step 2
>>>
>>> So you would be aliasing end checkpoint for step n = start checkpoint
>>> for n+1?
>>
>> No, when a user requests to run a checkpoint N, then we actually run
>> that checkpoint N, and the functionality is executed for N. So, they
>> can either re-run N, or ask to run N + 1. Which means we begin
>> execution at N + 1.
>>
>> thanks,
>> sarah
>> ****
>>
>>> Jean
>>>
>>>
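In other words, the rule the engine would enforce is simply that a
requested starting point can never be more than one past the last
successfully completed checkpoint. A rough sketch (illustrative Python
only, not the actual engine interface):

    def resume_point_ok(requested, last_completed):
        # a caller may re-run an already completed checkpoint, or start
        # at the next unreached one, but may not skip further ahead
        return 1 <= requested <= last_completed + 1

    # checkpoint 1 has run to completion:
    assert resume_point_ok(2, last_completed=1)      # begin at 2
    assert resume_point_ok(1, last_completed=1)      # or re-run 1
    assert not resume_point_ok(3, last_completed=1)  # never skip to 3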
>>>>> remove_logger: from execution->from use during execution
>>>>> Similar for remove_progress_receiver
>>>>>
>>>>
>>>> Will change.
>>>>
>>>>
>>>>> 2.2
>>>>> o to_xml(): why did you choose xml? what else was considered?
>>>>
>>>> this will be going away. Even if the internal store is XML, we won't
>>>> be exposing that to the consumers. We haven't settled on the format
>>>> of the cache and it doesn't matter what it is anyway.
>>>>
>>>>> o retrieve_data(): (nit) methods->method
>>>>
>>>> Fixed.
>>>>
>>>>> o dump_cache(), "contained in the cache in an AI manifest format".
>>>>> Maybe add a reference or footnote here for the reader who doesn't
>>>>> know what this format is?
>>>>
>>>> Sure, good idea. I will add.
>>>>
>>>>
>>>>> 2.3 Is the implementation of the architecture assumed to be in
>>>>> Python? (This section refers to the Python Logging module.) If so,
>>>>> perhaps you should explicitly state that up front somewhere.
>>>>
>>>> No, it hasn't been decided if the whole architecture will be
>>>> implemented in python. Sundar pointed out that this implementation
>>>> detail doesn't really belong in the arch doc, and he is right. I
>>>> will remove this.
>>>>
>>>>
>>>>> 3 Mentions AI and DC. Should there be references here?
>>>>
>>>> Yes. I will add.
>>>>
>>>>
>>>>> 3.1
>>>>> o add comma after: "After parsing and validating the manifest"
>>>>> o under parse(): will do -> does
>>>>
>>>> Fixed.
>>>>
>>>>> 3.2
>>>>> o Second paragraph, maybe add "in the case of x86" before disk
>>>>> partitions
>>>>
>>>> ok.
>>>>
>>>>> o (nit) passed in to the objects -> passed in to the object's
>>>>
>>>> Fixed.
>>>>
>>>>> o The See section 3.2 should reference 3.2.1.1 and 3.2.1.2,
>>>>> otherwise it points to itself.
>>>>
>>>> Fixed.
>>>>
>>>>> o Not clear to me why the seed is needed?
>>>>
>>>> Target discovery can be done on many types of targets. For example,
>>>> with installation support for both S10 and S Next in an existing
>>>> zpool, we need to discover the existing zpool, not the devices such
>>>> as disks. So, the 'seed' value that is passed in to the
>>>> TargetDiscovery class tells it what type of discovery we need it to
>>>> do. We want to make this class useful to all Caiman applications
>>>> that require target discovery.
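To make the role of the seed a bit more concrete, something along these
lines (hypothetical Python, just to illustrate the idea; the real class
and method names may well differ):

    class TargetDiscovery:
        def __init__(self, seed):
            # the seed tells the class what kind of targets to look for,
            # e.g. "disks" for a fresh install, or "zpools" when
            # installing into an existing pool
            self.seed = seed

        def discover(self):
            if self.seed == "zpools":
                return self.find_existing_zpools()
            return self.find_disks()

        def find_existing_zpools(self):
            return []   # placeholder: would query the system's pools

        def find_disks(self):
            return []   # placeholder: would query the system's disks

    # one application might do TargetDiscovery("disks").discover(),
    # while an install into an existing pool would pass "zpools".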
>>>>
>>>>
>>>>> 3.2.1.1
>>>>> o Disk Class - there are two get_assoc_slices(). One should be a
>>>>> set_assoc_slices().
>>>>
>>>> Fixed.
>>>>
>>>>> o Partition Class - should mention x86 only
>>>> ok.
>>>>> o Slice Class, what does get_flags do? What kind of flags? Maybe
>>>>> change name to be more descriptive?
>>>>>
>>>>
>>>> Probably needs to be more descriptive. It is really, in this doc,
>>>> just a way to show we have to worry about slice flags, such as ROOT
>>>> or USR settings, but the details will go into the design document
>>>> for the data modeling work.
>>>>
>>>>> 3.2.1.2
>>>>> o Logical devices - It seems odd that there is an instantiate
>>>>> method under TD. There are also several set methods which may fall
>>>>> into the same category. Do these classes have methods for both TD
>>>>> and TI? If so, the document should be more clear on that point.
>>>>
>>>> Yes, they have methods for both TD and TI, since we want to reuse
>>>> the classes. I can make this clearer. The 'instantiate()' method
>>>> will likely be renamed to ensure that it is clear what it is being
>>>> used for.
>>>>
>>>>> o Dataset Class - represents a ZFS datasets -> represents a ZFS
>>>>> dataset
>>>>
>>>> Fixed.
>>>>
>>>>> 4.1 You point to the secure wiki for a detailed class diagram.
>>>>> Since this will be an open document, shouldn't the class diagram
>>>>> be moved to osol so external users can see it?
>>>>
>>>> I will move these to opensolaris.
>>>>
>>>>>
>>>>> 4.2.1 I am seeing: "The following diagram Error: Reference source
>>>>> not found, broken into multiple...".
>>>> Yeah.. I saw that too. Not sure why. I will fix.
>>>>
>>>>> 4.2.2 The 4.2 in "Drawing 4.2 shows the sequence of Target
>>>>> Discovery" should be 4.3
>>>>
>>>> Fixed.
>>>>
>>>> Thanks again for the review.
>>>>
>>>> sarah
>>>> ****
>>>>>
>>>>> Thanks,
>>>>> Sue
>>>> _______________________________________________
>>>> caiman-discuss mailing list
>>>> caiman-discuss at opensolaris.org
>>>> http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
>>>
>
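To make the class discussion in 3.2.1.1/3.2.1.2 above a bit more
concrete: roughly, a slice carries flags such as ROOT or USR, and a
dataset has setters filled in by Target Discovery plus a separate method
used by Target Instantiation. A purely illustrative sketch in Python
(the names are placeholders, not the actual Caiman classes):

    class Slice:
        def __init__(self, name):
            self.name = name
            self.flags = set()          # e.g. {"ROOT"} or {"USR"}

        def set_flags(self, flags):     # TD records what it found
            self.flags = set(flags)

        def get_flags(self):
            return self.flags

    class Dataset:
        """Represents a ZFS dataset; used by both TD and TI."""
        def __init__(self, name):
            self.name = name
            self.mountpoint = None

        def set_mountpoint(self, mp):   # TD: record what discovery found
            self.mountpoint = mp

        def create(self):               # TI: what 'instantiate()' might become
            # placeholder: a real implementation would create the dataset
            # on the system rather than just printing
            print("zfs create", self.name)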
