Hi Sarah,
Additional comments inline (filling in the gaps ...)
Evan Layton wrote:
> Some comments inline.
> (more to come...)
>
> Sarah Jelinek wrote:
>> Hi All,
>>
>> I have some review comments on the BE library for Snap Upgrade Design
>> Specification, Revision 0.5. Apologies for the delay.
>>
>>
>
> It's not like this reply was quick either. ;-)
>
>> -Section 1.2, the BE definition states:
>> The BE is currently defined by its name, which is the ZFS root pool it's
>> in, and the datasets that belong to it. For example, a BE would have the
>> following datasets...
>>
>> I wonder if we can be more precise here about what datasets are required
>> to meet the BE requirements? And, what are optional or supported
>> configurations. The reason I ask is that if a user has separate
>> datasets, or multiple BE's in a root pool, upgrading the datasets via
>> snap upgrade can mean different things depending on supported or
>> required configurations.
>>
>
> The only real requirement here is the root dataset. This is the dataset
> under rpool/ROOT and is the name of the BE. Any other dataset under that
> is not necessarily required; however, in many instances a separate /var is
> required to meet security requirements. There were arguments from some
> that the BE should just be rpool/ROOT/<be_name> but what we've proposed
> gives a bit more flexibility. The fact that anything under
> rpool/ROOT/<be_name> is considered part of the BE means that it really
> doesn't matter what's there when we do an upgrade.
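To make the membership rule concrete, here's a rough sketch (Python, made-up names; this isn't the libbe API) of the check implied above: anything at or under rpool/ROOT/<be_name> is considered part of the BE.

```python
# Sketch only: a dataset belongs to a BE iff it is the BE root dataset
# (<pool>/ROOT/<be_name>) or lives somewhere beneath it.
def in_be(dataset, pool, be_name):
    root = "%s/ROOT/%s" % (pool, be_name)
    return dataset == root or dataset.startswith(root + "/")
```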
>> -Section 2.1, your diagram shows IPS but it doesn't show Transfer. I
>> don't know if this service will be required or not at the time snap is
>> available, but I am wondering if you had thought about this? You note
>> the transfer module in your comments, but it isn't clear to me how you
>> envision this all to work together.
>>
>
> We mount the BE (and all of its associated datasets) on, say, /a and pass
> that mounted area to the transfer module to install into. (The BE is
> created, mounted, and then that mount point is passed to the transfer module.)
And just to clarify, the entity passing that mounted dir to the
Transfer module is the installer. So what this means is that the
Transfer module's relationship with the Installer should look like
IPS's relationship with the Installer; we'll add it to the diagram.
Currently the installer calls the Transfer module with the target dir
already created and mounted so there isn't any interaction between
the Transfer module and TI or the BE library, and I don't see this
changing.
If later on we support installing directly from a repo, then the
Transfer module would likely have a relationship with IPS.
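A rough sketch of that call sequence (Python, illustrative names only; be_create/be_mount/transfer_install are stand-ins, not real APIs):

```python
# Illustrative only: the installer drives everything; the Transfer
# module just receives an already-created, already-mounted directory.
calls = []

def be_create(be_name):
    calls.append(("create", be_name))             # libbe creates the BE

def be_mount(be_name, mountpoint):
    calls.append(("mount", be_name, mountpoint))  # libbe mounts it

def transfer_install(target_dir):
    # Transfer never talks to TI or libbe; it only sees the directory.
    calls.append(("transfer", target_dir))

def installer(be_name, mountpoint="/a"):
    be_create(be_name)
    be_mount(be_name, mountpoint)
    transfer_install(mountpoint)

installer("osol-2")
```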
>
>> -Section 2.2, your diagram shows the libbe invoking the copy but the
>> text says the 'installer' will do the copy. Just checking to see who
>> does what.
The UFS to ZFS migration design is not complete. We'll try to have
it included with the next rev of the design. But in a nutshell,
the installer supports the case where there is no additional local
storage to create a ZFS BE to migrate to. This case requires DSR.
The installer will be doing the copy of the bits to and from the
alternate location.
The other case is when there is additional local storage to create
a zpool and a ZFS BE to migrate to. For this case, though, there's
no reason not to use the BE cli to accomplish the migration.
>>
>> -Section 3.1.1, Why would we use a file to hold policy information?
>> Maybe you don't mean 'file' exactly, but it seems like we could do a
>> better job than just a flat file. Something like SMF extended profiles.
We're considering storing this policy information in SMF.
>>
>> -Section 3.1.1.1, what are the class definitions for the snapshots?
These are the policy definitions for snapshots.
>> I
>> would think we would want pre-defined types, and perhaps the ability to
>> allow user defined types.
That's the idea. We will deliver a default policy class for snapshot
retention. The default could be something like:
snap_num_max = 12 # maximum num of snapshots to keep
snap_age_max = 336 # maximum number of hours to keep the snapshot
snap_storage_free = 80 # % level of pool storage usage before deleting
# oldest snapshots.
And we'd also deliver a policy class for "infinity". i.e. these
snapshots would never get deleted unless explicitly called to get
deleted via be_destroy().
Our intent was also to allow the caller to define their own
snapshot policy class, and be able to call into libbe to create
snapshots managed via that policy. A caller-defined snapshot policy
class gets defined by delivering a file into a specified area as
noted in 3.1.1.2 of rev 0.5 (or stored in SMF if we decide to do that).
However, we may not end up implementing support for these caller
defined policy classes. Right now the only consumers of this are IPS,
the BE cli, and the installer, and they would all likely use either
the default class or the infinity class.
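Hypothetically, the two delivered classes could be modeled like this (Python sketch; the field names come from the example above, everything else is made up):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SnapPolicy:
    name: str
    snap_num_max: Optional[int]       # max snapshots to keep (None = no limit)
    snap_age_max: Optional[int]       # max age in hours (None = no limit)
    snap_storage_free: Optional[int]  # % pool usage before pruning oldest

# Default retention class, using the numbers from above.
DEFAULT = SnapPolicy("default", snap_num_max=12, snap_age_max=336,
                     snap_storage_free=80)

# "Infinity" class: never pruned automatically; only an explicit
# be_destroy() removes these snapshots.
INFINITY = SnapPolicy("infinity", snap_num_max=None, snap_age_max=None,
                      snap_storage_free=None)
```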
> But there are issues with user defined types
>> being understood by the software. Maybe the snapshot policy, instead of
>> being a file or an attribute on a class we define, could be something
>> ZFS could add itself as one of the snapshot dataset attributes? This
>> would be cleaner in my opinion.
We thought about this some, but since there are going to be multiple
policy attributes (at least three right now), we didn't want to store
them all as dataset attributes.
The one thing we hadn't cleanly figured out yet was how to tie a BE
snapshot to which policy class it belonged to. This perhaps could be
something we store as a snapshot dataset attribute. (Was that what
you meant?)
>>
>> -Who manages the retention policies? libbe? The text says that libbe
>> will provide a way to delete, display, create snapshots, but if a policy
>> indicates a retention of say 5 hours, who goes back and deletes this? In
libbe enforces the retention policy, but it's more of a "passive"
enforcement; i.e., we won't be having a be-daemon running around
checking the status of BE snapshots. Instead, on each invocation of
BE or BE snapshot creation, we run through the snapshots and delete
the ones that have outstayed their policy.
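Roughly, that per-invocation sweep would look like this (Python sketch; names and signature are illustrative, not the libbe interface):

```python
# Sketch of the passive sweep: given the existing snapshots for a BE,
# return the ones that have outstayed the policy's count or age limits.
# A real implementation would also honor snap_storage_free.
def snapshots_to_prune(snapshots, snap_num_max, snap_age_max, now):
    """snapshots: list of (name, creation_time_secs); returns set of names."""
    newest_first = sorted(snapshots, key=lambda s: s[1], reverse=True)
    doomed = set()
    for i, (name, created) in enumerate(newest_first):
        too_many = i >= snap_num_max                     # beyond count limit
        too_old = (now - created) / 3600.0 > snap_age_max  # beyond age limit
        if too_many or too_old:
            doomed.add(name)
    return doomed
```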
>> 3.1.1.3 you indicate that the persistence policy must be managed by
>> caller. But, you also indicate a 'default' policy that somehow gets
>> enforced by libbe? It might be good to outline exactly what libbe will
>> do, say in the case of a default policy, and what it will or won't do in
>> other cases.
Hopefully this is answered in the description above.
>>
>
> We're in the process of fixing this and the next version of the document
> will have this corrected and hopefully be clearer.
>
>> -Can an ABE be in a different root pool than the original BE? I would
>> assume so, I can't think of a technical reason this wouldn't work, but I
>> am just checking.
>>
>
> Yes as long as the pool it's in can be made the root pool. (pools
> containing stripes can not be used as root pools)
>
>> -A comment, or perhaps just musings out loud... one of the key
>> differences with ZFS snap upgrade is that a user isn't required to have
>> an ABE created when creating an initial BE. With the current live
>> upgrade mechanism this is required in a sense. When you run lucreate it
>> basically copies the current BE data (which it will designate the
>> currently booted system's BE as the current BE) to the ABE. With Snap it
>> seems as if we will always create a BE when doing an initial install, is
>> this correct? I am asking because:
>>
>> -I assume a user does not have to have enough space for an ABE when
>> doing an initial install? Up to the user really, right?
>>
>
> Right, just enough for the initial PBE.
>
>> -If only one BE is required in a root pool, the user doesn't have to
>> specify a synclist initially like we do today. But, if they want an ABE
>> they will have to do this if they need to keep some filesystems not shared.
Sync'ing support is going to look a little different. The plan is
to no longer support random user-selected files to be kept in sync.
If something needs to be kept in sync, it should be stored in a shared
file system. We're also going to do some cleanup of /var and put
parts of it into the shared space. This will handle most of what's on
the default synclist now, but some of it will be left out. More to
come on this though ...
>> -How will synclist datasets be represented in the namespace? Based on
>> your design it is assumed that non BE datasets will be under the root
>> pool which they belong, and not in the BE namespace. But, if a user
>> wants to have datasets not shared, how will this be managed?
All non-shared datasets the user wishes to create for a BE need to
be under the BE root dataset. These can't be kept in sync.
I'm not sure what you mean by a synclist dataset. If some piece of
data needs to be kept the same across all BEs, why keep it in
non-shared space?
Thanks,
-ethan