Hi Evan,

As discussed over the phone, I now understand the changes and agree with
them. Thanks for being patient with me!
- Keith

Evan Layton wrote:
> Keith Mitchell wrote:
>> Evan Layton wrote:
>>> Keith Mitchell wrote:
>>>> Evan Layton wrote:
>>>>> Ethan Quach wrote:
>>>>>> Evan Layton wrote:
>>>>>>> Ethan Quach wrote:
>>>>>>>> Evan Layton wrote:
>>>>>>>>> jeanm wrote:
>>>>>>>>>> Looks OK to me. Maybe some comments explaining the B_FALSE
>>>>>>>>>> and what it does?
>>>>>>>>>> I looked in the zfs code but that's kind of awkward.
>>>>>>>>>
>>>>>>>>> Probably a good idea to add that since it's a bit convoluted...
>>>>>>>>>
>>>>>>>>> I've added:
>>>>>>>>> /*
>>>>>>>>>  * The boolean set to B_FALSE and passed to zfs_destroy_snaps()
>>>>>>>>>  * tells zfs to process and destroy the snapshots now. Otherwise
>>>>>>>>>  * ZFS will wait to destroy these snapshots until a zfs release
>>>>>>>>>  * is called and the references for all the snapshots have been
>>>>>>>>>  * released.
>>>>>>>>>  */
>>>>>>>>
>>>>>>>> nit - for the second sentence: "Otherwise the call will
>>>>>>>> potentially return while the snapshot isn't actually destroyed
>>>>>>>> yet, and ZFS is waiting until all the references to the snapshot
>>>>>>>> have been released before actually destroying the snapshot."
>>>>>>>
>>>>>>> Much better! I'll make that change.
>>>>>>
>>>>>> Hmm, I may have made an incorrect assumption about the behavior
>>>>>> though. Will the call actually return while ZFS waits, or wait
>>>>>> before returning?
>>>>>
>>>>> If I understand the comments and the code correctly, ZFS returns
>>>>> without actually destroying the snapshots. It waits to do that
>>>>> until all the references are released and zfs_release has been
>>>>> called.
>>>>
>>>> Devil's advocate: what's the problem with using B_TRUE if this is
>>>> how it works? If someone has truly set a hold on a BE snapshot, it
>>>> seems more appropriate to let ZFS clean it up later, when whoever
>>>> put the hold on it is done with it. As long as it gets cleaned up
>>>> eventually & automatically... And if we don't want people putting
>>>> holds on it, would it be more appropriate to set the ZFS property
>>>> such that normal users can't put a hold on it?
>>>
>>> There are a couple of problems with this.
>>>
>>> This approach has us returning from the zfs_destroy_snap call
>>> without knowing if or when the snapshot will be destroyed. We think
>>> we're done but may not be...
>>>
>>> Also, this approach adds the need to place a zfs release in the
>>> appropriate spots so that the last reference is released. With the
>>> changes done for 6803121, when a snapshot is taken a reference is
>>> automatically added. This is used to keep the snapshot around until
>>> zfs release is called. If we don't call it, or call it in the wrong
>>> place, we end up returning from zfs_destroy_snap believing that the
>>> snapshot will be destroyed, but since we either didn't call release
>>> or didn't call it in the right spot, we have no way to know that
>>> the destroy didn't happen. The use of B_FALSE here forces the
>>> destroy to happen now and not at some later point when zfs release
>>> is called. What if we have a bunch of snapshots we're destroying,
>>> all of them stacked up waiting for the zfs release? When we call
>>> the zfs release it starts destroying them, but if one of them fails
>>> for some reason, I'm not sure how straightforward it would be to
>>> map that failure back to the snapshot destroy call that should
>>> return the error. It may not be as big a deal as I'm thinking right
>>> now, but why add another call (zfs release) to do the actual
>>> destroy later if we really don't need to?
>>
>> I think I am either misunderstanding the ZFS changes, or
>> misunderstanding this specific instance of using a BE snapshot. From
>> what I understand:
>>
>> zfs_create will create a snapshot, and set the user reference count
>> to 1
>> zfs_release will decrement the user reference count, and destroy the
>> snapshot if it brings the reference count to 0
>> zfs_hold will increment the reference count
>> zfs_destroy will succeed if the reference count is 1, and fail
>> otherwise.
>
> We're using zfs_destroy_snap here.
>
>> If the snapshot being destroyed was created during this invocation
>> of beadm, I see no problem with immediately destroying it. However,
>> I think the appropriate call would be zfs_release, as this fits the
>> new model more appropriately. Unless I misunderstand the new
>> functionality, or the proposed fix in the CR is not what was
>> implemented, zfs_release should delete the snapshot without a
>> separate call to zfs_destroy.
>
> As I understand things, with the new changes both the
> zfs_destroy_snap() call and the zfs_release call are needed.
>
>> If the snapshot being destroyed was NOT created during this
>> invocation of beadm, then it is not safe to assume that no user has
>> placed a hold on it, and thus immediate destruction is the incorrect
>> choice here. Again, zfs_release should be used here.
>
> We are always dealing either with snapshots created during this
> invocation of beadm, snapshots related directly to the clone we are
> specifically destroying, or a snapshot that was specifically
> requested to be destroyed.
>
>> In either case, by calling zfs_release, we can then immediately
>> check the user reference count or return value, and immediately know
>> whether (a) the snapshot was destroyed or (b) the snapshot is still
>> alive due to a hold.
>
> We still need the zfs_destroy_snap call. I really don't want to have
> to add an additional call to zfs_release, which is what would be
> required if we set the boolean to true. Plus, it wouldn't buy us
> anything to have zfs wait for the reference count to drop, since
> we've already gone through the process of making sure that this
> snapshot doesn't have any dependencies before we get to this point.
> Additionally, if there are still references we don't want to destroy
> this snapshot. If we use the B_TRUE and zfs_release route here, it
> wouldn't destroy the snapshot right then. If we then exit, and
> another call comes into the library that releases that reference,
> the snapshot would automatically be destroyed out from underneath
> us, which may not be exactly what we wanted.
>
>> Related question: Are there any situations where failure to delete a
>> snapshot would preclude us from finishing a call to beadm?
>
> Yes. If it fails to destroy the snapshot a BE's datasets were cloned
> from, then we fail the destroy and we end up with a snapshot hanging
> around that should have been removed.
>
> The short answer to all of this is that the use of B_FALSE here
> defaults us to the original behavior, which is what we want. If
> there's still a reference to this snapshot, then something in our
> code has failed and we should catch it.
>
>>> -evan
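
For anyone following along without the webrev handy, here is a minimal
sketch of the immediate-destroy path being discussed, assuming the
post-6803121 libzfs interface where zfs_destroy_snaps() takes a defer
boolean. The destroy_be_snapshot() helper and its arguments are
hypothetical; only zfs_destroy_snaps(), the B_FALSE semantics, and the
comment text come from the thread.

#include <libzfs.h>

/*
 * Hypothetical helper showing the B_FALSE path. Passing B_FALSE as the
 * defer argument asks ZFS to destroy the snapshot now; if a reference
 * is still held, the call fails and the error maps directly back to
 * this snapshot.
 */
static int
destroy_be_snapshot(libzfs_handle_t *g_zfs, const char *fs_name,
    char *snap_name)
{
	zfs_handle_t *zhp;
	int ret;

	if ((zhp = zfs_open(g_zfs, fs_name, ZFS_TYPE_FILESYSTEM)) == NULL)
		return (-1);

	/*
	 * The boolean set to B_FALSE and passed to zfs_destroy_snaps()
	 * tells zfs to process and destroy the snapshots now. Otherwise
	 * the call can return while the snapshot isn't actually
	 * destroyed yet, with ZFS waiting until all references to the
	 * snapshot have been released before actually destroying it.
	 */
	ret = zfs_destroy_snaps(zhp, snap_name, B_FALSE);

	zfs_close(zhp);
	return (ret);
}

A call like destroy_be_snapshot(g_zfs, "rpool/ROOT/be1", "snap1") would
then either destroy rpool/ROOT/be1@snap1 on the spot or return an error
for exactly that snapshot, which is the property Evan wants.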

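For contrast, a sketch of the deferred route Keith floated, under the
same interface assumption. A deferred destroy can return success while
the snapshot still exists, so the caller would have to probe for
itself; the zfs_dataset_exists() check here is one hypothetical way to
do that, not what libbe actually does.

#include <libzfs.h>

/*
 * Hypothetical contrast case: with B_TRUE the destroy is deferred. A
 * zero return only means the destroy has been queued; if a user hold
 * remains, ZFS destroys the snapshot when the last hold is released,
 * possibly long after we have returned to our caller.
 */
static int
destroy_be_snapshot_deferred(libzfs_handle_t *g_zfs, const char *fs_name,
    char *snap_name, const char *full_snap_name)
{
	zfs_handle_t *zhp;
	int ret;

	if ((zhp = zfs_open(g_zfs, fs_name, ZFS_TYPE_FILESYSTEM)) == NULL)
		return (-1);

	ret = zfs_destroy_snaps(zhp, snap_name, B_TRUE);
	zfs_close(zhp);
	if (ret != 0)
		return (ret);

	/*
	 * Success does not mean the snapshot is gone. The caller has to
	 * probe to find out whether it was actually destroyed or is
	 * still pinned by a hold.
	 */
	if (zfs_dataset_exists(g_zfs, full_snap_name, ZFS_TYPE_SNAPSHOT))
		return (1);	/* still held; will be destroyed later */

	return (0);
}

This is roughly the bookkeeping Evan is arguing against: every caller
of the deferred path inherits the job of working out whether the
destroy actually happened.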