Hi Boris,
I now have working code that implements this feature, and defaults to
ignoring hole_birth data for sends if this feature is not enabled; I'm
going to post patches for it after I've tested it on both ZoL and illumos.
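
For illustration, the default described above can be sketched roughly as
follows. This is not the patch itself: pool_state_t, hole_birth_trusted()
and send_must_include_hole() are hypothetical names invented for this
example, and treating a hole born exactly at the enabling txg as untrusted
is a conservative assumption, not something stated in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for per-pool feature state; not the real spa_t. */
typedef struct pool_state {
    bool     hole_birth_fix_enabled; /* is the proposed feature enabled? */
    uint64_t hole_birth_fix_txg;     /* txg in which it was enabled */
} pool_state_t;

/*
 * A hole's recorded birth txg is trusted only if the feature is enabled
 * and the hole was born strictly after the feature's enabling txg.
 * (Assumption: the boundary txg itself is treated as untrusted.)
 */
static bool
hole_birth_trusted(const pool_state_t *ps, uint64_t hole_birth_txg)
{
    if (!ps->hole_birth_fix_enabled)
        return false;
    return hole_birth_txg > ps->hole_birth_fix_txg;
}

/*
 * Decide whether an incremental send starting from from_txg must
 * transmit a given hole.
 */
static bool
send_must_include_hole(const pool_state_t *ps, uint64_t hole_birth_txg,
    uint64_t from_txg)
{
    /* Untrusted birth time: always send ("always correct"). */
    if (!hole_birth_trusted(ps, hole_birth_txg))
        return true;
    /* Trusted birth time: send only holes newer than the source snapshot. */
    return hole_birth_txg > from_txg;
}
```

With the feature disabled, every hole is sent regardless of its recorded
birth txg, which is the "always correct and marginally less efficient"
behavior; with it enabled, only holes born after the enabling txg can be
skipped.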

The latter portion of the above is easily changed, but "always correct and
marginally less efficient for old data" seemed a better default than
"usually correct".

For a filesystem that prides itself on not allowing silent corruption,
requiring manual detection and intervention for correctness seems
unreasonable to me.

- Rich

On Fri, Jul 8, 2016 at 12:15 PM, Boris <bprotopo...@hotmail.com> wrote:

> Hi, Rich,
>
>
> I agree that an unconditional switch using the tunable is heavy-handed if
> done 'as a matter of standard practice' as opposed to 'for a short time,
> to fix up the known corrupted backups'.
>
>
> To clarify the earlier suggestions, people with data affected by the bug
> can do two things:
>
>
> 1) install the code with the 6513 fix and the patch with the tunable,
> then temporarily turn off the hole birth optimization, resend the
> 'difference' (a selected subset of incrementals) affected by the problem,
> then turn the optimization back on
>
> 2) install the code with the 6513 fix but without the patch, and do a
> full send of the affected snapshots
>
>
> For 2) the non-incremental send would need to happen only once per
> affected snapshot lineage. Once the missed holes are reinstated by the
> full send, the new fixed code will perform proper incremental sends.
>
>
> 1) is potentially more efficient in terms of resources (network
> bandwidth, etc.)
>
> 2) is potentially simpler from the operational standpoint, as it does not
> require building/installing patched code, twiddling the tunables, etc.
>
>
> Boris.
>
>
>
>
> ------------------------------
> *From:* Rich <rincebr...@gmail.com>
> *Sent:* Friday, July 8, 2016 11:26:27 AM
> *To:* developer
> *Cc:* Developer Lists Illumos
> *Subject:* Re: [developer] Improvements to 6513 handling
>
> Hi Boris,
> A full send of the affected snapshots should be safe, AIUI - but that
> means people would need to do non-incremental snapshot sends to be certain
> of not hitting this bug, which becomes increasingly infeasible as your
> datasets grow.
>
> If we're looking for the simplest solution without risk of data
> corruption, unconditionally ignoring the hole_birth data for doing a zfs
> send fits the bill, but seems a bit heavy-handed.
>
> This seemed like the best way to permit people to safely send older
> datasets while also permitting use of the hole_birth data going forward.
>
> - Rich
>
> On Fri, Jul 8, 2016 at 11:03 AM, Boris <bprotopo...@hotmail.com> wrote:
>
>> Hi, Rich,
>>
>>
>> perhaps there is a simpler solution here.
>>
>>
>> I think for the datasets affected by this feature, a full (not
>> incremental) send of the source snapshot containing holes that were not
>> transmitted by the faulty incremental send code should fix the issue,
>> as far as the on-disk layout is concerned.
>>
>>
>> Boris.
>>
>> ------------------------------
>> *From:* Rich <rincebr...@gmail.com>
>> *Sent:* Thursday, July 7, 2016 8:30:54 PM
>> *To:* developer; Developer Lists Illumos
>> *Subject:* [developer] Improvements to 6513 handling
>>
>> Hi all,
>>
>> So, ZFS on Linux just noticed it was getting bitten by what ultimately
>> turned out to be Illumos #6513, partially filled holes losing birth time.
>>
>> Implementing that fix removes this problem for new data, but on all
>> platforms it doesn't help data already written to existing pools, which
>> will keep getting munged silently in incremental sends forever.
>>
>> pcd pointed out that a relatively trivial workaround would be possible by
>> simply ignoring the hole_birth metadata with something like a global
>> tunable, but that seems too heavy-handed to me - either you're disabling
>> the feature everywhere because you don't know when you can start trusting
>> the birth times, or you're risking silent mangling of affected files
>> forever.
>>
>> I'd like to suggest using a read-compatible feature, call it something
>> like hole_birth_fix, in conjunction with the enabled_txg feature, to permit
>> a reasonable default of ignoring hole_birth information before the
>> hole_birth_fix feature was enabled, but still permitting use of it
>> afterward.
>>
>> This has the unfortunate behavior of breaking write support if you enable
>> hole_birth_fix and then try to go back to a prior codebase, but I can't
>> think of a reasonable way to avoid this.
>>
>> I filed illumos #7175 to track this proposal - I'll happily write the
>> code to implement this shortly.
>>
>> (Apologies if I've over-CCed or missed someone I should be asking for
>> comment, I've not done this workflow before.)
>>
>> - Rich
>>
>



-------------------------------------------
openzfs-developer
Archives: https://www.listbox.com/member/archive/274414/=now
RSS Feed: https://www.listbox.com/member/archive/rss/274414/28015062-cce53afa