Hello,

Sorry for the late reply - on this side of the pond we're already home by now ;-)

Yes, the issue seems to be the overflowing 32-bit uniquifier counter, which we 
suspected and which Derrick later confirmed.

In any case, for now I took out the "check for less than the max" to restore 
access to the data, and everything seems to work okay. Users are happy ;-)
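
For the archives, the test I disabled is (roughly) the volume package's sanity
check that a vnode's uniquifier must stay below the volume's current maximum.
Something of this shape - an illustrative sketch with made-up names, not the
actual OpenAFS source:

    /* Illustrative only -- not the real OpenAFS volume-package code.
     * It just shows the shape of the check that breaks once the
     * 32-bit uniquifier counter wraps. */
    struct volume_hdr { unsigned int uniquifier; };   /* volume's "max" uniq */
    struct vnode_disk { unsigned int uniquifier; };   /* per-vnode uniq */

    static int
    vnode_uniq_is_sane(const struct volume_hdr *vol,
                       const struct vnode_disk *vn)
    {
        /* After the wrap the volume's counter is a small number again,
         * so old vnodes with uniquifiers near 0xffffffff fail this plain
         * comparison even though they are perfectly good. */
        return vn->uniquifier < vol->uniquifier;
    }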

I am not sure, but if the uniquifier exists only to disambiguate a potentially 
reused vnode id, then the risk of collisions is really low even without this 
check, right? The uniquifier would have to wrap all the way around 32 bits 
between the freeing and the reuse of a vnode id, and then land on exactly the 
same value. It depends on the details of the vnode id reuse algorithm, of 
course, but that looks like a very low probability.
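
If Derrick's option (2) below were adopted, I imagine the allocation side would
be little more than letting the counter wrap while skipping 0 and 1, roughly
like this (again just a sketch with hypothetical names, not a patch):

    /* Sketch of option (2): let the 32-bit counter wrap, but never hand
     * out 0 or 1 (1 being the uniq of the root directory fid 1.1).
     * The real counter lives in the volume header. */
    static unsigned int
    next_uniquifier(unsigned int *counter)
    {
        do {
            ++*counter;                     /* wraps modulo 2^32 */
        } while (*counter == 0 || *counter == 1);
        return *counter;
    }

A repeated fid then needs the counter to come all the way around - about 4.29 
billion allocations - between a vnode id being freed and reused, and to land on 
exactly the same value, which is why the probability looks so small to me.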

Many thanks to all for the help!

kuba

--


On Apr 16, 2013, at 8:07 PM, Andrew Deason <adea...@sinenomine.net> wrote:

> On Tue, 16 Apr 2013 13:34:18 -0400
> Derrick Brashear <sha...@gmail.com> wrote:
> 
>> The problem he's having (apparently I replied without replying all) is
>> he's wrapping uniquifiers, and currently the volume package deals
>> poorly, since ~1300 from maxuint plus 2000 plus 1 results in a number
>> "less than the max uniquifier"
> 
> Okay; yeah, makes sense.
> 
>> We need to decide whether OpenAFS should
>> 1) compact the uniquifier space via the salvager (re-uniquifying all
>> outstanding vnodes save 1.1, presumably).
>> or
>> 2) simply allow the uniquifier to wrap, removing the check for "less
>> than the max", but ensuring we skip 0 and 1. there will be no direct
>> collisions as no vnode can exist twice
>> 
>> either way, there is a slight chance a vnode,unique tuple which
>> previously existed may exist again.
> 
> Yes, but that is inevitable unless we keep track of uniqs per-vnode or
> something. If we do option (1), I feel like that makes the possible
> collisions more infrequent in a way, since the event triggering the
> collisions is a salvage, which has 'undefined' new contents for caching
> purposes anyway. In option (2) you can have a collision by just removing
> a file and creating one. Maybe those aren't _so_ different, but that's
> my impression.
> 
> I feel like the fileserver could also maybe not increment the uniq
> counter so much, if we issue a lot of creates/mkdirs with no other
> interleaving operations. That is, if we create 3 files in a row, it's
> fine if they were given fids 1.2.9, 1.4.9, and 1.6.9, right? We would
> guarantee that we wouldn't collide on the whole fid (at least, no more
> so than now), just on the uniq, which is okay, right? That might help
> avoid this in some scenarios.
> 
> And for kuba's sake, I guess the immediate workaround to get the volume
> online could be to just remove that check for the uniq. I would use that
> to just get the data online long enough to copy the data to another
> volume, effectively re-uniq-ifying them. I think I'd be uneasy with just
> not having the check in general, but I'd need to think about it...
> 
> -- 
> Andrew Deason
> adea...@sinenomine.net
> 

_______________________________________________
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info
