Re: ENB: Fixing gnx collisions without a post scan

'Terry Brown' via leo-editor Wed, 15 Oct 2014 07:19:01 -0700

On Wed, 15 Oct 2014 06:47:25 -0700 (PDT)
"Edward K. Ream" <edream...@gmail.com> wrote:

> On Wednesday, October 15, 2014 8:28:05 AM UTC-5, Terry Brown wrote:
> 
> > I don't think the post-scan does require Bob's always incrementing
> > timestamp fix, I think the post-scan is a more general solution which
> > addresses the ".leo files from other sources" aspect of the
> > duplicated gnx problem, as well as the "loading the same file twice
> > in one second" aspect of the problem fixed by Bob.
> 
> I just don't see how collisions can happen except in automated
> situations. The gnx contains the committer id plus a timestamp.  That
> combination *is* going to be unique in general, unless the timestamps
> collide, which isn't going to happen except in Bob's case.
> 
> > So for maximum code cleanliness Bob's fix could be removed.
> 
> Again, I don't see how that statement can be correct.  Each
> invocation of Leo must be based on a unique timestamp.

That's not really an absolute.  The absolute here is the statement you
make below, "We must not *ever* reassign gnx's, so any scheme that
guarantees no *new* collisions will suffice."  Having a system that
generates always-incrementing timestamps locally doesn't address files
from other sources.  Of course the full gnx is unlikely to collide when
you include the username etc., but unlikely is not never.  When was the
last time anyone got their first choice of username when signing up for
a web-based service? :-)  We probably don't want to rely on Leo having
a small number of users to keep the chance of username collisions in
gnxs low :-) :-)

> > It seems that collecting the maximum index value for the gnx
> > timestamp for a particular c /could/ be done in one of the existing
> > scans, and that that seems like a less invasive fix than changing
> > the gnx format.
> 
> Less visible externally, but I really don't like it.  And no,
> existing scans aren't up to the job because they happen too early,
> before we know what gnx's have been read.

You know the read code better than anyone, I was just assuming that
there would be opportunity to collect gnx info. on all gnxs in a c
prior to the new post-scan, even if that collection was spread across
different parts of the read cycle.  I can certainly see how the post
scan would be the simplest / cleanest to implement.
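
To make the "collect the maximum index" idea concrete, here is a
minimal sketch, not Leo's actual read code.  It assumes gnxs have the
form id.timestamp.index, and the function name max_index_by_prefix is
made up for illustration.  It takes whatever gnxs were seen during a
read and records the largest index for each id.timestamp prefix:

```python
from collections import defaultdict


def max_index_by_prefix(gnxs):
    """Map each 'id.timestamp' prefix to the largest index seen.

    Hypothetical helper: assumes gnxs look like id.timestamp.index.
    """
    highest = defaultdict(int)
    for gnx in gnxs:
        # Split from the right so an id containing dots stays intact.
        parts = gnx.rsplit(".", 2)
        if len(parts) < 3 or not parts[2].isdigit():
            continue  # Not in the assumed form; nothing to record.
        prefix = f"{parts[0]}.{parts[1]}"
        highest[prefix] = max(highest[prefix], int(parts[2]))
    return dict(highest)


seen = [
    "tbrown.20141015060623.1234",
    "tbrown.20141015060623.7",
    "ekr.20141015060623.3",
    "imported-node-without-index",
]
table = max_index_by_prefix(seen)
```

Whether the list of gnxs comes from one pass or from several parts of
the read cycle doesn't matter to the helper, only that it has seen all
of them before any new gnx is allocated.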

> > Also, I'm not convinced any non-scan based solutions (other than a
> > significant number of random bits) can really guarantee no
> > collisions. The ".leo files from other sources" case basically means
> > you have no idea what gnxs lurk in the file, unless you look.
> 
> If that were true, Leo would have been fundamentally broken all these
> many years.  But no, the combination of id and timestamp is almost
> always unique.

"Almost always", vs. "not *ever*".  We're definitely targeting
situations which will impact 99% of Leo users zero times in their
Leo-using career; there's no argument that the present system is almost
always sufficient.

> We must not *ever* reassign gnx's, so any scheme that guarantees no
> *new* collisions will suffice.  That's what uuids or file numbers do.

I'm not seeing how the file number lets me know that the gnx
tbrown.20141015060623.1234.6 does not already exist in the .leo file
being read, particularly when the .leo file was created on another
system.
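
As a rough illustration of what "looking" buys you, here is a small
sketch under the same assumption that a gnx is id.timestamp.index.
The name new_gnx and the set in_file are hypothetical, not part of
Leo.  Once the gnxs already in the file are in a set, a new gnx can be
checked against them before it is handed out:

```python
def new_gnx(user_id, timestamp, existing):
    """Return a gnx not present in `existing`, and record it there.

    Hypothetical helper: bumps the index until the candidate is unused.
    """
    index = 1
    while f"{user_id}.{timestamp}.{index}" in existing:
        index += 1
    gnx = f"{user_id}.{timestamp}.{index}"
    existing.add(gnx)
    return gnx


# gnxs found in the file being read, wherever that file came from.
in_file = {"tbrown.20141015060623.1", "tbrown.20141015060623.2"}
first = new_gnx("tbrown", "20141015060623", in_file)
second = new_gnx("tbrown", "20141015060623", in_file)
```

Without the set, nothing in the id, timestamp or file number says
whether the candidate is already taken.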

Cheers -Terry

