On Sat, Jan 12, 2013 at 5:31 PM, Richard Hipp <d...@sqlite.org> wrote:
On Sat, Jan 12, 2013 at 6:41 PM, Matt Welland <estifo...@gmail.com> wrote:
This is with regards to the problem described here:
http://lists.fossil-scm.org:8080/pipermail/fossil-users/2008-February/000060.html
We are seeing on the order of 3-5 of these a year in our heaviest hit
repos. While this may seem like no big deal, the fact that it is so
silent is quite disruptive. The problem is that a developer working
intently on a problem may not notice for hours or even days that they
are no longer actually working on the main thread of development.
I contend that this points up issues with your development process, not
with Fossil. If your developers do not notice that a fork has occurred
for days, then they are doing "heads down" programming. They are not
maintaining situational awareness
(http://en.wikipedia.org/wiki/Situation_awareness). They are fixating on
their own (small) problems and missing the big picture. This can lead to
dissatisfied customers and/or quality problems.
"Situational awareness" is usually studied in dynamic environments that
are safety critical, such as aviation and surgery. Loss of situational
awareness is a leading cause of airplane crashes and medical errors.
Loss of situational awareness is sometimes referred to as "tunnel
vision". The person fixates on one tiny aspect of the problem and
ignores the much larger crisis unfolding around him. Eastern Airlines
flight 401 (http://en.wikipedia.org/wiki/Eastern_Air_Lines_Flight_401)
is a classic example of this: all three pilots of an L-1011 were
"working intently" on a malfunctioning indicator light to the point that
none of them noticed that the plane was losing altitude until seconds
before it crashed in the Florida Everglades.
Though usually studied in safety critical environments, situational
awareness is applicable in any complex and dynamic problem environment,
such as developing advanced software. When you tell me that your
developers are "intently working" on one small aspect of the problem, to
the point of not noticing for several days that the trunk has forked,
that tells me that there are likely other far more serious problems that
they are also not noticing. The fork is easily fixed with a merge. The
other more serious problems might not have such an easy fix. And they
might go undetected until your customer stumbles over them.
So, I would use the observation that forks are going undetected as a
symptom of more serious process problems in your organization, and
encourage you to seek ways of getting your developers to spend more time
"heads up" and looking at the big picture.
(Did you notice - "situational awareness" is kind of a big issue with
me. Fossil is my effort at building a DVCS that does a better job of
promoting situational awareness than the other popular VCSes out there.
I'm constantly looking for ways to enhance Fossil to promote better
situational awareness. Suggestions are welcomed.)
Curious response. Did you intend to be insulting? I'm working with a
bunch of very smart people who are very reluctantly learning a new tool
and a different way of doing things, and forks are very confusing when
they happen in a scenario where they seemingly should not. We are not
operating in a disconnected fashion here. Fossil falls somewhat short in
its support of people who like to get their job done at the command line
(about 80% of users on my team). Distilling from the fossil timeline
command that there is a fork and how to fix it is not easy. It is very
tiresome to have to go back to the ui to ensure that a fork hasn't
magically appeared.
Anyhow, I misunderstood the exact nature of the cause. I assumed that
the race condition lay within the user's fossil process, between the db
query that checked for a leaf and the insertion of the new checkin data
into the db. That is of course incorrect. The actual cause is that the
central database is free to receive a commit via sync after having just
done a sync that informs the user's fossil process that it is fine to
commit. Something like the following:
User1       User2       central
-----       -----       -------
sync
leafcheck   sync
commit      leafcheck
sync        commit      receives delta from user1 just fine
            sync        receives delta from user2 and now a fork exists
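
The interleaving above can be reproduced with a small model. This is
only a sketch under stated assumptions: the Repo class and its sync()
and commit() methods are hypothetical stand-ins, not Fossil code, and a
repository is reduced to a map from check-in id to parent id.

```python
# Toy model of the timeline above. Not Fossil code: Repo, sync() and
# commit() are hypothetical names used only for illustration.

class Repo:
    def __init__(self, name):
        self.name = name
        self.parent = {"root": None}   # check-in id -> parent id

    def leaves(self):
        """Check-ins that no other check-in names as its parent."""
        parents = set(self.parent.values())
        return sorted(c for c in self.parent if c not in parents)

    def sync(self, other):
        """Exchange check-ins in both directions (a pull plus a push)."""
        merged = {**self.parent, **other.parent}
        self.parent = dict(merged)
        other.parent = dict(merged)

    def commit(self, checkin_id):
        """Leaf check, then a local commit on top of the single leaf."""
        tips = self.leaves()
        assert len(tips) == 1, "leaf check failed: already forked"
        self.parent[checkin_id] = tips[0]


def simulate():
    central, user1, user2 = Repo("central"), Repo("user1"), Repo("user2")
    user1.sync(central)      # user1: sync
    user2.sync(central)      # user2: sync (central has not changed yet)
    user1.commit("c1")       # user1: leaf check passes, commit
    user2.commit("c2")       # user2: leaf check passes, commit
    user1.sync(central)      # central receives c1 just fine
    user2.sync(central)      # central receives c2: two leaves now exist
    return central.leaves()


print(simulate())
```

Both leaf checks pass because each one only sees the local replica as it
stood after that user's last sync, so neither local check can rule out
the fork.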
As you point out below, that is very difficult if not impossible to
"fix". What I think would alleviate this issue would be a check for fork
creation at the end of the final sync. If a fork is found, notify the
user so it can be dealt with before confusion is created.
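
A post-sync check of this kind could look roughly like the following.
This is a sketch, not Fossil's implementation: the names find_forks and
warn_after_sync, and the dictionary layout of the check-in graph, are
assumptions made for illustration. It also assumes that any child on the
same branch, including a merge child, means a check-in is no longer a
leaf.

```python
from collections import defaultdict

def find_forks(parents, branch_of):
    """Return {branch: [leaf ids]} for every branch with more than one leaf.

    parents   maps check-in id -> list of parent ids (hypothetical layout)
    branch_of maps check-in id -> branch name
    """
    has_child_on_same_branch = set()
    for child, plist in parents.items():
        for p in plist:
            if branch_of.get(p) == branch_of[child]:
                has_child_on_same_branch.add(p)

    leaves = defaultdict(list)
    for checkin in parents:
        if checkin not in has_child_on_same_branch:
            leaves[branch_of[checkin]].append(checkin)

    return {b: sorted(ids) for b, ids in leaves.items() if len(ids) > 1}


def warn_after_sync(parents, branch_of, current_branch):
    """Message a wrapper could print once the final sync has finished."""
    forks = find_forks(parents, branch_of)
    if current_branch in forks:
        tips = ", ".join(forks[current_branch])
        return f"WARNING: fork on {current_branch}: leaves {tips}"
    return ""


# Two check-ins, b and c, both committed on top of a.
graph = {"a": [], "b": ["a"], "c": ["a"]}
branches = {"a": "trunk", "b": "trunk", "c": "trunk"}
print(warn_after_sync(graph, branches, "trunk"))
```

Run at the end of the final sync, the warning reaches the user at the
moment the fork becomes visible locally rather than hours or days later.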
Just to illustrate, I think monotone deals rather nicely with the
natural but annoying creation of forks. The user is informed as soon as
the fork occurs. Then the user only has to issue "mtn merge" and it does
the easy and obvious merge. With fossil I have to poll the ui to ensure
I don't have a fork. If I do have a fork, I have to browse the UI and
figure out the hash id of the fork, do the merge and finally do a
commit, manually doing what could probably be mostly automated.
Contrast with git, where you know when you are causing a fork because
you do it all the time and dealing with forks is just day to day
business. Fossil will silently fork and only by starting up the ui and
digging around will it become apparent that there is a fork.
In the referenced message DRH writes:
DVCSs make it very easy to fork the tree. To listen to
Linus Torvalds you would think this is a good thing. But
experience suggests otherwise.
I still mostly agree with this, but requiring that every developer poll
the database for forks or risk confusion makes me think that the git
approach is perhaps not so crazy after all. If forks suck but only take
seconds to resolve, get people used to dealing with them; don't randomly
create them for no apparent reason. At least provide a heads up when
they happen and provide some help to resolve them.
In short, fossil does an imperfect job of hiding the pain of forking,
and so when it does occur it can be surprising and a hassle.
We added the fork detection code to the fossil wrapper which helps (we
also see forks due to time lag on syncing between remote sites) but it
is still a rather annoying problem.
My question is: can this be solved by wrapping the code that determines
that we are at a leaf and the code that does the final commit with a
"BEGIN IMMEDIATE;" ... "END;"?
No. Fossil already does that. Has done so for years.
Ah, I saw the calls to db_begin_transaction in commit.c wrapping the
check for a fork, and db_begin_transaction does "BEGIN", not
"BEGIN IMMEDIATE".
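
For readers unfamiliar with the distinction: in SQLite a plain "BEGIN"
is deferred and takes no lock until the first read or write, while
"BEGIN IMMEDIATE" takes the write lock at once. The sketch below shows
that difference using Python's sqlite3 module against a throwaway
database file. It is an illustration only, not Fossil code; try_begin
and demo are hypothetical helper names. Either way the lock covers a
single database file, not the separate replicas held by other users.

```python
import os
import sqlite3
import tempfile

def try_begin(path, mode):
    """Report whether a second connection can start a transaction of the
    given mode while a first connection already holds one."""
    # timeout=0: fail at once instead of waiting for the lock.
    # isolation_level=None: we issue BEGIN/ROLLBACK ourselves.
    first = sqlite3.connect(path, timeout=0, isolation_level=None)
    second = sqlite3.connect(path, timeout=0, isolation_level=None)
    try:
        first.execute(f"BEGIN {mode}")
        try:
            second.execute(f"BEGIN {mode}")
            second.execute("ROLLBACK")
            return "second BEGIN succeeded"
        except sqlite3.OperationalError as err:
            return f"second BEGIN failed: {err}"
    finally:
        if first.in_transaction:
            first.execute("ROLLBACK")
        first.close()
        second.close()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE t(x)")
        setup.commit()
        setup.close()
        return try_begin(path, "DEFERRED"), try_begin(path, "IMMEDIATE")


deferred, immediate = demo()
print(deferred)
print(immediate)
```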
The problem is that there are multiple disconnected replicas of the
database. You cannot (reasonably) lock them all. See
http://en.wikipedia.org/wiki/CAP_theorem - DVCSes like Fossil choose
availability and partition tolerance at the expense of (immediate)
consistency, since consistency is easily restored later by merging in
the rare event where it doesn't work out straight away.
To "fix" this problem (and again - I'm not yet convinced that it is a
problem that needs fixing) I think what you would need to do is create
some kind of "reservation" system for commits. Suppose user A and user B
both are about to commit. Each local fossil sends a message to the
central repository that tries to "reserve" the tip of trunk for some
limited period of time, say 60 seconds. (The reservation interval might
need to be adjusted depending on network latencies.) The first
reservation wins. If user B is second, he gets back an error that says
"User A is also trying to commit - wait 60 seconds and try again". That
gives user B an opportunity to go for coffee, then merge in user A's
changes before he tries again later. You can make a reasonable argument
that this is a good approach to development. In terms of the CAP
theorem, you are selecting CP rather than the current AP.
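
The reservation idea can be sketched in a few lines. Nothing like this
exists in Fossil as far as this thread says; the class TipReservations,
its methods, and the injectable clock are all hypothetical, and a real
version would live on the central server behind the sync protocol.

```python
import math
import time

class TipReservations:
    """Hypothetical server-side table of commit reservations, one per branch."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so the logic can be tested
        self._held = {}             # branch -> (user, expiry time)

    def reserve(self, branch, user):
        """Return (True, "") on success, or (False, message) if another
        user holds an unexpired reservation on this branch."""
        now = self.clock()
        holder = self._held.get(branch)
        if holder is not None:
            owner, expires = holder
            if owner != user and expires > now:
                wait = math.ceil(expires - now)
                return False, (f"User {owner} is also trying to commit - "
                               f"wait {wait} seconds and try again")
        # First reservation wins; an expired one is simply overwritten.
        self._held[branch] = (user, now + self.ttl)
        return True, ""

    def release(self, branch, user):
        """Drop the reservation once the commit has been pushed."""
        holder = self._held.get(branch)
        if holder is not None and holder[0] == user:
            del self._held[branch]


fake_now = [0.0]
table = TipReservations(ttl_seconds=60, clock=lambda: fake_now[0])
print(table.reserve("trunk", "A"))
fake_now[0] = 10.0
print(table.reserve("trunk", "B"))
```

Because every reservation expires on its own, a client that crashes
after reserving cannot leave the tip reserved indefinitely.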
Of course, this fix doesn't really work if you try to do a commit while
off network, since then you cannot make a reservation. It also doesn't
work if you don't have a single central repository that everybody
commits to. So it isn't for everybody. But I can understand how some
organizations would want this.
This increases the risk of leaving the db in a locked state so having a
fossil command to unlock a database would be nice.
In this same vein it would be very nice to be able to control the
sqlite3 timeout. I'm fairly sure that a longer timeout would give us
much better behaviour in our usage model.
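
At the SQLite level the timeout in question is the busy timeout, which
can be set per connection with "PRAGMA busy_timeout". The snippet below
shows the pragma through Python's sqlite3 module as an illustration; it
does not show how Fossil itself sets or exposes the value, and
open_with_busy_timeout is a hypothetical helper name.

```python
import sqlite3

def open_with_busy_timeout(path, timeout_ms):
    """Open a database and ask SQLite to keep retrying for up to
    timeout_ms milliseconds when another connection holds a lock."""
    conn = sqlite3.connect(path)
    conn.execute(f"PRAGMA busy_timeout = {int(timeout_ms)}")
    return conn


conn = open_with_busy_timeout(":memory:", 5000)
# Reading the pragma back returns the value currently in effect.
current = conn.execute("PRAGMA busy_timeout").fetchone()[0]
print(current)
conn.close()
```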
I have some scripting that can generate the forks and I'm willing to
take a stab at making this change but wanted to hear from the list if
this solution was worth trying.
_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
--
D. Richard Hipp
d...@sqlite.org