Re: [PATCH/WIP v2 00/14] inotify support

2014-01-28 Thread Duy Nguyen
On Mon, Jan 20, 2014 at 12:04 AM, Thomas Rast t...@thomasrast.ch wrote:
 There's also the problem of ordering guarantees between the socket and
 inotify.  I haven't found any, so I would conservatively assume that the
 socket messages may in fact arrive before inotify, which is a race in
 the current code.  E.g., in the sequence 'touch foo; git status' the
 daemon may see

   socketinotify
hello...
status
new empty list
 touch foo

 I think a clever way to handle this would be to add a new command:

   Wait::
 This command serves synchronization.  Git creates a file of its
 choice in $GIT_DIR/watch (say, `.git/watch/wait.random`).  Then it
 sends wait path.  The watcher MUST block until it has processed
 all change notifications up to and including path.

Assuming that the time between foo is touched and the time an event is
put in the daemon's queue is reasonably small, would emptying the
event queue at hello be enough? To my innocent eyes (at the kernel),
it seems inotify handling happens immediately after an fs event, and
it's uninterruptable (or at least not interruptable by another user
space process, I don't think we need to care about true interrupts).
If that's true, by the time the touch syscall is finished, the event
is already sitting in the daemon's queue.

The problem with wait.random is we need to tell the daemon to expect
it. Otherwise if the daemon processes the wait.random even before
wait is sent, it would try to wait for the (lost) wait.random
event forever. An extension is git touch wait.random regularly. Or
to keep a queue of processed wait.* events. Both look ugly imo.
Another option is send expect wait.random first, wait for ack,
touch wait.random, then send wait, which is too much.
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/WIP v2 00/14] inotify support

2014-01-20 Thread Thomas Rast
Duy Nguyen pclo...@gmail.com writes:

 I think a clever way to handle this would be to add a new command:

   Wait::
 This command serves synchronization.  Git creates a file of its
 choice in $GIT_DIR/watch (say, `.git/watch/wait.random`).  Then it
 sends wait path.  The watcher MUST block until it has processed
 all change notifications up to and including path.

 So wait.random inotify event functions as a barrier. Nice.

I forgot to specify a return for wait.  Not sure you need one, though
correctly handling the timeout (that you apply for all select()) may be
somewhat tricky without it.

 Ok, that's probably a confused sum of rambles.  Let me know if you can
 make any sense of it.

 Thank you for your input. Now I'm back to the white board (or paper).

Don't go too far ;-)

Thanks a lot for doing this!  It's good that you picked it up, and I
think your design strikes a good balance in the complexity of the
protocol and the daemon's state.

-- 
Thomas Rast
t...@thomasrast.ch
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/WIP v2 00/14] inotify support

2014-01-19 Thread Thomas Rast
Nguyễn Thái Ngọc Duy pclo...@gmail.com writes:
   read-cache: save trailing sha-1
   read-cache: new extension to mark what file is watched
   read-cache: connect to file watcher
   read-cache: ask file watcher to watch files
   read-cache: put some limits on file watching
   read-cache: get modified file list from file watcher
   read-cache: add config to start file watcher automatically
   read-cache: add GIT_TEST_FORCE_WATCHER for testing
   file-watcher: add --shutdown and --log options
   file-watcher: automatically quit
   file-watcher: support inotify
   file-watcher: exit when cwd is gone
   pkt-line.c: increase buffer size to 8192
   t1301: exclude sockets from file permission check

I never got the last three patches, did you send them?

Also, this doesn't cleanly apply anywhere at my end.  Can you push it
somewhere for easier experimentation?

 This is getting in better shape. Still wondering if the design is
 right, so documentation, tests and some error cases are still
 neglected. I have not addressed Jonathan's and Jeff's comments in this
 reroll, but I haven't forgotten them yet. The test suite seems to be
 fine when file-watcher is forced on with GIT_TEST_FORCE_WATCHER set..

I tried to figure out whether there were any corners you are painting it
into, but doing so without a clear overview of the protocol makes my
head spin.

So here's what I gather from the patches (I would have started from the
final result, but see above).

The slow path, before the watcher is ready, works like:

  spawn watcher
  send hello $index_sha1
  receive hello 
  mark most files CE_WATCHED
  send watch $paths for each CE_WATCHED
the paths are actually a pkt-line-encoded series of paths
get back fine $n (n = number of watches processed)
  do work as usual (including lstat())

The fast path:

  load CE_WATCHED from index
  send hello $index_sha1
  receive hello $index_sha1
  send status
get back new $path for each changed file
again aggregate
  mark files that are new as ~CE_WATCHED
  files that are CE_WATCHED can skip their lstat(), since they are unchanged

On index writing (both paths):

  save CE_WATCHED in index extension
  send bye $index_sha1
  send forget $index_sha1 $path for every path that is known to be changed

So I think the watcher protocol can be specified roughly as:

  Definitions: The watcher is a separate process that maintains a set of
  _watched paths_.  Git uses the commands below to add or remove paths
  from this set.

  In addition it maintains a set of _changed paths_, which is a subset
  of the watched paths.  Git occasionally queries the changed paths.  If
  at any time after a path P was added to the watched set, P has or
  may have changed, it MUST be added to the changed set.

  Note that a path being unchanged under this definition does NOT mean
  that it is unchanged in the index as per `git diff`.  It may have been
  changed before the watch was issued!  Therefore, Git MUST test whether
  the path started out unchanged using lstat() AFTER it was added to the
  watched set.

  Handshake::
On connecting, git sends hello magic.  magic is an opaque
string that identifies the state of the index.  The watcher MUST
respond with hello previous_magic, where previous_magic is
obtained from the bye command (below).  If magic !=
previous_magic, the watcher MUST reset state for this repository.

  Watch::
Git sends watch pathlist where the pathlist consists of a
series of 4-digit lengths and literal pathnames (as in the pkt-line
format) for each path it is interested in.  The watcher MUST respond
with fine n after processing a message that contained n paths.
The watcher MUST add each path to the watched set and remove it
from the changed set (no error if it was already watched, or not
changed).

  Status::
Git sends status.  The watcher MUST respond with its current
changed set.  This uses the format new pathlist, with
pathlist formatted as for the watch command.

  Bye::
Git sends bye magic to indicate it has finished writing the
index.  The watcher must store magic so as to use it as the
previous_magic in a hello response.

  Forget::
Git sends forget pathlist, with the pathlist formatted as for
the watch command.  The watcher SHOULD remove each path in
pathlist from its watched set.

Did I get that approximately straight?  Perhaps you can fix up as needed
and then turn it into the documentation for the protocol.

There are several points about this that I don't like:

* What does the datagram socket buy us?  You already do a lot of
  pkt-line work for the really big messages, so why not just use
  pkt-line everywhere and a streaming socket?

* The handshake should have room for capabilities, so that we can extend
  the protocol.

* The hello/hello handshake is confusing, and probably would allow you
  to point two watchers at each other without them noticing.  Make it
  hello/ready, or 

Re: [PATCH/WIP v2 00/14] inotify support

2014-01-19 Thread Duy Nguyen
On Mon, Jan 20, 2014 at 12:04 AM, Thomas Rast t...@thomasrast.ch wrote:
 I never got the last three patches, did you send them?

 Also, this doesn't cleanly apply anywhere at my end.  Can you push it
 somewhere for easier experimentation?

Sorry I rebased it on top of kb/fast-hashmap but never got around to
actually using hash tables. The code is here (and on top of master)

https://github.com/pclouds/git.git file-watcher

 So I think the watcher protocol can be specified roughly as:

   Definitions: The watcher is a separate process that maintains a set of
   _watched paths_.  Git uses the commands below to add or remove paths
   from this set.

   In addition it maintains a set of _changed paths_, which is a subset
   of the watched paths.  Git occasionally queries the changed paths.  If
   at any time after a path P was added to the watched set, P has or
   may have changed, it MUST be added to the changed set.

   Note that a path being unchanged under this definition does NOT mean
   that it is unchanged in the index as per `git diff`.  It may have been
   changed before the watch was issued!  Therefore, Git MUST test whether
   the path started out unchanged using lstat() AFTER it was added to the
   watched set.

Correct.

   Handshake::
 On connecting, git sends hello magic.  magic is an opaque
 string that identifies the state of the index.  The watcher MUST
 respond with hello previous_magic, where previous_magic is
 obtained from the bye command (below).  If magic !=
 previous_magic, the watcher MUST reset state for this repository.

In addition, git must reset itself (i.e. clear all CE_WATCHED flags)
because nothing can be trusted at this point.

...

 Did I get that approximately straight?  Perhaps you can fix up as needed
 and then turn it into the documentation for the protocol.

Will do (and probably steal some of your text above).

 There are several points about this that I don't like:

 * What does the datagram socket buy us?  You already do a lot of
   pkt-line work for the really big messages, so why not just use
   pkt-line everywhere and a streaming socket?

With datagram sockets I did not have to maintain the connected
sockets, which made it somewhat simpler to handle so far.

The default SO_SNDBUF goes up to 212k, so we can send up to that
amount without blocking. With current pkt-line we send up to 64k in
watch message then we have to wait for fine, which results in more
context switches. But I think we can extend pkt-line's length field to
5 hex digit to cover this.

Streaming sockets are probably the way to go for per-user daemon, so
we can identify a socket with a repo.

 * The handshake should have room for capabilities, so that we can extend
   the protocol.

Yeah. One more point for streaming sockets. With datagram sockets it's
harder to define a session and thus hard to add capabilities.

 * The hello/hello handshake is confusing, and probably would allow you
   to point two watchers at each other without them noticing.  Make it
   hello/ready, or some other unambiguous choice.

OK

 * I took some liberty and tried to fully define the transitions between
   the sets.  So even though I'm not sure this is currently handled, I
   made it well-defined to issue watch for a path that is in the
   changed set.

Yes that should avoid races. The path can be removed from watched
set later after git acknowledges it.

 * bye is confusing, because in practice git issues forgets after
   bye.  The best I can come up with is setstate, I'm sure you have
   better ideas.

Originally it was forget, forget, forget then bye. But with
that order, if git crashes before sending bye we could lose info in
changed set so the order was changed but I did not update the
command name.

 There's also the problem of ordering guarantees between the socket and
 inotify.  I haven't found any, so I would conservatively assume that the
 socket messages may in fact arrive before inotify, which is a race in
 the current code.  E.g., in the sequence 'touch foo; git status' the
 daemon may see

   socketinotify
hello...
status
new empty list
 touch foo

 I think a clever way to handle this would be to add a new command:

   Wait::
 This command serves synchronization.  Git creates a file of its
 choice in $GIT_DIR/watch (say, `.git/watch/wait.random`).  Then it
 sends wait path.  The watcher MUST block until it has processed
 all change notifications up to and including path.

So wait.random inotify event functions as a barrier. Nice.

 As far as inotify corner-cases go, the only one I'm aware of is
 directory renames.  I suspect we'll have to watch directories all the
 way up to the repository root to reliably detect when this happens.  Not
 sure how to best handle this.  Perhaps we should declare Git completely
 agnostic wrt such issues, and behind the scenes issue all watches up to
 the root even if we don't need 

[PATCH/WIP v2 00/14] inotify support

2014-01-17 Thread Nguyễn Thái Ngọc Duy
This is getting in better shape. Still wondering if the design is
right, so documentation, tests and some error cases are still
neglected. I have not addressed Jonathan's and Jeff's comments in this
reroll, but I haven't forgotten them yet. The test suite seems to be
fine when file-watcher is forced on with GIT_TEST_FORCE_WATCHER set..

Thomas, you were a proponent of per-user daemon last time. I agree
that is a better solution when you need to support submodules. So if
you have time, have a look and see if anything I did may prevent
per-user daemon changes later (hint, I have a few unfriendly exit() in
file-watcher.c). You also worked with inotify before maybe you can
help spot some mishandling too as I'm totally new to inotify.

Nguyễn Thái Ngọc Duy (14):
  read-cache: save trailing sha-1
  read-cache: new extension to mark what file is watched
  read-cache: connect to file watcher
  read-cache: ask file watcher to watch files
  read-cache: put some limits on file watching
  read-cache: get modified file list from file watcher
  read-cache: add config to start file watcher automatically
  read-cache: add GIT_TEST_FORCE_WATCHER for testing
  file-watcher: add --shutdown and --log options
  file-watcher: automatically quit
  file-watcher: support inotify
  file-watcher: exit when cwd is gone
  pkt-line.c: increase buffer size to 8192
  t1301: exclude sockets from file permission check

 .gitignore   |   1 +
 Documentation/config.txt |  14 ++
 Makefile |   2 +
 cache.h  |   8 +
 config.mak.uname |   1 +
 file-watcher-lib.c (new) | 109 +++
 file-watcher-lib.h (new) |   9 +
 file-watcher.c (new) | 483 +++
 git-compat-util.h|   3 +
 pkt-line.c   |   4 +-
 pkt-line.h   |   2 +
 read-cache.c | 338 -
 t/t1301-shared-repo.sh   |   3 +-
 trace.c  |   3 +-
 14 files changed, 969 insertions(+), 11 deletions(-)
 create mode 100644 file-watcher-lib.c
 create mode 100644 file-watcher-lib.h
 create mode 100644 file-watcher.c

-- 
1.8.5.1.208.g05b12ea

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html