Re: [PATCH/WIP v2 00/14] inotify support
On Mon, Jan 20, 2014 at 12:04 AM, Thomas Rast t...@thomasrast.ch wrote: There's also the problem of ordering guarantees between the socket and inotify. I haven't found any, so I would conservatively assume that the socket messages may in fact arrive before inotify, which is a race in the current code. E.g., in the sequence 'touch foo; git status' the daemon may see socketinotify hello... status new empty list touch foo I think a clever way to handle this would be to add a new command: Wait:: This command serves synchronization. Git creates a file of its choice in $GIT_DIR/watch (say, `.git/watch/wait.random`). Then it sends wait path. The watcher MUST block until it has processed all change notifications up to and including path. Assuming that the time between foo is touched and the time an event is put in the daemon's queue is reasonably small, would emptying the event queue at hello be enough? To my innocent eyes (at the kernel), it seems inotify handling happens immediately after an fs event, and it's uninterruptable (or at least not interruptable by another user space process, I don't think we need to care about true interrupts). If that's true, by the time the touch syscall is finished, the event is already sitting in the daemon's queue. The problem with wait.random is we need to tell the daemon to expect it. Otherwise if the daemon processes the wait.random even before wait is sent, it would try to wait for the (lost) wait.random event forever. An extension is git touch wait.random regularly. Or to keep a queue of processed wait.* events. Both look ugly imo. Another option is send expect wait.random first, wait for ack, touch wait.random, then send wait, which is too much. -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/WIP v2 00/14] inotify support
Duy Nguyen pclo...@gmail.com writes: I think a clever way to handle this would be to add a new command: Wait:: This command serves synchronization. Git creates a file of its choice in $GIT_DIR/watch (say, `.git/watch/wait.random`). Then it sends wait path. The watcher MUST block until it has processed all change notifications up to and including path. So wait.random inotify event functions as a barrier. Nice. I forgot to specify a return for wait. Not sure you need one, though correctly handling the timeout (that you apply for all select()) may be somewhat tricky without it. Ok, that's probably a confused sum of rambles. Let me know if you can make any sense of it. Thank you for your input. Now I'm back to the white board (or paper). Don't go too far ;-) Thanks a lot for doing this! It's good that you picked it up, and I think your design strikes a good balance in the complexity of the protocol and the daemon's state. -- Thomas Rast t...@thomasrast.ch -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/WIP v2 00/14] inotify support
Nguyễn Thái Ngọc Duy pclo...@gmail.com writes: read-cache: save trailing sha-1 read-cache: new extension to mark what file is watched read-cache: connect to file watcher read-cache: ask file watcher to watch files read-cache: put some limits on file watching read-cache: get modified file list from file watcher read-cache: add config to start file watcher automatically read-cache: add GIT_TEST_FORCE_WATCHER for testing file-watcher: add --shutdown and --log options file-watcher: automatically quit file-watcher: support inotify file-watcher: exit when cwd is gone pkt-line.c: increase buffer size to 8192 t1301: exclude sockets from file permission check I never got the last three patches, did you send them? Also, this doesn't cleanly apply anywhere at my end. Can you push it somewhere for easier experimentation? This is getting in better shape. Still wondering if the design is right, so documentation, tests and some error cases are still neglected. I have not addressed Jonathan's and Jeff's comments in this reroll, but I haven't forgotten them yet. The test suite seems to be fine when file-watcher is forced on with GIT_TEST_FORCE_WATCHER set.. I tried to figure out whether there were any corners you are painting it into, but doing so without a clear overview of the protocol makes my head spin. So here's what I gather from the patches (I would have started from the final result, but see above). The slow path, before the watcher is ready, works like: spawn watcher send hello $index_sha1 receive hello mark most files CE_WATCHED send watch $paths for each CE_WATCHED the paths are actually a pkt-line-encoded series of paths get back fine $n (n = number of watches processed) do work as usual (including lstat()) The fast path: load CE_WATCHED from index send hello $index_sha1 receive hello $index_sha1 send status get back new $path for each changed file again aggregate mark files that are new as ~CE_WATCHED files that are CE_WATCHED can skip their lstat(), since they are unchanged On index writing (both paths): save CE_WATCHED in index extension send bye $index_sha1 send forget $index_sha1 $path for every path that is known to be changed So I think the watcher protocol can be specified roughly as: Definitions: The watcher is a separate process that maintains a set of _watched paths_. Git uses the commands below to add or remove paths from this set. In addition it maintains a set of _changed paths_, which is a subset of the watched paths. Git occasionally queries the changed paths. If at any time after a path P was added to the watched set, P has or may have changed, it MUST be added to the changed set. Note that a path being unchanged under this definition does NOT mean that it is unchanged in the index as per `git diff`. It may have been changed before the watch was issued! Therefore, Git MUST test whether the path started out unchanged using lstat() AFTER it was added to the watched set. Handshake:: On connecting, git sends hello magic. magic is an opaque string that identifies the state of the index. The watcher MUST respond with hello previous_magic, where previous_magic is obtained from the bye command (below). If magic != previous_magic, the watcher MUST reset state for this repository. Watch:: Git sends watch pathlist where the pathlist consists of a series of 4-digit lengths and literal pathnames (as in the pkt-line format) for each path it is interested in. The watcher MUST respond with fine n after processing a message that contained n paths. The watcher MUST add each path to the watched set and remove it from the changed set (no error if it was already watched, or not changed). Status:: Git sends status. The watcher MUST respond with its current changed set. This uses the format new pathlist, with pathlist formatted as for the watch command. Bye:: Git sends bye magic to indicate it has finished writing the index. The watcher must store magic so as to use it as the previous_magic in a hello response. Forget:: Git sends forget pathlist, with the pathlist formatted as for the watch command. The watcher SHOULD remove each path in pathlist from its watched set. Did I get that approximately straight? Perhaps you can fix up as needed and then turn it into the documentation for the protocol. There are several points about this that I don't like: * What does the datagram socket buy us? You already do a lot of pkt-line work for the really big messages, so why not just use pkt-line everywhere and a streaming socket? * The handshake should have room for capabilities, so that we can extend the protocol. * The hello/hello handshake is confusing, and probably would allow you to point two watchers at each other without them noticing. Make it hello/ready, or
Re: [PATCH/WIP v2 00/14] inotify support
On Mon, Jan 20, 2014 at 12:04 AM, Thomas Rast t...@thomasrast.ch wrote: I never got the last three patches, did you send them? Also, this doesn't cleanly apply anywhere at my end. Can you push it somewhere for easier experimentation? Sorry I rebased it on top of kb/fast-hashmap but never got around to actually using hash tables. The code is here (and on top of master) https://github.com/pclouds/git.git file-watcher So I think the watcher protocol can be specified roughly as: Definitions: The watcher is a separate process that maintains a set of _watched paths_. Git uses the commands below to add or remove paths from this set. In addition it maintains a set of _changed paths_, which is a subset of the watched paths. Git occasionally queries the changed paths. If at any time after a path P was added to the watched set, P has or may have changed, it MUST be added to the changed set. Note that a path being unchanged under this definition does NOT mean that it is unchanged in the index as per `git diff`. It may have been changed before the watch was issued! Therefore, Git MUST test whether the path started out unchanged using lstat() AFTER it was added to the watched set. Correct. Handshake:: On connecting, git sends hello magic. magic is an opaque string that identifies the state of the index. The watcher MUST respond with hello previous_magic, where previous_magic is obtained from the bye command (below). If magic != previous_magic, the watcher MUST reset state for this repository. In addition, git must reset itself (i.e. clear all CE_WATCHED flags) because nothing can be trusted at this point. ... Did I get that approximately straight? Perhaps you can fix up as needed and then turn it into the documentation for the protocol. Will do (and probably steal some of your text above). There are several points about this that I don't like: * What does the datagram socket buy us? You already do a lot of pkt-line work for the really big messages, so why not just use pkt-line everywhere and a streaming socket? With datagram sockets I did not have to maintain the connected sockets, which made it somewhat simpler to handle so far. The default SO_SNDBUF goes up to 212k, so we can send up to that amount without blocking. With current pkt-line we send up to 64k in watch message then we have to wait for fine, which results in more context switches. But I think we can extend pkt-line's length field to 5 hex digit to cover this. Streaming sockets are probably the way to go for per-user daemon, so we can identify a socket with a repo. * The handshake should have room for capabilities, so that we can extend the protocol. Yeah. One more point for streaming sockets. With datagram sockets it's harder to define a session and thus hard to add capabilities. * The hello/hello handshake is confusing, and probably would allow you to point two watchers at each other without them noticing. Make it hello/ready, or some other unambiguous choice. OK * I took some liberty and tried to fully define the transitions between the sets. So even though I'm not sure this is currently handled, I made it well-defined to issue watch for a path that is in the changed set. Yes that should avoid races. The path can be removed from watched set later after git acknowledges it. * bye is confusing, because in practice git issues forgets after bye. The best I can come up with is setstate, I'm sure you have better ideas. Originally it was forget, forget, forget then bye. But with that order, if git crashes before sending bye we could lose info in changed set so the order was changed but I did not update the command name. There's also the problem of ordering guarantees between the socket and inotify. I haven't found any, so I would conservatively assume that the socket messages may in fact arrive before inotify, which is a race in the current code. E.g., in the sequence 'touch foo; git status' the daemon may see socketinotify hello... status new empty list touch foo I think a clever way to handle this would be to add a new command: Wait:: This command serves synchronization. Git creates a file of its choice in $GIT_DIR/watch (say, `.git/watch/wait.random`). Then it sends wait path. The watcher MUST block until it has processed all change notifications up to and including path. So wait.random inotify event functions as a barrier. Nice. As far as inotify corner-cases go, the only one I'm aware of is directory renames. I suspect we'll have to watch directories all the way up to the repository root to reliably detect when this happens. Not sure how to best handle this. Perhaps we should declare Git completely agnostic wrt such issues, and behind the scenes issue all watches up to the root even if we don't need
[PATCH/WIP v2 00/14] inotify support
This is getting in better shape. Still wondering if the design is right, so documentation, tests and some error cases are still neglected. I have not addressed Jonathan's and Jeff's comments in this reroll, but I haven't forgotten them yet. The test suite seems to be fine when file-watcher is forced on with GIT_TEST_FORCE_WATCHER set.. Thomas, you were a proponent of per-user daemon last time. I agree that is a better solution when you need to support submodules. So if you have time, have a look and see if anything I did may prevent per-user daemon changes later (hint, I have a few unfriendly exit() in file-watcher.c). You also worked with inotify before maybe you can help spot some mishandling too as I'm totally new to inotify. Nguyễn Thái Ngọc Duy (14): read-cache: save trailing sha-1 read-cache: new extension to mark what file is watched read-cache: connect to file watcher read-cache: ask file watcher to watch files read-cache: put some limits on file watching read-cache: get modified file list from file watcher read-cache: add config to start file watcher automatically read-cache: add GIT_TEST_FORCE_WATCHER for testing file-watcher: add --shutdown and --log options file-watcher: automatically quit file-watcher: support inotify file-watcher: exit when cwd is gone pkt-line.c: increase buffer size to 8192 t1301: exclude sockets from file permission check .gitignore | 1 + Documentation/config.txt | 14 ++ Makefile | 2 + cache.h | 8 + config.mak.uname | 1 + file-watcher-lib.c (new) | 109 +++ file-watcher-lib.h (new) | 9 + file-watcher.c (new) | 483 +++ git-compat-util.h| 3 + pkt-line.c | 4 +- pkt-line.h | 2 + read-cache.c | 338 - t/t1301-shared-repo.sh | 3 +- trace.c | 3 +- 14 files changed, 969 insertions(+), 11 deletions(-) create mode 100644 file-watcher-lib.c create mode 100644 file-watcher-lib.h create mode 100644 file-watcher.c -- 1.8.5.1.208.g05b12ea -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html