Re: [PATCH v14 00/21] index-helper/watchman
Duy Nguyen gmail.com> writes: > > On Wed, Jul 13, 2016 at 11:59 PM, David Turner novalis.org> wrote: > > On 07/12/2016 02:24 PM, Duy Nguyen wrote: > >> > >> Just thinking out loud. I've been thinking about this more about this. > >> After the move from signal-based to unix socket for communication, we > >> probably are better off with a simpler design than the shm-alike one > >> we have now. > >> > >> What if we send everything over a socket or a pipe? Sending 500MB over > >> a unix socket takes 253ms, that's insignificant when operations on an > >> index that size usually take seconds. If we send everything over > >> socket/pipe, we can trust data integrity and don't have to verify, > >> even the trailing SHA-1 in shm file. > > > > > > I think it would be good to make index operations not take seconds. > > > > In general, we should not need to verify the trailing SHA-1 for shm data. > > So the index-helper verifies it when it loads it, but the git (e.g.) status > > should not need to verify. > > > > Also, if we have two git commands running at the same time, the index-helper > > can only serve one at a time; with shm, both can run at full speed. > > We still have an option to send a (shm, possibly) path to git to pick > up and skip verification. If we can exchange capabilities then sending > the index some way else is always possible. > > >> So, what I have in mind is this, at read index time, instead of open a > >> socket, we run a separate program and communicate via pipes. We can > >> exchange capabilities if needed, then the program sends the entire > >> current index, the list of updated files back (and/or the list of dirs > >> to invalidate). The design looks very much like a smudge/clean filter. > > > > > > This seems very complicated. Now git status talks to the separate program, > > which talks to the index-helper, which talks to watchman. That is a lot of > > steps! > > I was suggesting this because I think it would simplify things, not > complicate stuff further. Yes the separate program plays the role of > our unix client, if we keep the index-helper. But we don't have to. > > Do you remember Junio once suggested to put the index on tmpfs? That's > what I imagine in common, medium scale setups. We don't need an extra > daemon: > > 1) when git needs the index, the script looks at its tmpfs mount, if > found, pass the path back > 2) when git announces the index has been updated, the script reads the > index and saves it in tmpfs > 3) when git refreshes and asks for watchman support, the script simply > runs "watchman" command, post processes the output a bit and send the > file list to git > > Because there is no separate daemon in this case, we don't need > --kill, we don't need --autorun. We still need WAMA extension but it > can contain just an arbitrary clock string, this is completely opaque > to git. If we can get rid of the index-helper (with an example script > probably landed in contrib folder), that's a lot of less headache down > the road. > > For giant-scale repos, you probably want something more efficient than > a script like this. And the good thing is you have freedom to do > whatever you want. You can run one daemon per repo, you can run one > daemon per system... In some previous mail exchange with Dscho, it was > mentioned that something other than watchman may be desired. This > opens up that door without much headache from outside. > > > I think the daemon also has the advantage that it can reload the index as > > soon as it changes. This is not quite implemented, but it would be pretty > > easy to do. That would save a lot of time in the typical workflow. > > A script has the same advantage, that is if git notifies it (like we > do now). You can also do it using watchman trigger, which does not > need any special support from git. Taking a step back, git needs (at least) 3 things to update the index quickly, the old index, the list of potentially modified files and the list of directories with changes. When looked at this way, the question becomes how do we provide this data to git in the fastest, simplest, most compatible way. It makes sense to separate the code that git can call to abstract away some of the potentially platform dependent aspects of how we store and retrieve this data quickly. For example, we would like to use something other than Watchman to track changes as it isn’t well supported on Windows. It would be nice to simplify this for git as much as possible by eliminating the need for it to manage a daemon. No need for --autorun, --kill, poking sockets, etc - just run the program and get back the data. It also makes sense to allow git to negotiate for the data it needs. It may only need the index, it may need the list of updated files and if untrackedcache is turned on, it would need the list of directories to invalidate. The separate code should also be able to say which parts it can provide quickly and have git
Re: [PATCH v14 00/21] index-helper/watchman
Big typo.. On Thu, Jul 14, 2016 at 5:56 PM, Duy Nguyenwrote: > For giant-scale repos, you probably want something more efficient than > a script like this. And the good thing is you have freedom to do > whatever you want. You can run one daemon per repo, you can run one > daemon per system... In some previous mail exchange with Dscho, it was > mentioned that something other than watchman may be desired. This > opens up that door without much headache from outside. s/outside/our side/ -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v14 00/21] index-helper/watchman
On Wed, Jul 13, 2016 at 11:59 PM, David Turnerwrote: > On 07/12/2016 02:24 PM, Duy Nguyen wrote: >> >> Just thinking out loud. I've been thinking about this more about this. >> After the move from signal-based to unix socket for communication, we >> probably are better off with a simpler design than the shm-alike one >> we have now. >> >> What if we send everything over a socket or a pipe? Sending 500MB over >> a unix socket takes 253ms, that's insignificant when operations on an >> index that size usually take seconds. If we send everything over >> socket/pipe, we can trust data integrity and don't have to verify, >> even the trailing SHA-1 in shm file. > > > I think it would be good to make index operations not take seconds. > > In general, we should not need to verify the trailing SHA-1 for shm data. > So the index-helper verifies it when it loads it, but the git (e.g.) status > should not need to verify. > > Also, if we have two git commands running at the same time, the index-helper > can only serve one at a time; with shm, both can run at full speed. We still have an option to send a (shm, possibly) path to git to pick up and skip verification. If we can exchange capabilities then sending the index some way else is always possible. >> So, what I have in mind is this, at read index time, instead of open a >> socket, we run a separate program and communicate via pipes. We can >> exchange capabilities if needed, then the program sends the entire >> current index, the list of updated files back (and/or the list of dirs >> to invalidate). The design looks very much like a smudge/clean filter. > > > This seems very complicated. Now git status talks to the separate program, > which talks to the index-helper, which talks to watchman. That is a lot of > steps! I was suggesting this because I think it would simplify things, not complicate stuff further. Yes the separate program plays the role of our unix client, if we keep the index-helper. But we don't have to. Do you remember Junio once suggested to put the index on tmpfs? That's what I imagine in common, medium scale setups. We don't need an extra daemon: 1) when git needs the index, the script looks at its tmpfs mount, if found, pass the path back 2) when git announces the index has been updated, the script reads the index and saves it in tmpfs 3) when git refreshes and asks for watchman support, the script simply runs "watchman" command, post processes the output a bit and send the file list to git Because there is no separate daemon in this case, we don't need --kill, we don't need --autorun. We still need WAMA extension but it can contain just an arbitrary clock string, this is completely opaque to git. If we can get rid of the index-helper (with an example script probably landed in contrib folder), that's a lot of less headache down the road. For giant-scale repos, you probably want something more efficient than a script like this. And the good thing is you have freedom to do whatever you want. You can run one daemon per repo, you can run one daemon per system... In some previous mail exchange with Dscho, it was mentioned that something other than watchman may be desired. This opens up that door without much headache from outside. > I think the daemon also has the advantage that it can reload the index as > soon as it changes. This is not quite implemented, but it would be pretty > easy to do. That would save a lot of time in the typical workflow. A script has the same advantage, that is if git notifies it (like we do now). You can also do it using watchman trigger, which does not need any special support from git. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v14 00/21] index-helper/watchman
On 07/12/2016 02:24 PM, Duy Nguyen wrote: Just thinking out loud. I've been thinking about this more about this. After the move from signal-based to unix socket for communication, we probably are better off with a simpler design than the shm-alike one we have now. What if we send everything over a socket or a pipe? Sending 500MB over a unix socket takes 253ms, that's insignificant when operations on an index that size usually take seconds. If we send everything over socket/pipe, we can trust data integrity and don't have to verify, even the trailing SHA-1 in shm file. I think it would be good to make index operations not take seconds. In general, we should not need to verify the trailing SHA-1 for shm data. So the index-helper verifies it when it loads it, but the git (e.g.) status should not need to verify. Also, if we have two git commands running at the same time, the index-helper can only serve one at a time; with shm, both can run at full speed. So, what I have in mind is this, at read index time, instead of open a socket, we run a separate program and communicate via pipes. We can exchange capabilities if needed, then the program sends the entire current index, the list of updated files back (and/or the list of dirs to invalidate). The design looks very much like a smudge/clean filter. This seems very complicated. Now git status talks to the separate program, which talks to the index-helper, which talks to watchman. That is a lot of steps! I think we should wait until we heard from the Windows folks what the problems with the current solution are, and see what design they come up with. For people who don't want extra daemon, they can write a short script that saves indexes somewhere in tmpfs, and talk to watchman or something else. I haven't written this script, but I don't think it takes long to write one. Windows folks have total freedom to implement a daemon, a service or whatever and use this program as front end. How the service talks to this program is totally up to them. For people who want to centralize everything, they can have just one daemon and have the script to talk to this daemon. I can see that getting rid of file-based stuff simplifies some patches. We can still provide a daemon to do more advanced stuff (or to make it work out of the box). But it's not a hard requirement and we probably don't need to include one right now. And I think it makes it easier to test as well because we can just go with some fake file monitor service instead of real watchman. I think the daemon also has the advantage that it can reload the index as soon as it changes. This is not quite implemented, but it would be pretty easy to do. That would save a lot of time in the typical workflow. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v14 00/21] index-helper/watchman
Just thinking out loud. I've been thinking about this more about this. After the move from signal-based to unix socket for communication, we probably are better off with a simpler design than the shm-alike one we have now. What if we send everything over a socket or a pipe? Sending 500MB over a unix socket takes 253ms, that's insignificant when operations on an index that size usually take seconds. If we send everything over socket/pipe, we can trust data integrity and don't have to verify, even the trailing SHA-1 in shm file. So, what I have in mind is this, at read index time, instead of open a socket, we run a separate program and communicate via pipes. We can exchange capabilities if needed, then the program sends the entire current index, the list of updated files back (and/or the list of dirs to invalidate). The design looks very much like a smudge/clean filter. For people who don't want extra daemon, they can write a short script that saves indexes somewhere in tmpfs, and talk to watchman or something else. I haven't written this script, but I don't think it takes long to write one. Windows folks have total freedom to implement a daemon, a service or whatever and use this program as front end. How the service talks to this program is totally up to them. For people who want to centralize everything, they can have just one daemon and have the script to talk to this daemon. I can see that getting rid of file-based stuff simplifies some patches. We can still provide a daemon to do more advanced stuff (or to make it work out of the box). But it's not a hard requirement and we probably don't need to include one right now. And I think it makes it easier to test as well because we can just go with some fake file monitor service instead of real watchman. -- Duy On Sun, Jul 3, 2016 at 9:57 AM, David Turnerwrote: > This addresses comments on v13: > removed unnecessary no_mmap ifdef > add an ifdef in unix-socket > OS X fix for select() > test improvement > > Thanks to all for suggestions. > > David Turner (10): > pkt-line: add gentle version of packet_write > index-helper: log warnings > unpack-trees: preserve index extensions > watchman: add a config option to enable the extension > index-helper: kill mode > index-helper: don't run if already running > index-helper: autorun mode > index-helper: optionally automatically run > index-helper: indexhelper.exitAfter config > mailmap: use main email address for dturner > > Nguyễn Thái Ngọc Duy (11): > read-cache: allow to keep mmap'd memory after reading > unix-socket.c: add stub implementation when unix sockets are not > supported > index-helper: new daemon for caching index and related stuff > index-helper: add --strict > daemonize(): set a flag before exiting the main process > index-helper: add --detach > read-cache: add watchman 'WAMA' extension > watchman: support watchman to reduce index refresh cost > index-helper: use watchman to avoid refreshing index with lstat() > update-index: enable/disable watchman support > trace: measure where the time is spent in the index-heavy operations > > .gitignore | 2 + > .mailmap | 1 + > Documentation/config.txt | 12 + > Documentation/git-index-helper.txt | 86 + > Documentation/git-update-index.txt | 6 + > Documentation/technical/index-format.txt | 22 ++ > Makefile | 22 ++ > builtin/gc.c | 2 +- > builtin/update-index.c | 15 + > cache.h | 25 +- > command-list.txt | 1 + > config.c | 5 + > configure.ac | 8 + > contrib/completion/git-completion.bash | 1 + > daemon.c | 2 +- > diff-lib.c | 4 + > dir.c| 25 +- > dir.h| 6 + > environment.c| 2 + > git-compat-util.h| 1 + > index-helper.c | 469 +++ > name-hash.c | 2 + > pkt-line.c | 18 ++ > pkt-line.h | 2 + > preload-index.c | 2 + > read-cache.c | 531 > ++- > refs/files-backend.c | 2 + > setup.c | 4 +- > t/t1701-watchman-extension.sh| 37 +++ > t/t7063-status-untracked-cache.sh| 22 ++ > t/t7900-index-helper.sh | 79 + > t/test-lib-functions.sh | 4 + > test-dump-watchman.c | 16 + > unix-socket.h
Re: [PATCH v14 00/21] index-helper/watchman
David Turnerwrites: > This addresses comments on v13: > removed unnecessary no_mmap ifdef > add an ifdef in unix-socket > OS X fix for select() > test improvement Thanks. It seems that what used to be 809fd05a (read-cache.c: fix constness of verify_hdr(), 2016-06-26) is no longer included in the series, even though the tightening of the constness seems to still be OK (i.e. verify_hdr() function takes the parameter hdr and only uses it read-only). Do you want me to keep it, or (more likely) did I miss some discussion that reached the concensus that we do not want it? -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v14 00/21] index-helper/watchman
Hi Dave, On Sun, 3 Jul 2016, Johannes Schindelin wrote: > On Sun, 3 Jul 2016, David Turner wrote: > > > This addresses comments on v13: > > removed unnecessary no_mmap ifdef > > add an ifdef in unix-socket > > OS X fix for select() > > test improvement > > Thanks. > > Would you mind re-sending 20 & 21, they seem to have gotten lost. Or is > there a public repository where I can simply fetch the branch? (I am not > really a fan of applying patches from my mailbox...) The patches arrived, thanks. For interested parties: I rebased the patches (adjusting for t/helper) and pushed the result to refs/tags/index-helper-v14-rebased at https://github.com/dscho/git. Thanks, Dscho -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v14 00/21] index-helper/watchman
Hi Dave, On Sun, 3 Jul 2016, David Turner wrote: > This addresses comments on v13: > removed unnecessary no_mmap ifdef > add an ifdef in unix-socket > OS X fix for select() > test improvement Thanks. Would you mind re-sending 20 & 21, they seem to have gotten lost. Or is there a public repository where I can simply fetch the branch? (I am not really a fan of applying patches from my mailbox...) Thanks, Dscho -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v14 00/21] index-helper/watchman
This addresses comments on v13: removed unnecessary no_mmap ifdef add an ifdef in unix-socket OS X fix for select() test improvement Thanks to all for suggestions. David Turner (10): pkt-line: add gentle version of packet_write index-helper: log warnings unpack-trees: preserve index extensions watchman: add a config option to enable the extension index-helper: kill mode index-helper: don't run if already running index-helper: autorun mode index-helper: optionally automatically run index-helper: indexhelper.exitAfter config mailmap: use main email address for dturner Nguyễn Thái Ngọc Duy (11): read-cache: allow to keep mmap'd memory after reading unix-socket.c: add stub implementation when unix sockets are not supported index-helper: new daemon for caching index and related stuff index-helper: add --strict daemonize(): set a flag before exiting the main process index-helper: add --detach read-cache: add watchman 'WAMA' extension watchman: support watchman to reduce index refresh cost index-helper: use watchman to avoid refreshing index with lstat() update-index: enable/disable watchman support trace: measure where the time is spent in the index-heavy operations .gitignore | 2 + .mailmap | 1 + Documentation/config.txt | 12 + Documentation/git-index-helper.txt | 86 + Documentation/git-update-index.txt | 6 + Documentation/technical/index-format.txt | 22 ++ Makefile | 22 ++ builtin/gc.c | 2 +- builtin/update-index.c | 15 + cache.h | 25 +- command-list.txt | 1 + config.c | 5 + configure.ac | 8 + contrib/completion/git-completion.bash | 1 + daemon.c | 2 +- diff-lib.c | 4 + dir.c| 25 +- dir.h| 6 + environment.c| 2 + git-compat-util.h| 1 + index-helper.c | 469 +++ name-hash.c | 2 + pkt-line.c | 18 ++ pkt-line.h | 2 + preload-index.c | 2 + read-cache.c | 531 ++- refs/files-backend.c | 2 + setup.c | 4 +- t/t1701-watchman-extension.sh| 37 +++ t/t7063-status-untracked-cache.sh| 22 ++ t/t7900-index-helper.sh | 79 + t/test-lib-functions.sh | 4 + test-dump-watchman.c | 16 + unix-socket.h| 18 ++ unpack-trees.c | 1 + watchman-support.c | 135 watchman-support.h | 7 + 37 files changed, 1578 insertions(+), 19 deletions(-) create mode 100644 Documentation/git-index-helper.txt create mode 100644 index-helper.c create mode 100755 t/t1701-watchman-extension.sh create mode 100755 t/t7900-index-helper.sh create mode 100644 test-dump-watchman.c create mode 100644 watchman-support.c create mode 100644 watchman-support.h -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html