Re: [WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor
On 6/7/2017 5:46 PM, Ævar Arnfjörð Bjarmason wrote:
> On Wed, Jun 7, 2017 at 9:51 PM, Ben Peart wrote:
>> On 6/2/2017 7:06 PM, Ævar Arnfjörð Bjarmason wrote:
>>> I don't have time to update the perf test now or dig into it, but most
>>> of what you're describing in this mail doesn't at all match with the
>>> ad-hoc tests I ran in
>>>
>>> https://public-inbox.org/git/CACBZZX5e58bWuf3NdDYTxu2KyZj29hHONzN=rp-7vxd8nur...@mail.gmail.com/
>>>
>>> There (at the very end of the E-Mail) I'm running watchman in a tight
>>> loop while I flush the entire fs cache; its runtime is never longer
>>> than 600ms, with 3ms being the norm.
>>
>> I added a perf trace around the entire query-fsmonitor hook proc (patch
>> below) to measure the total actual impact of running the hook script +
>> querying watchman + parsing the output with perl + passing the result
>> back to git. On my machine, the total cost of the hook runs between
>> 130 ms and 180 ms when there are zero changes to report (i.e. the best
>> case).
>>
>> With short status times, the overhead of watchman simply outweighs any
>> gains in performance - especially when you have a warm file system
>> cache, as that cancels out the biggest win of avoiding the IO
>> associated with scanning the working directory.
>>
>> diff --git a/fsmonitor.c b/fsmonitor.c
>> index 763a8a3a3f..cb47f31863 100644
>> --- a/fsmonitor.c
>> +++ b/fsmonitor.c
>> @@ -210,9 +210,11 @@ void refresh_by_fsmonitor(struct index_state *istate)
>>  	 * If we have a last update time, call query-monitor for the set of
>>  	 * changes since that time.
>>  	 */
>> -	if (istate->fsmonitor_last_update)
>> +	if (istate->fsmonitor_last_update) {
>>  		query_success = !query_fsmonitor(HOOK_INTERFACE_VERSION,
>>  			istate->fsmonitor_last_update, _result);
>> +		trace_performance_since(last_update, "query-fsmonitor");
>> +	}
>>
>>  	if (query_success) {
>>  		/* Mark all entries returned by the monitor as dirty */
>>
>>> I.e. flushing the cache doesn't slow things down much at all compared
>>> to how long a "git status" takes from cold cache. Something else must
>>> be going on, and the smoking gun is the gprof output I posted in the
>>> follow-up E-Mail:
>>>
>>> https://public-inbox.org/git/CACBZZX4eZ3G8LQ8O+_BkbkJ-ZXTOkUi9cW=qkyjfhktma3p...@mail.gmail.com/
>>>
>>> There with the fsmonitor we end up calling blk_SHA1_Block ~100K times
>>> during "status", but IIRC (I don't have the output in front of me,
>>> this is from memory) something like twenty times without the
>>> fsmonitor.
>>>
>>> It can't be a coincidence that with the fscache:
>>>
>>>     $ pwd; git ls-files | wc -l
>>>     /home/avar/g/linux
>>>     59844
>>>
>>> And you can see that in the fsmonitor "git status" we make exactly
>>> that many calls to cache_entry_from_ondisk(), but those calls don't
>>> show up at all in the non-fscache codepath.
>>
>> I don't see how the gprof numbers for the non-fsmonitor case can be
>> correct. It appears they don't contain any calls related to loading the
>> index, while the fsmonitor gprof numbers do. Here is a typical call
>> stack:
>>
>>     git.exe!cache_entry_from_ondisk()
>>     git.exe!create_from_disk()
>>     git.exe!do_read_index()
>>     git.exe!read_index_from()
>>     git.exe!read_index()
>>
>> During read_index(), cache_entry_from_ondisk() gets called for every
>> item in the index (which explains the 59K calls). How can the
>> non-fsmonitor codepath not be loading the index?
>>
>>> So, again, I haven't dug and really must step away from the computer
>>> now, but this really looks like the fscache saves us the recursive
>>> readdir() / lstat() etc., but in return we somehow fall through to a
>>> codepath where we re-read the entire on-disk state back into the
>>> index, which we don't do in the non-fscache codepath.
>>
>> I've run multiple profiles and compared them with fsmonitor on and off
>> and have been unable to find any performance regression caused by
>> fsmonitor (other than flagging the index as dirty at times when it
>> isn't required, which I have fixed for the next patch series).
>>
>> I have done many performance runs and when I subtract the _actual_ time
>> spent in the hook from the overall command time, it comes in at
>> slightly less time than when status is run with fsmonitor off. This
>> also leads me to believe there is no regression with fsmonitor on.
>>
>> All this leads me back to my original conclusion: the reason status is
>> slower in these specific cases is because the overhead of calling the
>> hook exceeds the savings gained. If your status calls are taking less
>> than a second, it just doesn't make sense to add the complexity and
>> overhead of calling a file system watcher.
>>
>> I'm working on an updated perf test that will demonstrate the best case
>> scenario (warm watchman, cold file system cache) in addition to the
>> worst case (cold watchman, warm file system cache). The reality is that
>> in normal use cases, perf will be between the two. I'll add that to the
>> next iteration of the patch series.
>
> I'll try to dig further once we have the next submission + that perf
> test.
>
> On Linux the time spent calling the hook itself is minimal:
>
>     $ touch foo; time
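Ben's methodology above subtracts the measured hook time from the overall command time, on the reasoning that a fixed 130-180 ms per-invocation cost (spawning the hook, interpreter startup, querying watchman, parsing) dominates a sub-second status. As an illustrative sketch of that fixed cost — not git's code, and using a no-op Python child process as a hypothetical stand-in for the hook script — one can measure bare process-spawn overhead like this:

```python
import subprocess
import sys
import time

def time_hook_invocation(argv, runs=5):
    """Best-of-N wall-clock time for spawning a child process: the
    fixed cost paid on every 'git status' once a hook is configured."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
        best = min(best, time.perf_counter() - start)
    return best

# A no-op interpreter startup stands in for the hook script; a real
# query-fsmonitor hook additionally pays for the watchman round-trip
# and for parsing watchman's response.
overhead = time_hook_invocation([sys.executable, "-c", "pass"])
print("per-invocation overhead: %.1f ms" % (overhead * 1000.0))
```

Even this lower bound is pure overhead when there are zero changes to report, which is why the hook only pays off once the status call it replaces is substantially slower than the hook itself.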
Re: [WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor
On 6/2/2017 5:44 PM, David Turner wrote:
> BTW, a medium-sized (~250k files across 40k dirs) synthetic repo is
> available over bittorrent at:
> http://bitmover.com/2015-04-03-1M-git-bare.tar.bz2.torrent
>
> I tried Ævar's perf test with that (on a beefy laptop with SSD), and
> got significantly slower results with bp/fsmonitor:
>
> Test                          origin/master     bp/fsmonitor
> ---------------------------------------------------------------------------
> 7519.2: status (first)        0.32(0.23+0.39)   0.32(0.26+0.36) +0.0%
> 7519.3: status (subsequent)   0.18(0.14+0.34)   0.32(0.24+0.37) +77.8%
> 7519.4: status -uno           0.11(0.08+0.32)   0.24(0.18+0.34) +118.2%
> 7519.5: status -uall          0.49(0.22+0.56)   0.62(0.36+0.55) +26.5%
>
> I have not yet looked into why this is.

I was very focused on getting minute-long status calls down to seconds and
multiple seconds down to sub-second. The greatest benefits come when the
file system cache doesn't already have all the file information cached,
and the current perf test doesn't exercise that case - just the warm cache
test, which has the least benefit.

That said, status times shouldn't be getting worse, and this has
highlighted that they are. I've found one reason (the current patch series
always flags the index as dirty, so it gets written out every time). I've
got a fix that only flags it dirty when the extension is turned on or off
or when it actually finds an entry that has become dirty. This helps, but
there is more going on than that. I'm looking into why the minimum status
time with fsmonitor turned on and a warm cache seems to be ~30ms. More to
come...
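The write-avoidance fix described above can be reduced to a small decision rule. The sketch below is hypothetical illustration only (not git's actual index code, and the helper name is invented): the index is written back out only when the fsmonitor extension was toggled or when the monitor actually reported a dirty entry, instead of unconditionally on every run.

```python
def index_needs_write(ext_was_on, ext_now_on, dirty_entries):
    """Decide whether 'git status' should write the index back out.

    Sketch of the fix described above: write only when the fsmonitor
    extension is toggled on/off, or when the monitor actually marked
    at least one entry dirty -- never unconditionally."""
    return ext_was_on != ext_now_on or dirty_entries > 0

print(index_needs_write(True, True, 0))   # warm, no changes: skip the write
print(index_needs_write(False, True, 0))  # extension just enabled: write
print(index_needs_write(True, True, 42))  # real changes reported: write
```

The common "warm cache, nothing changed" status then stops paying for an index write on every invocation, which is the regression David's numbers surfaced.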
> -----Original Message-----
> From: Ævar Arnfjörð Bjarmason [mailto:ava...@gmail.com]
> Sent: Friday, June 2, 2017 6:29 AM
> To: git@vger.kernel.org
> Cc: Junio C Hamano; Ben Peart; Nguyễn Thái Ngọc Duy; Johannes Schindelin;
> David Turner; Jeff King; Christian Couder; Ævar Arnfjörð Bjarmason
> Subject: [WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor
>
> Add a performance test for the new core.fsmonitor facility using the
> sample query-fsmonitor hook.
>
> This is WIP code for the reasons explained in the setup comments;
> unfortunately the perf code doesn't easily allow you to run different
> setup code for different versions you're testing. This test will stop
> working if the fsmonitor is merged into the master branch.
>
> Output against linux.git:
>
>     $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux \
>       GIT_PERF_MAKE_OPTS='-j8' ./run origin/master avar/fsmonitor \
>       ./p7519-fsmonitor.sh
>     [...]
>
> Test                          origin/master     avar/fsmonitor
> ---------------------------------------------------------------------------
> 7519.2: status (first)        0.08(0.04+0.09)   0.12(0.07+0.10) +50.0%
> 7519.3: status (subsequent)   0.08(0.04+0.09)   0.12(0.06+0.11) +50.0%
> 7519.4: status -uno           0.02(0.02+0.05)   0.06(0.05+0.06) +200.0%
> 7519.5: status -uall          0.08(0.06+0.07)   0.12(0.07+0.10) +50.0%
>
> And against a larger in-house monorepo I have here, with the same
> options (except the repo path):
>
> Test                          origin/master     avar/fsmonitor
> ---------------------------------------------------------------------------
> 7519.2: status (first)        0.20(0.11+0.18)   0.27(0.15+0.21) +35.0%
> 7519.3: status (subsequent)   0.20(0.11+0.18)   0.27(0.15+0.21) +35.0%
> 7519.4: status -uno           0.04(0.03+0.10)   0.22(0.08+0.12) +450.0%
> 7519.5: status -uall          0.20(0.13+0.16)   0.27(0.18+0.19) +35.0%
>
> Against linux.git with a hack to flush the FS cache (on Linux) before
> running the first 'git status', only running one test so the result
> isn't discarded as the slowest of N:
>
>     $ GIT_PERF_REPEAT_COUNT=1 GIT_PERF_LARGE_REPO=~/g/linux \
>       GIT_PERF_MAKE_COMMAND='sudo sync && echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null && make -j8' \
>       ./run origin/master avar/fsmonitor ./p7519-fsmonitor.sh
>     [...]
>
> Test                          origin/master     avar/fsmonitor
> ---------------------------------------------------------------------------
> 7519.2: status (first)        0.30(0.18+0.10)   8.26(0.22+0.10) +2653.3%
> 7519.3: status (subsequent)   0.08(0.04+0.08)   0.81(0.09+0.07) +912.5%
> 7519.4: status -uno           0.02(0.01+0.06)   0.08(0.04+0.07) +300.0%
> 7519.5: status -uall          0.08(0.06+0.07)   0.15(0.08+0.09) +87.5%
>
> Now obviously due to 1 run that has a lot of noise, but I would expect
> that first invocation to be blindingly fast
Re: [WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor
On Sun, Jun 04, 2017 at 09:46:19AM +0200, Ævar Arnfjörð Bjarmason wrote:
> What I'm referring to is not a limitation of git (we'll always be able
> to turn off core.fsmonitor), but a limitation of the perf framework.
> There's no way to run a test like this:
>
>     ./run master next -- p*.sh
>
> And have some information passed to the test to apply different runtime
> options to the test depending on master or next, or be able to test
> master twice, once with the fsmonitor, once without, which this
> hypothetical feature would do:
>
>     ./run master:"GIT_PERF_7519_NO_FSMONITOR=Y" master -- p*.sh

Yeah, the perf framework was originally designed to find regressions
between versions of Git. It's really bad at comparing results between any
other dimensions.

You can test different repository sizes by tweaking the environment, but
it can't aggregate or compare results between those runs. Likewise, you
can have two tests in a script which time Git with and without certain
options set, but there's no way to compare the results of those tests.

It would be nice if the perf framework was aware of all of these
dimensions, stored each result as a tuple of all of the properties (rather
than just the version), and then let you group the results along any
dimension (I suspect there are cases where multi-dimensional summaries
would be useful, but that could come later).

For some dimensions you'd probably want support in the perf scripts
themselves to run a test with two variants. E.g., something like:

    test_perf_group 'fsmonitor' \
        'false' 'git config core.fsmonitor false' \
        'true' 'git config core.fsmonitor true'

    test_perf 'status' 'git status'

    test_perf_group_end

where that would run the "git status" test for each of the named setups,
and store the timing results under "($VERSION, fsmonitor=false)", and so
on. That's less flexible than specifying it to "./run" (which would let
you run just one tuple if you chose). But it also relieves the burden from
the user of figuring out which dimensions are interesting to tweak.

-Peff
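The tuple-keyed result store proposed above can be sketched concretely. This is a hypothetical illustration of the idea, not the perf framework's actual storage (which keys results by version only): each timing is keyed by every dimension of the run, and results can then be grouped along any one of them.

```python
from collections import defaultdict

# Hypothetical store: each timing keyed by a tuple of every dimension
# (version, setup, test name), not just the git version.
results = {
    ("master", "fsmonitor=false", "status"): 0.18,
    ("master", "fsmonitor=true",  "status"): 0.32,
    ("next",   "fsmonitor=false", "status"): 0.17,
    ("next",   "fsmonitor=true",  "status"): 0.30,
}

def group_by(results, dim):
    """Collect timings by the value of one dimension:
    0 = version, 1 = setup, 2 = test name."""
    groups = defaultdict(list)
    for key in sorted(results):
        groups[key[dim]].append(results[key])
    return dict(groups)

# Compare along the fsmonitor dimension, aggregated across versions --
# exactly the comparison the version-keyed framework cannot express.
by_setup = group_by(results, 1)
print(by_setup["fsmonitor=true"])   # [0.32, 0.30]
```

The same store supports the usual version comparison (`group_by(results, 0)`), so the existing report would fall out as one projection of the richer data.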
Re: [WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor
On Sun, Jun 4, 2017 at 3:59 AM, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason writes:
>
>> This is WIP code for the reasons explained in the setup comments;
>> unfortunately the perf code doesn't easily allow you to run different
>> setup code for different versions you're testing. This test will stop
>> working if the fsmonitor is merged into the master branch.
>> ...
>> +
>> +	# Relies on core.fsmonitor not being merged into master. Needs
>> +	# better per-test ways to disable it if it gets merged.
>> +	git config core.fsmonitor true &&
>
> "Will stop working" and "relies on not merged" can be read, but I cannot
> read "why" explained, and I cannot quite guess what the reason is.
>
> If the code to read the configuration is not there, setting this would
> not have any effect. If the code is there, setting this would have
> effect (either talking to fsmonitor helps or it hurts).
>
> And I do not think we'd ever see a version of Git that always relies on
> talking to fsmonitor, i.e. "git config core.fsmonitor false" is not a
> way to disable it, so...
>
> Puzzled.

Sorry about the unclear brevity. What I'm referring to is not a limitation
of git (we'll always be able to turn off core.fsmonitor), but a limitation
of the perf framework. There's no way to run a test like this:

    ./run master next -- p*.sh

And have some information passed to the test to apply different runtime
options to the test depending on master or next, or be able to test master
twice, once with the fsmonitor, once without, which this hypothetical
feature would do:

    ./run master:"GIT_PERF_7519_NO_FSMONITOR=Y" master -- p*.sh

So right now the test works because there's no core.fsmonitor in master,
so turning it on all the time only impacts avar/fsmonitor, not master.

I started trying to add this to the perf framework the other day but ran
out of time. The option should also be passed down early enough to be
intercepted by GIT_PERF_MAKE_COMMAND, so you could do e.g.:

    GIT_PERF_MAKE_COMMAND='make CFLAGS="$F"' \
    ./run v2.13.0:"F=-O0" v2.13.0:"F=-O1" v2.13.0:"F=-O2" v2.13.0:"F=-O3" -- p*.sh

To test the same revision with different compilation flags.

A change like this would break the ability to make certain perf
optimizations. Right now we unpack the revision(s) you specify into , and
e.g. make use of the fact that that directory is already unpacked so we
don't need to unpack it again. If there was no way to pass a flag to
specific revisions being tested, then perf could just optimize:

    ./run v2.12.0 v2.13.0 b06d364310 -- p*.sh

To skip the b06d364310 run entirely, since it's the same as v2.13.0.

I think breaking this minor assumption in the framework is fine, but it's
worth noting that it's an assumption we couldn't make anymore.
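The `rev:"ENV=VAL"` syntax proposed above is easy to prototype. The parser below is a hypothetical sketch (the real t/perf `./run` script has no such feature, and the function name is invented): it splits one command-line argument into the revision and a dict of extra environment settings.

```python
def parse_run_arg(arg):
    """Parse one argument of the hypothetical './run rev:ENV=VAL'
    syntax sketched above. Returns the revision plus a dict of
    per-revision environment settings (empty if none given)."""
    rev, sep, env_spec = arg.partition(":")
    env = {}
    if sep:
        name, _, value = env_spec.partition("=")
        env[name] = value
    return rev, env

print(parse_run_arg('v2.13.0:F=-O2'))  # ('v2.13.0', {'F': '-O2'})
print(parse_run_arg('master'))         # ('master', {})
```

Note how this makes the deduplication problem concrete: `v2.13.0:"F=-O0"` and `v2.13.0:"F=-O1"` parse to the same revision but different environments, so the framework could no longer assume that two arguments naming the same revision are interchangeable builds.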
Re: [WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor
Ævar Arnfjörð Bjarmason writes:

> This is WIP code for the reasons explained in the setup comments;
> unfortunately the perf code doesn't easily allow you to run different
> setup code for different versions you're testing. This test will stop
> working if the fsmonitor is merged into the master branch.
> ...
> +
> +	# Relies on core.fsmonitor not being merged into master. Needs
> +	# better per-test ways to disable it if it gets merged.
> +	git config core.fsmonitor true &&

"Will stop working" and "relies on not merged" can be read, but I cannot
read "why" explained, and I cannot quite guess what the reason is.

If the code to read the configuration is not there, setting this would not
have any effect. If the code is there, setting this would have effect
(either talking to fsmonitor helps or it hurts).

And I do not think we'd ever see a version of Git that always relies on
talking to fsmonitor, i.e. "git config core.fsmonitor false" is not a way
to disable it, so...

Puzzled.
Re: [WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor
On Fri, Jun 2, 2017 at 11:44 PM, David Turner wrote:
> BTW, a medium-sized (~250k files across 40k dirs) synthetic repo is
> available over bittorrent at:
> http://bitmover.com/2015-04-03-1M-git-bare.tar.bz2.torrent

Can you seed this please? I've been trying to download that since
yesterday, still at 0%.
Re: [WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor
On Sat, Jun 3, 2017 at 12:05 AM, Ben Peart wrote:
> On 6/2/2017 6:28 AM, Ævar Arnfjörð Bjarmason wrote:
>> Add a performance test for the new core.fsmonitor facility using the
>> sample query-fsmonitor hook.
>>
>> This is WIP code for the reasons explained in the setup comments;
>> unfortunately the perf code doesn't easily allow you to run different
>> setup code for different versions you're testing. This test will stop
>> working if the fsmonitor is merged into the master branch.
>>
>> Output against linux.git:
>>
>>     $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux \
>>       GIT_PERF_MAKE_OPTS='-j8' ./run origin/master avar/fsmonitor \
>>       ./p7519-fsmonitor.sh
>>     [...]
>>
>> Test                          origin/master     avar/fsmonitor
>> ---------------------------------------------------------------------------
>> 7519.2: status (first)        0.08(0.04+0.09)   0.12(0.07+0.10) +50.0%
>> 7519.3: status (subsequent)   0.08(0.04+0.09)   0.12(0.06+0.11) +50.0%
>> 7519.4: status -uno           0.02(0.02+0.05)   0.06(0.05+0.06) +200.0%
>> 7519.5: status -uall          0.08(0.06+0.07)   0.12(0.07+0.10) +50.0%
>
> With regular status times this low, the overhead of calling the hook +
> watchman + perl results in slower overall times, as I noted in my
> initial cover letter. If status calls are already this fast, fsmonitor +
> watchman isn't needed and obviously doesn't help.
>
> This does highlight an optimization I could add. Even though -uno is
> passed, the fsmonitor code currently still goes through the work of
> marking the untracked entries as dirty. I'll look at optimizing that out
> to speed status up when using that option.
>
>> And against a larger in-house monorepo I have here, with the same
>> options (except the repo path):
>>
>> Test                          origin/master     avar/fsmonitor
>> ---------------------------------------------------------------------------
>> 7519.2: status (first)        0.20(0.11+0.18)   0.27(0.15+0.21) +35.0%
>> 7519.3: status (subsequent)   0.20(0.11+0.18)   0.27(0.15+0.21) +35.0%
>> 7519.4: status -uno           0.04(0.03+0.10)   0.22(0.08+0.12) +450.0%
>> 7519.5: status -uall          0.20(0.13+0.16)   0.27(0.18+0.19) +35.0%
>>
>> Against linux.git with a hack to flush the FS cache (on Linux) before
>> running the first 'git status', only running one test so the result
>> isn't discarded as the slowest of N:
>
> I don't know about Linux, but with Windows, when you flush the file
> system cache via unmount/mount, it causes Watchman to do a full scan
> with the next query. This has a significant negative performance impact
> on the next status call, as it returns the set of _all_ files, which git
> then uses to find and mark all index and untracked cache entries as
> fsmonitor dirty, then does a complete scan. This combination makes the
> first status after these events slower than without Watchman.

I'm not sure, but I don't think this is a thing at all on Linux. How
inotify works is unrelated to whether some part of the filesystem happens
to be cached, and the echo+proc command I have is just flushing all those
pages. That I'm flushing all the pages at once should be as much of a
non-event as one page being claimed from the fscache for use by the RAM
allocation of some application.

I suspect Windows somehow conflates its fs watching implementation with
what on Linux is the inotify queue the application has to consume; if that
queue isn't consumed, watchman will need to re-scan. Or maybe my mental
model of this whole thing is wrong and this also happens on Linux...

> I'm currently working with the Watchman team to return a code indicating
> we should just scan everything ourselves to avoid the extra overhead,
> but that solution is TBD. Given it only happens the first time a query
> is done on a new clone or the first time after watchman is started, I
> didn't intend to hold up the patch series for it but will submit an
> enhanced version once a solution is available. The enhanced version will
> then be the same performance (for the first status call) as when not
> running with fsmonitor - not faster - as the state is unknown so must be
> gathered from the working directory.

I don't have time to update the perf test now or dig into it, but most of
what you're describing in this mail doesn't at all match with the ad-hoc
tests I ran in

https://public-inbox.org/git/CACBZZX5e58bWuf3NdDYTxu2KyZj29hHONzN=rp-7vxd8nur...@mail.gmail.com/

There (at the very end of the E-Mail) I'm running watchman in a tight loop
while I flush the entire fs cache; its runtime is never longer than 600ms,
with 3ms being the norm.

I.e. flushing the cache doesn't slow things down much at all compared to
how long a "git status" takes from cold cache. Something else must be
going on, and the smoking gun is the gprof output I posted in the
follow-up E-Mail:
Re: [WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor
On 6/2/2017 6:28 AM, Ævar Arnfjörð Bjarmason wrote: Add a performance test for the new core.fsmonitor facility using the sample query-fsmonitor hook. This is WIP code for the reasons explained in the setup comments, unfortunately the perf code doesn't easily allow you to run different setup code for different versions you're testing. This test will stop working if the fsmonitor is merged into the master branch. Output against linxu.git: $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS='-j8' ./run origin/master avar/fsmonitor ./p7519-fsmonitor.sh [...] Test origin/master avar/fsmonitor --- 7519.2: status (first)0.08(0.04+0.09) 0.12(0.07+0.10) +50.0% 7519.3: status (subsequent) 0.08(0.04+0.09) 0.12(0.06+0.11) +50.0% 7519.4: status -uno 0.02(0.02+0.05) 0.06(0.05+0.06) +200.0% 7519.5: status -uall 0.08(0.06+0.07) 0.12(0.07+0.10) +50.0% With regular status times this low, the overhead of calling the hook + watchman + perl results in slower overall times as I noted in my initial cover letter. If status calls are already this fast, fsmonitor + watchman isn't needed and obviously doesn't help. This does highlight an optimization I could add. Even though -uno is passed, the fsmonitor code currently still goes through the work of marking the untracked entries as dirty. I'll look at optimizing that out to speed status up when using that option. 
And against a larger in-house monorepo I have here, with the same options (except the repo path): Test origin/master avar/fsmonitor --- 7519.2: status (first)0.20(0.11+0.18) 0.27(0.15+0.21) +35.0% 7519.3: status (subsequent) 0.20(0.11+0.18) 0.27(0.15+0.21) +35.0% 7519.4: status -uno 0.04(0.03+0.10) 0.22(0.08+0.12) +450.0% 7519.5: status -uall 0.20(0.13+0.16) 0.27(0.18+0.19) +35.0% Against linux.git with a hack to flush the FS cache (on Linux) before running the first 'git status', only running one test so the result isn't discarded as the slowest of N: I don't know know about on Linux but with Windows, when you flush the file system cache via unmount/mount, it causes Watchman to do a full scan with the next query. This has a significant negative performance impact on the next status call as it returns the set of _all_ files which git then uses to find and mark all index and untracked cache entries as fsmonitor dirty then does a complete scan. This combination makes the first status after these events slower than without Watchman. I'm currently working with the Watchman team to return a code indicating we should just scan everything ourselves to avoid the extra overhead, but that solution is TBD. Given it only happens the first time a query is done on a new clone or the first time after watchman is started, I didn't intend to hold up the patch series for it but will submit an enhanced version once a solution is available. The enhanced version will then be the same performance (for the first status call) as when not running with fsmonitor - not faster - as the state is unknown so must be gathered from the working directory. $ GIT_PERF_REPEAT_COUNT=1 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_COMMAND='sudo sync && echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null && make -j8' ./run origin/master avar/fsmonitor ./p7519-fsmonitor.sh [...] 
Test                          origin/master     avar/fsmonitor
----------------------------------------------------------------------
7519.2: status (first)        0.30(0.18+0.10)   8.26(0.22+0.10) +2653.3%
7519.3: status (subsequent)   0.08(0.04+0.08)   0.81(0.09+0.07) +912.5%
7519.4: status -uno           0.02(0.01+0.06)   0.08(0.04+0.07) +300.0%
7519.5: status -uall          0.08(0.06+0.07)   0.15(0.08+0.09) +87.5%

Now obviously, with only 1 run there is a lot of noise, but I would expect that first invocation to be blindingly fast since watchman has info on what files were modified since the cache was flushed.

Every (first) run of the performance test will be very expensive for the reasons outlined above. This is clearest when you only have 1 run, as it doesn't get masked by the faster 2nd+ runs. That first run will never be "blindingly fast", as git has to gather the initial state and save out the cache - it's all the subsequent calls that will be faster.

The same on the large monorepo noted above:

Test                          origin/master     avar/fsmonitor
----------------------------------------------------------------------
7519.2: status (first)        0.59(0.28+0.24)   0.93(0.35+0.19) +57.6%
7519.3: status (subsequent)   0.20(0.10+0.19)
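For reference, the last column in these perf tables is just the percent change of the fsmonitor time relative to origin/master; e.g. the cold-cache first status going from 0.30s to 8.26s:

```python
# How t/perf derives the +N.N% column: percent change vs. the baseline.

def delta_pct(before_s, after_s):
    return (after_s - before_s) / before_s * 100.0

# 0.30s -> 8.26s on the cold-cache first status:
assert round(delta_pct(0.30, 8.26), 1) == 2653.3
# 0.08s -> 0.81s on the subsequent status:
assert round(delta_pct(0.08, 0.81), 1) == 912.5
```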
RE: [WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor
BTW, a medium-sized (~250k files across 40k dirs) synthetic repo is available over bittorrent at: http://bitmover.com/2015-04-03-1M-git-bare.tar.bz2.torrent

I tried Ævar's perf test with that (on a beefy laptop with SSD), and got significantly slower results with bp/fsmonitor:

Test                          origin/master     bp/fsmonitor
----------------------------------------------------------------------
7519.2: status (first)        0.32(0.23+0.39)   0.32(0.26+0.36) +0.0%
7519.3: status (subsequent)   0.18(0.14+0.34)   0.32(0.24+0.37) +77.8%
7519.4: status -uno           0.11(0.08+0.32)   0.24(0.18+0.34) +118.2%
7519.5: status -uall          0.49(0.22+0.56)   0.62(0.36+0.55) +26.5%

I have not yet looked into why this is.

> -----Original Message-----
> From: Ævar Arnfjörð Bjarmason [mailto:ava...@gmail.com]
> Sent: Friday, June 2, 2017 6:29 AM
> To: git@vger.kernel.org
> Cc: Junio C Hamano; Ben Peart; Nguyễn Thái Ngọc Duy; Johannes Schindelin;
> David Turner; Jeff King; Christian Couder; Ævar Arnfjörð Bjarmason
> Subject: [WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor
>
> Add a performance test for the new core.fsmonitor facility using the sample
> query-fsmonitor hook.
>
> This is WIP code for the reasons explained in the setup comments;
> unfortunately the perf code doesn't easily allow you to run different setup
> code for different versions you're testing. This test will stop working if the
> fsmonitor is merged into the master branch.
>
> Output against linux.git:
>
> $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux
> GIT_PERF_MAKE_OPTS='-j8' ./run origin/master avar/fsmonitor ./p7519-
> fsmonitor.sh
> [...]
> Test                          origin/master     avar/fsmonitor
> ----------------------------------------------------------------------
> 7519.2: status (first)        0.08(0.04+0.09)   0.12(0.07+0.10) +50.0%
> 7519.3: status (subsequent)   0.08(0.04+0.09)   0.12(0.06+0.11) +50.0%
> 7519.4: status -uno           0.02(0.02+0.05)   0.06(0.05+0.06) +200.0%
> 7519.5: status -uall          0.08(0.06+0.07)   0.12(0.07+0.10) +50.0%
>
> And against a larger in-house monorepo I have here, with the same options
> (except the repo path):
>
> Test                          origin/master     avar/fsmonitor
> ----------------------------------------------------------------------
> 7519.2: status (first)        0.20(0.11+0.18)   0.27(0.15+0.21) +35.0%
> 7519.3: status (subsequent)   0.20(0.11+0.18)   0.27(0.15+0.21) +35.0%
> 7519.4: status -uno           0.04(0.03+0.10)   0.22(0.08+0.12) +450.0%
> 7519.5: status -uall          0.20(0.13+0.16)   0.27(0.18+0.19) +35.0%
>
> Against linux.git with a hack to flush the FS cache (on Linux) before running
> the first 'git status', only running one test so the result isn't discarded
> as the slowest of N:
>
> $ GIT_PERF_REPEAT_COUNT=1 GIT_PERF_LARGE_REPO=~/g/linux
> GIT_PERF_MAKE_COMMAND='sudo sync && echo 3 | sudo tee
> /proc/sys/vm/drop_caches >/dev/null && make -j8' ./run origin/master
> avar/fsmonitor ./p7519-fsmonitor.sh
> [...]
>
> Test                          origin/master     avar/fsmonitor
> ----------------------------------------------------------------------
> 7519.2: status (first)        0.30(0.18+0.10)   8.26(0.22+0.10) +2653.3%
> 7519.3: status (subsequent)   0.08(0.04+0.08)   0.81(0.09+0.07) +912.5%
> 7519.4: status -uno           0.02(0.01+0.06)   0.08(0.04+0.07) +300.0%
> 7519.5: status -uall          0.08(0.06+0.07)   0.15(0.08+0.09) +87.5%
>
> Now obviously due to 1 run that has a lot of noise, but I would expect that
> first invocation to be blindingly fast since watchman has info on what files
> were modified since the cache was flushed.
>
> The same on the large monorepo noted above:
>
> Test                          origin/master     avar/fsmonitor
> ----------------------------------------------------------------------
> 7519.2: status (first)        0.59(0.28+0.24)   0.93(0.35+0.19) +57.6%
> 7519.3: status (subsequent)   0.20(0.10+0.19)   0.28(0.16+0.20) +40.0%
> 7519.4: status -uno           0.04(0.04+0.09)   0.11(0.08+0.12) +175.0%
> 7519.5: status -uall          0.29(0.11+0.18)   0.40(0.16+0.19) +37.9%
>
> Signed-off-by: Ævar Arnfjörð Bjarmason
> ---
>
> On Fri, Jun 2, 2017 at 2:40 AM, Ben Peart wrote:
> > Any chance you can provide me with a bash