Re: [WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor

2017-06-07 Thread Ben Peart



On 6/7/2017 5:46 PM, Ævar Arnfjörð Bjarmason wrote:

On Wed, Jun 7, 2017 at 9:51 PM, Ben Peart  wrote:



On 6/2/2017 7:06 PM, Ævar Arnfjörð Bjarmason wrote:



I don't have time to update the perf test now or dig into it, but most
of what you're describing in this mail doesn't at all match with the
ad-hoc tests I ran in

https://public-inbox.org/git/CACBZZX5e58bWuf3NdDYTxu2KyZj29hHONzN=rp-7vxd8nur...@mail.gmail.com/

There (at the very end of the E-Mail) I'm running watchman in a tight
loop while I flush the entire fs cache, its runtime is never longer
than 600ms, with 3ms being the norm.



I added a perf trace around the entire query-fsmonitor hook proc (patch
below) to measure the total actual impact of running the hook script +
querying watchman + parsing the output with perl + passing the result back
to git. On my machine, the total cost of the hook runs between 130 ms and
180 ms when there are zero changes to report (i.e. the best case).

With short status times, the overhead of watchman simply outweighs any gains
in performance - especially when you have a warm file system cache as that
cancels out the biggest win of avoiding the IO associated with scanning the
working directory.


diff --git a/fsmonitor.c b/fsmonitor.c
index 763a8a3a3f..cb47f31863 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -210,9 +210,11 @@ void refresh_by_fsmonitor(struct index_state *istate)
  * If we have a last update time, call query-monitor for the set of
  * changes since that time.
  */
-   if (istate->fsmonitor_last_update)
+   if (istate->fsmonitor_last_update) {
 query_success = !query_fsmonitor(HOOK_INTERFACE_VERSION,
 istate->fsmonitor_last_update, _result);
+   trace_performance_since(last_update, "query-fsmonitor");
+   }

 if (query_success) {
 /* Mark all entries returned by the monitor as dirty */
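As an aside, timings added with trace_performance_since(), as in the hunk
above, surface through git's existing GIT_TRACE_PERFORMANCE channel. A quick
way to see that channel on stock git (the "query-fsmonitor" line itself would
of course only appear with the patch applied):

```shell
# Sketch: observe git's performance trace on stderr.  Stock git prints
# at least an overall "performance: ... s: git command:" line there;
# the patched code above would add a "query-fsmonitor" line beside it.
repo=$(mktemp -d)
git init -q "$repo"
GIT_TRACE_PERFORMANCE=2 git -C "$repo" status >/dev/null 2>"$repo/trace.err"
grep 'performance:' "$repo/trace.err"
```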





I.e. flushing the cache doesn't slow things down much at all compared
to how long a "git status" takes from cold cache. Something else must
be going on, and the smoking gun is the gprof output I posted in the
follow-up E-Mail:

https://public-inbox.org/git/CACBZZX4eZ3G8LQ8O+_BkbkJ-ZXTOkUi9cW=qkyjfhktma3p...@mail.gmail.com/

There with the fsmonitor we end up calling blk_SHA1_Block ~100K times
during "status", but IIRC (I don't have the output in front of me,
this is from memory) something like twenty times without the
fsmonitor.

It can't be a coincidence that with the fscache:

$ pwd; git ls-files | wc -l
/home/avar/g/linux
59844

And you can see that in the fsmonitor "git status" we make exactly
that many calls to cache_entry_from_ondisk(), but those calls don't
show up at all in the non-fscache codepath.



I don't see how the gprof numbers for the non-fsmonitor case can be correct.
It appears they don't contain any calls related to loading the index while
the fsmonitor gprof numbers do.  Here is a typical call stack:

git.exe!cache_entry_from_ondisk()
git.exe!create_from_disk()
git.exe!do_read_index()
git.exe!read_index_from()
git.exe!read_index()

During read_index(), cache_entry_from_ondisk() gets called for every item in
the index (which explains the 59K calls).  How can the non-fsmonitor
codepath not be loading the index?


So, again, I haven't dug and really must step away from the computer
now, but this really looks like the fscache saves us the recursive
readdir() / lstat() etc., but in return we somehow fall through to a
codepath where we re-read the entire on-disk state back into the
index, which we don't do in the non-fscache codepath.



I've run multiple profiles and compared them with fsmonitor on and off and
have been unable to find any performance regression caused by fsmonitor
(other than flagging the index as dirty at times when it isn't required
which I have fixed for the next patch series).

I have done many performance runs and when I subtract the _actual_ time
spent in the hook from the overall command time, it comes in at slightly
less time than when status is run with fsmonitor off.  This also leads me to
believe there is no regression with fsmonitor on.

All this leads me back to my original conclusion: the reason status is
slower in these specific cases is because the overhead of calling the hook
exceeds the savings gained. If your status calls are taking less than a
second, it just doesn't make sense to add the complexity and overhead of
calling a file system watcher.
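As a rough illustration of that fixed cost (not Ben's measurement, just a
generic sketch; it assumes GNU date for nanosecond timestamps), one can time a
do-nothing hook in a loop:

```shell
# Estimate the fixed cost of spawning a hook process by timing N runs
# of a do-nothing script.  Whatever this per-call cost is on a given
# machine, the fsmonitor query has to save more IO than this (plus the
# watchman round-trip) to be a net win.
hook=$(mktemp)
printf '#!/bin/sh\nexit 0\n' >"$hook"
chmod +x "$hook"
n=50
start=$(date +%s%N)   # GNU date: nanoseconds since the epoch
i=0
while [ "$i" -lt "$n" ]; do
    "$hook"
    i=$((i + 1))
done
end=$(date +%s%N)
echo "mean per-invocation cost: $(( (end - start) / n / 1000 )) microseconds"
rm -f "$hook"
```

A real query-fsmonitor hook additionally pays for perl startup and the
watchman query itself, which is presumably where the 130-180 ms figure above
comes from.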

I'm working on an updated perf test that will demonstrate the best case
scenario (warm watchman, cold file system cache) in addition to the worst
case (cold watchman, warm file system cache).  The reality is that in normal
use cases, perf will be between the two. I'll add that to the next iteration
of the patch series.


I'll try to dig further once we have the next submission + that perf test.

On Linux the time spent calling the hook itself is minimal:

 $ touch foo; time 

Re: [WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor

2017-06-05 Thread Ben Peart



On 6/2/2017 5:44 PM, David Turner wrote:

BTW, a medium-sized (~250k files across 40k dirs) synthetic repo is available 
over bittorrent at:
http://bitmover.com/2015-04-03-1M-git-bare.tar.bz2.torrent

I tried Ævar's perf test with that (on a beefy laptop with SSD), and got 
significantly slower results with bp/fsmonitor:
Test  origin/master bp/fsmonitor
---
7519.2: status (first)0.32(0.23+0.39)   0.32(0.26+0.36) +0.0%
7519.3: status (subsequent)   0.18(0.14+0.34)   0.32(0.24+0.37) +77.8%
7519.4: status -uno   0.11(0.08+0.32)   0.24(0.18+0.34) +118.2%
7519.5: status -uall  0.49(0.22+0.56)   0.62(0.36+0.55) +26.5%
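(For reading these tables: each cell is wall-clock seconds with user+system
CPU time in parentheses, as printed by git's t/perf framework. Taking one
cell apart, with the value copied from the "status (subsequent)" row above:)

```shell
# Split a perf cell "real(user+sys)" into its parts with awk.
echo '0.18(0.14+0.34)' |
LC_ALL=C awk -F'[()+]' '{ printf "real=%s user=%s sys=%s cpu=%.2f\n", $1, $2, $3, $2 + $3 }'
# -> real=0.18 user=0.14 sys=0.34 cpu=0.48
```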

I have not yet looked into why this is.



I was very focused on getting minute-long status calls down to seconds 
and multiple seconds down to sub-second.  The greatest benefits come when 
the file system cache doesn't already have all the file information 
cached, and the current perf test doesn't cover that case - just the warm 
cache test, which has the least benefit.


That said, status times shouldn't be getting worse and this has 
highlighted that they are.  I've found one reason (the current patch 
series always flags the index as dirty so it gets written out every 
time). I've got a fix that only flags it dirty when the extension is 
turned on or off or when it actually finds an entry that has become 
dirty.  This helps but there is more going on than that.


I'm looking into why the minimum status time with fsmonitor turned on 
and a warm cache seems to be ~30ms.  More to come...



-Original Message-
From: Ævar Arnfjörð Bjarmason [mailto:ava...@gmail.com]
Sent: Friday, June 2, 2017 6:29 AM
To: git@vger.kernel.org
Cc: Junio C Hamano ; Ben Peart
; Nguyễn Thái Ngọc Duy ;
Johannes Schindelin ; David Turner
; Jeff King ; Christian
Couder ; Ævar Arnfjörð Bjarmason

Subject: [WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor

Add a performance test for the new core.fsmonitor facility using the sample
query-fsmonitor hook.

This is WIP code for the reasons explained in the setup comments,
unfortunately the perf code doesn't easily allow you to run different setup
code for different versions you're testing. This test will stop working if the
fsmonitor is merged into the master branch.

Output against linux.git:

 $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux
GIT_PERF_MAKE_OPTS='-j8' ./run origin/master avar/fsmonitor ./p7519-fsmonitor.sh
 [...]
 Test  origin/master avar/fsmonitor
 ---
 7519.2: status (first)0.08(0.04+0.09)   0.12(0.07+0.10) +50.0%
 7519.3: status (subsequent)   0.08(0.04+0.09)   0.12(0.06+0.11) +50.0%
 7519.4: status -uno   0.02(0.02+0.05)   0.06(0.05+0.06) +200.0%
 7519.5: status -uall  0.08(0.06+0.07)   0.12(0.07+0.10) +50.0%

And against a larger in-house monorepo I have here, with the same options
(except the repo path):

 Test  origin/master avar/fsmonitor
 ---
 7519.2: status (first)0.20(0.11+0.18)   0.27(0.15+0.21) +35.0%
 7519.3: status (subsequent)   0.20(0.11+0.18)   0.27(0.15+0.21) +35.0%
 7519.4: status -uno   0.04(0.03+0.10)   0.22(0.08+0.12) +450.0%
 7519.5: status -uall  0.20(0.13+0.16)   0.27(0.18+0.19) +35.0%

Against linux.git with a hack to flush the FS cache (on Linux) before running
the first 'git status', only running one test so the result isn't discarded
as the slowest of N:

 $ GIT_PERF_REPEAT_COUNT=1 GIT_PERF_LARGE_REPO=~/g/linux
GIT_PERF_MAKE_COMMAND='sudo sync && echo 3 | sudo tee
/proc/sys/vm/drop_caches >/dev/null && make -j8' ./run origin/master
avar/fsmonitor ./p7519-fsmonitor.sh
 [...]
 Test  origin/master avar/fsmonitor
 
 7519.2: status (first)0.30(0.18+0.10)   8.26(0.22+0.10) +2653.3%
 7519.3: status (subsequent)   0.08(0.04+0.08)   0.81(0.09+0.07) +912.5%
 7519.4: status -uno   0.02(0.01+0.06)   0.08(0.04+0.07) +300.0%
 7519.5: status -uall  0.08(0.06+0.07)   0.15(0.08+0.09) +87.5%

Now obviously, with only 1 run there is a lot of noise, but I would expect
that first invocation to be blindingly fast 

Re: [WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor

2017-06-04 Thread Jeff King
On Sun, Jun 04, 2017 at 09:46:19AM +0200, Ævar Arnfjörð Bjarmason wrote:

> What I'm referring to is not a limitation of git (we'll always be able
> to turn off core.fsmonitor), but a limitation of the perf framework.
> There's no way to run a test like this:
> 
> ./run master next -- p*.sh
> 
> And have some information passed to the test to apply different
> runtime options to the test depending on master or next, or be able to test
> master twice, once with the fsmonitor, once without, which this
> hypothetical feature would do:
> 
> ./run master:"GIT_PERF_7519_NO_FSMONITOR=Y" master -- p*.sh

Yeah, the perf framework was originally designed to find regressions
between versions of Git. It's really bad at comparing results between
any other dimensions. You can test different repository sizes by
tweaking the environment, but it can't aggregate or compare results
between those runs. Likewise, you can have two tests in a script which
time Git with and without certain options set, but there's no way to
compare the results of those tests.

It would be nice if the perf framework was aware of all of
these dimensions, stored each result as a tuple of all of the properties
(rather than just the version), and then let you group the results along
any dimension (I suspect there are cases where multi-dimensional
summaries would be useful, but that could come later).
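A toy sketch of that idea (invented data and format; nothing the perf
framework actually stores today): results keyed by a (version, fsmonitor)
tuple, grouped along the fsmonitor dimension by averaging over versions:

```shell
# Group tuple-keyed timings along one dimension: average the "real"
# column per fsmonitor setting, across versions.  Data is made up.
printf '%s\n' \
    'master false 0.18' \
    'master true  0.32' \
    'next   false 0.17' \
    'next   true  0.30' |
LC_ALL=C awk '{ sum[$2] += $3; n[$2]++ }
    END { for (k in sum) printf "fsmonitor=%s mean=%.3f\n", k, sum[k] / n[k] }' |
sort
# -> fsmonitor=false mean=0.175
# -> fsmonitor=true mean=0.310
```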

For some dimensions you'd probably want support in the perf scripts
themselves to run a test with two variants. E.g., something like:

  test_perf_group 'fsmonitor' \
'false' 'git config core.fsmonitor false' \
 'true' 'git config core.fsmonitor true'
  test_perf 'status' 'git status'
  test_perf_group_end

where that would run the "git status" test for each of the named setups,
and store the timing results under "($VERSION, fsmonitor=false)", and so
on.

That's less flexible than specifying it to "./run" (which would let you
run just one tuple if you chose). But it also relieves the burden from
the user of figuring out which dimensions are interesting to tweak.

-Peff


Re: [WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor

2017-06-04 Thread Ævar Arnfjörð Bjarmason
On Sun, Jun 4, 2017 at 3:59 AM, Junio C Hamano  wrote:
> Ævar Arnfjörð Bjarmason   writes:
>
>> This is WIP code for the reasons explained in the setup comments,
>> unfortunately the perf code doesn't easily allow you to run different
>> setup code for different versions you're testing. This test will stop
>> working if the fsmonitor is merged into the master branch.
>> ...
>> +
>> + # Relies on core.fsmonitor not being merged into master. Needs
>> + # better per-test ways to disable it if it gets merged.
>> + git config core.fsmonitor true &&
>
> Will stop working and relies on not merged can be read but I cannot
> read "why" explained, and I cannot quite guess what the reason is.
>
> If the code to read the configuration is not there, setting this
> would not have any effect.  If the code is there, setting this would
> have effect (either talking fsmonitor helps or it hurts).
>
> And I do not think we'd ever see a version of Git that always relies
> on talking to fsmonitor, i.e. "git config core.fsmonitor false" is not
> a way to disable it, so...
>
> Puzzled.

Sorry about the unclear brevity.

What I'm referring to is not a limitation of git (we'll always be able
to turn off core.fsmonitor), but a limitation of the perf framework.
There's no way to run a test like this:

./run master next -- p*.sh

And have some information passed to the test to apply different
runtime options to the test depending on master or next, or be able to test
master twice, once with the fsmonitor, once without, which this
hypothetical feature would do:

./run master:"GIT_PERF_7519_NO_FSMONITOR=Y" master -- p*.sh

So right now the test works because there's no core.fsmonitor in
master, so turning it on all the time only impacts avar/fsmonitor, not
master.

I started trying to add this to the perf framework the other day but
ran out of time. The option should also be passed down early enough to
be intercepted by GIT_PERF_MAKE_COMMAND, so you could do e.g.:

GIT_PERF_MAKE_COMMAND='make CFLAGS="$F"' \
./run v2.13.0:"F=-O0" v2.13.0:"F=-O1" v2.13.0:"F=-O2"
v2.13.0:"F=-O3" -- p*.sh

To test the same revision with different compilation flags.

A change like this would break the ability to enact certain perf
optimizations: right now we unpack the revision(s) you specify into
, and e.g. make use of the fact that that directory
is already unpacked so we don't need to unpack it again.

If there was no way to pass a flag to specific revisions being tested,
then perf could just optimize:

./run v2.12.0 v2.13.0 b06d364310 -- p*.sh

To skip the b06d364310 run entirely since it's the same as v2.13.0. I
think breaking this minor assumption in the framework is fine, but
it's worth noting that it's an assumption we couldn't make anymore.
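For what it's worth, the spec syntax itself would be trivial to split with
POSIX parameter expansion (the rev:"VAR=val" form is hypothetical, as above):

```shell
# Split a hypothetical 'rev:VAR=val' ./run argument into the revision
# and the per-revision environment assignment (the quotes are already
# removed by the shell by the time the argument reaches the script).
spec='v2.13.0:F=-O2'
rev=${spec%%:*}    # everything before the first ':'
env=${spec#*:}     # everything after it
echo "rev=$rev env=$env"
# prints: rev=v2.13.0 env=F=-O2
```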



Re: [WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor

2017-06-03 Thread Ævar Arnfjörð Bjarmason
On Fri, Jun 2, 2017 at 11:44 PM, David Turner  wrote:
> BTW, a medium-sized (~250k files across 40k dirs) synthetic repo is available 
> over bittorrent at:
> http://bitmover.com/2015-04-03-1M-git-bare.tar.bz2.torrent

Can you seed this please? I've been trying to download that since
yesterday, still at 0%.


Re: [WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor

2017-06-02 Thread Ævar Arnfjörð Bjarmason
On Sat, Jun 3, 2017 at 12:05 AM, Ben Peart  wrote:
>
>
> On 6/2/2017 6:28 AM, Ævar Arnfjörð Bjarmason wrote:
>>
>> Add a performance test for the new core.fsmonitor facility using the
>> sample query-fsmonitor hook.
>>
>> This is WIP code for the reasons explained in the setup comments,
>> unfortunately the perf code doesn't easily allow you to run different
>> setup code for different versions you're testing. This test will stop
>> working if the fsmonitor is merged into the master branch.
>>
>> Output against linux.git:
>>
>>  $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux
>> GIT_PERF_MAKE_OPTS='-j8' ./run origin/master avar/fsmonitor
>> ./p7519-fsmonitor.sh
>>  [...]
>>  Test  origin/master avar/fsmonitor
>>
>> ---
>>  7519.2: status (first)0.08(0.04+0.09)   0.12(0.07+0.10)
>> +50.0%
>>  7519.3: status (subsequent)   0.08(0.04+0.09)   0.12(0.06+0.11)
>> +50.0%
>>  7519.4: status -uno   0.02(0.02+0.05)   0.06(0.05+0.06)
>> +200.0%
>>  7519.5: status -uall  0.08(0.06+0.07)   0.12(0.07+0.10)
>> +50.0%
>>
>
> With regular status times this low, the overhead of calling the hook +
> watchman + perl results in slower overall times as I noted in my initial
> cover letter.  If status calls are already this fast, fsmonitor + watchman
> isn't needed and obviously doesn't help.
>
> This does highlight an optimization I could add.  Even though -uno is
> passed, the fsmonitor code currently still goes through the work of marking
> the untracked entries as dirty.  I'll look at optimizing that out to speed
> status up when using that option.
>
>> And against a larger in-house monorepo I have here, with the same
>> options (except the repo path):
>>
>>  Test  origin/master avar/fsmonitor
>>
>> ---
>>  7519.2: status (first)0.20(0.11+0.18)   0.27(0.15+0.21)
>> +35.0%
>>  7519.3: status (subsequent)   0.20(0.11+0.18)   0.27(0.15+0.21)
>> +35.0%
>>  7519.4: status -uno   0.04(0.03+0.10)   0.22(0.08+0.12)
>> +450.0%
>>  7519.5: status -uall  0.20(0.13+0.16)   0.27(0.18+0.19)
>> +35.0%
>>
>> Against linux.git with a hack to flush the FS cache (on Linux) before
>> running the first 'git status', only running one test so the result
>> isn't discarded as the slowest of N:
>>
>
> I don't know about Linux, but with Windows, when you flush the file
> system cache via unmount/mount, it causes Watchman to do a full scan with
> the next query.  This has a significant negative performance impact on the
> next status call as it returns the set of _all_ files which git then uses to
> find and mark all index and untracked cache entries as fsmonitor dirty then
> does a complete scan.  This combination makes the first status after these
> events slower than without Watchman.

I'm not sure, but I don't think this is a thing at all on Linux. How
inotify works is unrelated to whether some part of the filesystem
happens to be cached, and the echo+proc command I have is just
flushing all those pages. That I'm flushing all the pages at once
should be as much of a non-event as one page being claimed from the
fscache for use by the RAM allocation of some application.

I suspect Windows somehow conflates its fs watching implementation
with what on Linux is the inotify queue the application has to
consume; if that queue isn't consumed, watchman will need to re-scan.

Or maybe my mental model of this whole thing is wrong and this also
happens on Linux...

> I'm currently working with the Watchman team to return a code indicating we
> should just scan everything ourselves to avoid the extra overhead, but that
> solution is TBD.  Given it only happens the first time a query is done on a
> new clone or the first time after watchman is started, I didn't intend to
> hold up the patch series for it but will submit an enhanced version once a
> solution is available.  The enhanced version will then be the same
> performance (for the first status call) as when not running with fsmonitor -
> not faster - as the state is unknown so must be gathered from the working
> directory.

I don't have time to update the perf test now or dig into it, but most
of what you're describing in this mail doesn't at all match with the
ad-hoc tests I ran in
https://public-inbox.org/git/CACBZZX5e58bWuf3NdDYTxu2KyZj29hHONzN=rp-7vxd8nur...@mail.gmail.com/

There (at the very end of the E-Mail) I'm running watchman in a tight
loop while I flush the entire fs cache, its runtime is never longer
than 600ms, with 3ms being the norm.

I.e. flushing the cache doesn't slow things down much at all compared
to how long a "git status" takes from cold cache. Something else must
be going on, and the smoking gun is the gprof output I posted in the
follow-up E-Mail:

Re: [WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor

2017-06-02 Thread Ben Peart



On 6/2/2017 6:28 AM, Ævar Arnfjörð Bjarmason wrote:

Add a performance test for the new core.fsmonitor facility using the
sample query-fsmonitor hook.

This is WIP code for the reasons explained in the setup comments,
unfortunately the perf code doesn't easily allow you to run different
setup code for different versions you're testing. This test will stop
working if the fsmonitor is merged into the master branch.

Output against linxu.git:

 $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux 
GIT_PERF_MAKE_OPTS='-j8' ./run origin/master avar/fsmonitor ./p7519-fsmonitor.sh
 [...]
 Test  origin/master avar/fsmonitor
 ---
 7519.2: status (first)0.08(0.04+0.09)   0.12(0.07+0.10) +50.0%
 7519.3: status (subsequent)   0.08(0.04+0.09)   0.12(0.06+0.11) +50.0%
 7519.4: status -uno   0.02(0.02+0.05)   0.06(0.05+0.06) +200.0%
 7519.5: status -uall  0.08(0.06+0.07)   0.12(0.07+0.10) +50.0%



With regular status times this low, the overhead of calling the hook + 
watchman + perl results in slower overall times as I noted in my initial 
cover letter.  If status calls are already this fast, fsmonitor + 
watchman isn't needed and obviously doesn't help.


This does highlight an optimization I could add.  Even though -uno is 
passed, the fsmonitor code currently still goes through the work of 
marking the untracked entries as dirty.  I'll look at optimizing that 
out to speed status up when using that option.



And against a larger in-house monorepo I have here, with the same
options (except the repo path):

 Test                          origin/master     avar/fsmonitor
 ------------------------------------------------------------------------
 7519.2: status (first)        0.20(0.11+0.18)   0.27(0.15+0.21) +35.0%
 7519.3: status (subsequent)   0.20(0.11+0.18)   0.27(0.15+0.21) +35.0%
 7519.4: status -uno           0.04(0.03+0.10)   0.22(0.08+0.12) +450.0%
 7519.5: status -uall          0.20(0.13+0.16)   0.27(0.18+0.19) +35.0%

Against linux.git with a hack to flush the FS cache (on Linux) before
running the first 'git status', only running one test so the result
isn't discarded as the slowest of N:



I don't know about Linux, but on Windows, when you flush the 
file system cache via unmount/mount, it causes Watchman to do a full 
scan on the next query.  This has a significant negative performance 
impact on the next status call: the query returns the set of _all_ files, 
git then marks every index and untracked cache entry as fsmonitor 
dirty, and a complete scan follows anyway.  This combination 
makes the first status after these events slower than without Watchman.


I'm currently working with the Watchman team on returning a code indicating 
that we should just scan everything ourselves to avoid the extra overhead, 
but that solution is TBD.  Given it only happens the first time a query 
is done on a new clone, or the first time after watchman is started, I 
didn't intend to hold up the patch series for it, but I will submit an 
enhanced version once a solution is available.  The enhanced version 
will then have the same performance for the first status call as when 
not running with fsmonitor - not faster - since the state is unknown and 
must be gathered from the working directory.
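(Watchman's query responses already carry a hint usable for this: a `since` query answered by a freshly started or freshly rescanned daemon sets `is_fresh_instance` to true, and the file list is then the full crawl rather than a delta. A hook could check that field and tell git to fall back to a normal scan instead of processing every path. A minimal sketch -- the response dicts are hand-written samples, not live watchman output:

```python
def should_full_scan(response):
    """Decide whether to ignore the watchman file list and let git
    rescan the working tree itself.

    A response with is_fresh_instance set means watchman did a full
    crawl, so its file list is 'everything' and carries no useful
    delta information."""
    return bool(response.get("is_fresh_instance"))

fresh = {"clock": "c:1:2:3:4", "is_fresh_instance": True,
         "files": [{"name": "a.c"}, {"name": "b.c"}]}
delta = {"clock": "c:1:2:3:5", "is_fresh_instance": False,
         "files": [{"name": "a.c"}]}
print(should_full_scan(fresh), should_full_scan(delta))  # True False
```

In the fresh-instance case the hook would signal "scan everything" rather than emitting the entire file list for git to mark dirty one entry at a time.)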



 $ GIT_PERF_REPEAT_COUNT=1 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_COMMAND='sudo sync 
&& echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null && make -j8' ./run 
origin/master avar/fsmonitor ./p7519-fsmonitor.sh
 [...]
 Test                          origin/master     avar/fsmonitor
 ------------------------------------------------------------------------
 7519.2: status (first)        0.30(0.18+0.10)   8.26(0.22+0.10) +2653.3%
 7519.3: status (subsequent)   0.08(0.04+0.08)   0.81(0.09+0.07) +912.5%
 7519.4: status -uno           0.02(0.01+0.06)   0.08(0.04+0.07) +300.0%
 7519.5: status -uall          0.08(0.06+0.07)   0.15(0.08+0.09) +87.5%

Now obviously a single run has a lot of noise, but I would expect
that first invocation to be blindingly fast since watchman has info on
what files were modified since the cache was flushed.



Every (first) run of the performance test will be very expensive for the 
reasons outlined above.  This is clearest when you only have 1 run as it 
doesn't get masked by the faster 2nd+ runs.  That first run will never 
be "blindingly fast" as git has to gather the initial state and save out 
the cache - it's all the subsequent calls that will be faster.



The same on the large monorepo noted above:

 Test                          origin/master     avar/fsmonitor
 ------------------------------------------------------------------------
 7519.2: status (first)        0.59(0.28+0.24)   0.93(0.35+0.19) +57.6%
 7519.3: status (subsequent)   0.20(0.10+0.19)   0.28(0.16+0.20) +40.0%
 7519.4: status -uno           0.04(0.04+0.09)   0.11(0.08+0.12) +175.0%
 7519.5: status -uall          0.29(0.11+0.18)   0.40(0.16+0.19) +37.9%

RE: [WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor

2017-06-02 Thread David Turner
BTW, a medium-sized (~250k files across 40k dirs) synthetic repo is available 
over bittorrent at:
http://bitmover.com/2015-04-03-1M-git-bare.tar.bz2.torrent

I tried Ævar's perf test with that (on a beefy laptop with SSD), and got 
significantly slower results with bp/fsmonitor:
Test                          origin/master     bp/fsmonitor
------------------------------------------------------------------------
7519.2: status (first)        0.32(0.23+0.39)   0.32(0.26+0.36) +0.0%
7519.3: status (subsequent)   0.18(0.14+0.34)   0.32(0.24+0.37) +77.8%
7519.4: status -uno           0.11(0.08+0.32)   0.24(0.18+0.34) +118.2%
7519.5: status -uall          0.49(0.22+0.56)   0.62(0.36+0.55) +26.5%

I have not yet looked into why this is.

> -Original Message-
> From: Ævar Arnfjörð Bjarmason [mailto:ava...@gmail.com]
> Sent: Friday, June 2, 2017 6:29 AM
> To: git@vger.kernel.org
> Cc: Junio C Hamano ; Ben Peart
> ; Nguyễn Thái Ngọc Duy ;
> Johannes Schindelin ; David Turner
> ; Jeff King ; Christian
> Couder ; Ævar Arnfjörð Bjarmason
> 
> Subject: [WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor
> 
> Add a performance test for the new core.fsmonitor facility using the sample
> query-fsmonitor hook.
> 
> This is WIP code for the reasons explained in the setup comments,
> unfortunately the perf code doesn't easily allow you to run different setup
> code for different versions you're testing. This test will stop working if the
> fsmonitor is merged into the master branch.
> 
> Output against linux.git:
> 
> $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux
> GIT_PERF_MAKE_OPTS='-j8' ./run origin/master avar/fsmonitor ./p7519-
> fsmonitor.sh
> [...]
> Test                          origin/master     avar/fsmonitor
> ------------------------------------------------------------------------
> 7519.2: status (first)        0.08(0.04+0.09)   0.12(0.07+0.10) +50.0%
> 7519.3: status (subsequent)   0.08(0.04+0.09)   0.12(0.06+0.11) +50.0%
> 7519.4: status -uno           0.02(0.02+0.05)   0.06(0.05+0.06) +200.0%
> 7519.5: status -uall          0.08(0.06+0.07)   0.12(0.07+0.10) +50.0%
> 
> And against a larger in-house monorepo I have here, with the same options
> (except the repo path):
> 
> Test                          origin/master     avar/fsmonitor
> ------------------------------------------------------------------------
> 7519.2: status (first)        0.20(0.11+0.18)   0.27(0.15+0.21) +35.0%
> 7519.3: status (subsequent)   0.20(0.11+0.18)   0.27(0.15+0.21) +35.0%
> 7519.4: status -uno           0.04(0.03+0.10)   0.22(0.08+0.12) +450.0%
> 7519.5: status -uall          0.20(0.13+0.16)   0.27(0.18+0.19) +35.0%
> 
> Against linux.git with a hack to flush the FS cache (on Linux) before running
> the first 'git status', only running one test so the result isn't discarded 
> as the
> slowest of N:
> 
> $ GIT_PERF_REPEAT_COUNT=1 GIT_PERF_LARGE_REPO=~/g/linux
> GIT_PERF_MAKE_COMMAND='sudo sync && echo 3 | sudo tee
> /proc/sys/vm/drop_caches >/dev/null && make -j8' ./run origin/master
> avar/fsmonitor ./p7519-fsmonitor.sh
> [...]
> Test                          origin/master     avar/fsmonitor
> ------------------------------------------------------------------------
> 7519.2: status (first)        0.30(0.18+0.10)   8.26(0.22+0.10) +2653.3%
> 7519.3: status (subsequent)   0.08(0.04+0.08)   0.81(0.09+0.07) +912.5%
> 7519.4: status -uno           0.02(0.01+0.06)   0.08(0.04+0.07) +300.0%
> 7519.5: status -uall          0.08(0.06+0.07)   0.15(0.08+0.09) +87.5%
> 
> Now obviously a single run has a lot of noise, but I would expect that
> first invocation to be blindingly fast since watchman has info on what files
> were modified since the cache was flushed.
> 
> The same on the large monorepo noted above:
> 
> Test                          origin/master     avar/fsmonitor
> ------------------------------------------------------------------------
> 7519.2: status (first)        0.59(0.28+0.24)   0.93(0.35+0.19) +57.6%
> 7519.3: status (subsequent)   0.20(0.10+0.19)   0.28(0.16+0.20) +40.0%
> 7519.4: status -uno           0.04(0.04+0.09)   0.11(0.08+0.12) +175.0%
> 7519.5: status -uall          0.29(0.11+0.18)   0.40(0.16+0.19) +37.9%
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
> 
> 
> On Fri, Jun 2, 2017 at 2:40 AM, Ben Peart  wrote:
> > Any chance you can provide me with a bash