[RFC] On watchman support

2014-11-11 Thread Duy Nguyen
I've come to the last piece to speed up "git status", watchman
support. And I realized it's not as good as I thought.

Watchman could be used for two things: to avoid refreshing the index,
and to avoid searching for ignored files. The first one can be done
(with the patch below as demonstration). And it should keep refresh
cost to near zero in the best case, the cost is proportional to the
number of modified files.

For avoiding searching for ignored files. My intention was to build on
top of untracked cache. If watchman can tell me what files are added
or deleted since last observed time, then I can invalidate just
directories that contain them, or even better, calculate ignore status
for those files only.

This is important because in reality compilers and editors tend to
update files by creating a new version then rename them, updating
directory mtime and invalidating untracked cache as a consequence. As
you edit more files (or your rebuild touches more dirs), untracked
cache performance drops (until the next "git status"). The numbers I
posted so far are the best case.

The problem with watchman is it cannot tell me "new" files since the
last observed time (let's say 'T'). If a file exists at 'T', gets
deleted then recreated, then watchman tells me it's a new file. I want
to separate those from ones that do not exist before 'T'.

David's watchman approach does not have this problem because he keeps
track of all entries under $GIT_WORK_TREE and knows which files are
truely new. But I don't really want to keep the whole file list around,
especially when watchman already manages the same list.

So we got a few options:

1) Convince watchman devs to add something to make it work

2) Fork watchman

3) Make another daemon to keep file list around, or put it in a shared
   memory.

4) Move David's watchman series forward (and maybe make use of shared
   mem for fs_cache).

5) Go with something similar to the patch below and accept untracked
   cache performance degrades from time to time

6) ??

I'm working on 1). 2) is just bad taste, listed for completeness
only. If we go with 3) and watchman starts to support Windows (seems
to be in their plan), we'll need to rework some how. And I really
don't like 3)

If 1-3 does not work out, we're left without 4) and 5). We could
support both, but proobably not worth the code complexity and should
just go with one.

And if we go with 4) we should probably think of dropping untracked
cache if watchman will support Windows in the end. 4) also has another
advantage over untracked cache, that it could speed up listing ignored
files as well as untracked files.

Comments?

-- 8< --
diff --git a/Makefile b/Makefile
index fa58a53..a2be728 100644
--- a/Makefile
+++ b/Makefile
@@ -406,6 +406,7 @@ TCLTK_PATH = wish
 XGETTEXT = xgettext
 MSGFMT = msgfmt
 PTHREAD_LIBS = -lpthread
+WATCHMAN_LIBS = -lwatchman
 PTHREAD_CFLAGS =
 GCOV = gcov
 
@@ -1453,6 +1454,13 @@ else
LIB_OBJS += thread-utils.o
 endif
 
+ifdef USE_WATCHMAN
+   LIB_H += watchman-support.h
+   LIB_OBJS += watchman-support.o
+   EXTLIBS += $(WATCHMAN_LIBS)
+   BASIC_CFLAGS += -DUSE_WATCHMAN
+endif
+
 ifdef HAVE_PATHS_H
BASIC_CFLAGS += -DHAVE_PATHS_H
 endif
@@ -,6 +2230,7 @@ GIT-BUILD-OPTIONS: FORCE
@echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@
@echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@
@echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst 
','\'',$(NO_UNIX_SOCKETS)))'\' >>$@
+   @echo USE_WATCHMAN=\''$(subst ','\'',$(subst 
','\'',$(USE_WATCHMAN)))'\' >>$@
 ifdef TEST_OUTPUT_DIRECTORY
@echo TEST_OUTPUT_DIRECTORY=\''$(subst ','\'',$(subst 
','\'',$(TEST_OUTPUT_DIRECTORY)))'\' >>$@
 endif
diff --git a/builtin/update-index.c b/builtin/update-index.c
index f23ec83..1da2b15 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -882,6 +882,7 @@ int cmd_update_index(int argc, const char **argv, const 
char *prefix)
 {
int newfd, entries, has_errors = 0, line_termination = '\n';
int untracked_cache = -1;
+   int use_watchman = -1;
int read_from_stdin = 0;
int prefix_length = prefix ? strlen(prefix) : 0;
int preferred_index_format = 0;
@@ -977,6 +978,8 @@ int cmd_update_index(int argc, const char **argv, const 
char *prefix)
N_("enable/disable untracked cache")),
OPT_SET_INT(0, "force-untracked-cache", &untracked_cache,
N_("enable untracked cache without testing the 
filesystem"), 2),
+   OPT_BOOL(0, "watchman", &use_watchman,
+   N_("use or not use watchman to reduce refresh cost")),
OPT_END()
};
 
@@ -1102,6 +1105,14 @@ int cmd_update_index(int argc, const char **argv, const 
char *prefix)
the_index.cache_changed |= UNTRACKED_CHANGED;
}
 
+   if (use_watchman > 0) {
+   the_index.last_update  

Re: [RFC] On watchman support

2014-11-12 Thread Torsten Bögershausen
On 2014-11-11 13.49, Duy Nguyen wrote:
> I've come to the last piece to speed up "git status", watchman
> support. And I realized it's not as good as I thought.
> 
> Watchman could be used for two things: to avoid refreshing the index,
> and to avoid searching for ignored files. The first one can be done
> (with the patch below as demonstration). And it should keep refresh
> cost to near zero in the best case, the cost is proportional to the
> number of modified files.
> 
> For avoiding searching for ignored files. My intention was to build on
> top of untracked cache. If watchman can tell me what files are added
> or deleted since last observed time, then I can invalidate just
> directories that contain them, or even better, calculate ignore status
> for those files only.
> 
> This is important because in reality compilers and editors tend to
> update files by creating a new version then rename them, updating
> directory mtime and invalidating untracked cache as a consequence. As
> you edit more files (or your rebuild touches more dirs), untracked
> cache performance drops (until the next "git status"). The numbers I
> posted so far are the best case.
> 
> The problem with watchman is it cannot tell me "new" files since the
> last observed time (let's say 'T'). If a file exists at 'T', gets
> deleted then recreated, then watchman tells me it's a new file. I want
> to separate those from ones that do not exist before 'T'.
> 
> David's watchman approach does not have this problem because he keeps
> track of all entries under $GIT_WORK_TREE and knows which files are
> truely new. But I don't really want to keep the whole file list around,
> especially when watchman already manages the same list.
> 
> So we got a few options:
> 
> 1) Convince watchman devs to add something to make it work
> 
> 2) Fork watchman
> 
> 3) Make another daemon to keep file list around, or put it in a shared
>memory.
> 
> 4) Move David's watchman series forward (and maybe make use of shared
>mem for fs_cache).
> 
> 5) Go with something similar to the patch below and accept untracked
>cache performance degrades from time to time
> 
> 6) ??
> 
> I'm working on 1). 2) is just bad taste, listed for completeness
> only. If we go with 3) and watchman starts to support Windows (seems
> to be in their plan), we'll need to rework some how. And I really
> don't like 3)
> 
> If 1-3 does not work out, we're left without 4) and 5). We could
> support both, but proobably not worth the code complexity and should
> just go with one.
> 
> And if we go with 4) we should probably think of dropping untracked
> cache if watchman will support Windows in the end. 4) also has another
> advantage over untracked cache, that it could speed up listing ignored
> files as well as untracked files.
> 
> Comments?
> 
[remove the patch]
>From a Git user perspective it could be good to have something like this:

a) git status -u
b) git status -uno
c) git status -umtime
d) git status -uwatchman

We know that a) and b) already exist.
c) Can be convenient to have, in order to do benchmarking and testing.
  When the UNTR extension is not found, Git can give an error,
  saying something like this:
  No mtime information found, use "git update-index --untracked-cache"
d) does not yet exist

Of course we may want to configure the default for "git status" in a default 
variable,
like status.findUntrackedFiles, which can be empty "", "mtime" or "watchman",
and we may add other backends later.

A short test showed that watchman compiles under Mac OS.
The patch did not compile out of the box (both Git and watchman declare
there own version of usage(), some C99 complaints from the compiler in watchman,
nothing that can not be fixed easily)


I will test the mtime patch under networked file systems the next weeks.


The short version:
Go with c), d) then 5) until we have something better :-) 



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] On watchman support

2014-11-13 Thread Duy Nguyen
On Thu, Nov 13, 2014 at 12:05 PM, Torsten Bögershausen  wrote:
> From a Git user perspective it could be good to have something like this:
>
> a) git status -u
> b) git status -uno
> c) git status -umtime
> d) git status -uwatchman
>
> We know that a) and b) already exist.
> c) Can be convenient to have, in order to do benchmarking and testing.
>   When the UNTR extension is not found, Git can give an error,
>   saying something like this:
>   No mtime information found, use "git update-index --untracked-cache"
> d) does not yet exist
>
> Of course we may want to configure the default for "git status" in a default 
> variable,
> like status.findUntrackedFiles, which can be empty "", "mtime" or "watchman",
> and we may add other backends later.

While "git status" is in the spotlight, these optimizations have wider
impact. Faster index read/refresh/write helps the majority of
commands. Faster untracked listing hits git-status, git-add,
git-commit -A... This is why I go with environment variable for
temporarily disabling something, or we'll need many config and command
line options, one per command.

> A short test showed that watchman compiles under Mac OS.
> The patch did not compile out of the box (both Git and watchman declare
> there own version of usage(), some C99 complaints from the compiler in 
> watchman,
> nothing that can not be fixed easily)

Yeah it's not perfect. It's mainly to show speeding up refresh with
watchman could be done easily and with low impact

> I will test the mtime patch under networked file systems the next weeks.

Hmm.. you remind me mtime series may have this as an advantage over watchman..
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] On watchman support

2014-11-14 Thread Torsten Bögershausen
On 11/13/2014 01:22 PM, Duy Nguyen wrote:
> On Thu, Nov 13, 2014 at 12:05 PM, Torsten Bögershausen  wrote:
>> From a Git user perspective it could be good to have something like this:
>>
>> a) git status -u
>> b) git status -uno
>> c) git status -umtime
>> d) git status -uwatchman
>>
>> We know that a) and b) already exist.
>> c) Can be convenient to have, in order to do benchmarking and testing.
>>   When the UNTR extension is not found, Git can give an error,
>>   saying something like this:
>>   No mtime information found, use "git update-index --untracked-cache"
>> d) does not yet exist
>>
>> Of course we may want to configure the default for "git status" in a default 
>> variable,
>> like status.findUntrackedFiles, which can be empty "", "mtime" or "watchman",
>> and we may add other backends later.
> While "git status" is in the spotlight, these optimizations have wider
> impact. Faster index read/refresh/write helps the majority of
> commands. Faster untracked listing hits git-status, git-add,
> git-commit -A... This is why I go with environment variable for
> temporarily disabling something, or we'll need many config and command
> line options, one per command.
>
>> A short test showed that watchman compiles under Mac OS.
>> The patch did not compile out of the box (both Git and watchman declare
>> there own version of usage(), some C99 complaints from the compiler in 
>> watchman,
>> nothing that can not be fixed easily)
> Yeah it's not perfect. It's mainly to show speeding up refresh with
> watchman could be done easily and with low impact
>
>> I will test the mtime patch under networked file systems the next weeks.


Thinks become to get a little bit clearer.
What I can understand is that we have 2 different "update-helpers" for Git,
thanks for that.

just in case there is re-roll, does the following makes sense:
We want to enable them (probably only one at a time) either by command line or
persistent in a repo.

As I think we have 2 different update helpers
(and may be more in the future)
GIT_UPDATE_HELPER=dirmtime git status
GIT_UPDATE_HELPER=watchman git status
GIT_UPDATE_HELPER=none git status

of course we want to be able to configure it:
git config core.updatehelper dirmtime


After configuring we may want to override it:
GIT_UPDATE_HELPER=none git status
or
git -c core.updatehelper=none status

> Hmm.. you remind me mtime series may have this as an advantage over watchman..
I had the time to do a short test, sharing a copy of git.git under NFS:
The time for git status dropped from 0.4 seconds to 0.15 seconds or so.
Very nice.
The next test will be to share the same repo under samba to Windows
and Mac OS and see how this works.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] On watchman support

2014-11-17 Thread David Turner
On Tue, 2014-11-11 at 19:49 +0700, Duy Nguyen wrote:
> I've come to the last piece to speed up "git status", watchman
> support. And I realized it's not as good as I thought.
> 
> Watchman could be used for two things: to avoid refreshing the index,
> and to avoid searching for ignored files. The first one can be done
> (with the patch below as demonstration). And it should keep refresh
> cost to near zero in the best case, the cost is proportional to the
> number of modified files.
> 
> For avoiding searching for ignored files. My intention was to build on
> top of untracked cache. If watchman can tell me what files are added
> or deleted since last observed time, then I can invalidate just
> directories that contain them, or even better, calculate ignore status
> for those files only.
> 
> This is important because in reality compilers and editors tend to
> update files by creating a new version then rename them, updating
> directory mtime and invalidating untracked cache as a consequence. As
> you edit more files (or your rebuild touches more dirs), untracked
> cache performance drops (until the next "git status"). The numbers I
> posted so far are the best case.
> 
> The problem with watchman is it cannot tell me "new" files since the
> last observed time (let's say 'T'). If a file exists at 'T', gets
> deleted then recreated, then watchman tells me it's a new file. I want
> to separate those from ones that do not exist before 'T'.
> 
> David's watchman approach does not have this problem because he keeps
> track of all entries under $GIT_WORK_TREE and knows which files are
> truely new. But I don't really want to keep the whole file list around,
> especially when watchman already manages the same list.
> 
> So we got a few options:
> 
> 1) Convince watchman devs to add something to make it work

Based on the thread on the watchman github it looks like this won't
happen. 

> 2) Fork watchman
> 
> 3) Make another daemon to keep file list around, or put it in a shared
>memory.
> 
> 4) Move David's watchman series forward (and maybe make use of shared
>mem for fs_cache).
> 
> 5) Go with something similar to the patch below and accept untracked
>cache performance degrades from time to time
> 
> 6) ??
> 
> I'm working on 1). 2) is just bad taste, listed for completeness
> only. If we go with 3) and watchman starts to support Windows (seems
> to be in their plan), we'll need to rework some how. And I really
> don't like 3)
> 
> If 1-3 does not work out, we're left without 4) and 5). We could
> support both, but proobably not worth the code complexity and should
> just go with one.
> 
> And if we go with 4) we should probably think of dropping untracked
> cache if watchman will support Windows in the end. 4) also has another
> advantage over untracked cache, that it could speed up listing ignored
> files as well as untracked files.
> 
> Comments?

I don't think it would be impossible to add Windows support to watchman;
the necessary functions exist, although I don't know how well they work.
My experience with watchman is that it is something of a stress test of
a filesystem's notification layer.  It has exposed bugs in inotify, and
caused system instability on OS X.

My patches are not the world's most beautiful, but they do work.  I
think some improvement might be possible by keeping info about tracked
files in the index, and only storing the tree of ignored and untracked
files separately.  But I have not thought this through fully.  In any
case, making use of shared memory for the fs_cache (as some of your
other patches do for the index) would definitely save time.


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] On watchman support

2014-11-18 Thread Duy Nguyen
On Tue, Nov 18, 2014 at 7:25 AM, David Turner  wrote:
>> So we got a few options:
>>
>> 1) Convince watchman devs to add something to make it work
>
> Based on the thread on the watchman github it looks like this won't
> happen.

Yeah. I came to the conclusion that I needed an extra daemon. And
because I would need an extra daemon anyway to speed up index read
time, that one could be used for caching something else like watchman.
It works out quite nice: because watchman is not tied to the main
'git' binary, people don't need libwatchman and libjansson by default.
When people want watchman, they can install an extra package that
includes the index-helper and its dependencies. This only matters for
binary-based distros of course.

>> Comments?
>
> I don't think it would be impossible to add Windows support to watchman;
> the necessary functions exist, although I don't know how well they work.
> My experience with watchman is that it is something of a stress test of
> a filesystem's notification layer.  It has exposed bugs in inotify, and
> caused system instability on OS X.

The way i'm adding watchman to index-helper should work on Windows as
well, IPC is really simple. But let's wait and see.

> My patches are not the world's most beautiful, but they do work.  I
> think some improvement might be possible by keeping info about tracked
> files in the index, and only storing the tree of ignored and untracked
> files separately.  But I have not thought this through fully.  In any
> case, making use of shared memory for the fs_cache (as some of your
> other patches do for the index) would definitely save time.

By the way, what happened to your sse optimization in refs.c? I see
it's reverted but I didn't follow closely to know why. Or will you go
with cityhash now.. I ask because you have another sse optimization
for hashmap on your watchman branch and that could reduce init time
for name-hash. Name-hash is used often on case-insensitive fs (less
often on case-sensitive fs).

I did a simple test and your optimization could init name-hash (on
webkit) in 35ms, while unmodified hashmap took 88ms. Loading index on
this machine took 360ms for reference (probably down too 100ms with
index-helper running, when that 88ms starts to become significant).
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] On watchman support

2014-11-18 Thread David Turner
On Tue, 2014-11-18 at 17:48 +0700, Duy Nguyen wrote:
> > My patches are not the world's most beautiful, but they do work.  I
> > think some improvement might be possible by keeping info about tracked
> > files in the index, and only storing the tree of ignored and untracked
> > files separately.  But I have not thought this through fully.  In any
> > case, making use of shared memory for the fs_cache (as some of your
> > other patches do for the index) would definitely save time.
> 
> By the way, what happened to your sse optimization in refs.c? I see
> it's reverted but I didn't follow closely to know why. 

I don't know why either -- it works just fine.  There was a bug, but I
fixed it.  Junio?

> Or will you go
> with cityhash now.. I ask because you have another sse optimization
> for hashmap on your watchman branch and that could reduce init time
> for name-hash. Name-hash is used often on case-insensitive fs (less
> often on case-sensitive fs).

Cityhash would be better, because it has actual engineering effort put
into it; what I did on my branch is a hack that happens to work
decently.  As the comment notes, I did not spend much effort on tuning
my implementation.  Also, Cityhash doesn't require SSE, so it's more
portable.

> I did a simple test and your optimization could init name-hash (on
> webkit) in 35ms, while unmodified hashmap took 88ms. Loading index on
> this machine took 360ms for reference (probably down too 100ms with
> index-helper running, when that 88ms starts to become significant).

OK, that sounds like a big win.  

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] On watchman support

2014-11-18 Thread Junio C Hamano
David Turner  writes:

> On Tue, 2014-11-18 at 17:48 +0700, Duy Nguyen wrote:
>> > My patches are not the world's most beautiful, but they do work.  I
>> > think some improvement might be possible by keeping info about tracked
>> > files in the index, and only storing the tree of ignored and untracked
>> > files separately.  But I have not thought this through fully.  In any
>> > case, making use of shared memory for the fs_cache (as some of your
>> > other patches do for the index) would definitely save time.
>> 
>> By the way, what happened to your sse optimization in refs.c? I see
>> it's reverted but I didn't follow closely to know why. 
>
> I don't know why either -- it works just fine.  There was a bug, but I
> fixed it.  Junio?

I vaguely recall that the reason why we dropped it was because it
was too much code churn in an area that was being worked on in
parallel, but you may need to go back to the list archive for
details.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] On watchman support

2014-11-18 Thread David Turner
On Tue, 2014-11-18 at 12:55 -0800, Junio C Hamano wrote:
> David Turner  writes:
> 
> > On Tue, 2014-11-18 at 17:48 +0700, Duy Nguyen wrote:
> >> > My patches are not the world's most beautiful, but they do work.  I
> >> > think some improvement might be possible by keeping info about tracked
> >> > files in the index, and only storing the tree of ignored and untracked
> >> > files separately.  But I have not thought this through fully.  In any
> >> > case, making use of shared memory for the fs_cache (as some of your
> >> > other patches do for the index) would definitely save time.
> >> 
> >> By the way, what happened to your sse optimization in refs.c? I see
> >> it's reverted but I didn't follow closely to know why. 
> >
> > I don't know why either -- it works just fine.  There was a bug, but I
> > fixed it.  Junio?
> 
> I vaguely recall that the reason why we dropped it was because it
> was too much code churn in an area that was being worked on in
> parallel, but you may need to go back to the list archive for
> details.

OK, in that case I'll try to remember to reroll it once the rest of the
refs stuff lands.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] On watchman support

2014-11-18 Thread Junio C Hamano
David Turner  writes:

> On Tue, 2014-11-18 at 12:55 -0800, Junio C Hamano wrote:
>> I vaguely recall that the reason why we dropped it was because it
>> was too much code churn in an area that was being worked on in
>> parallel, but you may need to go back to the list archive for
>> details.
>
> OK, in that case I'll try to remember to reroll it once the rest of the
> refs stuff lands.

Sure.  But I would much prefer to see us explore an arch independent
optimisation of the caller before starting to micro-optimize a leaf
function.  

It is not check_refname_format() that is the real problem. It's the
fact that we do O(# of refs) work whenever we have to access the
packed-refs file. check_refname_format() is part of that, surely,
but so is reading the file, creating all of the refname structs in
memory, etc. (credit to peff@).

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] On watchman support

2014-11-18 Thread Jeff King
On Tue, Nov 18, 2014 at 01:26:56PM -0800, Junio C Hamano wrote:

> It is not check_refname_format() that is the real problem. It's the
> fact that we do O(# of refs) work whenever we have to access the
> packed-refs file. check_refname_format() is part of that, surely,
> but so is reading the file, creating all of the refname structs in
> memory, etc. (credit to peff@).

Yeah, I'd agree very much with that. I am not sure if I am cc'd here
because of my general complaining about packed-refs, or if I have said
something clever on the subject.

I did implement at one point a packed-refs reader that does a binary
search on the mmap'd packed-refs file, and can return a single value or
even locate the first entry matching a prefix (like "refs/tags/") and
iterate until we're out of the prefix. Unfortunately this runs very
contrary to the caching design of the refs.c code. It is focused on
caching _loose_ refs, where we may read an outer directory (like
"refs/"), and would like to avoid descending into an inner directory
(likes "refs/foo/") unless we are interested in what is in it. But
caching partial reads of packed-refs like this is "inside out"; we might
read all of "refs/tags/*", but have no clue what else is in "refs/". So
integrating it into refs.c would take pretty major surgery.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] On watchman support

2014-11-19 Thread Paolo Ciarrocchi
On Tue, Nov 18, 2014 at 1:25 AM, David Turner  wrote:
>
> My patches are not the world's most beautiful, but they do work.

Out of curiosity: do you run the patches at twitter?

Thanks.

-- Paolo


-- 
Paolo
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] On watchman support

2014-11-19 Thread David Turner
On Wed, 2014-11-19 at 16:26 +0100, Paolo Ciarrocchi wrote:
> On Tue, Nov 18, 2014 at 1:25 AM, David Turner  
> wrote:
> >
> > My patches are not the world's most beautiful, but they do work.
> 
> Out of curiosity: do you run the patches at twitter?

An increasing number of us do, yes. 

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] On watchman support

2014-11-28 Thread Duy Nguyen
On Wed, Nov 19, 2014 at 1:12 AM, David Turner  wrote:
>> Or will you go
>> with cityhash now.. I ask because you have another sse optimization
>> for hashmap on your watchman branch and that could reduce init time
>> for name-hash. Name-hash is used often on case-insensitive fs (less
>> often on case-sensitive fs).
>
> Cityhash would be better, because it has actual engineering effort put
> into it; what I did on my branch is a hack that happens to work
> decently.  As the comment notes, I did not spend much effort on tuning
> my implementation.  Also, Cityhash doesn't require SSE, so it's more
> portable.

Cityhash looks less appealing to me. For one thing it's C++ so linking
to C can't be done. I could add a few "extern "C"" to make it work.
But if we plan to support it eventually, cityhash must support C out
of the box.

Then cityhash does not support case-insensitive hashing. I had to make
a CityHash32i version based on CityHash32. It's probably my bugs
there, but performance is worse (~120ms) than original hashmap.c
(90ms). Enabling sse4.2 helps a bit, but still worse. Using the
case-sensitive version in place for memihash and strihash does make
cityhash win over hashmap.c, around 50ms (with or without sse4.2). But
that's still not as good as your version (~35ms)..
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] On watchman support

2014-12-01 Thread David Turner
On Fri, 2014-11-28 at 18:13 +0700, Duy Nguyen wrote:
> On Wed, Nov 19, 2014 at 1:12 AM, David Turner  
> wrote:
> >> Or will you go
> >> with cityhash now.. I ask because you have another sse optimization
> >> for hashmap on your watchman branch and that could reduce init time
> >> for name-hash. Name-hash is used often on case-insensitive fs (less
> >> often on case-sensitive fs).
> >
> > Cityhash would be better, because it has actual engineering effort put
> > into it; what I did on my branch is a hack that happens to work
> > decently.  As the comment notes, I did not spend much effort on tuning
> > my implementation.  Also, Cityhash doesn't require SSE, so it's more
> > portable.
> 
> Cityhash looks less appealing to me. For one thing it's C++ so linking
> to C can't be done. I could add a few "extern "C"" to make it work.
> But if we plan to support it eventually, cityhash must support C out
> of the box.
> 
> Then cityhash does not support case-insensitive hashing. I had to make
> a CityHash32i version based on CityHash32. It's probably my bugs
> there, but performance is worse (~120ms) than original hashmap.c
> (90ms). Enabling sse4.2 helps a bit, but still worse. Using the
> case-sensitive version in place for memihash and strihash does make
> cityhash win over hashmap.c, around 50ms (with or without sse4.2). But
> that's still not as good as your version (~35ms)..

Can you post your CityHash32i?  

Have you tried this C port of Cityhash?

https://github.com/santeri-io/cityhash-c


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html