Re: [PATCH v3] coccicheck: process every source file at once

2018-10-04 Thread Jacob Keller
On Tue, Oct 2, 2018 at 1:18 PM Jacob Keller  wrote:
>
> On Tue, Oct 2, 2018 at 1:07 PM Jacob Keller  wrote:
> >
> > From: Jacob Keller 
> >
> > make coccicheck is used in order to apply coccinelle semantic patches,
> > and see if any of the transformations found within contrib/coccinelle/
> > can be applied to the current code base.
> >
> > Pass every file to a single invocation of spatch, instead of running
> > spatch once per source file.
> >
> > This significantly reduces the time required to run make coccicheck:
> >
> > Prior timing of make coccicheck
> >   real    6m14.090s
> >   user    25m2.606s
> >   sys     1m22.919s
> >
> > New timing of make coccicheck
> >   real    1m36.580s
> >   user    7m55.933s
> >   sys     0m18.219s
> >
> > This is nearly a 4x decrease in the time required to run make
> > coccicheck. This is due to the overhead of restarting spatch for every
> > file. By processing all files at once, we can amortize this startup cost
> > across the total number of files, rather than paying it once per file.
> >
> > Signed-off-by: Jacob Keller 
> > ---
>
> Forgot to add what changed. I dropped the subshell and "||" bit around
> invoking spatch.
>
> Thanks,
> Jake
>

Junio, do you want me to update the commit message on my side with the
memory concerns? Or could you update it to mention memory as a noted
trade-off?

Thanks,
Jake

>
> >  Makefile | 6 ++
> >  1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/Makefile b/Makefile
> > index df1df9db78da..da692ece9e12 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -2715,10 +2715,8 @@ endif
> >  %.cocci.patch: %.cocci $(COCCI_SOURCES)
> > @echo '    ' SPATCH $<; \
> > ret=0; \
> > -   for f in $(COCCI_SOURCES); do \
> > -   $(SPATCH) --sp-file $< $$f $(SPATCH_FLAGS) || \
> > -   { ret=$$?; break; }; \
> > -   done >$@+ 2>$@.log; \
> > +   $(SPATCH) --sp-file $< $(COCCI_SOURCES) $(SPATCH_FLAGS) >$@+ 2>$@.log; \
> > +   ret=$$?; \
> > if test $$ret != 0; \
> > then \
> > cat $@.log; \
> > --
> > 2.18.0.219.gaf81d287a9da
> >



Re: [PATCH 1/1] protocol: limit max protocol version per service

2018-10-04 Thread Josh Steadmon
On 2018.10.03 15:47, Stefan Beller wrote:
> On Wed, Oct 3, 2018 at 2:34 PM Josh Steadmon  wrote:
> >
> > Is there a method or design for advertising multiple acceptable versions
> > from the client?
> 
> I think the client can send multiple versions, looking through protocol.c
> (and not the documentation as I should for this:)
> 
>   /*
>* Determine which protocol version the client has requested.  Since
>* multiple 'version' keys can be sent by the client, indicating that
>* the client is okay to speak any of them, select the greatest version
>* that the client has requested.  This is due to the assumption that
>* the most recent protocol version will be the most state-of-the-art.
>*/
> ...
> const char *git_protocol = getenv(GIT_PROTOCOL_ENVIRONMENT);
> string_list_split(&list, git_protocol, ':', -1);
> ...
> for_each_string_list_item(item, &list) {
> ...
> if (skip_prefix(item->string, "version=", &value))
> ...
> 
> in determine_protocol_version_server which already had the client
> speak to it, so I think at least the server can deal with multiple versions.
> 
> But given that we transport the version in env variables, we'd
> need to be extra careful if we just did not see the version
> in the --remote=. above?

Sorry, I'm not sure I understand this. What about env variables requires
caution?


> > From my understanding, we can only add a single
> > version=X field in the advertisement, but IIUC we can extend this fairly
> > easily? Perhaps we can have "version=X" to mean the preferred version,
> > and then a repeatable "acceptable_version=Y" field or similar?
> 
> Just re-use "version X", separated by colons as above.
> 
> > > From a maintenance perspective, do we want to keep
> > > this part of the code central, as it ties protocol (as proxied
> > > by service name) to the max version number?
> > > I would think that we'd rather have the decision local to the
> > > code, i.e. builtin/fetch would need to tell protocol.c that it
> > > can do (0,1,2) and builtin/push can do (0,1), and then the
> > > networking layers of code would figure out by the input
> > > from the caller and the input from the user (configured
> > > protocol.version) what is the best to go forward from
> > > then on.
> >
> > I like having it centralized, because enforcing this in git_connect()
> > and discover_refs() catches all the outgoing version advertisements, but
> > there's lots of code paths that lead to those two functions that would
> > all have to have the acceptable version numbers plumbed through.
> 
> Makes sense.
> 
> > I suppose we could also have a registry of services to version numbers,
> > but I tend to dislike non-local sources of data. But if the list likes
> > that approach better, I'll be happy to implement it.
> 
> > > But I guess having the central place here is not to
> > > bad either. How will it cope with the desire of protocol v2
> > > to have only one end point (c.f. serve.{c,h} via builtin/serve
> > > as "git serve") ?
> >
> > I'm not sure about this. In my series to add a v2 archive command, I
> > added support for a new endpoint for proto v2 and I don't recall seeing
> > any complaints, but that is still open for review.
> 
> Ah I guess new end points would imply that you can speak at least
> a given version N.
> 
> > I suppose if we are strict about serving from a single endpoint, the
> > version registry makes more sense, and individual operations can declare
> > acceptable version numbers before calling any network code?
> 
> Ah yeah, that makes sense!

Thinking out loud here. Please let me know if I say something stupid :)

So we'll have (up to) three pieces of version information we'll care
about for version negotiation:

1. (maybe) a client-side protocol.version config entry
2. a list of acceptable proto versions for the currently running
   operation on the client
3. a list of acceptable proto versions for the server endpoint that
   handles the request

According to the doc on protocol.version: "If set, clients will attempt
to communicate with a server using the specified protocol version. If
unset, no attempt will be made by the client to communicate using a
particular protocol version, this results in protocol version 0 being
used."

So, if protocol.version is not set, or set to 0, the client should not
attempt any sort of version negotiation. Otherwise, the client prefers a
particular version, but we don't guarantee that they will actually use
that version after the (unspecified) version negotiation procedure.

If protocol.version is set to something other than 0, we construct a
list of acceptable versions for the given operation. If the
protocol.version entry is present in that list, we move it to the front
of the list to note that it is the preferred version. We send all of
these, in order, in the request.

When the server endpoint begins to handle a request, it first constructs
a list of acceptable versions. If the client 

Re: [PATCH v3 2/5] fetch-pack: load tip_oids eagerly iff needed

2018-10-04 Thread Jeff King
On Thu, Oct 04, 2018 at 03:52:05PM -0700, Jonathan Tan wrote:

> > Or I am even OK with leaving the existing tablesize
> > check. It is a little intimate with the implementation details, but I
> > suspect that if oidset were to change (e.g., to initialize the buckets
> > immediately), the problem would be pretty apparent in the tests.
> 
> I am OK with this too, except that (as far as I can tell) the point of
> this patch set is to replace the internals of oidset, so we no longer
> have the tablesize check. Unless you meant the khash analog of
> tablesize? I would be OK if all tablesize references are replaced with
> the khash analog in the same patch that the oidset internals are
> replaced.

Yeah, in khash it's n_buckets, but it's basically the same thing. René's
original patch did that update, and we were musing on whether there was
a way to avoid crossing the module boundary so intimately. Hence the
patch you saw. :)

-Peff


[RFC PATCH 2/2] fuzz: Add fuzz testing for packfile indices.

2018-10-04 Thread Josh Steadmon
Signed-off-by: Josh Steadmon 
---
 .gitignore  |  1 +
 Makefile|  5 -
 fuzz-pack-idx.c | 13 +
 packfile.c  | 44 +---
 packfile.h  | 13 +
 5 files changed, 56 insertions(+), 20 deletions(-)
 create mode 100644 fuzz-pack-idx.c

diff --git a/.gitignore b/.gitignore
index 87a28b3115..64b3377d40 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,5 +1,6 @@
 /fuzz_corpora
 /fuzz-pack-headers
+/fuzz-pack-idx
 /GIT-BUILD-OPTIONS
 /GIT-CFLAGS
 /GIT-LDFLAGS
diff --git a/Makefile b/Makefile
index 10bb82b115..f7e6be8f19 100644
--- a/Makefile
+++ b/Makefile
@@ -685,6 +685,7 @@ SCRIPTS = $(SCRIPT_SH_INS) \
 ETAGS_TARGET = TAGS
 
 FUZZ_OBJS += fuzz-pack-headers.o
+FUZZ_OBJS += fuzz-pack-idx.o
 
 FUZZ_PROGRAMS += $(patsubst %.o,%,$(FUZZ_OBJS))
 
@@ -3071,7 +3072,7 @@ cover_db_html: cover_db
 
 ### Fuzz testing
 #
-.PHONY: fuzz-clean fuzz-objs fuzz-compile
+.PHONY: fuzz-clean fuzz-objs fuzz-compile fuzz-all
 
 FUZZ_CFLAGS = $(CFLAGS) -fsanitize-coverage=trace-pc-guard -fsanitize=address
 FUZZ_LDFLAGS = $(FUZZ_CFLAGS)
@@ -3089,3 +3090,5 @@ fuzz-compile:
 $(FUZZ_PROGRAMS): fuzz-compile
clang++ $(FUZZ_LDFLAGS) $(LIB_OBJS) $(BUILTIN_OBJS) $(XDIFF_OBJS) \
$(EXTLIBS) git.o $@.o /usr/lib/llvm-4.0/lib/libFuzzer.a -o $@
+
+fuzz-all: $(FUZZ_PROGRAMS)
diff --git a/fuzz-pack-idx.c b/fuzz-pack-idx.c
new file mode 100644
index 00..0c3d777aac
--- /dev/null
+++ b/fuzz-pack-idx.c
@@ -0,0 +1,13 @@
+#include "object-store.h"
+#include "packfile.h"
+
+int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);
+
+int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
+{
+   struct packed_git p;
+
+   load_idx("fuzz-input", GIT_SHA1_RAWSZ, (void *)data, size, &p);
+
+   return 0;
+}
diff --git a/packfile.c b/packfile.c
index 841b36182f..86074a76e9 100644
--- a/packfile.c
+++ b/packfile.c
@@ -80,10 +80,8 @@ void pack_report(void)
 static int check_packed_git_idx(const char *path, struct packed_git *p)
 {
void *idx_map;
-   struct pack_idx_header *hdr;
size_t idx_size;
-   uint32_t version, nr, i, *index;
-   int fd = git_open(path);
+   int fd = git_open(path), ret;
struct stat st;
const unsigned int hashsz = the_hash_algo->rawsz;
 
@@ -101,16 +99,32 @@ static int check_packed_git_idx(const char *path, struct packed_git *p)
idx_map = xmmap(NULL, idx_size, PROT_READ, MAP_PRIVATE, fd, 0);
close(fd);
 
-   hdr = idx_map;
+   ret = load_idx(path, hashsz, idx_map, idx_size, p);
+
+   if (ret)
+   munmap(idx_map, idx_size);
+
+   return ret;
+}
+
+int load_idx(const char *path, const unsigned int hashsz, void *idx_map,
+size_t idx_size, struct packed_git *p)
+{
+   struct pack_idx_header *hdr = idx_map;
+   uint32_t version, nr, i, *index;
+
+   if (idx_size < 4 * 256 + hashsz + hashsz)
+   return error("index file %s is too small", path);
+   if (idx_map == NULL)
+   return error("empty data");
+
if (hdr->idx_signature == htonl(PACK_IDX_SIGNATURE)) {
version = ntohl(hdr->idx_version);
-   if (version < 2 || version > 2) {
-   munmap(idx_map, idx_size);
+   if (version < 2 || version > 2)
return error("index file %s is version %"PRIu32
 " and is not supported by this binary"
 " (try upgrading GIT to a newer version)",
 path, version);
-   }
} else
version = 1;
 
@@ -120,10 +134,8 @@ static int check_packed_git_idx(const char *path, struct packed_git *p)
index += 2;  /* skip index header */
for (i = 0; i < 256; i++) {
uint32_t n = ntohl(index[i]);
-   if (n < nr) {
-   munmap(idx_map, idx_size);
+   if (n < nr)
return error("non-monotonic index %s", path);
-   }
nr = n;
}
 
@@ -135,10 +147,8 @@ static int check_packed_git_idx(const char *path, struct packed_git *p)
 *  - hash of the packfile
 *  - file checksum
 */
-   if (idx_size != 4*256 + nr * (hashsz + 4) + hashsz + hashsz) {
-   munmap(idx_map, idx_size);
+   if (idx_size != 4 * 256 + nr * (hashsz + 4) + hashsz + hashsz)
return error("wrong index v1 file size in %s", path);
-   }
} else if (version == 2) {
/*
 * Minimum size:
@@ -157,20 +167,16 @@ static int check_packed_git_idx(const char *path, struct packed_git *p)
unsigned long max_size = min_size;
if (nr)
max_size += (nr - 1)*8;
-   if (idx_size < min_size || 

[RFC PATCH 0/2] add fuzzing targets for use with LLVM libFuzzer

2018-10-04 Thread Josh Steadmon
libFuzzer[1] is a fuzzing engine included in recent versions of LLVM. It
is used by OSS-Fuzz[2] for continuous fuzzing of OSS projects.

This series adds two basic fuzzing targets covering packfile header and
index code. It is not particularly portable, and requires the use of
LLVM v4.0 (the latest version available on my workstation). I would
particularly appreciate advice on how to make the Makefile more
portable.

[1]: https://llvm.org/docs/LibFuzzer.html
[2]: https://github.com/google/oss-fuzz

Josh Steadmon (2):
  fuzz: Add basic fuzz testing target.
  fuzz: Add fuzz testing for packfile indices.

 .gitignore  |  3 +++
 Makefile| 33 -
 fuzz-pack-headers.c | 14 ++
 fuzz-pack-idx.c | 13 +
 packfile.c  | 44 +---
 packfile.h  | 13 +
 6 files changed, 100 insertions(+), 20 deletions(-)
 create mode 100644 fuzz-pack-headers.c
 create mode 100644 fuzz-pack-idx.c

-- 
2.19.0.605.g01d371f741-goog



[RFC PATCH 1/2] fuzz: Add basic fuzz testing target.

2018-10-04 Thread Josh Steadmon
Signed-off-by: Josh Steadmon 
---
 .gitignore  |  2 ++
 Makefile| 30 +-
 fuzz-pack-headers.c | 14 ++
 3 files changed, 45 insertions(+), 1 deletion(-)
 create mode 100644 fuzz-pack-headers.c

diff --git a/.gitignore b/.gitignore
index 9d1363a1eb..87a28b3115 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,3 +1,5 @@
+/fuzz_corpora
+/fuzz-pack-headers
 /GIT-BUILD-OPTIONS
 /GIT-CFLAGS
 /GIT-LDFLAGS
diff --git a/Makefile b/Makefile
index 13e1c52478..10bb82b115 100644
--- a/Makefile
+++ b/Makefile
@@ -590,6 +590,8 @@ XDIFF_OBJS =
 VCSSVN_OBJS =
 GENERATED_H =
 EXTRA_CPPFLAGS =
+FUZZ_OBJS =
+FUZZ_PROGRAMS =
 LIB_OBJS =
 PROGRAM_OBJS =
 PROGRAMS =
@@ -682,6 +684,10 @@ SCRIPTS = $(SCRIPT_SH_INS) \
 
 ETAGS_TARGET = TAGS
 
+FUZZ_OBJS += fuzz-pack-headers.o
+
+FUZZ_PROGRAMS += $(patsubst %.o,%,$(FUZZ_OBJS))
+
 # Empty...
 EXTRA_PROGRAMS =
 
@@ -2250,6 +2256,7 @@ TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST
 OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
$(XDIFF_OBJS) \
$(VCSSVN_OBJS) \
+   $(FUZZ_OBJS) \
common-main.o \
git.o
 ifndef NO_CURL
@@ -2931,7 +2938,7 @@ profile-clean:
 cocciclean:
$(RM) contrib/coccinelle/*.cocci.patch*
 
-clean: profile-clean coverage-clean cocciclean
+clean: profile-clean coverage-clean cocciclean fuzz-clean
$(RM) *.res
$(RM) $(OBJECTS)
$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB)
@@ -3061,3 +3068,24 @@ cover_db: coverage-report
 cover_db_html: cover_db
cover -report html -outputdir cover_db_html cover_db
 
+
+### Fuzz testing
+#
+.PHONY: fuzz-clean fuzz-objs fuzz-compile
+
+FUZZ_CFLAGS = $(CFLAGS) -fsanitize-coverage=trace-pc-guard -fsanitize=address
+FUZZ_LDFLAGS = $(FUZZ_CFLAGS)
+
+
+fuzz-clean:
+   $(RM) $(FUZZ_PROGRAMS) $(FUZZ_OBJS)
+
+fuzz-objs: $(FUZZ_OBJS)
+
+fuzz-compile:
+   $(MAKE) CC=clang LD=clang CFLAGS="$(FUZZ_CFLAGS)" \
+   LDFLAGS="$(FUZZ_LDFLAGS)" all fuzz-objs
+
+$(FUZZ_PROGRAMS): fuzz-compile
+   clang++ $(FUZZ_LDFLAGS) $(LIB_OBJS) $(BUILTIN_OBJS) $(XDIFF_OBJS) \
+   $(EXTLIBS) git.o $@.o /usr/lib/llvm-4.0/lib/libFuzzer.a -o $@
diff --git a/fuzz-pack-headers.c b/fuzz-pack-headers.c
new file mode 100644
index 00..99da1d0fd3
--- /dev/null
+++ b/fuzz-pack-headers.c
@@ -0,0 +1,14 @@
+#include "packfile.h"
+
+int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);
+
+int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
+{
+   enum object_type type;
+   unsigned long len;
+
+   unpack_object_header_buffer((const unsigned char *)data,
+   (unsigned long)size, &type, &len);
+
+   return 0;
+}
-- 
2.19.0.605.g01d371f741-goog



Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-04 Thread René Scharfe
Am 01.10.2018 um 22:37 schrieb René Scharfe:
> Am 01.10.2018 um 21:26 schrieb Derrick Stolee:
>> On 10/1/2018 3:16 PM, René Scharfe wrote:
>>> Am 28.06.2018 um 14:31 schrieb Derrick Stolee via GitGitGadget:
 diff --git a/commit-reach.c b/commit-reach.c
 index c58e50fbb..ac132c8e4 100644
 --- a/commit-reach.c
 +++ b/commit-reach.c
 @@ -513,65 +513,88 @@ int commit_contains(struct ref_filter *filter, struct commit *commit,
return is_descendant_of(commit, list);
   }
   
 -int reachable(struct commit *from, int with_flag, int assign_flag,
 -time_t min_commit_date)
 +static int compare_commits_by_gen(const void *_a, const void *_b)
   {
 -  struct prio_queue work = { compare_commits_by_commit_date };
 +  const struct commit *a = (const struct commit *)_a;
 +  const struct commit *b = (const struct commit *)_b;
>>> This cast is bogus.  QSORT gets handed a struct commit **, i.e. an array
>>> of pointers, and qsort(1) passes references to those pointers to the
>>> compare function, and not the pointer values.
>>
>> Good catch! I'm disappointed that we couldn't use type-checking here, as 
>> it is quite difficult to discover that the types are wrong here.
> 
> Generics in C are hard, and type checking traditionally falls by the
> wayside.  You could use macros for that, like klib [*] does, but that
> has its own downsides (more object text, debugging the sort macros
> themselves is harder).
> 
> [*] https://github.com/attractivechaos/klib/blob/master/ksort.h

We could also do something like this to reduce the amount of manual
casting, but do we want to?  (Macro at the bottom, three semi-random
examples at the top.)
---
 bisect.c  | 11 +++
 commit-graph.c|  9 ++---
 commit-reach.c| 12 +---
 git-compat-util.h | 12 
 4 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/bisect.c b/bisect.c
index e8b17cf7e1..06be3a3c15 100644
--- a/bisect.c
+++ b/bisect.c
@@ -192,16 +192,11 @@ struct commit_dist {
int distance;
 };
 
-static int compare_commit_dist(const void *a_, const void *b_)
-{
-   struct commit_dist *a, *b;
-
-   a = (struct commit_dist *)a_;
-   b = (struct commit_dist *)b_;
+DEFINE_SORT(sort_by_commit_dist, struct commit_dist, a, b, {
if (a->distance != b->distance)
return b->distance - a->distance; /* desc sort */
	return oidcmp(&a->commit->object.oid, &b->commit->object.oid);
-}
+})
 
 static struct commit_list *best_bisection_sorted(struct commit_list *list, int nr)
 {
@@ -223,7 +218,7 @@ static struct commit_list *best_bisection_sorted(struct commit_list *list, int n
array[cnt].distance = distance;
cnt++;
}
-   QSORT(array, cnt, compare_commit_dist);
+   sort_by_commit_dist(array, cnt);
for (p = list, i = 0; i < cnt; i++) {
struct object *obj = &(array[i].commit->object);
 
diff --git a/commit-graph.c b/commit-graph.c
index 7f4519ec3b..a2202414e0 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -550,12 +550,7 @@ static void write_graph_chunk_large_edges(struct hashfile *f,
}
 }
 
-static int commit_compare(const void *_a, const void *_b)
-{
-   const struct object_id *a = (const struct object_id *)_a;
-   const struct object_id *b = (const struct object_id *)_b;
-   return oidcmp(a, b);
-}
+DEFINE_SORT(sort_oids, struct object_id, a, b, return oidcmp(a, b))
 
 struct packed_commit_list {
struct commit **list;
@@ -780,7 +775,7 @@ void write_commit_graph(const char *obj_dir,
 
close_reachable();
 
-   QSORT(oids.list, oids.nr, commit_compare);
+   sort_oids(oids.list, oids.nr);
 
count_distinct = 1;
for (i = 1; i < oids.nr; i++) {
diff --git a/commit-reach.c b/commit-reach.c
index 2f5e592d16..3aef47c3dd 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -527,17 +527,15 @@ int commit_contains(struct ref_filter *filter, struct commit *commit,
return is_descendant_of(commit, list);
 }
 
-static int compare_commits_by_gen(const void *_a, const void *_b)
-{
-   const struct commit *a = *(const struct commit * const *)_a;
-   const struct commit *b = *(const struct commit * const *)_b;
-
+DEFINE_SORT(sort_commits_by_gen, struct commit *, ap, bp, {
+   const struct commit *a = *ap;
+   const struct commit *b = *bp;
if (a->generation < b->generation)
return -1;
if (a->generation > b->generation)
return 1;
return 0;
-}
+})
 
 int can_all_from_reach_with_flag(struct object_array *from,
 unsigned int with_flag,
@@ -580,7 +578,7 @@ int can_all_from_reach_with_flag(struct object_array *from,
nr_commits++;
}
 
-   QSORT(list, nr_commits, compare_commits_by_gen);
+   sort_commits_by_gen(list, nr_commits);
 
for (i = 0; i < nr_commits; i++) {

Re: [PATCH v3 2/5] fetch-pack: load tip_oids eagerly iff needed

2018-10-04 Thread Jonathan Tan
> Yes, I agree with you that the loops are still entwined. They're at
> least now in a single function, though, which IMHO is a slight
> improvement.

Hmm...originally, the main functionality was in a single loop in a
single function. But I say that because I consider the lazy loading in
tip_oids_contain() as something both peripheral and independent (if the
main loop's logic changes, the lazy loading most likely does not need to
be changed).

> I agree with you that just checking:
> 
>   if (oidset_count(&tip_oids) != 0)
> 
> would be fine, too.

OK, we're agreed on this :-)

> Or I am even OK with leaving the existing tablesize
> check. It is a little intimate with the implementation details, but I
> suspect that if oidset were to change (e.g., to initialize the buckets
> immediately), the problem would be pretty apparent in the tests.

I am OK with this too, except that (as far as I can tell) the point of
this patch set is to replace the internals of oidset, so we no longer
have the tablesize check. Unless you meant the khash analog of
tablesize? I would be OK if all tablesize references are replaced with
the khash analog in the same patch that the oidset internals are
replaced.


Re: [PATCH v2] gpg-interface.c: detect and reject multiple signatures on commits

2018-10-04 Thread Tacitus Aedifex
I think there is a simpler way to catch multiple signatures; see below.
Other than that, I like this patch.


Signed-off-by: Tacitus Aedifex 
---
gpg-interface.c | 18 ++
1 file changed, 18 insertions(+)

diff --git a/gpg-interface.c b/gpg-interface.c
index db17d65f8..a4dba3361 100644
--- a/gpg-interface.c
+++ b/gpg-interface.c
@@ -93,6 +93,7 @@ static void parse_gpg_output(struct signature_check *sigc)
{
const char *buf = sigc->gpg_status;
int i;
+   int multi_sig = 0;

/* Iterate over all search strings */
for (i = 0; i < ARRAY_SIZE(sigcheck_gpg_status); i++) {
@@ -115,6 +116,23 @@ static void parse_gpg_output(struct signature_check *sigc)
next = strchrnul(found, '\n');
sigc->signer = xmemdupz(found, next - found);
}
+		} else 
+			multi_sig++;

+
+   /*
+    * GOODSIG, BADSIG, etc. can occur only once for each signature.
+    * Therefore, if we had more than one then we're dealing with
+    * multiple signatures. We don't support them currently and they are
+    * rather hard to create, so something is likely not right
+* and we should reject them altogether.
+*/
+   if (multi_sig > 1) {
+   sigc->result = 'E';
+   /* clear partial data to avoid confusion */
+   if (sigc->signer)
+   FREE_AND_NULL(sigc->signer);
+   if (sigc->key)
+   FREE_AND_NULL(sigc->key);
}
}
}
--
2.18.0.129.ge333175
--


Re: [PATCH v3 0/5] oidset: use khash

2018-10-04 Thread Jeff King
On Thu, Oct 04, 2018 at 05:05:37PM +0200, René Scharfe wrote:

> Two new patches to avoid using oidset internals in fetch-pack:
> 
>   fetch-pack: factor out is_unmatched_ref()
>   fetch-pack: load tip_oids eagerly iff needed
> 
> Unchanged patch:
> 
>   khash: factor out kh_release_*
> 
> Unchanged, except it doesn't touch fetch-pack anymore:
> 
>   oidset: use khash
> 
> A new patch, to reduce object text size:
> 
>   oidset: uninline oidset_init()

I left a few responses to the fetch-pack changes. The rest of it all
looks good to me. Certainly it's satisfying for the implementation-swap
in patch 4 to be able to touch _only_ oidset.[ch].

-Peff


Re: [PATCH v3 2/5] fetch-pack: load tip_oids eagerly iff needed

2018-10-04 Thread Jeff King
On Thu, Oct 04, 2018 at 02:38:13PM -0700, Jonathan Tan wrote:

> > -   if ((allow_unadvertised_object_request &
> > -(ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1)) ||
> > -   tip_oids_contain(&tip_oids, unmatched, newlist,
> > -&ref->old_oid)) {
> > +   if (!strict || oidset_contains(&tip_oids, &ref->old_oid)) {
> > ref->match_status = REF_MATCHED;
> > *newtail = copy_ref(ref);
> > newtail = &(*newtail)->next;
> 
> I don't think the concerns are truly separated - the first loop quoted
> needs to know that in the second loop, tip_oids is accessed only if
> there is at least one unmatched ref. Would it be better to expose the
> size of the oidset and then use it in place of
> "tip_oids->map.map.tablesize"? Checking for initialization (using
> "tablesize") is conceptually different from checking the size, but in
> this code, both initialization and the first increase in size occur upon
> the first oidset_insert(), so we should still get the same result.

Yes, I agree with you that the loops are still entwined. They're at
least now in a single function, though, which IMHO is a slight
improvement.

I agree with you that just checking:

  if (oidset_count(&tip_oids) != 0)

would be fine, too. Or I am even OK with leaving the existing tablesize
check. It is a little intimate with the implementation details, but I
suspect that if oidset were to change (e.g., to initialize the buckets
immediately), the problem would be pretty apparent in the tests.

And in fact, we can test by just changing the conditional in
tip_oid_contains to if(0), which quite clearly fails t5500.60 (along
with others).

So I don't think it's the end of the world to leave it (but I also am
not opposed to the other options discussed).

-Peff


Re: [PATCH v3 2/5] fetch-pack: load tip_oids eagerly iff needed

2018-10-04 Thread René Scharfe
Am 04.10.2018 um 23:38 schrieb Jonathan Tan:
>> Determine if the oidset needs to be populated upfront and then do that
>> instead.  This duplicates a loop, but simplifies the existing one by
>> separating concerns between the two.
> 
> [snip]
> 
>> +if (strict) {
>> +for (i = 0; i < nr_sought; i++) {
>> +ref = sought[i];
>> +if (!is_unmatched_ref(ref))
>> +continue;
>> +
>> +add_refs_to_oidset(&tip_oids, unmatched);
>> +add_refs_to_oidset(&tip_oids, newlist);
>> +break;
>> +}
>> +}
>> +
>>  /* Append unmatched requests to the list */
>>  for (i = 0; i < nr_sought; i++) {
>>  ref = sought[i];
>>  if (!is_unmatched_ref(ref))
>>  continue;
>>  
>> -if ((allow_unadvertised_object_request &
>> - (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1)) ||
>> -tip_oids_contain(&tip_oids, unmatched, newlist,
>> - &ref->old_oid)) {
>> +if (!strict || oidset_contains(&tip_oids, &ref->old_oid)) {
>>  ref->match_status = REF_MATCHED;
>>  *newtail = copy_ref(ref);
>>  newtail = &(*newtail)->next;
> 
> I don't think the concerns are truly separated - the first loop quoted
> needs to know that in the second loop, tip_oids is accessed only if
> there is at least one unmatched ref.

Right, the two loops are still closely related, but only the first one
is concerned with loading refs.

For a true separation we could first build a list of unmatched refs and
then loop through that instead of `sought`.  Not sure if that's better,
but maybe the overhead I imagine it would introduce isn't all that big.

> Would it be better to expose the
> size of the oidset and then use it in place of
> "tip_oids->map.map.tablesize"? Checking for initialization (using
> "tablesize") is conceptually different from checking the size, but in
> this code, both initialization and the first increase in size occur upon
> the first oidset_insert(), so we should still get the same result.

It would work in practice.  If there are no refs to load then it would
try to load those zero refs for each unmatched ref, which shouldn't
really be a problem, but I still find it a wee bit sloppy.  Too
theoretical?

René


Re: [PATCH v3 2/5] fetch-pack: load tip_oids eagerly iff needed

2018-10-04 Thread Jeff King
On Thu, Oct 04, 2018 at 05:09:39PM +0200, René Scharfe wrote:

> tip_oids_contain() lazily loads refs into an oidset at its first call.
> It abuses the internal (sub)member .map.tablesize of that oidset to
> check if it has done that already.
> 
> Determine if the oidset needs to be populated upfront and then do that
> instead.  This duplicates a loop, but simplifies the existing one by
> separating concerns between the two.

I like this approach much better than what I showed earlier. But...

> diff --git a/fetch-pack.c b/fetch-pack.c
> index 3b317952f0..53914563b5 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -526,23 +526,6 @@ static void add_refs_to_oidset(struct oidset *oids, struct ref *refs)
>   oidset_insert(oids, &ref->old_oid);
>  }
>  
> -static int tip_oids_contain(struct oidset *tip_oids,
> - struct ref *unmatched, struct ref *newlist,
> - const struct object_id *id)
> -{
> - /*
> -  * Note that this only looks at the ref lists the first time it's
> -  * called. This works out in filter_refs() because even though it may
> -  * add to "newlist" between calls, the additions will always be for
> -  * oids that are already in the set.
> -  */

I don't think the subtle point this comment is making goes away. We're
still growing the list in the loop that calls tip_oids_contain() (and
which now calls just oidset_contains). That's OK for the reasons given
here, but I think that would need to be moved down to this code:

> + if (strict) {
> + for (i = 0; i < nr_sought; i++) {
> + ref = sought[i];
> + if (!is_unmatched_ref(ref))
> + continue;
> +
> + add_refs_to_oidset(&tip_oids, unmatched);
> + add_refs_to_oidset(&tip_oids, newlist);
> + break;
> + }
> + }

I.e., we need to say here why it's OK to summarize newlist in the
oidset, even though we're adding to it later.

-Peff


Re: [BUG] Error while trying to git apply a patch; works with patch -p1

2018-10-04 Thread SZEDER Gábor
On Thu, Oct 04, 2018 at 06:01:11PM -0300, Eneas Queiroz wrote:
> I've sent this to the list 2 days ago, but I can't find it in the list
> archives, so I'm sending it again without files attached.  I apologize
> if this is a duplicate. One should be able to reproduce this with the
> current PR files, but if not, I can provide them.
> 
> I've hit a strange error while trying to apply a patch from github
> here: https://github.com/openwrt/openwrt/pull/965
> 
> 965.patch:452: trailing whitespace.
> 
> 965.patch:559: space before tab in indent.
>  -o $(SHLIBNAME_FULL) \
> 965.patch:560: space before tab in indent.
>  $$ALLSYMSFLAGS $$SHOBJECTS $$NOALLSYMSFLAGS $$LIBDEPS; \
> 965.patch:564: space before tab in indent.
> -o $(SHLIBNAME_FULL) \
> 965.patch:2334: trailing whitespace.
> 
> error: package/libs/openssl/patches/100-Configure-afalg-support.patch:
> No such file or directory
> error: package/libs/openssl/patches/110-openwrt_targets.patch: No such
> file or directory
> error: package/libs/openssl/patches/120-fix_link_segfault.patch: No
> such file or directory
> error: package/libs/openssl/patches/1.1.0/100-Configure-afalg-support.patch:
> No such file or directory
> error: package/libs/openssl/patches/1.1.0/110-openwrt_targets.patch:
> No such file or directory
> 
> If you get the patch file from
> https://github.com/openwrt/openwrt/pull/965.patch and apply it with
> git apply, it fails.  If I apply the same file with patch -p1, it
> works fine.  I've tried it with git 2.16.4 and 2.19, and they both
> fail with the same error, and at least 2 more people have confirmed
> it.
> 
> git apply fails even when using git format-patch -13 --stdout as a
> source, so it is not github doing something weird.
> 
> The file is a series of 13 patches.  If I split the series after the

So this is not _a_ patch, then, but a mailbox of patches.  'git apply'
is supposed to apply a single patch; apparently 'patch' is more
lenient.

Have you tried 'git am'?

> 3rd patch, it works.
> Also, if I use https://github.com/openwrt/openwrt/pull/965.diff, it also 
> works.
> 
> I'm not subscribed to the list, so please CC me.
> 
> Cheers,
> 
> Eneas


[RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-04 Thread Ævar Arnfjörð Bjarmason


On Wed, Oct 03 2018, Ævar Arnfjörð Bjarmason wrote:

> Don't have time to patch this now, but thought I'd send a note / RFC
> about this.
>
> Now that we have the commit graph it's nice to be able to set
> e.g. core.commitGraph=true & gc.writeCommitGraph=true in ~/.gitconfig or
> /etc/gitconfig to apply them to all repos.
>
> But when I clone e.g. linux.git stuff like 'tag --contains' will be slow
> until whenever my first "gc" kicks in, which may be quite some time if
> I'm just using it passively.
>
> So we should make "git gc --auto" be run on clone, and change the
> need_to_gc() / cmd_gc() behavior so that we detect that the
> gc.writeCommitGraph=true setting is on, but we have no commit graph, and
> then just generate that without doing a full repack.
>
> As an aside such more granular "gc" would be nice for e.g. pack-refs
> too. It's possible for us to just have one pack, but to have 100k loose
> refs.
>
> It might also be good to have some gc.autoDetachOnClone option and have
> it false by default, so we don't have a race condition where "clone
> linux && git -C linux tag --contains" is slow because the graph hasn't
> been generated yet, and generating the graph initially doesn't take that
> long compared to the time to clone a large repo (and on a small one it
> won't matter either way).
>
> I was going to say "also for midx", but of course after clone we have
> just one pack, so I can't imagine us needing this. But I can see us
> having other such optional side-indexes in the future generated by gc,
> and they'd also benefit from this.

I don't have time to polish this up for submission now, but here's a WIP
patch that implements this, highlights:

 * There's a gc.clone.autoDetach=false default setting which overrides
   gc.autoDetach if 'git gc --auto' is run via git-clone (we just pass a
   --cloning option to indicate this).

 * A clone of say git.git with gc.writeCommitGraph=true looks like:

   [...]
   Receiving objects: 100% (255262/255262), 100.49 MiB | 17.78 MiB/s, done.
   Resolving deltas: 100% (188947/188947), done.
   Computing commit graph generation numbers: 100% (55210/55210), done.

 * The 'git gc --auto' command also knows to (only) run the commit-graph
   (and space is left for future optimization steps) if general GC isn't
   needed, but we need "optimization":

   $ rm .git/objects/info/commit-graph; ~/g/git/git --exec-path=$PWD -c gc.writeCommitGraph=true -c gc.autoDetach=false gc --auto;
   Annotating commits in commit graph: 341229, done.
   Computing commit graph generation numbers: 100% (165969/165969), done.
   $

 * The patch to gc.c looks less scary with -w, most of it is indenting
   the existing pack-refs etc. with a "!auto_gc || should_gc" condition.

 * I added a commit_graph_exists() function and only care if I
   get ENOENT for the purposes of this gc mode. This would need to be
   tweaked for the incremental mode Derrick talks about, but if we just
   set "should_optimize" that'll also work as far as gc --auto is
   concerned (e.g. on fetch, am etc.)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1546833213..5759fbb067 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1621,7 +1621,19 @@ gc.autoPackLimit::

 gc.autoDetach::
Make `git gc --auto` return immediately and run in background
-   if the system supports it. Default is true.
+   if the system supports it. Default is true. Overridden by
+   `gc.clone.autoDetach` when running linkgit:git-clone[1].
+
+gc.clone.autoDetach::
+   Make `git gc --auto` return immediately and run in background
+   if the system supports it when run via
+   linkgit:git-clone[1]. Default is false.
++
+The reason this defaults to false is that the only time we'll have
+work to do after a 'git clone' is if something like
+`gc.writeCommitGraph` is true. In that case we'd like to compute the
+optimized file before returning, so that commands that benefit from
+the commit graph aren't slow until it's generated in the background.

 gc.bigPackThreshold::
If non-zero, all packs larger than this limit are kept when
diff --git a/builtin/clone.c b/builtin/clone.c
index 15b142d646..824c130ba5 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -897,6 +897,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
struct remote *remote;
int err = 0, complete_refs_before_fetch = 1;
int submodule_progress;
+   const char *argv_gc_auto[]   = {"gc", "--auto", "--cloning", NULL};
+   const char *argv_gc_auto_quiet[] = {"gc", "--auto", "--cloning", "--quiet", NULL};

struct refspec rs = REFSPEC_INIT_FETCH;
struct argv_array ref_prefixes = ARGV_ARRAY_INIT;
@@ -1245,5 +1247,11 @@ int cmd_clone(int argc, const char **argv, const char *prefix)

refspec_clear();
	argv_array_clear(&ref_prefixes);
+
+   if (0 <= option_verbosity)
+   

Re: [PATCH v3 2/5] fetch-pack: load tip_oids eagerly iff needed

2018-10-04 Thread Jonathan Tan
> Determine if the oidset needs to be populated upfront and then do that
> instead.  This duplicates a loop, but simplifies the existing one by
> separating concerns between the two.

[snip]

> + if (strict) {
> + for (i = 0; i < nr_sought; i++) {
> + ref = sought[i];
> + if (!is_unmatched_ref(ref))
> + continue;
> +
> + add_refs_to_oidset(&tip_oids, unmatched);
> + add_refs_to_oidset(&tip_oids, newlist);
> + break;
> + }
> + }
> +
>   /* Append unmatched requests to the list */
>   for (i = 0; i < nr_sought; i++) {
>   ref = sought[i];
>   if (!is_unmatched_ref(ref))
>   continue;
>  
> - if ((allow_unadvertised_object_request &
> -  (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1)) ||
> - tip_oids_contain(&tip_oids, unmatched, newlist,
> -  &ref->old_oid)) {
> + if (!strict || oidset_contains(&tip_oids, &ref->old_oid)) {
>   ref->match_status = REF_MATCHED;
>   *newtail = copy_ref(ref);
>   newtail = &(*newtail)->next;

I don't think the concerns are truly separated - the first loop quoted
needs to know that in the second loop, tip_oids is accessed only if
there is at least one unmatched ref. Would it be better to expose the
size of the oidset and then use it in place of
"tip_oids->map.map.tablesize"? Checking for initialization (using
"tablesize") is conceptually different from checking the size, but in
this code, both initialization and the first increase in size occur upon
the first oidset_insert(), so we should still get the same result.

But if we do end up going with the approach in this patch, then this
patch is obviously correct.

I also looked at patch 1 of this patch set and it is obviously correct.
I didn't look at the other patches.


[BUG] Error while trying to git apply a patch; works with patch -p1

2018-10-04 Thread Eneas Queiroz
I've sent this to the list 2 days ago, but I can't find it in the list
archives, so I'm sending it again without files attached.  I apologize
if this is a duplicate. One should be able to reproduce this with the
current PR files, but if not, I can provide them.

I've hit a strange error while trying to apply a patch from github
here: https://github.com/openwrt/openwrt/pull/965

965.patch:452: trailing whitespace.

965.patch:559: space before tab in indent.
 -o $(SHLIBNAME_FULL) \
965.patch:560: space before tab in indent.
 $$ALLSYMSFLAGS $$SHOBJECTS $$NOALLSYMSFLAGS $$LIBDEPS; \
965.patch:564: space before tab in indent.
-o $(SHLIBNAME_FULL) \
965.patch:2334: trailing whitespace.

error: package/libs/openssl/patches/100-Configure-afalg-support.patch:
No such file or directory
error: package/libs/openssl/patches/110-openwrt_targets.patch: No such
file or directory
error: package/libs/openssl/patches/120-fix_link_segfault.patch: No
such file or directory
error: package/libs/openssl/patches/1.1.0/100-Configure-afalg-support.patch:
No such file or directory
error: package/libs/openssl/patches/1.1.0/110-openwrt_targets.patch:
No such file or directory

If you get the patch file from
https://github.com/openwrt/openwrt/pull/965.patch and apply it with
git apply, it fails.  If I apply the same file with patch -p1, it
works fine.  I've tried it with git 2.16.4 and 2.19, and they both
fail with the same error, and at least 2 more people have confirmed
it.

git apply fails even when using git format-patch -13 --stdout as a
source, so it is not github doing something weird.

The file is a series of 13 patches.  If I split the series after the
3rd patch, it works.
Also, if I use https://github.com/openwrt/openwrt/pull/965.diff, it also works.

I'm not subscribed to the list, so please CC me.

Cheers,

Eneas


Re: [PATCH v2 5/5] diff --color-moved: fix a memory leak

2018-10-04 Thread Stefan Beller
On Thu, Oct 4, 2018 at 3:07 AM Phillip Wood  wrote:
>
> From: Phillip Wood 
>
> Free the hashmap items as well as the hashmap itself. This was found
> with asan.
>
> Signed-off-by: Phillip Wood 

Thanks for this series, all 5 patches are

Reviewed-by: Stefan Beller 


Re: [PATCH] [Outreachy] git/userdiff.c fix regex pattern error

2018-10-04 Thread Johannes Schindelin
Hi Ananya,

On Thu, 4 Oct 2018, Ananya Krishna Maram wrote:

> On Thu, 4 Oct 2018 at 20:56, Johannes Schindelin
>  wrote:
> >
> > [... talking about the reason why a slash does not need to be escaped
> > in a C string specifying a regular expression...]
> >
> > But it does not need to be escaped, when you specify the regular
> > expression the way we do. And the way we specified it is really the
> > standard when specifying regular expressions in C code, i.e. *without* the
> > suggested backslash.
> 
> Aha! This makes total sense. I was thinking from a general regular expression
> point of view, but I should be thinking from a C point of view and how C
> might interpret this newly submitted string.
> This explanation is very clear. Thanks for taking the time to reply to my
> patch. From next time on, I will try to think from the
> git project's point of view.

Of course! Thank you for taking the time to contribute this patch.

Maybe you have another idea for a micro-project? Maybe there is something
in Git that you wish was more convenient? Or maybe
https://public-inbox.org/git/?q=leftoverbits has something that you would
like to implement?

Ciao,
Johannes

> 
> Thanks,
> Ananya.
> 
> > Ciao,
> > Johannes
> >
> > >
> > > Thanks,
> > > Ananya.
> > >
> > > > Thanks,
> > > > Johannes
> > > >
> > > > > ---
> > > > >  userdiff.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/userdiff.c b/userdiff.c
> > > > > index f565f6731..f4ff9b9e5 100644
> > > > > --- a/userdiff.c
> > > > > +++ b/userdiff.c
> > > > > @@ -123,7 +123,7 @@ PATTERNS("python", "^[ \t]*((class|def)[ \t].*)$",
> > > > >/* -- */
> > > > >"[a-zA-Z_][a-zA-Z0-9_]*"
> > > > >"|[-+0-9.e]+[jJlL]?|0[xX]?[0-9a-fA-F]+[lL]?"
> > > > > -  "|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?"),
> > > > > +  "|[-+*\/<>%&^|=!]=|\/\/=?|<<=?|>>=?|\\*\\*=?"),
> > > > >/* -- */
> > > > >  PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$",
> > > > >/* -- */
> > > > > --
> > > > > 2.17.1
> > > > >
> > > > >
> > >
> 


Re: Git Evolve

2018-10-04 Thread Stefan Xenos
Gerrit uses notes and branches of meta-commits internally for its
database, but it still uses the change-id footers to associate an
uploaded commit with a change within its database.

On Thu, Oct 4, 2018 at 9:05 AM, Jakub Narebski  wrote:
> Junio C Hamano  writes:
>> Stefan Xenos  writes:
>>
>>> What is the evolve command?
>>> ...
>>> - Systems like gerrit would no longer need to rely on "change-id" tags
>>> in commit comments to associate commits with the change that they
>>> edit, since git itself would have that information.
>>> ...
>>> Is anyone else interested in this? Please email me directly or on this
>>> list. Let's chat: I want to make sure that whatever we come up with is
>>> at least as good as any similar technology that has come before.
>>
>> As you listed in the related technologies section, I think the
>> underlying machinery that supports "rebase -i", especially with the
>> recent addition of redoing the existing merges (i.e. "rebase -i
>> -r"), may be enough to rewrite the histories that were built on top
>> of a commit that has been obsoleted by amending.
>>
>> I would imagine that the main design effort you would need to make
>> is to figure out a good way to
>>
>>  (1) keep track of which commits are obsoleted by which other ones
>>  [*1*], and
>>
>>  (2) to figure out what histories are still to be rebuilt in what
>>  order on top of what commit efficiently.
>>
>> Once these are done, you should be able to write out the sequence of
>> instructions to feed the same sequencer machinery used by the
>> "rebase -i" command.
>
> Well, that assumes that "rebase -i" can correctly recreate merges, if
> needed.
>
>> [Side note]
>>
>> *1* It is very desirable to keep track of the evolution of a change
>> without polluting the commit object with things like Change-Id:
>> and other cruft, either in the body or in the header.  If we
>> lose the Change-Id: footer without adding any new cruft in the
>> commit object header, that would be a great success.  It would
>> be a failure if we end up touching the object header.
>
> Doesn't Gerrit use git-notes instead of 'Change-Id:' trailer nowadays?
> Notes transport is quite easily controlled; the problem with notes merge
> does not matter for this use.
>
> Best,
> --
> Jakub Narębski


Re: [PATCH] [Outreachy] git/userdiff.c fix regex pattern error

2018-10-04 Thread Ananya Krishna Maram
On Thu, 4 Oct 2018 at 20:56, Johannes Schindelin
 wrote:
>
> Hi Ananya,
>
> On Thu, 4 Oct 2018, Ananya Krishna Maram wrote:
>
> > On Thu, 4 Oct 2018 at 19:56, Johannes Schindelin
> >  wrote:
> > >
> > > Hi Ananya,
> > >
> > > thank you for taking the time to write this patch!
> > >
> > > On Thu, 4 Oct 2018, Ananya Krishna Maram wrote:
> > >
> > > > the forward slash character should be escaped with backslash. Fix
> > > > Unescaped forward slash error in Python regex statements.
> > > >
> > > > Signed-off-by: Ananya Krishna Maram
> > >
> > > That explains pretty well what you did, but I wonder why the forward slash
> > > needs to be escaped? I would understand if we enclosed the pattern in
> > > `//`, as it is done e.g. in Javascript, but we do not...
> >
> > You are correct, the code would execute either way. But when I came across
> > this line, I didn't get its meaning instantly because, as per standards, a
> > forward slash has to be escaped. In fact, when open source code is written
> > according to standards, the code will be reachable to more people.
>
> I am afraid that I do not follow... Regular expressions have quite a few
> special characters, but forward slashes are not among them.
>
> Meaning: if we had specified the regular expression thusly (in any
> language that supports them to be specified in this way):
>
> /|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?/
>
> then I would agree that this is a bug, and needs to be fixed. But we
> specify it as a regular C string:
>
> "|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?"
>
> In this context, the backslash has an additional, nested meaning: it
> escapes special characters in a C string, too. So writing
>
> "\\"
>
> will actually result in a string consisting of a single backslash. And
>
> "\n"
>
> would specify a string consisting of a single line feed character. Some C
> compilers ignore incorrectly-escaped characters, i.e. "\/" would simply
> expand to the forward slash.
>
> However, you wanted to escape the forward slash in the regular expression.
> To do that, you would have to write
>
> "\\/"
>
> i.e. specifying, in a C string, a backslash character, followed by a
> forward slash character.
>
> However, the regular expression would then be interpreting the backslash
> character as escape character in its own right, seeing the forward slash,
> and replacing this sequence by a forward slash.
>
> But it does not need to be escaped, when you specify the regular
> expression the way we do. And the way we specified it is really the
> standard when specifying regular expressions in C code, i.e. *without* the
> suggested backslash.

Aha! This makes total sense. I was thinking from a general regular expression
point of view, but I should be thinking from a C point of view and how C
might interpret this newly submitted string.
This explanation is very clear. Thanks for taking the time to reply to my
patch. From next time on, I will try to think from the
git project's point of view.

Thanks,
Ananya.

> Ciao,
> Johannes
>
> >
> > Thanks,
> > Ananya.
> >
> > > Thanks,
> > > Johannes
> > >
> > > > ---
> > > >  userdiff.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/userdiff.c b/userdiff.c
> > > > index f565f6731..f4ff9b9e5 100644
> > > > --- a/userdiff.c
> > > > +++ b/userdiff.c
> > > > @@ -123,7 +123,7 @@ PATTERNS("python", "^[ \t]*((class|def)[ \t].*)$",
> > > >/* -- */
> > > >"[a-zA-Z_][a-zA-Z0-9_]*"
> > > >"|[-+0-9.e]+[jJlL]?|0[xX]?[0-9a-fA-F]+[lL]?"
> > > > -  "|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?"),
> > > > +  "|[-+*\/<>%&^|=!]=|\/\/=?|<<=?|>>=?|\\*\\*=?"),
> > > >/* -- */
> > > >  PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$",
> > > >/* -- */
> > > > --
> > > > 2.17.1
> > > >
> > > >
> >


Re: Git Evolve

2018-10-04 Thread Jakub Narebski
Junio C Hamano  writes:
> Stefan Xenos  writes:
>
>> What is the evolve command?
>> ...
>> - Systems like gerrit would no longer need to rely on "change-id" tags
>> in commit comments to associate commits with the change that they
>> edit, since git itself would have that information.
>> ...
>> Is anyone else interested in this? Please email me directly or on this
>> list. Let's chat: I want to make sure that whatever we come up with is
>> at least as good as any similar technology that has come before.
>
> As you listed in the related technologies section, I think the
> underlying machinery that supports "rebase -i", especially with the
> recent addition of redoing the existing merges (i.e. "rebase -i
> -r"), may be enough to rewrite the histories that were built on top
> of a commit that has been obsoleted by amending.
>
> I would imagine that the main design effort you would need to make
> is to figure out a good way to
>
>  (1) keep track of which commits are obsoleted by which other ones
>  [*1*], and
>
>  (2) to figure out what histories are still to be rebuilt in what
>  order on top of what commit efficiently.
>
> Once these are done, you should be able to write out the sequence of
> instructions to feed the same sequencer machinery used by the
> "rebase -i" command.

Well, that assumes that "rebase -i" can correctly recreate merges, if
needed.

> [Side note]
>
> *1* It is very desirable to keep track of the evolution of a change
> without polluting the commit object with things like Change-Id:
> and other cruft, either in the body or in the header.  If we
> lose the Change-Id: footer without adding any new cruft in the
> commit object header, that would be a great success.  It would
> be a failure if we end up touching the object header.

Doesn't Gerrit use git-notes instead of 'Change-Id:' trailer nowadays?
Notes transport is quite easily controlled; the problem with notes merge
does not matter for this use.

Best,
--
Jakub Narębski


Re: [PATCH] [Outreachy] git/userdiff.c fix regex pattern error

2018-10-04 Thread Johannes Schindelin
Hi Ananya,

On Thu, 4 Oct 2018, Ananya Krishna Maram wrote:

> On Thu, 4 Oct 2018 at 19:56, Johannes Schindelin
>  wrote:
> >
> > Hi Ananya,
> >
> > thank you for taking the time to write this patch!
> >
> > On Thu, 4 Oct 2018, Ananya Krishna Maram wrote:
> >
> > > the forward slash character should be escaped with backslash. Fix
> > > Unescaped forward slash error in Python regex statements.
> > >
> > > Signed-off-by: Ananya Krishna Maram
> >
> > That explains pretty well what you did, but I wonder why the forward slash
> > needs to be escaped? I would understand if we enclosed the pattern in
> > `//`, as it is done e.g. in Javascript, but we do not...
> 
> You are correct, the code would execute either way. But when I came across
> this line, I didn't get its meaning instantly because, as per standards, a
> forward slash has to be escaped. In fact, when open source code is written
> according to standards, the code will be reachable to more people.

I am afraid that I do not follow... Regular expressions have quite a few
special characters, but forward slashes are not among them.

Meaning: if we had specified the regular expression thusly (in any
language that supports them to be specified in this way):

/|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?/

then I would agree that this is a bug, and needs to be fixed. But we
specify it as a regular C string:

"|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?"

In this context, the backslash has an additional, nested meaning: it
escapes special characters in a C string, too. So writing

"\\"

will actually result in a string consisting of a single backslash. And

"\n"

would specify a string consisting of a single line feed character. Some C
compilers ignore incorrectly-escaped characters, i.e. "\/" would simply
expand to the forward slash.

However, you wanted to escape the forward slash in the regular expression.
To do that, you would have to write

"\\/"

i.e. specifying, in a C string, a backslash character, followed by a
forward slash character.

However, the regular expression would then be interpreting the backslash
character as escape character in its own right, seeing the forward slash,
and replacing this sequence by a forward slash.

But it does not need to be escaped, when you specify the regular
expression the way we do. And the way we specified it is really the
standard when specifying regular expressions in C code, i.e. *without* the
suggested backslash.

Ciao,
Johannes

> 
> Thanks,
> Ananya.
> 
> > Thanks,
> > Johannes
> >
> > > ---
> > >  userdiff.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/userdiff.c b/userdiff.c
> > > index f565f6731..f4ff9b9e5 100644
> > > --- a/userdiff.c
> > > +++ b/userdiff.c
> > > @@ -123,7 +123,7 @@ PATTERNS("python", "^[ \t]*((class|def)[ \t].*)$",
> > >/* -- */
> > >"[a-zA-Z_][a-zA-Z0-9_]*"
> > >"|[-+0-9.e]+[jJlL]?|0[xX]?[0-9a-fA-F]+[lL]?"
> > > -  "|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?"),
> > > +  "|[-+*\/<>%&^|=!]=|\/\/=?|<<=?|>>=?|\\*\\*=?"),
> > >/* -- */
> > >  PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$",
> > >/* -- */
> > > --
> > > 2.17.1
> > >
> > >
> 


[PATCH 5/5] oidset: uninline oidset_init()

2018-10-04 Thread René Scharfe
There is no need to inline oidset_init(), as it's typically only called
twice in the lifetime of an oidset (once at the beginning and at the end
by oidset_clear()) and kh_resize_* is quite big, so move its definition
to oidset.c.  Document it while we're at it.

Signed-off-by: Rene Scharfe 
---
 oidset.c |  7 +++
 oidset.h | 13 +++--
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/oidset.c b/oidset.c
index 9836d427ef..fe4eb921df 100644
--- a/oidset.c
+++ b/oidset.c
@@ -1,6 +1,13 @@
 #include "cache.h"
 #include "oidset.h"
 
+void oidset_init(struct oidset *set, size_t initial_size)
+{
+	memset(&set->set, 0, sizeof(set->set));
+	if (initial_size)
+		kh_resize_oid(&set->set, initial_size);
+}
+
 int oidset_contains(const struct oidset *set, const struct object_id *oid)
 {
 	khiter_t pos = kh_get_oid(&set->set, *oid);
diff --git a/oidset.h b/oidset.h
index 4b90540cd4..c9d0f6d3cc 100644
--- a/oidset.h
+++ b/oidset.h
@@ -38,12 +38,13 @@ struct oidset {
 #define OIDSET_INIT { { 0 } }
 
 
-static inline void oidset_init(struct oidset *set, size_t initial_size)
-{
-	memset(&set->set, 0, sizeof(set->set));
-	if (initial_size)
-		kh_resize_oid(&set->set, initial_size);
-}
+/**
+ * Initialize the oidset structure `set`.
+ *
+ * If `initial_size` is bigger than 0 then preallocate to allow inserting
+ * the specified number of elements without further allocations.
+ */
+void oidset_init(struct oidset *set, size_t initial_size);
 
 /**
  * Returns true iff `set` contains `oid`.
-- 
2.19.0


[PATCH v3 4/5] oidset: use khash

2018-10-04 Thread René Scharfe
Reimplement oidset using khash.h in order to reduce its memory footprint
and make it faster.

Performance of a command that mainly checks for duplicate objects using
an oidset, with master and Clang 6.0.1:

  $ cmd="./git-cat-file --batch-all-objects --unordered --buffer --batch-check='%(objectname)'"

  $ /usr/bin/time $cmd >/dev/null
  0.22user 0.03system 0:00.25elapsed 99%CPU (0avgtext+0avgdata 48484maxresident)k
  0inputs+0outputs (0major+11204minor)pagefaults 0swaps

  $ hyperfine "$cmd"
  Benchmark #1: ./git-cat-file --batch-all-objects --unordered --buffer --batch-check='%(objectname)'

    Time (mean ± σ):     250.0 ms ±   6.0 ms    [User: 225.9 ms, System: 23.6 ms]

    Range (min … max):   242.0 ms … 261.1 ms
And with this patch:

  $ /usr/bin/time $cmd >/dev/null
  0.14user 0.00system 0:00.15elapsed 100%CPU (0avgtext+0avgdata 41396maxresident)k
  0inputs+0outputs (0major+8318minor)pagefaults 0swaps

  $ hyperfine "$cmd"
  Benchmark #1: ./git-cat-file --batch-all-objects --unordered --buffer --batch-check='%(objectname)'

    Time (mean ± σ):     151.9 ms ±   4.9 ms    [User: 130.5 ms, System: 21.2 ms]

    Range (min … max):   148.2 ms … 170.4 ms

Initial-patch-by: Jeff King 
Signed-off-by: Rene Scharfe 
---
 oidset.c | 34 --
 oidset.h | 36 
 2 files changed, 40 insertions(+), 30 deletions(-)

diff --git a/oidset.c b/oidset.c
index 454c54f933..9836d427ef 100644
--- a/oidset.c
+++ b/oidset.c
@@ -3,38 +3,28 @@
 
 int oidset_contains(const struct oidset *set, const struct object_id *oid)
 {
-	if (!set->map.map.tablesize)
-		return 0;
-	return !!oidmap_get(&set->map, oid);
+	khiter_t pos = kh_get_oid(&set->set, *oid);
+	return pos != kh_end(&set->set);
 }
 
 int oidset_insert(struct oidset *set, const struct object_id *oid)
 {
-	struct oidmap_entry *entry;
-
-	if (!set->map.map.tablesize)
-		oidmap_init(&set->map, 0);
-	else if (oidset_contains(set, oid))
-		return 1;
-
-	entry = xmalloc(sizeof(*entry));
-	oidcpy(&entry->oid, oid);
-
-	oidmap_put(&set->map, entry);
-	return 0;
+	int added;
+	kh_put_oid(&set->set, *oid, &added);
+	return !added;
 }
 
 int oidset_remove(struct oidset *set, const struct object_id *oid)
 {
-	struct oidmap_entry *entry;
-
-	entry = oidmap_remove(&set->map, oid);
-	free(entry);
-
-	return (entry != NULL);
+	khiter_t pos = kh_get_oid(&set->set, *oid);
+	if (pos == kh_end(&set->set))
+		return 0;
+	kh_del_oid(&set->set, pos);
+	return 1;
 }
 
 void oidset_clear(struct oidset *set)
 {
-	oidmap_free(&set->map, 1);
+	kh_release_oid(&set->set);
+	oidset_init(set, 0);
 }
diff --git a/oidset.h b/oidset.h
index 40ec5f87fe..4b90540cd4 100644
--- a/oidset.h
+++ b/oidset.h
@@ -1,7 +1,8 @@
 #ifndef OIDSET_H
 #define OIDSET_H
 
-#include "oidmap.h"
+#include "hashmap.h"
+#include "khash.h"
 
 /**
  * This API is similar to sha1-array, in that it maintains a set of object ids
@@ -15,19 +16,33 @@
  *  table overhead.
  */
 
+static inline unsigned int oid_hash(struct object_id oid)
+{
+   return sha1hash(oid.hash);
+}
+
+static inline int oid_equal(struct object_id a, struct object_id b)
+{
+	return oideq(&a, &b);
+}
+
+KHASH_INIT(oid, struct object_id, int, 0, oid_hash, oid_equal)
+
 /**
  * A single oidset; should be zero-initialized (or use OIDSET_INIT).
  */
 struct oidset {
-   struct oidmap map;
+   kh_oid_t set;
 };
 
-#define OIDSET_INIT { OIDMAP_INIT }
+#define OIDSET_INIT { { 0 } }
 
 
 static inline void oidset_init(struct oidset *set, size_t initial_size)
 {
-	oidmap_init(&set->map, initial_size);
+	memset(&set->set, 0, sizeof(set->set));
+	if (initial_size)
+		kh_resize_oid(&set->set, initial_size);
 }
 
 /**
@@ -58,19 +73,24 @@ int oidset_remove(struct oidset *set, const struct object_id *oid);
 void oidset_clear(struct oidset *set);
 
 struct oidset_iter {
-   struct oidmap_iter m_iter;
+   kh_oid_t *set;
+   khiter_t iter;
 };
 
 static inline void oidset_iter_init(struct oidset *set,
struct oidset_iter *iter)
 {
-	oidmap_iter_init(&set->map, &iter->m_iter);
+	iter->set = &set->set;
+	iter->iter = kh_begin(iter->set);
 }
 
 static inline struct object_id *oidset_iter_next(struct oidset_iter *iter)
 {
-	struct oidmap_entry *e = oidmap_iter_next(&iter->m_iter);
-	return e ? &e->oid : NULL;
+	for (; iter->iter != kh_end(iter->set); iter->iter++) {
+		if (kh_exist(iter->set, iter->iter))
+			return &kh_key(iter->set, iter->iter++);
+   }
+   return NULL;
 }
 
 static inline struct object_id *oidset_iter_first(struct oidset *set,
-- 
2.19.0


[PATCH v3 3/5] khash: factor out kh_release_*

2018-10-04 Thread René Scharfe
Add a function for releasing the khash-internal allocations, but not the
khash structure itself.  It can be used with on-stack khash structs.

Signed-off-by: Rene Scharfe 
---
1 tab = 4 spaces here.

 khash.h | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/khash.h b/khash.h
index 07b4cc2e67..d10caa0c35 100644
--- a/khash.h
+++ b/khash.h
@@ -82,11 +82,16 @@ static const double __ac_HASH_UPPER = 0.77;
 	SCOPE kh_##name##_t *kh_init_##name(void) {				\
 		return (kh_##name##_t*)xcalloc(1, sizeof(kh_##name##_t));	\
 	}									\
+	SCOPE void kh_release_##name(kh_##name##_t *h)				\
+	{									\
+		free(h->flags);							\
+		free((void *)h->keys);						\
+		free((void *)h->vals);						\
+	}									\
 	SCOPE void kh_destroy_##name(kh_##name##_t *h)				\
 	{									\
 		if (h) {							\
-			free((void *)h->keys); free(h->flags);			\
-			free((void *)h->vals);					\
+			kh_release_##name(h);					\
 			free(h);						\
 		}								\
 	}									\
-- 
2.19.0


[PATCH v3 2/5] fetch-pack: load tip_oids eagerly iff needed

2018-10-04 Thread René Scharfe
tip_oids_contain() lazily loads refs into an oidset at its first call.
It abuses the internal (sub)member .map.tablesize of that oidset to
check if it has done that already.

Determine if the oidset needs to be populated upfront and then do that
instead.  This duplicates a loop, but simplifies the existing one by
separating concerns between the two.

Signed-off-by: Rene Scharfe 
---
 fetch-pack.c | 36 +++-
 1 file changed, 15 insertions(+), 21 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 3b317952f0..53914563b5 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -526,23 +526,6 @@ static void add_refs_to_oidset(struct oidset *oids, struct ref *refs)
 		oidset_insert(oids, &ref->old_oid);
 }
 
-static int tip_oids_contain(struct oidset *tip_oids,
-   struct ref *unmatched, struct ref *newlist,
-   const struct object_id *id)
-{
-   /*
-* Note that this only looks at the ref lists the first time it's
-* called. This works out in filter_refs() because even though it may
-* add to "newlist" between calls, the additions will always be for
-* oids that are already in the set.
-*/
-   if (!tip_oids->map.map.tablesize) {
-   add_refs_to_oidset(tip_oids, unmatched);
-   add_refs_to_oidset(tip_oids, newlist);
-   }
-   return oidset_contains(tip_oids, id);
-}
-
 static int is_unmatched_ref(const struct ref *ref)
 {
struct object_id oid;
@@ -563,6 +546,8 @@ static void filter_refs(struct fetch_pack_args *args,
struct ref *ref, *next;
struct oidset tip_oids = OIDSET_INIT;
int i;
+   int strict = !(allow_unadvertised_object_request &
+  (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1));
 
i = 0;
for (ref = *refs; ref; ref = next) {
@@ -599,16 +584,25 @@ static void filter_refs(struct fetch_pack_args *args,
}
}
 
+   if (strict) {
+   for (i = 0; i < nr_sought; i++) {
+   ref = sought[i];
+   if (!is_unmatched_ref(ref))
+   continue;
+
+			add_refs_to_oidset(&tip_oids, unmatched);
+			add_refs_to_oidset(&tip_oids, newlist);
+   break;
+   }
+   }
+
/* Append unmatched requests to the list */
for (i = 0; i < nr_sought; i++) {
ref = sought[i];
if (!is_unmatched_ref(ref))
continue;
 
-   if ((allow_unadvertised_object_request &
-(ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1)) ||
-   tip_oids_contain(&tip_oids, unmatched, newlist,
-   &ref->old_oid)) {
+   if (!strict || oidset_contains(&tip_oids, &ref->old_oid)) {
ref->match_status = REF_MATCHED;
*newtail = copy_ref(ref);
newtail = &(*newtail)->next;
-- 
2.19.0


[PATCH v3 1/5] fetch-pack: factor out is_unmatched_ref()

2018-10-04 Thread René Scharfe
Move the code to determine if a request is unmatched to its own little
helper.  This allows us to reuse it in a subsequent patch.

Signed-off-by: Rene Scharfe 
---
 fetch-pack.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 75047a4b2a..3b317952f0 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -543,6 +543,16 @@ static int tip_oids_contain(struct oidset *tip_oids,
return oidset_contains(tip_oids, id);
 }
 
+static int is_unmatched_ref(const struct ref *ref)
+{
+   struct object_id oid;
+   const char *p;
+   return  ref->match_status == REF_NOT_MATCHED &&
+   !parse_oid_hex(ref->name, &oid, &p) &&
+   *p == '\0' &&
+   oideq(&oid, &ref->old_oid);
+}
+
 static void filter_refs(struct fetch_pack_args *args,
struct ref **refs,
struct ref **sought, int nr_sought)
@@ -591,15 +601,8 @@ static void filter_refs(struct fetch_pack_args *args,
 
/* Append unmatched requests to the list */
for (i = 0; i < nr_sought; i++) {
-   struct object_id oid;
-   const char *p;
-
ref = sought[i];
-   if (ref->match_status != REF_NOT_MATCHED)
-   continue;
-   if (parse_oid_hex(ref->name, &oid, &p) ||
-   *p != '\0' ||
-   !oideq(&oid, &ref->old_oid))
+   if (!is_unmatched_ref(ref))
continue;
 
if ((allow_unadvertised_object_request &
-- 
2.19.0


Re: [PATCH v2 2/2] oidset: use khash

2018-10-04 Thread René Scharfe
Am 04.10.2018 um 08:48 schrieb Jeff King:
> On Thu, Oct 04, 2018 at 07:56:44AM +0200, René Scharfe wrote:
> 
>>> As the comment above notes, I think we're really looking at the case
>>> where this gets populated on the first call, but not subsequent ones. It
>>> might be less hacky to use a "static int initialized" here. Or if we
>>> want to avoid hidden globals, put the logic into filter_refs() to decide
>>> when to populate.
>>
>> Right.  I'd prefer the latter, but was unable to find a nice way that
>> still populates the oidset lazily.  It's certainly worth another look,
>> and a separate series.
> 
> It's a little awkward because the lazy load happens in a conditional.
> You can fully encapsulate it like the patch below, but I actually don't
> think it's really helping readability.

*Shudder*

>>> It might be nice if these functions could hide inside oidset.c (and just
>>> declare the struct here). It looks like we might be able to do that with
>>> __KHASH_TYPE(), but the double-underscore implies that we're not
>>> supposed to. ;)
>>>
>>> I guess we also use a few of them in our inlines here. I'm not 100% sure
>>> that oidset_* needs to be inlined either, but this is at least a pretty
>>> faithful conversion of the original.
>>
>> We could inline all of the oidset functions, following the spirit of
>> klib/khash.h.
>>
>> Or we could uninline all of them and then may be able to clean up
>> oidset.h by using KHASH_DECLARE.  Perhaps we'd need to guard with an
>> "#ifndef THIS_IS_OIDSET_C" or similar to avoid a clash with KHASH_INIT.
>>
>> Not sure if any of that would be a worthwhile improvement..
> 
> Unless we know something is a performance win to inline, I'd generally
> prefer not to.

Agreed.

> For a case like this with auto-generated functions, I'm mostly worried
> about bloating the compiled code. Either with a bunch of inlined
> non-trivial functions, or cases where the compiler says "this is too big
> to inline" and generates an anonymous file-scope function, but we end up
> with a bunch of duplicates, because we're generating the same functions
> in a bunch of C files.

The _iter_ functions look harmless in this regard, as they only use small
functions that are not even type-specific.

oidset_init() would be better moved to oidset.c, as the code for resizing
is quite big.

René


Re: [PATCH] [Outreachy] git/userdiff.c fix regex pattern error

2018-10-04 Thread Ananya Krishna Maram
On Thu, 4 Oct 2018 at 19:56, Johannes Schindelin
 wrote:
>
> Hi Ananya,
>
> thank you for taking the time to write this patch!
>
> On Thu, 4 Oct 2018, Ananya Krishna Maram wrote:
>
> > The forward slash character should be escaped with a backslash. Fix
> > the unescaped forward slash error in the Python regex patterns.
> >
> > Signed-off-by: Ananya Krishna Maram
>
> That explains pretty well what you did, but I wonder why the forward slash
> needs to be escaped? I would understand if we enclosed the pattern in
> `//`, as it is done e.g. in Javascript, but we do not...

You are correct, the code would execute either way. But when I came across
this line, I didn't get its meaning instantly because, as per the standards
I am used to, the forward slash has to be escaped. In fact, when open source
code is written according to standards, it is accessible to more people.

Thanks,
Ananya.

> Thanks,
> Johannes
>
> > ---
> >  userdiff.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/userdiff.c b/userdiff.c
> > index f565f6731..f4ff9b9e5 100644
> > --- a/userdiff.c
> > +++ b/userdiff.c
> > @@ -123,7 +123,7 @@ PATTERNS("python", "^[ \t]*((class|def)[ \t].*)$",
> >/* -- */
> >"[a-zA-Z_][a-zA-Z0-9_]*"
> >"|[-+0-9.e]+[jJlL]?|0[xX]?[0-9a-fA-F]+[lL]?"
> > -  "|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?"),
> > +  "|[-+*\/<>%&^|=!]=|\/\/=?|<<=?|>>=?|\\*\\*=?"),
> >/* -- */
> >  PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$",
> >/* -- */
> > --
> > 2.17.1
> >
> >


[PATCH v3 0/5] oidset: use khash

2018-10-04 Thread René Scharfe
Two new patches to avoid using oidset internals in fetch-pack:

  fetch-pack: factor out is_unmatched_ref()
  fetch-pack: load tip_oids eagerly iff needed

Unchanged patch:

  khash: factor out kh_release_*

Unchanged, except it doesn't touch fetch-pack anymore:

  oidset: use khash

A new patch, to reduce object text size:

  oidset: uninline oidset_init()

 fetch-pack.c | 49 +++--
 khash.h  |  9 +++--
 oidset.c | 41 +++--
 oidset.h | 43 ---
 4 files changed, 81 insertions(+), 61 deletions(-)

-- 
2.19.0



Re: [PATCH] [Outreachy] git/userdiff.c fix regex pattern error

2018-10-04 Thread Johannes Schindelin
Hi Ananya,

thank you for taking the time to write this patch!

On Thu, 4 Oct 2018, Ananya Krishna Maram wrote:

> The forward slash character should be escaped with a backslash. Fix
> the unescaped forward slash error in the Python regex patterns.
> 
> Signed-off-by: Ananya Krishna Maram

That explains pretty well what you did, but I wonder why the forward slash
needs to be escaped? I would understand if we enclosed the pattern in
`//`, as it is done e.g. in Javascript, but we do not...

Thanks,
Johannes

> ---
>  userdiff.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/userdiff.c b/userdiff.c
> index f565f6731..f4ff9b9e5 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -123,7 +123,7 @@ PATTERNS("python", "^[ \t]*((class|def)[ \t].*)$",
>/* -- */
>"[a-zA-Z_][a-zA-Z0-9_]*"
>"|[-+0-9.e]+[jJlL]?|0[xX]?[0-9a-fA-F]+[lL]?"
> -  "|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?"),
> +  "|[-+*\/<>%&^|=!]=|\/\/=?|<<=?|>>=?|\\*\\*=?"),
>/* -- */
>  PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$",
>/* -- */
> -- 
> 2.17.1
> 
> 


Re: inside the git folder

2018-10-04 Thread Stephen Smith
Chris -

You may want to look at "git bundle" to transfer the repository contents.   
Then the recipient could fetch from the bundle to get the source git history.

Just a thought.

sps

On Thursday, October 4, 2018 4:03:27 AM MST Johannes Schindelin wrote:
> Hi Chris,
> 
> as mentioned by Stefan (who is a respected, active core Git contributor,
> if you need any more arguments to listen to him), it is inappropriate to
> copy the contents of the .git/ directory wholesale to another user's
> machine.
> 
> For one, it would horribly break in case the user overrode `user.email` in
> `.git/config`. That's one setting that should not be copied *anywhere*.
> And that's just one, there are *plenty* more examples. Just think of
> absolute paths referring to files that probably do not even exist on
> another machine! Like, worktrees, etc.
> 
> Of course, you could start a list of exceptions (files, config keys, etc)
> that should not be copied. But that's a very fragile solution.
> 
> So no, copying the .git/ directory is always the wrong thing to do, as
> Stefan pointed out.
> 
> I could imagine that a much better idea is to identify a *positive* list
> of things you want to copy over. The output of `git rev-parse
> --symbolic-full-name HEAD`? Sure. Maybe even the output of `git rev-parse
> --symbolic-full-name HEAD@{u}`? And then the URL of the corresponding
> remote? Sure. `.git/objects/alternates/`? Absolutely not.
> 
> It is tedious, alright, but you simply cannot copy the contents of .git/
> to another machine and expect that to work.
> 
> Ciao,
> Johannes
> 
> On Thu, 4 Oct 2018, Chris Jeschke wrote:
> > Hi Stefan,
> > 
> > thanks for your answer.
> > 
> > The Goal after sending the files is to have a copy on the remote site.
> > This includes that the working directory is the same (what we already
> > guarantee with our tool) and that git is at the same 'state' (that
> > means that we have the same history and that we checkout at the same
> > branch/commit).
> > My idea:
> > Send the working directory with our  tool
> > Initialize a Git directory on the remote side
> > Send the 'objects','refs', 'HEAD' and the 'gitignore' with our tool
> > 
> > Is there anything else I should take care of?
> > 
> > On Wed, 3 Oct 2018 at 20:51, Stefan Beller wrote:
> > > On Wed, Oct 3, 2018 at 5:26 AM Chris Jeschke
> > > 
> > >  wrote:
> > > > Hey git-team,
> > > > I am working on a plug-in for a distributed pair programming tool. To
> > > > skip the details: I was thinking about sending parts of the git folder
> > > > as a zip folder with our own Bytestream instead of using the git API.
> > > > Is there a common sense about what should and what shouldn't be done
> > > > when working with the files inside the git folder?
> > > 
> > > This contradicts the security model of git.
> > > 
> > > Locally I can do things like:
> > > git config alias.co "rm -rf ~"
> > > echo "rm -rf ~" >.git/hooks/{...}
> > > 
> > > and I would experience bad things, but that is ok,
> > > as I configured it locally (supposedly I know what
> > > I am doing); but if I have the ability to send these
> > > tricks to my beloved coworkers, hilarity might ensue.
> > > 
> > > What stuff do you need to send around?
> > > 
> > > objects? Fine, as the receive could check they are
> > > good using fsck.
> > > 
> > > refs/ ? Sure. It may be confusing to users,
> > > but I am sure you'll figure UX out.
> > > 
> > > local config, hooks ? I would not.
> > > 
> > > Not sure what else you'd think of sending around.
> > > 
> > > Cheers,
> > > Stefan






[PATCH] [Outreachy] git/userdiff.c fix regex pattern error

2018-10-04 Thread Ananya Krishna Maram
The forward slash character should be escaped with a backslash. Fix
the unescaped forward slash error in the Python regex patterns.

Signed-off-by: Ananya Krishna Maram
---
 userdiff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/userdiff.c b/userdiff.c
index f565f6731..f4ff9b9e5 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -123,7 +123,7 @@ PATTERNS("python", "^[ \t]*((class|def)[ \t].*)$",
 /* -- */
 "[a-zA-Z_][a-zA-Z0-9_]*"
 "|[-+0-9.e]+[jJlL]?|0[xX]?[0-9a-fA-F]+[lL]?"
-"|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?"),
+"|[-+*\/<>%&^|=!]=|\/\/=?|<<=?|>>=?|\\*\\*=?"),
 /* -- */
 PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$",
 /* -- */
-- 
2.17.1



Re: inside the git folder

2018-10-04 Thread Johannes Schindelin
Hi Chris,

as mentioned by Stefan (who is a respected, active core Git contributor,
if you need any more arguments to listen to him), it is inappropriate to
copy the contents of the .git/ directory wholesale to another user's
machine.

For one, it would horribly break in case the user overrode `user.email` in
`.git/config`. That's one setting that should not be copied *anywhere*.
And that's just one, there are *plenty* more examples. Just think of
absolute paths referring to files that probably do not even exist on
another machine! Like, worktrees, etc.

Of course, you could start a list of exceptions (files, config keys, etc)
that should not be copied. But that's a very fragile solution.

So no, copying the .git/ directory is always the wrong thing to do, as
Stefan pointed out.

I could imagine that a much better idea is to identify a *positive* list
of things you want to copy over. The output of `git rev-parse
--symbolic-full-name HEAD`? Sure. Maybe even the output of `git rev-parse
--symbolic-full-name HEAD@{u}`? And then the URL of the corresponding
remote? Sure. `.git/objects/alternates/`? Absolutely not.

It is tedious, alright, but you simply cannot copy the contents of .git/
to another machine and expect that to work.

Ciao,
Johannes

On Thu, 4 Oct 2018, Chris Jeschke wrote:

> Hi Stefan,
> 
> thanks for your answer.
> 
> The Goal after sending the files is to have a copy on the remote site.
> This includes that the working directory is the same (what we already
> guarantee with our tool) and that git is at the same 'state' (that
> means that we have the same history and that we checkout at the same
> branch/commit).
> My idea:
> Send the working directory with our  tool
> Initialize a Git directory on the remote side
> Send the 'objects','refs', 'HEAD' and the 'gitignore' with our tool
> 
> Is there anything else I should take care of?
> 
> On Wed, 3 Oct 2018 at 20:51, Stefan Beller wrote:
> >
> > On Wed, Oct 3, 2018 at 5:26 AM Chris Jeschke
> >  wrote:
> > >
> > > Hey git-team,
> > > I am working on a plug-in for a distributed pair programming tool. To
> > > skip the details: I was thinking about sending parts of the git folder
> > > as a zip folder with our own Bytestream instead of using the git API.
> > > Is there a common sense about what should and what shouldn't be done
> > > when working with the files inside the git folder?
> >
> > This contradicts the security model of git.
> > Locally I can do things like:
> > git config alias.co "rm -rf ~"
> > echo "rm -rf ~" >.git/hooks/{...}
> > and I would experience bad things, but that is ok,
> > as I configured it locally (supposedly I know what
> > I am doing); but if I have the ability to send these
> > tricks to my beloved coworkers, hilarity might ensue.
> >
> > What stuff do you need to send around?
> >
> > objects? Fine, as the receive could check they are
> > good using fsck.
> >
> > refs/ ? Sure. It may be confusing to users,
> > but I am sure you'll figure UX out.
> >
> > local config, hooks ? I would not.
> >
> > Not sure what else you'd think of sending around.
> >
> > Cheers,
> > Stefan
> 


[PATCH v2 4/5] diff --color-moved-ws: fix another memory leak

2018-10-04 Thread Phillip Wood
From: Phillip Wood 

This is obvious in retrospect; it was found with asan.

Signed-off-by: Phillip Wood 
---
 diff.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/diff.c b/diff.c
index 9bf41bad0d..69f6309db6 100644
--- a/diff.c
+++ b/diff.c
@@ -969,6 +969,8 @@ static void pmb_advance_or_null_multi_match(struct diff_options *o,
moved_block_clear(&pmb[i]);
}
}
+
+   free(got_match);
 }
 
 static int shrink_potential_moved_blocks(struct moved_block *pmb,
-- 
2.19.0



[PATCH v2 5/5] diff --color-moved: fix a memory leak

2018-10-04 Thread Phillip Wood
From: Phillip Wood 

Free the hashmap items as well as the hashmap itself. This was found
with asan.

Signed-off-by: Phillip Wood 
---
 diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 69f6309db6..100d24b9c4 100644
--- a/diff.c
+++ b/diff.c
@@ -5768,8 +5768,8 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
if (o->color_moved == COLOR_MOVED_ZEBRA_DIM)
dim_moved_lines(o);
 
-   hashmap_free(&add_lines, 0);
-   hashmap_free(&del_lines, 0);
+   hashmap_free(&add_lines, 1);
+   hashmap_free(&del_lines, 1);
}
 
for (i = 0; i < esm.nr; i++)
-- 
2.19.0



[PATCH v2 1/5] diff --color-moved-ws: fix double free crash

2018-10-04 Thread Phillip Wood
From: Phillip Wood 

Running
  git diff --color-moved-ws=allow-indentation-change v2.18.0 v2.19.0
results in a crash due to a double free. This happens when two
potential moved blocks start with consecutive lines. As
pmb_advance_or_null_multi_match() advances it copies the ws_delta from
the last matching line to the next. When the first of our consecutive
lines is advanced, its ws_delta will be copied to the second,
overwriting the ws_delta of the block containing the second line. Then
when the second line is advanced it will copy the new ws_delta to the
line below it and so on. Eventually one of these blocks will stop
matching and the ws_delta will be freed. From then on the other block
is in a use-after-free state and when it stops matching it will try to
free the ws_delta that has already been freed by the other block.

The solution is to store the ws_delta in the array of potential moved
blocks rather than with the lines. This means that it no longer needs
to be copied around and one block cannot overwrite the ws_delta of
another. Additionally it saves some malloc/free calls as we don't keep
allocating and freeing ws_deltas.

Signed-off-by: Phillip Wood 
---

Notes:
This applies on top of fab01ec52e ("diff: fix
--color-moved-ws=allow-indentation-change", 2018-09-04). Note that the
crash happens with or without that patch, but the mechanism is
slightly different without it.

Changes since v1:
 - updated comments to match new implementation.

 diff.c | 82 --
 1 file changed, 45 insertions(+), 37 deletions(-)

diff --git a/diff.c b/diff.c
index 9393993e33..02d885f039 100644
--- a/diff.c
+++ b/diff.c
@@ -751,7 +751,6 @@ struct moved_entry {
struct hashmap_entry ent;
const struct emitted_diff_symbol *es;
struct moved_entry *next_line;
-   struct ws_delta *wsd;
 };
 
 /**
@@ -768,6 +767,17 @@ struct ws_delta {
 };
 #define WS_DELTA_INIT { NULL, 0 }
 
+struct moved_block {
+   struct moved_entry *match;
+   struct ws_delta wsd;
+};
+
+static void moved_block_clear(struct moved_block *b)
+{
+   FREE_AND_NULL(b->wsd.string);
+   b->match = NULL;
+}
+
 static int compute_ws_delta(const struct emitted_diff_symbol *a,
 const struct emitted_diff_symbol *b,
 struct ws_delta *out)
@@ -785,7 +795,7 @@ static int compute_ws_delta(const struct emitted_diff_symbol *a,
 static int cmp_in_block_with_wsd(const struct diff_options *o,
 const struct moved_entry *cur,
 const struct moved_entry *match,
-struct moved_entry *pmb,
+struct moved_block *pmb,
 int n)
 {
struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
@@ -805,25 +815,24 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
if (strcmp(a, b))
return 1;
 
-   if (!pmb->wsd)
+   if (!pmb->wsd.string)
/*
-* No white space delta was carried forward? This can happen
-* when we exit early in this function and do not carry
-* forward ws.
+* The white space delta is not active? This can happen
+* when we exit early in this function.
 */
return 1;
 
/*
-* The indent changes of the block are known and carried forward in
+* The indent changes of the block are known and stored in
 * pmb->wsd; however we need to check if the indent changes of the
 * current line are still the same as before.
 *
 * To do so we need to compare 'l' to 'cur', adjusting the
 * one of them for the white spaces, depending which was longer.
 */
 
-   wslen = strlen(pmb->wsd->string);
-   if (pmb->wsd->current_longer) {
+   wslen = strlen(pmb->wsd.string);
+   if (pmb->wsd.current_longer) {
c += wslen;
cl -= wslen;
} else {
@@ -873,7 +882,6 @@ static struct moved_entry *prepare_entry(struct diff_options *o,
ret->ent.hash = xdiff_hash_string(l->line, l->len, flags);
ret->es = l;
ret->next_line = NULL;
-   ret->wsd = NULL;
 
return ret;
 }
@@ -913,76 +921,72 @@ static void add_lines_to_move_detection(struct diff_options *o,
 static void pmb_advance_or_null(struct diff_options *o,
struct moved_entry *match,
struct hashmap *hm,
-   struct moved_entry **pmb,
+   struct moved_block *pmb,
int pmb_nr)
 {
int i;
for (i = 0; i < pmb_nr; i++) {
-   struct moved_entry *prev = pmb[i];
+   struct moved_entry *prev = pmb[i].match;

[PATCH v2 0/5] diff --color-moved-ws: fix double free crash

2018-10-04 Thread Phillip Wood
From: Phillip Wood 

Thanks to Stefan and Martin for their comments on v1. This version has
some small changes to patches 1-3 based on that feedback - see the
individual patches for what has changed.

Phillip Wood (5):
  diff --color-moved-ws: fix double free crash
  diff --color-moved-ws: fix out of bounds string access
  diff --color-moved-ws: fix a memory leak
  diff --color-moved-ws: fix another memory leak
  diff --color-moved: fix a memory leak

 diff.c | 95 +-
 1 file changed, 54 insertions(+), 41 deletions(-)

-- 
2.19.0



[PATCH v2 2/5] diff --color-moved-ws: fix out of bounds string access

2018-10-04 Thread Phillip Wood
From: Phillip Wood 

When adjusting the start of the string to take account of the change
in indentation the code was not checking that the string being
adjusted was in fact longer than the indentation change. This was
detected by asan.

Signed-off-by: Phillip Wood 
---

Notes:
Changes since v1:
 - simplified comparison as suggested by Stefan

 diff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 02d885f039..e492f8b74f 100644
--- a/diff.c
+++ b/diff.c
@@ -840,7 +840,7 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
al -= wslen;
}
 
-   if (strcmp(a, c))
+   if (al != cl || memcmp(a, c, al))
return 1;
 
return 0;
-- 
2.19.0



[PATCH v2 3/5] diff --color-moved-ws: fix a memory leak

2018-10-04 Thread Phillip Wood
From: Phillip Wood 

Don't duplicate the indentation string if we're not going to use it.
This was found with asan.

Signed-off-by: Phillip Wood 
---

Notes:
Changes since v1:
 - simplified code as suggested by Martin Ågren

 diff.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index e492f8b74f..9bf41bad0d 100644
--- a/diff.c
+++ b/diff.c
@@ -786,10 +786,13 @@ static int compute_ws_delta(const struct emitted_diff_symbol *a,
const struct emitted_diff_symbol *shorter = a->len > b->len ? b : a;
int d = longer->len - shorter->len;
 
+   if (strncmp(longer->line + d, shorter->line, shorter->len))
+   return 0;
+
out->string = xmemdupz(longer->line, d);
out->current_longer = (a == longer);
 
-   return !strncmp(longer->line + d, shorter->line, shorter->len);
+   return 1;
 }
 
 static int cmp_in_block_with_wsd(const struct diff_options *o,
-- 
2.19.0



Re: inside the git folder

2018-10-04 Thread Chris Jeschke
Hi Stefan,

thanks for your answer.

The goal after sending the files is to have a copy on the remote side.
This includes that the working directory is the same (which we already
guarantee with our tool) and that git is in the same 'state' (that
means that we have the same history and that we have checked out the same
branch/commit).
My idea:
Send the working directory with our  tool
Initialize a Git directory on the remote side
Send the 'objects','refs', 'HEAD' and the 'gitignore' with our tool

Is there anything else I should take care of?

On Wed, 3 Oct 2018 at 20:51, Stefan Beller wrote:
>
> On Wed, Oct 3, 2018 at 5:26 AM Chris Jeschke
>  wrote:
> >
> > Hey git-team,
> > I am working on a plug-in for a distributed pair programming tool. To
> > skip the details: I was thinking about sending parts of the git folder
> > as a zip folder with our own Bytestream instead of using the git API.
> > Is there a common sense about what should and what shouldn't be done
> > when working with the files inside the git folder?
>
> This contradicts the security model of git.
> Locally I can do things like:
> git config alias.co "rm -rf ~"
> echo "rm -rf ~" >.git/hooks/{...}
> and I would experience bad things, but that is ok,
> as I configured it locally (supposedly I know what
> I am doing); but if I have the ability to send these
> tricks to my beloved coworkers, hilarity might ensue.
>
> What stuff do you need to send around?
>
> objects? Fine, as the receive could check they are
> good using fsck.
>
> refs/ ? Sure. It may be confusing to users,
> but I am sure you'll figure UX out.
>
> local config, hooks ? I would not.
>
> Not sure what else you'd think of sending around.
>
> Cheers,
> Stefan


Re: git fetch behaves weirdely when run in a worktree

2018-10-04 Thread Duy Nguyen
On Thu, Oct 4, 2018 at 8:55 AM Kaartic Sivaraam
 wrote:
>
>
>
> On 3 October 2018 00:13:06 GMT+05:30, Kaartic Sivaraam 
>  wrote:
> >Hi,
> >
> >Sorry for the delay. Got a little busy over the weekend. I seem to have
> >found the reason behind the issue in the mean time :-)
> >
>
> Oops, I forgot to mention there's more comments inline!
>
> BTW, is there an issue if .git/HEAD and .git/index are owned by root? The 
> owners seem to have changed since I created the worktree possibly due to the 
> cron job. Just wondering if it might cause some issues.

If the files are still readable by normal user and $GIT_DIR has
permission to delete/rename them, then I think everything still works.
-- 
Duy


Re: git fetch behaves weirdely when run in a worktree

2018-10-04 Thread Kaartic Sivaraam



On 3 October 2018 00:13:06 GMT+05:30, Kaartic Sivaraam 
 wrote:
>Hi,
>
>Sorry for the delay. Got a little busy over the weekend. I seem to have
>found the reason behind the issue in the mean time :-)
>

Oops, I forgot to mention there's more comments inline!

BTW, is there an issue if .git/HEAD and .git/index are owned by root? The 
owners seem to have changed since I created the worktree possibly due to the 
cron job. Just wondering if it might cause some issues.


>On Wed, 2018-09-26 at 10:05 -0700, Junio C Hamano wrote:
>> Duy Nguyen  writes:
>> 
>> > On Wed, Sep 26, 2018 at 05:24:14PM +0200, Duy Nguyen wrote:
>> > > On Wed, Sep 26, 2018 at 6:46 AM Kaartic Sivaraam
>> > >  wrote:
>> > > > This is the most interesting part of the issue. I **did not**
>run
>> > > > 'git fetch ...' in between those cat commands. I was surprised
>by
>> > > > how the contents of FETCH_HEAD are changing without me spawning
>any
>> > > > git processes that might change it. Am I missing something
>here? As
>> > > > far as i could remember, there wasn't any IDE running that
>might
>> > > > automatically spawn git commands especially in that work
>> > > > tree. Weird.
>> > 
>> > Maybe something like this could help track down that rogue "git
>fetch"
>> > process (it's definitely _some_ process writing to the wrong file;
>or
>> > some file synchronization thingy is going on). You can log more of
>> > course, but this is where FETCH_HEAD is updated.
>> 
>
>Thanks for the patch! It really helped me identify the issue.
>
>The actual culprit here doesn't seem to be Git itself. It was the "git-
>prompt" bash plug-in[1] I was using. It seems to be spawning "git
>fetch" for every command I type on shell. That answers why the
>FETCH_HEAD was being updated even though I wasn't explicitly running
>it. The git bash plug-in seems to be fetching changes for *all* the
>upstream branches. That answers why the FETCH_HEAD was populated with
>info about all the branches when I explicitly requested for the next
>branch. I've also verified that `git fetch origin next` works as
>intended (FETCH_HEAD has info only about orgin/next) when I turn-off
>the plug-in which confirms that it's the culprit. A cursory search
>found me a related issue[2] but I'm not sure if it's the exact same
>one.
>
>I could identify the culprit only with the help of Duy's patch. Thanks
>Duy! Sorry for not realising this earlier. I almost forgot I'm using it
>as I've been accustomed to it a lot.
>
>
>> Well, a background-ish thing could be some vendor-provided copy of
>> Git that has nothing to do with what Kaartic would be compiling with
>> this patch X-<.
>
>Fortunately, it wasn't the case here as the plug-in was using my
>manually-built version of Git :-)
>
>Thanks for the help!
>
>Tag-line: Sometimes tools become part of our workflow so much that we
>really don't realise their presence. It's an indication of a good tool
>but we should be aware of suspecting them when an issue arises!
>Something which I should have done to realise the issue earlier x-<
>
>
>References:
>[1]: https://github.com/magicmonty/bash-git-prompt
>[2]: https://github.com/magicmonty/bash-git-prompt/issues/125
>
>Thanks,
>Sivaraam

-- 
Sivaraam

Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH v2 2/2] oidset: use khash

2018-10-04 Thread Jeff King
On Thu, Oct 04, 2018 at 02:48:33AM -0400, Jeff King wrote:

> On Thu, Oct 04, 2018 at 07:56:44AM +0200, René Scharfe wrote:
> 
> > > As the comment above notes, I think we're really looking at the case
> > > where this gets populated on the first call, but not subsequent ones. It
> > > might be less hacky to use a "static int initialized" here. Or if we
> > > want to avoid hidden globals, put the logic into filter_refs() to decide
> > > when to populate.
> > 
> > Right.  I'd prefer the latter, but was unable to find a nice way that
> > still populates the oidset lazily.  It's certainly worth another look,
> > and a separate series.
> 
> It's a little awkward because the lazy load happens in a conditional.
> You can fully encapsulate it like the patch below, but I actually don't
> think it's really helping readability.

I forgot the patch, of course. ;)

I'm not really proposing this, just illustrating one direction (that I
think is kind of ugly). Notably it doesn't get rid of the tricky comment
in tip_oids_contain(), because that is explaining why the single load
works even on a list we're still adding to.

diff --git a/fetch-pack.c b/fetch-pack.c
index a839315726..a6212c8758 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -526,8 +526,14 @@ static void add_refs_to_oidset(struct oidset *oids, struct ref *refs)
oidset_insert(oids, &refs->old_oid);
 }
 
-static int tip_oids_contain(struct oidset *tip_oids,
-   struct ref *unmatched, struct ref *newlist,
+struct lazy_tip_oids {
+   int loaded;
+   struct oidset oids;
+   struct ref *unmatched;
+   struct ref *newlist;
+};
+
+static int tip_oids_contain(struct lazy_tip_oids *tip_oids,
const struct object_id *id)
 {
/*
@@ -536,11 +542,12 @@ static int tip_oids_contain(struct oidset *tip_oids,
 * add to "newlist" between calls, the additions will always be for
 * oids that are already in the set.
 */
-   if (!tip_oids->set.n_buckets) {
-   add_refs_to_oidset(tip_oids, unmatched);
-   add_refs_to_oidset(tip_oids, newlist);
+   if (!tip_oids->loaded) {
+   add_refs_to_oidset(&tip_oids->oids, tip_oids->unmatched);
+   add_refs_to_oidset(&tip_oids->oids, tip_oids->newlist);
+   tip_oids->loaded = 1;
}
-   return oidset_contains(tip_oids, id);
+   return oidset_contains(&tip_oids->oids, id);
 }
 
 static void filter_refs(struct fetch_pack_args *args,
@@ -551,7 +558,7 @@ static void filter_refs(struct fetch_pack_args *args,
struct ref **newtail = &newlist;
struct ref *unmatched = NULL;
struct ref *ref, *next;
-   struct oidset tip_oids = OIDSET_INIT;
+   struct lazy_tip_oids tip_oids = { 0 };
int i;
 
i = 0;
@@ -589,6 +596,9 @@ static void filter_refs(struct fetch_pack_args *args,
}
}
 
+   tip_oids.unmatched = unmatched;
+   tip_oids.newlist = newlist;
+
/* Append unmatched requests to the list */
for (i = 0; i < nr_sought; i++) {
struct object_id oid;
@@ -604,8 +614,7 @@ static void filter_refs(struct fetch_pack_args *args,
 
if ((allow_unadvertised_object_request &
 (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1)) ||
-   tip_oids_contain(&tip_oids, unmatched, newlist,
-			&ref->old_oid)) {
+   tip_oids_contain(&tip_oids, &ref->old_oid)) {
ref->match_status = REF_MATCHED;
*newtail = copy_ref(ref);
newtail = &(*newtail)->next;
@@ -614,7 +623,7 @@ static void filter_refs(struct fetch_pack_args *args,
}
}
 
-   oidset_clear(&tip_oids);
+   oidset_clear(&tip_oids.oids);
for (ref = unmatched; ref; ref = next) {
next = ref->next;
free(ref);


Re: [PATCH v2 2/2] oidset: use khash

2018-10-04 Thread Jeff King
On Thu, Oct 04, 2018 at 07:56:44AM +0200, René Scharfe wrote:

> > As the comment above notes, I think we're really looking at the case
> > where this gets populated on the first call, but not subsequent ones. It
> > might be less hacky to use a "static int initialized" here. Or if we
> > want to avoid hidden globals, put the logic into filter_refs() to decide
> > when to populate.
> 
> Right.  I'd prefer the latter, but was unable to find a nice way that
> still populates the oidset lazily.  It's certainly worth another look,
> and a separate series.

It's a little awkward because the lazy load happens in a conditional.
You can fully encapsulate it like the patch below, but I actually don't
think it's really helping readability.

> >> +KHASH_INIT(oid, struct object_id, int, 0, oid_hash, oid_equal)
> > 
> > This will declare these "static inline". Our other major "macros become
> > inline functions" code is commit-slab.h, and there we found it necessary
> > to add MAYBE_UNUSED. I wonder if we ought to be doing the same here (I
> > don't get any warnings, but I suspect sparse might complain).
> 
> I doubt it (but didn't check) because khash.h defines kh_clear_##name(),
> which we don't use anywhere, and there have been no complaints so far.
> And if we wanted to add MAYBE_UNUSED then the right place for that would
> be in KHASH_INIT, no?

Right, that would be the correct spot. I'm OK to leave it until somebody
complains. Looking at commit-slab again, its functions are straight
statics, not static inline. That's probably the important difference.

> > It might be nice if these functions could hide inside oidset.c (and just
> > declare the struct here). It looks like we might be able to do that with
> > __KHASH_TYPE(), but the double-underscore implies that we're not
> > supposed to. ;)
> > 
> > I guess we also use a few of them in our inlines here. I'm not 100% sure
> > that oidset_* needs to be inlined either, but this is at least a pretty
> > faithful conversion of the original.
> 
> We could inline all of the oidset functions, following the spirit of
> klib/khash.h.
> 
> Or we could uninline all of them and then may be able to clean up
> oidset.h by using KHASH_DECLARE.  Perhaps we'd need to guard with an
> "#ifndef THIS_IS_OIDSET_C" or similar to avoid a clash with KHASH_INIT.
> 
> Not sure if any of that would be a worthwhile improvement.

Unless we know something is a performance win to inline, I'd generally
prefer not to.

For a case like this with auto-generated functions, I'm mostly worried
about bloating the compiled code. Either with a bunch of inlined
non-trivial functions, or cases where the compiler says "this is too big
to inline" and generates an anonymous file-scope function, but we end up
with a bunch of duplicates, because we're generating the same functions
in a bunch of C files.

I may be worried about nothing, though. I don't know how clever the
compiler and linker can be there.

-Peff