[PATCH v2] remote.c: avoid O(m*n) behavior in match_push_refs

2013-07-08 Thread Brandon Casey
When pushing using a matching refspec or a pattern refspec, each ref
in the local repository must be paired with a ref advertised by the
remote server.  This is accomplished by using the refspec to transform
the name of the local ref into the name it should have in the remote
repository, and then performing a linear search through the list of
remote refs to see if the remote ref was advertised by the remote
system.

Each of these lookups has O(n) complexity and makes match_push_refs()
be an O(m*n) operation, where m is the number of local refs and n is
the number of remote refs.  If there are many refs (100,000+), then this
ref matching can take a significant amount of time.  Let's prepare an
index of the remote refs to allow searching in O(log n) time and
reduce the complexity of match_push_refs() to O(m log n).

We prepare the index lazily so that it is only created when necessary.
So, there should be no impact when _not_ using a matching or pattern
refspec, i.e. when pushing using only explicit refspecs.
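
For illustration, the lookup side amounts to something like the sketch
below (not the literal hunk; the helper name is made up here, but the
sorted string_list maps each ref name to its "struct ref *" via ->util,
and string_list_lookup() provides the O(log n) binary search):

    static struct ref *find_ref_by_name_idx(struct string_list *ref_index,
                                            struct ref *remote_refs,
                                            const char *name)
    {
            struct string_list_item *item;

            if (!ref_index->nr)
                    /* lazily build the index on first use */
                    prepare_ref_index(ref_index, remote_refs);
            item = string_list_lookup(ref_index, name);
            return item ? item->util : NULL;
    }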

Dry-run push of a repository with 121,913 local and remote refs:

        before      after
real    1m40.582s   0m0.804s
user    1m39.914s   0m0.515s
sys     0m0.125s    0m0.106s

The creation of the index has overhead.  So, if there are very few
local refs, then it could take longer to create the index than it
would have taken to just perform n linear lookups into the remote
ref space.  Using the index should provide some improvement when
the number of local refs is roughly greater than the log of the
number of remote refs (i.e. m >= log n).  The pathological case is
when there is a single local ref and very many remote refs.

Dry-run push of a repository with 121,913 remote refs and a single
local ref:

        before     after
real    0m0.525s   0m0.566s
user    0m0.243s   0m0.279s
sys     0m0.075s   0m0.099s

Using an index takes 41 ms longer, or roughly 7.8% longer.

Jeff King measured a no-op push of a single ref into a remote repo
with 370,000 refs:

        before     after
real    0m1.087s   0m1.156s
user    1m1.344s   0m1.412s
sys     0m0.288s   0m0.284s

Using an index takes 69 ms longer, or roughly 6.3% longer.

None of the measurements above required transferring any objects to
the remote repository.  If the push required transferring objects and
updating the refs in the remote repository, the impact of preparing
the search index would be even smaller.

Note, we refrain from using an index in the send_prune block since it
is expected that the number of refs that are being pruned is more
commonly much smaller than the number of local refs (i.e. m << n,
and particularly m < log(n), where m is the number of refs that
should be pruned and n is the number of local refs), so the overhead
of creating the search index would likely exceed the benefit of using
it.

Signed-off-by: Brandon Casey 
---

Here is the reroll with an updated commit message that hopefully
provides a little more detail to justify this change.  I removed
the use of the search index in the send_prune block since I think
that pruning many refs is an uncommon operation and the overhead
of creating the index will more commonly exceed the benefit of
using it.

This version now lazily builds the search index in the first loop,
so there should be no impact when pushing using explicit refspecs.

e.g. pushing a change for review to Gerrit

   $ git push origin HEAD:refs/for/master

I suspect that this is the most common form of pushing and furthermore
will become the default once push.default defaults to 'current'.

The remaining push cases can be distilled into the following:

  ref-count    impact
  m >= log n   improved with this patch
  m < log n    regressed with this patch by roughly 6-7%

So, I think what we have to consider is whether the improvement to
something like 'git push --mirror' is worth the impact to an asymmetric
push where the number of local refs is much smaller than the number of
remote refs.  I'm not sure how common the latter really is though.
Gerrit does produce repositories with many refs on the remote end in
the refs/changes/ namespace, but do people commonly push to Gerrit
using matching or pattern refspecs?  Not sure, but I'd tend to think
that they don't.

-Brandon

 remote.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/remote.c b/remote.c
index 6f57830..8bca65a 100644
--- a/remote.c
+++ b/remote.c
@@ -1302,6 +1302,14 @@ static void add_missing_tags(struct ref *src, struct ref **dst, struct ref ***ds
free(sent_tips.tip);
 }
 
+static void prepare_ref_index(struct string_list *ref_index, struct ref *ref)
+{
+   for ( ; ref; ref = ref->next)
+   string_list_append_nodup(ref_index, ref->name)->util = ref;
+
+   sort_string_list(ref_index);
+}
+
 /*
  * Given the set of refs the local repository has, the set of refs the
  * remote repository has, and the refspec used for push, determine
@@ -1320,6 +1328,7 @@ int match_push_refs(struct ref *src, struct re

Re: A local shared clone is now much slower

2013-07-08 Thread Jeff King
On Mon, Jul 08, 2013 at 01:03:55PM +1000, Stephen Rothwell wrote:

> So commit 0433ad128c59 ("clone: run check_everything_connected") (which
> turned up with v1.8.3) added a large traversal to clone which (as the
> comment said) makes a clone much slower.  It is especially noticeable on
> "git clone -s -l -n" which I use every day and used to be almost
> instant.  Is there any thought to making it fast again, please?
> 
> The above clone is very useful for working with different branches in one
> tree without touching every file in the main branch you are working
> with (and consequent issues with rebuilding at least).  As linux-next
> maintainer, you can imagine that I do this a bit.

Yeah, I have noticed it is somewhat annoying, as well, because the
proportion of time taken for the check is so much larger compared to the
relatively instant time taken for the local shared clone.

The point of that commit is to add the same safety checks to clone that
we do for fetching. But in the local shared-repo case, I really feel
like all safety bets are off anyway. You are not creating a verified
redundant copy at all, and there are still corruptions that can sneak
through (e.g., bit corruptions of blob objects).

So maybe this:

-- >8 --
Subject: [PATCH] clone: drop connectivity check for local clones

Commit 0433ad1 (clone: run check_everything_connected,
2013-03-25) added the same connectivity check to clone that
we use for fetching. The intent was to provide enough safety
checks that "git clone git://..." could be counted on to
detect bit errors and other repo corruption, and not
silently propagate them to the clone.

For local clones, this turns out to be a bad idea, for two
reasons:

  1. Local clones use hard linking (or even shared object
 stores), and so complete far more quickly. The time
 spent on the connectivity check is therefore
 proportionally much more painful.

  2. Local clones do not actually meet our safety guarantee
 anyway. The connectivity check makes sure we have all
 of the objects we claim to, but it does not check for
 bit errors. We will notice bit errors in commits and
 trees, but we do not load blob objects at all. Whereas
 over the pack transport, we actually recompute the sha1
 of each object in the incoming packfile; bit errors
 change the sha1 of the object, which is then caught by
 the connectivity check.

This patch drops the connectivity check in the local case.
Note that we have to revert the changes from 0433ad1 to
t5710, as we no longer notice the corruption during clone.

We could go a step further and provide a "verify even local
clones" option, but it is probably not worthwhile. You can
already spell that as "cd foo.git && git fsck && git clone ."
or as "git clone --no-local foo.git".

Signed-off-by: Jeff King 
---
 builtin/clone.c   | 22 +-
 t/t5710-info-alternate.sh |  8 +++-
 2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 14b1323..dafb6b5 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -545,17 +545,20 @@ static void update_remote_refs(const struct ref *refs,
   const struct ref *remote_head_points_at,
   const char *branch_top,
   const char *msg,
-  struct transport *transport)
+  struct transport *transport,
+  int check_connectivity)
 {
const struct ref *rm = mapped_refs;
 
-   if (0 <= option_verbosity)
-   printf(_("Checking connectivity... "));
-   if (check_everything_connected_with_transport(iterate_ref_map,
- 0, &rm, transport))
-   die(_("remote did not send all necessary objects"));
-   if (0 <= option_verbosity)
-   printf(_("done\n"));
+   if (check_connectivity) {
+   if (0 <= option_verbosity)
+   printf(_("Checking connectivity... "));
+   if (check_everything_connected_with_transport(iterate_ref_map,
+ 0, &rm, transport))
+   die(_("remote did not send all necessary objects"));
+   if (0 <= option_verbosity)
+   printf(_("done\n"));
+   }
 
if (refs) {
write_remote_refs(mapped_refs);
@@ -963,7 +966,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
transport_fetch_refs(transport, mapped_refs);
 
update_remote_refs(refs, mapped_refs, remote_head_points_at,
-  branch_top.buf, reflog_msg.buf, transport);
+  branch_top.buf, reflog_msg.buf, transport,
+  !is_local);
 
update_head(our_head_points_at, remote_head, reflog_msg.buf);
 
diff --git a/t

Re: [PATCH v2] remote.c: avoid O(m*n) behavior in match_push_refs

2013-07-08 Thread Jeff King
On Mon, Jul 08, 2013 at 12:02:11AM -0700, Brandon Casey wrote:

> Here is the reroll with an updated commit message that hopefully
> provides a little more detail to justify this change.  I removed
> the use of the search index in the send_prune block since I think
> that pruning many refs is an uncommon operation and the overhead
> of creating the index will more commonly exceed the benefit of
> using it.

I don't know. I'd think that if you are using pruning, you might delete
a large chunk at one time (e.g., rearranging your ref hierarchy,
followed by "git push --mirror"). But that is just my gut feeling. I
haven't actually run into this slow-down in the real world (we typically
fetch from our giant repositories rather than push into them).

> This version now lazily builds the search index in the first loop,
> so there should be no impact when pushing using explicit refspecs.
> 
> e.g. pushing a change for review to Gerrit
> 
>$ git push origin HEAD:refs/for/master
> 
> I suspect that this is the most common form of pushing and furthermore
> will become the default once push.default defaults to 'current'.

Nice.

> The remaining push cases can be distilled into the following:
> 
>   ref-count    impact
>   m >= log n   improved with this patch
>   m < log n    regressed with this patch by roughly 6-7%
> 
> So, I think what we have to consider is whether the improvement to
> something like 'git push --mirror' is worth the impact to an asymmetric
> push where the number of local refs is much smaller than the number of
> remote refs.  I'm not sure how common the latter really is though.
> Gerrit does produce repositories with many refs on the remote end in
> the refs/changes/ namespace, but do people commonly push to Gerrit
> using matching or pattern refspecs?  Not sure, but I'd tend to think
> that they don't.

To me it is not about what happens sometimes or not, but about having
runaway worst-case behavior that is unusable. The 6-7% increase (which
is the absolute worst-case measurement we could come up with; in the
real world you would usually transfer actual objects, and connect over
an actual network) is worth it, IMHO.

So I'd be in favor of applying this (possibly covering the send_prune
case, too). If somebody really wants to care about the 6-7%, they can
build on top of your patch with heuristics to avoid indexing in the
small cases.
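
(For what it's worth, such a heuristic could be as small as the sketch
below, using the m >= log n rule of thumb from your commit message; the
function name and exact threshold are just illustrative.)

    static int ref_index_is_worth_it(int nr_local, int nr_remote)
    {
            int log2_remote = 0;

            while (nr_remote >>= 1)
                    log2_remote++;
            /* only pay the indexing cost when m >= log n */
            return nr_local >= log2_remote;
    }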

-Peff


Re: [PATCH 4/4] pack-revindex: radix-sort the revindex

2013-07-08 Thread Jeff King
On Sun, Jul 07, 2013 at 04:52:23PM -0700, Shawn O. Pearce wrote:

> On Sun, Jul 7, 2013 at 3:14 AM, Jeff King  wrote:
> > The pack revindex stores the offsets of the objects in the
> > pack in sorted order, allowing us to easily find the on-disk
> > size of each object. To compute it, we populate an array
> > with the offsets from the sha1-sorted idx file, and then use
> > qsort to order it by offsets.
> >
> > That does O(n log n) offset comparisons, and profiling shows
> > that we spend most of our time in cmp_offset. However, since
> > we are sorting on a simple off_t, we can use numeric sorts
> > that perform better. A radix sort can run in O(k*n), where k
> > is the number of "digits" in our number. For a 64-bit off_t,
> > using 16-bit "digits" gives us k=4.
> 
> Did you try the simple bucket sort Colby now uses in JGit?
> 
> The sort is pretty simple:
> 
>   bucket_size = pack_length / object_count;
>   buckets[] = malloc(object_count * sizeof(int));
> 
>   foreach obj in idx:
> push_chain(buckets[obj.offset / bucket_size], obj.idx_nth);
> 
>   foreach bucket:
> insertion sort by offset

I did do something similar (though I flattened my buckets into a single
list afterwards), but I ended up closer to 700ms (down from 830ms, but
with the radix sort around 200ms). It's entirely possible I screwed up
something in the implementation (the bucket insertion can be done in a
lot of different ways, many of which are terrible), but I didn't keep a
copy of that attempt. If you try it and have better numbers, I'd be
happy to see them.
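
For concreteness, the shape of that approach is roughly the sketch below
(illustrative only; this is neither the code I ran nor JGit's
implementation):

    /*
     * offsets[] holds the object offsets in idx (sha1) order; fill order[]
     * with indices into offsets[] sorted by offset.  Objects are chained
     * into fixed-width buckets and each bucket is insertion-sorted, since
     * most buckets hold only a few objects.
     */
    static void bucket_sort_offsets(const uint64_t *offsets, uint32_t *order,
                                    uint32_t nr, uint64_t pack_length)
    {
            uint64_t bucket_size = pack_length / nr + 1;
            uint32_t *head = xcalloc(nr, sizeof(*head)); /* 1-based; 0 = empty */
            uint32_t *next = xcalloc(nr, sizeof(*next));
            uint32_t i, out = 0;

            for (i = 0; i < nr; i++) {
                    uint32_t b = offsets[i] / bucket_size;
                    next[i] = head[b];
                    head[b] = i + 1;
            }

            for (i = 0; i < nr; i++) {
                    uint32_t start = out, j;
                    for (j = head[i]; j; j = next[j - 1])
                            order[out++] = j - 1;
                    /* insertion sort the (typically tiny) bucket by offset */
                    for (j = start + 1; j < out; j++) {
                            uint32_t k = j, tmp = order[j];
                            while (k > start &&
                                   offsets[order[k - 1]] > offsets[tmp]) {
                                    order[k] = order[k - 1];
                                    k--;
                            }
                            order[k] = tmp;
                    }
            }
            free(head);
            free(next);
    }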

> We observed on linux.git that most buckets have an average number of
> objects. IIRC the bucket_size was ~201 bytes and most buckets had very
> few objects each. For lookups we keep the bucket_size parameter and a
> bucket index table. This arrangement uses 8 bytes per object in the
> reverse index, making it very memory efficient. Searches are typically
below O(log N) time because each bucket has [...]


Re: [PATCH 1/4] name-ref: factor out name shortening logic from name_ref()

2013-07-08 Thread Michael Haggerty
On 07/08/2013 12:33 AM, Junio C Hamano wrote:
> The logic will be used in a new codepath for showing exact matches.
> 
> Signed-off-by: Junio C Hamano 
> ---
>  builtin/name-rev.c | 19 ---
>  1 file changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> index 87d4854..1234ebb 100644
> --- a/builtin/name-rev.c
> +++ b/builtin/name-rev.c
> @@ -96,6 +96,17 @@ static int subpath_matches(const char *path, const char *filter)
>   return -1;
>  }
>  
> +static const char *name_ref_abbrev(const char *refname, int shorten_unambiguous)
> +{
> + if (shorten_unambiguous)
> + refname = shorten_unambiguous_ref(refname, 0);
> + else if (!prefixcmp(refname, "refs/heads/"))
> + refname = refname + 11;
> + else if (!prefixcmp(refname, "refs/"))
> + refname = refname + 5;
> + return refname;
> +}
> +

In my opinion this would be a tad clearer if each of the branches of the
"if" returned the value directly rather than setting refname and relying
on the "return" statement that follows.  But it's probably a matter of
taste.
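
I.e. something like this (untested, just to illustrate the suggestion):

    static const char *name_ref_abbrev(const char *refname, int shorten_unambiguous)
    {
            if (shorten_unambiguous)
                    return shorten_unambiguous_ref(refname, 0);
            if (!prefixcmp(refname, "refs/heads/"))
                    return refname + 11;
            if (!prefixcmp(refname, "refs/"))
                    return refname + 5;
            return refname;
    }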

>  struct name_ref_data {
>   int tags_only;
>   int name_only;
> @@ -134,13 +145,7 @@ static int name_ref(const char *path, const unsigned char *sha1, int flags, void
>   if (o && o->type == OBJ_COMMIT) {
>   struct commit *commit = (struct commit *)o;
>  
> - if (can_abbreviate_output)
> - path = shorten_unambiguous_ref(path, 0);
> - else if (!prefixcmp(path, "refs/heads/"))
> - path = path + 11;
> - else if (!prefixcmp(path, "refs/"))
> - path = path + 5;
> -
> + path = name_ref_abbrev(path, can_abbreviate_output);
>   name_rev(commit, xstrdup(path), 0, 0, deref);
>   }
>   return 0;
> 

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/


Re: A local shared clone is now much slower

2013-07-08 Thread Duy Nguyen
On Mon, Jul 8, 2013 at 2:30 PM, Jeff King  wrote:
> Subject: [PATCH] clone: drop connectivity check for local clones
>
> Commit 0433ad1 (clone: run check_everything_connected,
> 2013-03-25) added the same connectivity check to clone that
> we use for fetching. The intent was to provide enough safety
> checks that "git clone git://..." could be counted on to
> detect bit errors and other repo corruption, and not
> silently propagate them to the clone.
>
> For local clones, this turns out to be a bad idea, for two
> reasons:
>
>   1. Local clones use hard linking (or even shared object
>  stores), and so complete far more quickly. The time
>  spent on the connectivity check is therefore
>  proportionally much more painful.

There's also the byte-to-byte copy when the system does not support
hardlinks (or the user does not want them), but I guess it's safe to
trust the OS to copy correctly in most cases.

>   2. Local clones do not actually meet our safety guarantee
>  anyway. The connectivity check makes sure we have all
>  of the objects we claim to, but it does not check for
>  bit errors. We will notice bit errors in commits and
>  trees, but we do not load blob objects at all. Whereas
>  over the pack transport, we actually recompute the sha1
>  of each object in the incoming packfile; bit errors
>  change the sha1 of the object, which is then caught by
>  the connectivity check.

We used to, before d21c463 (fetch/receive: remove over-pessimistic
connectivity check - 2012-03-15). But back then we did not even do a
connectivity check in clone.

> This patch drops the connectivity check in the local case.
> Note that we have to revert the changes from 0433ad1 to
> t5710, as we no longer notice the corruption during clone.
>
> We could go a step further and provide a "verify even local
> clones" option, but it is probably not worthwhile. You can
> already spell that as "cd foo.git && git fsck && git clone ."
> or as "git clone --no-local foo.git".

Faster clones make everybody happy :-)
--
Duy


[PATCH v2 w/prune index] remote.c: avoid O(m*n) behavior in match_push_refs

2013-07-08 Thread Brandon Casey
When pushing using a matching refspec or a pattern refspec, each ref
in the local repository must be paired with a ref advertised by the
remote server.  This is accomplished by using the refspec to transform
the name of the local ref into the name it should have in the remote
repository, and then performing a linear search through the list of
remote refs to see if the remote ref was advertised by the remote
system.

Each of these lookups has O(n) complexity and makes match_push_refs()
be an O(m*n) operation, where m is the number of local refs and n is
the number of remote refs.  If there are many refs (100,000+), then this
ref matching can take a significant amount of time.  Let's prepare an
index of the remote refs to allow searching in O(log n) time and
reduce the complexity of match_push_refs() to O(m log n).

We prepare the index lazily so that it is only created when necessary.
So, there should be no impact when _not_ using a matching or pattern
refspec, i.e. when pushing using only explicit refspecs.

Dry-run push of a repository with 121,913 local and remote refs:

        before      after
real    1m40.582s   0m0.804s
user    1m39.914s   0m0.515s
sys     0m0.125s    0m0.106s

The creation of the index has overhead.  So, if there are very few
local refs, then it could take longer to create the index than it
would have taken to just perform n linear lookups into the remote
ref space.  Using the index should provide some improvement when
the number of local refs is roughly greater than the log of the
number of remote refs (i.e. m >= log n).  The pathological case is
when there is a single local ref and very many remote refs.

Dry-run push of a repository with 121,913 remote refs and a single
local ref:

        before     after
real    0m0.525s   0m0.566s
user    0m0.243s   0m0.279s
sys     0m0.075s   0m0.099s

Using an index takes 41 ms longer, or roughly 7.8% longer.

Jeff King measured a no-op push of a single ref into a remote repo
with 370,000 refs:

        before     after
real    0m1.087s   0m1.156s
user    1m1.344s   0m1.412s
sys     0m0.288s   0m0.284s

Using an index takes 69 ms longer, or roughly 6.3% longer.

None of the measurements above required transferring any objects to
the remote repository.  If the push required transferring objects and
updating the refs in the remote repository, the impact of preparing
the search index would be even smaller.

A similar operation is performed in the reverse direction when pruning
using a matching or pattern refspec.  Let's avoid O(m*n) behavior in
the same way by lazily preparing an index on the local refs.

Signed-off-by: Brandon Casey 
---

On Mon, Jul 8, 2013 at 12:50 AM, Jeff King  wrote:
> On Mon, Jul 08, 2013 at 12:02:11AM -0700, Brandon Casey wrote:
> 
> > Here is the reroll with an updated commit message that hopefully
> > provides a little more detail to justify this change.  I removed
> > the use of the search index in the send_prune block since I think
> > that pruning many refs is an uncommon operation and the overhead
> > of creating the index will more commonly exceed the benefit of
> > using it.
> 
> I don't know. I'd think that if you are using pruning, you might delete
> a large chunk at one time (e.g., rearranging your ref hierarchy,
> followed by "git push --mirror"). But that is just my gut feeling. I
> haven't actually run into this slow-down in the real world (we typically
> fetch from our giant repositories rather than push into them).

Firstly, why are you still awake?!?! :)

Secondly, fair enough.  I don't think the change to the pruning block
will have much impact in real repos either way.  In this block, the
search is being performed on the local refs.  There would have to be
many refs on the local side for the generation of the index to be
significant enough to notice.

So, I'm fine with using the index when pruning too to avoid worst-case
behavior when there are many local refs and many deletions.

-Brandon

 remote.c | 27 +--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/remote.c b/remote.c
index 6f57830..efcba93 100644
--- a/remote.c
+++ b/remote.c
@@ -1302,6 +1302,14 @@ static void add_missing_tags(struct ref *src, struct ref **dst, struct ref ***ds
free(sent_tips.tip);
 }
 
+static void prepare_ref_index(struct string_list *ref_index, struct ref *ref)
+{
+   for ( ; ref; ref = ref->next)
+   string_list_append_nodup(ref_index, ref->name)->util = ref;
+
+   sort_string_list(ref_index);
+}
+
 /*
  * Given the set of refs the local repository has, the set of refs the
  * remote repository has, and the refspec used for push, determine
@@ -1320,6 +1328,7 @@ int match_push_refs(struct ref *src, struct ref **dst,
int errs;
static const char *default_refspec[] = { ":", NULL };
struct ref *ref, **dst_tail = tail_ref(dst);
+   struct string_list dst_ref_index = STRING_LIST_INIT_NODUP;
 
if (!nr_refspec) {
nr_refspec 

[PATCH] t0000: do not use export X=Y

2013-07-08 Thread Torsten Bögershausen
The shell syntax "export X=Y A=B" is not understood by all shells.

Signed-off-by: Torsten Bögershausen 
---
 t/t-basic.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/t-basic.sh b/t/t-basic.sh
index 5c32288..10be52b 100755
--- a/t/t-basic.sh
+++ b/t/t-basic.sh
@@ -53,7 +53,8 @@ run_sub_test_lib_test () {
# Pretend we're a test harness.  This prevents
# test-lib from writing the counts to a file that will
# later be summarized, showing spurious "failed" tests
-   export HARNESS_ACTIVE=t &&
+   HARNESS_ACTIVE=t &&
+   export HARNESS_ACTIVE &&
cd "$name" &&
cat >"$name.sh" <<-EOF &&
#!$SHELL_PATH
-- 
1.8.3



Re: git p4 clone not processing branches properly

2013-07-08 Thread Matthieu Brucher
Hi again,

I tried with @all, but it didn't work as expected. It imported a bunch
of revisions (but no files?) and ended with:
  Reading pipe: ['git', 'config', '--bool', 'git-p4.importLabels']
  Not checking out any branch, use "git checkout -q -b master "
  executing git config --bool git-p4.useclientspec true

And when I tried to checkout Branch/Main, it failed with
  fatal: Cannot update paths and switch to branch 'master' at the same time.
  Did you intend to checkout 'Branch/Main' which can not be resolved as commit?

Thanks,

Matthieu


2013/7/5 Matthieu Brucher :
>>> I can try. Indeed, at this revision, the two other branches do not yet
>>> exist. But @all will get everything? Last time, I only got head
>>> (IIRC).
>>
>> Our P4 server has a limitation on the number of lines returned by "p4
>> changes" command, so sometimes I have to use @change_start,@change_stop
>> instead of @all. You might want to use this range limitation to test
>> git-p4 by limiting to a small number of changelists that allows you to
>> check if at least one branch is correctly detected.
>
> I didn't know about this. I wanted to start the cloning at some point
> in the past, that's why I used the @123456789 notation.
>
 Also, by using that command it means that the following depot paths must
 exist:
 //Depot/Project/Branch/Main
 //Depot/Project/Releases/2013
 //Depot/Project/Branch/Feature1
>>>
>>> Yes, they indeed do.
>>
>> In this case the problem should not be in branchList configuration.
>>
 I've never used the --use-client-spec, so I'm not sure if that will not
 break the branch detection code.
>>>
>>> I need to do that because if I don't, the depot is cluttered with
>>> binaries. Or perhaps if I put some .gitignore stuff, I might not do
>>> this?
>>
>> Keep using it, at least for now. If everything else fails we can look at
>> this again.
>
> OK, I'll send a mail on Monday (forgot it was the weekend tomorrow...)
>
> Cheers,
>
> Matthieu
> --
> Information System Engineer, Ph.D.
> Blog: http://matt.eifelle.com
> LinkedIn: http://www.linkedin.com/in/matthieubrucher
> Music band: http://liliejay.com/



-- 
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Music band: http://liliejay.com/


Re: git p4 clone not processing branches properly

2013-07-08 Thread Vitor Antunes
On Mon, Jul 8, 2013 at 11:09 AM, Matthieu Brucher
 wrote:
> Hi again,
>
> I tried with @all, but it didn't work as expected. It imported a bunch
> of revisions (but no files?) and ended with:
>   Reading pipe: ['git', 'config', '--bool', 'git-p4.importLabels']
>   Not checking out any branch, use "git checkout -q -b master "
>   executing git config --bool git-p4.useclientspec true
>
> And when I tried to checkout Branch/Main, it failed with
>   fatal: Cannot update paths and switch to branch 'master' at the same time.
>   Did you intend to checkout 'Branch/Main' which can not be resolved as 
> commit?

Hi Matthieu,

Please run "git branch -a" in that repository and you should be able
to see the various branches under /remotes/p4/
Then you just need to choose a branch and run "git checkout -b
git_branch_name p4/p4_branch_name".

Cheers,
Vitor


Re: git p4 clone not processing branches properly

2013-07-08 Thread Matthieu Brucher
Unfortunately, git branch -a returns nothing :/
I tried with the simple detect-branches as well as with the config values.
Perhaps the spec-client? Although it is strange as it seems that the
repository is completely empty.

Thanks,

Matthieu

2013/7/8 Vitor Antunes :
> On Mon, Jul 8, 2013 at 11:09 AM, Matthieu Brucher
>  wrote:
>> Hi again,
>>
>> I tried with @all, but it didn't work as expected. It imported a bunch
>> of revisions (but no files?) and ended with:
>>   Reading pipe: ['git', 'config', '--bool', 'git-p4.importLabels']
>>   Not checking out any branch, use "git checkout -q -b master "
>>   executing git config --bool git-p4.useclientspec true
>>
>> And when I tried to checkout Branch/Main, it failed with
>>   fatal: Cannot update paths and switch to branch 'master' at the same time.
>>   Did you intend to checkout 'Branch/Main' which can not be resolved as 
>> commit?
>
> Hi Matthieu,
>
> Please run "git branch -a" in that repository and you should be able
> to see the various branches under /remotes/p4/
> Then you just need to choose a branch and run "git checkout -b
> git_branch_name p4/p4_branch_name".
>
> Cheers,
> Vitor



-- 
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Music band: http://liliejay.com/


Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option

2013-07-08 Thread Duy Nguyen
On Mon, Jul 8, 2013 at 12:49 AM, Junio C Hamano  wrote:
> Jeff King  writes:
>
>> Perhaps we need
>>
>>   git cat-file --batch-format="%(disk-size) %(object)"
>>
>> or similar.

This is what I wanted to do with the in for-each-ref's pretty
formatting [1]. I used to hack cat-file --batch to extract info I
needed for experimenting with various pack index extensions. If you
are not in hurry, maybe we can introduce something similar to your
syntax, but applicable for all for-each-ref, branch and log family.
Ram, are you still interested in the awesome branch series?

> I agree with your reasoning.  It may be simpler to give an interface
> to ask for which pieces of info, e.g. --batch-cols=size,disksize,
> without giving the readers a flexible "format".

[1] http://thread.gmane.org/gmane.comp.version-control.git/227057/focus=227223
--
Duy


Re: git p4 clone not processing branches properly

2013-07-08 Thread Matthieu Brucher
Without the spec client, it seems that the branches are recognized,
but there are so many binary files that I need to remove them during
the migration.
I tried setting a .gitignore beforehand, but it is not respected (I
tried to remove some folders with folder/ in .gitignore, but the
folders are still imported).
Is there a switch for the import somewhere?

Thanks,

Matthieu

2013/7/8 Matthieu Brucher :
> Unfortunately, git branch -a returns nothing :/
> I tried with the simple detect-branches as well as with the config values.
> Perhaps the spec-client? Although it is strange as it seems that the
> repository is completely empty.
>
> Thanks,
>
> Matthieu
>
> 2013/7/8 Vitor Antunes :
>> On Mon, Jul 8, 2013 at 11:09 AM, Matthieu Brucher
>>  wrote:
>>> Hi again,
>>>
>>> I tried with @all, but it didn't work as expected. It imported a bunch
>>> of revisions (but no files?) and ended with:
>>>   Reading pipe: ['git', 'config', '--bool', 'git-p4.importLabels']
>>>   Not checking out any branch, use "git checkout -q -b master "
>>>   executing git config --bool git-p4.useclientspec true
>>>
>>> And when I tried to checkout Branch/Main, it failed with
>>>   fatal: Cannot update paths and switch to branch 'master' at the same time.
>>>   Did you intend to checkout 'Branch/Main' which can not be resolved as 
>>> commit?
>>
>> Hi Matthieu,
>>
>> Please run "git branch -a" in that repository and you should be able
>> to see the various branches under /remotes/p4/
>> Then you just need to choose a branch and run "git checkout -b
>> git_branch_name p4/p4_branch_name".
>>
>> Cheers,
>> Vitor
>
>
>
> --
> Information System Engineer, Ph.D.
> Blog: http://matt.eifelle.com
> LinkedIn: http://www.linkedin.com/in/matthieubrucher
> Music band: http://liliejay.com/



-- 
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Music band: http://liliejay.com/


Re: [PATCH 05/22] read-cache: add index reading api

2013-07-08 Thread Thomas Gummerer
Duy Nguyen  writes:

> On Sun, Jul 7, 2013 at 3:11 PM, Thomas Gummerer  wrote:
>> +/*
>> + * Options by which the index should be filtered when read partially.
>> + *
>> + * pathspec: The pathspec which the index entries have to match
>> + * seen: Used to return the seen parameter from match_pathspec()
>> + * max_prefix, max_prefix_len: These variables are set to the longest
>> + * common prefix and the length of the longest common prefix of the
>> + * given pathspec
>> + *
>> + * read_staged: used to indicate if the conflicted entries (entries
>> + * with a stage) should be included
>> + * read_cache_tree: used to indicate if the cache-tree should be read
>> + * read_resolve_undo: used to indicate if the resolve undo data should
>> + * be read
>> + */
>> +struct filter_opts {
>> +   const char **pathspec;
>> +   char *seen;
>> +   char *max_prefix;
>> +   int max_prefix_len;
>> +
>> +   int read_staged;
>> +   int read_cache_tree;
>> +   int read_resolve_undo;
>> +};
>> +
>>  struct index_state {
>> struct cache_entry **cache;
>> unsigned int version;
>> @@ -270,6 +297,8 @@ struct index_state {
>> struct hash_table name_hash;
>> struct hash_table dir_hash;
>> struct index_ops *ops;
>> +   struct internal_ops *internal_ops;
>> +   struct filter_opts *filter_opts;
>>  };
>
> ...
>
>> -/* remember to discard_cache() before reading a different cache! */
>> -int read_index_from(struct index_state *istate, const char *path)
>> +
>> +int read_index_filtered_from(struct index_state *istate, const char *path,
>> +struct filter_opts *opts)
>>  {
>> int fd, err, i;
>> struct stat st_old, st_new;
>> @@ -1337,7 +1425,7 @@ int read_index_from(struct index_state *istate, const char *path)
>> if (istate->ops->verify_hdr(mmap, mmap_size) < 0)
>> err = 1;
>>
>> -   if (istate->ops->read_index(istate, mmap, mmap_size) < 0)
>> +   if (istate->ops->read_index(istate, mmap, mmap_size, opts) < 0)
>> err = 1;
>> istate->timestamp.sec = st_old.st_mtime;
>> istate->timestamp.nsec = ST_MTIME_NSEC(st_old);
>> @@ -1345,6 +1433,7 @@ int read_index_from(struct index_state *istate, const char *path)
>> die_errno("cannot stat the open index");
>>
>> munmap(mmap, mmap_size);
>> +   istate->filter_opts = opts;
>> if (!index_changed(&st_old, &st_new) && !err)
>> return istate->cache_nr;
>> }
>
> Putting filter_opts in index_state feels like a bad design. Iterator
> information should be separated from the iterated object, so that two
> callers can walk through the same index without stepping on each other
> (I'm not talking about multithreading, a caller may walk a bit, then
> the other caller starts walking, then the former caller resumes
> walking again in a call chain).

Yes, you're right.  We need the filter_opts to see what part of the
index has been loaded [1] and which part has been skipped, but it
shouldn't be used for filtering in the for_each_index_entry function.

I think there should be two versions of the for_each_index_entry
function then, where the for_each_index_entry function would behave the
same way as the for_each_index_entry_filtered function with the
filter_opts parameter set to NULL:
for_each_index_entry_filtered(struct index_state *, each_cache_entry_fn, void *cb_data, struct filter_opts *)
for_each_index_entry(struct index_state *, each_cache_entry_fn, void *cb_data)

Both of them then should call index_change_filter_opts to make sure all
the entries that are needed are loaded in the in-memory format.
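
Roughly like this (only a sketch of the calling convention; it assumes
each_cache_entry_fn keeps its current callback signature and that
index_change_filter_opts loads whatever entries are still missing):

    int for_each_index_entry_filtered(struct index_state *istate,
                                      each_cache_entry_fn fn, void *cb_data,
                                      struct filter_opts *opts)
    {
            int i, ret = 0;

            /* make sure everything the caller may iterate over is in memory */
            index_change_filter_opts(istate, opts);
            for (i = 0; i < istate->cache_nr && !ret; i++)
                    ret = fn(istate->cache[i], cb_data);
            return ret;
    }

    int for_each_index_entry(struct index_state *istate,
                             each_cache_entry_fn fn, void *cb_data)
    {
            return for_each_index_entry_filtered(istate, fn, cb_data, NULL);
    }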

Does that make sense?

[1] That is only important for the new index-v5 file format, which can
be loaded partially.  The filter_opts could always be set to NULL,
as the whole index is always loaded anyway.


Re: [PATCH 05/22] read-cache: add index reading api

2013-07-08 Thread Thomas Gummerer
Duy Nguyen  writes:

> On Sun, Jul 7, 2013 at 3:11 PM, Thomas Gummerer  wrote:
>> Add an api for access to the index file.  Currently there is only a very
>> basic api for accessing the index file, which only allows a full read of
>> the index, and lets the users of the data filter it.  The new index api
>> gives the users the possibility to use only part of the index and
>> provides functions for iterating over and accessing cache entries.
>>
>> This simplifies future improvements to the in-memory format, as changes
>> will be concentrated on one file, instead of the whole git source code.
>>
>> Signed-off-by: Thomas Gummerer 
>> ---
>>  cache.h |  57 +-
>>  read-cache-v2.c |  96 +++--
>>  read-cache.c| 108 
>> 
>>  read-cache.h|  12 ++-
>>  4 files changed, 263 insertions(+), 10 deletions(-)
>>
>> diff --git a/cache.h b/cache.h
>> index 5082b34..d38dfbd 100644
>> --- a/cache.h
>> +++ b/cache.h
>> @@ -127,7 +127,8 @@ struct cache_entry {
>> unsigned int ce_flags;
>> unsigned int ce_namelen;
>> unsigned char sha1[20];
>> -   struct cache_entry *next;
>> +   struct cache_entry *next; /* used by name_hash */
>> +   struct cache_entry *next_ce; /* used to keep a list of cache entries */
>> char name[FLEX_ARRAY]; /* more */
>>  };
>
> From what I read, doing
>
> ce = start;
> while (ce) { do(something); ce = next_cache_entry(ce); }
>
> is the same as
>
> i = start_index;
> while (i < active_nr) { ce = active_cache[i]; do(something); i++; }
>
> What's the advantage of using the former over the latter? Do you plan
> to eliminate the latter loop (by hiding "struct cache_entry **cache;"
> from public index_state structure?

Yes, I wanted to eliminate the latter loop, because it depends on the
in-memory format of the index.  By moving all direct accesses of
index_state->cache to an api it gets easier to change the in-memory
format.  I played a bit with a tree-based in-memory format [1], which
represents the on-disk format of index-v5 more closely, making
modifications and partial-loading simpler.

I've tried switching all those loops to api calls, but that would make
the api too bloated because there is a lot of those loops.  I found it
more sensible to do it this way, leaving the loops how they are, while
making future changes to the in-memory format a lot simpler.

[1] https://github.com/tgummerer/git/blob/index-v5api/read-cache-v5.c#L17


Re: [PATCH 15/22] read-cache: read index-v5

2013-07-08 Thread Thomas Gummerer
Eric Sunshine  writes:

> On Sun, Jul 7, 2013 at 4:11 AM, Thomas Gummerer  wrote:
>> Make git read the index file version 5 without complaining.
>>
>> This version of the reader reads neither the cache-tree
>> nor the resolve undo data, but doesn't choke on an index that
>> includes such data.
>> ---
>> diff --git a/read-cache-v5.c b/read-cache-v5.c
>> new file mode 100644
>> index 000..e319f30
>> --- /dev/null
>> +++ b/read-cache-v5.c
>> @@ -0,0 +1,658 @@
>> +static struct directory_entry *read_directories(unsigned int *dir_offset,
>> +   unsigned int *dir_table_offset,
>> +   void *mmap,
>> +   int mmap_size)
>> +{
>> +   int i, ondisk_directory_size;
>> +   uint32_t *filecrc, *beginning, *end;
>> +   struct directory_entry *current = NULL;
>> +   struct ondisk_directory_entry *disk_de;
>> +   struct directory_entry *de;
>> +   unsigned int data_len, len;
>> +   char *name;
>> +
>> +   /* Length of pathname + nul byte for termination + size of
>> +* members of ondisk_directory_entry. (Just using the size
>> +* of the stuct doesn't work, because there may be padding
>
> s/stuct/struct/
>
>> +* bytes for the struct)
>> +*/
>
> Also:
>
>   /*
>* Format multi-line comment
>* like this.
>*/
>
> Remaining multi-line comments appear to be formatted correctly.

Thanks for catching this and the other typos.


Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option

2013-07-08 Thread Ramkumar Ramachandra
Duy Nguyen wrote:
> Ram, are you still interested in the awesome branch series?

Yep, but it got stalled due to lack of reviewer-interest :/

I'm a bit under the weather at the moment, but it's good to see that
you're back: let's finish this soon.

>>> Perhaps we need
>>>
>>>   git cat-file --batch-format="%(disk-size) %(object)"
>>>
>>> or similar.
>
> This is what I wanted to do with the in for-each-ref's pretty
> formatting [1]. I used to hack cat-file --batch to extract info I
> needed for experimenting with various pack index extensions. If you
> are not in hurry, maybe we can introduce something similar to your
> syntax, but applicable for all for-each-ref, branch and log family.

I'm still quite confused about this "grand plan".  We have short
commit-specific format specifiers that don't work with refs, among
several other quirks in [1].  I personally think we should absolutely
stay away from short format-specifiers (like %H, %f, %e; we'll soon
run out of letters, and nobody can tell what they are without the
documentation anyway) for the new options, and just start adding new
long-form ones as and when they are necessary.  I think refname:short,
upstream:track, upstream:trackshort are very sensible choices, and
that we should continue along that line.  I'm fine with
format-specifiers having meanings only in certain contexts as long as
we document it properly (how can we possibly get %(refname) to mean
something sensible in cat-file?).

As far as this series is concerned, I think Peff can implement %H and
%(object:[disk-]size) locally without worrying about code-sharing or
waiting for us.  Then, after the for-each-ref-pretty thing matures, we
can just replace the code underneath.


Re: [PATCH 2/4] name-rev: allow converting the exact object name at the tip of a ref

2013-07-08 Thread Ramkumar Ramachandra
Junio C Hamano wrote:
> "git name-rev" is supposed to convert 40-hex object names into
> strings that name the same objects based on refs, that can be fed to
> "git rev-parse" to get the same object names back, so
>
> $ git rev-parse v1.8.3 v1.8.3^0 | git name-rev --stdin
> 8af06057d0c31a24e8737ae846ac2e116e8bafb9
> edca4152560522a431a51fc0a06147fc680b5b18 (tags/v1.8.3^0)

Wait, what?

  $ git name-rev 8af060
  8af060 tags/v1.8.3^0

Isn't this a failure specific to --stdin?

> Teach it to show this instead:
>
> $ git rev-parse v1.8.3 v1.8.3^0 | git name-rev --stdin
> 8af06057d0c31a24e8737ae846ac2e116e8bafb9 (tags/v1.8.3)
> edca4152560522a431a51fc0a06147fc680b5b18 (tags/v1.8.3^0)

Wait, what is name-rev?

  Finds symbolic names suitable for human digestion for revisions
  given in any format parsable by git rev-parse.

It is meant to name _revisions_ (aka. commits): in that context, what
sense does it make to distinguish between tags/v1.8.3 and
tags/v1.8.3^0?


Re: [PATCH 05/22] read-cache: add index reading api

2013-07-08 Thread Duy Nguyen
On Mon, Jul 8, 2013 at 6:20 PM, Thomas Gummerer  wrote:
> Duy Nguyen  writes:
>> Putting filter_opts in index_state feels like a bad design. Iterator
>> information should be separated from the iterated object, so that two
>> callers can walk through the same index without stepping on each other
>> (I'm not talking about multithreading, a caller may walk a bit, then
>> the other caller starts walking, then the former caller resumes
>> walking again in a call chain).
>
> Yes, you're right.  We need the filter_opts to see what part of the
> index has been loaded [1] and which part has been skipped, but it
> shouldn't be used for filtering in the for_each_index_entry function.
>
> I think there should be two versions of the for_each_index_entry
> function then, where the for_each_index_entry function would behave the
> same way as the for_each_index_entry_filtered function with the
> filter_opts parameter set to NULL:
> for_each_index_entry_filtered(struct index_state *, each_cache_entry_fn, void *cb_data, struct filter_opts *)
> for_each_index_entry(struct index_state *, each_cache_entry_fn, void *cb_data)
>
> Both of them then should call index_change_filter_opts to make sure all
> the entries that are needed are loaded in the in-memory format.
>
> Does that make sense?

Hmm.. I was confused actually (documentation on the api would help
greatly). If you already filter at load time, I don't think you need
to match again. The caller asked for the filter and it should know what's
in there, so for_each_index_entry just goes through all entries without
an extra match_pathspec. Or is that what next_index_entry is for?
match_pathspec function could be expensive when glob is involved. If
the caller wants extra matching, it could do inside the callback
function.

It seems you could change the filter with index_change_filter_opts. In
v5 the index will be reloaded. What happens when some index entries
are already modified? Do we start to have read-only index "views" and
one read-write view? If partial views are always read-only, perhaps we
just allow the user to create a new index_state (or view) with new
filter and destroy the old one. We don't have to care about changing
or separating filter in that case because the view is the iterator.

I wanted to have a tree-based iterator api, but that seems
incompatible with pre-v5 (or at least adds some overhead on pre-v5 to
rebuild the tree structure). It looks like using pathspec to build a
list of entries, as you did, is a good way to take advantage of
tree-based v5 while maintaining code compatibility with pre-v5. By the
way with tree structure, you could use tree_entry_interesting in
read_index_filtered_v5. I think it's more efficient than
match_pathspec.

I'm still studying the code. Some of what I wrote here may be totally
wrong due to my lack of understanding. I'll get back to you later if I
find something else.

> [1] That is only important for the new index-v5 file format, which can
> be loaded partially.  The filter_opts could always be set to NULL,
> as the whole index is always loaded anyway.
--
Duy


Re: [PATCH 2/3] name-rev: strip trailing ^0 in when --name-only

2013-07-08 Thread Ramkumar Ramachandra
Junio C Hamano wrote:
> But I do not think "name-rev" is limited
> to commits, in the sense that you would see this:
>
> $ git rev-parse v1.8.3 v1.8.3^0 | git name-rev --stdin
> 8af06057d0c31a24e8737ae846ac2e116e8bafb9
> edca4152560522a431a51fc0a06147fc680b5b18 (tags/v1.8.3^0)
>
> The second object is _not_ v1.8.3 but is v1.8.3^0 in the context of
> name-rev, whose purpose is to give you a string you can feed
> "rev-parse" and get the object name back.  "rev-parse v1.8.3" will
> not give you the commit object name, so you need to keep "^0".

Quite frankly, I thought the unstripped ^0 in one codepath was an
unintended quirk.  What exactly do you want name-rev to give you?

  $ git tag foo @^
  $ git name-rev foo
  foo tags/foo

So you can distinguish between annotated tags, unannotated tags, and
head-refs.  Can you get it to tell you anything reliably though?

  $ git tag bar @
  $ git tag -a baz @
  $ git name-rev @
  $ git name-rev bar
  $ git name-rev baz

ref, annotated, or unannotated tag?  I do not think name-rev is
fundamentally different from describe: it is also only dependent on
the commit history graph.  Whether I specify a revision using @, HEAD,
baz, or bar, I should get the same answer (it's just a recursive
peeler).  I'm not sure what you gain by knowing the object type of the
output.  If you wanted to feed something into rev-parse and get out a
commit, you'd send in $REV^0 without bothering about what it is, no?


Re: A local shared clone is now much slower

2013-07-08 Thread Junio C Hamano
Jeff King  writes:

> On Mon, Jul 08, 2013 at 01:03:55PM +1000, Stephen Rothwell wrote:
>
>> So commit 0433ad128c59 ("clone: run check_everything_connected") (which
>> turned up with v1.8.3) added a large traversal to clone which (as the
>> comment said) makes a clone much slower.  It is especially noticeable on
>> "git clone -s -l -n" which I use every day and used to be almost
>> instant.  Is there any thought to making it fast again, please?
>> 
>> The above clone is very useful for working with different branches in one
>> tree without touching every file in the main branch you are working
>> with (and consequent issues with rebuilding at least).  As linux-next
>> maintainer, you can imagine that I do this a bit.
>
> Yeah, I have noticed it is somewhat annoying, as well, because the
> proportion of time taken for the check is so much larger compared to the
> relatively instant time taken for the local shared clone.
>
> The point of that commit is to add the same safety checks to clone that
> we do for fetching. But in the local shared-repo case, I really feel
> like all safety bets are off anyway. You are not creating a verified
> redundant copy at all, and there are still corruptions that can sneak
> through (e.g., bit corruptions of blob objects).

Yeah, I was thinking the same when I saw that report, so obviously I
think the approach makes sense ;-)

Thanks.

>
> So maybe this:
>
> -- >8 --
> Subject: [PATCH] clone: drop connectivity check for local clones
>
> Commit 0433ad1 (clone: run check_everything_connected,
> 2013-03-25) added the same connectivity check to clone that
> we use for fetching. The intent was to provide enough safety
> checks that "git clone git://..." could be counted on to
> detect bit errors and other repo corruption, and not
> silently propagate them to the clone.
>
> For local clones, this turns out to be a bad idea, for two
> reasons:
>
>   1. Local clones use hard linking (or even shared object
>  stores), and so complete far more quickly. The time
>  spent on the connectivity check is therefore
>  proportionally much more painful.
>
>   2. Local clones do not actually meet our safety guarantee
>  anyway. The connectivity check makes sure we have all
>  of the objects we claim to, but it does not check for
>  bit errors. We will notice bit errors in commits and
>  trees, but we do not load blob objects at all. Whereas
>  over the pack transport, we actually recompute the sha1
>  of each object in the incoming packfile; bit errors
>  change the sha1 of the object, which is then caught by
>  the connectivity check.
>
> This patch drops the connectivity check in the local case.
> Note that we have to revert the changes from 0433ad1 to
> t5710, as we no longer notice the corruption during clone.
>
> We could go a step further and provide a "verify even local
> clones" option, but it is probably not worthwhile. You can
> already spell that as "cd foo.git && git fsck && git clone ."
> or as "git clone --no-local foo.git".
>
> Signed-off-by: Jeff King 
> ---
>  builtin/clone.c   | 22 +-
>  t/t5710-info-alternate.sh |  8 +++-
>  2 files changed, 20 insertions(+), 10 deletions(-)
>
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 14b1323..dafb6b5 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -545,17 +545,20 @@ static void update_remote_refs(const struct ref *refs,
>  const struct ref *remote_head_points_at,
>  const char *branch_top,
>  const char *msg,
> -struct transport *transport)
> +struct transport *transport,
> +int check_connectivity)
>  {
>   const struct ref *rm = mapped_refs;
>  
> - if (0 <= option_verbosity)
> - printf(_("Checking connectivity... "));
> - if (check_everything_connected_with_transport(iterate_ref_map,
> -   0, &rm, transport))
> - die(_("remote did not send all necessary objects"));
> - if (0 <= option_verbosity)
> - printf(_("done\n"));
> + if (check_connectivity) {
> + if (0 <= option_verbosity)
> + printf(_("Checking connectivity... "));
> + if (check_everything_connected_with_transport(iterate_ref_map,
> +   0, &rm, transport))
> + die(_("remote did not send all necessary objects"));
> + if (0 <= option_verbosity)
> + printf(_("done\n"));
> + }
>  
>   if (refs) {
>   write_remote_refs(mapped_refs);
> @@ -963,7 +966,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>   transport_fetch_refs(transport, mapped_refs);
>  
>   update_remote_refs(refs, mapped_refs, remote_head_points

[RFC/PATCH] fetch: make --prune configurable

2013-07-08 Thread Michael Schubert
$gmane/201715 brought up the idea to fetch --prune by default.
Since --prune is a "potentially destructive operation" (Git doesn't
keep reflogs for deleted references yet), we don't want to prune
without the user's consent.

To accommodate users who want to prune always, or only when fetching
from a particular remote, add two new configuration variables
"fetch.prune" and "remote.<name>.prune":

 - "fetch.prune" allows to enable prune for all fetch operations.

 - "remote..prune" allows to change the behaviour per remote.

Signed-off-by: Michael Schubert 
---

http://thread.gmane.org/gmane.comp.version-control.git/201715


 Documentation/config.txt |  9 +
 builtin/fetch.c  | 28 +---
 remote.c |  2 ++
 remote.h |  1 +
 t/t5510-fetch.sh | 38 ++
 5 files changed, 75 insertions(+), 3 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index b4d4887..74e8026 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1067,6 +1067,10 @@ fetch.unpackLimit::
especially on slow filesystems.  If not set, the value of
`transfer.unpackLimit` is used instead.
 
+fetch.prune::
+   If true, fetch will automatically behave as if the `--prune`
+   option was given on the command line.
+
 format.attach::
Enable multipart/mixed attachments as the default for
'format-patch'.  The value can also be a double quoted string
@@ -2010,6 +2014,11 @@ remote.<name>.vcs::
Setting this to a value <vcs> will cause Git to interact with
the remote with the git-remote-<vcs> helper.
 
+remote.<name>.prune::
+   When set to true, fetching from this remote by default will also
+   remove any remote-tracking branches which no longer exist on the
+   remote (as if the `--prune` option was given on the command line).
+
remotes.<group>::
The list of remotes which are fetched by "git remote update
<group>".  See linkgit:git-remote[1].
diff --git a/builtin/fetch.c b/builtin/fetch.c
index d784b2e..3953317 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -30,7 +30,14 @@ enum {
TAGS_SET = 2
 };
 
-static int all, append, dry_run, force, keep, multiple, prune, update_head_ok, verbosity;
+enum {
+   PRUNE_UNSET = 0,
+   PRUNE_DEFAULT = 1,
+   PRUNE_FORCE = 2
+};
+
+static int prune = PRUNE_DEFAULT;
+static int all, append, dry_run, force, keep, multiple, update_head_ok, verbosity;
 static int progress = -1, recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
 static int tags = TAGS_DEFAULT, unshallow;
 static const char *depth;
@@ -54,6 +61,17 @@ static int option_parse_recurse_submodules(const struct option *opt,
return 0;
 }
 
+static int git_fetch_config(const char *k, const char *v, void *cb)
+{
+   if (!strcmp(k, "fetch.prune")) {
+   int boolval = git_config_bool(k, v);
+   if (boolval)
+   prune = PRUNE_FORCE;
+   return 0;
+   }
+   return git_default_config(k, v, cb);
+}
+
 static struct option builtin_fetch_options[] = {
OPT__VERBOSITY(&verbosity),
OPT_BOOLEAN(0, "all", &all,
@@ -770,7 +788,7 @@ static int do_fetch(struct transport *transport,
retcode = 1;
goto cleanup;
}
-   if (prune) {
+   if (prune == PRUNE_FORCE || (transport->remote->prune && prune)) {
		/* If --tags was specified, pretend the user gave us the canonical tags refspec */
if (tags == TAGS_SET) {
const char *tags_str = "refs/tags/*:refs/tags/*";
@@ -882,8 +900,10 @@ static void add_options_to_argv(struct argv_array *argv)
 {
if (dry_run)
argv_array_push(argv, "--dry-run");
-   if (prune)
+   if (prune == PRUNE_FORCE)
argv_array_push(argv, "--prune");
+   else if (prune == PRUNE_UNSET)
+   argv_array_push(argv, "--no-prune");
if (update_head_ok)
argv_array_push(argv, "--update-head-ok");
if (force)
@@ -1007,6 +1027,8 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
for (i = 1; i < argc; i++)
strbuf_addf(&default_rla, " %s", argv[i]);
 
+   git_config(git_fetch_config, NULL);
+
argc = parse_options(argc, argv, prefix,
 builtin_fetch_options, builtin_fetch_usage, 0);
 
diff --git a/remote.c b/remote.c
index 6f57830..e6f2acb 100644
--- a/remote.c
+++ b/remote.c
@@ -404,6 +404,8 @@ static int handle_config(const char *key, const char *value, void *cb)
remote->skip_default_update = git_config_bool(key, value);
else if (!strcmp(subkey, ".skipfetchall"))
remote->skip_default_update = git_config_bool(key, value);
+   else if (!strcmp(subkey, ".prune"))
+   remote->prune = git_config_bool(key, value);
else if (!strcmp(subkey, ".url")) {
 

Re: [PATCH 4/4] describe/name-rev: tell name-rev to peel the incoming object to commit first

2013-07-08 Thread Ramkumar Ramachandra
Junio C Hamano wrote:
> With this on top of the other patches in this series, you would get:
>
> $ git describe --contains $(git rev-parse v1.8.3 v1.8.3^0)
> v1.8.3
> v1.8.3
>
> while you can still differentiate tags and the commits they point at
> with:
>
> $ git name-rev --refs=tags/\* --name-only $(git rev-parse v1.8.3 v1.8.3^0)
> v1.8.3
> v1.8.3^0
>
> The difference in these two behaviours is achieved by adding --peel-to-commit
> option to "name-rev" and using it when "describe" internally calls it.

Essentially a revert of [2/4] for describe-purposes, achieved by
adding an ugly command-line option to name-rev.  Before we argue any
further, let me ask: who uses name-rev (and depends strongly on its
output)?!  Our very own testsuite does not exercise it.  There are
exactly two users of describe/name-rev:

1. prompt, obviously.
2. DAG-tests, for simplification.

I really can't imagine it being useful elsewhere.


Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option

2013-07-08 Thread Duy Nguyen
On Mon, Jul 8, 2013 at 7:00 PM, Ramkumar Ramachandra  wrote:
>> This is what I wanted to do with the in for-each-ref's pretty
>> formatting [1]. I used to hack cat-file --batch to extract info I
>> needed for experimenting with various pack index extensions. If you
>> are not in hurry, maybe we can introduce something similar to your
>> syntax, but applicable for all for-each-ref, branch and log family.
>
> I'm still quite confused about this "grand plan".  We have short
> commit-specific format specifiers that don't work with refs, among
> several other quirks in [1].  I personally think we should absolutely
> stay away from short format-specifiers (like %H, %f, %e; we'll soon
> run out of letters, and nobody can tell what they are without the
> documentation anyway) for the new options, and just start adding new
> long-form ones as and when they are necessary.  I think refname:short,
> upstream:track, upstream:trackshort are very sensible choices, and
> that we should continue along that line.  I'm fine with
> format-specifiers having meanings only in certain contexts as long as
> we document it properly (how can we possibly get %(refname) to mean
> something sensible in cat-file?).

The short/long naming is the least I worry about. We could add long
names to pretty specifiers. The thing about the last attempt is, you
add some extra things on top elsewhere, but format_commit_item code
may need to be aware of those changes, which are not obvious when
somebody just focuses on format_commit_item. Having all specifiers in
one place would be better (hence no hooks, no callbacks) because we
get a full picture. And yes we need to deal with specifiers that make
no sense in certain contexts.

All that makes changes bigger, but when format_commit_item (now just
"format_item") knows how to deal with all kinds of objects and refs,
it becomes a small declaration language to extract things out of git.
You can use it for pretty printing or mass extraction in the case of
"cat-file --batch". "cat-file --batch" is just some bonus on top,
mostly to exercise that we do it right.

> As far as this series is concerned, I think Peff can implement %H and
> %(object:[disk-]size) locally without worrying about code-sharing or
> waiting for us.  Then, after the for-each-ref-pretty thing matures, we
> can just replace the code underneath.

There's also syntax sharing. I don't think each command should have
its own syntax. f-e-r already has %(objectsize). If we plan to have a
common syntax, perhaps %(disk-size) should be %(objectsize:disk) or
something. Adding formatting to cat-file --batch from scratch could be
another big chunk of code (that also comes with bugs, usually) and may
or may not be compatible with the common syntax because of some
oversight. --batch-cols=... or --batch-disk-size would be simpler, but
we might never be able to remove that code.
--
Duy


Re: [PATCH 1/3] name-rev: fix assumption about --name-only usage

2013-07-08 Thread Ramkumar Ramachandra
Junio C Hamano wrote:
>> would get name-rev to print output in the same format as describe,
>>
>>   $ git describe --contains --all v1.8.3~1
>>   tags/v1.8.3~1
>>
>> would not strip the leading "tags/".
>
> If you _know_ v1.8.3 does not appear outside "tags/", this does look
> inconsistent, but I do not think the code checks it.  And if the
> code does not, I am not sure not stripping "tags/" is necessarily a
> bad thing, because "--all" allows names to come outside "tags/"
> hierarchy.

Yeah, you asked for it using --all.

> Also how should this interact with v1.8.3-1-g98c5c4a that changed
> the rule somewhat so that the common prefix is stripped when we know
> the result is not ambiguous?

Completely independent of everything else.  The condition is "if
name-only && prefix == refs/tags", strip that prefix.


[PATCH] fixup! git-remote-mediawiki: New git bin-wrapper for developement

2013-07-08 Thread Matthieu Moy
Signed-off-by: Matthieu Moy 
---
> > The colon after "make" and the indentation look weird. Shouldn't this be
> >
> > +# To build and test:
> > +#
> > +#   make
> > +#   bin-wrapper/git mw preview Some_page.mw
> > +#   bin-wrapper/git clone mediawiki::http://example.com/wiki/
> > +#
> >
> > ?
> 
> oops, yes definitely.

Junio, can you apply this fixup to bp/mediawiki-preview?

Thanks,

 contrib/mw-to-git/Makefile | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/contrib/mw-to-git/Makefile b/contrib/mw-to-git/Makefile
index 775cb07..76fcd4d 100644
--- a/contrib/mw-to-git/Makefile
+++ b/contrib/mw-to-git/Makefile
@@ -4,9 +4,9 @@
 #
 # To build and test:
 #
-#   make:
-# bin-wrapper/git mw preview Some_page.mw
-# bin-wrapper/git clone mediawiki::http://example.com/wiki/
+#   make
+#   bin-wrapper/git mw preview Some_page.mw
+#   bin-wrapper/git clone mediawiki::http://example.com/wiki/
 #
 # To install, run Git's toplevel 'make install' then run:
 #
-- 
1.8.3.1.495.g13f33cf.dirty



Re: [PATCH 05/22] read-cache: add index reading api

2013-07-08 Thread Thomas Gummerer
Duy Nguyen  writes:

> On Mon, Jul 8, 2013 at 6:20 PM, Thomas Gummerer  wrote:
>> Duy Nguyen  writes:
>>> Putting filter_opts in index_state feels like a bad design. Iterator
>>> information should be separated from the iterated object, so that two
>>> callers can walk through the same index without stepping on each other
>>> (I'm not talking about multithreading, a caller may walk a bit, then
>>> the other caller starts walking, then the former caller resumes
>>> walking again in a call chain).
>>
>> Yes, you're right.  We need the filter_opts to see what part of the
>> index has been loaded [1] and which part has been skipped, but it
>> shouldn't be used for filtering in the for_each_index_entry function.
>>
>> I think there should be two versions of the for_each_index_entry
>> function then, where the for_each_index_entry function would behave the
>> same way as the for_each_index_entry_filtered function with the
>> filter_opts parameter set to NULL:
>> for_each_index_entry_filtered(struct index_state *, each_cache_entry_fn, void *cb_data, struct filter_opts *)
>> for_each_index_entry(struct index_state *, each_cache_entry_fn, void *cb_data)
>>
>> Both of them then should call index_change_filter_opts to make sure all
>> the entries that are needed are loaded in the in-memory format.
>>
>> Does that make sense?
>
> Hmm.. I was confused actually (documentation on the api would help
> greatly). If you already filter at load time, I don't think you need
> to match again. The caller asked for a filter and it should know what's
> in there, so for_each_index_entry just goes through all entries without
> extra match_pathspec. Or is that what next_index_entry is for?
> The match_pathspec function could be expensive when glob is involved. If
> the caller wants extra matching, it could do so inside the callback
> function.

Yes, documentation would be good.  I'll try to write something better
later today, when I have some more time.  In the meantime I'll just
outline what the functions do here shortly:

read_index_filtered(opts): This method behaves differently for index-v2
  and index-v5.
  For index-v2 it simply reads the whole index as read_cache() does, so
  we are sure we don't have to reload anything if the user wants a
  different filter.
  For index-v5 it creates an adjusted pathspec and reads all
  directories that are matched by it.

get_index_entry_by_name(name, namelen, &ce): Returns a cache_entry
  matched by name via the &ce parameter.  If a cache_entry is matched
  exactly 1 is returned.
  Name may also be a path, in which case it returns 0 and the first
  cache_entry in that path. e.g. we have:
  ...
  path/file1
  
in the index and name is "path", then it returns 0 and the path/file1
cache_entry.  If name is "path/file1" on the other hand it returns 1
and the path/file1 cache_entry.

for_each_index_entry(fn, cb_data):  Iterates over all cache_entries in
  the index filtered by filter_opts in the index_state, and executes fn
  for each of them with the cb_data as callback data.

next_index_entry(ce): Returns the cache_entry that follows after ce

index_change_filter_opts(opts): For index-v2 it simply changes the
  filter_opts, so for_each_index_entry uses the changed index_opts.
  For index-v5 it refreshes the index if the filter_opts have changed.
  This has some optimization potential, in the case that the opts get
  stricter (less of the index should be read) it doesn't have to reload
  anything.
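
Put together, a caller of this API might look roughly like the untested
sketch below; the exact signatures are paraphrased from the descriptions
above and from my earlier mail, so they may not match the series exactly:

static int print_entry(struct cache_entry *ce, void *cb_data)
{
	printf("%s\n", ce->name);
	return 0;
}

static void list_matching_entries(const char **pathspec)
{
	struct filter_opts opts;

	memset(&opts, 0, sizeof(opts));
	opts.pathspec = pathspec;	/* restrict the in-memory index */

	/* index-v2 reads everything, index-v5 only the matching dirs */
	read_index_filtered(&the_index, &opts);

	for_each_index_entry(&the_index, print_entry, NULL);

	/* widen back to the full index before modifying anything */
	index_change_filter_opts(&the_index, NULL);
}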

I'm not sure what's in the cache, because the whole index is in the
cache if the on-disk format is index-v2 and the index is filtered by the
adjusted_pathspec if the on-disk format is index-v5.  That's what I need
the extra match_pathspec for. But yes, that could also be left to the
caller.

Hope that makes it a little clearer.

> It seems you could change the filter with index_change_filter_opts. In
> v5 the index will be reloaded. What happens when some index entries
> area already modified? Do we start to have read-only index "views" and
> one read-write view? If partial views are always read-only, perhaps we
> just allow the user to create a new index_state (or view) with new
> filter and destroy the old one. We don't have to care about changing
> or separating filter in that case because the view is the iterator.

The read-write part is mostly covered by the next patch (6/22).  Before
changing the index, the filter_opts always have to be set to NULL, using
index_change_filter_opts and therefore use the whole index.  This is
currently hard to improve, because we always need the whole index when
we write it.  Changing this only makes sense once we have partial
writing too.

So in principle the index_change_filter_opts function implements those
views.

Even with partial writing we have to distinguish if a cache_entry has
been added/removed, in which case a full rewrite is necessary, or if a
cache_entry has simply been modified (its content changed), in which
case we cou

Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option

2013-07-08 Thread Ramkumar Ramachandra
Duy Nguyen wrote:
> The short/long naming is the least I worry about. We could add long
> names to pretty specifiers. The thing about the last attempt is, you
> add some extra things on top elsewhere, but format_commit_item code
> may need to be aware of those changes, which are not obvious when
> somebody just focuses on format_commit_item. Having all specifiers in
> one place would be better (hence no hooks, no callbacks) because we
> get a full picture. And yes we need to deal with specifiers that make
> no sense in certain contexts.

Yeah, it would certainly be nice to have all the format-specifiers
that one unified parser acts on, but isn't this just a matter of
refactoring?  Shouldn't we be starting with cheap callbacks, get
things working, and guard against regressions in the refactoring phase
first?  How else do you propose to start out?

> There's also syntax sharing. I don't think each command should have
> its own syntax. f-e-r already has %(objectsize). If we plan to have a
> common syntax, perhaps %(disk-size) should be %(objectsize:disk) or
> something.

Of course.  I didn't notice %(objectsize); %(objectsize[:disk]) is a
fine suggestion.

> Adding formatting to cat-file --batch from scratch could be
> another big chunk of code (that also comes with bugs, usually) and may
> or may not be compatible with the common syntax because of some
> oversight.

Oh, I'm proposing that Peff implements just %H and
%(objectsize[:disk]) for _now_, because that's what he wants.  It
should be a tiny 20-line parser that's easy to swap out.

> --batch-cols=... or --batch-disk-size would be simpler, but
> we might never be able to remove that code.

Agreed.  The approach paints us into a design-corner, and must
therefore be avoided.


Re: git p4 clone not processing branches properly

2013-07-08 Thread Vitor Antunes
On Mon, Jul 8, 2013 at 12:10 PM, Matthieu Brucher
 wrote:
> Without the spec client, it seems that the branches are recognized,
> but there are so many binary files that I need to remove them during
> the migration.
> I tried setting a .gitignore beforehand, but it is not respected (I
> tried to remove some folders with folder/ in .gitignore, but the
> folders are still imported).
> Is there a switch for the import somewhere?

Hi Matthieu,

Unfortunately I've never tested the branch detection together with spec
configuration. But there is a test case for it in the code that refers
to the following question in StackOverflow:

http://stackoverflow.com/questions/11893688

Could you also tell us which version of git you are using?

Pete, maybe you can help Matthieu further on this question?

Thanks,
Vitor


Re: [PATCH 1/2] push: avoid suggesting "merging" remote changes

2013-07-08 Thread Matthieu Moy
John Keeping  writes:

>  static const char message_advice_pull_before_push[] =
>   N_("Updates were rejected because the tip of your current branch is behind\n"
> -"its remote counterpart. Merge the remote changes (e.g. 'git pull')\n"
> -"before pushing again.\n"
> +"its remote counterpart. Integrate the remote changes (e.g.\n"
> +"'git pull ...') before pushing again.\n"

To me, "merge" includes "rebase", so I'd say the merge -> integrate
change is not needed, but I have nothing against it either.

The "..." added are a bit weird with the quotes around. Quotes may
suggest that the content is to be taken literally, which is not the case
anymore. Not a real objection anyway, just thinking aloud.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


Re: [PATCH] prompt: do not double-discriminate detached HEAD

2013-07-08 Thread Ramkumar Ramachandra
Junio C Hamano wrote:
> +1; I find red on many terminal emulators to be too dark to tell,
> especially in a small font, from black myself.

It's a matter of taste anyway.  I hope everyone's not going colorblind
from writing too much C89 and Bourne shell ;)

Eduardo R. D'Avila wrote:
> I think color in terminals should be used to highlight and make it easier to 
> see
> textual information, not to replace them. So I would keep the parenthesis.

I largely agree, but there are a few exceptions.  Most notably, have
you noticed how, in addition to font-locking, scheme-mode replaces
lambda with λ?


[PATCH] Corrects an Abort Trap: 6 error with completions.

2013-07-08 Thread Steven Klass
Hi Folks,

Corrects an Abort Trap: 6 error with completions.
Fixed an issue where the two commands on a single line would cause a
strange, unrelated 'Abort trap: 6' error on non-git commands on Mac OS X 10.8.

Signed-off-by: Steven Klass 

---
 contrib/completion/git-completion.bash | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
 mode change 100644 => 100755 contrib/completion/git-completion.bash

diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
old mode 100644
new mode 100755
index 6c3bafe..d63b1ba
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -2165,7 +2165,8 @@ _git_config ()
   user.name
   user.signingkey
   web.browser
-  branch. remote.
+  branch.
+  remote.
  "
 }

---

Steven Klass

(480) 225-1112
skl...@7stalks.com
http://www.7stalks.com



Re: git p4 clone not processing branches properly

2013-07-08 Thread Matthieu Brucher
Hi,

I tried without spec, but then it tried importing everything, even
though there was a .gitignore and a .git/config/exclude file.
Then, it crashed during the import because it could not find an old
branch (I don't have access to everything on the repository), so I
tried importing just the recent past, but then it failed because it
identified a branch named Branch/Main/src...
It is starting to feel as if I will have to choose between something
that works but without branches and without the proper names (the
files are named Project/Branch/Main/...), or having the proper names
but with all the binaries and bogus branches...
I know it is not due to git, it is mainly that Perforce and git have
very different workflows. Or perhaps with any luck, the server is up
to date, and I can find a way of using Perforce's bridge.

Thanks,

Matthieu



2013/7/8 Vitor Antunes :
> On Mon, Jul 8, 2013 at 12:10 PM, Matthieu Brucher
>  wrote:
>> Without the spec client, it seems that the branches are recognized,
>> but there are so many binary files that I need to remove them during
>> the migration.
>> I tried setting a .gitignore beforehand, but it is not respected (I
>> tried to remove some folders with folder/ in .gitignore, but the
>> folders are still imported).
>> Is there a switch for the import somewhere?
>
> Hi Matthieu,
>
> Unfortunately I've never tested the branch detection together with spec
> configuration. But there is a test case for it in the code that refers
> to the following question in StackOverflow:
>
> http://stackoverflow.com/questions/11893688
>
> Could you also tell us which version of git you are using?
>
> Pete, maybe you can help Matthieu further on this question?
>
> Thanks,
> Vitor



-- 
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Music band: http://liliejay.com/


Re: [PATCH] t0000: do not use export X=Y

2013-07-08 Thread Thomas Rast
Torsten Bögershausen  writes:

> The shell syntax "export X=Y A=B" is not understood by all shells.
>
> Signed-off-by: Torsten Bögershausen 
> ---
>  t/t-basic.sh | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/t/t-basic.sh b/t/t-basic.sh
> index 5c32288..10be52b 100755
> --- a/t/t-basic.sh
> +++ b/t/t-basic.sh
> @@ -53,7 +53,8 @@ run_sub_test_lib_test () {
>   # Pretend we're a test harness.  This prevents
>   # test-lib from writing the counts to a file that will
>   # later be summarized, showing spurious "failed" tests
> - export HARNESS_ACTIVE=t &&
> + HARNESS_ACTIVE=t &&
> + export HARNESS_ACTIVE &&
>   cd "$name" &&
>   cat >"$name.sh" <<-EOF &&
>   #!$SHELL_PATH

Ack.  Sorry for breaking this -- I suppose test-lint would have caught
me out?

-- 
Thomas Rast
trast@{inf,student}.ethz.ch


Re: A local shared clone is now much slower

2013-07-08 Thread Junio C Hamano
Duy Nguyen  writes:

> On Mon, Jul 8, 2013 at 2:30 PM, Jeff King  wrote:
>> Subject: [PATCH] clone: drop connectivity check for local clones
>>
>> Commit 0433ad1 (clone: run check_everything_connected,
>> 2013-03-25) added the same connectivity check to clone that
>> we use for fetching. The intent was to provide enough safety
>> checks that "git clone git://..." could be counted on to
>> detect bit errors and other repo corruption, and not
>> silently propagate them to the clone.
>>
>> For local clones, this turns out to be a bad idea, for two
>> reasons:
>>
>>   1. Local clones use hard linking (or even shared object
>>  stores), and so complete far more quickly. The time
>>  spent on the connectivity check is therefore
>>  proportionally much more painful.
>
> There's also byte-to-byte copy when the system does not support hardlinks
> (or the user does not want it) but I guess it's safe to trust the OS
> to copy correctly in most cases.

While that may be true, I do not think it matters that much.  The
check during transport is meant to guard against not just corruption
during the object transfer, but also against a corrupt source
repository, and your trust on "cp -R" only covers the "transfer"
part.  And that makes 2. below very relevant.

>>   2. Local clones do not actually meet our safety guarantee
>>  anyway.
>>  ...
>
> Faster clones make everybody happy :-)

Yup.

I think this deserves to be backported to 'maint' track for
1.8.3.x.  Here is an attempt to do so.

 builtin/clone.c   | 11 +++
 t/t5710-info-alternate.sh |  8 +++-
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 035ab64..38a0a64 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -542,12 +542,15 @@ static void update_remote_refs(const struct ref *refs,
   const struct ref *mapped_refs,
   const struct ref *remote_head_points_at,
   const char *branch_top,
-  const char *msg)
+  const char *msg,
+  int check_connectivity)
 {
const struct ref *rm = mapped_refs;
 
-   if (check_everything_connected(iterate_ref_map, 0, &rm))
-   die(_("remote did not send all necessary objects"));
+   if (check_connectivity) {
+   if (check_everything_connected(iterate_ref_map, 0, &rm))
+   die(_("remote did not send all necessary objects"));
+   }
 
if (refs) {
write_remote_refs(mapped_refs);
@@ -950,7 +953,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
transport_fetch_refs(transport, mapped_refs);
 
update_remote_refs(refs, mapped_refs, remote_head_points_at,
-  branch_top.buf, reflog_msg.buf);
+  branch_top.buf, reflog_msg.buf, !is_local);
 
update_head(our_head_points_at, remote_head, reflog_msg.buf);
 
diff --git a/t/t5710-info-alternate.sh b/t/t5710-info-alternate.sh
index 8956c21..5a6e49d 100755
--- a/t/t5710-info-alternate.sh
+++ b/t/t5710-info-alternate.sh
@@ -58,7 +58,13 @@ test_expect_success 'creating too deep nesting' \
 git clone -l -s D E &&
 git clone -l -s E F &&
 git clone -l -s F G &&
-test_must_fail git clone --bare -l -s G H'
+git clone --bare -l -s G H'
+
+test_expect_success 'invalidity of deepest repository' \
+'cd H && {
+   test_valid_repo
+   test $? -ne 0
+}'
 
 cd "$base_dir"
 


Re: [PATCH 1/4] name-ref: factor out name shortening logic from name_ref()

2013-07-08 Thread Junio C Hamano
Michael Haggerty  writes:

> On 07/08/2013 12:33 AM, Junio C Hamano wrote:
>> The logic will be used in a new codepath for showing exact matches.
>> 
>> Signed-off-by: Junio C Hamano 
>> ---
>>  builtin/name-rev.c | 19 ---
>>  1 file changed, 12 insertions(+), 7 deletions(-)
>> 
>> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
>> index 87d4854..1234ebb 100644
>> --- a/builtin/name-rev.c
>> +++ b/builtin/name-rev.c
>> @@ -96,6 +96,17 @@ static int subpath_matches(const char *path, const char *filter)
>>  return -1;
>>  }
>>  
>> +static const char *name_ref_abbrev(const char *refname, int shorten_unambiguous)
>> +{
>> +if (shorten_unambiguous)
>> +refname = shorten_unambiguous_ref(refname, 0);
>> +else if (!prefixcmp(refname, "refs/heads/"))
>> +refname = refname + 11;
>> +else if (!prefixcmp(refname, "refs/"))
>> +refname = refname + 5;
>> +return refname;
>> +}
>> +
>
> In my opinion this would be a tad clearer if each of the branches of the
> "if" returned the value directly rather than setting refname and relying
> on the "return" statement that follows.  But it's probably a matter of
> taste.

I tend to agree; this is a straight-forward code movement (with the
variable name changed to conform to your recent refs.c update to
call these things "refname"), and that was the primary reason I kept
them like so.

>
>>  struct name_ref_data {
>>  int tags_only;
>>  int name_only;
>> @@ -134,13 +145,7 @@ static int name_ref(const char *path, const unsigned char *sha1, int flags, void
>>  if (o && o->type == OBJ_COMMIT) {
>>  struct commit *commit = (struct commit *)o;
>>  
>> -if (can_abbreviate_output)
>> -path = shorten_unambiguous_ref(path, 0);
>> -else if (!prefixcmp(path, "refs/heads/"))
>> -path = path + 11;
>> -else if (!prefixcmp(path, "refs/"))
>> -path = path + 5;
>> -
>> +path = name_ref_abbrev(path, can_abbreviate_output);
>>  name_rev(commit, xstrdup(path), 0, 0, deref);
>>  }
>>  return 0;
>> 
>
> Michael


Re: [PATCH 2/4] name-rev: allow converting the exact object name at the tip of a ref

2013-07-08 Thread Junio C Hamano
Ramkumar Ramachandra  writes:

>   Finds symbolic names suitable for human digestion for revisions
>   given in any format parsable by git rev-parse.
>
> It is meant to name _revisions_ (aka. commits):

That is mistaken documentation, written based on a half-baked
implementation that conflated "the current code can only handle
commits" with "we need to handle only commits and nothing else, ever".

We had to fix a similar misguided/short-sighted implementation in
the "git fetch" codepath when adding "git pull/merge $tag" (the code
peeled a tag object too early when writing FETCH_HEAD out, losing
information if what we were given was a tag or a commit).

I do not want to see us making the mistake worse unnecessarily.
When we see a commit object that is pointed by tag v1.8.3, it is the
right thing to do to return v1.8.3^0 as its string representation so
that "git rev-parse v1.8.3^0" give us the same thing back.  name-rev
that is fed a tag object v1.8.3 should give v1.8.3 (not v1.8.3^0)
back, otherwise feeding its output to "git rev-parse" will not give
us the same thing we fed as the input to name-rev.





[PATCH 1/2] t9902: fix 'test A == B' to use = operator

2013-07-08 Thread Thomas Rast
The == operator as an alias to = is not POSIX.  This doesn't actually
matter for the execution of the script, because it only runs when the
shell is bash.  However, it trips up test-lint, so it's nicer to use
the standard form.

Signed-off-by: Thomas Rast 
---
 t/t9902-completion.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh
index d9e3103..272a071 100755
--- a/t/t9902-completion.sh
+++ b/t/t9902-completion.sh
@@ -69,7 +69,7 @@ run_completion ()
local -a COMPREPLY _words
local _cword
_words=( $1 )
-   test "${1: -1}" == ' ' && _words+=('')
+   test "${1: -1}" = ' ' && _words+=('')
(( _cword = ${#_words[@]} - 1 ))
__git_wrap__git_main && print_comp
 }
-- 
1.8.3.2.947.g0347b11



Re: git subtree push-all and pull-all

2013-07-08 Thread Gareth Collins
Hello Fredrik,

Thanks for the suggestion! Adding in Paul Campbell and Herman Van Rink
who worked on this before.

thanks again,
Gareth

On Sun, Jul 7, 2013 at 8:54 AM, Fredrik Gustafsson  wrote:
> On Wed, Jul 03, 2013 at 03:56:36PM -0400, Gareth Collins wrote:
>> Hello,
>>
>> I see over the last year (on the web and in this mailing list) there
>> was some activity to extend subtree with a .gittrees file and
>> push-all/pull-all commands.
>>
>> Perhaps I missed it, but looking through the latest git code on the
>> github mirror I can't find any reference to the .gittrees file or
>> these commands.
>>
>> Does anyone know the status of this feature? Was it decided that this
>> was a bad idea and the feature has been rejected? Or is this a feature
>> still "cooking"...which will likely make it into git mainline at some
>> point?
>>
>> I ask because I would like to use something like this to be able to
>> keep a combined repository and separate project repositories in sync.
>> Of course, if it was decided that this feature is fundamentally a bad
>> idea then I will do something different.
>>
>> Any pointers would be a big help.
>>
>> thanks in advance,
>> Gareth Collins
>
> Still no answer to this? I suggest that you CC the persons discussing
> this the last time.
>
> --
> Med vänliga hälsningar
> Fredrik Gustafsson
>
> tel: 0733-608274
> e-post: iv...@iveqy.com


[PATCH 2/2] test-lint: detect 'export FOO=bar'

2013-07-08 Thread Thomas Rast
Some shells do not understand the one-line construct, and instead need

  FOO=bar &&
  export FOO

Detect this in the test-lint target.

Signed-off-by: Thomas Rast 
---

I wrote:

> Torsten Bögershausen  writes:
[...]
> > -   export HARNESS_ACTIVE=t &&
> > +   HARNESS_ACTIVE=t &&
> > +   export HARNESS_ACTIVE &&
[...]
> Ack.  Sorry for breaking this -- I suppose test-lint would have caught
> me out?

Well, no, not yet.


 t/check-non-portable-shell.pl | 1 +
 1 file changed, 1 insertion(+)

diff --git a/t/check-non-portable-shell.pl b/t/check-non-portable-shell.pl
index 8b5a71d..45971f4 100755
--- a/t/check-non-portable-shell.pl
+++ b/t/check-non-portable-shell.pl
@@ -21,6 +21,7 @@ while (<>) {
/^\s*declare\s+/ and err 'arrays/declare not portable';
/^\s*[^#]\s*which\s/ and err 'which is not portable (please use type)';
/test\s+[^=]*==/ and err '"test a == b" is not portable (please use =)';
+   /^\s*export\s+[^=]*=/ and err '"export FOO=bar" is not portable (please use FOO=bar && export FOO)';
# this resets our $. for each file
close ARGV if eof;
 }
-- 
1.8.3.2.947.g0347b11



Re: [PATCH] t0000: do not use export X=Y

2013-07-08 Thread Junio C Hamano
Thomas Rast  writes:

> Torsten Bögershausen  writes:
>
>> The shell syntax "export X=Y A=B" is not understood by all shells.
>>
>> Signed-off-by: Torsten Bögershausen 
>> ---
>>  t/t-basic.sh | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/t/t-basic.sh b/t/t-basic.sh
>> index 5c32288..10be52b 100755
>> --- a/t/t-basic.sh
>> +++ b/t/t-basic.sh
>> @@ -53,7 +53,8 @@ run_sub_test_lib_test () {
>>  # Pretend we're a test harness.  This prevents
>>  # test-lib from writing the counts to a file that will
>>  # later be summarized, showing spurious "failed" tests
>> -export HARNESS_ACTIVE=t &&
>> +HARNESS_ACTIVE=t &&
>> +export HARNESS_ACTIVE &&
>>  cd "$name" &&
>>  cat >"$name.sh" <<-EOF &&
>>  #!$SHELL_PATH
>
> Ack.  Sorry for breaking this -- I suppose test-lint would have caught
> me out?

Or I should have.  Thanks for catching.


Re: [PATCH 0/3] merge -Xindex-only

2013-07-08 Thread Michael Haggerty
[Resend because of address confusion in replied-to email.]

On 07/07/2013 08:00 PM, Thomas Rast wrote:
> I recently looked into making merge-recursive more useful as a modular
> piece in various tasks, e.g. Michael's git-imerge and the experiments
> I made in showing evil merges.
> 
> This miniseries is the extremely low-hanging fruit.  If it makes a
> good first step for git-imerge, perhaps it can go in like this.  It's
> not a big speedup (about 2.2s vs 2.4s in a sample conflicting merge in
> git.git), but it does feel much cleaner to avoid touching the worktree
> unless actually necessary.
> 
> Otherwise it's probably not worth it just yet; for what I want to do
> with it, we need some more reshuffling of things.

Interesting.

For git-imerge, it would be nice to speed up merges by skipping the
working tree updates.  10% might not be so noticeable, but every little
bit helps :-)

But the killer benefit would be if git-imerge could do some of its
automatic merge-testing and autofilling in the background while the user
is resolving conflicts in the main index and working tree.

Is it possible to use this option with an alternate index file (e.g.,
via the GIT_INDEX_FILE environment variable)?  Can it be made to leave
other shared state (e.g., MERGE_HEAD) alone?  If so, maybe it's already
there.

For this feature to be useful for test merges, it would be enough to
just get a retcode saying whether a given merge would succeed or conflict.

For it to be used for autofilling, it would also be necessary to be able
to create a commit from the merge result in the alternate index (this
would only be done when there are no conflicts).  I assume this part can
be done in the usual way using "git commit-tree".

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/


Re: [PATCH 4/4] pack-revindex: radix-sort the revindex

2013-07-08 Thread Shawn Pearce
On Mon, Jul 8, 2013 at 12:57 AM, Jeff King  wrote:
> On Sun, Jul 07, 2013 at 04:52:23PM -0700, Shawn O. Pearce wrote:
>
>> On Sun, Jul 7, 2013 at 3:14 AM, Jeff King  wrote:
>> > The pack revindex stores the offsets of the objects in the
>> > pack in sorted order, allowing us to easily find the on-disk
>> > size of each object. To compute it, we populate an array
>> > with the offsets from the sha1-sorted idx file, and then use
>> > qsort to order it by offsets.
>> >
>> > That does O(n log n) offset comparisons, and profiling shows
>> > that we spend most of our time in cmp_offset. However, since
>> > we are sorting on a simple off_t, we can use numeric sorts
>> > that perform better. A radix sort can run in O(k*n), where k
>> > is the number of "digits" in our number. For a 64-bit off_t,
>> > using 16-bit "digits" gives us k=4.
>>
>> Did you try the simple bucket sort Colby now uses in JGit?
>>
>> The sort is pretty simple:
>>
>>   bucket_size = pack_length / object_count;
>>   buckets[] = malloc(object_count * sizeof(int));
>>
>>   foreach obj in idx:
>> push_chain(buckets[obj.offset / bucket_size], obj.idx_nth);
>>
>>   foreach bucket:
>> insertion sort by offset
>
> I did do something similar (though I flattened my buckets into a single
> list afterwards), but I ended up closer to 700ms (down from 830ms, but
> with the radix sort around 200ms). It's entirely possible I screwed up
> something in the implementation (the bucket insertion can be done in a
> lot of different ways, many of which are terrible), but I didn't keep a
> copy of that attempt. If you try it and have better numbers, I'd be
> happy to see them.

Colby's sort in Java is coming in around 450ms for linux.git, so
sounds like your implementation was doing something suboptimal.

But as I thought about it this morning, a radix sort for most pack
files should run with k=2 and take only O(2*N) time. It is a very
efficient sort for the data. Colby and I didn't even try a radix sort,
and I suspect it would out-perform the bucket sort we do now.
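
A minimal, untested sketch of that kind of radix sort (LSD with 16-bit
digits, so at most four counting passes over a 64-bit off_t; the entry
type and names here are illustrative, not git's or JGit's actual
revindex structures):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct off_ent {
	uint64_t offset;	/* offset of the object in the pack */
	uint32_t nr;		/* its position in the .idx */
};

static void radix_sort_offsets(struct off_ent *a, size_t n)
{
	const unsigned bits = 16;
	const size_t nbuckets = (size_t)1 << bits;
	struct off_ent *tmp = malloc(n * sizeof(*tmp));
	size_t *pos = malloc(nbuckets * sizeof(*pos));
	unsigned shift;

	for (shift = 0; shift < 64; shift += bits) {
		size_t i, start = 0;

		memset(pos, 0, nbuckets * sizeof(*pos));
		for (i = 0; i < n; i++)		/* count each 16-bit digit */
			pos[(a[i].offset >> shift) & (nbuckets - 1)]++;
		for (i = 0; i < nbuckets; i++) {	/* counts -> start offsets */
			size_t count = pos[i];
			pos[i] = start;
			start += count;
		}
		for (i = 0; i < n; i++)		/* stable scatter by digit */
			tmp[pos[(a[i].offset >> shift) & (nbuckets - 1)]++] = a[i];
		memcpy(a, tmp, n * sizeof(*a));
	}
	free(pos);
	free(tmp);
}

For a pack below 4 GiB the two high passes only ever see zero digits and
degenerate into straight copies (that is where k=2 comes from), so a real
implementation would detect and skip them.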

>> We observed on linux.git that most buckets have an average number of
>> objects. IIRC the bucket_size was ~201 bytes and most buckets had very
>> few objects each. For lookups we keep the bucket_size parameter and a
>> bucket index table. This arrangement uses 8 bytes per object in the
>> reverse index, making it very memory efficient. Searches are typically
>> below O(log N) time because each bucket has 
> I didn't measure lookups at all; I was focused on time to build the
> index. So if there were benefits there that make up for a longer setup
> time, I wouldn't have measured them (of course, we also care about the
> case with few lookups, so it would be a tradeoff).

We didn't measure lookup times either. Colby did compute a histogram
of bucket sizes and showed nearly all buckets were significantly
smaller than log N, so lookups are

> You could also leave
> each bucket unsorted and only lazily sort it when a lookup hits the
> bucket, which might help that case (I didn't look to see if you do that
> in JGit).

We didn't do that in JGit, the sort is done at initialization. But
given the remark I just made about clones doing only a few lookups we
may want to defer the sort. IIRC the sort is about half of our
initialization cost.
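
For completeness, the bucket sort quoted above comes out to roughly the
following in C (untested sketch; types and names are illustrative, not
JGit's):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct rev_ent {
	uint64_t offset;
	uint32_t nr;
	struct rev_ent *next;	/* bucket chain */
};

static void bucket_sort_offsets(struct rev_ent *a, size_t n, uint64_t pack_length)
{
	uint64_t bucket_size;
	struct rev_ent **buckets, *out, *e;
	size_t i, filled = 0;

	if (!n)
		return;
	bucket_size = pack_length / n + 1;
	buckets = calloc(n, sizeof(*buckets));
	out = malloc(n * sizeof(*out));

	for (i = 0; i < n; i++) {	/* chain each entry into its bucket */
		size_t b = a[i].offset / bucket_size;
		a[i].next = buckets[b];
		buckets[b] = &a[i];
	}
	for (i = 0; i < n; i++)		/* buckets are tiny: insertion sort */
		for (e = buckets[i]; e; e = e->next) {
			size_t j = filled++;
			while (j && out[j - 1].offset > e->offset) {
				out[j] = out[j - 1];
				j--;
			}
			out[j] = *e;
		}
	memcpy(a, out, n * sizeof(*a));
	free(out);
	free(buckets);
}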


Re: [PATCH 0/3] merge -Xindex-only

2013-07-08 Thread Thomas Rast
Michael Haggerty  writes:

> [Resend because of address confusion in replied-to email.]
>
> On 07/07/2013 08:00 PM, Thomas Rast wrote:
>> I recently looked into making merge-recursive more useful as a modular
>> piece in various tasks, e.g. Michael's git-imerge and the experiments
>> I made in showing evil merges.
>> 
>> This miniseries is the extremely low-hanging fruit.  If it makes a
>> good first step for git-imerge, perhaps it can go in like this.  It's
>> not a big speedup (about 2.2s vs 2.4s in a sample conflicting merge in
>> git.git), but it does feel much cleaner to avoid touching the worktree
>> unless actually necessary.
>> 
>> Otherwise it's probably not worth it just yet; for what I want to do
>> with it, we need some more reshuffling of things.
>
> Interesting.
>
> For git-imerge, it would be nice to speed up merges by skipping the
> working tree updates.  10% might not be so noticeable, but every little
> bit helps :-)
>
> But the killer benefit would be if git-imerge could do some of its
> automatic merge-testing and autofilling in the background while the user
> is resolving conflicts in the main index and working tree.
>
> Is it possible to use this option with an alternate index file (e.g.,
> via the GIT_INDEX_FILE environment variable)?  Can it be made to leave
> other shared state (e.g., MERGE_HEAD) alone?  If so, maybe it's already
> there.

GIT_INDEX_FILE yes, that one works out of the box.

I think for the shared state, the following is a (probably) ridiculously
unsupported yet magic way of achieving this:

  mkdir -p unshared/.git
  cd unshared/.git
  for f in ../../.git/*; do
case "$f" in
  *HEAD | */index)
cp "$f" .
;;
  *)
ln -s "$f" .
;;
esac
  done

That gives you a repository that propagates ref changes and object
writing, but does not propagate changes to index, HEAD, FETCH_HEAD or
MERGE_HEAD.  Which might just be what you need?

Note that as far as I'm concerned, this is a live handgrenade.  It could
blow up in your face at any time, but it probably has its applications...

-- 
Thomas Rast
trast@{inf,student}.ethz.ch


expanding pack idx fanout table

2013-07-08 Thread Shawn Pearce
Has anyone studied the impact of converting the pack idx fanout table
from 256 entries to 65536 entries?

Back of the envelope estimates for 3.1M objects in linux.git suggests
a 2^16 fanout table would decrease the number of binary search
iterations from ~14 to ~6. The increased table costs an extra 255 KiB
of disk. On a 70M idx file this is noise.

I'm starting to wonder if increasing the fanout table once the object
count is above a certain threshold is a reasonable optimization for
larger repositories.
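
To make that concrete, the lookup would stay an ordinary binary search,
just over a window selected by the first two bytes of the object name
(untested sketch; fanout16[] is the hypothetical 2^16-entry cumulative
table and names[] the sorted SHA-1 table from the .idx):

#include <stdint.h>
#include <string.h>

static int find_object(const uint32_t *fanout16,
		       const unsigned char (*names)[20],
		       const unsigned char *sha1, uint32_t *pos)
{
	uint32_t prefix = ((uint32_t)sha1[0] << 8) | sha1[1];
	uint32_t lo = prefix ? fanout16[prefix - 1] : 0;
	uint32_t hi = fanout16[prefix];	/* last entry == total object count */

	while (lo < hi) {
		uint32_t mi = lo + (hi - lo) / 2;
		int cmp = memcmp(names[mi], sha1, 20);

		if (!cmp) {
			*pos = mi;
			return 0;
		}
		if (cmp < 0)
			lo = mi + 1;
		else
			hi = mi;
	}
	return -1;	/* not in this pack */
}

With 3.1M objects each window averages around 47 entries, which is where
the ~6 binary search iterations above come from.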


Re: [PATCH] fixup! git-remote-mediawiki: New git bin-wrapper for developement

2013-07-08 Thread Junio C Hamano
Matthieu Moy  writes:

> Signed-off-by: Matthieu Moy 
> ---
>> > The colon after "make" and the indentation look weird. Shouldn't this be
>> >
>> > +# To build and test:
>> > +#
>> > +#   make
>> > +#   bin-wrapper/git mw preview Some_page.mw
>> > +#   bin-wrapper/git clone mediawiki::http://example.com/wiki/
>> > +#
>> >
>> > ?
>> 
>> oops, yes definitely.
>
> Junio, can you apply this fixup to bp/mediawiki-preview?

Will do; thanks.


Re: [PATCH v2 w/prune index] remote.c: avoid O(m*n) behavior in match_push_refs

2013-07-08 Thread Junio C Hamano
Brandon Casey  writes:

> ...
> Using an index takes 41 ms longer, or roughly 7.8% longer.
>
> Jeff King measured a no-op push of a single ref into a remote repo
> with 370,000 refs:
>
> beforeafter
> real0m1.087s  0m1.156s
> user0m1.344s  0m1.412s
> sys 0m0.288s  0m0.284s
>
> Using an index takes 69 ms longer, or roughly 6.3% longer.
>
> None of the measurements above required transferring any objects to
> the remote repository.  If the push required transferring objects and
> updating the refs in the remote repository, the impact of preparing
> the search index would be even smaller.
>
> A similar operation is performed in the reverse direction when pruning
> using a matching or pattern refspec.  Let's avoid O(m*n) behavior in
> the same way by lazily preparing an index on the local refs.

Thanks.  Both the explanation and the code change makes sense to me.

Will queue.


Re: [PATCH v3 0/2] allow git-svn fetching to work using serf

2013-07-08 Thread Junio C Hamano
"Kyle J. McKay"  writes:

> From: "Kyle J. McKay" 
>
> This patch allows git-svn to fetch successfully using the
> serf library when given an https?: url to fetch from.
>
> Unfortunately some svn servers do not seem to be configured
> well for use with the serf library.  This can cause fetching
> to take longer compared to the neon library or actually
> cause timeouts during the fetch.  When timeouts occur
> git-svn can be safely restarted to fetch more revisions.
>
> A new temp_is_locked function has been added to Git.pm
> to facilitate using the minimal number of temp files
> possible when using serf.
>
> The problem that occurs when running git-svn fetch using
> the serf library is that the previously used temp file
> is not always unlocked before the next temp file needs
> to be used.
>
> To work around this problem, a new temp name is used
> if the temp name that would otherwise be chosen is
> currently locked.
>
> Version v2 of the patch introduced a bug when changing the _temp_cache
> function to use the new temp_is_locked function at the suggestion of a
> reviewer.  That has now been resolved.

Thanks; I've queued this version to 'pu' at least tentatively.

Is everybody who discussed the issue happy with the direction of
this patch?


Re: [PATCH 06/22] make sure partially read index is not changed

2013-07-08 Thread Junio C Hamano
Thomas Gummerer  writes:

> A partially read index file currently cannot be written to disk.  Make
> sure that never happens, by re-reading the index file if the index file
> wasn't read completely before changing the in-memory index.

I am not quite sure what you are trying to do.  

In operations that modify the index (replace_index_entry(),
remove_index_entry_at(), etc.)  you lift the filter_ops and keep
partially_read flag still on.  In the write-out codepath, you have
an assert to make sure the caller has cleared the partially_read
flag.  A natural way to clear the flag is to re-read the index from
the file, but then you can easily lose the modifications.  Should
there be another safety that says "calling read_index() with the
partially_read flag on is a bug" or something?

Also shouldn't the flag be cleared upon discard_index()?  If it is
done there, you probably would not need to clear it in read_index().
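
Something along these lines (untested; partially_read and cache_changed
as used in this series) would turn that failure mode into an explicit
bug instead of silent data loss:

/* called from the read path before discarding the in-memory state */
static void verify_reread_is_safe(const struct index_state *istate)
{
	if (istate->partially_read && istate->cache_changed)
		die("BUG: re-reading a partially read index would lose changes");
}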

>
> Signed-off-by: Thomas Gummerer 
> ---
>  builtin/update-index.c | 4 
>  cache.h| 4 +++-
>  read-cache-v2.c| 3 +++
>  read-cache.c   | 8 
>  4 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/update-index.c b/builtin/update-index.c
> index 5c7762e..03f6426 100644
> --- a/builtin/update-index.c
> +++ b/builtin/update-index.c
> @@ -49,6 +49,8 @@ static int mark_ce_flags(const char *path, int flag, int mark)
>   int namelen = strlen(path);
>   int pos = cache_name_pos(path, namelen);
>   if (0 <= pos) {
> + if (active_cache_partially_read)
> + cache_change_filter_opts(NULL);
>   if (mark)
>   active_cache[pos]->ce_flags |= flag;
>   else
> @@ -253,6 +255,8 @@ static void chmod_path(int flip, const char *path)
>   pos = cache_name_pos(path, strlen(path));
>   if (pos < 0)
>   goto fail;
> + if (active_cache_partially_read)
> + cache_change_filter_opts(NULL);
>   ce = active_cache[pos];
>   mode = ce->ce_mode;
>   if (!S_ISREG(mode))
> diff --git a/cache.h b/cache.h
> index d38dfbd..f6c3407 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -293,7 +293,8 @@ struct index_state {
>   struct cache_tree *cache_tree;
>   struct cache_time timestamp;
>   unsigned name_hash_initialized : 1,
> -  initialized : 1;
> +  initialized : 1,
> +  partially_read : 1;
>   struct hash_table name_hash;
>   struct hash_table dir_hash;
>   struct index_ops *ops;
> @@ -315,6 +316,7 @@ extern void free_name_hash(struct index_state *istate);
>  #define active_alloc (the_index.cache_alloc)
>  #define active_cache_changed (the_index.cache_changed)
>  #define active_cache_tree (the_index.cache_tree)
> +#define active_cache_partially_read (the_index.partially_read)
>  
>  #define read_cache() read_index(&the_index)
>  #define read_cache_from(path) read_index_from(&the_index, (path))
> diff --git a/read-cache-v2.c b/read-cache-v2.c
> index 1ed640d..2cc792d 100644
> --- a/read-cache-v2.c
> +++ b/read-cache-v2.c
> @@ -273,6 +273,7 @@ static int read_index_v2(struct index_state *istate, void *mmap,
>   src_offset += 8;
>   src_offset += extsize;
>   }
> + istate->partially_read = 0;
>   return 0;
>  unmap:
>   munmap(mmap, mmap_size);
> @@ -495,6 +496,8 @@ static int write_index_v2(struct index_state *istate, int newfd)
>   struct stat st;
>   struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
>  
> + if (istate->partially_read)
> + die("BUG: index: cannot write a partially read index");
>   for (i = removed = extended = 0; i < entries; i++) {
>   if (cache[i]->ce_flags & CE_REMOVE)
>   removed++;
> diff --git a/read-cache.c b/read-cache.c
> index b30ee75..4529fab 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -30,6 +30,8 @@ static void replace_index_entry(struct index_state *istate, int nr, struct cache
>  {
>   struct cache_entry *old = istate->cache[nr];
>  
> + if (istate->partially_read)
> + index_change_filter_opts(istate, NULL);
>   remove_name_hash(istate, old);
>   set_index_entry(istate, nr, ce);
>   istate->cache_changed = 1;
> @@ -467,6 +469,8 @@ int remove_index_entry_at(struct index_state *istate, int pos)
>  {
>   struct cache_entry *ce = istate->cache[pos];
>  
> + if (istate->partially_read)
> + index_change_filter_opts(istate, NULL);
>   record_resolve_undo(istate, ce);
>   remove_name_hash(istate, ce);
>   istate->cache_changed = 1;
> @@ -978,6 +982,8 @@ int add_index_entry(struct index_state *istate, struct cache_entry *ce, int opti
>  {
>   int pos;
>  
> + if (istate->partially_read)
> + index_change_filter_opts(istate, NULL);
>   if (option & ADD_CACHE_JUST_APPEND)
>   pos = istate->cache_nr;
>   else {
> @@ -1184,6 +119

Re: [PATCH 05/22] read-cache: add index reading api

2013-07-08 Thread Junio C Hamano
Thomas Gummerer  writes:

> Add an api for access to the index file.  Currently there is only a very
> basic api for accessing the index file, which only allows a full read of
> the index, and lets the users of the data filter it.  The new index api
> gives the users the possibility to use only part of the index and
> provides functions for iterating over and accessing cache entries.
>
> This simplifies future improvements to the in-memory format, as changes
> will be concentrated on one file, instead of the whole git source code.
>
> Signed-off-by: Thomas Gummerer 
> ---
>  cache.h |  57 +-
>  read-cache-v2.c |  96 +++--
>  read-cache.c| 108 
> 
>  read-cache.h|  12 ++-
>  4 files changed, 263 insertions(+), 10 deletions(-)
>
> diff --git a/cache.h b/cache.h
> index 5082b34..d38dfbd 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -127,7 +127,8 @@ struct cache_entry {
>   unsigned int ce_flags;
>   unsigned int ce_namelen;
>   unsigned char sha1[20];
> - struct cache_entry *next;
> + struct cache_entry *next; /* used by name_hash */
> + struct cache_entry *next_ce; /* used to keep a list of cache entries */

The reader often needs to rewind the read-pointer partially while
walking the index (e.g. next_cache_entry() in unpack-trees.c and how
the o->cache_bottom position is used throughout the subsystem).  I
am not sure if this singly-linked list is a good way to go.
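
If the iteration state were kept outside the index, e.g. something as
small as the following untested sketch (cache and cache_nr are the
existing index_state fields), a walker could be rewound or restarted
without touching the index itself:

struct index_iter {
	struct index_state *istate;
	unsigned int pos;	/* index of the next entry to return */
};

static struct cache_entry *index_iter_next(struct index_iter *it)
{
	if (it->pos >= it->istate->cache_nr)
		return NULL;
	return it->istate->cache[it->pos++];
}

Rewinding (e.g. back to a saved cache_bottom) would then be just
resetting pos, with no hidden state in index_state.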

> +/*
> + * Options by which the index should be filtered when read partially.
> + *
> + * pathspec: The pathspec which the index entries have to match
> + * seen: Used to return the seen parameter from match_pathspec()
> + * max_prefix, max_prefix_len: These variables are set to the longest
> + * common prefix, the length of the longest common prefix of the
> + * given pathspec

These probably should use the "struct pathspec" abstraction, not just the
"array of raw strings", no?



Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option

2013-07-08 Thread Junio C Hamano
Duy Nguyen  writes:

> There's also syntax sharing. I don't think each command should have
> its own syntax. f-e-r already has %(objectsize). If we plan to have a
> common syntax, perhaps %(disk-size) should be %(objectsize:disk) or
> something. Adding formatting to cat-file --batch from scratch could be
> another big chunk of code (that also comes with bugs, usually) and may
> or may not be compatible with the common syntax because of some
> oversight. --batch-cols=... or --batch-disk-size would be simpler, but
> we might never be able to remove that code.

True, but cat-file being low-level plumbing, I actually am not all
that convinced that it should even know about custom formatting.
Configurability and customizability may always look good, but that
is true only until one realizes that they come with an associated cost.


Re: [PATCH 1/2] t9902: fix 'test A == B' to use = operator

2013-07-08 Thread Junio C Hamano
Thomas Rast  writes:

> The == operator as an alias to = is not POSIX.  This doesn't actually
> matter for the execution of the script, because it only runs when the
> shell is bash.  However, it trips up test-lint, so it's nicer to use
> the standard form.

OK, my knee-jerk reaction was "this is only for bash" as you said,
but the test-lint part I agree with.

But then test-lint _ought_ to also catch the use of "local" in the
ideal world, so perhaps in the longer term we would need to treat
this bash-only script differently from others anyway???

>
> Signed-off-by: Thomas Rast 
> ---
>  t/t9902-completion.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh
> index d9e3103..272a071 100755
> --- a/t/t9902-completion.sh
> +++ b/t/t9902-completion.sh
> @@ -69,7 +69,7 @@ run_completion ()
>   local -a COMPREPLY _words
>   local _cword
>   _words=( $1 )
> - test "${1: -1}" == ' ' && _words+=('')
> + test "${1: -1}" = ' ' && _words+=('')
>   (( _cword = ${#_words[@]} - 1 ))
>   __git_wrap__git_main && print_comp
>  }


Re: [PATCH] Corrects an Abort Trap: 6 error with completions.

2013-07-08 Thread Junio C Hamano
Steven Klass  writes:

> Hi Folks,
>
>   Corrects an Abort Trap: 6 error with completions.
>   Fixed an issue where the two commands on a single line  would cause a 
> strange unrelated 'Abort trap: 6' error on  non-git commands on Mac OSX 10.8.
>
> Signed-off-by: Steven Klass 

Can you explain how/why the original causes "abort trap: 6"
(whatever it is) and how/why the updated one avoids it in the log
message?

It also is not quite clear when the error happens.  Do you mean, by
"non-git commands", something like:

$ ca

does not complete to "cal", "case", "cat", etc. and instead breaks
the shell?

I am confused.  The only change I can see in the patch is that it
makes the argument to this call to the __gitcomp shell function be a
string with tokens separated by LF and HT and no SP (the original
assumes that the tokens will be split by LF, HT or SP, and the shell
function locally sets $IFS to make sure that the change in this
patch does not make any difference).  And in many other places in
the same script, the __gitcomp shell function is called with an
argument with LF, HT or SP separated tokens, e.g.

_git_add ()
{
case "$cur" in
--*)
__gitcomp "
--interactive --refresh --patch --update --dry-run
--ignore-errors --intent-to-add
"
return
esac



>
> ---
>  contrib/completion/git-completion.bash | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>  mode change 100644 => 100755 contrib/completion/git-completion.bash
>
> diff --git a/contrib/completion/git-completion.bash 
> b/contrib/completion/git-completion.bash
> old mode 100644
> new mode 100755
> index 6c3bafe..d63b1ba
> --- a/contrib/completion/git-completion.bash
> +++ b/contrib/completion/git-completion.bash
> @@ -2165,7 +2165,8 @@ _git_config ()
>user.name
>user.signingkey
>web.browser
> -  branch. remote.
> +  branch.
> +  remote.
>   "
>  }
>
> ---
>
> Steven Klass
>
> (480) 225-1112
> skl...@7stalks.com
> http://www.7stalks.com


Re: expanding pack idx fanout table

2013-07-08 Thread Junio C Hamano
Shawn Pearce  writes:

> Has anyone studied the impact of converting the pack idx fanout table
> from 256 entries to 65536 entries?
>
> Back of the envelope estimates for 3.1M objects in linux.git suggests
> a 2^16 fanout table would decrease the number of binary search
> iterations from ~14 to ~6. The increased table costs an extra 255 KiB
> of disk. On a 70M idx file this is noise.
>
> I'm starting to wonder if increasing the fanout table once the object
> count is above a certain threshold is a reasonable optimization for
> larger repositories.

Yeah, and I do not think we have to be worried too much about
backward compatibility for .idx files, as they are local and can be
regenerated if an older version cannot read it.

I also wonder if we can generate a finer-grained fan-out table on
the fly, perhaps lazily, without changing the on-disk format ;-)


Re: [PATCH] Corrects an Abort Trap: 6 error with completions.

2013-07-08 Thread Steven Klass
Hi Junio,

First - Thanks so much for your reply!  

The original cause was simply running an unrelated command.
Specifically, I was trying to build some internal software.  Our
internal software uses a build tool from Perforce called jam
(http://www.perforce.com/resources/documentation/jam).  Similar to the
make build tool, it looks up a Jamfile (analogous to a Makefile) and
runs the recipe.  The issue is that running `jam` fatally aborts right
from the start with an unintuitive `abort trap: 6` error.

As you can well imagine, figuring out what was causing this was pretty
difficult, because the error was both cryptic and completely
unrelated.  I started backing out my environment until I was able to
narrow it down to the bash completions file.  Further investigation
finally nailed it down to the concatenation on the `branch. remote.`
line 2165.  In summary, when the two tokens were joined on a single
line the error appeared; when they were on separate lines everything
worked as expected.

Again - my only change was to shift them ( branch. remote. ) onto two
separate lines.

HTH

---

Steven Klass

(480) 225-1112
skl...@7stalks.com
http://www.7stalks.com

On Jul 8, 2013, at 10:32 AM, Junio C Hamano  wrote:

> Steven Klass  writes:
> 
>> Hi Folks,
>> 
>>  Corrects an Abort Trap: 6 error with completions.
>>  Fixed an issue where the two commands on a single line  would cause a 
>> strange unrelated 'Abort trap: 6' error on  non-git commands on Mac OSX 10.8.
>> 
>> Signed-off-by: Steven Klass 
> 
> Can you explain how/why the original causes "abort trap: 6"
> (whatever it is) and how/why the updated one avoids it in the log
> message?
> 
> It also is not quite clear when the error happens.  Do you mean, by
> "non-git commands", something like:
> 
>$ ca
> 
> does not complete to "cal", "case", "cat", etc. and instead breaks
> the shell?
> 
> I am confused.  The only change I can see in the patch is that it
> makes the argument to this call to the __gitcomp shell function be a
> string with tokens separated by LF and HT and no SP (the original
> assumes that the tokens will be split by LF, HT or SP, and the shell
> function locally sets $IFS to make sure that the change in this
> patch does not make any difference).  And in many other places in
> the same script, the __gitcomp shell function is called with an
> argument with LF, HT or SP separated tokens, e.g.
> 
>_git_add ()
>{
>case "$cur" in
>--*)
>__gitcomp "
>--interactive --refresh --patch --update --dry-run
>--ignore-errors --intent-to-add
>"
>return
>esac
> 
> 
> 
>> 
>> ---
>> contrib/completion/git-completion.bash | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>> mode change 100644 => 100755 contrib/completion/git-completion.bash
>> 
>> diff --git a/contrib/completion/git-completion.bash 
>> b/contrib/completion/git-completion.bash
>> old mode 100644
>> new mode 100755
>> index 6c3bafe..d63b1ba
>> --- a/contrib/completion/git-completion.bash
>> +++ b/contrib/completion/git-completion.bash
>> @@ -2165,7 +2165,8 @@ _git_config ()
>>   user.name
>>   user.signingkey
>>   web.browser
>> -  branch. remote.
>> +  branch.
>> +  remote.
>>  "
>> }
>> 
>> ---
>> 
>> Steven Klass
>> 
>> (480) 225-1112
>> skl...@7stalks.com
>> http://www.7stalks.com



Re: expanding pack idx fanout table

2013-07-08 Thread Shawn Pearce
On Mon, Jul 8, 2013 at 10:37 AM, Junio C Hamano  wrote:
> Shawn Pearce  writes:
>
>> Has anyone studied the impact of converting the pack idx fanout table
>> from 256 entries to 65536 entries?
>>
>> Back of the envelope estimates for 3.1M objects in linux.git suggests
>> a 2^16 fanout table would decrease the number of binary search
>> iterations from ~14 to ~6. The increased table costs an extra 255 KiB
>> of disk. On a 70M idx file this is noise.
>>
>> I'm starting to wonder if increasing the fanout table once the object
>> count is above a certain threshold is a reasonable optimization for
>> larger repositories.
>
> Yeah, and I do not think we have to be worried too much about
> backward compatibility for .idx files, as they are local and can be
> regenerated if an older version cannot read it.
>
> I also wonder if we can generate a finer-grained fan-out table on
> the fly, perhaps lazily, without changing the on-disk format ;-)

Heh. Yes. The reader at runtime could expand the 256 fanout table to a
65536 fanout table without changing the on disk format. Unfortunately
doing the expansion will require O(N) time as the 2nd byte of each
SHA-1 must be examined from every bucket. If you are going to perform
O(N) lookups this expansion might save time as it lowers the "log N"
bound of the O(N log N) algorithm for N lookups. It doesn't help
enough when doing only N/20 lookups.
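
For illustration, here is a minimal sketch of that lazy expansion,
outside of git's actual code.  It assumes a hypothetical sha1_at()
accessor that returns the i-th sorted 20-byte object name, and builds
a cumulative 2^16-entry table from the first two name bytes in a
single O(N) pass:

    #include <stdint.h>
    #include <stdlib.h>

    extern const unsigned char *sha1_at(uint32_t i);  /* hypothetical accessor */

    static uint32_t *build_fanout16(uint32_t nr_objects)
    {
            uint32_t *fanout16 = calloc(65536, sizeof(*fanout16));
            uint32_t i;

            if (!fanout16)
                    return NULL;

            /* count objects per 16-bit prefix; touches every name once */
            for (i = 0; i < nr_objects; i++) {
                    const unsigned char *sha1 = sha1_at(i);
                    fanout16[(sha1[0] << 8) | sha1[1]]++;
            }

            /* turn counts into cumulative end offsets, like the idx fanout */
            for (i = 1; i < 65536; i++)
                    fanout16[i] += fanout16[i - 1];

            return fanout16;
    }

A lookup would then binary search only the slice between
fanout16[prefix - 1] and fanout16[prefix], which is where the ~6
iteration estimate comes from.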


Re: [PATCH 06/22] make sure partially read index is not changed

2013-07-08 Thread Thomas Gummerer
Junio C Hamano  writes:

> Thomas Gummerer  writes:
>
>> A partially read index file currently cannot be written to disk.  Make
>> sure that never happens, by re-reading the index file if the index file
>> wasn't read completely before changing the in-memory index.
>
> I am not quite sure what you are trying to do.
>
> In operations that modify the index (replace_index_entry(),
> remove_index_entry_at(), etc.)  you lift the filter_ops and keep
> partially_read flag still on.  In the write-out codepath, you have
> an assert to make sure the caller has cleared the partially_read
> flag.  A natural way to clear the flag is to re-read the index from
> the file, but then you can easily lose the modifications.
>
> Also shouldn't the flag be cleared upon discard_index()?  If it is
> done there, you probably would not need to clear it in read_index().

Hrm, maybe the code isn't quite clear enough here, or maybe this patch
should come directly before 16/22 (read-cache: read index-v5) to make
it clearer.

The flag is always set to 0 in read_index_v2, as the whole index is
always read and therefore it never needs to be reset.  With
read_index_v5 on the other hand the flag is set when the filter_opts are
different than NULL.

But thinking about it, the flag is actually not necessary at all.  The
filter_opts should simply be checked for NULL for the assert and they
should also be set to NULL on discard_index.  Will fix this in the next
version.  Thanks.
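
A minimal sketch of that direction (hypothetical names, not the actual
series) could look like this, with a non-NULL filter_opts standing in
for "partially read":

    struct filter_opts;                       /* opaque for this sketch */

    struct sketch_index_state {
            struct filter_opts *filter_opts;  /* NULL means fully read */
    };

    static void sketch_discard_index(struct sketch_index_state *istate)
    {
            istate->filter_opts = NULL;  /* a discarded index is never partial */
    }

    static int sketch_can_write_index(const struct sketch_index_state *istate)
    {
            return !istate->filter_opts; /* the write-out path asserts on this */
    }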

> Should
> there be another safety that says "calling read_index() with the
> partially_read flag on is a bug" or something?

I'm not sure.  I think it doesn't hurt, as we discard the index when
we change the index_ops.  At the moment I can't think of a case where
read_index() would be called when the index is partially read without
discarding the cache first.  I'll add it in the next version.

>>
>> Signed-off-by: Thomas Gummerer 
>> ---
>>  builtin/update-index.c | 4 
>>  cache.h| 4 +++-
>>  read-cache-v2.c| 3 +++
>>  read-cache.c   | 8 
>>  4 files changed, 18 insertions(+), 1 deletion(-)
>>
>> diff --git a/builtin/update-index.c b/builtin/update-index.c
>> index 5c7762e..03f6426 100644
>> --- a/builtin/update-index.c
>> +++ b/builtin/update-index.c
>> @@ -49,6 +49,8 @@ static int mark_ce_flags(const char *path, int flag, int 
>> mark)
>>  int namelen = strlen(path);
>>  int pos = cache_name_pos(path, namelen);
>>  if (0 <= pos) {
>> +if (active_cache_partially_read)
>> +cache_change_filter_opts(NULL);
>>  if (mark)
>>  active_cache[pos]->ce_flags |= flag;
>>  else
>> @@ -253,6 +255,8 @@ static void chmod_path(int flip, const char *path)
>>  pos = cache_name_pos(path, strlen(path));
>>  if (pos < 0)
>>  goto fail;
>> +if (active_cache_partially_read)
>> +cache_change_filter_opts(NULL);
>>  ce = active_cache[pos];
>>  mode = ce->ce_mode;
>>  if (!S_ISREG(mode))
>> diff --git a/cache.h b/cache.h
>> index d38dfbd..f6c3407 100644
>> --- a/cache.h
>> +++ b/cache.h
>> @@ -293,7 +293,8 @@ struct index_state {
>>  struct cache_tree *cache_tree;
>>  struct cache_time timestamp;
>>  unsigned name_hash_initialized : 1,
>> - initialized : 1;
>> + initialized : 1,
>> + partially_read : 1;
>>  struct hash_table name_hash;
>>  struct hash_table dir_hash;
>>  struct index_ops *ops;
>> @@ -315,6 +316,7 @@ extern void free_name_hash(struct index_state *istate);
>>  #define active_alloc (the_index.cache_alloc)
>>  #define active_cache_changed (the_index.cache_changed)
>>  #define active_cache_tree (the_index.cache_tree)
>> +#define active_cache_partially_read (the_index.partially_read)
>>
>>  #define read_cache() read_index(&the_index)
>>  #define read_cache_from(path) read_index_from(&the_index, (path))
>> diff --git a/read-cache-v2.c b/read-cache-v2.c
>> index 1ed640d..2cc792d 100644
>> --- a/read-cache-v2.c
>> +++ b/read-cache-v2.c
>> @@ -273,6 +273,7 @@ static int read_index_v2(struct index_state *istate, 
>> void *mmap,
>>  src_offset += 8;
>>  src_offset += extsize;
>>  }
>> +istate->partially_read = 0;
>>  return 0;
>>  unmap:
>>  munmap(mmap, mmap_size);
>> @@ -495,6 +496,8 @@ static int write_index_v2(struct index_state *istate, 
>> int newfd)
>>  struct stat st;
>>  struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
>>
>> +if (istate->partially_read)
>> +die("BUG: index: cannot write a partially read index");
>>  for (i = removed = extended = 0; i < entries; i++) {
>>  if (cache[i]->ce_flags & CE_REMOVE)
>>  removed++;
>> diff --git a/read-cache.c b/read-cache.c
>> index b30ee75..4529fab 100644
>> --- a/read-cache.c
>> +++ b/read-cache.c
>> @@ -30,6 +30,8 @@ static void replace_index_entry(struct index_state 
>> *istate, int nr,

Re: [RFC/PATCH] fetch: make --prune configurable

2013-07-08 Thread Junio C Hamano
Michael Schubert  writes:

> $gmane/201715 brought up the idea to fetch --prune by default.

When you can summarize it in a few lines, e.g.

Without "git fetch --prune", remote-tracking branches for a branch
the other side already has removed will stay forever.  Some people
want to always run "git fetch --prune".

please refrain from forcing people to go to the web while reading
logs.

> Since --prune is a "potentially destructive operation" (Git doesn't
> keep reflogs for deleted references yet), we don't want to prune
> without users consent.
>
> To accommodate users who want to either prune always or when fetching
> from a particular remote, add two new configuration variables
> "fetch.prune" and "remote..prune":
>
>  - "fetch.prune" allows to enable prune for all fetch operations.
>
>  - "remote..prune" allows to change the behaviour per remote.
>

Add:

"git fetch --no-prune" from the command line will defeat the
configured default for safety.

(I didn't check if your patch already does so, though).

Other than that, the log message looks good.

Thanks for starting to work on this.

> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index b4d4887..74e8026 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -1067,6 +1067,10 @@ fetch.unpackLimit::
>   especially on slow filesystems.  If not set, the value of
>   `transfer.unpackLimit` is used instead.
>  
> +fetch.prune::
> + If true, fetch will automatically behave as if the `--prune`
> + option was given on the command line.
> +
>  format.attach::
>   Enable multipart/mixed attachments as the default for
>   'format-patch'.  The value can also be a double quoted string
> @@ -2010,6 +2014,11 @@ remote.<name>.vcs::
>   Setting this to a value <vcs> will cause Git to interact with
>   the remote with the git-remote-<vcs> helper.
>  
> +remote.<name>.prune::
> + When set to true, fetching from this remote by default will also
> + remove any remote-tracking branches which no longer exist on the
> + remote (as if the `--prune` option was give on the command line).

We may want to say something about the interaction between the two
variables.  E.g. fetch.prune=true, remote.origin.prune=false would
hopefully not prune when you are fetching from your 'origin', and
fetch.prune=false, remote.origin.prune=true would.
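
Just to illustrate the precedence being discussed (command line over
remote.<name>.prune over fetch.prune), a sketch of how the resolved
value could be computed once both config values and the option have
been read; this is an illustration only, not what the patch currently
does:

    /* each input is -1 when unset, otherwise 0 or 1 */
    static int resolve_prune(int cmdline, int remote_prune, int fetch_prune)
    {
            if (cmdline >= 0)       /* --prune / --no-prune wins */
                    return cmdline;
            if (remote_prune >= 0)  /* remote.<name>.prune */
                    return remote_prune;
            if (fetch_prune >= 0)   /* fetch.prune */
                    return fetch_prune;
            return 0;               /* default: do not prune */
    }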

> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index d784b2e..3953317 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -30,7 +30,14 @@ enum {
>   TAGS_SET = 2
>  };
>  
> -static int all, append, dry_run, force, keep, multiple, prune, 
> update_head_ok, verbosity;
> +enum {
> + PRUNE_UNSET = 0,
> + PRUNE_DEFAULT = 1,
> + PRUNE_FORCE = 2
> +};
> +
> +static int prune = PRUNE_DEFAULT;

I find this unconventional in that usually _UNSET means "the user
hasn't explicitly said anything about what she wants" (hence
typically a variable is initialized to that value).  Also I am not
sure what "FORCE" means.

If this were 

enum {
PRUNE_UNSET = -1,
PRUNE_NO = 0,
PRUNE_YES = 1
};

then I would understand, but at that point, that is a typical
setting for a boolean variable, so we could just use -1/0/1.

> +static int all, append, dry_run, force, keep, multiple, update_head_ok, 
> verbosity;
>  static int progress = -1, recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
>  static int tags = TAGS_DEFAULT, unshallow;
>  static const char *depth;
> @@ -54,6 +61,17 @@ static int option_parse_recurse_submodules(const struct 
> option *opt,
>   return 0;
>  }
>  
> +static int git_fetch_config(const char *k, const char *v, void *cb)
> +{
> + if (!strcmp(k, "fetch.prune")) {
> + int boolval = git_config_bool(k, v);
> + if (boolval)
> + prune = PRUNE_FORCE;
> + return 0;

This is not good, is it?  Imagine fetch.prune=true in ~/.gitconfig
and fetch.prune=false in $GIT_DIR/config; I'd expect the more
specific one to set "prune" back to non-FORCE value.

As you do not have transport available before you process
parse_options(), I think you need two variables, "prune" that is
used to determine what happens (same as in the code before your
patch) and a new one "fetch_prune_config" that records what we read
from the fetch.prune configuration.

So I _suspect_ the interaction between the configuration parser and
the command line option parser should look more like this, in order
to implement the correct order of precedence:

static int fetch_prune_config = -1; /* unspecified */
static int prune = -1; /* unspecified */
#define PRUNE_BY_DEFAULT 0

...

/* set "fetch_prune_config" */
git_config(git_fetch_config);
--> git_fetch_config():
if (!strcmp(k, "fetch.prune")) {
fetch_prune_config = git_config_bool(k, v);
return 0;
}

...

/* 

Re: [PATCH 1/2] push: avoid suggesting "merging" remote changes

2013-07-08 Thread John Keeping
On Mon, Jul 08, 2013 at 03:47:19PM +0200, Matthieu Moy wrote:
> John Keeping  writes:
> 
> >  static const char message_advice_pull_before_push[] =
> > N_("Updates were rejected because the tip of your current branch is 
> > behind\n"
> > -  "its remote counterpart. Merge the remote changes (e.g. 'git 
> > pull')\n"
> > -  "before pushing again.\n"
> > +  "its remote counterpart. Integrate the remote changes (e.g.\n"
> > +  "'git pull ...') before pushing again.\n"
> 
> To me, "merge" includes "rebase", so I'd say the merge -> integrate
> change is not needed, but I have nothing against it either.

Yes, I agree that in some sense "merge" does include "rebase" but I
suspect some people read it to mean "git merge" and saying "integrate"
removes that potential source of confusion.

> The "..." added are a bit weird with the quotes around. Quotes may
> suggest that the content is to be taken literally, which is not the case
> anymore. Not a real objection anyway, just thinking aloud.

I hadn't thought of it that way, but I wonder how else we can delimit
the command.


Re: [RFC/PATCH] fetch: make --prune configurable

2013-07-08 Thread John Keeping
On Mon, Jul 08, 2013 at 02:56:57PM +0200, Michael Schubert wrote:
> $gmane/201715 brought up the idea to fetch --prune by default.
> Since --prune is a "potentially destructive operation" (Git doesn't
> keep reflogs for deleted references yet), we don't want to prune
> without users consent.
> 
> To accommodate users who want to either prune always or when fetching
> from a particular remote, add two new configuration variables
> "fetch.prune" and "remote..prune":
> 
>  - "fetch.prune" allows to enable prune for all fetch operations.
> 
>  - "remote..prune" allows to change the behaviour per remote.

Should this be "remote.<name>.pruneFetch"?  I'd quite like to be able to
configure --prune for git-push as well (I just haven't got around to
actually doing anything about it yet...) and it might be better to be
explicit in the remote.<name> section from the start.

I'm not sure it's necessary since we already have "remote" and
"pushremote" so we could have "prune" and "pushprune" but perhaps it's
worth considering.


Re: [PATCH 2/2] test-lint: detect 'export FOO=bar'

2013-07-08 Thread Torsten Bögershausen
On 2013-07-08 17.20, Thomas Rast wrote:
> Some shells do not understand the one-line construct, and instead need
> 
>   FOO=bar &&
>   export FOO
> 
> Detect this in the test-lint target.
> 
> Signed-off-by: Thomas Rast 
> ---
> 
> I wrote:
> 
>> Torsten Bögershausen  writes:
> [...]
>>> -   export HARNESS_ACTIVE=t &&
>>> +   HARNESS_ACTIVE=t &&
>>> +   export HARNESS_ACTIVE &&
> [...]
>> Ack.  Sorry for breaking this -- I suppose test-lint would have caught
>> me out?
> 
> Well, no, not yet.
Thanks for working on this

> 
> 
>  t/check-non-portable-shell.pl | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/t/check-non-portable-shell.pl b/t/check-non-portable-shell.pl
> index 8b5a71d..45971f4 100755
> --- a/t/check-non-portable-shell.pl
> +++ b/t/check-non-portable-shell.pl
> @@ -21,6 +21,7 @@ while (<>) {
>   /^\s*declare\s+/ and err 'arrays/declare not portable';
>   /^\s*[^#]\s*which\s/ and err 'which is not portable (please use type)';
>   /test\s+[^=]*==/ and err '"test a == b" is not portable (please use =)';
> + /^\s*export\s+[^=]*=/ and err '"export FOO=bar" is not portable (please 
> use FOO=bar && export FOO)';
I have a slightly tighter regexp in my tree, but credit should go to Thomas:

/^\s*export\s+\S+=\S+/

/Torsten







Re: [PATCH v2 1/2] commit: reject invalid UTF-8 codepoints

2013-07-08 Thread Junio C Hamano
Peter Krefting  writes:

> brian m. carlson:
>
> +/* U+FFFE and U+FFFF are guaranteed non-characters. */
> +if ((codepoint & 0x1ffffe) == 0xfffe)
>> +return bad_offset;
>
> I missed this the first time around: All Unicode characters whose
> lower 16-bits are FFFE or FFFF are non-characters, so you can re-write
> that to:
>
>   /* U+xxFFFE and U+xxFFFF are guaranteed non-characters. */
>   if ((codepoint & 0xfffe) == 0xfffe)
>        return bad_offset;
>
> Also, the range U+FDD0--U+FDEF are also non-characters, if you wish to
> be really pedantic.

Yeah, while we are at it, doing this may not hurt.  I think Brian's
two patches are in fairly good shape otherwise, so perhaps you can
do this as a follow-up patch on top of the tip of the topic,
e82bd6cc (commit: reject overlong UTF-8 sequences, 2013-07-04)?
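
For reference, a self-contained sketch of what such a follow-up check
could look like (noncharacters are U+FDD0..U+FDEF plus any code point
whose low 16 bits are FFFE or FFFF); this is only an illustration of
the idea, not the patch itself:

    #include <stdint.h>

    static int is_unicode_noncharacter(uint32_t codepoint)
    {
            if (codepoint >= 0xFDD0 && codepoint <= 0xFDEF)
                    return 1;
            return (codepoint & 0xFFFE) == 0xFFFE;  /* U+xxFFFE and U+xxFFFF */
    }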



Re: [BUG] git svn geotrust certificate problem

2013-07-08 Thread Fredrik Gustafsson
On Fri, Jul 05, 2013 at 07:16:01PM +0400, Ilya Holinov wrote:
> I have an svn repository on https signed with a GeoTrust-issued certificate.
> Every time I try to access this repository I get this message:
> 
> $ git svn rebase
> Error validating server certificate for 'https://svn.egspace.ru:443':
>  - The certificate is not issued by a trusted authority. Use the
>fingerprint to validate the certificate manually!
> Certificate information:
>  - Hostname: *.egspace.ru
>  - Valid: from Apr 28 01:38:17 2013 GMT until Apr 30 12:00:40 2014 GMT
>  - Issuer: GeoTrust, Inc., US
>  - Fingerprint: b2:8d:f8:3b:7c:d2:a2:36:e2:1d:c3:5c:56:ec:87:6f:22:3e:4b:a8
> Certificate problem.
> (R)eject, accept (t)emporarily or accept (p)ermanently? p
> Authentication realm:  VisualSVN Server
> Username: holinov
> Password for 'holinov':
> 
> Even if I choose permanently, every subsequent attempt to access it hits
> the same issue. And this happens on svn rebase for every commit. I mean,
> if I have 10 commits in the local repository I will be asked about the
> cert and the user login\password for every one of them (and that is very
> annoying).
> But if I use TortoiseSVN I have no problem with that cert check.
> 
> P.S.: I'm using Windows 8 x64.
> P.P.S: I like git very much, but in this case it makes it impossible for
> me to work this way.

This isn't really my thing to answer; I don't know Windows well enough.
However, since you still haven't gotten an answer, I'll give it a try.

Please see the following link:
https://confluence.atlassian.com/display/SOURCETREEKB/Resolving+SSL+Self-Signed+Certificate+Errors

-- 
Best regards,
Fredrik Gustafsson

tel: 0733-608274
e-mail: iv...@iveqy.com


Re: [PATCH 05/22] read-cache: add index reading api

2013-07-08 Thread Thomas Gummerer
Junio C Hamano  writes:

> Thomas Gummerer  writes:
>
>> Add an api for access to the index file.  Currently there is only a very
>> basic api for accessing the index file, which only allows a full read of
>> the index, and lets the users of the data filter it.  The new index api
>> gives the users the possibility to use only part of the index and
>> provides functions for iterating over and accessing cache entries.
>>
>> This simplifies future improvements to the in-memory format, as changes
>> will be concentrated on one file, instead of the whole git source code.
>>
>> Signed-off-by: Thomas Gummerer 
>> ---
>>  cache.h |  57 +-
>>  read-cache-v2.c |  96 +++--
>>  read-cache.c| 108 
>> 
>>  read-cache.h|  12 ++-
>>  4 files changed, 263 insertions(+), 10 deletions(-)
>>
>> diff --git a/cache.h b/cache.h
>> index 5082b34..d38dfbd 100644
>> --- a/cache.h
>> +++ b/cache.h
>> @@ -127,7 +127,8 @@ struct cache_entry {
>>  unsigned int ce_flags;
>>  unsigned int ce_namelen;
>>  unsigned char sha1[20];
>> -struct cache_entry *next;
>> +struct cache_entry *next; /* used by name_hash */
>> +struct cache_entry *next_ce; /* used to keep a list of cache entries */
>
> The reader often needs to rewind the read-pointer partially while
> walking the index (e.g. next_cache_entry() in unpack-trees.c and how
> the o->cache_bottom position is used throughout the subsystem).  I
> am not sure if this singly-linked list is a good way to go.

I'm not very familiar with the unpack-trees code, but from a quick look
the pointer (or position in the cache) is always only moved forward.  A
problem I do see though is skipping a number of entries at once.  An
example for that below:
    int matches;
    matches = cache_tree_matches_traversal(o->src_index->cache_tree,
                                           names, info);
    /*
     * Everything under the name matches; skip the
     * entire hierarchy.  diff_index_cached codepath
     * special cases D/F conflicts in such a way that
     * it does not do any look-ahead, so this is safe.
     */
    if (matches) {
            o->cache_bottom += matches;
            return mask;
    }

This could probably be transformed into something like
skip_cache_tree_matches(cache-tree, names, info);

I'll take some time to familiarize myself with the unpack-trees code to
see if I can find a better solution than this, and if there are more
pitfalls.

>> +/*
>> + * Options by which the index should be filtered when read partially.
>> + *
>> + * pathspec: The pathspec which the index entries have to match
>> + * seen: Used to return the seen parameter from match_pathspec()
>> + * max_prefix, max_prefix_len: These variables are set to the longest
>> + * common prefix, the length of the longest common prefix of the
>> + * given pathspec
>
> These probably should use "struct pathspec" abstraction, not just the
> "array of raw strings", no?

Yes, thanks, that's probably a good idea.


Re: [PATCH 4/4] pack-revindex: radix-sort the revindex

2013-07-08 Thread Brandon Casey
On Sun, Jul 7, 2013 at 3:14 AM, Jeff King  wrote:
> The pack revindex stores the offsets of the objects in the
> pack in sorted order, allowing us to easily find the on-disk
> size of each object. To compute it, we populate an array
> with the offsets from the sha1-sorted idx file, and then use
> qsort to order it by offsets.
>
> That does O(n log n) offset comparisons, and profiling shows
> that we spend most of our time in cmp_offset. However, since
> we are sorting on a simple off_t, we can use numeric sorts
> that perform better. A radix sort can run in O(k*n), where k
> is the number of "digits" in our number. For a 64-bit off_t,
> using 16-bit "digits" gives us k=4.
>
> On the linux.git repo, with about 3M objects to sort, this
> yields a 400% speedup. Here are the best-of-five numbers for
> running "echo HEAD | git cat-file --batch-disk-size", which
> is dominated by time spent building the pack revindex:
>
>   before after
>   real0m0.834s   0m0.204s
>   user0m0.788s   0m0.164s
>   sys 0m0.040s   0m0.036s
>
> On a smaller repo, the radix sort would not be
> as impressive (and could even be worse), as we are trading
> the log(n) factor for the k=4 of the radix sort. However,
> even on git.git, with 173K objects, it shows some
> improvement:
>
>   before after
>   real0m0.046s   0m0.017s
>   user0m0.036s   0m0.012s
>   sys 0m0.008s   0m0.000s
>
> Signed-off-by: Jeff King 
> ---
> I think there are probably still two potential issues here:
>
>   1. My while() loop termination probably has issues when we have to use
>  all 64 bits to represent the pack offset (not likely, but...)
>
>   2. We put "int pos[65536]" on the stack. This is a little big, but is
>  probably OK, as I think the usual small stack problems we have seen
>  are always in threaded code. But it would not be a big deal to heap
>  allocate it (it would happen once per radix step, which is only 4
>  times for the whole sort).
>
>  pack-revindex.c | 77 
> +
>  1 file changed, 72 insertions(+), 5 deletions(-)
>
> diff --git a/pack-revindex.c b/pack-revindex.c
> index 77a0465..d2adf36 100644
> --- a/pack-revindex.c
> +++ b/pack-revindex.c
> @@ -59,11 +59,78 @@ static int cmp_offset(const void *a_, const void *b_)
> /* revindex elements are lazily initialized */
>  }
>
> -static int cmp_offset(const void *a_, const void *b_)
> +/*
> + * This is a least-significant-digit radix sort using a 16-bit "digit".
> + */
> +static void sort_revindex(struct revindex_entry *entries, int n, off_t max)

If 'n' is the number of objects in the pack, shouldn't it be unsigned?

The data type for struct packed_git.num_objects is uint32_t.  Looks
like create_pack_revindex uses the wrong datatype when it captures
num_objects in the int num_ent and passes it to sort_revindex.  So, it
looks like that function needs to be updated too.

>  {
> -   const struct revindex_entry *a = a_;
> -   const struct revindex_entry *b = b_;
> -   return (a->offset < b->offset) ? -1 : (a->offset > b->offset) ? 1 : 0;
> +   /*
> +* We need O(n) temporary storage, so we sort back and forth between
> +* the real array and our tmp storage. To keep them straight, we 
> always
> +* sort from "a" into buckets in "b".
> +*/
> +   struct revindex_entry *tmp = xcalloc(n, sizeof(*tmp));
> +   struct revindex_entry *a = entries, *b = tmp;
> +   int digits = 0;
> +
> +   /*
> +* We want to know the bucket that a[i] will go into when we are using
> +* the digit that is N bits from the (least significant) end.
> +*/
> +#define BUCKET_FOR(a, i, digits) ((a[i].offset >> digits) & 0xffff)
> +
> +   while (max / (((off_t)1) << digits)) {

Is there any reason this shouldn't be simplified to just:

   while (max >> digits) {

I glanced briefly at the assembly and it appears that gcc does
actually emit a divide instruction to accomplish this, which I think
we can avoid by just rearranging the operation.
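
Either spelling still needs to keep "digits" below the width of off_t,
since shifting by the full width is undefined in C.  A minimal
illustration of a termination condition that stays defined even if
offsets ever need all 64 bits (a sketch, not the patch's code):

    #include <stdint.h>

    static int radix_passes(uint64_t max)
    {
            int digits = 0, passes = 0;

            while (digits < 64 && (max >> digits)) {
                    passes++;       /* one 16-bit counting pass would go here */
                    digits += 16;
            }
            return passes;          /* at most 4 passes for a 64-bit offset */
    }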

> +   struct revindex_entry *swap;
> +   int i;
> +   int pos[65536] = {0};
> +
> +   /*
> +* We want pos[i] to store the index of the last element that
> +* will go in bucket "i" (actually one past the last element).
> +* To do this, we first count the items that will go in each
> +* bucket, which gives us a relative offset from the last
> +* bucket. We can then cumulatively add the index from the
> +* previous bucket to get the true index.
> +*/
> +   for (i = 0; i < n; i++)
> +   pos[BUCKET_FOR(a, i, digits)]++;
> +   for (i = 1; i < ARRAY_SIZE(pos); i++)
> +   pos[i] += pos[i-1];
> +
> +   /*
> +* Now we can drop the elements in

[PATCH 5.5/22] Add documentation for the index api

2013-07-08 Thread Thomas Gummerer
Document the new index api and add examples of how it should be used
instead of the old functions directly accessing the index.

Helped-by: Nguyễn Thái Ngọc Duy 
Signed-off-by: Thomas Gummerer 
---

Duy Nguyen  writes:

> Hmm.. I was confused actually (documentation on the api would help
> greatly).

As promised, a draft for a documentation for the index api as it is in
this series.

Documentation/technical/api-in-core-index.txt | 108 +-
 1 file changed, 106 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/api-in-core-index.txt 
b/Documentation/technical/api-in-core-index.txt
index adbdbf5..5269bb1 100644
--- a/Documentation/technical/api-in-core-index.txt
+++ b/Documentation/technical/api-in-core-index.txt
@@ -1,14 +1,116 @@
 in-core index API
 =

+Reading API
+---
+
+`read_index()`::
+   Read the whole index file from disk.
+
+`index_name_pos(name, namelen)`::
+   Find a cache_entry with name in the index.  Returns pos if an
+   entry is matched exactly and -pos-1 if an entry is matched
+   partially.
+   e.g.
+   index:
+   file1
+   file2
+   path/file1
+   zzz
+
+   index_name_pos("path/file1", 10) returns 2, while
+   index_name_pos("path", 4) returns -1
+
+`read_index_filtered(opts)`::
+   This method behaves differently for index-v2 and index-v5.
+
+   For index-v2 it simply reads the whole index as read_index()
+   does, so we are sure we don't have to reload anything if the
+   user wants a different filter.  It also sets the filter_opts
+   in the index_state, which is used to limit the results when
+   iterating over the index with for_each_index_entry().
+
+   The whole index is read to avoid the need to eventually
+   re-read the index later, because the performance is no
+   different when reading it partially.
+
+   For index-v5 it creates an adjusted_pathspec to filter the
+   reading.  First all the directory entries are read and then
+   the cache_entries in the directories that match the adjusted
+   pathspec are read.  The filter_opts in the index_state are set
+   to filter out the rest of the cache_entries that are matched
+   by the adjusted pathspec but not by the pathspec given.  The
+   rest of the index entries are filtered out when iterating over
+   the cache with for_each_index_entries.
+
+`get_index_entry_by_name(name, namelen, &ce)`::
+   Returns a cache_entry matched by the name, returned via the
+   &ce parameter.  If a cache entry is matched exactly, 1 is
+   returned, otherwise 0.  For an example see index_name_pos().
+   This function should be used instead of the index_name_pos()
+   function to retrieve cache entries.
+
+`for_each_index_entry(fn, cb_data)`::
+   Iterates over all cache_entries in the index filtered by
+   filter_opts in the index_stat.  For each cache entry fn is
+   executed with cb_data as callback data.  From within the loop
+   do `return 0` to continue, or `return 1` to break the loop.
+
+`next_index_entry(ce)`::
+   Returns the cache_entry that follows after ce
+
+`index_change_filter_opts(opts)`::
+   This function again has a slightly different functionality for
+   index-v2 and index-v5.
+
+   For index-v2 it simply changes the filter_opts, so
+   for_each_index_entry uses the changed index_opts, to iterate
+   over a different set of cache entries.
+
+   For index-v5 it refreshes the index if the filter_opts have
+   changed and sets the new filter_opts in the index state, again
+   to iterate over a different set of cache entries as with
+   index-v2.
+
+   This has some optimization potential, in the case that the
+   opts get stricter (less of the index should be read) it
+   doesn't have to reload anything, but currently does.
+
+Using the new index api
+---
+
+Currently loops over a specific set of index entry were written as:
+  i = start_index;
+  while (i < active_nr) { ce = active_cache[i]; do(something); i++; }
+
+they should be rewritten to:
+  ce = start;
+  while (ce) { do(something); ce = next_cache_entry(ce); }
+
+which is the equivalent operation but hides the in-memory format of
+the index from the user.
+
+For getting a cache entry get_cache_entry_by_name() should be used
+instead of cache_name_pos(). e.g.:
+  int pos = cache_name_pos(name, namelen);
+  struct cache_entry *ce = active_cache[pos];
+  if (pos < 0) { do(something) }
+  else { do(somethingelse) }
+
+should be written as:
+  struct cache_entry *ce;
+  int ret = get_cache_entry_by_name(name, namelen, &ce);
+  if (!ret) { do(something) }
+  else { do(somethingelse) }
+
+TODO
+
 Talk about  and , things like:

 * cache -> the_index macros
-* read_index()
 * write_index()
 * ie_match_stat() and ie_modified(); how they are different and when to
   use which.
-* index_name_pos(

[PATCH] cache.h: move remote/connect API out of it

2013-07-08 Thread Junio C Hamano
The definition of "struct ref" in "cache.h", a header file so
central to the system, always confused me.  This structure is not
about the local ref used by sha1-name API to name local objects.

It is what refspecs are expanded into, after finding out what refs
the other side has, to define what refs are updated after object
transfer succeeds to what values.  It belongs to "remote.h" together
with "struct refspec".

While we are at it, also move the types and functions related to the
Git transport connection to a new header file connect.h

Signed-off-by: Junio C Hamano 
---

 * I hate to to this kind of code-movement in the middle of the
   cycle, but every time I follow the push->transport codepath, I
   become disoriented by these "struct ref"s.

 builtin/fetch-pack.c   |  2 ++
 builtin/receive-pack.c |  1 +
 builtin/send-pack.c|  1 +
 cache.h| 62 --
 connect.c  |  1 +
 fetch-pack.c   |  1 +
 fetch-pack.h   |  1 +
 refs.c |  8 ---
 remote.c   |  8 +++
 remote.h   | 54 +++
 send-pack.c|  1 +
 transport.c|  2 ++
 transport.h|  1 +
 upload-pack.c  |  1 +
 14 files changed, 74 insertions(+), 70 deletions(-)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index aba4465..c6888c6 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -1,6 +1,8 @@
 #include "builtin.h"
 #include "pkt-line.h"
 #include "fetch-pack.h"
+#include "remote.h"
+#include "connect.h"
 
 static const char fetch_pack_usage[] =
 "git fetch-pack [--all] [--stdin] [--quiet|-q] [--keep|-k] [--thin] "
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index e3eb5fc..7434d9b 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -8,6 +8,7 @@
 #include "commit.h"
 #include "object.h"
 #include "remote.h"
+#include "connect.h"
 #include "transport.h"
 #include "string-list.h"
 #include "sha1-array.h"
diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index 152c4ea..e86d3b5 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -5,6 +5,7 @@
 #include "sideband.h"
 #include "run-command.h"
 #include "remote.h"
+#include "connect.h"
 #include "send-pack.h"
 #include "quote.h"
 #include "transport.h"
diff --git a/cache.h b/cache.h
index dd0fb33..cb2891d 100644
--- a/cache.h
+++ b/cache.h
@@ -1035,68 +1035,6 @@ struct pack_entry {
struct packed_git *p;
 };
 
-struct ref {
-   struct ref *next;
-   unsigned char old_sha1[20];
-   unsigned char new_sha1[20];
-   char *symref;
-   unsigned int
-   force:1,
-   forced_update:1,
-   deletion:1,
-   matched:1;
-
-   /*
-* Order is important here, as we write to FETCH_HEAD
-* in numeric order. And the default NOT_FOR_MERGE
-* should be 0, so that xcalloc'd structures get it
-* by default.
-*/
-   enum {
-   FETCH_HEAD_MERGE = -1,
-   FETCH_HEAD_NOT_FOR_MERGE = 0,
-   FETCH_HEAD_IGNORE = 1
-   } fetch_head_status;
-
-   enum {
-   REF_STATUS_NONE = 0,
-   REF_STATUS_OK,
-   REF_STATUS_REJECT_NONFASTFORWARD,
-   REF_STATUS_REJECT_ALREADY_EXISTS,
-   REF_STATUS_REJECT_NODELETE,
-   REF_STATUS_REJECT_FETCH_FIRST,
-   REF_STATUS_REJECT_NEEDS_FORCE,
-   REF_STATUS_UPTODATE,
-   REF_STATUS_REMOTE_REJECT,
-   REF_STATUS_EXPECTING_REPORT
-   } status;
-   char *remote_status;
-   struct ref *peer_ref; /* when renaming */
-   char name[FLEX_ARRAY]; /* more */
-};
-
-#define REF_NORMAL (1u << 0)
-#define REF_HEADS  (1u << 1)
-#define REF_TAGS   (1u << 2)
-
-extern struct ref *find_ref_by_name(const struct ref *list, const char *name);
-
-#define CONNECT_VERBOSE   (1u << 0)
-extern struct child_process *git_connect(int fd[2], const char *url, const 
char *prog, int flags);
-extern int finish_connect(struct child_process *conn);
-extern int git_connection_is_socket(struct child_process *conn);
-struct extra_have_objects {
-   int nr, alloc;
-   unsigned char (*array)[20];
-};
-extern struct ref **get_remote_heads(int in, char *src_buf, size_t src_len,
-struct ref **list, unsigned int flags,
-struct extra_have_objects *);
-extern int server_supports(const char *feature);
-extern int parse_feature_request(const char *features, const char *feature);
-extern const char *server_feature_value(const char *feature, int *len_ret);
-extern const char *parse_feature_value(const char *feature_list, const char 
*feature, int *len_ret);
-
 extern struct packed_git *parse_pack_index(unsigned char *sha1, const char 
*idx_path);
 
 /* A hook for count-objects to report in

Re: [PATCH 4/4] pack-revindex: radix-sort the revindex

2013-07-08 Thread Brandon Casey
On Mon, Jul 8, 2013 at 1:50 PM, Brandon Casey  wrote:
> On Sun, Jul 7, 2013 at 3:14 AM, Jeff King  wrote:

>> diff --git a/pack-revindex.c b/pack-revindex.c
>> index 77a0465..d2adf36 100644
>> --- a/pack-revindex.c
>> +++ b/pack-revindex.c
>> @@ -59,11 +59,78 @@ static int cmp_offset(const void *a_, const void *b_)
>> /* revindex elements are lazily initialized */
>>  }
>>
>> -static int cmp_offset(const void *a_, const void *b_)
>> +/*
>> + * This is a least-significant-digit radix sort using a 16-bit "digit".
>> + */
>> +static void sort_revindex(struct revindex_entry *entries, int n, off_t max)
>
> If 'n' is the number of objects in the pack, shouldn't it be unsigned?
>
> The data type for struct packed_git.num_objects is uint32_t.  Looks
> like create_pack_revindex uses the wrong datatype when it captures
> num_objects in the int num_ent and passes it to sort_revindex.  So, it
> looks like that function needs to be updated too.

Hmm.  It seems more than just create_pack_revindex holds num_objects
in a signed int.  Don't we support 4G objects in packs?

find_pack_revindex uses a signed int for the index variables in its
binary search which means it will fail when num_objects >= 2^31.
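
For illustration, a lookup that keeps its index variables unsigned (and
avoids the usual (lo+hi)/2 overflow) might look like this sketch; it is
not the actual find_pack_revindex code:

    #include <stdint.h>

    static int find_offset(const uint64_t *offsets, uint32_t nr,
                           uint64_t want, uint32_t *pos)
    {
            uint32_t lo = 0, hi = nr;

            while (lo < hi) {
                    uint32_t mi = lo + (hi - lo) / 2;  /* no signed overflow */
                    if (offsets[mi] == want) {
                            *pos = mi;
                            return 1;
                    }
                    if (offsets[mi] < want)
                            lo = mi + 1;
                    else
                            hi = mi;
            }
            return 0;
    }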

-Brandon


Re: [PATCH] cache.h: move remote/connect API out of it

2013-07-08 Thread Junio C Hamano
Junio C Hamano  writes:

> The definition of "struct ref" in "cache.h", a header file so
> central to the system, always confused me.  This structure is not
> about the local ref used by sha1-name API to name local objects.
>
> It is what refspecs are expanded into, after finding out what refs
> the other side has, to define what refs are updated after object
> transfer succeeds to what values.  It belongs to "remote.h" together
> with "struct refspec".
>
> While we are at it, also move the types and functions related to the
> Git transport connection to a new header file connect.h
>
> Signed-off-by: Junio C Hamano 
> ---
>
>  * I hate to to this kind of code-movement in the middle of the
>cycle, but every time I follow the push->transport codepath, I
>become disoriented by these "struct ref"s.

>
>  builtin/fetch-pack.c   |  2 ++
>  builtin/receive-pack.c |  1 +
>  builtin/send-pack.c|  1 +
>  cache.h| 62 
> --
>  connect.c  |  1 +
>  fetch-pack.c   |  1 +
>  fetch-pack.h   |  1 +
>  refs.c |  8 ---
>  remote.c   |  8 +++
>  remote.h   | 54 +++
>  send-pack.c|  1 +
>  transport.c|  2 ++
>  transport.h|  1 +
>  upload-pack.c  |  1 +
>  14 files changed, 74 insertions(+), 70 deletions(-)

Of course, I should have noticed that the math does not work out
without this part:

 connect.h | 13 +
 1 file changed, 13 insertions(+)

diff --git a/connect.h b/connect.h
new file mode 100644
index 000..9dff25c
--- /dev/null
+++ b/connect.h
@@ -0,0 +1,13 @@
+#ifndef CONNECT_H
+#define CONNECT_H
+
+#define CONNECT_VERBOSE   (1u << 0)
+extern struct child_process *git_connect(int fd[2], const char *url, const 
char *prog, int flags);
+extern int finish_connect(struct child_process *conn);
+extern int git_connection_is_socket(struct child_process *conn);
+extern int server_supports(const char *feature);
+extern int parse_feature_request(const char *features, const char *feature);
+extern const char *server_feature_value(const char *feature, int *len_ret);
+extern const char *parse_feature_value(const char *feature_list, const char 
*feature, int *len_ret);
+
+#endif


standardize mtime when git checkout

2013-07-08 Thread Rick Liu
Hi,

Currently, when doing "git checkout" (either for a branch or a tag),
if a file doesn't exist before, the file is created using the current
datetime.

This causes a problem when trying to tar the git repository source
files (excluding the .git folder).  The resulting tar file can be
binary different even when all of the file contents are the same
(e.g. from the same git commit), because the mtime of the files may
differ due to different "git checkout" times.

e.g.:
User A checks out the commit at time A and then tarballs the folder.
User B checks out the same commit at time B and then tarballs the folder.
The resulting tarballs are binary different even though all of their
contents are the same, except for the mtime of each file.


Can we use Git's commit time as the mtime for all files/folders when we
do "git checkout"?


Rick



[PATCH] builtin/push.c: use OPT_BOOL, not OPT_BOOLEAN

2013-07-08 Thread Junio C Hamano
The command line parser of "git push" for "--tags", "--delete", and
"--thin" options still used outdated OPT_BOOLEAN.  Because these
options do not give escalating levels when given multiple times,
they should use OPT_BOOL.

Signed-off-by: Junio C Hamano 
---
 builtin/push.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/push.c b/builtin/push.c
index 2d84d10..342d792 100644
--- a/builtin/push.c
+++ b/builtin/push.c
@@ -427,15 +427,15 @@ int cmd_push(int argc, const char **argv, const char 
*prefix)
OPT_BIT( 0 , "all", &flags, N_("push all refs"), 
TRANSPORT_PUSH_ALL),
OPT_BIT( 0 , "mirror", &flags, N_("mirror all refs"),
(TRANSPORT_PUSH_MIRROR|TRANSPORT_PUSH_FORCE)),
-   OPT_BOOLEAN( 0, "delete", &deleterefs, N_("delete refs")),
-   OPT_BOOLEAN( 0 , "tags", &tags, N_("push tags (can't be used 
with --all or --mirror)")),
+   OPT_BOOL( 0, "delete", &deleterefs, N_("delete refs")),
+   OPT_BOOL( 0 , "tags", &tags, N_("push tags (can't be used with 
--all or --mirror)")),
OPT_BIT('n' , "dry-run", &flags, N_("dry run"), 
TRANSPORT_PUSH_DRY_RUN),
OPT_BIT( 0,  "porcelain", &flags, N_("machine-readable 
output"), TRANSPORT_PUSH_PORCELAIN),
OPT_BIT('f', "force", &flags, N_("force updates"), 
TRANSPORT_PUSH_FORCE),
{ OPTION_CALLBACK, 0, "recurse-submodules", &flags, N_("check"),
N_("control recursive pushing of submodules"),
PARSE_OPT_OPTARG, option_parse_recurse_submodules },
-   OPT_BOOLEAN( 0 , "thin", &thin, N_("use thin pack")),
+   OPT_BOOL( 0 , "thin", &thin, N_("use thin pack")),
OPT_STRING( 0 , "receive-pack", &receivepack, "receive-pack", 
N_("receive pack program")),
OPT_STRING( 0 , "exec", &receivepack, "receive-pack", 
N_("receive pack program")),
OPT_BIT('u', "set-upstream", &flags, N_("set upstream for git 
pull/status"),
-- 
1.8.3.2-876-ge3d3f5e



Re: standardize mtime when git checkout

2013-07-08 Thread Junio C Hamano
"Rick Liu"  writes:

> Can we use Git's commit time as the mtime for all files/folders when we
> do "git checkout"?

No.  That would screw up the common practice of building based on file
timestamps (e.g. make).

You may be interested in "git archive $commit", which will set the file
timestamps to that of the commit object.


Re: [PATCH] test-lib.sh - cygwin does not have usable FIFOs

2013-07-08 Thread Mark Levedahl

On 07/06/2013 08:55 PM, Jonathan Nieder wrote:
> Mark Levedahl wrote:
>
>> Do not use FIFOs on cygwin, they do not work. Cygwin includes
>> coreutils, so has mkfifo, and that command does something. However,
>> the resultant named pipe is known (on the Cygwin mailing list at
>> least) to not work correctly.
>
> Hm.  How would you recommend going about writing a script that takes
> output from a command, transforms it, and then feeds it back into
> that command's input?  Are sockets a more reliable way to do this kind
> of IPC on Cygwin?
>
> See reinit_git and try_dump from t9010-svn-fe.sh for context.
>
> Thanks,
> Jonathan

On the one hand, sockets work fine on cygwin, so that path would
probably work.

However, I don't understand why git would need to consume its own
output.  If named pipes are really needed to use git-svn because
git-svn depends upon git feeding the same git process, then that
package should not be available on cygwin or any other platform that
does not support fifos.  If not, then I don't think the test suite
should require fifos or any other construct with the same git process
feeding itself either; it just blurs the line about what is actually
being tested.

Mark


Re: [PATCH 05/22] read-cache: add index reading api

2013-07-08 Thread Junio C Hamano
Thomas Gummerer  writes:

>> The reader often needs to rewind the read-pointer partially while
>> walking the index (e.g. next_cache_entry() in unpack-trees.c and how
>> the o->cache_bottom position is used throughout the subsystem).  I
>> am not sure if this singly-linked list is a good way to go.
>
> I'm not very familiar with the unpack-trees code, but from a quick look
> the pointer (or position in the cache) is always only moved forward.

I am more worried about o->cache_bottom processing, where it
currently is an index into an array.

With your ce->next_in_list_of_read_entries change, a natural rewrite
would be to point at the ce with o->cache_bottom, but then that
would mean you cannot in-place replace the entries like we used to
be able to in an array based implementation.

But your series does not seem to touch unpack-trees yet, so I may be
worrying too much before it becomes necessary.


Re: [PATCH] builtin/push.c: use OPT_BOOL, not OPT_BOOLEAN

2013-07-08 Thread Jonathan Nieder
Junio C Hamano wrote:

> The command line parser of "git push" for "--tags", "--delete", and
> "--thin" options still used outdated OPT_BOOLEAN.  Because these
> options do not give escalating levels when given multiple times,
> they should use OPT_BOOL.

Thanks.  Looks obviously correct, so

Reviewed-by: Jonathan Nieder 


Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option

2013-07-08 Thread Duy Nguyen
On Mon, Jul 8, 2013 at 8:37 PM, Ramkumar Ramachandra  wrote:
> Duy Nguyen wrote:
>> The short/long naming is the least I worry about. We could add long
>> names to pretty specifiers. The thing about the last attempt is, you
>> add some extra things on top elsewhere, but format_commit_item code
>> may need to be aware of those changes, which are not obvious when
> somebody just focuses on format_commit_item. Having all specifiers in
>> one place would be better (hence no hooks, no callbacks) because we
>> get a full picture. And yes we need to deal with specifers that make
>> no sense in certain context.
>
> Yeah, it would certainly be nice to have all the format-specifiers
> that one unified parser acts on, but isn't this just a matter of
> refactoring?  Shouldn't we be starting with cheap callbacks, get
> things working, and guard against regressions in the refactoring phase
> first?  How else do you propose to start out?

I prefer that a series merged to master be a complete change. If
refactoring is needed, it should be done as part of the series as
well. Two reasons:
 - We might overlook something. The best way to avoid missing something
is to finish and verify it.
 - A promise to do things later could happen really late, or never
happen. Once you are satisfied with the functionality you have less
motivation to clean up the code. Meanwhile the maintainer carries the
extra maintenance cost.
--
Duy


Re: [RFC/PATCH] fetch: make --prune configurable

2013-07-08 Thread Jeff King
On Mon, Jul 08, 2013 at 02:56:57PM +0200, Michael Schubert wrote:

> $gmane/201715 brought up the idea to fetch --prune by default.
> Since --prune is a "potentially destructive operation" (Git doesn't
> keep reflogs for deleted references yet), we don't want to prune
> without users consent.
> 
> To accommodate users who want to either prune always or when fetching
> from a particular remote, add two new configuration variables
> "fetch.prune" and "remote..prune":
> 
>  - "fetch.prune" allows to enable prune for all fetch operations.
> 
>  - "remote..prune" allows to change the behaviour per remote.

Thanks. As the person who brought up the destructive nature of --prune
in the thread you mentioned, I have no problem at all with doing
something like this, where the user gets to make the choice. And it is
even a good building block if we later do have deleted-branch reflogs;
we can just flip the default from "off" to "on".

In the meantime, I don't know if it is worth mentioning in the
documentation that the remote branches are hard to get back. On the one
hand, it is the (or at least a) reason why the default is not "on". But
it is also far from the only place refs get deleted, so I don't know if
it is worth calling attention to it specifically.

-Peff


Re: standardize mtime when git checkout

2013-07-08 Thread René Scharfe

On 08.07.2013 23:39, Rick Liu wrote:
> Hi,
>
> Currently, when doing "git checkout" (either for a branch or a tag),
> if a file doesn't exist before, the file is created using the current
> datetime.
>
> This causes a problem when trying to tar the git repository source
> files (excluding the .git folder).  The resulting tar file can be
> binary different even when all of the file contents are the same
> (e.g. from the same git commit), because the mtime of the files may
> differ due to different "git checkout" times.
>
> e.g.:
> User A checks out the commit at time A and then tarballs the folder.
> User B checks out the same commit at time B and then tarballs the folder.
> The resulting tarballs are binary different even though all of their
> contents are the same, except for the mtime of each file.
>
> Can we use Git's commit time as the mtime for all files/folders when we
> do "git checkout"?


That would break tools like make, which rely on a file's mtime to decide
what to rebuild.  They wouldn't be able to detect switching to source
file versions that are older than the latest build.


You can use "git archive" to create tar files in which all entries have 
their mtime set to the commit time.  Such archives only contain tracked 
(committed) files, though.  And different versions of git can create 
slightly different archives, but such changes have been rare.
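
For example, something like this (a sketch; any tree-ish works and the
--prefix is optional):

  git archive --format=tar --prefix=project-1.8.3/ v1.8.3 -o project-1.8.3.tar

Two people running that against the same tag with the same git version
should get byte-identical tarballs, since every entry's mtime is the
commit time rather than the checkout time.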


René



Re: expanding pack idx fanout table

2013-07-08 Thread Jeff King
On Mon, Jul 08, 2013 at 08:54:24AM -0700, Shawn O. Pearce wrote:

> Has anyone studied the impact of converting the pack idx fanout table
> from 256 entries to 65536 entries?
> 
> Back of the envelope estimates for 3.1M objects in linux.git suggests
> a 2^16 fanout table would decrease the number of binary search
> iterations from ~14 to ~6. The increased table costs an extra 255 KiB
> of disk. On a 70M idx file this is noise.

No, I hadn't considered it, but I think your analysis is correct that it
would decrease the lookup cost. Having run "perf" on git a lot of times,
I do see find_pack_entry_one turn up as a noticeable hot spot, but
usually not more than a few percent.

Which kind of makes sense, because of the way we cache objects in
memory. If you are doing something like `rev-list --objects`, you are
going to do O(n) packfile lookups, but you end up doing _way_ more
lookups in the object hash. Not just one per tree entry, but one per
tree entry for each commit in which the entry was untouched.

I tried (maybe a year or so ago) to make the object hash faster using a
similar fan-out trick, but I don't remember it having any real impact
(because we keep our hash tables relatively collision free already). I
think I also tried a full prefix-trie, with no luck.

Obviously the exact results would depend on your workload, but in
general I'd expect the object hash to take the brunt of the load for
repeated lookups, and for non-repeated lookups to be dominated by the
time to actually access the object, as opposed to saving a few hashcmps.
That may change as we start using the pack .idx for more clever things
than simply accessing the objects, though (e.g., it might make a
difference when coupled with my commit cache patches).
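
(For reference, the back-of-the-envelope numbers above do work out: a
plain binary search over 3.1M entries needs about log2(3.1M) ~= 21.5
steps; the current 2^8 fanout narrows each bucket to roughly 12K
entries, or ~13.6 steps, while a 2^16 fanout would narrow it to roughly
47 entries, or ~5.6 steps.  The extra 65536 - 256 = 65280 four-byte
fanout slots come to about 255 KiB.)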

-Peff


Re: [PATCH] cache.h: move remote/connect API out of it

2013-07-08 Thread Jeff King
On Mon, Jul 08, 2013 at 02:09:59PM -0700, Junio C Hamano wrote:

> The definition of "struct ref" in "cache.h", a header file so
> central to the system, always confused me.  This structure is not
> about the local ref used by sha1-name API to name local objects.
> [...]
>  * I hate to do this kind of code-movement in the middle of the
>cycle, but every time I follow the push->transport codepath, I
>become disoriented by these "struct ref"s.

FWIW, this has often bugged me, too. I did not check what fallouts this
will have for series in flight, but in general, I think it is a good
thing to be doing.

-Peff


Re: A local shared clone is now much slower

2013-07-08 Thread Jeff King
On Mon, Jul 08, 2013 at 08:00:09AM -0700, Junio C Hamano wrote:

> I think this deserves to be backported to the 'maint' track for
> 1.8.3.x.  Here is an attempt to do so.

Agreed. As it makes certain local-clone workflows really painful, I
think my original can be considered a performance regression for those
cases.

Your back-port looks good to me. Thanks.

-Peff


Re: [PATCH 3/4] describe: use argv-array

2013-07-08 Thread Jeff King
On Sun, Jul 07, 2013 at 03:33:43PM -0700, Junio C Hamano wrote:

> + argv_array_init(&args);
> + argv_array_push(&args, "name-rev");
> + argv_array_push(&args, "--name-only");
> + argv_array_push(&args, "--no-undefined");
> [...]
> - memcpy(args + i, argv, argc * sizeof(char *));
> - args[i + argc] = NULL;
> - return cmd_name_rev(i + argc, args, prefix);
> + return cmd_name_rev(args.argc, args.argv, prefix);

This leaks the memory allocated by "args". The original did, too, and it
is probably not that big a deal (we exit right after anyway). The fix
would be something like:

  rc = cmd_name_rev(args.argc, args.argv, prefix);
  argv_array_clear(&args);
  return rc;

-Peff


Re: [PATCH 4/4] describe/name-rev: tell name-rev to peel the incoming object to commit first

2013-07-08 Thread Jeff King
On Sun, Jul 07, 2013 at 03:33:44PM -0700, Junio C Hamano wrote:

> With this on top of the other patches in this series, you would get:
> 
> $ git describe --contains $(git rev-parse v1.8.3 v1.8.3^0)
> v1.8.3
> v1.8.3
> 
> while you can still differentiate tags and the commits they point at
> with:
> 
> $ git name-rev --refs=tags/\* --name-only $(git rev-parse v1.8.3 v1.8.3^0)
> v1.8.3
> v1.8.3^0
> 
> The difference in these two behaviours is achieved by adding --peel-to-commit
> option to "name-rev" and using it when "describe" internally calls it.

I am somewhat mixed on this.

You are changing the default behavior of name-rev and adding a new
option to restore it, so I wonder who (if anyone) might be broken. The
documentation is now also out of date; not only does it not mention
"peel-to-commit", but it claims the argument to name-rev is a
committish, which is not really true without that option.

On the other hand, the new default behavior seems way more sane to me.
In general, I would expect name-rev to:

  1. Behave more or less the same between "git name-rev $sha1" and "echo
 $sha1 | git name-rev --stdin". Your patch improves that. Though I
 note that --peel-to-commit does not affect --stdin at all. Should
 it? And of course the two differ in that the command line will take
 any rev-parse expression, and --stdin only looks for full sha1s.

  2. If name-rev prints "$X $Y", I would expect "git rev-parse $X" to
 equal "git rev-parse $Y". With peeling, that is not the case, and
 you get the misleading example that Ram showed:

   $ git name-rev 8af0605
   8af0605 tags/v1.8.3^0

or more obviously weird:

   $ git name-rev v1.8.3
   v1.8.3 tags/v1.8.3^0

So I think your series moves in a good direction, but I would just worry
that it is breaking backwards compatibility (but like I said, I am not
clear on who is affected and what it means for them).

-Peff


Re: [PATCH 4/4] describe/name-rev: tell name-rev to peel the incoming object to commit first

2013-07-08 Thread Jeff King
On Mon, Jul 08, 2013 at 06:38:32PM +0530, Ramkumar Ramachandra wrote:

> Junio C Hamano wrote:
> > With this on top of the other patches in this series, you would get:
> >
> > $ git describe --contains $(git rev-parse v1.8.3 v1.8.3^0)
> > v1.8.3
> > v1.8.3
> >
> > while you can still differentiate tags and the commits they point at
> > with:
> >
> > $ git name-rev --refs=tags/\* --name-only $(git rev-parse v1.8.3 v1.8.3^0)
> > v1.8.3
> > v1.8.3^0
> >
> > The difference in these two behaviours is achieved by adding 
> > --peel-to-commit
> > option to "name-rev" and using it when "describe" internally calls it.
> 
> Essentially a revert of [2/4] for describe-purposes, achieved by
> adding an ugly command-line option to name-rev.

I don't think it is a revert. The two patches complement each other.

2/4 is basically "if we have a non-commit object which is pointed at
directly by a tip, make sure we name it by that tip". But you can only
get such an object by "name-rev --stdin", since name-rev peels its
command-line arguments.

4/4 is "stop peeling command line objects, so we can find their exact
tips". IOW, it lets the command line do the same thing that --stdin was
able to do in 2/4.

> Before we argue any further, let me ask: who uses name-rev (and
> depends strongly on its output)?!  Our very own testsuite does not
> exercise it.  There are exactly two users of describe/name-rev:
> 
> 1. prompt, obviously.
> 2. DAG-tests, for simplification.

Yeah, I'm not clear on who we are breaking with the change in default
peeling behavior, nor why the "describe --contains" wrapper wants to
keep it.

-Peff


[PATCH] remote-http: use argv-array

2013-07-08 Thread Junio C Hamano
Instead of using a hand-managed argument array, use the argv-array API
to manage the dynamically formulated command line.

Signed-off-by: Junio C Hamano 
---
 remote-curl.c | 31 +++
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index 60eda63..884b3a3 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -7,6 +7,7 @@
 #include "run-command.h"
 #include "pkt-line.h"
 #include "sideband.h"
+#include "argv-array.h"
 
 static struct remote *remote;
 static const char *url; /* always ends with a trailing slash */
@@ -787,36 +788,34 @@ static int push_dav(int nr_spec, char **specs)
 static int push_git(struct discovery *heads, int nr_spec, char **specs)
 {
struct rpc_state rpc;
-   const char **argv;
-   int argc = 0, i, err;
+   int i, err;
+   struct argv_array args;
+
+   argv_array_init(&args);
+   argv_array_pushl(&args, "send-pack", "--stateless-rpc", "--helper-status");
 
-   argv = xmalloc((10 + nr_spec) * sizeof(char*));
-   argv[argc++] = "send-pack";
-   argv[argc++] = "--stateless-rpc";
-   argv[argc++] = "--helper-status";
if (options.thin)
-   argv[argc++] = "--thin";
+   argv_array_push(&args, "--thin");
if (options.dry_run)
-   argv[argc++] = "--dry-run";
+   argv_array_push(&args, "--dry-run");
if (options.verbosity == 0)
-   argv[argc++] = "--quiet";
+   argv_array_push(&args, "--quiet");
else if (options.verbosity > 1)
-   argv[argc++] = "--verbose";
-   argv[argc++] = options.progress ? "--progress" : "--no-progress";
-   argv[argc++] = url;
+   argv_array_push(&args, "--verbose");
+   argv_array_push(&args, options.progress ? "--progress" : "--no-progress");
+   argv_array_push(&args, url);
for (i = 0; i < nr_spec; i++)
-   argv[argc++] = specs[i];
-   argv[argc++] = NULL;
+   argv_array_push(&args, specs[i]);
 
memset(&rpc, 0, sizeof(rpc));
rpc.service_name = "git-receive-pack",
-   rpc.argv = argv;
+   rpc.argv = args.argv;
 
err = rpc_service(&rpc, heads);
if (rpc.result.len)
write_or_die(1, rpc.result.buf, rpc.result.len);
strbuf_release(&rpc.result);
-   free(argv);
+   argv_array_clear(&args);
return err;
 }
 
-- 
1.8.3.2-876-ge3d3f5e



Re: [PATCH 4/4] describe/name-rev: tell name-rev to peel the incoming object to commit first

2013-07-08 Thread Junio C Hamano
Jeff King  writes:

>   1. Behave more or less the same between "git name-rev $sha1" and "echo
>  $sha1 | git name-rev --stdin". Your patch improves that. Though I
>  note that --peel-to-commit does not affect --stdin at all. Should
>  it? And of course the two differ in that the command line will take
>  any rev-parse expression, and --stdin only looks for full sha1s.

To "Should it?", I do not deeply care.  "--peel-to-commit" is an
exception that only is needed to support "describe".

I could instead have tucked "^0" at the end of each argument when
"describe" calls out to "name-rev" without adding this new option,
which is much much closer to what is really going on.
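
Concretely, instead of passing --peel-to-commit, "describe" would run
something like the following for each argument (a sketch using the
flags shown in patch 3/4, with v1.8.3 standing in for the argument):

  git name-rev --name-only --no-undefined v1.8.3^0

so name-rev itself only ever sees the peeled commit, and its
command-line behaviour stays unchanged for everybody else.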

And that will alleviate your #2 below.

>   2. If name-rev prints "$X $Y", I would expect "git rev-parse $X" to
>  equal "git rev-parse $Y". With peeling, that is not the case, and
>  you get the misleading example that Ram showed:
>
>$ git name-rev 8af0605
>8af0605 tags/v1.8.3^0
>
> or more obviously weird:
>
>$ git name-rev v1.8.3
>v1.8.3 tags/v1.8.3^0
>
> So I think your series moves in a good direction, but I would just worry
> that it is breaking backwards compatibility (but like I said, I am not
> clear on who is affected and what it means for them).


Re: [PATCH 4/4] describe/name-rev: tell name-rev to peel the incoming object to commit first

2013-07-08 Thread Jeff King
On Mon, Jul 08, 2013 at 10:33:26PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> >   1. Behave more or less the same between "git name-rev $sha1" and "echo
> >  $sha1 | git name-rev --stdin". Your patch improves that. Though I
> >  note that --peel-to-commit does not affect --stdin at all. Should
> >  it? And of course the two differ in that the command line will take
> >  any rev-parse expression, and --stdin only looks for full sha1s.
> 
> To "Should it?", I do not deeply care.  "--peel-to-commit" is an
> exception that only is needed to support "describe".
> 
> I could instead have tucked "^0" at the end of each argument when
> "describe" calls out to "name-rev" without adding this new option,
> which is much much closer to what is really going on.

Yeah, I tend to think that is a more sane interface, even though it is a
little more work in git-describe.

Although I am still not clear on why it would not be up to the caller of
git-describe in the first place to decide which they wanted.

-Peff


Re: [PATCH] cache.h: move remote/connect API out of it

2013-07-08 Thread Junio C Hamano
Jeff King  writes:

> On Mon, Jul 08, 2013 at 02:09:59PM -0700, Junio C Hamano wrote:
>
>> The definition of "struct ref" in "cache.h", a header file so
>> central to the system, always confused me.  This structure is not
>> about the local ref used by sha1-name API to name local objects.
>> [...]
>>  * I hate to do this kind of code-movement in the middle of the
>>cycle, but every time I follow the push->transport codepath, I
>>become disoriented by these "struct ref"s.
>
> FWIW, this has often bugged me, too. I did not check what fallouts this
> will have for series in flight, but in general, I think it is a good
> thing to be doing.

The fallout can be seen at the tip of 'pu', which is fairly minimal
at the moment.


[PATCH v2 1/2] t4211: fix broken test when one -L range is subset of another

2013-07-08 Thread Eric Sunshine
t4211 attempts to test multiple git-log -L ranges where one range is a
superset of the other, and falsely succeeds because its "expected"
output is incorrect.

Overlapping -L ranges handed to git-log are coalesced by
line-log.c:sort_and_merge_range_set() into a set of non-overlapping,
disjoint ranges. When one range is a subset of another,
sort_and_merge_range_set() should coalesce both ranges to the superset
range, but instead the coalesced range often is incorrectly truncated to
the end of the subset range. For example, ranges 2-8 and 3-4 are
coalesced incorrectly to 2-4.

One can observe this incorrect behavior with git-log -L using the test
repository created by t4211. The superset/subset ranges t4211 employs
are 4-$ and 8-12 (where $ represents end-of-file). The coalesced range
should be 4-$. Manually invoking git-log with the same ranges the test
employs, we see:

  % git log -L 4:a.c simple |
awk '/^commit [0-9a-f]{40}/ { print substr($2,1,7) }'
  4659538
  100b61a
  39b6eb2
  a6eb826
  f04fb20
  de4c48a

  % git log -L 8,12:a.c simple | awk ...
  f04fb20
  de4c48a

  % git log -L 4:a.c -L 8,12:a.c simple | awk ...
  a6eb826
  f04fb20
  de4c48a

This last output is incorrect. 8-12 is a subset of 4-$, hence the output
of the coalesced range should be the same as the 4-$ output shown first.
In fact, the above incorrect output is the truncated bogus range 4-12:

  % git log -L 4,12:a.c simple | awk ...
  a6eb826
  f04fb20
  de4c48a

Fix the test to correctly fail in the presence of the
sort_and_merge_range_set() coalescing bug. Do so by changing the
"expected" output to the commits mentioned in the 4-$ output above.

Signed-off-by: Eric Sunshine 
---
 t/t4211-line-log.sh  |   4 +-
 t/t4211/expect.multiple-superset | 134 ++-
 2 files changed, 133 insertions(+), 5 deletions(-)

diff --git a/t/t4211-line-log.sh b/t/t4211-line-log.sh
index 7776f93..549df9e 100755
--- a/t/t4211-line-log.sh
+++ b/t/t4211-line-log.sh
@@ -50,8 +50,8 @@ canned_test "-M -L ':f:b.c' parallel-change" parallel-change-f-to-main
 canned_test "-L 4,12:a.c -L :main:a.c simple" multiple
 canned_test "-L 4,18:a.c -L :main:a.c simple" multiple-overlapping
 canned_test "-L :main:a.c -L 4,18:a.c simple" multiple-overlapping
-canned_test "-L 4:a.c -L 8,12:a.c simple" multiple-superset
-canned_test "-L 8,12:a.c -L 4:a.c simple" multiple-superset
+canned_test_failure "-L 4:a.c -L 8,12:a.c simple" multiple-superset
+canned_test_failure "-L 8,12:a.c -L 4:a.c simple" multiple-superset
 
 test_bad_opts "-L" "switch.*requires a value"
 test_bad_opts "-L b.c" "argument.*not of the form"
diff --git a/t/t4211/expect.multiple-superset b/t/t4211/expect.multiple-superset
index a1f5bc4..d930b6e 100644
--- a/t/t4211/expect.multiple-superset
+++ b/t/t4211/expect.multiple-superset
@@ -1,3 +1,100 @@
+commit 4659538844daa2849b1a9e7d6fadb96fcd26fc83
+Author: Thomas Rast 
+Date:   Thu Feb 28 10:48:43 2013 +0100
+
+change back to complete line
+
+diff --git a/a.c b/a.c
+--- a/a.c
++++ b/a.c
+@@ -4,19 +4,21 @@
+ long f(long x)
+ {
+   int s = 0;
+   while (x) {
+   x >>= 1;
+   s++;
+   }
+   return s;
+ }
+ 
+ /*
+  * This is only an example!
+  */
+ 
+ int main ()
+ {
+   printf("%ld\n", f(15));
+   return 0;
+-}
+\ No newline at end of file
++}
++
++/* incomplete lines are bad! */
+
+commit 100b61a6f2f720f812620a9d10afb3a960ccb73c
+Author: Thomas Rast 
+Date:   Thu Feb 28 10:48:10 2013 +0100
+
+change to an incomplete line at end
+
+diff --git a/a.c b/a.c
+--- a/a.c
++++ b/a.c
+@@ -4,19 +4,19 @@
+ long f(long x)
+ {
+   int s = 0;
+   while (x) {
+   x >>= 1;
+   s++;
+   }
+   return s;
+ }
+ 
+ /*
+  * This is only an example!
+  */
+ 
+ int main ()
+ {
+   printf("%ld\n", f(15));
+   return 0;
+-}
++}
+\ No newline at end of file
+
+commit 39b6eb2d5b706d3322184a169f666f25ed3fbd00
+Author: Thomas Rast 
+Date:   Thu Feb 28 10:45:41 2013 +0100
+
+touch comment
+
+diff --git a/a.c b/a.c
+--- a/a.c
++++ b/a.c
+@@ -3,19 +3,19 @@
+ long f(long x)
+ {
+   int s = 0;
+   while (x) {
+   x >>= 1;
+   s++;
+   }
+   return s;
+ }
+ 
+ /*
+- * A comment.
++ * This is only an example!
+  */
+ 
+ int main ()
+ {
+   printf("%ld\n", f(15));
+   return 0;
+ }
+
 commit a6eb82647d5d67f893da442f8f9375fd89a3b1e2
 Author: Thomas Rast 
 Date:   Thu Feb 28 10:45:16 2013 +0100
@@ -7,7 +104,7 @@ Date:   Thu Feb 28 10:45:16 2013 +0100
 diff --git a/a.c b/a.c
 --- a/a.c
 +++ b/a.c
-@@ -3,9 +3,9 @@
+@@ -3,19 +3,19 @@
 -int f(int x)
 +long f(long x)
  {
@@ -18,6 +115,17 @@ diff --git a/a.c b/a.c
}
return s;
  }
+ 
+ /*
+  * A comment.
+  */
+ 
+ int main ()
+ {
+-  printf("%d\n", f(15));
++  printf("%ld\n", f(15));
+   return 0;
+ }
 
 commit f04fb20f2c77850996cba739709acc6faecc58f7
 Author: Thomas Rast 
@@ -28,7 +136,7 @@ Date:   Thu Feb 28 10:44:55 20

[PATCH v2 2/2] range_set: fix coalescing bug when range is a subset of another

2013-07-08 Thread Eric Sunshine
When coalescing ranges, sort_and_merge_range_set() unconditionally
assumes that the end of a range being folded into a preceding range
should become the end of the coalesced range. This assumption, however,
is invalid when one range is a subset of another.  For example, given
ranges 1-5 and 2-3 added via range_set_append_unsafe(),
sort_and_merge_range_set() incorrectly coalesces them to range 1-3
rather than the correct union range 1-5. Fix this bug.

Signed-off-by: Eric Sunshine 
---
 line-log.c  | 3 ++-
 t/t4211-line-log.sh | 4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/line-log.c b/line-log.c
index 4bbb09b..8cc29a0 100644
--- a/line-log.c
+++ b/line-log.c
@@ -116,7 +116,8 @@ static void sort_and_merge_range_set(struct range_set *rs)
 
for (i = 1; i < rs->nr; i++) {
if (rs->ranges[i].start <= rs->ranges[o-1].end) {
-   rs->ranges[o-1].end = rs->ranges[i].end;
+   if (rs->ranges[o-1].end < rs->ranges[i].end)
+   rs->ranges[o-1].end = rs->ranges[i].end;
} else {
rs->ranges[o].start = rs->ranges[i].start;
rs->ranges[o].end = rs->ranges[i].end;
diff --git a/t/t4211-line-log.sh b/t/t4211-line-log.sh
index 549df9e..7776f93 100755
--- a/t/t4211-line-log.sh
+++ b/t/t4211-line-log.sh
@@ -50,8 +50,8 @@ canned_test "-M -L ':f:b.c' parallel-change" parallel-change-f-to-main
 canned_test "-L 4,12:a.c -L :main:a.c simple" multiple
 canned_test "-L 4,18:a.c -L :main:a.c simple" multiple-overlapping
 canned_test "-L :main:a.c -L 4,18:a.c simple" multiple-overlapping
-canned_test_failure "-L 4:a.c -L 8,12:a.c simple" multiple-superset
-canned_test_failure "-L 8,12:a.c -L 4:a.c simple" multiple-superset
+canned_test "-L 4:a.c -L 8,12:a.c simple" multiple-superset
+canned_test "-L 8,12:a.c -L 4:a.c simple" multiple-superset
 
 test_bad_opts "-L" "switch.*requires a value"
 test_bad_opts "-L b.c" "argument.*not of the form"
-- 
1.8.3.2



[PATCH v2 0/2] fix broken range_set tests and coalescing

2013-07-08 Thread Eric Sunshine
This is a re-roll of a patch [1] which fixes the
line-log.c:sort_and_merge_range_set() coalescing bug.  This re-roll
inserts a new patch before the lone patch from v1.

patch 1/2: Fix broken tests in t4211 which should have detected the
sort_and_merge_range_set() bug but didn't due to incorrect
"expected" state. Mark the tests as expect-failure.

patch 2/2: Fix the sort_and_merge_range_set() coalesce bug. Same as v1
but also flips the tests to expect-success.

[1]: http://article.gmane.org/gmane.comp.version-control.git/229774

Eric Sunshine (2):
  t4211: fix broken test when one -L range is subset of another
  range_set: fix coalescing bug when range is a subset of another

 line-log.c   |   3 +-
 t/t4211/expect.multiple-superset | 134 ++-
 2 files changed, 133 insertions(+), 4 deletions(-)

-- 
1.8.3.2



Re: [PATCH] remote-http: use argv-array

2013-07-08 Thread Bert Wesarg
On Tue, Jul 9, 2013 at 7:18 AM, Junio C Hamano  wrote:
> Instead of using a hand-managed argument array, use the argv-array API
> to manage the dynamically formulated command line.
>
> Signed-off-by: Junio C Hamano 
> ---
>  remote-curl.c | 31 +++
>  1 file changed, 15 insertions(+), 16 deletions(-)
>
> diff --git a/remote-curl.c b/remote-curl.c
> index 60eda63..884b3a3 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -7,6 +7,7 @@
>  #include "run-command.h"
>  #include "pkt-line.h"
>  #include "sideband.h"
> +#include "argv-array.h"
>
>  static struct remote *remote;
>  static const char *url; /* always ends with a trailing slash */
> @@ -787,36 +788,34 @@ static int push_dav(int nr_spec, char **specs)
>  static int push_git(struct discovery *heads, int nr_spec, char **specs)
>  {
> struct rpc_state rpc;
> -   const char **argv;
> -   int argc = 0, i, err;
> +   int i, err;
> +   struct argv_array args;
> +
> +   argv_array_init(&args);
> +   argv_array_pushl(&args, "send-pack", "--stateless-rpc", "--helper-status");

missing NULL sentinel. GCC has the 'sentinel' [1] attribute to catch
such errors. Or use macro magic:

void argv_array_pushl_(struct argv_array *array, ...);
#define argv_array_pushl(array, ...) argv_array_pushl_(array, __VA_ARGS__, NULL)

Bert

[1] 
http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html#index-g_t_0040code_007bsentinel_007d-function-attribute-2708


Re: [PATCH] remote-http: use argv-array

2013-07-08 Thread Jeff King
On Tue, Jul 09, 2013 at 08:05:19AM +0200, Bert Wesarg wrote:

> > +   argv_array_pushl(&args, "send-pack", "--stateless-rpc", "--helper-status");
> 
> missing NULL sentinel. GCC has the 'sentinel' [1] attribute to catch
> such errors. Or use macro magic:
> 
> void argv_array_pushl_(struct argv_array *array, ...);
> #define argv_array_pushl(array, ...) argv_array_pushl_(array, __VA_ARGS__, NULL)

Nice catch. We cannot use variadic macros, because we support pre-C99
compilers that do not have them. But the sentinel attribute is a good
idea. Here's a patch.

-- >8 --
Subject: [PATCH] argv-array: add sentinel attribute to argv_array_pushl

This attribute can help gcc notice when callers forget to add a
NULL sentinel to the end of the function. We shouldn't need
to #ifdef for other compilers, as __attribute__ is already a
no-op on non-gcc-compatible compilers.

Suggested-by: Bert Wesarg 
Signed-off-by: Jeff King 
---
This is our first use of an __attribute__ that is not "noreturn" or
"format". I assume this one should be supported on other gcc-compatible
compilers like clang.

 argv-array.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/argv-array.h b/argv-array.h
index 40248d4..e805748 100644
--- a/argv-array.h
+++ b/argv-array.h
@@ -15,6 +15,7 @@ void argv_array_pushf(struct argv_array *, const char *fmt, ...);
 void argv_array_push(struct argv_array *, const char *);
 __attribute__((format (printf,2,3)))
 void argv_array_pushf(struct argv_array *, const char *fmt, ...);
+__attribute__((sentinel))
 void argv_array_pushl(struct argv_array *, ...);
 void argv_array_pop(struct argv_array *);
 void argv_array_clear(struct argv_array *);
-- 
1.8.3.rc3.24.gec82cb9
