Re: Something wrong with diff --color-words=regexp?

2015-02-19 Thread Johannes Sixt

Am 20.02.2015 um 00:52 schrieb Mike Hommey:

Hi,

I was trying to use --color-words with a regex to check a diff, and it appears
it displays things out of order. Am I misunderstanding what my regexp should be
doing or is there a bug?

$ git diff -U3 HEAD^ dom/base/nsDOMFileReader.cpp
diff --git a/dom/base/nsDOMFileReader.cpp b/dom/base/nsDOMFileReader.cpp
index 6267e0e..fa22590 100644
--- a/dom/base/nsDOMFileReader.cpp
+++ b/dom/base/nsDOMFileReader.cpp
@@ -363,7 +363,7 @@ nsDOMFileReader::DoReadData(nsIAsyncInputStream* aStream, uint64_t aCount)
return NS_ERROR_OUT_OF_MEMORY;
  }
  if (mDataFormat != FILE_AS_ARRAYBUFFER) {
-  mFileData = (char *) moz_realloc(mFileData, mDataLen + aCount);
+  mFileData = (char *) realloc(mFileData, mDataLen + aCount);
NS_ENSURE_TRUE(mFileData, NS_ERROR_OUT_OF_MEMORY);
  }

$ git diff -U3 --color-words='[^ ()]' HEAD^ dom/base/nsDOMFileReader.cpp
diff --git a/dom/base/nsDOMFileReader.cpp b/dom/base/nsDOMFileReader.cpp
index 6267e0e..fa22590 100644
--- a/dom/base/nsDOMFileReader.cpp
+++ b/dom/base/nsDOMFileReader.cpp
@@ -363,7 +363,7 @@ nsDOMFileReader::DoReadData(nsIAsyncInputStream* aStream, uint64_t aCount)
   return NS_ERROR_OUT_OF_MEMORY;
 }
 if (mDataFormat != FILE_AS_ARRAYBUFFER) {
   mFileData = (char *moz_) realloc(mFileData, mDataLen + aCount);
   NS_ENSURE_TRUE(mFileData, NS_ERROR_OUT_OF_MEMORY);
 }


Your regexp says that every character (with a few exceptions) by itself 
is a word. Your diff says that it deleted the words 'm', 'o', 'z', and 
'_'. So, that is not wrong.


Furthermore, your regexp says that space, '(' and ')' are whitespace. 
Whitespace is *ignored* for computation of the word difference. 
Nevertheless, --color-words mode helpfully keeps the whitespace of the 
post-image to produce readable output. In doing so, it has to choose 
whether to keep the whitespace before or after a word. It chooses to 
keep it before a word. Hence, you see the whitespace sequence ') ' 
attached in front of 'r' (of 'realloc') instead of after '*'. So, the 
procedure is a matter of choice, which sometimes does not match 
expectations.


Perhaps you meant to say

--color-words='[^ ()]+'

to split the diff text into longer words.
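
To make the difference between the two regexps concrete, here is a rough
sketch in Python. It is not git's implementation: Python's re and difflib
modules stand in for git's regex matching and its xdiff machinery, and the
helper names split_words and deleted_words are made up for illustration.

```python
import difflib
import re

OLD = "mFileData = (char *) moz_realloc(mFileData, mDataLen + aCount);"
NEW = "mFileData = (char *) realloc(mFileData, mDataLen + aCount);"


def split_words(line, word_regex):
    """Every non-overlapping regex match is a word; the rest is whitespace."""
    return re.findall(word_regex, line)


def deleted_words(old, new, word_regex):
    """Words that a word-level diff reports as present only in the pre-image."""
    a = split_words(old, word_regex)
    b = split_words(new, word_regex)
    deleted = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            deleted.extend(a[i1:i2])
    return deleted


if __name__ == "__main__":
    print(deleted_words(OLD, NEW, r"[^ ()]"))   # one character per word
    print(deleted_words(OLD, NEW, r"[^ ()]+"))  # runs of characters per word
```

With the single-character regexp the deleted words come out as 'm', 'o',
'z' and '_'; with the '+' variant the only deleted word is 'moz_realloc'.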

-- Hannes

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFH] GSoC 2015 application

2015-02-19 Thread Jeff King
On Fri, Feb 20, 2015 at 06:35:09AM +0100, Michael Haggerty wrote:

> On 02/18/2015 08:14 PM, Jeff King wrote:
> > The response to my previous email was not overwhelming, but people did
> > express some interest in Git doing GSoC this year. So I've started on
> > the application, using last year's version as a template.
> 
> Regretfully, I can't in good conscience volunteer to be a GSoC mentor
> this year. I have too many other projects going on and don't see how I
> can free up enough time to be a good mentor.

Thanks for letting us know. I am somewhat in the same boat. I might be
able to make time, but the bar that the student/project combo would have
to clear would be quite high for me to agree to do so.

This brings up an important issue. We cannot do GSoC without mentors. I
had hoped that people populating the "ideas" list would volunteer to
mentor for their projects.

But so far the possibilities are:

  - Stefan

  - me, who has already promised to be stingy

  - Matthieu, who also cited time constraints

  - Junio, who contributed some project ideas, but who in the past has
declined to mentor in order to remain impartial as the maintainer
who evaluates student results (which I think is quite reasonable)

So...basically 1 mentor and 2 reticent maybes? That doesn't look good.
We are not committed to anything until we accept student proposals,
of course. But I would not want to waste students' time in applying if
it is not realistic for us to accept them.

-Peff


Re: [RFH] GSoC 2015 application

2015-02-19 Thread Jeff King
On Fri, Feb 20, 2015 at 10:26:15AM +0700, Duy Nguyen wrote:

> On Thu, Feb 19, 2015 at 2:14 AM, Jeff King  wrote:
> > and the list of microprojects:
> >
> >   http://git.github.io/SoC-2015-Microprojects.html
> >
> 
> There is Debian bug 777690 [1] that's basically about making tag's
> version sort aware of -rc, -pre suffixes. I imagine it would touch
> versioncmp.c and builtin/tag.c (to retrieve the suffixes from config
> file).
> 
> [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=777690

I think that's a reasonable thing to work on, but it's too big for a
microproject and too small for a GSoC.

I think this could be an "extra credit" for the project to unify
for-each-ref, "tag -l", and "branch -l", though. That will vastly
enhance the supporting abilities the latter two (e.g., you could sort by
taggerdate).

-Peff


Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-19 Thread Martin Fick
On Feb 19, 2015 5:42 PM, David Turner  wrote:
>
> On Fri, 2015-02-20 at 06:38 +0700, Duy Nguyen wrote: 
> > >    * 'git push'? 
> > 
> > This one is not affected by how deep your repo's history is, or how 
> > wide your tree is, so should be quick.. 
> > 
> > Ah the number of refs may affect both git-push and git-pull. I think 
> > Stefan knows better than I in this area. 
>
> I can tell you that this is a bit of a problem for us at Twitter.  We 
> have over 100k refs, which adds ~20MiB of downstream traffic to every 
> push. 
>
> I added a hack to improve this locally inside Twitter: The client sends 
> a bloom filter of shas that it believes that the server knows about; the 
> server sends only the sha of master and any refs that are not in the 
> bloom filter.  The client  uses its local version of the servers' refs 
> as if they had just been sent.  This means that some packs will be 
> suboptimal, due to false positives in the bloom filter leading some new 
> refs to not be sent.  Also, if there were a repack between the pull and 
> the push, some refs might have been deleted on the server; we repack 
> rarely enough and pull frequently enough that this is hopefully not an 
> issue. 
>
> We're still testing to see if this works.  But due to the number of 
> assumptions it makes, it's probably not that great an idea for general 
> use. 

Good to hear that others are starting to experiment with solutions to this 
problem!  I hope to hear more updates on this.

I have a prototype of a simpler and, I believe, more robust solution,
though it is aimed at a narrower use case.  On connecting, the client sends
a refspec naming the refs for which it believes the server has the same
values, along with a single sha computed over its own refs/shas that match
that refspec.  The server computes the same sha over its refs that match
the refspec.  If the two "verification" shas match, the server omits those
refs and sends only a confirmation of the match (along with any refs
outside of the refspec).  On a match, the client can inject its local
values for the refs covered by the refspec and be guaranteed that they
match the server's values.
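
The exchange described above could be sketched roughly as follows. This is
only an illustration in Python, not the prototype itself: the function
names refs_digest and advertise are hypothetical, fnmatch stands in for
real refspec matching, and the byte layout fed to SHA-1 is an assumption.

```python
import fnmatch
import hashlib


def refs_digest(refs, pattern):
    """SHA-1 over the sorted (value, name) pairs of refs matching pattern."""
    digest = hashlib.sha1()
    for name in sorted(refs):
        if fnmatch.fnmatchcase(name, pattern):
            digest.update(f"{refs[name]} {name}\n".encode())
    return digest.hexdigest()


def advertise(server_refs, pattern, client_digest):
    """Server side: return (matched, refs the server still has to send)."""
    outside = {name: sha for name, sha in server_refs.items()
               if not fnmatch.fnmatchcase(name, pattern)}
    if refs_digest(server_refs, pattern) == client_digest:
        return True, outside            # client may reuse its local values
    return False, dict(server_refs)     # mismatch: fall back to the full list
```

The client would call refs_digest() on its own copy of the refs, send the
pattern and the digest, and substitute its local values only when the
server answers with matched == True.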

This optimization is aimed at the worst-case scenario (and thus at the
potentially best-case "compression"): the client and server match for all
refs (a refs/* refspec).  This happens often on Gerrit server startup,
when it verifies that its mirrors are up-to-date.  One reason I chose this
as a starting optimization is that I think it is a use case which will
not actually benefit from "fixing" the git protocol to send only relevant
refs, since all the refs are in fact relevant here!  So something like
this will likely be needed in any future git protocol for it to be
efficient for this use case.  And I believe this use case is likely to
stick around.

With a minor tweak, this optimization should also work when replicating
actual expected updates: exclude the refs that are expected to update from
the verification, so that the server always sends their values, since they
will likely not match and would wreck the optimization.  However, for this
use case it is not clear whether it is even worth caring about the
non-updating refs.  In theory, knowledge of the non-updating refs can
reduce the amount of data transmitted, but I suspect that as the ref count
increases, this has diminishing returns and mostly ends up chewing up CPU
and memory in a vain attempt to reduce network traffic.

Please do keep us up to date on your results,

-Martin


Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a 
Linux Foundation Collaborative Project

Re: [RFH] GSoC 2015 application

2015-02-19 Thread Michael Haggerty
On 02/18/2015 08:14 PM, Jeff King wrote:
> The response to my previous email was not overwhelming, but people did
> express some interest in Git doing GSoC this year. So I've started on
> the application, using last year's version as a template.

Regretfully, I can't in good conscience volunteer to be a GSoC mentor
this year. I have too many other projects going on and don't see how I
can free up enough time to be a good mentor.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu



Re: Interested in helping open source friends on HP-UX?

2015-02-19 Thread Jeff King
On Thu, Feb 19, 2015 at 02:21:11PM +0100, Michael J Gruber wrote:

> > It passes NO_ICONV through to the test suite, sets up a prerequisite,
> > disables some test scripts which are purely about i18n (e.g.,
> > t3900-i18n-commit), and marks some of the scripts with one-off tests
> > using the ICONV prereq.
> 
> Hmm. I know we pass other stuff down, but is this really a good idea? It
> relies on the fact that the git that we test was built with the options
> from there. This assumption breaks (with) GIT_TEST_INSTALLED, if not more.
> 
> Basically, it may break as soon as we run the tests by other means than
> "make", which is quite customary if you run single tests.
> 
> (And we do pass config.mak down, methinks, but NO_ICONV may come from
> the command line.)

It's not quite so bad as you make out. We write the value to the
GIT-BUILD-OPTIONS file during "make", no matter where it comes from, and
load that in test-lib.sh. So:

  make NO_ICONV=Nope
  cd t
  ./t3901-i18n-patch.sh

works just fine (for this and for any of the other options we mark
there).

It won't work for GIT_TEST_INSTALLED, but that is not a new problem.
Fundamentally you cannot expect to test a version built without option X
without telling git _somehow_ that it was built that way.

I suspect GIT_TEST_INSTALLED is not all that widely used, or somebody
would have complained before. But if we really want to support it, I
think the right thing is to bake GIT-BUILD-OPTIONS into the binary, so
that "git --build-options" dumps it. It might also have value for
debugging and forensics in general.

> Jeff, you got it wrong. You should do the hard part and leave the easy
> part to us!

Oops. :)

-Peff


Re: [PATCH] log --decorate: do not leak "commit" color into the next item

2015-02-19 Thread Jeff King
On Thu, Feb 19, 2015 at 10:02:12AM -0800, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > Yeah, I think this is a good fix. I had a vague feeling that we may have
> > done this on purpose to let the decoration color "inherit" from the
> > existing colors for backwards compatibility, but I don't think that
> > could ever have worked (since color.decorate.* never defaulted to
> > "normal").
> 
> Hmph, but that $gmane/191118 talks about giving bold to commit-color
> and then expecting for decors to inherit the boldness, a wish I can
> understand.  But I do not necessarily agree with it---it relies on
> that after "(" and ", " there is no reset,
> which is not how everything else works.

I don't see anybody actually _wanting_ the inheritance. It is mentioned
merely as an observation. So yeah, we would break anybody who does:

  [color "diff"]
  commit = blue

  [color "decorate"]
  branch = normal
  remoteBranch = normal
  tag = normal
  stash = normal
  HEAD = normal

and expects the "blue" to persist automatically.

But given that this behaves in the opposite way of every other part of
git's color handling, I think we can call it a bug, and people doing
that are crazy (they should s/normal/blue/ in the latter config).
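
For readers who want to see the leak at the escape-sequence level, here is
a deliberately simplified model in Python. It is not git's code: the
decorate() helper and its reset_before_item flag are invented for this
illustration, and the output format only loosely mimics "log --decorate".
"normal" is modelled as an empty escape sequence.

```python
RESET = "\033[m"
COLORS = {"normal": "", "blue": "\033[34m", "bold green": "\033[1;32m"}


def decorate(commit_color, deco_color, names, reset_before_item):
    """Build a simplified 'commit <id> (a, b)' line with color escapes."""
    out = [COLORS[commit_color], "commit 1234abc ("]
    for i, name in enumerate(names):
        if i:
            out += [COLORS[commit_color], ", "]
        if reset_before_item:
            out.append(RESET)   # nothing from the commit color leaks onward
        out += [COLORS[deco_color], name, RESET]
    out += [COLORS[commit_color], ")", RESET]
    return "".join(out)


if __name__ == "__main__":
    # Without the reset, a "normal" decoration is still drawn in blue.
    print(repr(decorate("blue", "normal", ["HEAD"], reset_before_item=False)))
    print(repr(decorate("blue", "normal", ["HEAD"], reset_before_item=True)))
```

In the first line no reset appears between the blue escape and "HEAD", so
the terminal keeps drawing in blue; in the second the reset comes first and
"normal" really means the terminal default.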

> So this change at least needs to come with an explanation to people
> who are used to and took advantage of this color attribute leakage,
> definitely in the log message and preferably in the documentation
> that covers all the color.* settings, I think.

I'd agree it is worth a mention in the log (and possibly release notes),
but I don't think it is worth polluting the documentation forever
(though explaining that we never inherit might be worth doing, and that
is perhaps what you meant).

-Peff


Re: [RFH] GSoC 2015 application

2015-02-19 Thread Duy Nguyen
On Thu, Feb 19, 2015 at 2:14 AM, Jeff King  wrote:
> and the list of microprojects:
>
>   http://git.github.io/SoC-2015-Microprojects.html
>

There is Debian bug 777690 [1] that's basically about making tag's
version sort aware of -rc, -pre suffixes. I imagine it would touch
versioncmp.c and builtin/tag.c (to retrieve the suffixes from config
file).

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=777690
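
As a rough illustration of the desired ordering (not of how versioncmp.c
works or would be changed), here is a small Python sort key. The name
version_key and the PRERELEASE_SUFFIXES tuple are hypothetical; the tuple
stands in for whatever list of suffixes would come from the config file.

```python
import re

# Hypothetical configuration: suffixes that sort before the plain release,
# listed from earliest to latest.
PRERELEASE_SUFFIXES = ("-pre", "-rc")


def version_key(tag, suffixes=PRERELEASE_SUFFIXES):
    """Sort key that places e.g. v1.0-rc1 before v1.0."""
    base, rank, rest = tag, len(suffixes), ""
    for i, suffix in enumerate(suffixes):
        pos = tag.find(suffix)
        if pos != -1:
            base, rank, rest = tag[:pos], i, tag[pos + len(suffix):]
            break

    def numeric_parts(text):
        return [int(n) for n in re.findall(r"\d+", text)]

    return (numeric_parts(base), rank, numeric_parts(rest))


if __name__ == "__main__":
    tags = ["v1.0", "v1.0-rc2", "v1.0-pre1", "v1.0-rc1", "v0.9"]
    print(sorted(tags, key=version_key))
```

A plain release gets the highest rank for its base version, so it sorts
after all of its -pre and -rc tags.
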
-- 
Duy


Re: [RFH] GSoC 2015 application

2015-02-19 Thread Jeff King
On Thu, Feb 19, 2015 at 11:32:46AM +0100, Matthieu Moy wrote:

> > I do need somebody to volunteer as backup admin. This doesn't need
> > to involve any specific commitment, but is mostly about what to do if I
> > get hit by a bus.
> 
> If you promise me to try hard not to be hit by a bus and no one else
> steps in, I can be the backup admin.

Thanks. I need you to register and create a profile at:

  https://www.google-melange.com/gsoc/homepage/google/gsoc2015

and tell me your username (the information from last year does not carry
forward automatically). Then I mark you as backup admin and (I think)
you have to then accept.

> Throwing out a few ideas for discussion, I can write something if people
> agree.
> 
> * "git bisect fixed/unfixed", to allow bisecting a fix instead of a
>   regression less painfully. There were already some proposed patches
>   (https://git.wiki.kernel.org/index.php/SmallProjectsIdeas#git_bisect_fix.2Funfixed),
>   so it shouldn't be too hard. Perhaps this item can be included in the
>   "git bisect --first-parent" idea (turning it into "git bisect
>   improvements").

That seems like a reasonable topic. I was about to say "but it's much
more complicated than fix/unfixed..." but it looks like that wiki entry
covers the past discussion (and reading and understanding that would be
a first step for the student). I agree it's probably smaller than a
full-summer project and can get lumped into the other bisect idea.

> * Be nicer to the user on tracked/untracked merge conflicts
> [...]

Sounds OK to me, though I agree the merging of untracked files is a
little controversial. There are also a lot of corner cases in
merge-recursive, and I think still some documented cases where we can
overwrite untracked files. Maybe a more encompassing project would be to
organize and dig into some of those corner cases.

>  SoC-2015-Microprojects.md | 42 ++
>  1 file changed, 42 insertions(+)

Thanks, applied, although...

> +### Move ~/.git-credentials and ~/.git-credential-cache to ~/.config/git
> +
> +Most of git dotfiles can be located, at the user's option, in
> +~/. or in ~/.config/git/, following the [XDG
> +standard](http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html).
> +~/.git-credentials and ~/.git-credential-cache are still hardcoded as
> +~/., and should allow using the XDG directory layout too
> +(~/.git-credentials could be allowed as ~/.config/git/credential and
> +~/.git-credential-cache could be allowed as ~/.cache/git/credential,
> +possibly modified by $XDG_CONFIG_HOME and $XDG_CACHE_HOME).
> +
> +Each of these files can be a microproject of its own. The suggested
> +approach is:
> +
> +* See how XDG was implemented for other files (run "git log --grep
> +  XDG" in Git's source code) and read the XDG specification.
> +
> +* Implement and test the new behavior, without breaking compatibility
> +  with the old behavior.
> +
> +* Update the documentation

I think these might be getting a little larger than "micro". That's OK
if the student can handle it, but we may want to mark them as such. I'll
leave it for now, though, as we have a bit more breathing room on the
microprojects.

> +### Add configuration options for some commonly used command-line options
> +
> +This includes:
> +
> +* git am -3
> +
> +* git am -c
> +
> +Some people always run the command with these options, and would
> +prefer to be able to activate them by default in ~/.gitconfig.

The direction here seems reasonable, though I think we have
mailinfo.scissors already, so "-c" may not be a good example.

> +### Add more builtin patterns for userdiff
> +
> +"git diff" shows the function name corresponding to each hunk after
> +the @@ ... @@ line. For common languages (C, HTML, Ada, Matlab, ...),
> +the way to find the function name is built into Git's source code as
> +regular expressions (see userdiff.c). A few languages are common
> +enough to deserve a built-in driver, but are not yet recognized. For
> +example, CSS, shell.

I am not sure that understanding the horrible regexes involved in some
userdiff counts as "micro", but OK. :)

-Peff


[PATCH 2/2] index-pack: kill union delta_base to save memory

2015-02-19 Thread Nguyễn Thái Ngọc Duy
Once we know the number of objects in the input pack, we allocate an
array of nr_objects of struct delta_entry. On x86-64, this struct is
32 bytes long. The union delta_base, which is part of struct
delta_entry, provides enough space to store either ofs-delta (8 bytes)
or ref-delta (20 bytes).

Notice that with "recent" Git versions, ofs-delta objects are
preferred over ref-delta objects, and ref-delta objects have no reason
to be present in a clone pack. So in the clone case we waste (20-8) *
nr_objects bytes because of this union. That's about 38MB out of 100MB
for deltas[] with 3.4M objects, or 38%. deltas[] would be around 62MB
without the waste.

This patch attempts to eliminate that. The deltas[] array is split into
two: one for ofs-delta and one for ref-delta. Many functions are also
duplicated because of this split. With this patch, the ofs_deltas[] array
takes 51MB. ref_deltas[] should remain unallocated in the clone case (0
bytes); this array grows as we see ref-delta objects. We save about half
in the clone case, or 25% of total bookkeeping.

The saving is more than the calculation above because some padding in
the old delta_entry struct is removed. ofs_delta_entry is 16 bytes,
including 4 bytes of padding. That's 13MB for padding, but packing
the struct could break platforms that do not support unaligned
access. If someone on 32-bit is really low on memory and only deals
with packs smaller than 2G, using 32-bit off_t would eliminate the
padding and save 27MB on top.

A note about ofs_deltas allocation: we could use the ref_deltas memory
allocation strategy for ofs_deltas, but that probably just adds more
overhead on top. ofs-deltas are generally the majority (1/2 to 2/3) in
any pack. Incremental realloc may lead to too many memcpy calls. And if
we preallocate, say, 1/2 or 2/3 of nr_objects initially, the growth rate
of ALLOC_GROW() could make this array larger than nr_objects, wasting
more memory.
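
The sizes quoted above can be checked with a small Python sketch that
mirrors the structs with ctypes. The assumptions: off_t is modelled as a
64-bit integer, the host follows the usual 64-bit alignment rules, and the
"MB" figures in the message are read as MiB (which is where the numbers
line up). The class names are only mirrors of the C structs in the patch.

```python
import ctypes

N_OBJECTS = 3_400_000  # object count used in the message


class DeltaBase(ctypes.Union):          # old: union delta_base
    _fields_ = [("sha1", ctypes.c_ubyte * 20), ("offset", ctypes.c_int64)]


class DeltaEntry(ctypes.Structure):     # old: struct delta_entry
    _fields_ = [("base", DeltaBase), ("obj_no", ctypes.c_int)]


class OfsDeltaEntry(ctypes.Structure):  # new: struct ofs_delta_entry
    _fields_ = [("offset", ctypes.c_int64), ("obj_no", ctypes.c_int)]


def mib(n_bytes):
    """Convert a byte count to MiB."""
    return n_bytes / (1024 * 1024)


if __name__ == "__main__":
    for cls in (DeltaEntry, OfsDeltaEntry):
        size = ctypes.sizeof(cls)
        print(f"{cls.__name__}: {size} bytes, {mib(size * N_OBJECTS):.1f} MiB")
    print(f"union waste: {mib((20 - 8) * N_OBJECTS):.1f} MiB")
    print(f"padding in ofs_delta_entry: {mib(4 * N_OBJECTS):.1f} MiB")
```

On such a platform DeltaEntry comes out at 32 bytes and OfsDeltaEntry at
16 bytes, which is where the "about half" saving comes from.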

Brought-up-by: Matthew Sporleder 
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/index-pack.c | 260 +++
 1 file changed, 160 insertions(+), 100 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 07b2c0c..eae41c4 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -28,11 +28,6 @@ struct object_stat {
int base_object_no;
 };
 
-union delta_base {
-   unsigned char sha1[20];
-   off_t offset;
-};
-
 struct base_data {
struct base_data *base;
struct base_data *child;
@@ -52,26 +47,28 @@ struct thread_local {
int pack_fd;
 };
 
-/*
- * Even if sizeof(union delta_base) == 24 on 64-bit archs, we really want
- * to memcmp() only the first 20 bytes.
- */
-#define UNION_BASE_SZ  20
-
 #define FLAG_LINK (1u<<20)
 #define FLAG_CHECKED (1u<<21)
 
-struct delta_entry {
-   union delta_base base;
+struct ofs_delta_entry {
+   off_t offset;
+   int obj_no;
+};
+
+struct ref_delta_entry {
+   unsigned char sha1[20];
int obj_no;
 };
 
 static struct object_entry *objects;
 static struct object_stat *obj_stat;
-static struct delta_entry *deltas;
+static struct ofs_delta_entry *ofs_deltas;
+static struct ref_delta_entry *ref_deltas;
 static struct thread_local nothread_data;
 static int nr_objects;
-static int nr_deltas;
+static int nr_ofs_deltas;
+static int nr_ref_deltas;
+static int ref_deltas_alloc;
 static int nr_resolved_deltas;
 static int nr_threads;
 
@@ -480,7 +477,8 @@ static void *unpack_entry_data(unsigned long offset, unsigned long size,
 }
 
 static void *unpack_raw_entry(struct object_entry *obj,
- union delta_base *delta_base,
+ off_t *ofs_offset,
+ unsigned char *ref_sha1,
  unsigned char *sha1)
 {
unsigned char *p;
@@ -509,11 +507,10 @@ static void *unpack_raw_entry(struct object_entry *obj,
 
switch (obj->type) {
case OBJ_REF_DELTA:
-   hashcpy(delta_base->sha1, fill(20));
+   hashcpy(ref_sha1, fill(20));
use(20);
break;
case OBJ_OFS_DELTA:
-   memset(delta_base, 0, sizeof(*delta_base));
p = fill(1);
c = *p;
use(1);
@@ -527,8 +524,8 @@ static void *unpack_raw_entry(struct object_entry *obj,
use(1);
base_offset = (base_offset << 7) + (c & 127);
}
-   delta_base->offset = obj->idx.offset - base_offset;
-   if (delta_base->offset <= 0 || delta_base->offset >= obj->idx.offset)
+   *ofs_offset = obj->idx.offset - base_offset;
+   if (*ofs_offset <= 0 || *ofs_offset >= obj->idx.offset)
bad_object(obj->idx.offset, _("delta base offset is out of bound"));
break;
case OBJ_COMMIT:
@@ -612,55 +609,108 @@ static void *get_data_from_pack(struct object_entry *obj)
return unpack_d

[PATCH 1/2] index-pack: reduce object_entry size to save memory

2015-02-19 Thread Nguyễn Thái Ngọc Duy
For each object in the input pack, we need one struct object_entry. On
x86-64, this struct is 64 bytes long. However:

 - The 8 bytes for delta_depth and base_object_no are only useful when
   show_stat is set. And it's never set unless someone is debugging.

 - The three fields hdr_size, type and real_type take 4 bytes each
   even though they never use more than 4 bits.

By moving delta_depth and base_object_no out of struct object_entry
and making the other 3 fields one byte long instead of 4, we shrink this
struct by 25%.

On a 3.4M object repo (*) that's about 53MB. The saving is less
impressive compared to index-pack memory use for basic bookkeeping (**),
about 16%.

(*) linux-2.6.git already has 4M objects as of v3.19-rc7 so this is
not an unrealistic number of objects that we have to deal with.

(**)  3.4M * (sizeof(object_entry) + sizeof(delta_entry)) = 311MB
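
The 64-byte and 25% figures can be reproduced with a ctypes mirror of the
struct. This is a sketch under assumptions: the layout of struct
pack_idx_entry (20-byte sha1, 32-bit crc32, off_t offset) is not shown in
this message and is assumed here, unsigned long and off_t are modelled as
64-bit integers, and the 32 bytes used for delta_entry is the size on a
64-bit platform.

```python
import ctypes

N_OBJECTS = 3_400_000


class PackIdxEntry(ctypes.Structure):    # assumed layout, see lead-in
    _fields_ = [("sha1", ctypes.c_ubyte * 20),
                ("crc32", ctypes.c_uint32),
                ("offset", ctypes.c_int64)]


class OldObjectEntry(ctypes.Structure):  # struct object_entry before the patch
    _fields_ = [("idx", PackIdxEntry),
                ("size", ctypes.c_uint64),
                ("hdr_size", ctypes.c_uint),
                ("type", ctypes.c_int),          # enum object_type
                ("real_type", ctypes.c_int),     # enum object_type
                ("delta_depth", ctypes.c_uint),
                ("base_object_no", ctypes.c_int)]


class NewObjectEntry(ctypes.Structure):  # struct object_entry after the patch
    _fields_ = [("idx", PackIdxEntry),
                ("size", ctypes.c_uint64),
                ("hdr_size", ctypes.c_ubyte),
                ("type", ctypes.c_char),
                ("real_type", ctypes.c_char)]


def saving():
    """Return (old size, new size, MiB saved over N_OBJECTS entries)."""
    old = ctypes.sizeof(OldObjectEntry)
    new = ctypes.sizeof(NewObjectEntry)
    return old, new, (old - new) * N_OBJECTS / 2**20


if __name__ == "__main__":
    old, new, saved = saving()
    total = N_OBJECTS * (old + 32) / 2**20   # 32 = sizeof(delta_entry)
    print(f"{old} -> {new} bytes, saves {saved:.1f} MiB of {total:.0f} MiB")
```

Under these assumptions the struct goes from 64 to 48 bytes, and the
bookkeeping total matches the 311MB in (**) when MB is read as MiB.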

Brought-up-by: Matthew Sporleder 
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/index-pack.c | 30 +++---
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 4632117..07b2c0c 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -18,9 +18,12 @@ static const char index_pack_usage[] =
 struct object_entry {
struct pack_idx_entry idx;
unsigned long size;
-   unsigned int hdr_size;
-   enum object_type type;
-   enum object_type real_type;
+   unsigned char hdr_size;
+   char type;
+   char real_type;
+};
+
+struct object_stat {
unsigned delta_depth;
int base_object_no;
 };
@@ -64,6 +67,7 @@ struct delta_entry {
 };
 
 static struct object_entry *objects;
+static struct object_stat *obj_stat;
 static struct delta_entry *deltas;
 static struct thread_local nothread_data;
 static int nr_objects;
@@ -873,13 +877,15 @@ static void resolve_delta(struct object_entry *delta_obj,
void *base_data, *delta_data;
 
if (show_stat) {
-   delta_obj->delta_depth = base->obj->delta_depth + 1;
+   int i = delta_obj - objects;
+   int j = base->obj - objects;
+   obj_stat[i].delta_depth = obj_stat[j].delta_depth + 1;
deepest_delta_lock();
-   if (deepest_delta < delta_obj->delta_depth)
-   deepest_delta = delta_obj->delta_depth;
+   if (deepest_delta < obj_stat[i].delta_depth)
+   deepest_delta = obj_stat[i].delta_depth;
deepest_delta_unlock();
+   obj_stat[i].base_object_no = j;
}
-   delta_obj->base_object_no = base->obj - objects;
delta_data = get_data_from_pack(delta_obj);
base_data = get_base_data(base);
result->obj = delta_obj;
@@ -902,7 +908,7 @@ static void resolve_delta(struct object_entry *delta_obj,
  * "want"; if so, swap in "set" and return true. Otherwise, leave it untouched
  * and return false.
  */
-static int compare_and_swap_type(enum object_type *type,
+static int compare_and_swap_type(char *type,
 enum object_type want,
 enum object_type set)
 {
@@ -1499,7 +1505,7 @@ static void show_pack_info(int stat_only)
struct object_entry *obj = &objects[i];
 
if (is_delta_type(obj->type))
-   chain_histogram[obj->delta_depth - 1]++;
+   chain_histogram[obj_stat[i].delta_depth - 1]++;
if (stat_only)
continue;
printf("%s %-6s %lu %lu %"PRIuMAX,
@@ -1508,8 +1514,8 @@ static void show_pack_info(int stat_only)
   (unsigned long)(obj[1].idx.offset - obj->idx.offset),
   (uintmax_t)obj->idx.offset);
if (is_delta_type(obj->type)) {
-   struct object_entry *bobj = &objects[obj->base_object_no];
-   printf(" %u %s", obj->delta_depth, sha1_to_hex(bobj->idx.sha1));
+   struct object_entry *bobj = &objects[obj_stat[i].base_object_no];
+   printf(" %u %s", obj_stat[i].delta_depth, sha1_to_hex(bobj->idx.sha1));
}
putchar('\n');
}
@@ -1672,6 +1678,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
curr_pack = open_pack_file(pack_name);
parse_pack_header();
objects = xcalloc(nr_objects + 1, sizeof(struct object_entry));
+   if (show_stat)
+   obj_stat = xcalloc(nr_objects + 1, sizeof(struct object_stat));
deltas = xcalloc(nr_objects, sizeof(struct delta_entry));
parse_pack_objects(pack_sha1);
resolve_deltas();
-- 
2.3.0.rc1.137.g477eb31



[PATCH 0/2] nd/slim-index-pack-memory-usage updates

2015-02-19 Thread Nguyễn Thái Ngọc Duy
Compared to 'pu', the first patch is unchanged except for the commit
message. The second patch has __attribute((packed)) removed because it
causes problems on some ARM systems. x86 people who want to save more
memory can put it back themselves.

Nguyễn Thái Ngọc Duy (2):
  index-pack: reduce object_entry size to save memory
  index-pack: kill union delta_base to save memory

 builtin/index-pack.c | 290 +++
 1 file changed, 179 insertions(+), 111 deletions(-)

-- 
2.3.0.rc1.137.g477eb31



Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-19 Thread David Turner
On Fri, 2015-02-20 at 06:38 +0700, Duy Nguyen wrote:
> >* 'git push'?
> 
> This one is not affected by how deep your repo's history is, or how
> wide your tree is, so should be quick..
> 
> Ah the number of refs may affect both git-push and git-pull. I think
> Stefan knows better than I in this area.

I can tell you that this is a bit of a problem for us at Twitter.  We
have over 100k refs, which adds ~20MiB of downstream traffic to every
push.

I added a hack to improve this locally inside Twitter: The client sends
a bloom filter of shas that it believes that the server knows about; the
server sends only the sha of master and any refs that are not in the
bloom filter.  The client  uses its local version of the servers' refs
as if they had just been sent.  This means that some packs will be
suboptimal, due to false positives in the bloom filter leading some new
refs to not be sent.  Also, if there were a repack between the pull and
the push, some refs might have been deleted on the server; we repack
rarely enough and pull frequently enough that this is hopefully not an
issue.
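
For readers unfamiliar with the data structure, the idea can be sketched in
a few lines of Python. This is not the code running at Twitter: the
BloomFilter class, its parameters, and the refs_to_send() helper are all
invented for illustration, and the real hack presumably differs in how the
filter is sized and encoded on the wire.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits, self.n_hashes, self.bits = n_bits, n_hashes, 0

    def _positions(self, item):
        for i in range(self.n_hashes):
            h = hashlib.sha1(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all(self.bits >> pos & 1 for pos in self._positions(item))


def refs_to_send(server_refs, client_filter):
    """Server side: always send master, plus refs whose sha is not in the filter."""
    return {name: sha for name, sha in server_refs.items()
            if name == "refs/heads/master" or sha not in client_filter}
```

A sha that was never added can still test as present (a false positive);
that is the case where a new ref is not sent and the resulting pack ends up
suboptimal.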

We're still testing to see if this works.  But due to the number of
assumptions it makes, it's probably not that great an idea for general
use.

There are probably more complex schemes to compute minimal (or
small-enough) packs; in particular, if the patch is just a few megs off
of master, it's better to just send the whole pack.  That doesn't work
for us because we've got a log-based replication scheme that the pack
appends to, and we don't want the log to get too big; we want
more-minimal packs than that.  But it might work for others.



Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-19 Thread Duy Nguyen
On Fri, Feb 20, 2015 at 6:29 AM, Ævar Arnfjörð Bjarmason wrote:
> Anecdotally I work on a repo at work (where I'm mostly "the Git guy") that's:
>
>  * Around 500k commits
>  * Around 100k tags
>  * Around 5k branches
>  * Around 500 commits/day, almost entirely to the same branch
>  * 1.5 GB .git checkout.
>  * Mostly text source, but some binaries (we're trying to cut down[1] on those)

Would be nice if you could make an anonymized version of this repo
public. Working on a "real" large repo is better than an artificial
one.

> But actually most of "git fetch" is spent in the reachability check
> subsequently done by "git-rev-list" which takes several seconds. I

I wonder if reachability bitmaps could help here.

> haven't looked into it but there's got to be room for optimization
> there, surely it only has to do reachability checks for new refs, or
> could run in some "I trust this remote not to send me corrupt data"
> completely mode (which would make sense within a company where you can
> trust your main Git box).

No, it's not just about trusting the server side, it's about catching
data corruption on the wire as well. We have a trick to avoid the
reachability check in the clone case, which is much more expensive than a
fetch. Maybe we could do something further to help the fetch case _if_
reachability bitmaps don't help.
-- 
Duy


Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-19 Thread brian m. carlson
On Thu, Feb 19, 2015 at 04:26:58PM -0500, Stephen Morton wrote:
> I posted this to comp.version-control.git.user and didn't get any response. I
> think the question is plumbing-related enough that I can ask it here.
> 
> I'm evaluating the feasibility of moving my team from SVN to git. We have a very
> large repo. [1] We will have a central repo using GitLab (or similar) that
> everybody works with. Forks, code sharing, pull requests etc. will be done
> through this central server.
> 
> By 'performance', I guess I mean speed of day to day operations for devs.
> 
>* (Obviously, trivially, a (non-local) clone will be slow with a large repo.)
>* Will a few simultaneous clones from the central server also slow down
>  other concurrent operations for other users?

This hasn't been a problem for us at $DAYJOB.  Git doesn't lock anything 
on fetches, so each process is independent.  We probably have about 
sixty developers (and maybe twenty other occasional users) that manage 
to interact with our Git server all day long.  We also have probably 
twenty smoker (CI) systems pulling at two hour intervals, or, when 
there's nothing to do, every two minutes, plus probably fifteen to 
twenty build systems pulling hourly.

I assume you will provide adequate resources for your server.

>* Will 'git pull' be slow?
>* 'git push'?

The most pathological case I've seen for git push is a branch with a 
single commit merged into the main development branch.  As of Git 2.3.0, 
the performance regression here is fixed.

Obviously, the speed of your network connection will affect this.  Even 
at 30 MB/s, cloning several gigabytes of data takes time.  Git tries 
hard to eliminate sending a lot of data, so if your developers keep 
reasonably up-to-date, the cost of establishing the connection will tend 
to dominate.

I see pull and push times that are less than 2 seconds in most cases.

>* 'git commit'? (It is listed as slow in reference [3].)
>* 'git status'? (Slow again in reference 3 though I don't see it.)

These can be slow with slow disks or over remote file systems.  I 
recommend not doing that.  I've heard rumbles that disk performance is 
better on Unix, but I don't use Windows so I can't say.

You should keep your .gitignore files up-to-date to avoid enumerating 
untracked files.  There's some work towards making this less of an 
issue.
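
To get a feel for how many untracked files Git has to enumerate, one can list them explicitly. This is only a rough sketch: the helper name `count_untracked` is made up for illustration, while `git ls-files --others --exclude-standard` is a standard invocation.

```shell
# Hypothetical helper: count untracked files that are NOT covered by
# .gitignore; these are the ones "git status" has to report each time.
count_untracked() {
    # --others: untracked files only
    # --exclude-standard: honor .gitignore, .git/info/exclude, etc.
    git -C "$1" ls-files --others --exclude-standard | wc -l | tr -d ' '
}
```

Running `count_untracked .` before and after updating .gitignore shows whether the ignore rules actually cover the build products.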

git blame can be somewhat slow, but it's not something I use more than 
about once a day, so it doesn't bother me that much.

> Assuming I can put lots of resources into a central server with lots of CPU,
> RAM, fast SSD, fast networking, what aspects of the repo are most likely to
> affect devs' experience?
>* Number of commits
>* Sheer disk space occupied by the repo

The number of files can impact performance due to the number of stat()s 
required.

>* Number of tags.
>* Number of branches.

The number of tags and branches individually is really less relevant 
than the total number of refs (tags, branches, remote branches, etc). 
Very large numbers of refs can impact performance on pushes and pulls 
due to the need to enumerate them all.
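
To check where a repository stands on this, the refs can be counted per namespace. A small sketch, with helper names invented here for illustration:

```shell
# Hypothetical helper: number of refs per top-level namespace
# (heads, tags, remotes, ...), one "count name" pair per line.
refs_by_kind() {
    git -C "$1" for-each-ref --format='%(refname)' |
        cut -d/ -f2 | sort | uniq -c
}

# Hypothetical helper: total number of refs in the repository.
total_refs() {
    git -C "$1" for-each-ref | wc -l | tr -d ' '
}
```

`total_refs .` gives the number that matters for pushes and pulls; `refs_by_kind .` shows which namespace contributes most of it.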

>* Binary objects in the repo that cause it to bloat in size [1]
>* Other factors?

If you want good performance, I'd recommend the latest version of Git 
both client- and server-side.  Newer versions of Git provide pack 
bitmaps, which can dramatically speed up clones and fetches, and Git 
2.3.0 fixes a performance regression with large numbers of refs in 
non-shallow repositories.

It is totally worth it to roll your own packages of git if your vendor 
provides old versions.

> Of the various HW items listed above --CPU speed, number of cores, RAM, SSD,
> networking-- which is most critical here?

I generally find that having a good disk cache is important with large 
repositories.  It may be advantageous to make sure the developer 
machines have adequate memory.  Performance is notably better on 
development machines (VMs) with 2 GB or 4 GB of memory instead of 1 GB.

I can't speak to the server side, as I'm not directly involved with its 
deployment.

> Assume ridiculous numbers. Let me exaggerate: say 1 million commits, 15 GB repo,
> 50k tags, 1,000 branches. (Due to historical code fixups, another 5,000 "fix-up
> branches" which are just one little dangling commit required to change the code
> a little bit between a commit and a tag that was not quite made from it.)

I routinely work on a repo that's 1.9 GB packed, with 25k (and rapidly 
growing) refs.  Other developers work on a repo that's 9 GB packed, with 
somewhat fewer refs.  We don't tend to have problems with this.

Obviously, performance is better on some of our smaller repos, but it's 
not unacceptable on the larger ones.  I generally find that the 940 KB 
repo with huge numbers of files performs worse than the 1.9 GB repo with 
somewhat fewer.  If you can split your repository into multiple logical 
repositories, that wil

Something wrong with diff --color-words=regexp?

2015-02-19 Thread Mike Hommey
Hi,

I was trying to use --color-words with a regex to check a diff, and it appears
it displays things out of order. Am I misunderstanding what my regexp should be
doing or is there a bug?

$ git diff -U3 HEAD^ dom/base/nsDOMFileReader.cpp 
diff --git a/dom/base/nsDOMFileReader.cpp b/dom/base/nsDOMFileReader.cpp
index 6267e0e..fa22590 100644
--- a/dom/base/nsDOMFileReader.cpp
+++ b/dom/base/nsDOMFileReader.cpp
@@ -363,7 +363,7 @@ nsDOMFileReader::DoReadData(nsIAsyncInputStream* aStream, 
uint64_t aCount)
   return NS_ERROR_OUT_OF_MEMORY;
 }
 if (mDataFormat != FILE_AS_ARRAYBUFFER) {
-  mFileData = (char *) moz_realloc(mFileData, mDataLen + aCount);
+  mFileData = (char *) realloc(mFileData, mDataLen + aCount);
   NS_ENSURE_TRUE(mFileData, NS_ERROR_OUT_OF_MEMORY);
 }
 
$ git diff -U3 --color-words='[^ ()]' HEAD^ dom/base/nsDOMFileReader.cpp 
diff --git a/dom/base/nsDOMFileReader.cpp b/dom/base/nsDOMFileReader.cpp
index 6267e0e..fa22590 100644
--- a/dom/base/nsDOMFileReader.cpp
+++ b/dom/base/nsDOMFileReader.cpp
@@ -363,7 +363,7 @@ nsDOMFileReader::DoReadData(nsIAsyncInputStream* aStream, 
uint64_t aCount)
  return NS_ERROR_OUT_OF_MEMORY;
}
if (mDataFormat != FILE_AS_ARRAYBUFFER) {
  mFileData = (char *moz_) realloc(mFileData, mDataLen + aCount);
  NS_ENSURE_TRUE(mFileData, NS_ERROR_OUT_OF_MEMORY);
}


(This is with 2.3.0)

Cheers,

Mike


Re: [RFD/PATCH] stash: introduce checkpoint mode

2015-02-19 Thread Kyle J. McKay

On Feb 19, 2015, at 09:49, Junio C Hamano wrote:

> "Kyle J. McKay"  writes:
>
>> What about a shortcut to "reset-and-apply" as well?
>>
>> I have often been frustrated when "git stash apply" refuses to work
>> because I have changes that would be stepped on and there's no --force
>> option like git checkout has.  I end up doing a reset just so I can
>> run stash apply.
>
> Doesn't that cut both ways, though?
>
> A single step short-cut, done in any way other than a more explicit
> way such as "git reset --hard && git stash apply" (e.g. "git stash
> reset-and-apply" or "git stash apply --force") that makes it crystal
> clear that the user _is_ discarding, has a risk of encouraging users
> to form a dangerous habit of invoking the short-cut without thinking
> and leading to "oops, I didn't mean that!".


Does that reasoning not also apply to the plethora of commands that  
take "--force" already?


I didn't check them all, but tag, checkout, push and branch  
immediately come to mind.  Why is it okay for all those other commands  
to have a --force mode, but not git stash?



Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-19 Thread Duy Nguyen
On Fri, Feb 20, 2015 at 4:26 AM, Stephen Morton
 wrote:
> By 'performance', I guess I mean speed of day to day operations for devs.
>
>* (Obviously, trivially, a (non-local) clone will be slow with a large repo.)
>* Will a few simultaneous clones from the central server also slow down
>  other concurrent operations for other users?

There are no locks on the server when cloning, so in theory cloning
does not affect other operations. Cloning can use lots of memory though
(and a lot of CPU unless you turn on the reachability bitmap feature,
which you should).
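
Turning bitmaps on for a server-side repository could look roughly like the following. This is a sketch, assuming a Git new enough to know the `repack.writeBitmaps` setting; the helper name `enable_bitmaps` is made up here.

```shell
# Hypothetical helper: enable bitmap writing and do one full repack.
enable_bitmaps() {
    git -C "$1" config repack.writeBitmaps true
    # -a: put everything into a single pack; -d: drop redundant packs.
    # Bitmaps are only written for such an all-into-one repack.
    git -C "$1" repack -a -d -q
}
```

After `enable_bitmaps /path/to/repo` there should be a `.bitmap` file next to the pack in `objects/pack/`. Later full repacks keep writing it because the setting is stored in the repository config.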

>* Will 'git pull' be slow?

If we exclude the server side, the size of your tree is the main
factor, but your 25k files should be fine (linux has 48k files).

>* 'git push'?

This one is not affected by how deep your repo's history is, or how
wide your tree is, so it should be quick.

Ah, the number of refs may affect both git-push and git-pull. I think
Stefan knows better than I in this area.

>* 'git commit'? (It is listed as slow in reference [3].)
>* 'git status'? (Slow again in reference 3 though I don't see it.)
(also git-add)

Again, the size of your tree. I'm trying to address problems in [3],
but at your repo's size, I don't think you need to worry about it.

>* Some operations might not seem to be day-to-day but if they are called
>  frequently by the web front-end to GitLab/Stash/GitHub etc then
>  they can become bottlenecks. (e.g. 'git branch --contains' seems terribly
>  adversely affected by large numbers of branches.)
>* Others?

git-blame could be slow when a file is modified a lot.
-- 
Duy


Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-19 Thread Ævar Arnfjörð Bjarmason
On Thu, Feb 19, 2015 at 10:26 PM, Stephen Morton
 wrote:
> I posted this to comp.version-control.git.user and didn't get any response. I
> think the question is plumbing-related enough that I can ask it here.
>
> I'm evaluating the feasibility of moving my team from SVN to git. We have a very
> large repo. [1] We will have a central repo using GitLab (or similar) that
> everybody works with. Forks, code sharing, pull requests etc. will be done
> through this central server.
>
> By 'performance', I guess I mean speed of day to day operations for devs.
>
>* (Obviously, trivially, a (non-local) clone will be slow with a large repo.)
>* Will a few simultaneous clones from the central server also slow down
>  other concurrent operations for other users?
>* Will 'git pull' be slow?
>* 'git push'?
>* 'git commit'? (It is listed as slow in reference [3].)
>* 'git status'? (Slow again in reference 3 though I don't see it.)
>* Some operations might not seem to be day-to-day but if they are called
>  frequently by the web front-end to GitLab/Stash/GitHub etc then
>  they can become bottlenecks. (e.g. 'git branch --contains' seems terribly
>  adversely affected by large numbers of branches.)
>* Others?
>
>
> Assuming I can put lots of resources into a central server with lots of CPU,
> RAM, fast SSD, fast networking, what aspects of the repo are most likely to
> affect devs' experience?
>* Number of commits
>* Sheer disk space occupied by the repo
>* Number of tags.
>* Number of branches.
>* Binary objects in the repo that cause it to bloat in size [1]
>* Other factors?
>
> Of the various HW items listed above --CPU speed, number of cores, RAM, SSD,
> networking-- which is most critical here?
>
> (Stash recommends 1.5 x repo_size x number of concurrent clones of
> available RAM.
> I assume that is good advice in general.)
>
> Assume ridiculous numbers. Let me exaggerate: say 1 million commits, 15 GB repo,
> 50k tags, 1,000 branches. (Due to historical code fixups, another 5,000 "fix-up
> branches" which are just one little dangling commit required to change the code
> a little bit between a commit and a tag that was not quite made from it.)
>
> While there's lots of information online, much of it is old [3] and with git
> constantly evolving I don't know how valid it still is. Then there's anecdotal
> evidence that is of questionable value.[2]
> Are many/all of the issues Facebook identified [3] resolved? (Yes, I
> understand Facebook went with Mercurial. But I imagine the git team nevertheless
> took their analysis to heart.)

Anecdotally I work on a repo at work (where I'm mostly "the Git guy") that's:

 * Around 500k commits
 * Around 100k tags
 * Around 5k branches
 * Around 500 commits/day, almost entirely to the same branch
 * 1.5 GB .git checkout.
 * Mostly text source, but some binaries (we're trying to cut down[1] on those)

The main scaling issues we have with Git are:

 * "git pull" takes around 10 seconds or so
 * Operations like "git status" are much slower because they scale
with the size of the work tree
 * Similarly "git rebase" takes a much longer time for each applied
commit, I think because it does the equivalent of "git status" for
every applied commit. Each commit applied takes around 1-2 seconds.
 * We have a lot of contention on pushes because we're mostly pushing
to one branch.
 * History spelunking (e.g. git log --reverse -p -G) is taking
longer by the day

The obvious reason "git pull" is slow is that git-upload-pack spews
the complete set of refs at you each time. The output from that
command is around 10MB in size for us now. It takes around 300 ms to
run that locally from hot cache, a bit more to send it over the
network.
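
A rough way to measure this for any remote is to look at what "git ls-remote" prints. Its output is not the exact wire format of the ref advertisement, so treat the byte count as an approximation; the helper name `advertised_refs` is made up for this sketch.

```shell
# Hypothetical helper: print "<number of refs> <bytes>" for what a
# remote advertises, using "git ls-remote" as a stand-in for the
# actual upload-pack advertisement.
advertised_refs() {
    # wc prints the line count first, then the byte count
    git ls-remote "$1" | wc -lc | awk '{print $1, $2}'
}
```

For example, `advertised_refs origin` run inside a clone, or `advertised_refs /path/to/repo.git` for a local repository.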

But actually most of "git fetch" is spent in the reachability check
subsequently done by "git-rev-list", which takes several seconds. I
haven't looked into it but there's got to be room for optimization
there; surely it only has to do reachability checks for new refs, or
could run entirely in some "I trust this remote not to send me corrupt
data" mode (which would make sense within a company where you can
trust your main Git box).

The "git status" operations could be made faster by having something
like watchman, there's been some effort on getting that done in Git,
but I haven't tried it. This seems to have been the main focus of
Facebook's Mercurial optimization effort.

Some of this you can mostly "solve" by doing e.g. "git status -uno".
Having support for such unsafe operations (e.g. teaching rebase and
pals to use it) would be nice, at the cost of some safety, but having
something that feeds off inotify would be even better.
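
The trade-off with -uno can be seen in a small sketch; the helper name `status_tracked_only` is invented for illustration.

```shell
# Hypothetical helper: porcelain status restricted to tracked files.
# -uno (--untracked-files=no) skips the scan for untracked files, which
# is faster on a big work tree but hides new, not-yet-added files.
status_tracked_only() {
    git -C "$1" status --porcelain -uno
}
```

Comparing `time git status` with `time git status -uno` in a large checkout shows how much of the cost is the untracked-file scan.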

It takes around 3 minutes to reclone our repo, we really don't care
(we rarely re-clone). But I thought I'd mention it because for some
reason this is important to Facebook and along with inotify were the
two major things they focused on.

As f

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-19 Thread Stefan Beller
On Thu, Feb 19, 2015 at 3:06 PM, Stephen Morton
 wrote:
>
> I think I addressed most of this in my original post with the paragraph
>
>  "Assume ridiculous numbers. Let me exaggerate: say 1 million commits,
> 15 GB repo,
>   50k tags, 1,000 branches. (Due to historical code fixups, another
> 5,000 "fix-up
>   branches" which are just one little dangling commit required to
> change the code
>   a little bit between a commit and a tag that was not quite made from it.)"
>
> To that I'd add 25k files,
> no major rewrites,
> no huge binary files, but lots of a few MB binary files with many revisions.
>
> But even without details of my specific concerns, I thought that
> perhaps the git developers know what limits git's performance even if
> large projects like the kernel are not hitting these limits.
>
> Steve

I did not realize you gave numbers below, as I started answering after
reading the first paragraphs. Sorry about that.

I think lots of files organized in a hierarchical fashion, with sizes
in the small MB range, are not a huge deal. Also, history is a
non-issue.

The problem arises with having lots of branches.
"640 git branches ought to be enough for everybody -- Linus" (just kidding)
Git doesn't really scale efficiently with lots of branches (second-hand
information, except for fetch/pull where I did some patches on another
topic recently).

Thanks,
Stefan


Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-19 Thread Stephen Morton
On Thu, Feb 19, 2015 at 5:21 PM, Stefan Beller  wrote:
> On Thu, Feb 19, 2015 at 1:26 PM, Stephen Morton
>  wrote:
>> I posted this to comp.version-control.git.user and didn't get any response. I
>> think the question is plumbing-related enough that I can ask it here.
>>
>> I'm evaluating the feasibility of moving my team from SVN to git. We have a very
>> large repo. [1]
>>
>> [1] (Yes, I'm investigating ways to make our repo not so large etc. That's
>> beyond the scope of the discussion I'd like to have with this
>> question. Thanks.)
>
> What do you mean by large?
> * lots of files
> * large files
> * or even large binary files (bad to diff/merge)
> * long history (i.e. lots of small changes)
> * impactful history (changes which rewrite nearly everything from scratch)
>
> For reference, the linux kernel repository
> * has 48414 files, in 3128 directories
> * the largest file is 1.1M, the whole repo is 600M
> * no really large binary files
> * more than 500051 changes/commits including merges
> * started in 2005 (when git was invented essentially)
> * the .git folder is 1.4G compared to the 600M files,
>    indicating it may have been rewritten 3 times (well this
>    metric is bogus, there is lots of compression
>    going on in .git)
>
> and linux seems to be doing ok with git.
>
> So as long as you cannot pinpoint your question on what you are exactly
> concerned about, there will be no helpful answer I guess.
>
> linux is by no means a really large project, there are other projects way
> larger than that (I am thinking about the KDE project for example)
> and they do fine as well.
>
> Thanks,
> Stefan

Hi Stefan,

I think I addressed most of this in my original post with the paragraph

 "Assume ridiculous numbers. Let me exaggerate: say 1 million commits,
15 GB repo,
  50k tags, 1,000 branches. (Due to historical code fixups, another
5,000 "fix-up
  branches" which are just one little dangling commit required to
change the code
  a little bit between a commit and a tag that was not quite made from it.)"

To that I'd add 25k files,
no major rewrites,
no huge binary files, but lots of a few MB binary files with many revisions.

But even without details of my specific concerns, I thought that
perhaps the git developers know what limits git's performance even if
large projects like the kernel are not hitting these limits.

Steve


Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-19 Thread Stefan Beller
On Thu, Feb 19, 2015 at 1:26 PM, Stephen Morton
 wrote:
> I posted this to comp.version-control.git.user and didn't get any response. I
> think the question is plumbing-related enough that I can ask it here.
>
> I'm evaluating the feasibility of moving my team from SVN to git. We have a very
> large repo. [1]
>
> [1] (Yes, I'm investigating ways to make our repo not so large etc. That's
> beyond the scope of the discussion I'd like to have with this
> question. Thanks.)

What do you mean by large?
* lots of files
* large files
* or even large binary files (bad to diff/merge)
* long history (i.e. lots of small changes)
* impactful history (changes which rewrite nearly everything from scratch)

For reference, the linux kernel repository
* has 48414 files, in 3128 directories
* the largest file is 1.1M, the whole repo is 600M
* no really large binary files
* more than 500051 changes/commits including merges
* started in 2005 (when git was invented essentially)
* the .git folder is 1.4G compared to the 600M files,
   indicating it may have been rewritten 3 times (well this
   metric is bogus, there is lots of compression
   going on in .git)

and linux seems to be doing ok with git.
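
Numbers like the ones above can be collected for any repository along these lines. This is a sketch that assumes a non-bare checkout with its .git directory in the usual place; the helper name `repo_stats` is made up.

```shell
# Hypothetical helper: tracked-file count, commit count reachable from
# HEAD, and on-disk size of .git (assumes a non-bare checkout).
repo_stats() {
    files=$(git -C "$1" ls-files | wc -l | tr -d ' ')
    commits=$(git -C "$1" rev-list --count HEAD)
    size=$(du -sh "$1/.git" | cut -f1)
    printf 'files=%s commits=%s gitdir=%s\n' "$files" "$commits" "$size"
}
```

`repo_stats .` prints a single line, which makes it easy to compare several repositories side by side.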

So as long as you cannot pinpoint your question on what you are exactly
concerned about, there will be no helpful answer I guess.

linux is by no means a really large project, there are other projects way
larger than that (I am thinking about the KDE project for example)
and they do fine as well.

Thanks,
Stefan


What's cooking in git.git (Feb 2015, #05; Thu, 19)

2015-02-19 Thread Junio C Hamano
Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.

The second and third batch of topics have been merged to 'master'.
I am tempted to start discarding topics in the Stalled category that
haven't seen much review or discussion for a long time.

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[Graduated to "master"]

* av/wincred-with-at-in-username-fix (2015-01-25) 1 commit
  (merged to 'next' on 2015-02-16 at 69dd76d)
 + wincred: fix get credential if username has "@"

 The credential helper for Windows (in contrib/) used to mishandle
 a user name with an at-sign in it.


* ch/new-gpg-drops-rfc-1991 (2015-01-29) 2 commits
  (merged to 'next' on 2015-02-16 at e2daf10)
 + t/lib-gpg: sanity-check that we can actually sign
 + t/lib-gpg: include separate public keys in keyring.gpg

 Older GnuPG implementations may not correctly import the keyring
 material we prepare for the tests to use.


* jc/push-cert (2015-02-12) 1 commit
  (merged to 'next' on 2015-02-16 at f40b3c5)
 + transport-helper: fix typo in error message when --signed is not supported

 "git push --signed" gave an incorrectly worded error message when
 the other side did not support the capability.


* jc/remote-set-url-doc (2015-01-29) 1 commit
  (merged to 'next' on 2015-02-16 at 1f9c342)
 + Documentation/git-remote.txt: stress that set-url is not for triangular

 Clarify in the documentation that "remote.<name>.pushURL" and
 "remote.<name>.URL" are there to name the same repository accessed
 via different transports, not two separate repositories.


* jk/config-no-ungetc-eof (2015-02-05) 2 commits
  (merged to 'next' on 2015-02-16 at b7fc890)
 + config_buf_ungetc: warn when pushing back a random character
 + config: do not ungetc EOF

 Reading configuration from a blob object, when it ends with a lone
 CR, used to confuse the configuration parser.


* jk/decimal-width-for-uintmax (2015-02-05) 1 commit
  (merged to 'next' on 2015-02-16 at e608239)
 + decimal_width: avoid integer overflow

 We didn't format an integer that wouldn't fit in "int" but in
 "uintmax_t" correctly.


* jk/pack-bitmap (2015-02-04) 1 commit
  (merged to 'next' on 2015-02-16 at 2e30424)
 + ewah: fix building with gcc < 3.4.0

 The pack bitmap support did not build with older versions of GCC.


* ye/http-accept-language (2015-01-28) 1 commit
  (merged to 'next' on 2015-02-16 at 10ed819)
 + http: add Accept-Language header if possible

 Using environment variable LANGUAGE and friends on the client side,
 HTTP-based transports now send Accept-Language when making requests.

--
[New Topics]

* ak/git-pm-typofix (2015-02-18) 1 commit
 - Git.pm: two minor typo fixes

 Will merge to 'next'.


* jc/decorate-leaky-separator-color (2015-02-18) 1 commit
 - log --decorate: do not leak "commit" color into the next item

 "git log --decorate" did not reset colors correctly around the
 branch names.

 Will merge to 'next'.

--
[Stalled]

* nd/list-files (2015-02-09) 21 commits
 . t3080: tests for git-list-files
 . list-files: -M aka diff-cached
 . list-files -F: show submodules with the new indicator '&'
 . list-files: add -F/--classify
 . list-files: show directories as well as files
 . list-files: do not show duplicate cached entries
 . list-files: sort output and remove duplicates
 . list-files: add -t back
 . list-files: add -1 short for --no-column
 . list-files: add -R/--recursive short for --max-depth=-1
 . list-files: -u does not imply showing stages
 . list-files: make alias 'ls' default to 'list-files'
 . list-files: a user friendly version of ls-files and more
 . ls-files: support --max-depth
 . ls-files: add --column
 . ls-files: add --color to highlight file names
 . ls-files: buffer full item in strbuf before printing
 . ls_colors.c: highlight submodules like directories
 . ls_colors.c: add a function to color a file name
 . ls_colors.c: parse color.ls.* from config file
 . ls_colors.c: add $LS_COLORS parsing code

 A new "git list-files" Porcelain command, "ls-files" with bells and
 whistles.

 No comments?  No reviews?  No interests?


* nd/untracked-cache (2015-02-09) 24 commits
 - git-status.txt: advertisement for untracked cache
 - untracked cache: guard and disable on system changes
 - mingw32: add uname()
 - t7063: tests for untracked cache
 - update-index: test the system before enabling untracked cache
 - update-index: manually enable or disable untracked cache
 - status: enable untracked cache
 - untracked-cache: temporarily disable with $GIT_DISABLE_UNTRACKED_CACHE
 - untracked cache: mark index dirty if untracked cache is updated
 - untracked cache: print stats with $GIT_TRACE_UNTRACKED_STATS
 - untracked ca

Git Scaling: What factors most affect Git performance for a large repo?

2015-02-19 Thread Stephen Morton
I posted this to comp.version-control.git.user and didn't get any response. I
think the question is plumbing-related enough that I can ask it here.

I'm evaluating the feasibility of moving my team from SVN to git. We have a very
large repo. [1] We will have a central repo using GitLab (or similar) that
everybody works with. Forks, code sharing, pull requests etc. will be done
through this central server.

By 'performance', I guess I mean speed of day to day operations for devs.

   * (Obviously, trivially, a (non-local) clone will be slow with a large repo.)
   * Will a few simultaneous clones from the central server also slow down
 other concurrent operations for other users?
   * Will 'git pull' be slow?
   * 'git push'?
   * 'git commit'? (It is listed as slow in reference [3].)
   * 'git status'? (Slow again in reference 3 though I don't see it.)
   * Some operations might not seem to be day-to-day but if they are called
 frequently by the web front-end to GitLab/Stash/GitHub etc then
 they can become bottlenecks. (e.g. 'git branch --contains' seems terribly
 adversely affected by large numbers of branches.)
   * Others?


Assuming I can put lots of resources into a central server with lots of CPU,
RAM, fast SSD, fast networking, what aspects of the repo are most likely to
affect devs' experience?
   * Number of commits
   * Sheer disk space occupied by the repo
   * Number of tags.
   * Number of branches.
   * Binary objects in the repo that cause it to bloat in size [1]
   * Other factors?

Of the various HW items listed above --CPU speed, number of cores, RAM, SSD,
networking-- which is most critical here?

(Stash recommends 1.5 x repo_size x number of concurrent clones of
available RAM.
I assume that is good advice in general.)

Assume ridiculous numbers. Let me exaggerate: say 1 million commits, 15 GB repo,
50k tags, 1,000 branches. (Due to historical code fixups, another 5,000 "fix-up
branches" which are just one little dangling commit required to change the code
a little bit between a commit and a tag that was not quite made from it.)

While there's lots of information online, much of it is old [3] and with git
constantly evolving I don't know how valid it still is. Then there's anecdotal
evidence that is of questionable value.[2]
Are many/all of the issues Facebook identified [3] resolved? (Yes, I
understand Facebook went with Mercurial. But I imagine the git team nevertheless
took their analysis to heart.)


Thanks,
Steve


[1] (Yes, I'm investigating ways to make our repo not so large etc. That's
beyond the scope of the discussion I'd like to have with this
question. Thanks.)
[2] The large amounts of anecdotal evidence relate to the "why don't you try it
yourself?" response to my question. I will if I have to, but setting up a
properly methodical study is time consuming and difficult --I don't want to
produce poor anecdotal numbers that don't really hold up-- and if somebody's
already done the work, then I should leverage it.
[3] http://thread.gmane.org/gmane.comp.version-control.git/189776


Re: [PATCH v3] remote-curl: fall back to Basic auth if Negotiate fails

2015-02-19 Thread brian m. carlson
On Wed, Feb 18, 2015 at 04:17:46PM +, Dan Langille (dalangil) wrote:
> I just built from ‘master’, on FreeBSD 9.3:
> 
> cd ~/src
> git clone https://github.com/git/git.git
> cd git
> gmake
> 
> Then tried ~/src/git/git clone https://OUR_REPO
> 
>  It cores too, and I see: git-remote-https.core

Can you compile with debugging symbols and provide a backtrace?  I'm not 
seeing any such behavior on my end, and I'm not sure whether it's my 
patch or something else that might be present in master.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187




Re: odb_mkstemp's 0444 permission broke write/delete access on AFP

2015-02-19 Thread brian m. carlson
On Tue, Feb 17, 2015 at 09:51:38AM +0100, Matthieu Moy wrote:
> This should be fixable from Git itself, by replacing the calls to
> "unlink" with something like
> 
> int unlink_or_chmod(...) {
>   if (unlink(...)) {
>   chmod(...); // give user write permission
>   return unlink(...);
>   }
> }
> 
> This does not add extra cost in the normal case, and would fix this
> particular issue for afp shares. So, I think that would fix the biggest
> problem for afp-share users without disturbing others. It seems
> reasonable to me to do that unconditionally.

This can have security issues if you're trying to unlink a symlink, as 
chmod will dereference the symlink but unlink will not.  Giving the file 
owner write permission may not be sufficient, as the user may be a 
member of a group with write access to the repo.  A malicious user who 
also has access to the repo could force the current user to chmod an 
arbitrary file such that it had looser permissions.

I've seen a case where Perl's ExtUtils::MakeMaker chmoded 
/etc/mime.types 0666 as a result of this.

I don't think there's a secure way to implement this unless you're on an 
OS with lchmod or fchmodat that supports AT_SYMLINK_NOFOLLOW.  Linux is 
not one of those systems.
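
The asymmetry between chmod and unlink can be reproduced from the shell. The following is only a demonstration in a scratch directory, with made-up helper names; it illustrates the problem, not a fix.

```shell
# Permission string of a path itself (ls -ld does not follow a symlink
# given as its operand).
mode_of() {
    ls -ld "$1" | cut -c1-10
}

# Demonstration: chmod on a symlink changes the link's TARGET, while
# removing (unlinking) the symlink leaves the target in place.
demo_chmod_follows_symlink() {
    dir=$(mktemp -d) || return 1
    : > "$dir/target"
    chmod 444 "$dir/target"
    ln -s target "$dir/link"
    echo "before: $(mode_of "$dir/target")"
    chmod 644 "$dir/link"          # dereferences the symlink
    echo "after:  $(mode_of "$dir/target")"
    rm -f "$dir/link"              # removes only the link
    [ -e "$dir/target" ] && echo "target survives unlink of link"
    rm -rf "$dir"
}
```

Calling `demo_chmod_follows_symlink` shows the target going from read-only to owner-writable even though only the link was named, which is exactly what an attacker-placed symlink would exploit.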
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187




Re: [PATCH 1/3] connect.c: Improve parsing of literal IPV6 addresses

2015-02-19 Thread brian m. carlson

On Thu, Feb 19, 2015 at 09:54:52AM -0800, Junio C Hamano wrote:

> I can see that you do not agree with the "If we accept it" part
> (where "it" refers to "allowing [...] was a bug.")---past acceptance
> was not a bug for you.
>
> Brian is for that "If we accept it", and sees it as a bug.
>
> So let's see what he comes up with as a follow-up to the "we should
> explicitly document it" part.


Here's what I propose:

-- >8 --
Subject: [PATCH] Documentation: note deprecated syntax for IPv6 SSH URLs

We have historically accepted some invalid syntax for SSH URLs
containing IPv6 literals.  Older versions of Git accepted URLs missing
the brackets required by RFC 2732.  Note that this behavior is
deprecated and that other protocol handlers will not accept this syntax.

Signed-off-by: brian m. carlson 
---
 Documentation/urls.txt | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/Documentation/urls.txt b/Documentation/urls.txt
index 9ccb246..2c1a84f 100644
--- a/Documentation/urls.txt
+++ b/Documentation/urls.txt
@@ -38,6 +38,10 @@ The ssh and git protocols additionally support ~username expansion:
 - git://host.xz{startsb}:port{endsb}/~{startsb}user{endsb}/path/to/repo.git/
 - {startsb}user@{endsb}host.xz:/~{startsb}user{endsb}/path/to/repo.git/
 
+For backwards compatibility reasons, Git, when using ssh URLs, accepts
+some URLs containing IPv6 literals that are missing the brackets. This
+syntax is deprecated, and other protocol handlers do not permit this.
+
 For local repositories, also supported by Git natively, the following
 syntaxes may be used:

--
2.2.1.209.g41e5f3a
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Interested in helping open source friends on HP-UX?

2015-02-19 Thread H.Merijn Brand
On Thu, 19 Feb 2015 14:21:11 +0100, Michael J Gruber
 wrote:

> Jeff, you got it wrong. You should do the hard part and leave the easy
> part to us!
> 
> Thanks anyways, I'll add this to my HP_UX branch.

I did not mention this in earlier mails. When using the HP C-ANSI-C
compiler, SIZE_MAX is not set.

I had to add
--8<---
#ifndef   SIZE_MAX
#  define SIZE_MAX  (18446744073709551615UL)
/* define SIZE_MAX  (4294967295U) */
#endif
-->8---

to these files

sha1_file.c
utf8.c
walker.c
wrapper.c

And yes, that could be dynamic and probably be in another header file

-- 
H.Merijn Brand  http://tux.nl   Perl Monger  http://amsterdam.pm.org/
using perl5.00307 .. 5.21   porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/   http://www.test-smoke.org/
http://qa.perl.org   http://www.goldmark.org/jeff/stupid-disclaimers/




[PATCH v3] submodule: Improve documentation of update subcommand

2015-02-19 Thread Michal Sojka
The documentation of 'git submodule update' has several problems:

1) It mentions that value 'none' of submodule.$name.update can be
   overridden by --checkout, but other combinations of configuration
   values and command line options are not mentioned.

2) The documentation of submodule.$name.update is scattered across three
   places, which is confusing.

3) The documentation of submodule.$name.update in gitmodules.txt is
   incorrect, because the code always uses the value from .git/config
   and never from .gitmodules.

4) Documentation of --force was incomplete, because it is only effective
   in case of checkout method of update.

This patch fixes all these problems. Now, submodule.$name.update is
fully documented in git-submodule.txt and the other files just refer to
it. This is based on discussion between Junio C Hamano, Jens Lehmann and
myself.

Signed-off-by: Michal Sojka 
---
 Documentation/config.txt| 15 +++
 Documentation/git-submodule.txt | 58 +
 Documentation/gitmodules.txt| 18 +
 3 files changed, 57 insertions(+), 34 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ae6791d..fb2ae37 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2411,12 +2411,17 @@ status.submodulesummary::
 
 submodule.<name>.path::
 submodule.<name>.url::
+   The path within this project and URL for a submodule. These
+   variables are initially populated by 'git submodule init';
+   edit them to override the URL and other values found in the
+   `.gitmodules` file. See linkgit:git-submodule[1] and
+   linkgit:gitmodules[5] for details.
+
 submodule.<name>.update::
-   The path within this project, URL, and the updating strategy
-   for a submodule.  These variables are initially populated
-   by 'git submodule init'; edit them to override the
-   URL and other values found in the `.gitmodules` file.  See
-   linkgit:git-submodule[1] and linkgit:gitmodules[5] for details.
+   The default updating strategy for a submodule. This variable
+   is populated by `git submodule init` from the
+   linkgit:gitmodules[5] file. See description of 'update'
+   command in linkgit:git-submodule[1].
 
 submodule.<name>.branch::
The remote branch name for a submodule, used by `git submodule
diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index 8e6af65..72c6fb2 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -154,14 +154,36 @@ If `--force` is specified, the submodule's work tree will be removed even if
 it contains local modifications.
 
 update::
-   Update the registered submodules, i.e. clone missing submodules and
-   checkout the commit specified in the index of the containing repository.
-   This will make the submodules HEAD be detached unless `--rebase` or
-   `--merge` is specified or the key `submodule.$name.update` is set to
-   `rebase`, `merge` or `none`. `none` can be overridden by specifying
-   `--checkout`. Setting the key `submodule.$name.update` to `!command`
-   will cause `command` to be run. `command` can be any arbitrary shell
-   command that takes a single argument, namely the sha1 to update to.
+   Update the registered submodules to match what the superproject
+   expects by cloning missing submodules and updating the working
+   tree of the submodules. The "updating" can be done in several
+   ways depending on command line options and the value of
+   `submodule.<name>.update` in .git/config:
+
+   checkout;; the new commit recorded in the superproject will be
+   checked out in the submodule on a detached HEAD. This is
+   done when `--checkout` option is given, or no option is
+   given, and `submodule.<name>.update` is unset, or if it is set
+   to 'checkout'.
+
+   rebase;; the current branch of the submodule will be rebased
+   onto the commit recorded in the superproject. This is done
+   when `--rebase` option is given, or no option is given, and
+   `submodule.<name>.update` is set to 'rebase'.
+
+   merge;; the commit recorded in the superproject will be merged
+   into the current branch in the submodule. This is done
+   when `--merge` option is given, or no option is given, and
+   `submodule.<name>.update` is set to 'merge'.
+
+   custom command;; arbitrary shell command that takes a single
+   argument (the sha1 of the commit recorded in the
+   superproject) is executed. This is done when no option is
+   given, and `submodule.<name>.update` has the form of
+   '!command'.
++
+When no option is given and `submodule.<name>.update` is set to 'none',
+the submodule is not updated.
 +
 If the submodule is not yet initialized, and you just want to use the
 setting as stored in .gitmodules, you can automatically initializ

Re: Git Feature Request - show current branch

2015-02-19 Thread Junio C Hamano
Michael J Gruber  writes:

> Randall S. Becker venit, vidit, dixit 19.02.2015 14:32:
>> git symbolic-ref --short HEAD
>
> That errors out when HEAD is detached.

Isn't that what you would want to happen anyway?

    if current=$(that command)
    then
        you know $current is checked out
    else
        you know HEAD is detached
    fi

If you used another command that gives either the name of the
current branch or 4-letter H-E-A-D without any other indication, you
cannot tell if you checked out the "HEAD" branch aka refs/heads/HEAD
or you are not on any branch.  The former would happen after doing
this:

    $ git update-ref refs/heads/HEAD HEAD
    $ git checkout HEAD

Of course, this is not a recommended practice, and "git branch"
these days refuses to create refs/heads/HEAD to discourage you from
doing so by mistake, but there is no guarantee that the repository
whatever script you are writing to work in was created and used by
sane people ;-) so you would want to be defensive, no?
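
A minimal sketch of that defensive pattern, assuming it runs inside a repository. The function name describe_head is made up for illustration; -q only keeps symbolic-ref from printing an error when HEAD is not a symbolic ref:

```sh
# Sketch: report the current branch, or say so when HEAD is detached.
# Assumes the current directory is inside a Git repository.
describe_head () {
	if current=$(git symbolic-ref --short -q HEAD)
	then
		echo "on branch $current"
	else
		echo "HEAD is detached"
	fi
}
```

Because the detached case is signalled by the exit status rather than by a magic string in the output, a branch that happens to be called "HEAD" cannot be confused with it.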



Re: [PATCH] log --decorate: do not leak "commit" color into the next item

2015-02-19 Thread Junio C Hamano
Jeff King  writes:

> Yeah, I think this is a good fix. I had a vague feeling that we may have
> done this on purpose to let the decoration color "inherit" from the
> existing colors for backwards compatibility, but I don't think that
> could ever have worked (since color.decorate.* never defaulted to
> "normal").

Hmph, but that $gmane/191118 talks about giving bold to commit-color
and then expecting for decors to inherit the boldness, a wish I can
understand.  But I do not necessarily agree with it---it relies on
that after "(" and ", " there is no reset,
which is not how everything else works.

So this change at least needs to come with an explanation to people
who are used to and took advantage of this color attribute leakage,
definitely in the log message and preferably to the documentation
that covers all the color.*. settings, I think.

Thanks.



Re: [PATCH 1/3] connect.c: Improve parsing of literal IPV6 addresses

2015-02-19 Thread Junio C Hamano
Torsten Bögershausen  writes:

> On 02/18/2015 07:40 PM, Junio C Hamano wrote:
>> "brian m. carlson"  writes:
>>
>>> I understand that this used to work, but it probably shouldn't have
>>> ever been accepted.  It's nonstandard, and if we accept it for ssh,
>>> people will want it to work for https, and due to libcurl, it simply
>>> won't.
>>>
>>> I prefer to see our past acceptance of this format as a bug.  This is
>>> the first that I've heard of anyone noticing this (since 2013), so it
>>> can't be in common usage.
>>>
>>> If we accept it, we should explicitly document it as being deprecated
>>> and note that it's inconsistent with the way everything else works.
>> I was reviewing my Undecided pile today, and I think your objection
>> makes sense.
>>
>> Either of you care to update documentation, please, before I drop
>> this series and forget about it?
>
> The URL RFC is much stricter regarding which characters are allowed
> in which part of the URL, at least as I read it.
> ...
> I'm somewhat unsure what to write in the documentation, I must admit.

I can see that you do not agree with the "If we accept it" part
(where "it" refers to "allowing [...] was a bug.")---past acceptance
was not a bug for you.

Brian is for that "If we accept it", and sees it as a bug.

So let's see what he comes up with as a follow-up to the "we should
explicitly document it" part.


Re: [PATCH v2] submodule: Fix documentation of update subcommand

2015-02-19 Thread Michal Sojka
On Thu, Feb 19 2015, Junio C Hamano wrote:
> Michal Sojka  writes:
>
>> The documentation of 'git submodule update' has several problems:
>>
>> 1) It says that submodule.$name.update can be overridden by --checkout
>>only if its value is `none`.
>
> Hmm, I do not read the existing sentence that way, though.  The
> "only if" above is only in your head and not in the documentation,
> no?

Yes, you're right.

> The way I understand it is that the explanation does not even bother
> to say that it is overridable when update is set to something that
> clearly corresponds to --option (e.g. 'update=rebase' is for people
> too lazy to type --rebase from the command line), but because it is
> unclear when it is set to 'update=none', it specifically singles out
> that case.

I updated the commit message a bit.

>> diff --git a/Documentation/config.txt b/Documentation/config.txt
>> index ae6791d..f30cbbc 100644
>> --- a/Documentation/config.txt
>> +++ b/Documentation/config.txt
>> @@ -2411,12 +2411,29 @@ status.submodulesummary::
>>
>>  submodule.<name>.path::
>>  submodule.<name>.url::
>> +The path within this project and URL for a submodule. These
>> +variables are initially populated by 'git submodule init';
>> +edit them to override the URL and other values found in the
>> +`.gitmodules` file. See linkgit:git-submodule[1] and
>> +linkgit:gitmodules[5] for details.
>> +
>
> OK.
>
>>  submodule.<name>.update::
>> -The path within this project, URL, and the updating strategy
>> -for a submodule.  These variables are initially populated
>> -by 'git submodule init'; edit them to override the
>> -URL and other values found in the `.gitmodules` file.  See
>> -linkgit:git-submodule[1] and linkgit:gitmodules[5] for details.
>> +The default updating strategy for a submodule, used by `git
>> +submodule update`. This variable is populated by `git
>> +submodule init` from linkgit:gitmodules[5].
>> +
>> +If the value is 'checkout' (the default), the new commit
>> +specified in the superproject will be checked out in the
>
> Have you formatted this?  I _think_ this change would break the
> typesetting by having an empty line there.

Right. I need to add a '+' and deindent.

>> +submodule on a detached HEAD.
>> +If 'rebase', the current branch of the submodule will be
>> +rebased onto the commit specified in the superproject.
>> +If 'merge', the commit specified in the superproject will be
>> +merged into the current branch in the submodule. If 'none',
>> +the submodule with name `$name` will not be updated by
>> +default.
>> +If the value is of form '!command', it will cause `command` to
>> +be run. `command` can be any arbitrary shell command that
>> +takes a single argument, namely the sha1 to update to.
>
> I have a feeling that it is better to leave the explanations of
> these values in git-submodule.txt (i.e. where you took the above
> text from) and say "see description of 'update' command in
> linkgit:git-submodule[1]" here to avoid duplication.

OK

>>  submodule.<name>.branch::
>>  The remote branch name for a submodule, used by `git submodule
>> diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
>> index 8e6af65..c92908e 100644
>> --- a/Documentation/git-submodule.txt
>> +++ b/Documentation/git-submodule.txt
>> @@ -154,14 +154,13 @@ If `--force` is specified, the submodule's work tree will be removed even if
>>  it contains local modifications.
>>
>>  update::
>> -Update the registered submodules, i.e. clone missing submodules and
>> -checkout the commit specified in the index of the containing repository.
>> -This will make the submodules HEAD be detached unless `--rebase` or
>> -`--merge` is specified or the key `submodule.$name.update` is set to
>> -`rebase`, `merge` or `none`. `none` can be overridden by specifying
>> -`--checkout`. Setting the key `submodule.$name.update` to `!command`
>> -will cause `command` to be run. `command` can be any arbitrary shell
>> -command that takes a single argument, namely the sha1 to update to.
>> +Update the registered submodules to match what the superproject
>> +expects by cloning missing submodules and updating the working
>> +tree of the submodules
>
> This part is better than the original.

Indeed. You wrote this in a previous email :)

>>  The "updating" can take various forms
>> +and can be configured in .git/config by the
>> +`submodule.$name.update` key or by explicitly giving one of
>> +'--checkout' (the default), '--merge' or '--rebase' options. See
>> +linkgit:git-config[1] for details.
>
> Because submodule.<name>.update is interesting only to those who run
> "git submodule update", and also the command line options that
> interact with the setting are only described here not in config.txt,
> I think it is better to have the description of various modes here.
>
> And the description, if it i

Re: [RFD/PATCH] stash: introduce checkpoint mode

2015-02-19 Thread Junio C Hamano
"Kyle J. McKay"  writes:

> What about a shortcut to "reset-and-apply" as well?
>
> I have often been frustrated when "git stash apply" refuses to work
> because I have changes that would be stepped on and there's no --force
> option like git checkout has.  I end up doing a reset just so I can
> run stash apply.

Doesn't that cut both ways, though?

A single step short-cut, done in any way other than a more explicit
way such as "git reset --hard && git stash apply" (e.g. "git stash
reset-and-apply" or "git stash apply --force") that makes it crystal
clear that the user _is_ discarding, has a risk of encouraging users
to form a dangerous habit of invoking the short-cut without thinking
and leading to "oops, I didn't mean that!".


Re: [PATCH 1/3] connect.c: Improve parsing of literal IPV6 addresses

2015-02-19 Thread Torsten Bögershausen
On 02/18/2015 07:40 PM, Junio C Hamano wrote:
> "brian m. carlson"  writes:
>
>> On Thu, Jan 22, 2015 at 11:05:29PM +0100, Torsten Bögershausen wrote:
>>> We want to support ssh://bmc@2001:470:1f05:79::1/git/bmc/homedir.git/
>>>   because e.g. the Git shipped with Debian (1.7.10.4) (and a lot of
>>> other installations) supports it.
>> I understand that this used to work, but it probably shouldn't have
>> ever been accepted.  It's nonstandard, and if we accept it for ssh,
>> people will want it to work for https, and due to libcurl, it simply
>> won't.
>>
>> I prefer to see our past acceptance of this format as a bug.  This is
>> the first that I've heard of anyone noticing this (since 2013), so it
>> can't be in common usage.
>>
>> If we accept it, we should explicitly document it as being deprecated
>> and note that it's inconsistent with the way everything else works.
> I was reviewing my Undecided pile today, and I think your objection
> makes sense.
>
> Either of you care to update documentation, please, before I drop
> this series and forget about it?
The URL RFC is much stricter regarding which characters are allowed
in which part of the URL, at least as I read it.

The "problem" started when /usr/bin/ssh accepted things like
/usr/bin/ssh fe80:x:y:z%eth0 and Git simply passed the hostname
to ssh.

And when the [] was there, it was stripped because ssh doesn't like them.
URLs like

ssh://bmc@2001:470:1f05:79::1/git/bmc/homedir.git/

simply worked, and nobody ever complained about this (until now).
Git never rejected IPV6 URLs without []; please correct me if I'm wrong.

Git never cared about the exact URL, so IPV6 URLs without [] were allowed
from "day one".

On top of that, we support the short form,
user@host:~ or other variants.
But we never claimed to be compatible with RFC 1738, even if it makes
sense to do so.

What exactly should we write in the documentation?

Git supports RFC 1738 (but is not as strict in parsing the URL, because
we assume that the host name resolver will do some checking for us).

Git currently does not support user@[fe80::x:y:z], even if the RFC suggests it.

Git never claimed to be 100% compatible with RFC 1738, and will
probably never be (as we have old code that is as it is).

We (at least I) don't want to break existing repos by rejecting URLs that
had been working before and stopped working because the Git version was
updated.

This patch series is attempting to be backwards compatible with what
old, older, and oldest versions of Git accepted,
at the price that URLs which do not conform to the RFC are accepted.
It fixes the long-standing issue that user@[fe80:] did not work.

I'm somewhat unsure what to write in the documentation, I must admit.

Unfortunately URL parsing is a tricky thing; this patch tries to make
improvements.
In particular, it adds test cases, which are good to prevent further breakage.

Updating the documentation was never part of the patch series,
and if the documentation is updated, this is done in a separate commit anyway.

How much does this series qualify for "we didn't update the docs,
but fixed the code, so let's drop it"?
 


 


Re: Git Feature Request - show current branch

2015-02-19 Thread Michael J Gruber
Randall S. Becker venit, vidit, dixit 19.02.2015 14:32:
> git symbolic-ref --short HEAD

That errors out when HEAD is detached.

git rev-parse --symbolic-full-name [--abbrev-ref] HEAD

returns the branch name or HEAD. Though it's a bit difficult to discover.

I guess git 3.0 will have "git branch" and "git branches" :)

Michael


Strange reachability inconsistency (apparently, at least...)

2015-02-19 Thread ydirson
I have a (fsck-clean) git tree in which for 2 commits A and B:

* "git merge-base --is-ancestor A B" returns 0
* "git log B..A" returns a non-empty set of commits

I get this behaviour with 2.3.0 as well as with 2.1.3 and 1.7.12.

Is that a real bug or am I just misinterpreting something ?
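
For anyone trying to reproduce this: an exit status of 0 from "git merge-base --is-ancestor A B" means A is an ancestor of B, so every commit reachable from A should also be reachable from B and "B..A" would be expected to be empty. A rough cross-check could look like the sketch below; the function name check_reachability and its messages are made up for illustration:

```sh
# Sketch: cross-check "A is an ancestor of B" against "B..A is empty".
# The function name and the messages are illustrative only.
check_reachability () {
	a=$1 b=$2
	if git merge-base --is-ancestor "$a" "$b"
	then
		n=$(git rev-list --count "$b..$a") || return 1
		if test "$n" -eq 0
		then
			echo "consistent"
		else
			echo "inconsistent: $n commits in $b..$a"
		fi
	else
		echo "$a is not an ancestor of $b"
	fi
}
```

Running "check_reachability A B" in the affected repository would show how many commits the two commands disagree about.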


git blame swallows up lines in case of mixed line endings

2015-02-19 Thread Sokolov, Konstantin (ext)
Hi Folks,

I encounter unexpected behavior in the following case:

file content:

line1
line2
line3
line4

This is what I get as console output (on Windows):

> git blame -s file.txt
7db36436 1) line1
line3436 2) line2
7db36436 3) line4

This is the real content:

> git blame -s file.txt > blame.txt

blame.txt opened in Notepad++:

7db36436 1) line1 
7db36436 2) line2 
line3 
7db36436 3) line4 

Admittedly, very stupid editors, such as Windows Notepad, cannot handle mixed 
line endings either. But is this also the way git blame should behave?
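
One reading of the output above is that the file has a bare carriage return between line2 and line3, which the console interprets by moving the cursor back to the start of the row. Making CRs visible would confirm or rule that out. A small sketch, where show_cr is a hypothetical helper name and the printf input is only a stand-in for the real file:

```sh
# Make carriage returns visible so that a terminal cannot hide them.
# show_cr is a hypothetical helper name, not a git command.
show_cr () {
	awk '{ gsub(/\r/, "<CR>"); print }'
}

# Stand-in for a file with a bare CR between line2 and line3:
printf 'line1\nline2\rline3\nline4\n' | show_cr
```

The stand-in input prints three rows, with the middle one reading "line2<CR>line3". Piping the real output through the same filter ("git blame -s file.txt | show_cr") would show whether blame is emitting all the text and only the display is misleading.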

Kind regards
Konstantin







Re: [RFD/PATCH] stash: introduce checkpoint mode

2015-02-19 Thread Kyle J. McKay

On Feb 19, 2015, at 04:34, Michael J Gruber wrote:


> "git stash save" performs the steps "create-store-reset". Often,
> users try to use "stash save" as a way to to save their current state
> (index, worktree) before an operation like "checkout/reset --patch" they
> don't feel confident about, and are forced to do "git stash save && git
> stash apply".
>
> Provide an extra mode that does "create-store" only without the reset,
> so that one can "ceckpoint" the sate and keep working on it.


s/sate/state/


> Suggested-by: "Kyle J. McKay" 
> Signed-off-by: Michael J Gruber 
> ---
>
> Notes:
>    I'm not sure about how to best expose this mode:
>
>    git stash checkpoint
>    git stash save --checkpoint
>
>    Maybe it is best to document the former and rename "--checkpoint"
>    to "--no-reset"?


Once the user figures out that "save" is really "save-and-reset" I
think "--no-reset" makes more sense.


It certainly seems more discoverable via an explicit "checkpoint"
command though, but that's really just an alias so maybe it's better
left up to the user to make one.
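
A user-side helper along those lines might be sketched as follows. This is not the patch's implementation; it only combines the existing "git stash create" and "git stash store" subcommands, and the function name stash_checkpoint and the message text are made up for illustration:

```sh
# Hypothetical user-side helper: record the current state as a stash
# entry without resetting the working tree ("create-store" only).
stash_checkpoint () {
	sha=$(git stash create) || return 1
	if test -z "$sha"
	then
		# "git stash create" prints nothing when there is nothing to save
		echo "nothing to checkpoint"
		return 0
	fi
	git stash store -m "checkpoint" "$sha"
}
```

After running it, the working tree and index are left as they were and the state shows up in "git stash list".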


There would need to be some updated docs (git-stash.txt) to go with
the change...



>    Also, a "safe return" to a checkpoint probably requires
>
>    git reset --hard && git stash pop
>
>    although "git stash pop" will do in many cases. Should we provide a
>    shortcut "restore" which does the reset-and-pop?


What about a shortcut to "reset-and-apply" as well?

I have often been frustrated when "git stash apply" refuses to work
because I have changes that would be stepped on and there's no --force
option like git checkout has.  I end up doing a reset just so I can
run stash apply.


What about if git stash apply/pop grokked a --force option?  That
would seem to eliminate the need for a "reset-and-pop"/"reset-and-apply"
shortcut while also being useful to non-checkpoint stashes as
well.



> git-stash.sh | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/git-stash.sh b/git-stash.sh
> index d4cf818..42f140c 100755
> --- a/git-stash.sh
> +++ b/git-stash.sh
> @@ -193,12 +193,16 @@ store_stash () {
> }
>
> save_stash () {
> +   checkpoint=
> keep_index=
> patch_mode=
> untracked=
> while test $# != 0
> do
> case "$1" in
> +   -c|--checkpoint)
> +   checkpoint=t
> +   ;;
> -k|--keep-index)
> keep_index=t
> ;;
> @@ -267,6 +271,11 @@ save_stash () {
> die "$(gettext "Cannot save the current status")"
> say Saved working directory and index state "$stash_msg"
>
> +   if test -n "$checkpoint"
> +   then
> +   exit 0
> +   fi
> +
> if test -z "$patch_mode"
> then
> git reset --hard ${GIT_QUIET:+-q}
> @@ -576,6 +585,10 @@ save)
> shift
> save_stash "$@"
> ;;
> +checkpoint)
> +   shift
> +   save_stash "--checkpoint" "$@"
> +   ;;
> apply)
> shift
> apply_stash "$@"
> --


Otherwise this looks good.  A very small change to add the
functionality.


-Kyle


RE: Git Feature Request - show current branch

2015-02-19 Thread Randall S. Becker
Hi Martin,

I use:

git symbolic-ref --short HEAD

in scripts. Not sure it's the best way, but it works 100% for me.

Regards,
Randall

-Original Message-
From: git-ow...@vger.kernel.org [mailto:git-ow...@vger.kernel.org] On Behalf Of 
mdc...@seznam.cz
Sent: February 19, 2015 8:15 AM
To: git@vger.kernel.org
Subject: Git Feature Request - show current branch

Hello,

To start with, I did not find an official way to submit a feature request, so
hopefully this is the right way to do so. If not, my apologies, and I would
appreciate it if somebody could re-submit it to the proper place.

I'd like to request adding a parameter to 'git branch' that would only show the
current branch (w/o the star), i.e. the outcome should only be the name of the
branch that is normally marked with the star when I run the 'git branch' command.
This may be very helpful in some external scripts that simply need to know
the name of the current branch. I know there are multiple ways to do this today
(some described here:
http://stackoverflow.com/questions/6245570/how-to-get-current-branch-name-in-git),
but I really think that adding a simple argument to 'git branch' would be very
useful instead of forcing people to use 'workarounds'.

My suggestion is to name the parameter '--current' or '--show-current'.
Example:

Command: git branch
Outcome:
 branchA
 branchB
* master

Command: git branch --current
Outcome:
master

Thank you,
Martin


Re: Interested in helping open source friends on HP-UX?

2015-02-19 Thread Michael J Gruber
Jeff King venit, vidit, dixit 19.02.2015 13:54:
> On Thu, Feb 19, 2015 at 12:20:02PM +0100, Michael J Gruber wrote:
> 
>> OK, so we should use NO_ICONV on HP_UX then.
>>
>>>> Failing so many tests with NO_ICONV is certainly not ideal, but I'm not
>>>> sure we should care to protect so many tests with a prerequisite.
>>>
>>> How feasible is it to isolate those tests into separate test files that
>>> people that know to not use e.g. Asian can safely ignore them?
>>
>> We have the prerequisite mechanism for that, and most probably, the
>> tests are "isolated" already, in the sense that with NO_ICONV, only
>> trivial setup tests succeed for those test files but all "proper" tests
>> fail. But I'll check. Need a good test to set the prerequisite, though.
> 
> I took a first pass at this. The results are below (and I am hoping one
> of you can use it as a base to build on, as I do not want to commit to
> doing the second half, as you will see :) ).
> 
> It passes NO_ICONV through to the test suite, sets up a prerequisite,
> disables some test scripts which are purely about i18n (e.g.,
> t3900-i18n-commit), and marks some of the scripts with one-off tests
> using the ICONV prereq.

Hmm. I know we pass other stuff down, but is this really a good idea? It
relies on the fact that the git that we test was built with the options
from there. This assumption breaks with GIT_TEST_INSTALLED, if not more.

Basically, it may break as soon as we run the tests by other means than
"make", which is quite customary if you run single tests.

(And we do pass config.mak down, methinks, but NO_ICONV may come from
the command line.)

> Note that it also has some code changes around reencode_string_len.
> These aren't strictly necessary, but they silence gcc warnings when
> compiled with NO_ICONV. In that case we do:
> 
>   #define reencode_string_len(a,b,c,d,e) NULL
> 
> but "e" is an out-parameter. We don't promise it is valid if the
> function returns NULL (which it does here). I'm kind of surprised the
> compiler doesn't realize that:
> 
>   foo = reencode_string_len(...);
>   if (foo)
>   bar();
> 
> is dead code, since the first line becomes "foo = NULL". So that's
> optional.
> 
> So, on to the tricky parts. Here are the failures that remain:
> 
>   1. The script builds up a commit history through the script, and later
>  tests depend on this for things like commit timestamps or the exact
>  shape of history. t9350 is an example of this (it has one failing
>  test which can be marked, but then other tests later fail in
>  confusing ways).
> 
>   2. The script creates commits with encoded commit messages, then uses
>  those both for cases that care about the encoding, and those that
>  do not. t4041 is an example here. I think it would be best to use
>  vanilla commit messages for the main body of tests, and then
>  explicitly test the encoding-related features separately. I think
>  t4205 and t6006 are in this boat, too.
> 
> I also tested this on a system with a working "iconv". If we are
> building with NO_ICONV, I am tempted to say that there should be no need
> to run the "iconv" command-line program at all. But t6006, for example,
> does it a lot outside of any test_expect_*. Probably it should be:
> 
>   test_lazy_prereq ICONV '
>   test -z "$NO_ICONV" &&
>   utf8_o=$(printf "\303\263") &&
>   latin1_o=$(printf "\363") &&
>   test "$(echo $utf8_o | iconv -f UTF-8 -t ISO-8859-1)" = "$latin1_o"
>   '
> 
> or something, and all of that setup should be wrapped in a
> "test_expect_success ICONV ...". Of course that is the easy part. The
> hard part is splitting the ICONV setup from the vanilla commit setup so
> that the other tests can run.

Jeff, you got it wrong. You should do the hard part and leave the easy
part to us!

Thanks anyways, I'll add this to my HP_UX branch.

> ---
> diff --git a/Makefile b/Makefile
> index e8ce649..c460ce8 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -2112,6 +2112,7 @@ endif
>  ifdef GIT_TEST_CMP_USE_COPIED_CONTEXT
>   @echo GIT_TEST_CMP_USE_COPIED_CONTEXT=YesPlease >>$@
>  endif
> + @echo NO_ICONV=\''$(subst ','\'',$(subst ','\'',$(NO_ICONV)))'\' >>$@
>   @echo NO_GETTEXT=\''$(subst ','\'',$(subst ','\'',$(NO_GETTEXT)))'\' >>$@
>   @echo GETTEXT_POISON=\''$(subst ','\'',$(subst ','\'',$(GETTEXT_POISON)))'\' >>$@
>  ifdef GIT_PERF_REPEAT_COUNT
> diff --git a/pretty.c b/pretty.c
> index 9d34d02..74fe5fb 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -1497,7 +1497,7 @@ void format_commit_message(const struct commit *commit,
>   }
>  
>   if (output_enc) {
> - int outsz;
> + int outsz = 0;
>   char *out = reencode_string_len(sb->buf, sb->len,
>   output_enc, utf8, &outsz);
>   if (out)
> diff --git a/strbuf.c b/strbuf.c
> index 88cafd4..6d8ad4b 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -94,7 +94,7 @@ void strbuf_ltr

Git Feature Request - show current branch

2015-02-19 Thread mdconf
Hello,

To start with, I did not find an official way to submit a feature request, so
hopefully this is the right way to do so. If not, my apologies, and I would
appreciate it if somebody could re-submit it to the proper place.

I'd like to request adding a parameter to 'git branch' that would only show the 
current branch (w/o the star) - i.e. the outcome should only be the name of the 
branch that is normally marked with the star when I do 'git branch' command. 
This may be very helpful in some external scripts that just simply need to know 
the name of the current branch. I know there are multiple ways to do this today 
(some described here: 
http://stackoverflow.com/questions/6245570/how-to-get-current-branch-name-in-git)
 but I really think that adding a simple argument to 'git branch' would be very 
useful instead of forcing people to use 'workarounds'.

My suggestion is to name the parameter '--current' or '--show-current'.
Example:

Command: git branch
Outcome:
 branchA
 branchB
* master

Command: git branch --current
Outcome:
master
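
For scripts that need this today, one of the existing approaches can be
wrapped in a small helper. This is only a minimal sketch of a workaround, not
of the proposed option: "current_branch" is a made-up name, and it simply
calls the existing "git symbolic-ref", which fails when HEAD is detached.

```shell
#!/bin/sh
# current_branch: print the name of the checked-out branch, without the star.
# Hypothetical helper name; it only wraps the existing "git symbolic-ref".
current_branch () {
	# -q makes symbolic-ref exit quietly with status 1 when HEAD is detached;
	# --short strips the leading "refs/heads/".
	git symbolic-ref -q --short HEAD
}
```

A caller could then write something like
branch=$(current_branch) || echo "HEAD is detached".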

Thank you,
Martin
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Interested in helping open source friends on HP-UX?

2015-02-19 Thread Jeff King
On Thu, Feb 19, 2015 at 12:20:02PM +0100, Michael J Gruber wrote:

> OK, so we should use NO_ICONV on HP_UX then.
> 
> >> Failing so many tests with NO_ICONV is certainly not ideal, but I'm not
> >> sure we should care to protect so many tests with a prerequisite.
> > 
> > How feasible is it to isolate those tests into separate test files that
> > people that know to not use e.g. Asian can safely ignore them?
> 
> We have the prerequisite mechanism for that, and most probably, the
> tests are "isolated" already, in the sense that with NO_ICONV, only
> trivial setup tests succeed for those test files but all "proper" tests
> fail. But I'll check. Need a good test to set the prerequisite, though.

I took a first pass at this. The results are below (and I am hoping one
of you can use it as a base to build on, as I do not want to commit to
doing the second half, as you will see :) ).

It passes NO_ICONV through to the test suite, sets up a prerequisite,
disables some test scripts which are purely about i18n (e.g.,
t3900-i18n-commit), and marks some of the scripts with one-off tests
using the ICONV prereq.
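
To illustrate the "passing through" part: the build writes the value as a
single-quoted shell assignment into a file that the test scripts later read
back. The following is a rough sketch of that quoting idea in plain shell,
not the Makefile code itself; "sq" and "write_option" are hypothetical names,
and the file name is whatever the caller chooses.

```shell
#!/bin/sh
# sq: wrap a value in single quotes so a shell can read it back verbatim.
# Each embedded ' becomes '\'' (close quote, escaped quote, reopen quote).
sq () {
	printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# write_option FILE NAME VALUE: append the line NAME='VALUE' to FILE.
# (Hypothetical helper; the real build does this from a make recipe.)
write_option () {
	printf '%s=%s\n' "$2" "$(sq "$3")" >>"$1"
}
```

Sourcing the resulting file with "." gives the original value back, even if
it contained single quotes.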

Note that it also has some code changes around reencode_string_len.
These aren't strictly necessary, but they silence gcc warnings when
compiled with NO_ICONV. In that case we do:

  #define reencode_string_len(a,b,c,d,e) NULL

but "e" is an out-parameter. We don't promise it is valid if the
function returns NULL (which it does here). I'm kind of surprised the
compiler doesn't realize that:

  foo = reencode_string_len(...);
  if (foo)
      bar();

is dead code, since the first line becomes "foo = NULL". So that's
optional.

So, on to the tricky parts. Here are the failures that remain:

  1. The script builds up a commit history through the script, and later
 tests depend on this for things like commit timestamps or the exact
 shape of history. t9350 is an example of this (it has one failing
 test which can be marked, but then other tests later fail in
 confusing ways).

  2. The script creates commits with encoded commit messages, then uses
 those both for cases that care about the encoding, and those that
 do not. t4041 is an example here. I think it would be best to use
 vanilla commit messages for the main body of tests, and then
 explicitly test the encoding-related features separately. I think
 t4205 and t6006 are in this boat, too.

I also tested this on a system with a working "iconv". If we are
building with NO_ICONV, I am tempted to say that there should be no need
to run the "iconv" command-line program at all. But t6006, for example,
does it a lot outside of any test_expect_*. Probably it should be:

  test_lazy_prereq ICONV '
	test -z "$NO_ICONV" &&
	utf8_o=$(printf "\303\263") &&
	latin1_o=$(printf "\363") &&
	test "$(echo $utf8_o | iconv -f UTF-8 -t ISO-8859-1)" = "$latin1_o"
  '

or something, and all of that setup should be wrapped in a
"test_expect_success ICONV ...". Of course that is the easy part. The
hard part is splitting the ICONV setup from the vanilla commit setup so
that the other tests can run.
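
For trying the probe outside the test harness, roughly the same check can be
written as a plain shell function. This is a sketch under assumptions:
"have_working_iconv" is a made-up name, and the "command -v" guard and the
stderr redirection are additions that are not part of the prereq above.

```shell
#!/bin/sh
# have_working_iconv: succeed only if the build did not set NO_ICONV and
# the iconv(1) program converts a UTF-8 "o with acute" to ISO-8859-1.
have_working_iconv () {
	test -z "$NO_ICONV" &&
	command -v iconv >/dev/null 2>&1 &&
	utf8_o=$(printf '\303\263') &&
	latin1_o=$(printf '\363') &&
	test "$(printf '%s\n' "$utf8_o" |
		iconv -f UTF-8 -t ISO-8859-1 2>/dev/null)" = "$latin1_o"
}
```

With NO_ICONV set to any non-empty value the function fails immediately,
without ever running the iconv program.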

---
diff --git a/Makefile b/Makefile
index e8ce649..c460ce8 100644
--- a/Makefile
+++ b/Makefile
@@ -2112,6 +2112,7 @@ endif
 ifdef GIT_TEST_CMP_USE_COPIED_CONTEXT
 	@echo GIT_TEST_CMP_USE_COPIED_CONTEXT=YesPlease >>$@
 endif
+	@echo NO_ICONV=\''$(subst ','\'',$(subst ','\'',$(NO_ICONV)))'\' >>$@
 	@echo NO_GETTEXT=\''$(subst ','\'',$(subst ','\'',$(NO_GETTEXT)))'\' >>$@
 	@echo GETTEXT_POISON=\''$(subst ','\'',$(subst ','\'',$(GETTEXT_POISON)))'\' >>$@
 ifdef GIT_PERF_REPEAT_COUNT
diff --git a/pretty.c b/pretty.c
index 9d34d02..74fe5fb 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1497,7 +1497,7 @@ void format_commit_message(const struct commit *commit,
 	}
 
 	if (output_enc) {
-		int outsz;
+		int outsz = 0;
 		char *out = reencode_string_len(sb->buf, sb->len,
 						output_enc, utf8, &outsz);
 		if (out)
diff --git a/strbuf.c b/strbuf.c
index 88cafd4..6d8ad4b 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -94,7 +94,7 @@ void strbuf_ltrim(struct strbuf *sb)
 int strbuf_reencode(struct strbuf *sb, const char *from, const char *to)
 {
 	char *out;
-	int len;
+	int len = 0;
 
 	if (same_encoding(from, to))
 		return 0;
diff --git a/t/t3900-i18n-commit.sh b/t/t3900-i18n-commit.sh
index 4bf1dbe..d522677 100755
--- a/t/t3900-i18n-commit.sh
+++ b/t/t3900-i18n-commit.sh
@@ -7,6 +7,11 @@ test_description='commit and log output encodings'
 
 . ./test-lib.sh
 
+if ! test_have_prereq ICONV; then
+	skip_all='skipping i18n tests, iconv not available'
+	test_done
+fi
+
 compare_with () {
 	git show -s $1 | sed -e '1,/^$/d' -e 's/^//' >current &&
 	case "$3" in
diff --git a/t/t3901-i18n-patch.sh b/t/t3901-i18n-patch.sh
index a392f3d..c4f9d06 100755
--- a/t

[RFD/PATCH] stash: introduce checkpoint mode

2015-02-19 Thread Michael J Gruber
"git stash save" performs the steps "create-store-reset". Often,
users try to use "stash save" as a way to save their current state
(index, worktree) before an operation like "checkout/reset --patch" they
don't feel confident about, and are forced to do "git stash save && git
stash apply".

Provide an extra mode that does "create-store" only without the reset,
so that one can "checkpoint" the state and keep working on it.

Suggested-by: "Kyle J. McKay" 
Signed-off-by: Michael J Gruber 
---

Notes:
I'm not sure about how to best expose this mode:

git stash checkpoint
git stash save --checkpoint

Maybe it is best to document the former and rename "--checkpoint"
to "--no-reset"?

Also, a "safe return" to a checkpoint probably requires

git reset --hard && git stash pop

although "git stash pop" will do in many cases. Should we provide a shortcut
"restore" which does the reset-and-pop?

 git-stash.sh | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/git-stash.sh b/git-stash.sh
index d4cf818..42f140c 100755
--- a/git-stash.sh
+++ b/git-stash.sh
@@ -193,12 +193,16 @@ store_stash () {
 }
 
 save_stash () {
+	checkpoint=
 	keep_index=
 	patch_mode=
 	untracked=
 	while test $# != 0
 	do
 		case "$1" in
+		-c|--checkpoint)
+			checkpoint=t
+			;;
 		-k|--keep-index)
 			keep_index=t
 			;;
@@ -267,6 +271,11 @@ save_stash () {
 		die "$(gettext "Cannot save the current status")"
 	say Saved working directory and index state "$stash_msg"
 
+	if test -n "$checkpoint"
+	then
+		exit 0
+	fi
+
 	if test -z "$patch_mode"
 	then
 		git reset --hard ${GIT_QUIET:+-q}
@@ -576,6 +585,10 @@ save)
 	shift
 	save_stash "$@"
 	;;
+checkpoint)
+	shift
+	save_stash "--checkpoint" "$@"
+	;;
 apply)
 	shift
 	apply_stash "$@"
-- 
2.3.0.191.ge77e8b9



Re: Interested in helping open source friends on HP-UX?

2015-02-19 Thread Michael J Gruber
H.Merijn Brand venit, vidit, dixit 19.02.2015 12:14:
> On Thu, 19 Feb 2015 11:33:01 +0100, Michael J Gruber
>  wrote:
> 
>> Jeff King venit, vidit, dixit 18.02.2015 19:57:
>>> On Wed, Feb 18, 2015 at 10:47:16AM -0800, Junio C Hamano wrote:
>>>
>>>>> It seems like we could use
>>>>>
>>>>>   (cd src && tar cf - .) | (cd dst && tar xf -)
>>>>>
>>>>> here as a more portable alternative. I don't think we can rely on rsync
>>>>> being everywhere.
>>>>
>>>> Thanks; I wasn't even aware that we used rsync in our tests.  We
>>>> certainly do not want to rely on it.
>>>
>>> I don't think we do.
>>>
>>> Grepping for rsync in t/, it is mentioned in three places:
>>>
>>>   1. In t1509, we use it, but that test script does not run unless you
>>>  set a bunch of environment variables to enable it.
>>>
>>>   2. In a sample patch for t4100. Obviously this one doesn't execute. :)
>>>
>>>   3. In t5500, to test "rsync:" protocol supported. This is behind a
>>>  check that we can run rsync at all (though it does not properly use
>>>  prereqs or use the normal "skip" procedure).
>>>
>>>> Why not "cp -r src dst", though?
>>>
>>> I was assuming that the "-P" in the original had some purpose. My "cp
>>> -r" does not seem to dereference symlinks, but maybe there is something
>>> I am missing.
>>>
>>> -Peff
>>
>> There's a symlink in sub that needs to be preserved.
>>
>> I'm cooking up a mini-series covering tar/cp -P so far and hopefully the
>> JP encodings later. Do I understand correctly that for Merijin's use
> 
> Merijn, no second j. You can also call me Tux, as that is what the perl
> people do just because of that :)
> 
>> case on HP-UX, we want
>>
>> - as few extra tools (GNU...) as possible for the run time git
>> - may get a few more tools installed to run the test
> 
> You can require as many GNU tools for testing as you like: I'll install
> them. I just need to be sure they are not required runtime. (tar, cp)
> 
>> I still don't have a clear picture of the iconv situation: Does your
>> iconv library require OLD_ICONV to compile?
> 
> No
> 
>> Is there a reason you want to disable it?
> 
> Yes, if I build a package/depot, and the package depends on iconv, it
> is highly likely to fail on the client side after installation, as I do
> not control the version of iconv/libiconv installed.
> 
> As HP does not have libiconv installed by default, I have experienced
> many tools to be unusable after installation because of that dependency.
> 
> Another reason is that I built 64bitall, as my CURl and SSL environment
> is 64bitall for every other project on these systems (including Oracle
> related, which *only* ships 64bit objects on HP-UX) and the OpenSource
> repos for HP-UX only ship 32bit software (sad, but true). That implies
> that I cannot require libiconv.so to be present on the client side.
> 
> I'd like my git to be as standalone as possible

OK, so we should use NO_ICONV on HP_UX then.

>> Failing so many tests with NO_ICONV is certainly not ideal, but I'm not
>> sure we should care to protect so many tests with a prerequisite.
> 
> How feasible is it to isolate those tests into separate test files that
> people that know to not use e.g. Asian can safely ignore them?
> 
>> Michael

We have the prerequisite mechanism for that, and most probably, the
tests are "isolated" already, in the sense that with NO_ICONV, only
trivial setup tests succeed for those test files but all "proper" tests
fail. But I'll check. Need a good test to set the prerequisite, though.

Michael


Re: Interested in helping open source friends on HP-UX?

2015-02-19 Thread H.Merijn Brand
On Thu, 19 Feb 2015 11:33:01 +0100, Michael J Gruber
 wrote:

> Jeff King venit, vidit, dixit 18.02.2015 19:57:
> > On Wed, Feb 18, 2015 at 10:47:16AM -0800, Junio C Hamano wrote:
> > 
> >>> It seems like we could use
> >>>
> >>>   (cd src && tar cf - .) | (cd dst && tar xf -)
> >>>
> >>> here as a more portable alternative. I don't think we can rely on rsync
> >>> being everywhere.
> >>
> >> Thanks; I wasn't even aware that we used rsync in our tests.  We
> >> certainly do not want to rely on it.
> > 
> > I don't think we do.
> > 
> > Grepping for rsync in t/, it is mentioned in three places:
> > 
> >   1. In t1509, we use it, but that test script does not run unless you
> >  set a bunch of environment variables to enable it.
> > 
> >   2. In a sample patch for t4100. Obviously this one doesn't execute. :)
> > 
> >   3. In t5500, to test "rsync:" protocol supported. This is behind a
> >  check that we can run rsync at all (though it does not properly use
> >  prereqs or use the normal "skip" procedure).
> > 
> >> Why not "cp -r src dst", though?
> > 
> > I was assuming that the "-P" in the original had some purpose. My "cp
> > -r" does not seem to dereference symlinks, but maybe there is something
> > I am missing.
> > 
> > -Peff
> 
> There's a symlink in sub that needs to be preserved.
> 
> I'm cooking up a mini-series covering tar/cp -P so far and hopefully the
> JP encodings later. Do I understand correctly that for Merijin's use

Merijn, no second j. You can also call me Tux, as that is what the perl
people do just because of that :)

> case on HP-UX, we want
> 
> - as few extra tools (GNU...) as possible for the run time git
> - may get a few more tools installed to run the test

You can require as many GNU tools for testing as you like: I'll install
them. I just need to be sure they are not required runtime. (tar, cp)

> I still don't have a clear picture of the iconv situation: Does your
> iconv library require OLD_ICONV to compile?

No

> Is there a reason you want to disable it?

Yes, if I build a package/depot, and the package depends on iconv, it
is highly likely to fail on the client side after installation, as I do
not control the version of iconv/libiconv installed.

As HP does not have libiconv installed by default, I have experienced
many tools to be unusable after installation because of that dependency.

Another reason is that I built 64bitall, as my CURl and SSL environment
is 64bitall for every other project on these systems (including Oracle
related, which *only* ships 64bit objects on HP-UX) and the OpenSource
repos for HP-UX only ship 32bit software (sad, but true). That implies
that I cannot require libiconv.so to be present on the client side.

I'd like my git to be as standalone as possible

> Failing so many tests with NO_ICONV is certainly not ideal, but I'm not
> sure we should care to protect so many tests with a prerequisite.

How feasible is it to isolate those tests into separate test files that
people that know to not use e.g. Asian can safely ignore them?

> Michael

-- 
H.Merijn Brand  http://tux.nl   Perl Monger  http://amsterdam.pm.org/
using perl5.00307 .. 5.21   porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/http://www.test-smoke.org/
http://qa.perl.org   http://www.goldmark.org/jeff/stupid-disclaimers/




Re: Should "git log --decorate" indicate whether the HEAD is detached?

2015-02-19 Thread Julien Cretel
On Wed, Feb 18, 2015 at 5:07 PM, Junio C Hamano  wrote:

> Julien's "HEAD=master, other" vs "HEAD, master, other" may be
> subdued enough to be undistracting, I would guess.  I do not think
> the distinction between "HEAD = master" and "HEAD -> master" would
> be useful, on the other hand.

Just to clarify, I suggested these two notations as alternatives for
denoting the same state: "HEAD is attached to master". They were not
meant to denote different states. Accordingly, a detached HEAD could
be denoted by "HEAD, master, other" (i.e. the same as the current
output of "git log --decorate").


Re: Experience with Recovering From User Error (And suggestions for improvements)

2015-02-19 Thread Michael J Gruber
Kyle J. McKay venit, vidit, dixit 19.02.2015 02:17:
> On Feb 18, 2015, at 01:46, Michael J Gruber wrote:
>> Armin Ronacher venit, vidit, dixit 16.02.2015 14:29:
>>> Hi,
>>>
>>> On 16/02/15 13:09, Ævar Arnfjörð Bjarmason wrote:
>>>> We should definitely make recovery like this harder, but is there a
>>>> reason for why you don't use "git reset --keep" instead of --hard?
>>> This was only the second time in years of git usage that the reset was
>>> incorrectly done.  I suppose at this point I might try to retrain my
>>> muscle memory to type something else :)
>>>
>>>> If we created such hooks for "git reset --hard" we'd just need to
>>>> expose some other thing as that low-level operation (and break scripts
>>>> that already rely on it doing the minimal "yes I want to change the
>>>> tree no matter what" thing), and then we'd just be back to square one
>>>> in a few years when users started using "git reset --really-hard" (or
>>>> whatever the flag would be).
>>> I don't think that's necessary, I don't think it would make the
>>> operation much slower to just make a dangling commit and write out a few
>>> blobs.  The garbage collect will soon enough take care of that data
>>> anyways.  But I guess that would need testing on large trees to see how
>>> bad that goes.
>>>
>>> I might look into the git undo thing that was mentioned.
>>>
>>> Regards,
>>> Armin
>>>
>>
>> Are you concerned about the index only, not unstaged worktree changes?
>>
>> In this case, keeping a reflog for the index may help, and it would
>> somehow fit into the overall concept.
> 
> There was this concept of a git stash checkpoint to save work in  
> progress without creating a normal commit that I read about some time  
> ago (blog? Git book? -- don't recall) that was basically just this:
> 
>git stash save
>git stash apply
> 
> The problem with that is that it touches the working tree and can  
> trigger rebuilds etc.  However, when I ran across the undocumented  
> "git stash create" command I was able to write a simple git-checkpoint  
> script [1] that creates a new stash entry without touching the index  
> or working tree which I find quite handy from time to time.

I think that would make for a nice additional command/mode that we could
support for git-stash.sh. All the pieces are there.
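
As a rough illustration of how those pieces fit together, a checkpoint can be
sketched with "git stash create" followed by "git stash store", with no reset
in between. This is an assumption-laden sketch, not the code from Kyle's
script: "checkpoint" is a hypothetical function name and the default message
is invented.

```shell
#!/bin/sh
# checkpoint: record the current index and worktree as a stash entry
# without resetting anything ("create" + "store", no "reset").
# Hypothetical helper; this is not the code from the gist mentioned below.
checkpoint () {
	# "git stash create" builds the stash commit and prints its object
	# name; it prints nothing when there are no local changes.
	stash=$(git stash create) || return
	if test -z "$stash"
	then
		echo "No local changes to checkpoint"
		return 0
	fi
	# "git stash store" records that commit in the stash reflog.
	git stash store -m "${1:-checkpoint}" "$stash"
}
```

After running it, the worktree and index are unchanged and the new entry
shows up in "git stash list".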

> So I think that what Armin originally asked for (create a dangling  
> commit of changes before reset --hard) could be accomplished simply by  
> running:
> 
>git checkpoint && git stash drop
> 
>> Otherwise, we would basically need a full stash before a hard reset.
>> That's not the first time where we could need a distinction between
>> "command run by user" and "command run by script". For the former, we
>> could allow overriding default options, re-aliasing internal commands,
>> adding expensive safety hooks. For the latter we can't.
>>
>> It's just that we don't have such a concept yet (other than checking tty).
> 
> But of course plugging that into git reset somehow is indeed the  
> problem since you cannot alias/redefine git commands.
> 
> -Kyle
> 
> [1] https://gist.github.com/mackyle/83b1ba13e263356bdab0

Also, "git stash create" does the tree creation and object creation that
we wanted to avoid at least for scripts.

And "git reset --hard-but-safe" suffers from the user education problems
that have been mentioned already.

Michael


Re: Interested in helping open source friends on HP-UX?

2015-02-19 Thread Michael J Gruber
Jeff King venit, vidit, dixit 18.02.2015 19:57:
> On Wed, Feb 18, 2015 at 10:47:16AM -0800, Junio C Hamano wrote:
> 
>>> It seems like we could use
>>>
>>>   (cd src && tar cf - .) | (cd dst && tar xf -)
>>>
>>> here as a more portable alternative. I don't think we can rely on rsync
>>> being everywhere.
>>
>> Thanks; I wasn't even aware that we used rsync in our tests.  We
>> certainly do not want to rely on it.
> 
> I don't think we do.
> 
> Grepping for rsync in t/, it is mentioned in three places:
> 
>   1. In t1509, we use it, but that test script does not run unless you
>  set a bunch of environment variables to enable it.
> 
>   2. In a sample patch for t4100. Obviously this one doesn't execute. :)
> 
>   3. In t5500, to test "rsync:" protocol supported. This is behind a
>  check that we can run rsync at all (though it does not properly use
>  prereqs or use the normal "skip" procedure).
> 
>> Why not "cp -r src dst", though?
> 
> I was assuming that the "-P" in the original had some purpose. My "cp
> -r" does not seem to dereference symlinks, but maybe there is something
> I am missing.
> 
> -Peff

There's a symlink in sub that needs to be preserved.
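
For reference, the tar pipe quoted above can be wrapped as follows. This is
only a sketch: "copy_tree" is a made-up name, and it relies on tar archiving
a symlink as a symlink, which is its default behavior.

```shell
#!/bin/sh
# copy_tree SRC DST: copy the contents of SRC into DST through a tar
# pipe. tar stores a symlink as a symlink, so links are not followed.
copy_tree () {
	mkdir -p "$2" &&
	(cd "$1" && tar cf - .) | (cd "$2" && tar xf -)
}
```

A quick check is to copy a directory that contains a symlink and run
"test -h" on the copied link.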

I'm cooking up a mini-series covering tar/cp -P so far and hopefully the
JP encodings later. Do I understand correctly that for Merijin's use
case on HP-UX, we want

- as few extra tools (GNU...) as possible for the run time git
- may get a few more tools installed to run the test

I still don't have a clear picture of the iconv situation: Does your
iconv library require OLD_ICONV to compile? Is there a reason you want
to disable it?

Failing so many tests with NO_ICONV is certainly not ideal, but I'm not
sure we should care to protect so many tests with a prerequisite.

Michael



Re: [RFH] GSoC 2015 application

2015-02-19 Thread Matthieu Moy
Jeff King  writes:

> I do need somebody to volunteer as backup admin. This doesn't need
> to involve any specific commitment, but is mostly about what to do if I
> get hit by a bus.

If you promise me to try hard not to be hit by a bus and no one else
steps in, I can be the backup admin.

> Where I really need help now is in the "ideas" page:
>
>   http://git.github.io/SoC-2015-Ideas.html

Throwing out a few ideas for discussion, I can write something if people
agree.

* "git bisect fixed/unfixed", to allow bisecting a fix instead of a
  regression less painfully. There were already some proposed patches
  ( https://git.wiki.kernel.org/index.php/SmallProjectsIdeas#git_bisect_fix.2Funfixed ),
  so it shouldn't be too hard. Perhaps this item can be included in the
  "git bisect --first-parent" idea (turning it into "git bisect
  improvements").

* Be nicer to the user on tracked/untracked merge conflicts

  I've had it on
  https://git.wiki.kernel.org/index.php/SmallProjectsIdeas#Be_nicer_to_the_user_on_tracked.2Funtracked_merge_conflicts
  for a while but never got someone to do it.

"When merging a commit which has tracked files with the same name as local 
untracked files, Git refuses to proceed. It could be nice to:

- Accept the situation without conflict when the tracked file has
  the exact same content as the local untracked file (which would
  become tracked). No data is lost, nothing can be committed
  accidentally.

- Possibly, for fast-forward merges, if a local file belongs to the
  index but not to the last commit, attempt a merge between the
  upstream version and the local one (resulting in the same content
  as if the file had just been committed, but without introducing an
  extra commit). 

Recent versions of SVN do something similar: on update, it considers
added but not committed files like normal tracked files, and
attempts a merge of the upstream version with the local one (which
always succeeds when the files have identical content). Attempting a
merge for non-fast forward cases would probably not make sense: it
would mix changes coming from the merge with other changes that do
not come from a commit."
  
  This shouldn't be technically too hard, but finding which behavior is
  right, where things should be customizable and what the default value
  for the configuration should be will probably lead to interesting
  discussions. It contains two steps, which is good (all-or-nothing
  projects are much harder to deal with). The biggest drawback is that
  the first item may be simple for a GSoC while the second could be both
  controversial and hard to implement (depending on which solution is
  taken).

> and the list of microprojects:
>
>   http://git.github.io/SoC-2015-Microprojects.html

Here are a few ideas, based on
https://git.wiki.kernel.org/index.php/SmallProjectsIdeas

-- >8 --
>From 513774754872436ea8b7eea63828b804c6a107e7 Mon Sep 17 00:00:00 2001
From: Matthieu Moy 
Date: Thu, 19 Feb 2015 10:48:06 +0100
Subject: [PATCH] 2015 microproject ideas

---
 SoC-2015-Microprojects.md | 42 ++
 1 file changed, 42 insertions(+)

diff --git a/SoC-2015-Microprojects.md b/SoC-2015-Microprojects.md
index 8cb6a8f..1abf595 100644
--- a/SoC-2015-Microprojects.md
+++ b/SoC-2015-Microprojects.md
@@ -128,3 +128,45 @@ the user wanted.
 
 Because --graph is about connected history while --no-walk is about
 discrete points.  Cf. $gmane/216083
+
+### Move ~/.git-credentials and ~/.git-credential-cache to ~/.config/git
+
+Most of git dotfiles can be located, at the user's option, in
+~/. or in ~/.config/git/, following the [XDG
+standard](http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html).
+~/.git-credentials and ~/.git-credential-cache are still hardcoded as
+~/., and should allow using the XDG directory layout too
+(~/.git-credentials could be allowed as ~/.config/git/credential and
+~/.git-credential-cache could be allowed as ~/.cache/git/credential,
+possibly modified by $XDG_CONFIG_HOME and $XDG_CACHE_HOME).
+
+Each of these files can be a microproject of its own. The suggested
+approach is:
+
+* See how XDG was implemented for other files (run "git log --grep
+  XDG" in Git's source code) and read the XDG specification.
+
+* Implement and test the new behavior, without breaking compatibility
+  with the old behavior.
+
+* Update the documentation
+
+### Add configuration options for some commonly used command-line options
+
+This includes:
+
+* git am -3
+
+* git am -c
+
+Some people always run the command with these options, and would
+prefer to be able to activate them by default in ~/.gitconfig.
+
+### Add more builtin patterns for userdiff
+
+"git diff" shows the function name corresponding to each hunk after
+the @@ ... @@ line. For common languages (C, HTML, Ada, Matlab, ...),
+the way to find the function name is built-in Git's source code as
+regular expressio

Re: [RFC] git cat-file "literally" option

2015-02-19 Thread karthik nayak


On 02/18/2015 07:28 PM, Duy Nguyen wrote:
> On Wed, Feb 18, 2015 at 7:50
> Use what sha1_object_info() uses behind the scene. Loose object
> encodes object type as a string, you could just print that string and
> skip the enum object_type conversion. You probably need special
> treatment for packed objects too. See parse_sha1_header() and
> unpack_object_header().

Thank you, I will look into that!

On 02/18/2015 09:17 PM, Junio C Hamano wrote:
> On Wed, Feb 18, 2015 at 5:58 AM, Duy Nguyen  wrote:
>> ... skip the enum object_type conversion. You probably need special
>> treatment for packed objects too.
>
> I do not think you can store object of type "bogus" in a pack data stream
> to begin with, so I wouldn't worry about packed objects.
>
> "cat-file --literally" that does not take "-t" would not be useful, as the
> output "cat-file  " does not tell what  the thing
> is. Other things like sizes and existence can be inferred once you have
> an interface to do "cat-file  ", so in that sense -e and -s
> are not essential (this also applies to "cat-file" without --literally).
>
> By definition, "--literally -p" would not be able to do anything fancier than
> just dump the bytes (i.e. what "cat-file  " does), as the
> bogus type is not something the code would know the best external
> representation for.



Thanks for clearing that up. Will work on this for now.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Should "git log --decorate" indicate whether the HEAD is detached?

2015-02-19 Thread Michael J Gruber
Junio C Hamano venit, vidit, dixit 18.02.2015 20:49:
> Michael J Gruber  writes:
> 
>> Yep, it very well is. Also, that approach would tell you which branch is
>> checked out, though I don't consider that git log's business.
>>
>> OTOH, it's "backwards" in the sense that it marks the "ordinary" case
>> (HEAD is symref, branch is checked out) specially compared to the
>> "exceptional/dangerous" case (HEAD is ref, detached).
> 
> Both are ordinary and there is nothing exceptional or dangerous
> about your HEAD temporarily being detached during a "rebase -i"
> session, for example.

Sure, that's why I put it in quotes. That's only how it is perceived by
some users, and I suppose it's that kind of users that we are trying to
help here.

>> And status, branch
>> will point out that latter case more verbosely, too.
> 
> Yeah, but as you said, that is not "log"'s business.

I still think decorations "detached HEAD" resp. "HEAD" for the two cases
are more natural, if we want to include any additional information at
all. Just think of:

deadbeef (HEAD=master, topicbranch, tag: v1)

log/rev-list is about commit objects. All the refs above resolve to the
same commit, so why are only two of them equal?

In fact, they are very unequal, since HEAD would be "ref:
refs/heads/master" whereas master would "deadbeef". They are equal in
the other (detached) case! I'm not telling you any news here, I just
want to point out how badly misleading that notation is.
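
To make the difference concrete, here is a small sketch that classifies the
contents of a HEAD file. It is a simplification for illustration only:
"head_state" is a made-up name, it reads the file directly, and a real script
would rather ask "git symbolic-ref -q HEAD".

```shell
#!/bin/sh
# head_state FILE: classify the contents of a HEAD file.
# Prints "attached <ref>" for a symref, "detached <object name>" otherwise.
head_state () {
	# Read the first line; tolerate a missing trailing newline.
	read -r line <"$1" || test -n "$line" || return 1
	case "$line" in
	"ref: "*)
		echo "attached ${line#ref: }" ;;
	*)
		echo "detached $line" ;;
	esac
}
```

For a file containing "ref: refs/heads/master" this prints
"attached refs/heads/master"; for one containing "deadbeef" it prints
"detached deadbeef".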

So, I would suggest to "decorate the decorations", by saying something
like "detached HEAD", and maybe some version of "HEAD at master" (I'd
prefer just "HEAD") and possibly more info on the tags ("s-tag" or
"signed tag" etc).

Michael