Re: Expanding Includes in .gitignore

2016-10-29 Thread Duy Nguyen
On Fri, Oct 28, 2016 at 4:32 PM, Aaron Pelly  wrote:
> Or how about a new githook that can intelligently create or return the
> details? This would be my least favourite option unless it was
> configured in an obvious place.

I wonder if smudge/clean filters can be used to recreate in-tree
.gitignore from .gitignore.d/*.
-- 
Duy


Re: Expanding Includes in .gitignore

2016-10-29 Thread Duy Nguyen
On Fri, Oct 28, 2016 at 4:04 AM, Jeff King  wrote:
> On Thu, Oct 27, 2016 at 12:48:34PM -0700, Jacob Keller wrote:
>
>> > I think the normal behavior in such "foo.d" directory is to just sort
>> > the contents lexically and read them in order, as if they were all
>> > concatenated together, and with no recursion. I.e., behave "as if" the
>> > user had run "cat $dir/*".
>>
>> Yea, this is the normal behavior, and the user is expected to order
>> their files lexically such as "00-name", "50-name" and so on. Pretty
>> traditional for a lot of newer configurations.
>
> One thing I will say about this approach is that you can implement it
> without any changes in git by doing:
>
>   path=.git/info/exclude
>   cat $path.d/* >$path
>
> and I have seen several config mechanisms basically do that (e.g.,
> Debian packaging for a program that doesn't have its own ".d" mechanism,
> but needs to grab config provided by several separate packages).

My first thought at this .git/info/exclude.d was "oh no I have to
teach untracked cache about new dependencies, or at least disable it
until it can deal with exclude.d", but this "cat" approach simplifies
things and should keep untracked cache unchanged.

There may be complication with negative patterns though. The user may
want to limit the effect of negative patterns within individual
exclude files in exclude.d so a negative pattern in exclude.d/a won't
influence anything in exclude.d/b (easier to reason, safer to compose
different exclude sets). The plain "cat" would lose file boundary info
that we need. I'm not sure. But I'll dig more into it when patches
show up.
-- 
Duy


[PATCH] git-sh-setup: Restore sourcability from outside scripts

2016-10-29 Thread Anders Kaseorg
v2.10.0-rc0~45^2~2 “i18n: git-sh-setup.sh: mark strings for
translation” broke outside scripts such as guilt that source
git-sh-setup as described in the documentation:

$ . "$(git --exec-path)/git-sh-setup"
sh: 6: .: git-sh-i18n: not found

This also affects contrib/convert-grafts-to-replace-refs.sh and
contrib/rerere-train.sh in tree.  Fix this by using git --exec-path to
find git-sh-i18n.

While we’re here, move the sourcing of git-sh-i18n below the shell
portability fixes.

Signed-off-by: Anders Kaseorg 
---

Is this a supported use of git-sh-setup?  Although the documentation is
clear that the end user should not invoke it directly, it seems to imply
that scripts may do this, and in practice it has worked until v2.10.0.

 git-sh-setup.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/git-sh-setup.sh b/git-sh-setup.sh
index a8a4576..240c7eb 100644
--- a/git-sh-setup.sh
+++ b/git-sh-setup.sh
@@ -2,9 +2,6 @@
 # to set up some variables pointing at the normal git directories and
 # a few helper shell functions.
 
-# Source git-sh-i18n for gettext support.
-. git-sh-i18n
-
 # Having this variable in your environment would break scripts because
 # you would cause "cd" to be taken to unexpected places.  If you
 # like CDPATH, define it for your interactive shell sessions without
@@ -46,6 +43,9 @@ git_broken_path_fix () {
 
 # @@BROKEN_PATH_FIX@@
 
+# Source git-sh-i18n for gettext support.
+. "$(git --exec-path)/git-sh-i18n"
+
 die () {
die_with_status 1 "$@"
 }
-- 
2.10.1



Re: [PATCH v1 10/19] read-cache: regenerate shared index if necessary

2016-10-29 Thread Christian Couder
On Tue, Oct 25, 2016 at 12:16 PM, Duy Nguyen  wrote:
> On Sun, Oct 23, 2016 at 4:26 PM, Christian Couder
>  wrote:
>> @@ -2233,7 +2263,8 @@ int write_locked_index(struct index_state *istate, 
>> struct lock_file *lock,
>> if ((v & 15) < 6)
>> istate->cache_changed |= SPLIT_INDEX_ORDERED;
>> }
>> -   if (istate->cache_changed & SPLIT_INDEX_ORDERED) {
>> +   if (istate->cache_changed & SPLIT_INDEX_ORDERED ||
>> +   too_many_not_shared_entries(istate)) {
>
> It's probably safer to keep this piece unchanged and add this
> somewhere before it
>
> if (too_many_not_shared_entries(istate))
> istate->cache_changed |= SPLIT_INDEX_ORDERED;
>
> We could keep cache_changed consistent until the end this way.

Ok, it will be in the next version.

>>  test_expect_success 'enable split index' '
>> +   git config splitIndex.maxPercentChange 100 &&
>
> An alternative name might be splitThreshold. I don't know, maybe
> maxPercentChange is better.

I think it is important to say that it is a percent in the name, so I
prefer maxPercentChange.

Thanks,
Christian.


Re: [PATCH v1 10/19] read-cache: regenerate shared index if necessary

2016-10-29 Thread Christian Couder
On Sun, Oct 23, 2016 at 6:07 PM, Ramsay Jones
 wrote:
>>
>> +int too_many_not_shared_entries(struct index_state *istate)
>
> This function is a file-loacal symbol; could you please make it
> a static function.

Ok, it will be in the next version.

Thanks,
Christian.


Re: [PATCH v1 09/19] config: add git_config_get_max_percent_split_change()

2016-10-29 Thread Christian Couder
On Tue, Oct 25, 2016 at 12:06 PM, Duy Nguyen  wrote:
> On Sun, Oct 23, 2016 at 4:26 PM, Christian Couder
>  wrote:
>> This new function will be used in a following commit to get the
>> +int git_config_get_max_percent_split_change(void)
>> +{
>> +   int val = -1;
>> +
>> +   if (!git_config_get_int("splitindex.maxpercentchange", &val)) {
>> +   if (0 <= val && val <= 100)
>> +   return val;
>> +
>> +   error("splitindex.maxpercentchange value '%d' "
>
> We should keep camelCase form for easy reading. And wrap this string with _().

Ok, it will be in the next version.

Thanks,
Christian.


Re: [PATCH v1 05/19] update-index: warn in case of split-index incoherency

2016-10-29 Thread Christian Couder
On Tue, Oct 25, 2016 at 12:00 PM, Duy Nguyen  wrote:
> On Sun, Oct 23, 2016 at 4:26 PM, Christian Couder
>  wrote:
>> When users are using `git update-index --(no-)split-index`, they
>> may expect the split-index feature to be used or not according to
>> the option they just used, but this might not be the case if the
>> new "core.splitIndex" config variable has been set. In this case
>> let's warn about what will happen and why.
>>
>> Signed-off-by: Christian Couder 
>> ---
>>  builtin/update-index.c | 11 ++-
>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/builtin/update-index.c b/builtin/update-index.c
>> index b75ea03..a14dbf2 100644
>> --- a/builtin/update-index.c
>> +++ b/builtin/update-index.c
>> @@ -1098,12 +1098,21 @@ int cmd_update_index(int argc, const char **argv, 
>> const char *prefix)
>> }
>>
>> if (split_index > 0) {
>> +   if (git_config_get_split_index() == 0)
>> +   warning("core.splitIndex is set to false; "
>> +   "remove or change it, if you really want to "
>> +   "enable split index");
>
> Wrap this string and the one below with _() so they can be translated.

Ok, it will be in the next version.


Re: [PATCH v1 03/19] split-index: add {add,remove}_split_index() functions

2016-10-29 Thread Christian Couder
On Tue, Oct 25, 2016 at 11:58 AM, Duy Nguyen  wrote:
> On Sun, Oct 23, 2016 at 4:26 PM, Christian Couder
>  wrote:
>> +void remove_split_index(struct index_state *istate)
>> +{
>> +   if (istate->split_index) {
>> +   /*
>> +* can't discard_split_index(&the_index); because that
>> +* will destroy split_index->base->cache[], which may
>> +* be shared with the_index.cache[]. So yeah we're
>> +* leaking a bit here.
>
> In the context of update-index, this is a one-time thing and leaking
> is tolerable. But because it becomes a library function now, this leak
> can become more serious, I think.
>
> The only other (indirect) caller is read_index_from() so probably not
> bad most of the time (we read at the beginning of a command only).
> sequencer.c may discard and re-read the index many times though,
> leaking could be visible there.

So is it enough to check if split_index->base->cache[] is shared with
the_index.cache[] and then decide if discard_split_index(&the_index)
should be called?

Thanks,
Christian.


Re: Fetch/push lets a malicious server steal the targets of "have" lines

2016-10-29 Thread Jeff King
On Sat, Oct 29, 2016 at 12:08:31PM -0400, Matt McCutchen wrote:

> Let's focus on the first scenario.  There the user is just pulling and
> pushing a master branch.  Are you saying that each time the user pulls,
> they need to look over all the commits they pulled before pushing them
> back?  I think that's unrealistic, for example, on a busy project with
> centralized code review or if the user is publishing a project-specific 
> modified version of an upstream library.  The natural user expectation
> is that anything pulled from a public repository is public.

No, I'm saying if you are running "git push foo master", then you should
expect the contents of "master" to go to "foo". That _could_ have
security implications if you come up with a sequence of events where
secret things made it to "master". But it seems to me that "foo
previously lied to you about what it has" is not the weak link in that
chain. It is not thinking about what secret things are hitting the
master that you are pushing, no matter how they got there.

I agree there is a potential workflow (that you have laid out) where
such lying can cause an innocent-looking sequence of events to disclose
the secret commits. And again, I don't mind a note in the documentation
mentioning that. I just have trouble believing it's a common one in
practice.

The reason I brought up the delta thing, even though it's a much harder
attack to execute, is that it comes up in much more common workflows,
like simply fetching from a private security-sensitive repo into your
"main" public repo (which is an example you brought up, and something I
know that I have personally done in the past for git.git).

-Peff


Re: Fetch/push lets a malicious server steal the targets of "have" lines

2016-10-29 Thread Jon Loeliger
So, like, Junio C Hamano said:
> Matt McCutchen  writes:
> 
> > Then the server generates a commit X3 that lists Y2 as a parent, even
> > though it doesn't have Y2, and advances 'x' to X3.  The victim fetches
> > 'x':
> >
> >victim  server
> >
> >  Y1---Y2  (Y2)
> > /   \ \ 
> > ---O---O---X1---X2---X3   ---O---O---X1---X2---X3
> >
> > Then the server rolls back 'x' to X2:
> >
> >victim  server
> >
> >  Y1---Y2
> > /   \
> > ---O---O---X1---X2---X3   ---O---O---X1---X2
> 
> Ah, I see.  My immediate reaction is that you can do worse things in
> the reverse direction compared to this, but your scenario does sound
> bad already.

Is there an existing protocol provision, or an extension to
the protocol that would allow a distrustful client to say to
the server, "Really, you have Y2?  Prove it."  And expect the
server to respond with a SHA1 sequence back to a common SHA
(in this case the left-most O).  If so, a user could designate
some branch (Y) as "sensitive".  Or, a whole repo could be
so designated and the client then effectivey treats the server
as a semi-hostile witness.

Dunno.

jdl



Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-29 Thread Linus Torvalds
On Sat, Oct 29, 2016 at 1:25 AM, Johannes Schindelin
 wrote:
>
> Correct. We cannot change an open file handle's state ("FD_CLOEXEC") on
> Windows, but we can ask open() to open said file handle with the correct
> flag ("O_CLOEXEC", which is mapped to O_NOINHERIT on Windows.

Ok. So then I have no issues with it, and let's use O_CLOEXEC if it
exists and fcntl(FD_CLOEXEC) if O_CLOEXEC doesn't exist.

  Linus


Re: Fetch/push lets a malicious server steal the targets of "have" lines

2016-10-29 Thread Matt McCutchen
On Sat, 2016-10-29 at 09:39 -0400, Jeff King wrote:
> I'm not sure I understand how connecting to a remote server to fetch is
> a big problem. The server may learn about the existence of particular
> sha1s in your repository, but cannot get their content.
> 
> It's the subsequent push that is a problem.
> 
> In the scenarios you've described, I'm mostly inclined to say that the
> problem is not git or the protocol itself, but rather lax refspecs.
> You mentioned earlier:
> 
>   the server can just generate another ref 'xx' pointing to Y2, assuming
>   it can entice the victim to set up a corresponding local branch
>   refs/heads/for-server1/xx and push it back.  Or if the victim is for
>   some reason just mirroring back and forth:
> 
> This sounds a lot like "I told git to push a bunch of things without
> checking if they were really secret, and it turned out to push some
> secret things". IOW I think the problem is not that the server may lie
> about what it has, but that the user was not careful about what they
> pushed. I dunno. I do not mind making a note in the documentation
> explaining the implications of a server lying, but the scenarios seem
> pretty contrived to me.

Let's focus on the first scenario.  There the user is just pulling and
pushing a master branch.  Are you saying that each time the user pulls,
they need to look over all the commits they pulled before pushing them
back?  I think that's unrealistic, for example, on a busy project with
centralized code review or if the user is publishing a project-specific 
modified version of an upstream library.  The natural user expectation
is that anything pulled from a public repository is public.

But let's see what Junio says in the other subthread.

> A much more interesting one, IMHO, is a server whose receive-pack lies
> about which objects it has (possibly ones it found out about earlier via
> fetch), which provokes the client to generate deltas against objects the
> server doesn't have (and thereby leaking information about the base
> objects).
> 
> That is a problem no matter how careful your refspecs are. I suspect it
> would be a hard attack to pull off in practice, just because it's going
> to depend heavily on the content of the specific objects, what kinds of
> deltas you can convince the other side to generate, etc. That might
> merit a mention in the git-push documentation.

Sure, if I end up doing a patch, I'll include this.

Matt


Re: Fetch/push lets a malicious server steal the targets of "have" lines

2016-10-29 Thread Matt McCutchen
On Fri, 2016-10-28 at 22:31 -0700, Junio C Hamano wrote:
> Not sending to the list, where mails from Gmail/phone is known to get
> rejected.

[I guess I can go ahead and quote this to the list.]

> No. I'm saying that the scenario you gave is bad and people should be
> taught not to connect to untrustworthy sites.

To clarify, are you saying:

(1) don't connect to an untrusted server ever (e.g., we don't promise
that the server can't execute arbitrary code on the client), or

(2) don't connect to an untrusted server if the client repository has
data that needs to be kept secret from the server?

The fetch/push attack relates only to #2.  If #1, what are the other
risks you are thinking of?

Matt


Re: Fetch/push lets a malicious server steal the targets of "have" lines

2016-10-29 Thread Jeff King
On Fri, Oct 28, 2016 at 11:33:49PM -0400, Matt McCutchen wrote:

> On Fri, 2016-10-28 at 18:11 -0700, Junio C Hamano wrote:
> > Ah, I see.  My immediate reaction is that you can do worse things in
> > the reverse direction compared to this, but your scenario does sound
> > bad already.
> 
> Are you saying that clients connecting to untrusted servers already
> face worse risks that people should know about, so there is no point in
> documenting this one?  I guess I don't know about the other risks aside
> from accepting a corrupt object, which should be preventable by
> enabling fetch.fsckObjects.  It seems we need either a statement that
> connecting to untrusted servers is officially unsupported or a
> description of the specific risks.

I'm not sure I understand how connecting to a remote server to fetch is
a big problem. The server may learn about the existence of particular
sha1s in your repository, but cannot get their content.

It's the subsequent push that is a problem.

In the scenarios you've described, I'm mostly inclined to say that the
problem is not git or the protocol itself, but rather lax refspecs.
You mentioned earlier:

  the server can just generate another ref 'xx' pointing to Y2, assuming
  it can entice the victim to set up a corresponding local branch
  refs/heads/for-server1/xx and push it back.  Or if the victim is for
  some reason just mirroring back and forth:

This sounds a lot like "I told git to push a bunch of things without
checking if they were really secret, and it turned out to push some
secret things". IOW I think the problem is not that the server may lie
about what it has, but that the user was not careful about what they
pushed. I dunno. I do not mind making a note in the documentation
explaining the implications of a server lying, but the scenarios seem
pretty contrived to me.

A much more interesting one, IMHO, is a server whose receive-pack lies
about which objects it has (possibly ones it found out about earlier via
fetch), which provokes the client to generate deltas against objects the
server doesn't have (and thereby leaking information about the base
objects).

That is a problem no matter how careful your refspecs are. I suspect it
would be a hard attack to pull off in practice, just because it's going
to depend heavily on the content of the specific objects, what kinds of
deltas you can convince the other side to generate, etc. That might
merit a mention in the git-push documentation.

-Peff


[PATCH] commit: simplify building parents list

2016-10-29 Thread René Scharfe
Push pptr down into the FROM_MERGE branch of the if/else statement,
where it's actually used, and call commit_list_append() for appending
elements instead of playing tricks with commit_list_insert().  Call
copy_commit_list() in the amend branch instead of open-coding it.  Don't
bother setting pptr in the final branch as it's not used thereafter.

Signed-off-by: Rene Scharfe 
---
 builtin/commit.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index a2baa6e..8976c3d 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1642,7 +1642,7 @@ int cmd_commit(int argc, const char **argv, const char 
*prefix)
const char *index_file, *reflog_msg;
char *nl;
unsigned char sha1[20];
-   struct commit_list *parents = NULL, **pptr = &parents;
+   struct commit_list *parents = NULL;
struct stat statbuf;
struct commit *current_head = NULL;
struct commit_extra_header *extra = NULL;
@@ -1688,20 +1688,18 @@ int cmd_commit(int argc, const char **argv, const char 
*prefix)
if (!reflog_msg)
reflog_msg = "commit (initial)";
} else if (amend) {
-   struct commit_list *c;
-
if (!reflog_msg)
reflog_msg = "commit (amend)";
-   for (c = current_head->parents; c; c = c->next)
-   pptr = &commit_list_insert(c->item, pptr)->next;
+   parents = copy_commit_list(current_head->parents);
} else if (whence == FROM_MERGE) {
struct strbuf m = STRBUF_INIT;
FILE *fp;
int allow_fast_forward = 1;
+   struct commit_list **pptr = &parents;
 
if (!reflog_msg)
reflog_msg = "commit (merge)";
-   pptr = &commit_list_insert(current_head, pptr)->next;
+   pptr = commit_list_append(current_head, pptr);
fp = fopen(git_path_merge_head(), "r");
if (fp == NULL)
die_errno(_("could not open '%s' for reading"),
@@ -1712,7 +1710,7 @@ int cmd_commit(int argc, const char **argv, const char 
*prefix)
parent = get_merge_parent(m.buf);
if (!parent)
die(_("Corrupt MERGE_HEAD file (%s)"), m.buf);
-   pptr = &commit_list_insert(parent, pptr)->next;
+   pptr = commit_list_append(parent, pptr);
}
fclose(fp);
strbuf_release(&m);
@@ -1729,7 +1727,7 @@ int cmd_commit(int argc, const char **argv, const char 
*prefix)
reflog_msg = (whence == FROM_CHERRY_PICK)
? "commit (cherry-pick)"
: "commit";
-   pptr = &commit_list_insert(current_head, pptr)->next;
+   commit_list_insert(current_head, &parents);
}
 
/* Finally, get the commit message */
-- 
2.10.2



Re: [PATCH 0/4] Make other git commands use trailer layout

2016-10-29 Thread Christian Couder
On Sat, Oct 29, 2016 at 3:12 AM, Junio C Hamano  wrote:
> Jonathan Tan  writes:
>
>> This is built off jt/trailer-with-cruft (commit 60ef86a).
>>
>> This patch set makes "commit -s", "cherry-pick -x", and
>> "format-patch --signoff" use the new trailer definition implemented in
>> jt/trailer-with-cruft, with some refactoring along the way. With this
>> patch set, the aforementioned commands would now handle trailers like
>> those described in [1].
>>
>> [1] <84f28caa-2e4b-1231-1a76-3b7e765c0...@google.com>
>
> Ooooh.  Looks delicious ;-)

Yeah, I am very happy with this goal being reached :-)

Thanks,
Christian.


Re: [PATCH 2/4] trailer: avoid unnecessary splitting on lines

2016-10-29 Thread Christian Couder
On Sat, Oct 29, 2016 at 2:05 AM, Jonathan Tan  wrote:
> trailer.c currently splits lines while processing a buffer (and also
> rejoins lines when needing to invoke ignore_non_trailer).
>
> Avoid such line splitting, except when generating the strings
> corresponding to trailers (for ease of use by clients - a subsequent
> patch will allow other components to obtain the layout of a trailer
> block in a buffer, including the trailers themselves). The main purpose
> of this is to make it easy to return pointers into the original buffer
> (for a subsequent patch), but this also significantly reduces the number
> of memory allocations required.
>
> Signed-off-by: Jonathan Tan 
> ---
>  trailer.c | 215 
> +-
>  1 file changed, 116 insertions(+), 99 deletions(-)

IMHO it is telling that this needs 17 more lines.

> @@ -954,7 +971,7 @@ void process_trailers(const char *file, int in_place, int 
> trim_empty, struct str
>  {
> LIST_HEAD(head);
> LIST_HEAD(arg_head);
> -   struct strbuf **lines;
> +   struct strbuf sb = STRBUF_INIT;

We often use "sb" as the name of strbuf variables, but I think at
least here (and maybe in other places above) we could use something a
bit more telling, like "input_buf" perhaps.

> int trailer_end;
> FILE *outfile = stdout;
>


Re: [PATCH v2 1/2] read-cache: factor out get_sha1_from_index() helper

2016-10-29 Thread Duy Nguyen
On Wed, Oct 12, 2016 at 8:47 PM,   wrote:
> From: Torsten Bögershausen 
>
> Factor out the retrieval of the sha1 for a given path in
> read_blob_data_from_index() into the function get_sha1_from_index().
>
> This will be used in the next commit, when convert.c can do the
> analyze for "text=auto" without slurping the whole blob into memory
> at once.
>
> Add a wrapper definition get_sha1_from_cache().
>
> Signed-off-by: Torsten Bögershausen 
> ---
>  cache.h  |  3 +++
>  read-cache.c | 29 ++---
>  2 files changed, 21 insertions(+), 11 deletions(-)
>
> diff --git a/cache.h b/cache.h
> index 1604e29..04de209 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -380,6 +380,7 @@ extern void free_name_hash(struct index_state *istate);
>  #define unmerge_cache_entry_at(at) unmerge_index_entry_at(&the_index, at)
>  #define unmerge_cache(pathspec) unmerge_index(&the_index, pathspec)
>  #define read_blob_data_from_cache(path, sz) 
> read_blob_data_from_index(&the_index, (path), (sz))
> +#define get_sha1_from_cache(path)  get_sha1_from_index (&the_index, (path))
>  #endif
>
>  enum object_type {
> @@ -1089,6 +1090,8 @@ static inline void *read_sha1_file(const unsigned char 
> *sha1, enum object_type *
> return read_sha1_file_extended(sha1, type, size, 
> LOOKUP_REPLACE_OBJECT);
>  }
>
> +const unsigned char *get_sha1_from_index(struct index_state *istate, const 
> char *path);
> +
>  /*
>   * This internal function is only declared here for the benefit of
>   * lookup_replace_object().  Please do not call it directly.
> diff --git a/read-cache.c b/read-cache.c
> index 38d67fa..5a1df14 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -2290,13 +2290,27 @@ int index_name_is_other(const struct index_state 
> *istate, const char *name,
>
>  void *read_blob_data_from_index(struct index_state *istate, const char 
> *path, unsigned long *size)
>  {
> -   int pos, len;
> +   const unsigned char *sha1;
> unsigned long sz;
> enum object_type type;
> void *data;
>
> -   len = strlen(path);
> -   pos = index_name_pos(istate, path, len);
> +   sha1 = get_sha1_from_index(istate, path);
> +   if (!sha1)
> +   return NULL;
> +   data = read_sha1_file(sha1, &type, &sz);
> +   if (!data || type != OBJ_BLOB) {
> +   free(data);
> +   return NULL;
> +   }
> +   if (size)
> +   *size = sz;
> +   return data;
> +}
> +
> +const unsigned char *get_sha1_from_index(struct index_state *istate, const 
> char *path)

Let's try to embrace struct object_id to make our lives easier when
the time comes to convert to a new hash algorithm by returning struct
object_id * here instead of the internal hash.

> +{
> +   int pos = index_name_pos(istate, path, strlen(path));
> if (pos < 0) {
> /*
>  * We might be in the middle of a merge, in which
> @@ -2312,14 +2326,7 @@ void *read_blob_data_from_index(struct index_state 
> *istate, const char *path, un
> }
> if (pos < 0)
> return NULL;
> -   data = read_sha1_file(istate->cache[pos]->oid.hash, &type, &sz);
> -   if (!data || type != OBJ_BLOB) {
> -   free(data);
> -   return NULL;
> -   }
> -   if (size)
> -   *size = sz;
> -   return data;
> +   return istate->cache[pos]->oid.hash;
>  }
>
>  void stat_validity_clear(struct stat_validity *sv)
-- 
Duy


Re: [PATCH v2 2/2] convert.c: stream and fast search for binary

2016-10-29 Thread Duy Nguyen
On Wed, Oct 12, 2016 at 8:47 PM,   wrote:
> From: Torsten Bögershausen 
>
> When statistics are done for the autocrlf handling, the search in
> the content can be stopped, if e.g
> - a search for binary is done, and a NUL character is found
> - a search for CRLF is done, and the first CRLF is found.
>
> Similar when statistics for binary vs non-binary are gathered:
> Whenever a lone CR or NUL is found, the search can be aborted.
>
> When checking out files in "auto" mode, any file that has a "lone CR"
> or a CRLF will not be converted, so the search can be aborted early.
>
> Add the new bit, CONVERT_STAT_BITS_ANY_CR,
> which is set for either lone CR or CRLF.
>
> Many binary files have a NUL very early and it is often not necessary
> to load the whole content of a file or blob into memory.
>
> Split gather_stats() into gather_all_stats() and gather_stats_partly()
> to do a streaming handling for blobs and files in the worktree.

Maybe break this commit down a bit? the gather_all_stats and
gather_stats_partly() seem independent and can standalone. So is the
blob streaming, and get_convert_stats_wt.
-- 
Duy


Re: Is the entire working copy “at one branch”?

2016-10-29 Thread Alexei Lozovsky
Hi Stefan,

Generally with git, your entire working copy will have the same
revision (set to current branch, aka HEAD). The idea behind this
is that your working copy of a repository should always be in
consistent state.

You can check out specific files or directories from another
revision (mimicking "svn update -r1234 filename"):

$ git checkout branch-or-sha-hash -- filename

However, SVN tracks the 'revision' thing on per-file basis, while
in git this is a property of the working copy. So if you do like
above then git will be telling you that the 'filename' has been
changed (as it is certainly different from its pristine version
in HEAD):

$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD ..." to unstage)

modified:   filename

So it's generally not recommended to do such a thing.

Another thing that you _can do_ in git to mimick SVN is the
'standard layout'. There is a feature called "git worktree" which
allows you to have SVN-like directory structure with multiple
directories linked to different working copies:

$ mkdir my-project
$ cd my-project
$ git clone my-project-repository master
$ mkdir branches
$ cd master
$ git worktree add -b branch-1 ../branches/branch-1
$ git worktree add -b branch-2 ../branches/branch-2

After that you will have directory structure like this:

$ tree my-project
my-project
├── branches
│   ├── branch-1
│   │   ├── 1
│   │   ├── 2
│   │   └── 3
│   └── branch-2
│   ├── 1
│   ├── 2
│   └── banana
└── master
├── 1
└── 2
You can work with these working copies separately, like you
would be working with SVN. Commits in 'master' will go to the
'master' branch, commits made in 'branches/branch-1' will go
to the 'branch-1' branch.


Re: [PATCH] valgrind: support test helpers

2016-10-29 Thread René Scharfe

Am 28.10.2016 um 10:51 schrieb Johannes Schindelin:

On Fri, 28 Oct 2016, René Scharfe wrote:

Signed-off-by: Rene Scharfe 


Apart from the missing accent ("é") in your SOB: ACK.


I sign off without accent out of habit, to avoid display problems -- 
better have a plain "e" than something like "" or worse.


But only now I realized that I can fix at least my end by using an UTF-8 
locale (e.g. LANG=C.UTF-8 instead of LANG=C) -- together with PuTTY and 
its settings Window, Translation, Remote character set: UTF-8 and 
Connection, Data, Terminal-type-string: linux, which I already had.  My 
terminal just got boosted into the 21st century, woohoo! ;)


René


Re: [PATCH] valgrind: support test helpers

2016-10-29 Thread René Scharfe
Am 28.10.2016 um 14:50 schrieb Junio C Hamano:
> Hmph.  I somehow thought this was supposed to have been fixed by
> 503e224180 ("t/test-lib.sh: fix running tests with --valgrind",
> 2016-07-11) already.

Its title seems to indicate that intention.  Probably the quickest test
script that calls a helper is t0009-prio-queue.sh, and without my patch
it reports something like this, unfortunately:

  expecting success:
  test-prio-queue 2 6 3 10 9 5 7 4 5 8 1 dump >actual &&
  test_cmp expect actual
  
  ./t0009-prio-queue.sh: 4: eval: test-prio-queue: not found
  not ok 1 - basic ordering
  #
  #   test-prio-queue 2 6 3 10 9 5 7 4 5 8 1 dump >actual &&
  #   test_cmp expect actual
  #

René


Re: [ANNOUNCE] Git v2.10.2

2016-10-29 Thread Johannes Schindelin
Hi,

On Fri, 28 Oct 2016, Junio C Hamano wrote:

> The latest maintenance release Git v2.10.2 is now available at
> the usual places.

The corresponding Git for Windows version will be hopefully out on
Tuesday: https://github.com/git-for-windows/git/milestone/5

Ciao,
Johannes


Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-29 Thread Johannes Schindelin
Hi,

On Fri, 28 Oct 2016, Junio C Hamano wrote:

> Linus Torvalds  writes:
> 
> > Apparently windows doesn't even support it, at least not mingw
> 
> Assuming that the above was a misunderstanding, assuming that we can
> do O_CLOEXEC (but not FD_CLOEXEC) on Windows just fine, where stray
> file descriptors held open in the children matter more, and ...

Correct. We cannot change an open file handle's state ("FD_CLOEXEC") on
Windows, but we can ask open() to open said file handle with the correct
flag ("O_CLOEXEC", which is mapped to O_NOINHERIT on Windows.

And for the record: I agree with Junio that we cannot simply drop this
O_CLOEXEC business.

Because it was introduced to fix bugs.

Thank you for your time,
Johannes


Re: [PATCHv2 27/36] attr: convert to new threadsafe API

2016-10-29 Thread Johannes Sixt

Am 29.10.2016 um 00:06 schrieb Junio C Hamano:

Probably this needs to be squashed in, now the MinGW discussion has
settled.


Yes, this looks good. Thank you very much, both of you.

As I said, I won't be able to test this until late next week.

-- Hannes



 attr.c | 2 +-
 common-main.c  | 2 ++
 compat/mingw.c | 4 
 3 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/attr.c b/attr.c
index 082b5ed343..961218a0d5 100644
--- a/attr.c
+++ b/attr.c
@@ -50,7 +50,7 @@ static struct git_attr *(git_attr_hash[HASHSIZE]);

 #ifndef NO_PTHREADS

-static pthread_mutex_t attr_mutex = PTHREAD_MUTEX_INITIALIZER;
+static pthread_mutex_t attr_mutex;
 #define attr_lock()pthread_mutex_lock(&attr_mutex)
 #define attr_unlock()  pthread_mutex_unlock(&attr_mutex)
 void attr_start(void) { pthread_mutex_init(&attr_mutex, NULL); }
diff --git a/common-main.c b/common-main.c
index 44a29e8b13..d4699cd404 100644
--- a/common-main.c
+++ b/common-main.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "exec_cmd.h"
+#include "attr.h"

 /*
  * Many parts of Git have subprograms communicate via pipe, expect the
@@ -32,6 +33,7 @@ int main(int argc, const char **argv)
sanitize_stdfds();

git_setup_gettext();
+   attr_start();

argv[0] = git_extract_argv0_path(argv[0]);

diff --git a/compat/mingw.c b/compat/mingw.c
index 51ed76326b..3fbfda5978 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -5,7 +5,6 @@
 #include "../strbuf.h"
 #include "../run-command.h"
 #include "../cache.h"
-#include "../attr.h"

 #define HCAST(type, handle) ((type)(intptr_t)handle)

@@ -2233,9 +2232,6 @@ void mingw_startup(void)
/* initialize critical section for waitpid pinfo_t list */
InitializeCriticalSection(&pinfo_cs);

-   /* initialize critical sections in the attr code */
-   attr_start();
-
/* set up default file mode and file modes for stdin/out/err */
_fmode = _O_BINARY;
_setmode(_fileno(stdin), _O_BINARY);