[PATCH v2] Documentation: improve git ls-files -s manpage entry

2017-04-01 Thread Mostyn Bramley-Moore
List the fields in order of appearance in the command output.

Signed-off-by: Mostyn Bramley-Moore 
---
 Documentation/git-ls-files.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index 1cab703..d153c17 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -57,7 +57,7 @@ OPTIONS
 
 -s::
 --stage::
-   Show staged contents' object name, mode bits and stage number in the 
output.
+   Show staged contents' mode bits, object name and stage number in the 
output.
 
 --directory::
If a whole directory is classified as "other", show just its
-- 
2.9.3



Re: SHA1 collision in production repo?! (probably not)

2017-04-01 Thread Jeff King
On Fri, Mar 31, 2017 at 02:16:29PM -0700, Junio C Hamano wrote:

> Before we forget...
> 
> -- >8 --
> From: Jeff King 
> 
> When sha1_loose_object_info() finds that a loose object file cannot
> be stat(2)ed or mmap(2)ed, it returns -1 to signal an error to the
> caller.  However, if it found that the loose object file is corrupt
> and the object data cannot be used from it, it forgets to return -1.
> 
> This can confuse the caller, who may be lead to mistakenly think
> that there is a loose object and possibly gets an incorrect type and
> size from the function.  The SHA-1 collision detection codepath, for
> example, when it gets an object over the wire and tries to see the
> data is the same as what is available as a loose object locally, can
> get confused when the loose object is correupted due to this bug.

Unfortunately this isn't quite right. I was able to reproduce the
problem that Lars saw, and this patch doesn't fix it.

So here's a two-patch series. The first fixes the problem described
above, along with a simpler test that demonstrates it. The second fixes
Lars's problem on top.

I know you're heading out for a week, so I'll post it now for review,
and then hold and repost when you get back.

  [1/2]: sha1_loose_object_info: return error for corrupted objects
  [2/2]: index-pack: detect local corruption in collision check

 builtin/index-pack.c |  2 ++
 sha1_file.c  |  2 +-
 t/t1060-object-corruption.sh | 24 
 3 files changed, 27 insertions(+), 1 deletion(-)

-Peff


[PATCH 1/2] sha1_loose_object_info: return error for corrupted objects

2017-04-01 Thread Jeff King
When sha1_loose_object_info() finds that a loose object file
cannot be stat(2)ed or mmap(2)ed, it returns -1 to signal an
error to the caller.  However, if it found that the loose
object file is corrupt and the object data cannot be used
from it, it stuffs OBJ_BAD into "type" field of the
object_info, but returns zero (i.e., success), which can
confuse callers.

This is due to 052fe5eac (sha1_loose_object_info: make type
lookup optional, 2013-07-12), which switched the return to a
strict success/error, rather than returning the type (but
botched the return).

Callers of regular sha1_object_info() don't notice the
difference, as that function returns the type (which is
OBJ_BAD in this case). However, direct callers of
sha1_object_info_extended() see the function return success,
but without setting any meaningful values in the object_info
struct, leading them to access potentially uninitialized
memory.

The easiest way to see the bug is via "cat-file -s", which
will happily ignore the corruption and report whatever
value happened to be in the "size" variable.

Signed-off-by: Jeff King 
---
So not only does it not fail, but running with --valgrind complains.

 sha1_file.c  | 2 +-
 t/t1060-object-corruption.sh | 7 +++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index 71063890f..368c89b5c 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2952,7 +2952,7 @@ static int sha1_loose_object_info(const unsigned char 
*sha1,
if (status && oi->typep)
*oi->typep = status;
strbuf_release(&hdrbuf);
-   return 0;
+   return (status < 0) ? status : 0;
 }
 
 int sha1_object_info_extended(const unsigned char *sha1, struct object_info 
*oi, unsigned flags)
diff --git a/t/t1060-object-corruption.sh b/t/t1060-object-corruption.sh
index 3f8705139..3a88d79c5 100755
--- a/t/t1060-object-corruption.sh
+++ b/t/t1060-object-corruption.sh
@@ -53,6 +53,13 @@ test_expect_success 'streaming a corrupt blob fails' '
)
 '
 
+test_expect_success 'getting type of a corrupt blob fails' '
+   (
+   cd bit-error &&
+   test_must_fail git cat-file -s HEAD:content.t
+   )
+'
+
 test_expect_success 'read-tree -u detects bit-errors in blobs' '
(
cd bit-error &&
-- 
2.12.2.943.g91cb50fd8



[PATCH 2/2] index-pack: detect local corruption in collision check

2017-04-01 Thread Jeff King
When we notice that we have a local copy of an incoming
object, we compare the two objects to make sure we haven't
found a collision. Before we get to the actual object
bytes, though, we compare the type and size from
sha1_object_info().

If our local object is corrupted, then the type will be
OBJ_BAD, which obviously will not match the incoming type,
and we'll report "SHA1 COLLISION FOUND" (with capital
letters and everything). This is confusing, as the problem
is not a collision but rather local corruption. We should
reoprt that instead (just like we do if reading the rest of
the object content fails a few lines later).

Note that we _could_ just ignore the error and mark it as a
non-collision. That would let you "git fetch" to replace a
corrupted object. But it's not a very reliable method for
repairing a repository. The earlier want/have negotiation
tries to get the other side to omit objects we already have,
and it would not realize that we are "missing" this
corrupted object. So we're better off complaining loudly
when we see corruption, and letting the user take more
drastic measures to repair (like making a full clone
elsewhere and copying the pack into place).

Note that the test sets transfer.unpackLimit in the
receiving repository so that we use index-pack (which is
what does the collision check). Normally for such a small
push we'd use unpack-objects, which would simply try to
write the loose object, and discard the new one when we see
that there's already an old one.

Signed-off-by: Jeff King 
---
I was surprised to see that unpack-objects doesn't do the same collision
check. I think it relies on the loose-object precedence. But that seems
like it misses a case: if you have a packed version of an object and
receive a small fetch or push, we may unpack a duplicate object to its
loose format without doing any kind of collision check.

We may want to tighten that up. In the long run, I'd love to see
unpack-objects go away in favor of teaching index-pack how to write
loose objects. The two had very similar code once upon a time, but
index-pack has grown a lot of feature and bugfixes over the years.

 builtin/index-pack.c |  2 ++
 t/t1060-object-corruption.sh | 17 +
 2 files changed, 19 insertions(+)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 88d205f85..9f17adb37 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -809,6 +809,8 @@ static void sha1_object(const void *data, struct 
object_entry *obj_entry,
unsigned long has_size;
read_lock();
has_type = sha1_object_info(sha1, &has_size);
+   if (has_type < 0)
+   die(_("cannot read existing object info %s"), 
sha1_to_hex(sha1));
if (has_type != type || has_size != size)
die(_("SHA1 COLLISION FOUND WITH %s !"), 
sha1_to_hex(sha1));
has_data = read_sha1_file(sha1, &has_type, &has_size);
diff --git a/t/t1060-object-corruption.sh b/t/t1060-object-corruption.sh
index 3a88d79c5..ac1f189fd 100755
--- a/t/t1060-object-corruption.sh
+++ b/t/t1060-object-corruption.sh
@@ -21,6 +21,14 @@ test_expect_success 'setup corrupt repo' '
cd bit-error &&
test_commit content &&
corrupt_byte HEAD:content.t 10
+   ) &&
+   git init no-bit-error &&
+   (
+   # distinct commit from bit-error, but containing a
+   # non-corrupted version of the same blob
+   cd no-bit-error &&
+   test_tick &&
+   test_commit content
)
 '
 
@@ -108,4 +116,13 @@ test_expect_failure 'clone --local detects misnamed 
objects' '
test_must_fail git clone --local misnamed misnamed-checkout
 '
 
+test_expect_success 'fetch into corrupted repo with index-pack' '
+   (
+   cd bit-error &&
+   test_must_fail git -c transfer.unpackLimit=1 \
+   fetch ../no-bit-error 2>stderr &&
+   test_i18ngrep ! -i collision stderr
+   )
+'
+
 test_done
-- 
2.12.2.943.g91cb50fd8


Re: [RFC PATCH 0/5] Localise error headers

2017-04-01 Thread Jeff King
On Thu, Mar 30, 2017 at 05:18:59PM +0200, Michael J Gruber wrote:

> Jeff King venit, vidit, dixit 21.01.2017 15:20:
> > On Wed, Jan 11, 2017 at 10:08:46AM -0800, Junio C Hamano wrote:
> > 
> >> Jeff King  writes:
> >>
> >>> Yes, I would think die_errno() is a no-brainer for translation, since
> >>> the strerror() will be translated.
> >>>
>  apply.c:die(_("internal error"));
> 
>  That is funny, too. I think we should substitute that with
> 
>  die("BUG: untranslated, but what went wrong instead")
> >>>
> >>> Yep. We did not consistently use "BUG:" in the early days. I would say
> >>> that "BUG" lines do not need to be translated. The point is that nobody
> >>> should ever see them, so it seems like there is little point in giving
> >>> extra work to translators.
> >>
> >> In addition, "BUG: " is relatively recent introduction to our
> >> codebase.  Perhaps having a separate BUG() function help the
> >> distinction further?
> > 
> > Yes, I think so. I have often been tempted to dump core on BUGs for
> > further analysis. You can do that by string-matching "BUG:" from the
> > beginning of a die message, but it's kind of gross. :)
> > 
> > -Peff
> 
> I read back the whole thread, and I'm still not sure if there's
> consensus and how to go forward. Should we let the topic die? I don't
> care too much personally, I just thought the mixed tranlations look
> "unprofessional".

I don't have a strong preference either way. I also don't care
personally about the output (as I do not localize at all). My main
concern was keeping the code simple for developers. But if consistent
translation is important for people in other languages, I'm OK with
whatever we need to do.

-Peff


Re: [PATCH] read-cache: avoid git_path() race in freshen_shared_index()

2017-04-01 Thread Jeff King
On Thu, Mar 30, 2017 at 04:24:40PM +0700, Duy Nguyen wrote:

> On Thu, Mar 30, 2017 at 12:56 AM, Jeff King  wrote:
> > But in the end it doesn't really matter. I think code like:
> >
> >   const char *filename = git_path(...);
> >
> > or
> >
> >   nontrivial_function(git_path(...));
> >
> > is an anti-pattern. It _might_ be safe, but it's really hard to tell
> > without following the complete lifetime of the return value. I've been
> > tempted to suggest we should abolish git_path() entirely. But it's so
> > darn useful for things like unlink(git_path(...)), or other direct
> > system calls.
> 
> Yeah. I thought we killed most of those (was it your patches?).

Yes, after fixing a bug where static buffer reuse caused git to randomly
delete a ref, I rage-converted most of the dangerous looking cases.

> I had a quick look at "git grep -w git_path" again. The ones in
> builtin/am.c, builtin/grep.c and submodule.c look very much like that
> anti-pattern. The one in read_index_from() probably should be replaced
> with git_pathdup() as well. Sorry no patches (I'm very slow these
> days).

Yeah, I think a number of them are actually OK if you dig (e.g., passing
it to am_state_init() immediately duplicates the result), but it's a bad
pattern if you have to dig to see if it's right. It's hard to tell when
a sub-function may reuse the buffer. For instance, git-init passes the
result to adjust_shared_perm(), which might lazily load the config from
disk. I don't know if that calls git_path() or not, but it's an awful
lot of code to run.

A lot of the cases look like they could be fixed by using git_path_foo()
instead of git_path("FOO"). (And in many cases we even have
git_path_foo() defined already!).

My favorite is the one in add_worktree(), which calls strbuf_addstr() on
the result of git_path(0. That one's _not_ dangerous, but surely it
would be simpler to just write directly into the strbuf. :)

-Peff


Re: [PATCH 3/3] completion: offer ctags symbol names for 'git log -S', '-G' and '-L:'

2017-04-01 Thread Jeff King
On Thu, Mar 30, 2017 at 12:06:56PM +0200, SZEDER Gábor wrote:

> >   1. You still have to come up with the filename yourself for "-L".
> 
> I was already quite satisfied that both the symbol name and the 
> filename can be TAB completed...  but right, in most cases the
> function name uniquely determines the filename, and even when it
> doesn't, it still significantly limits the possibilities.  Hmhm.

I find that I often forget which file a function is defined in,
especially in Git's code base (where it sometimes feels somewhat
random :) ).

> OTOH, the proof-of-concept patch at the bottom shows how we could
> start doing filename completion based on the ctags file, and I think
> it's really convenient to use.  Alas, it doesn't work when the
> funcname is not on its own, e.g. ends with that disambiguating '\(:'
> from above, and then Bash falls back to its own filename completion.
> However, if I may extrapolate from my ~/.bash_history, this would
> still help the great majority of the cases.

Yeah, I think that would go a long way to solving my problem.

> > I have a script (below) which makes this easier (and I complete its
> > argument using the tags file).  It's probably too gross to even go into
> > contrib, but I thought I'd share.
> 
> Perhaps 'git log -L' could be enhanced to just DWIM when it gets only
> '-L:func:', and show the log of that function, wherever it is defined.
> So instead of complaining about the missing filename, it could run
> 'grep ' with a bit of magic to find the filename where the
> matching function is declared, and search in the history of that file.
> 
> But then again, 'git log -L' could be enhanced in so many ways...

Yes, that sounds even nicer.

-Peff


Re: Very promising results with libpcre2

2017-04-01 Thread Ævar Arnfjörð Bjarmason
On Sat, Apr 1, 2017 at 12:48 AM, Junio C Hamano  wrote:
> Ævar Arnfjörð Bjarmason  writes:
>
>> That enables the new JIT support in pcre v2:
>>
>>   s/iterrx fixed   prx
>> rx  2.19--  -33%  -44%
>> fixed   1.47   49%--  -17%
>> prx 1.22   79%   20%--
>
> The numbers with JIT does look "interesting".
>
> I couldn't quite tell if there are major incompatibilities in the
> regex language itself between two versions from their documentation,
> but assuming that there isn't (modulo bugfixes and enhancements) and
> assuming that we are going to use their standard matcher, it may be
> OK to just use the newer one without linking both.

There's no incompatibilities in the regex language itself (modulo bugs
etc). So yeah, I'll prepare some patch to use v2.


Re: [PATCH v5 4/6] dir_iterator: add tests for dir_iterator API

2017-04-01 Thread Jeff King
On Thu, Mar 30, 2017 at 03:25:43PM -0300, Daniel Ferreira (theiostream) wrote:

> On Thu, Mar 30, 2017 at 4:46 AM, Michael Haggerty  
> wrote:
> > Is there a special reason to write the date to the file as opposed to, say
> >
> > touch dir/b
> >
> > ? (Some people use `: >dir/b` for this purpose, though I've never found
> > out why.) If you write the date to the file, the reader will be
> > distracted unnecessarily wondering whether the contents are important to
> > the test.
> >
> 
> There's no reason. They will be `touch`ed instead of written in a next 
> version.
> 
> `:` is a bash builtin that that returns an exit code of zero and
> produces no output. On my Mac at least:
> 
> bash-3.2$ type :
> : is a shell builtin
> bash-3.2$ type touch
> touch is /usr/bin/touch
> 
> I suppose there are reasons to try to keep the most of a shell
> script's logic within the shell itself, without involving external
> binaries.

I think we actually prefer just:

  >dir/b

in our tests. The advantages over touch are:

  1. It is clear that the output will be empty afterwards (whereas with
 touch, it might just update the timestamp on an existing file).

  2. It's faster, since it doesn't require an extra process.

It's equivalent to ": >dir/b". I think you'll find all three forms in
our test suite, but ">dir/b" is the most common.

-Peff


Re: [PATCH] name-hash: fix buffer overrun

2017-04-01 Thread Johannes Schindelin
Hi Junio,

On Fri, 31 Mar 2017, Junio C Hamano wrote:

> Junio C Hamano  writes:
> 
> > Ah, of course. Avoid GNUism to spell HT as "\t" in a sed script.

Sorry about that, I suggested this snippet to Kevin, it is my fault for
not remembering BSD sed's idiosynchracies.

Ciao,
Johannes

P.S.: enjoy your time off!


Re: [PATCH] [GSOC] prune_worktrees(): reimplement with dir_iterator

2017-04-01 Thread Robert Stanca
On Sat, Apr 1, 2017 at 5:29 AM, Junio C Hamano  wrote:
>
> Robert Stanca  writes:
>
> > Replaces recursive traversing of opendir with dir_iterator
>
> The original code for this one, and also the other one, is not
> recursive traversing.  This one enumerates all the _direct_
> subdirectories of ".git/worktrees", and feeds them to
> prune_worktree() without recursing.  The other one scans all the
> files _directly_ underneath ".git/objects/pack" to find the ones
> that begin with the packtmp prefix, and unlinks them without
> recursing.  Neither of them touches anything in subdirectory of
> ".git/worktrees/" nor ".git/objects/pack/".
>
> Using dir_iterator() to make it recursive is not just overkill but
> is a positively wrong change, isn't it?


Thanks for the review, and yes the commit message is misleading (my
bad there). I understood that remove_temp_files has no recursion
component to it and it removes all ".tmp-id-pack-" files from
/objects/pack , but shouldn't dir-iterator work even if there's no
recursion level?
My understanding of dir-iterator is that it is used for directory
iteration (recursive or not) and where are no subdirectories it's
basically recursive with level = 0 .


[PATCH] Documentation: make 3-way merge warning more generic

2017-04-01 Thread brian m. carlson
The documentation for merge strategies noted that if a change was
reverted on only one branch of a merge, the resultant merge would
contain the change.  However, similar surprising behavior can occur
where cherry-picking only one commit from a series to the other branch
can cause conflicts.

Adjust the text to state in the first sentence that only the heads and
the merge base are considered for three-way merge, instead of hiding
this in the second sentence after introducing a scenario which might not
apply to the user.  This makes it easier for users to understand by
first introducing the general rule (which applies to many scenarios),
and only then providing a specific example.

Signed-off-by: brian m. carlson 
---

I ran into this scenario with a co-worker who is a fairly advanced Git
user.  He created a series of five commits that modified a file on our
development branch and requested the cherry-pick of one of those onto
our maintenance branch.  When we merged the maintenance branch into the
development branch, he was surprised that there were conflicts.  He
expected Git to realize that the commit was already present and ignore
it.

I pointed out the documentation to him, but realized that it didn't
cover this case well, so I decided to reword it such that it covers this
case a little better, as well as addressing the original issue.

Feedback on how I did in that regard would be welcome.

 Documentation/merge-strategies.txt | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/Documentation/merge-strategies.txt 
b/Documentation/merge-strategies.txt
index 2eb92b9327..54da2ffa33 100644
--- a/Documentation/merge-strategies.txt
+++ b/Documentation/merge-strategies.txt
@@ -123,9 +123,9 @@ subtree::
ancestor tree.
 
 With the strategies that use 3-way merge (including the default, 'recursive'),
-if a change is made on both branches, but later reverted on one of the
-branches, that change will be present in the merged result; some people find
-this behavior confusing.  It occurs because only the heads and the merge base
-are considered when performing a merge, not the individual commits.  The merge
-algorithm therefore considers the reverted change as no change at all, and
-substitutes the changed version instead.
+only the heads and the merge base are considered when performing a merge, not
+the individual commits.  This means that if a change is made on both branches,
+but later reverted on one of the branches, that change will be present in the
+merged result; some people find this behavior confusing.  The merge algorithm
+considers the reverted change as no change at all, and substitutes the changed
+version instead.
-- 
2.12.2.564.g063fe858b8



Re: [PATCH] status: show in-progress info for short status

2017-04-01 Thread brian m. carlson
On Fri, Mar 31, 2017 at 03:59:17PM +0200, Michael J Gruber wrote:
> Ordinary (long) status shows information about bisect, revert, am,
> rebase, cherry-pick in progress, and so does git-prompt.sh. status
> --short currently shows none of this information.
> 
> Introduce an `--inprogress` argument to git status so that, when used with
> `--short --branch`, in-progress information is shown next to the branch
> information. Just like `--branch`, this comes with a config option.

I don't have a strong opinion on this feature, but I would note that to
be consistent with other options, we'd probably want to write this as
--in-progress (with an extra dash).
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


[PATCH 1/2] [GSOC] Convert signed flags to unsigned

2017-04-01 Thread Robert Stanca
 Unsigned int is a closer representation of bitflags rather than signed int 
that uses 1 special bit for sign.This shouldn't make much difference because 
rev_list_info.flags uses only 2 bits(BISECT_SHOW_ALL and REV_LIST_QUIET)

Signed-off-by: Robert Stanca 
---
 bisect.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bisect.h b/bisect.h
index acd12ef80..a979a7f11 100644
--- a/bisect.h
+++ b/bisect.h
@@ -16,7 +16,7 @@ extern struct commit_list *filter_skipped(struct commit_list 
*list,
 
 struct rev_list_info {
struct rev_info *revs;
-   int flags;
+   unsigned int flags;
int show_timestamp;
int hdr_termination;
const char *header_prefix;
-- 
2.12.2.575.gb14f27f91.dirty



[PATCH 2/2] [GSOC] show_bisect_vars(): Use unsigned int instead of signed int for flags

2017-04-01 Thread Robert Stanca
 As rev_list_info's flag is unsigned int , var flags should have proper 
type.Also var cnt could be unsigned as there's no negative number of commits 
(all-reaches)

Signed-off-by: Robert Stanca 
---
 builtin/rev-list.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 0aa93d589..eb4af53ab 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -211,7 +211,7 @@ static void print_var_int(const char *var, int val)
 
 static int show_bisect_vars(struct rev_list_info *info, int reaches, int all)
 {
-   int cnt, flags = info->flags;
+   unsigned int cnt, flags = info->flags;
char hex[GIT_SHA1_HEXSZ + 1] = "";
struct commit_list *tried;
struct rev_info *revs = info->revs;
-- 
2.12.2.575.gb14f27f91.dirty



[RFC] [GSoC] Proposal Draft for GSoC 2017 : Incremental Rewrite of git-submodules

2017-04-01 Thread Prathamesh Chavan
Hello everyone,

This is the first draft of my project proposal. I decided to change my project
since the project I initially intended to do and also proposed was slightly
a smaller project for 13 weeks. Also when I came across this project,
and also found out that there are left over bits (git submodule $cmd
$pathspec may want to error out when the $pathspec does not match any
submodules) which also can be included in this conversion process, I was
motivated to change my project and hence made my proposal regarding it.

Also I wish to complete the previously proposed task, probably after finishing
GSoC first.

Here is a Google doc for my proposal,
https://docs.google.com/document/d/1krxVLooWl--75Pot3dazhfygR3wCUUWZWzTXtK1L-xU

Below I have also included the text only version of my proposal, so that
all the discussion related to it would also be there on the public-inbox of git.
It would be great to have your suggestion, so that I can improve it futher.

Thanks,
Prathamesh Chavan
---

Incremental Rewrite of git-submodules
01.04.2017

About Me

Name  Prathamesh Chavan
UniversityIndian Institute of Technology, Kharagpur
Major Computer Science and Engineering(Dual Degree)
Email pc44...@gmail.com
IRC   pratham_pc
Blog  pratham-pc.github.io
Contact No.   +91-993-235-8333
Time Zone IST (UTC +5:30)


Background

I am a second-year student from the Department of Computer Science and
Engineering IIT Kharagpur. I’m an active member at Kharagpur open
source society and also of Kharagpur Linux Users Group since my first
year. I got introduced to open source, Linux and git since the
beginning of my first year.
I use to try to complete any task on my laptop without the use of the
cursor and hence eventually got familiar with shell script and in the
same manner, I still find something new every day. I’m also quite
familiar with C language. I always wanted to get involved in the
development process. As Google Summer of Code is a great way to
achieve it, hence I would like to participate.
This is my first attempt to GSoC. Since I use Git on regular basis
and would also continue to use it, I would love to be a part of its
contributors and hence would also be able to contribute to the project
even after GSoC.

The Project

Abstract

To manage the external dependencies, git submodule commands have been
frequently used by the developers in their projects. But as most of the
subcommands of git submodule are still written in shell scripts (All, but
git submodule init), my project intends to convert the subcommands into
C code, thus making them builtins. This will increase Git's portability
and hence the efficiency of working with the git-submodule commands.
The function cmd_init has been ported to its built-in successfully and
similar operation needs to be done to the remaining functions.

Problems encountered while using Shell Scripts

Git submodule subcommands are currently implemented by using shell script
'git-submodule.sh'. There are several reasons why we'll prefer not to
use the shell script:

1. Less Efficient

Since being a part of shell script, the commands of git submodule do not
have access to the low-level internal API of git, which are designed to
carry out the task more efficiently.
Also, since the subcommands in  ‘git-submodule.sh’ do not have direct
access to the native low-level internal API's of git, carrying out even
trivial processes make git be spawned into a separate process with the
help of plumbing functions. Also since everything comes from external
commands even in the shell scripts, everything needs to undergo through
a fork and exec. Whenever fork takes place, the git command requires
another fully functioning POSIX shell. This results in making the git
submodule commands slow and relatively inefficient, and also reduces git's
portability, since it is difficult to port such commands to a non-POSIX
an environment like Windows.

2. Spawning makes the process slower in many cases

Most of the operations in the shell scripts require the scripts to call
the external executables, even for performing trivial tasks. When we
take a look at the file 'git-submodule.sh', then its is seen that
many git-executables like : 'git config', 'git checkout' 'git add',
'git rev-list', 'git fetch', 'git rev-parse', 'git describe', 'git
hash-object', 'git log' and since these and other shell command comes
from other executables, every command undergoes a through a fork and
exec, thus making the processes slow.

3. Additional dependencies are introduced

When we take a look at the 'git-submodule.sh' file, we can notice that
it is dependent upon various other executables like: 'test', 'printf',
'echo', 'sed', 'grep', 'mkdir', 'rm'. This leads to the requirement of
adding these various dependencies to a non-POSIX environment.

4. Coverage checking is not implemented for shell scripts:

Since currently kcov is not implemented, checking the coverage 

Re: [PATCH v5 4/6] dir_iterator: add tests for dir_iterator API

2017-04-01 Thread Junio C Hamano
Jeff King  writes:

> I think we actually prefer just:
>
>   >dir/b
>
> in our tests. The advantages over touch are:
>
>   1. It is clear that the output will be empty afterwards (whereas with
>  touch, it might just update the timestamp on an existing file).
>
>   2. It's faster, since it doesn't require an extra process.
>
> It's equivalent to ": >dir/b". I think you'll find all three forms in
> our test suite, but ">dir/b" is the most common.

Another one.

3. Avoiding "touch" when we only want to ensure existence of a
   file and sticking to ">dir/b" (with or without ":" no-op)
   lets us reserve "touch" for the intended use of the command.
   When seeing a "touch", the reader is signalled that the
   script cares about the timestamp of the target file updated.

Among the three, this one is the most important reason from the
point of view of making the codebase maintainable in the longer
term.  By forcing the writer to think when choosing between
">target" and "touch target", we reduce the mental load of the
readers.  As is true for most things in computer field, read happens
more often than write while maintaining the code, and that is why
this matters.



Re: [RFC PATCH 0/5] Localise error headers

2017-04-01 Thread Junio C Hamano
Jeff King  writes:

> On Thu, Mar 30, 2017 at 05:18:59PM +0200, Michael J Gruber wrote:
>
>> I read back the whole thread, and I'm still not sure if there's
>> consensus and how to go forward. Should we let the topic die? I don't
>> care too much personally, I just thought the mixed tranlations look
>> "unprofessional".
>
> I don't have a strong preference either way. I also don't care
> personally about the output (as I do not localize at all). My main
> concern was keeping the code simple for developers. But if consistent
> translation is important for people in other languages, I'm OK with
> whatever we need to do.

 (0) I do not mind the status quo, and I do not mind telling an end
 user who comes here with a translated message from die() to try
 running it again in C locale, either.

 (1) I do not think messages (including prefix) from die(),
 warning(), and error() are sacred, even when they are given by
 plumbing commands.  If the list concensus is that we want to
 see all translated in the ideal endgame, I think it is OK not
 to special case the plumbing.

 (2) I found your

 vreportf(_("fatal: "), err, params);

 a reasonable approach, if the we all agree with (1) as our
 goal, and want a way to gradually get us closer to the
 "everything translated" endgame.

I do not know what is professional and what is not in this context,
though ;-).

Michael, thanks for pinging the thread.


Re: [PATCH 1/2] sha1_loose_object_info: return error for corrupted objects

2017-04-01 Thread Junio C Hamano
Jeff King  writes:

> When sha1_loose_object_info() finds that a loose object file
> cannot be stat(2)ed or mmap(2)ed, it returns -1 to signal an
> error to the caller.  However, if it found that the loose
> object file is corrupt and the object data cannot be used
> from it, it stuffs OBJ_BAD into "type" field of the
> object_info, but returns zero (i.e., success), which can
> confuse callers.

Thanks for a nice analysis and a fix.



Re: [PATCH 2/2] index-pack: detect local corruption in collision check

2017-04-01 Thread Junio C Hamano
Jeff King  writes:

> is not a collision but rather local corruption. We should
> reoprt that instead (just like we do if reading the rest of
> the object content fails a few lines later).

s/reoprt/report/ (locally amended while queuing).

> We may want to tighten that up. In the long run, I'd love to see
> unpack-objects go away in favor of teaching index-pack how to write
> loose objects. The two had very similar code once upon a time, but
> index-pack has grown a lot of feature and bugfixes over the years.

This sounds like a nice future to aspire to reach.  

Besides having to maintain two separate executables, another
downside of the current arrangement is that we have to make the
decision between streaming to a single pack and exploding into loose
objects too early and base our decision solely on the object count.
If we are moving to a single receiver, it is conceivable to switch
to a scheme based on the size of the incoming pack (e.g. spool the
first N MB and if we run out we write out loose objects, otherwise
create a new pack, and dump the spooled part and stream the rest
into it while indexing).

Thanks.


Re: [PATCH v3] http.postbuffer: allow full range of ssize_t values

2017-04-01 Thread Junio C Hamano
Jeff King  writes:

> On Fri, Mar 31, 2017 at 01:26:31PM -0400, David Turner wrote:
>
>> Unfortunately, in order to push some large repos, the http postbuffer
>> must sometimes exceed two gigabytes.  On a 64-bit system, this is OK:
>> we just malloc a larger buffer.
>
> I'm still not sure why a 2GB post-buffer is necessary. It sounds like
> something is broken in your setup. Large pushes should be sent chunked.
>
> I know broken setups are a fact of life, but this feels like a really
> hacky work-around.

Yeah, I tend to share that "Huh? We do not want to do this, but what
forces us to?" puzzlement.

Tentatively demoted to 'pu' (out of 'jch'--candidates to go to 'next').


Re: [PATCH] name-hash: fix buffer overrun

2017-04-01 Thread Junio C Hamano
Johannes Schindelin  writes:

> On Fri, 31 Mar 2017, Junio C Hamano wrote:
>
>> Junio C Hamano  writes:
>> 
>> > Ah, of course. Avoid GNUism to spell HT as "\t" in a sed script.
>
> Sorry about that, I suggested this snippet to Kevin, it is my fault for
> not remembering BSD sed's idiosynchracies.

Heh, don't fret about that.  We all make mistakes and that is why we
review on the list so that patches get exposure to more sets of
eyes.  We may be interested to learn from our common mistakes, but
for that purpose, "whose fault is it?" is far less interesting than
"how it came about?", i.e. what made that mistake common and if/how
we can help people avoid it (removing cuttable-and-pastable bad
examples is one way to do so).

Thanks.


Re: [PATCH v2] Documentation: improve git ls-files -s manpage entry

2017-04-01 Thread Junio C Hamano
Thanks.


Re: Very promising results with libpcre2

2017-04-01 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> On Sat, Apr 1, 2017 at 12:48 AM, Junio C Hamano  wrote:
>> Ævar Arnfjörð Bjarmason  writes:
>>
>>> That enables the new JIT support in pcre v2:
>>>
>>>   s/iterrx fixed   prx
>>> rx  2.19--  -33%  -44%
>>> fixed   1.47   49%--  -17%
>>> prx 1.22   79%   20%--
>>
>> The numbers with JIT does look "interesting".
>>
>> I couldn't quite tell if there are major incompatibilities in the
>> regex language itself between two versions from their documentation,
>> but assuming that there isn't (modulo bugfixes and enhancements) and
>> assuming that we are going to use their standard matcher, it may be
>> OK to just use the newer one without linking both.
>
> There's no incompatibilities in the regex language itself (modulo bugs
> etc). So yeah, I'll prepare some patch to use v2.

Just to make sure that we are on the same page.  While I do not see
the need to link with both variants and allow users to choose
between them at runtime, I do not know if the whole world is ready
to drop pcre1 and use pcre2 (the latter of which has only been
around for a bit over two years).

So we'd probably want to do 

 (1) keep USE_LIBPCRE and enable v1 when set;
 (2) add USE_LIBPCRE2 and enable v2 when set;
 (3) make sure to error out when both are set.

or something like that.  It is tempting to allow us to say

make USE_LIBPCRE=2

but the existing builds are likely to be depending on "is it set to
anything? then use PCRE1" behaviour, so we unfortunately cannot take
that route.

Thanks.


Assalamu`Alaikum.

2017-04-01 Thread Mohammad Ouattara



Dear Sir/Madam.

Assalamu`Alaikum.

I am Dr mohammad ouattara, I have  ($10.6 Million us dollars) to transfer into 
your account,

I will send you more details about this deal and the procedures to follow when 
I receive a positive response from you, 

Have a great day,

Dr mohammad ouattara.


Re: [BUG?] iconv used as textconv, and spurious ^M on added lines on Windows

2017-04-01 Thread Jakub Narębski
W dniu 01.04.2017 o 08:08, Jeff King pisze:
> On Fri, Mar 31, 2017 at 03:24:48PM +0200, Jakub Narębski wrote:
> 
>>> I suspect in the normal case that git is doing line-ending conversion,
>>> but it's suppressed when textconv is in use.
>>
>> I would not consider this a bug if not for the fact that there is no ^M
>> without using iconv as textconv.
> 
> I don't think it's a bug, though. You have told Git that you will
> convert the contents (whatever their format) into the canonical format,
> but your program to do so includes a CR.

Well, I have not declared file binary with "binary = true" in diff driver
definition, isn't it?

> 
> We _could_ further process with other canonicalizations, but I'm not
> sure that is a good idea (line-endings sound reasonably harmless, but
> almost certainly we should not be doing clean/smudge filtering). And I'm
> not sure if there would be any compatibility fallouts.

Yes, gitattributes(5) defines interaction between 'text'/'eol', 'ident'
and 'filter' attributes, but nothing about 'diff' and 'text'/'eol'.

> 
> So I think the behavior is perhaps not what you want, but it's not an
> unreasonable one. And the solution is to define your textconv such that
> it produces clean LF-only output. Perhaps:
> 
>   [diff.whatever]
>   textconv = "iconv ... | tr -d '\r'"

Well, either this (or equivalent using dos2unix), or using 'whitespace'
attribute (or 'core.whitespace') with cr-at-eol.


P.S. What do you think about Git supporting 'encoding' attribute (or
'core.encoding' config) plus 'core.outputEncoding' in-core?

Best,
-- 
Jakub Narębski


Re: [PATCH] [GSOC] prune_worktrees(): reimplement with dir_iterator

2017-04-01 Thread Junio C Hamano
Robert Stanca  writes:

> On Sat, Apr 1, 2017 at 5:29 AM, Junio C Hamano  wrote:
>>
>> Using dir_iterator() to make it recursive is not just overkill but
>> is a positively wrong change, isn't it?
>
> Thanks for the review, and yes the commit message is misleading (my
> bad there). I understood that remove_temp_files has no recursion
> component to it and it removes all ".tmp-id-pack-" files from
> /objects/pack , but shouldn't dir-iterator work even if there's no
> recursion level?

The point is what should happen when somebody (perhaps a newer
version of Git, or a third-party add-on that works with Git) adds a
subdirectory in .git/objects/pack/ or .git/worktrees/.  The answer
is that files and directories in such a subdirectory must not be
touched, because prune_worktrees() or remove_temporary_files() do
not know what these files and directories are.

The dir-iterator API does not allow you to say "only scan the
directory without recursing into the subdirectories" so your code
ends up recursing into them, without knowing what the files and
directories inside are (or are not).  As Peff mentioned in his
review on the other one, if we add a knob to dir-iterator API to
allow you to tell it not to recurse, then it would make it possible
to do a faithful conversion from the original, but without such a
change, you are changing the behaviour.


Re: [PATCH] git-bisect.txt: add missing word

2017-04-01 Thread Junio C Hamano
Quentin Pradet  writes:

> Signed-off-by: Quentin Pradet 
> ---
>  Documentation/git-bisect.txt | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

The updated text is more grammatically correct.  It would have been
better if you avoided wrapping the lines unnecessarily, though.

Thanks.

> diff --git a/Documentation/git-bisect.txt b/Documentation/git-bisect.txt
> index bdd915a66..90148bb07 100644
> --- a/Documentation/git-bisect.txt
> +++ b/Documentation/git-bisect.txt
> @@ -137,9 +137,9 @@ respectively, in place of "good" and "bad". (But note 
> that you cannot
>  mix "good" and "bad" with "old" and "new" in a single session.)
>  
>  In this more general usage, you provide `git bisect` with a "new"
> -commit has some property and an "old" commit that doesn't have that
> -property. Each time `git bisect` checks out a commit, you test if that
> -commit has the property. If it does, mark the commit as "new";
> +commit that has some property and an "old" commit that doesn't have
> +that property. Each time `git bisect` checks out a commit, you test if
> +that commit has the property. If it does, mark the commit as "new";
>  otherwise, mark it as "old". When the bisection is done, `git bisect`
>  will report which commit introduced the property.


Re: Bug in "git am" when the body starts with spaces

2017-04-01 Thread Linus Torvalds
On Fri, Mar 31, 2017 at 5:52 PM, Linus Torvalds
 wrote:
>
> The continuation logic is oddly complex, and I can't follow the logic.
> But it is completely broken in how it thinks empty lines are somehow
> "continuations".

The attached patch seems to work for me. Comments?

The logic is fairly simple: if we encounter an empty line, and we have
pending in-body headers, we flush the pending headers, and mark us as
no longer in header mode.

The code used to just ignore empty lines when in header mode, which is
garbage, exactly because it would happily continue accumulating more
headers.

There may be other cases this misses, but this at least seems to fix
the case I encountered.

Linus
 mailinfo.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mailinfo.c b/mailinfo.c
index a489d9d0f..68037758f 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -757,8 +757,13 @@ static int handle_commit_msg(struct mailinfo *mi, struct 
strbuf *line)
assert(!mi->filter_stage);
 
if (mi->header_stage) {
-   if (!line->len || (line->len == 1 && line->buf[0] == '\n'))
+   if (!line->len || (line->len == 1 && line->buf[0] == '\n')) {
+   if (mi->inbody_header_accum.len) {
+   flush_inbody_header_accum(mi);
+   mi->header_stage = 0;
+   }
return 0;
+   }
}
 
if (mi->use_inbody_headers && mi->header_stage) {


Re: Very promising results with libpcre2

2017-04-01 Thread Ævar Arnfjörð Bjarmason
On Sat, Apr 1, 2017 at 8:24 PM, Junio C Hamano  wrote:
> Ævar Arnfjörð Bjarmason  writes:
>
>> On Sat, Apr 1, 2017 at 12:48 AM, Junio C Hamano  wrote:
>>> Ævar Arnfjörð Bjarmason  writes:
>>>
 That enables the new JIT support in pcre v2:

   s/iterrx fixed   prx
 rx  2.19--  -33%  -44%
 fixed   1.47   49%--  -17%
 prx 1.22   79%   20%--
>>>
>>> The numbers with JIT does look "interesting".
>>>
>>> I couldn't quite tell if there are major incompatibilities in the
>>> regex language itself between two versions from their documentation,
>>> but assuming that there isn't (modulo bugfixes and enhancements) and
>>> assuming that we are going to use their standard matcher, it may be
>>> OK to just use the newer one without linking both.
>>
>> There's no incompatibilities in the regex language itself (modulo bugs
>> etc). So yeah, I'll prepare some patch to use v2.
>
> Just to make sure that we are on the same page.  While I do not see
> the need to link with both variants and allow users to choose
> between them at runtime, I do not know if the whole world is ready
> to drop pcre1 and use pcre2 (the latter of which has only been
> around for a bit over two years).
>
> So we'd probably want to do
>
>  (1) keep USE_LIBPCRE and enable v1 when set;
>  (2) add USE_LIBPCRE2 and enable v2 when set;
>  (3) make sure to error out when both are set.
>
> or something like that.  It is tempting to allow us to say
>
> make USE_LIBPCRE=2
>
> but the existing builds are likely to be depending on "is it set to
> anything? then use PCRE1" behaviour, so we unfortunately cannot take
> that route.

We're on the same page, all of this makes sense, there'll be some OS's
/ environments which for years won't be packaging pcre2 but will have
pcre1.

I am very tempted though to support them in parallel, if only for ease
of performance testing and to be able to roll out support for
grep.patternType=perl meaning pcre1 for now, but add a
grep.patternType=pcre2 for testing (and make grep.patternType=pcre1
work, with grep.patternType=pcre being synonymous with
grep.patternType=perl, i.e. whatever the default is).


Re: [PATCH 1/2] [GSOC] Convert signed flags to unsigned

2017-04-01 Thread Junio C Hamano
Robert Stanca  writes:

> Subject: Re: [PATCH 1/2] [GSOC] Convert signed flags to unsigned

Try

git shortlog --no-merges -40

to learn how commits are typically titled.  And then imagine this
patch makes into our codebase and appear in the output.

Can you tell what this commit is about from that single line title?
No.  You wouldn't even know that it is only touching bisect.h and
nothing else.

What do your readers want "shortlog" output to tell them about your
commit?  What are the most important thing (other than giving you an
excuse to say "I have completed a microproject and now I am eligible
to apply to GSoC" ;-)?  Your proposed commit log message, especially
its title, must be written to help future readers of the project
history.

Perhaps

bisect.h: make flags field in rev_list_info unsigned

would help them better.

>  Unsigned int is a closer representation of bitflags rather than signed int 
> that uses 1 special bit for sign.This shouldn't make much difference because 
> rev_list_info.flags uses only 2 bits(BISECT_SHOW_ALL and REV_LIST_QUIET)

Overlong lines, without space after full-stop before the second
sentence, without full-stop at the end of the sentence.

>
> Signed-off-by: Robert Stanca 
> ---
>  bisect.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/bisect.h b/bisect.h
> index acd12ef80..a979a7f11 100644
> --- a/bisect.h
> +++ b/bisect.h
> @@ -16,7 +16,7 @@ extern struct commit_list *filter_skipped(struct 
> commit_list *list,
>  
>  struct rev_list_info {
>   struct rev_info *revs;
> - int flags;
> + unsigned int flags;

Have you checked how this field is used?  For example, there is this
line somewhere in rev-list.c

int cnt, flags = info->flags;

and the reason why the code copies the value to a local variable
"flags" must be because it is going to use it, just like it and
other codepaths use info->flags, no?  It makes the code inconsistent
by leaving the local variable a signed int, I suspect.


Re: [PATCH 2/2] [GSOC] show_bisect_vars(): Use unsigned int instead of signed int for flags

2017-04-01 Thread Junio C Hamano
Robert Stanca  writes:

>  As rev_list_info's flag is unsigned int , var flags should have proper 
> type.Also var cnt could be unsigned as there's no negative number of commits 
> (all-reaches)
>
> Signed-off-by: Robert Stanca 
> ---

OK.  I would have squashed these two commits into one, though.


Re: [PATCH 2/2] [GSOC] show_bisect_vars(): Use unsigned int instead of signed int for flags

2017-04-01 Thread Robert Stanca
I am used to 1change per patch so it's easier  to redo specific
patches...if there are small changes(10 lines max) can i send them as
1 patch?

On Sat, Apr 1, 2017 at 10:13 PM, Junio C Hamano  wrote:
> Robert Stanca  writes:
>
>>  As rev_list_info's flag is unsigned int , var flags should have proper 
>> type.Also var cnt could be unsigned as there's no negative number of commits 
>> (all-reaches)
>>
>> Signed-off-by: Robert Stanca 
>> ---
>
> OK.  I would have squashed these two commits into one, though.


Re: [PATCH] Documentation: make 3-way merge warning more generic

2017-04-01 Thread Junio C Hamano
"brian m. carlson"  writes:

>  With the strategies that use 3-way merge (including the default, 
> 'recursive'),
> -if a change is made on both branches, but later reverted on one of the
> -branches, that change will be present in the merged result; some people find
> -this behavior confusing.  It occurs because only the heads and the merge base
> -are considered when performing a merge, not the individual commits.  The 
> merge
> -algorithm therefore considers the reverted change as no change at all, and
> -substitutes the changed version instead.
> +only the heads and the merge base are considered when performing a merge, not
> +the individual commits.  This means that if a change is made on both 
> branches,
> +but later reverted on one of the branches, that change will be present in the
> +merged result; some people find this behavior confusing.  The merge algorithm
> +considers the reverted change as no change at all, and substitutes the 
> changed
> +version instead.

I agree that it makes sense to say 3-way merge considers only three
points upfront.  

I do not think "this means" is helpful to the readers, though.  Drop
"This means that" and instead rewrite the remainder of the paragraph
after "present in the merged result", perhaps?

If a change is made on both branches but later reverted on
one of the branches, the net effect the branch that reverted
the change has to the project is nothing, while the net
effect of the other branch is to make that change.  The
3-way merge, i.e. "if one branch did something while the
other branch didn't do that something, merge result is to do
that something", rule keeps the change in the merge result.

We do not need to say "some people find this confusing" buried in a
long paragraph, which would not even serve the purpose of attracting
readers' eyes by shouting "THIS MAY BE DIFFICULT, PAY ATTENTION".
The last sentence in the original (and your version) only repeats
the same thing without saying what the real 3-way merge rule is, and
I think a rewrite like the above that makes it more explicit is
easier to understand.




Re: [PATCH 1/2] [GSOC] Convert signed flags to unsigned

2017-04-01 Thread Robert Stanca
Regarding the first part , i can resend those 2 patches rewriting the
commit message if you want.

On Sat, Apr 1, 2017 at 10:12 PM, Junio C Hamano  wrote:
> Robert Stanca  writes:
>
>> Subject: Re: [PATCH 1/2] [GSOC] Convert signed flags to unsigned
>
> Try
>
> git shortlog --no-merges -40
>
> to learn how commits are typically titled.  And then imagine this
> patch makes into our codebase and appear in the output.
>
> Can you tell what this commit is about from that single line title?
> No.  You wouldn't even know that it is only touching bisect.h and
> nothing else.
>
> What do your readers want "shortlog" output to tell them about your
> commit?  What are the most important thing (other than giving you an
> excuse to say "I have completed a microproject and now I am eligible
> to apply to GSoC" ;-)?  Your proposed commit log message, especially
> its title, must be written to help future readers of the project
> history.
>
> Perhaps
>
> bisect.h: make flags field in rev_list_info unsigned
>
> would help them better.
>
>>  Unsigned int is a closer representation of bitflags rather than signed int 
>> that uses 1 special bit for sign.This shouldn't make much difference because 
>> rev_list_info.flags uses only 2 bits(BISECT_SHOW_ALL and REV_LIST_QUIET)
>
> Overlong lines, without space after full-stop before the second
> sentence, without full-stop at the end of the sentence.
>
>>
>> Signed-off-by: Robert Stanca 
>> ---
>>  bisect.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/bisect.h b/bisect.h
>> index acd12ef80..a979a7f11 100644
>> --- a/bisect.h
>> +++ b/bisect.h
>> @@ -16,7 +16,7 @@ extern struct commit_list *filter_skipped(struct 
>> commit_list *list,
>>
>>  struct rev_list_info {
>>   struct rev_info *revs;
>> - int flags;
>> + unsigned int flags;
>
> Have you checked how this field is used?  For example, there is this
> line somewhere in rev-list.c
>
> int cnt, flags = info->flags;
>
> and the reason why the code copies the value to a local variable
> "flags" must be because it is going to use it, just like it and
> other codepaths use info->flags, no?  It makes the code inconsistent
> by leaving the local variable a signed int, I suspect.


Re: [RFC PATCH] git-news: obtain latest news for your favorite VCS

2017-04-01 Thread Christian Couder
On Sat, Apr 1, 2017 at 1:59 AM, Stefan Beller  wrote:

[...]

> diff --git a/git-news.sh b/git-news.sh
> new file mode 100755
> index 00..1707dc633e
> --- /dev/null
> +++ b/git-news.sh
> @@ -0,0 +1,4 @@
> +#!/bin/sh
> +#
> +
> +/usr/bin/sensible-browser https://git.github.io/rev_news/

Thanks for this entertaining patch :-)


Re: [PATCH v2 00/20] Separate `ref_cache` into a separate module

2017-04-01 Thread Ramsay Jones


On 31/03/17 17:01, Junio C Hamano wrote:
> Michael Haggerty  writes:
> 
>> This version literally only contains a few commit message changes and
>> one minor comment changes relative to v1. The code is identical. I
>> wasn't sure whether it is even worth sending this patch series to the
>> ML again; Junio, if you'd prefer I just send a link to a published
>> branch in such cases, please let me know.
> 
> The review on the list is not about letting me pick it up, but is
> about giving reviewing contributors a chance to comment.  I think
> for a series this important ;-) it is good that you are giving it
> multiple exposures so that people who were offline the last time can
> have a chance to look at it, even if the update is minimum.
> 
>> This patch series is also available from my GitHub fork [2] as branch
>> "separate-ref-cache". These patches depend on Duy's
>> nd/files-backend-git-dir branch.
> 
> I am getting the impression that the files-backend thing as well as
> this topic are ready for 'next'.  Please stop me if I missed something
> in these topics (especially the other one) that needs updating
> before that happens.

Hmm, are these branches 'tangled' with nd/prune-in-worktree?
(I think they were at one point, but maybe not now?).

I sent Duy some comments on that branch (an unused function and
a 'plain integer used as NULL pointer' warning).

ATB,
Ramsay Jones




Re: Very promising results with libpcre2

2017-04-01 Thread Jeffrey Walton
> Just to make sure that we are on the same page.  While I do not see
> the need to link with both variants and allow users to choose
> between them at runtime, I do not know if the whole world is ready
> to drop pcre1 and use pcre2 (the latter of which has only been
> around for a bit over two years).
>
> So we'd probably want to do
>
>  (1) keep USE_LIBPCRE and enable v1 when set;
>  (2) add USE_LIBPCRE2 and enable v2 when set;
>  (3) make sure to error out when both are set.
>
> or something like that.  It is tempting to allow us to say
>
> make USE_LIBPCRE=2
>
> but the existing builds are likely to be depending on "is it set to
> anything? then use PCRE1" behaviour, so we unfortunately cannot take
> that route.

Yeah, that's the question I kinda had.

> make USE_LIBPCRE=2

I'd prefer a configure option for consistency. Maybe:

--with-pcre  # Original PCRE
--with-pcre1  # Alias
--with-pcre2  # PCRE2

I prefer it because I usually do the following to see the interesting
things that's going on:

./configure --help

Often, I find a `--with-ssl` or similar. If `--with-ssl` fails, then I
go to the README and INSTALL to fine tune it.

By the way, if you are tweaking Configure, then consider adding a
--with-perl=X, too. Its consistent, it side steps the hard coded
/usr/bin/perl, and it signals to users its tunable.

Jeff


Re: [PATCH 2/2] [GSOC] show_bisect_vars(): Use unsigned int instead of signed int for flags

2017-04-01 Thread Junio C Hamano
Robert Stanca  writes:

> I am used to 1change per patch so it's easier  to redo specific
> patches...if there are small changes(10 lines max) can i send them as
> 1 patch?

It's not number of lines.  One line per patch does not make sense if
you need to make corresponding changes to two places, one line each,
in order to make the end result consistent.  If you change a type of
a structure field, and that field is assigned to a variable
somewhere, you would change the type of both that field and the
variable that receives its value at the same time in a single
commit, as that would be the logical unit of a smallest change that
still makes sense (i.e. either half of that change alone would not
make sense).





Re: [PATCH v2 00/20] Separate `ref_cache` into a separate module

2017-04-01 Thread Junio C Hamano
Ramsay Jones  writes:

>> I am getting the impression that the files-backend thing as well as
>> this topic are ready for 'next'.  Please stop me if I missed something
>> in these topics (especially the other one) that needs updating
>> before that happens.
>
> Hmm, are these branches 'tangled' with nd/prune-in-worktree?
> (I think they were at one point, but maybe not now?).

Michael's mh/separate-ref-cache builds directly on top of
nd/files-backend-git-dir topic.

nd/prune-in-worktree builds directly on top of
nd/worktree-kill-parse-ref, which in turn builds directly on top of
nd/files-backend-git-dir.

In that sense, Michael's series and Duy's later two series are
"tangled" (i.e. shares some common commits that are still not in
'master').  If nd/files-backend-git-dir that is shared among them is
ever rebased, all of them need to be rebased on top of it
consistently.

But if either of nd/prune-in-worktree and nd/worktree-kill-parse-ref
needs to be rerolled, that can be done independently from Michael's
series, as long as nd/files-backend-git-dir is solid and unchanging.




Re: Very promising results with libpcre2

2017-04-01 Thread Junio C Hamano
Jeffrey Walton  writes:

>> Just to make sure that we are on the same page.  While I do not see
>> the need to link with both variants and allow users to choose
>> between them at runtime, I do not know if the whole world is ready
>> to drop pcre1 and use pcre2 (the latter of which has only been
>> around for a bit over two years).
>>
>> So we'd probably want to do
>>
>>  (1) keep USE_LIBPCRE and enable v1 when set;
>>  (2) add USE_LIBPCRE2 and enable v2 when set;
>>  (3) make sure to error out when both are set.
>>
>> or something like that.  It is tempting to allow us to say
>>
>> make USE_LIBPCRE=2
>>
>> but the existing builds are likely to be depending on "is it set to
>> anything? then use PCRE1" behaviour, so we unfortunately cannot take
>> that route.
>
> Yeah, that's the question I kinda had.
>
>> make USE_LIBPCRE=2
>
> I'd prefer a configure option for consistency. Maybe:
> ...

It is way too early to worry about how ./configure support for this
will look like to the end user.

Because our Makefile is designed to be usable without configure, the
order we do things will be to first decide how we are going to use
Makefile variables to configure the feature (i.e. what you saw that
I said to Ævar).

Once we know the decision, then we arrange autoconf to spit out the
chosen Makefile variables using --with-pcre or whatever input.  This
step cannot start before we know the decision of the former.  The
one who writes --with-pcre support in ./configure would not know if
USE_LIBPCRE=YesPlease or something else is the desired output until
then.



Re: Very promising results with libpcre2

2017-04-01 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> I am very tempted though to support them in parallel, if only for ease
> of performance testing and to be able to roll out support for
> grep.patternType=perl meaning pcre1 for now, but add a
> grep.patternType=pcre2 for testing (and make grep.patternType=pcre1
> work, with grep.patternType=pcre being synonymous with
> grep.patternType=perl, i.e. whatever the default is).

Perhaps.  As long as the code doesn't get too ugly, I think we would
survive ;-)



Re: Bug in "git am" when the body starts with spaces

2017-04-01 Thread Jeff King
On Sat, Apr 01, 2017 at 12:03:44PM -0700, Linus Torvalds wrote:

> On Fri, Mar 31, 2017 at 5:52 PM, Linus Torvalds
>  wrote:
> >
> > The continuation logic is oddly complex, and I can't follow the logic.
> > But it is completely broken in how it thinks empty lines are somehow
> > "continuations".
> 
> The attached patch seems to work for me. Comments?
> 
> The logic is fairly simple: if we encounter an empty line, and we have
> pending in-body headers, we flush the pending headers, and mark us as
> no longer in header mode.

Hmm. I think this may work. At first I thought it was too strict in
always checking inbody_header_accum.len, because we want this to kick in
always, whether there's whitespace continuation or not. But that
accumulator has to collect preemptively, before it knows if there's
continuation. So it will always be non-empty if we've seen _any_ header,
and it will remain non-empty as long as we keep parsing (because any
time we flush, we do so in order to handle another line).

IOW, I think this implements the state-machine thing I wrote in my
earlier email, because the state "are we inside in-body header parsing"
is always reflected by having a non-empty accumulator. It is a bit
non-obvious though.

-Peff


Re: [PATCH v4 2/5] dir_iterator: iterate over dir after its contents

2017-04-01 Thread Daniel Ferreira (theiostream)
Why exactly would it not be applicable to read_directory_recursively()?

On Thu, Mar 30, 2017 at 8:08 AM, Duy Nguyen  wrote:
> On Thu, Mar 30, 2017 at 1:39 PM, Michael Haggerty  
> wrote:
>> * DIR_ITERATOR_RECURSE -- recurse into subdirectories
>>
>> would make the set of possible options complete. If this option is not
>> set, then the iteration would be over the entries in a single directory
>> without traversing its subdirectories.
>
> And would make it possible to use dir-iterator everywhere (except
> read_directory_recursiveky). That would be really nice :)
> --
> Duy


Re: [BUG?] iconv used as textconv, and spurious ^M on added lines on Windows

2017-04-01 Thread Torsten Bögershausen
On 2017-03-31 21:44, Jakub Narębski wrote:
> W dniu 31.03.2017 o 14:38, Torsten Bögershausen pisze:
>> On 30.03.17 21:35, Jakub Narębski wrote:
>>> Hello,
>>>
>>> Recently I had to work on a project which uses legacy 8-bit encoding
>>> (namely cp1250 encoding) instead of utf-8 for text files (LaTeX
>>> documents).  My terminal, that is Git Bash from Git for Windows is set
>>> up for utf-8.
>>>
>>> I wanted for "git diff" and friends to return something sane on said
>>> utf-8 terminal, instead of mojibake.  There is 'encoding'
>>> gitattribute... but it works only for GUI ('git gui', that is).
>>>
>>> Therefore I have (ab)used textconv facility to convert from cp1250 of
>>> file encoding to utf-8 encoding of console.
>>>
>>> I have set the following in .gitattributes file:
>>>
>>>   ## LaTeX documents in cp1250 encoding
>>>   *.tex text diff=mylatex
>>>
>>> The 'mylatex' driver is defined as:
>>>
>>>   [diff "mylatex"]
>>> xfuncname = "^(((sub)*section|chapter|part)\\*{0,1}\\{.*)$"
>>> wordRegex = "[a-zA-Z]+|[{}]|.|[^\\{}[:space:]]+"
>>> textconv  = \"C:/Program Files/Git/usr/bin/iconv.exe\" -f cp1250 -t 
>>> utf-8
>>> cachetextconv = true
>>>
>>> And everything would be all right... if not the fact that Git appends
>>> spurious ^M to added lines in the `git diff` output.  Files use CRLF
>>> end-of-line convention (the native MS Windows one).
>>>
>>>   $ git diff test.tex
>>>   diff --git a/test.tex b/test.tex
>>>   index 029646e..250ab16 100644
>>>   --- a/test.tex
>>>   +++ b/test.tex
>>>   @@ -1,4 +1,4 @@
>>>   -\documentclass{article}
>>>   +\documentclass{mwart}^M
>>>   
>>>\usepackage[cp1250]{inputenc}
>>>\usepackage{polski}
>>>
>>> What gives?  Why there is this ^M tacked on the end of added lines,
>>> while it is not present in deleted lines, nor in content lines?
>>>
>>> Puzzled.
>>>
>>> P.S. Git has `i18n.commitEncoding` and `i18n.logOutputEncoding`; pity
>>> that it doesn't supports in core `encoding` attribute together with
>>> having `i18n.outputEncoding`.
>>
>> Is there a chance to give us a receipt how to reproduce it?
>> A complete test script or ?
>> (I don't want to speculate, if the invocation of iconv is the problem,
>>  where stdout is not in "binary mode", or however this is called under 
>> Windows)
> 
> I'm sorry, I though I posted whole recipe, but I missed some details
> in the above description of the case.
> 
> First, files are stored on filesystem using CRLF eol (DOS end-of-line
> convention).  Due to `core.autocrlf` they are converted to LF in blobs,
> that is in the index and in the repository.
> 
> Second, a textconv with filter preserving end-of-line needs to be
> configured.  I have used `iconv`, but I suspect that the problem would
> happen also for `cat`.
> 
> In the .gitattributes file, or .git/info/attributes add, for example:
> 
>   *.tex text diff=myconv
> 
> In the .git/config configure the textconv filter, for example:
> 
>   [diff "myconv"]
>  textconv  = iconv.exe -f cp1250 -t utf-8
> 
> Create a file which filename matches the attribute line, and which
> uses CRLF end of line convention, and add it to Git (adding it to
> the index):
> 
>   $ printf "foo\r\n" >foo.tex
>   $ git add foo.tex
> 
> Modify file (also with CRLF):
> 
>   $ printf "bar\r\n" >foo.tex
> 
> Check the difference
> 
>   $ git diff foo.tex
> 
> HTH
> 

There seems to be a bug in Git, when it comes to "git diff".
Before we feed the content of the working tree into the
diff machinery, a call to convert_to_git() should be made.
But it seems as there is something missing, the expected
"+fox" becomes a "+foxQ"

#!/bin/sh

test_description='CRLF with diff filter'

. ./test-lib.sh

test_expect_success 'setup' '
git config core.autocrlf input &&
printf "foo\r\n" >foo.tex &&
  git add foo.tex &&
echo >.gitattributes &&
git checkout -b master &&
git add .gitattributes &&
git commit -m "Add foo.txt" &&
cat >.git/config <<-\EOF
[diff "myconv"]
   textconv  = sed -e "s/f/g"
EOF
'

test_expect_success 'check EOL in diff' '
printf "fox\r\n" >foo.tex &&
cat >expect <<-\EOF &&
diff --git a/foo.tex b/foo.tex
index 257cc56..88c2893 100644
--- a/foo.tex
+++ b/foo.tex
@@ -1 +1 @@
-foo
+fox
EOF
git diff foo.tex | tr "\015" Q >actual &&
test_cmp expect actual
'

test_done



[PATCH v6 1/5] dir_iterator: add tests for dir_iterator API

2017-04-01 Thread Daniel Ferreira
Create t/helper/test-dir-iterator.c, which prints relevant information
about a directory tree iterated over with dir_iterator.

Create t/t0065-dir-iterator.sh, which tests that dir_iterator does
iterate through a whole directory tree.

Signed-off-by: Daniel Ferreira 
---
 Makefile |  1 +
 t/helper/.gitignore  |  1 +
 t/helper/test-dir-iterator.c | 28 +++
 t/t0065-dir-iterator.sh  | 54 
 4 files changed, 84 insertions(+)
 create mode 100644 t/helper/test-dir-iterator.c
 create mode 100755 t/t0065-dir-iterator.sh

diff --git a/Makefile b/Makefile
index a5a11e7..d0245f3 100644
--- a/Makefile
+++ b/Makefile
@@ -607,6 +607,7 @@ TEST_PROGRAMS_NEED_X += test-ctype
 TEST_PROGRAMS_NEED_X += test-config
 TEST_PROGRAMS_NEED_X += test-date
 TEST_PROGRAMS_NEED_X += test-delta
+TEST_PROGRAMS_NEED_X += test-dir-iterator
 TEST_PROGRAMS_NEED_X += test-dump-cache-tree
 TEST_PROGRAMS_NEED_X += test-dump-split-index
 TEST_PROGRAMS_NEED_X += test-dump-untracked-cache
diff --git a/t/helper/.gitignore b/t/helper/.gitignore
index d6e8b36..9c9656d 100644
--- a/t/helper/.gitignore
+++ b/t/helper/.gitignore
@@ -3,6 +3,7 @@
 /test-config
 /test-date
 /test-delta
+/test-dir-iterator
 /test-dump-cache-tree
 /test-dump-split-index
 /test-dump-untracked-cache
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
new file mode 100644
index 000..06f03fc
--- /dev/null
+++ b/t/helper/test-dir-iterator.c
@@ -0,0 +1,28 @@
+#include "git-compat-util.h"
+#include "strbuf.h"
+#include "iterator.h"
+#include "dir-iterator.h"
+
+int cmd_main(int argc, const char **argv) {
+   struct strbuf path = STRBUF_INIT;
+   struct dir_iterator *diter;
+
+   if (argc < 2) {
+   return 1;
+   }
+
+   strbuf_add(&path, argv[1], strlen(argv[1]));
+
+   diter = dir_iterator_begin(path.buf);
+
+   while (dir_iterator_advance(diter) == ITER_OK) {
+   if (S_ISDIR(diter->st.st_mode))
+   printf("[d] ");
+   else
+   printf("[f] ");
+
+   printf("(%s) %s\n", diter->relative_path, diter->path.buf);
+   }
+
+   return 0;
+}
diff --git a/t/t0065-dir-iterator.sh b/t/t0065-dir-iterator.sh
new file mode 100755
index 000..b857c07
--- /dev/null
+++ b/t/t0065-dir-iterator.sh
@@ -0,0 +1,54 @@
+#!/bin/sh
+
+test_description='Test directory iteration.'
+
+. ./test-lib.sh
+
+cat >expect-sorted-output <<-\EOF &&
+[d] (a) ./dir/a
+[d] (a/b) ./dir/a/b
+[d] (a/b/c) ./dir/a/b/c
+[d] (d) ./dir/d
+[d] (d/e) ./dir/d/e
+[d] (d/e/d) ./dir/d/e/d
+[f] (a/b/c/d) ./dir/a/b/c/d
+[f] (a/e) ./dir/a/e
+[f] (b) ./dir/b
+[f] (c) ./dir/c
+[f] (d/e/d/a) ./dir/d/e/d/a
+EOF
+
+test_expect_success 'dir-iterator should iterate through all files' '
+   mkdir -p dir &&
+   mkdir -p dir/a/b/c/ &&
+   >dir/b &&
+   >dir/c &&
+   mkdir -p dir/d/e/d/ &&
+   >dir/a/b/c/d &&
+   >dir/a/e &&
+   >dir/d/e/d/a &&
+
+   test-dir-iterator ./dir | sort >./actual-pre-order-sorted-output &&
+   rm -rf dir &&
+
+   test_cmp expect-sorted-output actual-pre-order-sorted-output
+'
+
+cat >expect-pre-order-output <<-\EOF &&
+[d] (a) ./dir/a
+[d] (a/b) ./dir/a/b
+[d] (a/b/c) ./dir/a/b/c
+[f] (a/b/c/d) ./dir/a/b/c/d
+EOF
+
+test_expect_success 'dir-iterator should list files in the correct order' '
+   mkdir -p dir/a/b/c/ &&
+   >dir/a/b/c/d &&
+
+   test-dir-iterator ./dir >actual-pre-order-output &&
+   rm -rf dir &&
+
+   test_cmp expect-pre-order-output actual-pre-order-output
+'
+
+test_done
--
2.7.4 (Apple Git-66)



[PATCH v6 0/5] [GSoC] remove_subtree(): reimplement using iterators

2017-04-01 Thread Daniel Ferreira
This is the sixth version of a patch series that implements the GSoC
microproject of converting a recursive call to readdir() to use dir_iterator.

v1: 
https://public-inbox.org/git/CAGZ79kZwT-9mHTiOJ5CEjk2wDFkn6+NcogjX0=vjhsah16a...@mail.gmail.com/T/#t
v2: 
https://public-inbox.org/git/cacsjy8dxh-qpbblfafwpawusba9gvxa7x+mxljevykhk1zo...@mail.gmail.com/T/#t
v3: 
https://public-inbox.org/git/CAGZ79kYtpmURSQWPumobA=e3jbfjkhwcdv_lphkcd71zrwm...@mail.gmail.com/T/#t
v4: 
https://public-inbox.org/git/1490747533-89143-1-git-send-email-bnm...@gmail.com/T/#e437a63e0c22c00c69b5d92977c9b438ed2b9fd3a
v5: 
https://public-inbox.org/git/1490844730-47634-1-git-send-email-bnm...@gmail.com/T/#m2323f15e45de699f2e09364f40a62e17047cf453

Back in v5, Michael had a number of suggestions, all of which were applied
to this version (including a slightly modified version of his "biggish rewrite"
project to make dir_iterator's state machine simpler). The only suggestion that
did not make it into this series was that of not traversing into subdirectories,
since I believe it would be better off in another series that actually required
that feature (that is, I do not want a series to implement a feature it will
not need). The same goes for Junio's thought on a flag to list *only* 
directories
and no files on the v4 discussion.

Junio and Peff's comments about how to write to files in the tests were also
considered, and the tests were adjusted.

I chose to squash both the state machine refactor and the addition of the
new flags in a single commit. I do not know whether you will feel this is
the right choice but it seemed natural, since most of the state machine's
new logic would not even make sense without encompassing the new features.
I am, of course, open for feedback on this decision.

To Michael and Duy, thanks -- really -- for the encouraging comments! :)
I never regarded this microproject purely as a means to fulfill a GSoC
requirement, but as a way to get to learn more about Git, so I'm surely
not giving it up.

Once again, thanks for all the previous reviews,
Daniel.

Daniel Ferreira (5):
  dir_iterator: add tests for dir_iterator API
  remove_subtree(): test removing nested directories
  dir_iterator: add helpers to dir_iterator_advance
  dir_iterator: refactor state machine model
  remove_subtree(): reimplement using iterators

 Makefile|   1 +
 dir-iterator.c  | 190 ++--
 dir-iterator.h  |  28 --
 entry.c |  38 +++-
 refs/files-backend.c|   2 +-
 t/helper/.gitignore |   1 +
 t/helper/test-dir-iterator.c|  32 +++
 t/t0065-dir-iterator.sh | 109 +++
 t/t2000-checkout-cache-clash.sh |  11 +++
 9 files changed, 312 insertions(+), 100 deletions(-)
 create mode 100644 t/helper/test-dir-iterator.c
 create mode 100755 t/t0065-dir-iterator.sh

--
2.7.4 (Apple Git-66)



[PATCH v6 4/5] dir_iterator: refactor state machine model

2017-04-01 Thread Daniel Ferreira
Perform major refactor of dir_iterator_advance(). dir_iterator has
ceased to rely on a convoluted state machine mechanism of two loops and
two state variables (level.initialized and level.dir_state). This serves
to ease comprehension of the iterator mechanism and ease addition of new
features to the iterator.

Create an option for the dir_iterator API to iterate over subdirectories
only after having iterated through their contents. This feature was
predicted, although not implemented by 0fe5043 ("dir_iterator: new API
for iterating over a directory tree", 2016-06-18). This is useful for
recursively removing a directory and calling rmdir() on a directory only
after all of its contents have been wiped.

Add an option for the dir_iterator API to iterate over the root
directory (the one it was initialized with) as well.

Add the "flags" parameter to dir_iterator_create, allowing for the
aforementioned new features to be enabled. The new default behavior
(i.e. flags set to 0) does not iterate over directories. Flag
DIR_ITERATOR_PRE_ORDER_TRAVERSAL iterates over a directory before doing
so over its contents. DIR_ITERATOR_POST_ORDER_TRAVERSAL iterates over a
directory after doing so over its contents. DIR_ITERATOR_LIST_ROOT_DIR
iterates over the root directory. These flags do not conflict with each
other and may be used simultaneously.

Amend a call to dir_iterator_begin() in refs/files-backend.c to pass
the flags parameter introduced.

Improve t/t0065-dir-iterator.sh and t/helper/test-dir-iterator.c to
test "post-order" and "iterate-over-root" modes.

Signed-off-by: Daniel Ferreira 
---
 dir-iterator.c   | 149 +++
 dir-iterator.h   |  28 ++--
 refs/files-backend.c |   2 +-
 t/helper/test-dir-iterator.c |   6 +-
 t/t0065-dir-iterator.sh  |  61 +-
 5 files changed, 179 insertions(+), 67 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index ce8bf81..4b23889 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -4,8 +4,6 @@
 #include "dir-iterator.h"
 
 struct dir_iterator_level {
-   int initialized;
-
DIR *dir;
 
/*
@@ -20,8 +18,11 @@ struct dir_iterator_level {
 * iteration and also iterated into):
 */
enum {
-   DIR_STATE_ITER,
-   DIR_STATE_RECURSE
+   DIR_STATE_PUSHED,
+   DIR_STATE_PRE_ITERATION,
+   DIR_STATE_ITERATING,
+   DIR_STATE_POST_ITERATION,
+   DIR_STATE_EXHAUSTED
} dir_state;
 };
 
@@ -48,15 +49,20 @@ struct dir_iterator_int {
 * that will be included in this iteration.
 */
struct dir_iterator_level *levels;
+
+   /* Holds the flags passed to dir_iterator_begin(). */
+   unsigned flags;
 };
 
 static inline void push_dir_level(struct dir_iterator_int *iter, struct 
dir_iterator_level *level)
 {
-   level->dir_state = DIR_STATE_RECURSE;
ALLOC_GROW(iter->levels, iter->levels_nr + 1,
   iter->levels_alloc);
+
+   /* Push a new level */
level = &iter->levels[iter->levels_nr++];
-   level->initialized = 0;
+   level->dir = NULL;
+   level->dir_state = DIR_STATE_PUSHED;
 }
 
 static inline int pop_dir_level(struct dir_iterator_int *iter)
@@ -75,18 +81,35 @@ static inline int set_iterator_data(struct dir_iterator_int 
*iter, struct dir_it
}
 
/*
-* We have to set these each time because
-* the path strbuf might have been realloc()ed.
+* Check if we are dealing with the root directory as an
+* item that's being iterated through.
 */
-   iter->base.relative_path =
-   iter->base.path.buf + iter->levels[0].prefix_len;
+   if (level->dir_state != DIR_STATE_ITERATING &&
+   iter->levels_nr == 1) {
+   iter->base.relative_path = ".";
+   }
+   else {
+   iter->base.relative_path =
+   iter->base.path.buf + iter->levels[0].prefix_len;
+   }
iter->base.basename =
iter->base.path.buf + level->prefix_len;
-   level->dir_state = DIR_STATE_ITER;
 
return 0;
 }
 
+/*
+ * This function uses a state machine with the following states:
+ * -> DIR_STATE_PUSHED: the directory has been pushed to the
+ * iterator traversal tree.
+ * -> DIR_STATE_PRE_ITERATION: the directory is *NOT* initialized. The
+ * dirpath has already been returned if pre-order traversal is set.
+ * -> DIR_STATE_ITERATING: the directory is initialized. We are traversing
+ * through it.
+ * -> DIR_STATE_POST_ITERATION: the directory has been iterated through.
+ * We are ready to close it.
+ * -> DIR_STATE_EXHAUSTED: the directory is closed and ready to be popped.
+ */
 int dir_iterator_advance(struct dir_iterator *dir_iterator)
 {
struct dir_iterator_int *iter =
@@ -97,7 +120,18 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
   

[PATCH v6 2/5] remove_subtree(): test removing nested directories

2017-04-01 Thread Daniel Ferreira
Test removing a nested directory when an attempt is made to restore the
index to a state where it does not exist. A similar test could be found
previously in t/t2000-checkout-cache-clash.sh, but it did not check for
nested directories, which could allow a faulty implementation of
remove_subtree() pass the tests.

Signed-off-by: Daniel Ferreira 
---
 t/t2000-checkout-cache-clash.sh | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/t/t2000-checkout-cache-clash.sh b/t/t2000-checkout-cache-clash.sh
index de3edb5..ac10ba3 100755
--- a/t/t2000-checkout-cache-clash.sh
+++ b/t/t2000-checkout-cache-clash.sh
@@ -57,4 +57,15 @@ test_expect_success SYMLINKS 'checkout-index -f twice with 
--prefix' '
git checkout-index -a -f --prefix=there/
 '

+test_expect_success 'git checkout-index -f should remove nested subtrees' '
+   echo content >path &&
+   git update-index --add path &&
+   rm path &&
+   mkdir -p path/with/nested/paths &&
+   echo content >path/file1 &&
+   echo content >path/with/nested/paths/file2 &&
+   git checkout-index -f -a &&
+   test ! -d path
+'
+
 test_done
--
2.7.4 (Apple Git-66)



[PATCH v6 5/5] remove_subtree(): reimplement using iterators

2017-04-01 Thread Daniel Ferreira
Use dir_iterator to traverse through remove_subtree()'s directory tree,
avoiding the need for recursive calls to readdir(). Simplify
remove_subtree()'s code.

A conversion similar in purpose was previously done at 46d092a
("for_each_reflog(): reimplement using iterators", 2016-05-21).

Signed-off-by: Daniel Ferreira 
---
 entry.c | 38 --
 1 file changed, 12 insertions(+), 26 deletions(-)

diff --git a/entry.c b/entry.c
index c6eea24..309b9ad 100644
--- a/entry.c
+++ b/entry.c
@@ -2,6 +2,8 @@
 #include "blob.h"
 #include "dir.h"
 #include "streaming.h"
+#include "iterator.h"
+#include "dir-iterator.h"

 static void create_directories(const char *path, int path_len,
   const struct checkout *state)
@@ -44,33 +46,17 @@ static void create_directories(const char *path, int 
path_len,
free(buf);
 }

-static void remove_subtree(struct strbuf *path)
+static void remove_subtree(const char *path)
 {
-   DIR *dir = opendir(path->buf);
-   struct dirent *de;
-   int origlen = path->len;
-
-   if (!dir)
-   die_errno("cannot opendir '%s'", path->buf);
-   while ((de = readdir(dir)) != NULL) {
-   struct stat st;
-
-   if (is_dot_or_dotdot(de->d_name))
-   continue;
-
-   strbuf_addch(path, '/');
-   strbuf_addstr(path, de->d_name);
-   if (lstat(path->buf, &st))
-   die_errno("cannot lstat '%s'", path->buf);
-   if (S_ISDIR(st.st_mode))
-   remove_subtree(path);
-   else if (unlink(path->buf))
-   die_errno("cannot unlink '%s'", path->buf);
-   strbuf_setlen(path, origlen);
+   struct dir_iterator *diter = dir_iterator_begin(path, 
DIR_ITERATOR_POST_ORDER_TRAVERSAL | DIR_ITERATOR_LIST_ROOT_DIR);
+
+   while (dir_iterator_advance(diter) == ITER_OK) {
+   if (S_ISDIR(diter->st.st_mode)) {
+   if (rmdir(diter->path.buf))
+   die_errno("cannot rmdir '%s'", diter->path.buf);
+   } else if (unlink(diter->path.buf))
+   die_errno("cannot unlink '%s'", diter->path.buf);
}
-   closedir(dir);
-   if (rmdir(path->buf))
-   die_errno("cannot rmdir '%s'", path->buf);
 }

 static int create_file(const char *path, unsigned int mode)
@@ -282,7 +268,7 @@ int checkout_entry(struct cache_entry *ce,
return 0;
if (!state->force)
return error("%s is a directory", path.buf);
-   remove_subtree(&path);
+   remove_subtree(path.buf);
} else if (unlink(path.buf))
return error_errno("unable to unlink old '%s'", 
path.buf);
} else if (state->not_new)
--
2.7.4 (Apple Git-66)



[PATCH v6 3/5] dir_iterator: add helpers to dir_iterator_advance

2017-04-01 Thread Daniel Ferreira
Create inline helpers to dir_iterator_advance(). Make
dir_iterator_advance()'s code more legible and allow some behavior to
be reusable.

Signed-off-by: Daniel Ferreira 
---
 dir-iterator.c | 65 +-
 1 file changed, 42 insertions(+), 23 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index 34182a9..ce8bf81 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -50,6 +50,43 @@ struct dir_iterator_int {
struct dir_iterator_level *levels;
 };

+static inline void push_dir_level(struct dir_iterator_int *iter, struct 
dir_iterator_level *level)
+{
+   level->dir_state = DIR_STATE_RECURSE;
+   ALLOC_GROW(iter->levels, iter->levels_nr + 1,
+  iter->levels_alloc);
+   level = &iter->levels[iter->levels_nr++];
+   level->initialized = 0;
+}
+
+static inline int pop_dir_level(struct dir_iterator_int *iter)
+{
+   return --iter->levels_nr;
+}
+
+static inline int set_iterator_data(struct dir_iterator_int *iter, struct 
dir_iterator_level *level)
+{
+   if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
+   if (errno != ENOENT)
+   warning("error reading path '%s': %s",
+   iter->base.path.buf,
+   strerror(errno));
+   return -1;
+   }
+
+   /*
+* We have to set these each time because
+* the path strbuf might have been realloc()ed.
+*/
+   iter->base.relative_path =
+   iter->base.path.buf + iter->levels[0].prefix_len;
+   iter->base.basename =
+   iter->base.path.buf + level->prefix_len;
+   level->dir_state = DIR_STATE_ITER;
+
+   return 0;
+}
+
 int dir_iterator_advance(struct dir_iterator *dir_iterator)
 {
struct dir_iterator_int *iter =
@@ -84,11 +121,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 * over; now prepare to iterate into
 * it.
 */
-   level->dir_state = DIR_STATE_RECURSE;
-   ALLOC_GROW(iter->levels, iter->levels_nr + 1,
-  iter->levels_alloc);
-   level = &iter->levels[iter->levels_nr++];
-   level->initialized = 0;
+   push_dir_level(iter, level);
continue;
} else {
/*
@@ -104,7 +137,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 * This level is exhausted (or wasn't opened
 * successfully); pop up a level.
 */
-   if (--iter->levels_nr == 0)
+   if (pop_dir_level(iter) == 0)
return dir_iterator_abort(dir_iterator);

continue;
@@ -129,7 +162,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
iter->base.path.buf, 
strerror(errno));

level->dir = NULL;
-   if (--iter->levels_nr == 0)
+   if (pop_dir_level(iter) == 0)
return dir_iterator_abort(dir_iterator);
break;
}
@@ -138,23 +171,9 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
continue;

strbuf_addstr(&iter->base.path, de->d_name);
-   if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
-   if (errno != ENOENT)
-   warning("error reading path '%s': %s",
-   iter->base.path.buf,
-   strerror(errno));
-   continue;
-   }

-   /*
-* We have to set these each time because
-* the path strbuf might have been realloc()ed.
-*/
-   iter->base.relative_path =
-   iter->base.path.buf + 
iter->levels[0].prefix_len;
-   iter->base.basename =
-   iter->base.path.buf + level->prefix_len;
-   level->dir_state = DIR_STATE_ITER;
+   if (set_iterator_data(iter, level))
+   continue;

return ITER_OK;
}
--
2.7.4 (Apple Git-66)



Re: [PATCH v6 4/5] dir_iterator: refactor state machine model

2017-04-01 Thread Daniel Ferreira (theiostream)
Gah, I just realized I failed to correct refs/files-backend.c's
behavior and kept 0 instead of DIR_ITERATOR_PRE_ORDER_TRAVERSAL as its
flag. I'll correct this on a v7, but I'll wait for the rest of your
reviews before sending that revision.

On Sun, Apr 2, 2017 at 1:35 AM, Daniel Ferreira  wrote:
> Perform major refactor of dir_iterator_advance(). dir_iterator has
> ceased to rely on a convoluted state machine mechanism of two loops and
> two state variables (level.initialized and level.dir_state). This serves
> to ease comprehension of the iterator mechanism and ease addition of new
> features to the iterator.
>
> Create an option for the dir_iterator API to iterate over subdirectories
> only after having iterated through their contents. This feature was
> predicted, although not implemented by 0fe5043 ("dir_iterator: new API
> for iterating over a directory tree", 2016-06-18). This is useful for
> recursively removing a directory and calling rmdir() on a directory only
> after all of its contents have been wiped.
>
> Add an option for the dir_iterator API to iterate over the root
> directory (the one it was initialized with) as well.
>
> Add the "flags" parameter to dir_iterator_create, allowing for the
> aforementioned new features to be enabled. The new default behavior
> (i.e. flags set to 0) does not iterate over directories. Flag
> DIR_ITERATOR_PRE_ORDER_TRAVERSAL iterates over a directory before doing
> so over its contents. DIR_ITERATOR_POST_ORDER_TRAVERSAL iterates over a
> directory after doing so over its contents. DIR_ITERATOR_LIST_ROOT_DIR
> iterates over the root directory. These flags do not conflict with each
> other and may be used simultaneously.
>
> Amend a call to dir_iterator_begin() in refs/files-backend.c to pass
> the flags parameter introduced.
>
> Improve t/t0065-dir-iterator.sh and t/helper/test-dir-iterator.c to
> test "post-order" and "iterate-over-root" modes.
>
> Signed-off-by: Daniel Ferreira 
> ---
>  dir-iterator.c   | 149 
> +++
>  dir-iterator.h   |  28 ++--
>  refs/files-backend.c |   2 +-
>  t/helper/test-dir-iterator.c |   6 +-
>  t/t0065-dir-iterator.sh  |  61 +-
>  5 files changed, 179 insertions(+), 67 deletions(-)
>
> diff --git a/dir-iterator.c b/dir-iterator.c
> index ce8bf81..4b23889 100644
> --- a/dir-iterator.c
> +++ b/dir-iterator.c
> @@ -4,8 +4,6 @@
>  #include "dir-iterator.h"
>
>  struct dir_iterator_level {
> -   int initialized;
> -
> DIR *dir;
>
> /*
> @@ -20,8 +18,11 @@ struct dir_iterator_level {
>  * iteration and also iterated into):
>  */
> enum {
> -   DIR_STATE_ITER,
> -   DIR_STATE_RECURSE
> +   DIR_STATE_PUSHED,
> +   DIR_STATE_PRE_ITERATION,
> +   DIR_STATE_ITERATING,
> +   DIR_STATE_POST_ITERATION,
> +   DIR_STATE_EXHAUSTED
> } dir_state;
>  };
>
> @@ -48,15 +49,20 @@ struct dir_iterator_int {
>  * that will be included in this iteration.
>  */
> struct dir_iterator_level *levels;
> +
> +   /* Holds the flags passed to dir_iterator_begin(). */
> +   unsigned flags;
>  };
>
>  static inline void push_dir_level(struct dir_iterator_int *iter, struct 
> dir_iterator_level *level)
>  {
> -   level->dir_state = DIR_STATE_RECURSE;
> ALLOC_GROW(iter->levels, iter->levels_nr + 1,
>iter->levels_alloc);
> +
> +   /* Push a new level */
> level = &iter->levels[iter->levels_nr++];
> -   level->initialized = 0;
> +   level->dir = NULL;
> +   level->dir_state = DIR_STATE_PUSHED;
>  }
>
>  static inline int pop_dir_level(struct dir_iterator_int *iter)
> @@ -75,18 +81,35 @@ static inline int set_iterator_data(struct 
> dir_iterator_int *iter, struct dir_it
> }
>
> /*
> -* We have to set these each time because
> -* the path strbuf might have been realloc()ed.
> +* Check if we are dealing with the root directory as an
> +* item that's being iterated through.
>  */
> -   iter->base.relative_path =
> -   iter->base.path.buf + iter->levels[0].prefix_len;
> +   if (level->dir_state != DIR_STATE_ITERATING &&
> +   iter->levels_nr == 1) {
> +   iter->base.relative_path = ".";
> +   }
> +   else {
> +   iter->base.relative_path =
> +   iter->base.path.buf + iter->levels[0].prefix_len;
> +   }
> iter->base.basename =
> iter->base.path.buf + level->prefix_len;
> -   level->dir_state = DIR_STATE_ITER;
>
> return 0;
>  }
>
> +/*
> + * This function uses a state machine with the following states:
> + * -> DIR_STATE_PUSHED: the directory has been pushed to the
> + * iterator traversal tree.
> + * -> DIR_STATE_PRE_ITERATION: the directory is *NOT* initialized. The
> + * dirpath has

Re: [PATCH v6 4/5] dir_iterator: refactor state machine model

2017-04-01 Thread Michael Haggerty
On 04/02/2017 06:46 AM, Daniel Ferreira (theiostream) wrote:
> Gah, I just realized I failed to correct refs/files-backend.c's
> behavior and kept 0 instead of DIR_ITERATOR_PRE_ORDER_TRAVERSAL as its
> flag. I'll correct this on a v7, but I'll wait for the rest of your
> reviews before sending that revision.

Even if I make that change, a lot of tests fail, both when I run the
tests against your commit 4/5 and when I run them against 5/5.

I tested by applying your patch series to the current upstream master
branch. When I apply the patches, there is one conflict (related to
#includes), in entry.c. Maybe the failure is because you are working
from a different branch point. What commit are you branching your series
off of?

You should definitely run the full test suite against each of your
commits before you submit. One convenient way to do so is with my tool
`git-test` [1], which runs the tests and keeps track of their results to
avoid testing the same commit over and over again:

# One-time setup (adjust the "-j" values based on what's
# fastest on your computer):
git test add "make -j8 && make -j16 test"

# Run tests against a range of commits:
git test run master..mybranch

Even better is to run the tests in a separate linked worktree, so that
you can continue working in your main repository while the tests are
running. The `git-test` README explains how.

It also speeds things up (and gives better output) if you use `prove` to
run the tests, and run the tests on a ramdisk if possible. To do so, I
include the following lines in my `config.mak`:

TMP := /var/tmp
ROOT := $(TMP)/git-tests-$(shell git rev-parse --show-toplevel |
sha1sum | head -c10)
DEFAULT_TEST_TARGET = prove
GIT_TEST_OPTS = -q --root=$(ROOT) --tee
GIT_PROVE_OPTS = --timer --jobs 16 --state=fresh,hot,slow,save

(On my system `/var/tmp` is a ramdisk.)

Finally, when I submit a patch series, I usually also push a copy to my
GitHub fork of the project and include a reference to that branch in my
cover letter. That makes it easier for casual readers to fetch the code
and/or look at it online, and also makes it 100% clear what branch point
I've chosen. This is by no means obligatory but I find it helpful.

There doesn't seem to be much point reviewing broken code, so I'll wait
for your feedback and/or fix.

Michael

[1] https://github.com/mhagger/git-test



Bug: 'git config --local user.email=' fails silently?

2017-04-01 Thread Knut Omang
Hi,

>From the documentation I would have expected 

git config --local user.email=alt.email@alt.domain

to create a section 

[user]
email=alt.email@alt.domain

in the local .git/config.

Instead it returns status 1 with no error message.

Is this intentional?

If I add the [user] section manually, it works as expected/desired:
my commits in that tree uses the alternate mail address.

git --version
git version 2.7.4

Thanks,
Knut Omang

-- 
Knut Omang, Ph.D
Principal Software Engineer
Oracle Linux and Virtualization Engineering
ORACLE Norway | Olaf Helsets vei 6 | 0694 Oslo
Email: knut.om...@oracle.com | Phone:  +47 41 46 22 10