date:20150317

Re: [PATCH v3 1/4] git-credential-store: support multiple credential files

2015-03-17 Thread Paul Tan

Hi all, thanks for providing your feedback.

On Sun, Mar 15, 2015 at 6:14 AM, Junio C Hamano  wrote:
> I am not sure if this is not a premature over-engineering---I am not
> convinced that such a future need will be fulfilled by passing just
> a single default_fn this version already passes, or it needs even
> more parameters that this version does not pass yet, and the
> interface to the function needs to be updated at that point when you
> need it _anyways_. One thing that we all agree is that we don't need
> the extra parameter within the context of what the current code does.

After considering everyone's responses, I've decided to remove the
argument in the v4 patch. As Junio says, when there is a policy change
the code can be modified anyway.

Regards,
Paul
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] sha1fd_check: die when we cannot open the file

2015-03-17 Thread Jeff King

Right now we return a NULL "struct sha1file" if we encounter
an error. However, the sole caller (write_idx_file) does not
check the return value, and will segfault if we hit this
case.

One option would be to handle the error in the caller.
However, there's really nothing for it to do but die. This
code path is hit during "git index-pack --verify"; after we
verify the packfile, we check that the ".idx" we would
generate from it is byte-wise identical to what is on disk.
We hit the error (and segfault) if we can't open the .idx
file (a likely cause of this is that somebody else ran "git
repack -ad" while we were verifying). Since we can't
complete the requested verification, we really have no
choice but to die.

Furthermore, the rest of the sha1fd_* functions simply die
on errors. So if were to open the file successfully, for
example, and then hit a read error, sha1write would call
die() for us. So pushing the die() down into sha1fd_check
keeps the interface consistent.

Signed-off-by: Jeff King 
---
 csum-file.c | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/csum-file.c b/csum-file.c
index b00b215..a172199 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -130,14 +130,10 @@ struct sha1file *sha1fd_check(const char *name)
 
sink = open("/dev/null", O_WRONLY);
if (sink < 0)
-   return NULL;
+   die_errno("unable to open /dev/null");
check = open(name, O_RDONLY);
-   if (check < 0) {
-   int saved_errno = errno;
-   close(sink);
-   errno = saved_errno;
-   return NULL;
-   }
+   if (check < 0)
+   die_errno("unable to open '%s'", name);
f = sha1fd(sink, name);
f->check_fd = check;
return f;
-- 
2.3.3.520.g3cfbb5d
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v5/GSoC/MICRO] revision: forbid combining --graph and --no-walk

2015-03-17 Thread Dongcan Jiang

Because "--graph" is about connected history while --no-walk
is about discrete points, it does not make sense to allow
giving these two options at the same time. [1]

This change makes a few calls to "show --graph" fail in
t4052, but asking to show one commit with graph is a
nonsensical thing to do. Thus, tests on "show --graph" in
t4052 have been removed. [2,3] Same tests on "show" without
--graph option have already been tested in 4052.

3 testcases have been added to test this patch.

[1]: http://article.gmane.org/gmane.comp.version-control.git/216083
[2]: http://article.gmane.org/gmane.comp.version-control.git/264950
[3]: http://article.gmane.org/gmane.comp.version-control.git/265107

Helped-By: Eric Sunshine 
Helped-By: René Scharfe 
Helped-By: Junio C Hamano 
Signed-off-by: Dongcan Jiang 
---
 Documentation/rev-list-options.txt |  2 ++
 revision.c |  2 ++
 t/t4052-stat-output.sh | 14 +++---
 t/t4202-log.sh |  4 
 t/t6014-rev-list-all.sh|  4 
 t/t7007-show.sh|  4 
 6 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/Documentation/rev-list-options.txt 
b/Documentation/rev-list-options.txt
index 4ed8587..136abdf 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -679,6 +679,7 @@ endif::git-rev-list[]
given on the command line. Otherwise (if `sorted` or no argument
was given), the commits are shown in reverse chronological order
by commit time.
+   Cannot be combined with `--graph`.
 
 --do-walk::
Overrides a previous `--no-walk`.
@@ -781,6 +782,7 @@ you would get an output like this:
on the left hand side of the output.  This may cause extra lines
to be printed in between commits, in order for the graph history
to be drawn properly.
+   Cannot be combined with `--no-walk`.
 +
 This enables parent rewriting, see 'History Simplification' below.
 +
diff --git a/revision.c b/revision.c
index 66520c6..6cd91dd 100644
--- a/revision.c
+++ b/revision.c
@@ -2339,6 +2339,8 @@ int setup_revisions(int argc, const char **argv, struct 
rev_info *revs, struct s
 
if (revs->reflog_info && revs->graph)
die("cannot combine --walk-reflogs with --graph");
+   if (revs->no_walk && revs->graph)
+   die("cannot combine --no-walk with --graph");
if (!revs->reflog_info && revs->grep_filter.use_reflog_filter)
die("cannot use --grep-reflog without --walk-reflogs");
 
diff --git a/t/t4052-stat-output.sh b/t/t4052-stat-output.sh
index b68afef..54f10cf 100755
--- a/t/t4052-stat-output.sh
+++ b/t/t4052-stat-output.sh
@@ -99,7 +99,7 @@ do
test_cmp "$expect" actual
'
 
-   test "$cmd" != diff || continue
+   test "$cmd" != diff && test "$cmd" != show || continue
 
test_expect_success "$cmd --graph $verb COLUMNS (big change)" '
COLUMNS=200 git $cmd $args --graph >output
@@ -127,7 +127,7 @@ do
test_cmp "$expect" actual
'
 
-   test "$cmd" != diff || continue
+   test "$cmd" != diff && test "$cmd" != show || continue
 
test_expect_success "$cmd --graph $verb not enough COLUMNS (big 
change)" '
COLUMNS=40 git $cmd $args --graph >output
@@ -155,7 +155,7 @@ do
test_cmp "$expect" actual
'
 
-   test "$cmd" != diff || continue
+   test "$cmd" != diff && test "$cmd" != show || continue
 
test_expect_success "$cmd --graph $verb statGraphWidth config" '
git -c diff.statGraphWidth=26 $cmd $args --graph >output
@@ -196,7 +196,7 @@ do
test_cmp expect actual
'
 
-   test "$cmd" != diff || continue
+   test "$cmd" != diff && test "$cmd" != show || continue
 
test_expect_success "$cmd --stat-width=width --graph with big change" '
git $cmd $args --stat-width=40 --graph >output
@@ -236,7 +236,7 @@ do
test_cmp expect actual
'
 
-   test "$cmd" != diff || continue
+   test "$cmd" != diff && test "$cmd" != show || continue
 
test_expect_success "$cmd --stat=width --graph with big change is 
balanced" '
git $cmd $args --stat-width=60 --graph >output &&
@@ -270,7 +270,7 @@ do
test_cmp "$expect" actual
'
 
-   test "$cmd" != diff || continue
+   test "$cmd" != diff && test "$cmd" != show || continue
 
test_expect_success "$cmd --graph $verb COLUMNS (long filename)" '
COLUMNS=200 git $cmd $args --graph >output
@@ -299,7 +299,7 @@ do
test_cmp "$expect" actual
'
 
-   test "$cmd" != diff || continue
+   test "$cmd" != diff && test "$cmd" != show || continue
 
test_expect_success COLUMNS_CAN_BE_1 \
"$cmd --graph $verb prefix greater than COLUMNS (big change)" '
diff --git a/t/t4202-log.sh b/t/t42

Git with Lader logic

2015-03-17 Thread Bharat Suvarna

Hi 

I am trying to find a way of using version control on PLC programmers like 
Allen Bradley PLC. I can't find a way of this.

Could you please give me an idea if it will work with Plc programs. Which are 
basically Ladder logic.

Sent from my iPad--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

What's cooking in git.git (Mar 2015, #06; Tue, 17)

2015-03-17 Thread Junio C Hamano

Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[Graduated to "master"]

* ak/git-done-help-cleanup (2015-03-06) 1 commit
  (merged to 'next' on 2015-03-10 at 971382b)
 + git: make was_alias and done_help non-static

 Code simplification.


* es/rebase-i-count-todo (2015-03-06) 2 commits
  (merged to 'next' on 2015-03-10 at fff95d5)
 + rebase-interactive: re-word "item count" comment
 + rebase-interactive: suppress whitespace preceding item count

 "git rebase -i" recently started to include the number of
 commits in the insn sheet to be processed, but on a platform
 that prepends leading whitespaces to "wc -l" output, the numbers
 are shown with extra whitespaces that aren't necessary.


* mg/doc-status-color-slot (2015-03-10) 1 commit
  (merged to 'next' on 2015-03-12 at e53910a)
 + config,completion: add color.status.unmerged

 Documentation fixes.


* mg/sequencer-commit-messages-always-verbatim (2015-03-06) 1 commit
  (merged to 'next' on 2015-03-10 at 6a09295)
 + sequencer: preserve commit messages

 "git cherry-pick" used to clean-up the log message even when it is
 merely replaying an existing commit.  It now replays the message
 verbatim unless you are editing the message of resulting commits.


* mg/status-v-v (2015-03-06) 3 commits
  (merged to 'next' on 2015-03-10 at 4fa5af0)
 + commit/status: show the index-worktree diff with -v -v
 + t7508: test git status -v
 + t7508: .gitignore 'expect' and 'output' files

 "git status" now allows the "-v" to be given twice to show the
 differences that are left in the working tree not to be committed.


* rs/deflate-init-cleanup (2015-03-05) 1 commit
  (merged to 'next' on 2015-03-10 at 053157f)
 + zlib: initialize git_zstream in git_deflate_init{,_gzip,_raw}

 Code simplification.


* rs/zip-text (2015-03-05) 1 commit
  (merged to 'next' on 2015-03-10 at 2f3fa92)
 + archive-zip: mark text files in archives

 "git archive" can now be told to set the 'text' attribute in the
 resulting zip archive.


* sg/completion-remote (2015-03-06) 2 commits
  (merged to 'next' on 2015-03-10 at e1cd42b)
 + completion: simplify __git_remotes()
 + completion: add a test for __git_remotes() helper function

 Code simplification.

--
[New Topics]

* js/completion-ctags-pattern-substitution-fix (2015-03-14) 1 commit
  (merged to 'next' on 2015-03-17 at 4a68861)
 + contrib/completion: escape the forward slash in __git_match_ctag

 The code that reads from the ctags file in the completion script
 (in contrib/) did not spell ${param/pattern/string} substitution
 correctly, which happened to work with bash but not with zsh.

 Will merge to 'master'.


* jc/a-lone-dash-stands-for-previous-branch (2015-03-16) 1 commit
 - "-" and "@{-1}" on various programs

 Lose special case code to make a lone dash "-" mean the previous
 branch aka "@{-1}" from a handful subcommands, and instead support
 the notation throughout the system by reimplementing it at the
 revisions layer.

 Needs tests, documentation updates, etc.


* jc/submitting-patches-mention-send-email (2015-03-15) 1 commit
 - SubmittingPatches: encourage users to use format-patch and send-email

 Recommend format-patch and send-email for those who want to submit
 patches to this project.

--
[Stalled]

* nd/untracked-cache (2015-03-12) 24 commits
 - git-status.txt: advertisement for untracked cache
 - untracked cache: guard and disable on system changes
 - mingw32: add uname()
 - t7063: tests for untracked cache
 - update-index: test the system before enabling untracked cache
 - update-index: manually enable or disable untracked cache
 - status: enable untracked cache
 - untracked-cache: temporarily disable with $GIT_DISABLE_UNTRACKED_CACHE
 - untracked cache: mark index dirty if untracked cache is updated
 - untracked cache: print stats with $GIT_TRACE_UNTRACKED_STATS
 - untracked cache: avoid racy timestamps
 - read-cache.c: split racy stat test to a separate function
 - untracked cache: invalidate at index addition or removal
 - untracked cache: load from UNTR index extension
 - untracked cache: save to an index extension
 - ewah: add convenient wrapper ewah_serialize_strbuf()
 - untracked cache: don't open non-existent .gitignore
 - untracked cache: mark what dirs should be recursed/saved
 - untracked cache: record/validate dir mtime and reuse cached output
 - untracked cache: make a wrapper around {open,read,close}dir()
 - untracked cache: invalidate dirs recursively if .gitignore changes
 - untracked cache: initial untracked cache validation
 - untracked cache: record .gitignore information and dir hierarchy
 -

Re: [PATCH v4/GSoC/MICRO] revision: forbid combining --graph and --no-walk

2015-03-17 Thread Eric Sunshine

On Tue, Mar 17, 2015 at 7:18 PM, Junio C Hamano  wrote:
> Dongcan Jiang  writes:
>> diff --git a/t/t4052-stat-output.sh b/t/t4052-stat-output.sh
>> index b68afef..a989e8f 100755
>> --- a/t/t4052-stat-output.sh
>> +++ b/t/t4052-stat-output.sh
>> @@ -99,7 +99,7 @@ do
>>   test_cmp "$expect" actual
>>   '
>>
>> - test "$cmd" != diff || continue
>> + test "$cmd" != diff -a "$cmd" != show || continue
>
> I think we avoid -a and -o used with test (don't we have a write-up
> on this somewhere?).

The very last item in the shell script section of CodingGuidelines discusses it.

> Write it like this
>
> test "$cmd" != diff &&
> test "$cmd" != show || continue
>
> or perhaps like this
>
> case "$cmd" in diff|show) continue ;; esac
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4/GSoC/MICRO] revision: forbid combining --graph and --no-walk

2015-03-17 Thread Junio C Hamano

Dongcan Jiang  writes:

> Because "--graph" is about connected history while --no-walk
> is about discrete points, it does not make sense to allow
> giving these two options at the same time. [1]
>
> This change makes a few calls to "show --graph" fail in
> t4052, but asking to show one commit with graph is a
> nonsensical thing to do. Thus, tests on "show --graph" in
> t4052 have been removed. [2,3] Same tests on "show" without
> --graph option have already been tested in 4052.

This looks almost perfect.
>
> diff --git a/t/t4052-stat-output.sh b/t/t4052-stat-output.sh
> index b68afef..a989e8f 100755
> --- a/t/t4052-stat-output.sh
> +++ b/t/t4052-stat-output.sh
> @@ -99,7 +99,7 @@ do
>   test_cmp "$expect" actual
>   '
>  
> - test "$cmd" != diff || continue
> + test "$cmd" != diff -a "$cmd" != show || continue

I think we avoid -a and -o used with test (don't we have a write-up
on this somewhere?).  Write it like this

test "$cmd" != diff &&
test "$cmd" != show || continue

or perhaps like this

case "$cmd" in diff|show) continue ;; esac

Other than that I do not see anything objectionable.

Thanks, good job.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 00/14] numparse module: systematically tighten up integer parsing

2015-03-17 Thread Duy Nguyen

On Tue, Mar 17, 2015 at 11:00 PM, Michael Haggerty  wrote:
> Michael Haggerty (14):
>   numparse: new module for parsing integral numbers
>   cacheinfo_callback(): use convert_ui() when handling "--cacheinfo"
>   write_subdirectory(): use convert_ui() for parsing mode
>   handle_revision_opt(): use skip_prefix() in many places
>   handle_revision_opt(): use convert_i() when handling "-"
>   strtoul_ui(), strtol_i(): remove functions
>   handle_revision_opt(): use convert_ui() when handling "--abbrev="
>   builtin_diff(): detect errors when parsing --unified argument
>   opt_arg(): val is always non-NULL
>   opt_arg(): use convert_i() in implementation
>   opt_arg(): report errors parsing option values
>   opt_arg(): simplify pointer handling
>   diff_opt_parse(): use convert_i() when handling "-l"
>   diff_opt_parse(): use convert_i() when handling --abbrev=

Thank you for doing it. I was about to write another number parser and
you did it :D Maybe you can add another patch to convert the only
strtol in upload-pack.c to parse_ui. This place should accept positive
number in base 10, plus sign is not accepted.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/5] not making corruption worse

2015-03-17 Thread Junio C Hamano

Jeff King  writes:

> But it strikes me as weird that we consider the _tips_ of history to be
> special for ignoring breakage. If the tip of "bar" is broken, we omit
> it. But if the tip is fine, and there's breakage three commits down in
> the history, then doing a clone is going to fail horribly, as
> pack-objects realizes it can't generate the pack. So in practice, I'm
> not sure how much you're buying with the "don't mention broken refs"
> code.

I think this is a trade-off between strictness and convenience.  Is
it preferrable that every time you try to clone a repository you get
reminded that one of its refs point at a bogus object and you
instead have to do "git fetch $there" with a refspec that excludes
the broken one, or is it OK to allow clones and fetches silently
succeed as if nothing is broken?

If the breakage in the reachability chain is somewhere that affects
a branch that is actively in use by the project, with or without
hiding a broken tip, you will be hit by object transfer failure and
you need to really go in there and fix things anyway.  If it is just
a no-longer-used experimental branch that lost necessary objects,
it may be more convenient if the system automatically ignored it.

In some parts of the system there is a movement to make this trade
off tweakable (hint: what happened to the knobs to fsck that allow
certain kinds of broken objects in the object store?  did the topic
go anywhere?). This one so far lacked such a knob to tweak, and I
view your paranoia bit as such a knob.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/5] Retry if fdopen() fails due to ENOMEM

2015-03-17 Thread Junio C Hamano

Michael Haggerty  writes:

> On 03/06/2015 06:08 AM, Torsten Bögershausen wrote:
>> On 03/05/2015 05:07 PM, Michael Haggerty wrote:
>>> One likely reason for fdopen() to fail is the lack of memory for
>>> allocating a FILE structure. When that happens, try freeing some
>>> memory and calling fdopen() again in the hope that it will work the
>>> second time.
>>>
>>> This change was suggested by Jonathan Nieder [1]
>>>
>>> In the first patch it is unsatisfying that try_to_free_routine() is
>>> called with a magic number (1000) rather than sizeof(FILE). But the C
>>> standard doesn't guarantee that FILE is a complete type, so I can't
>>> think of a better approach. Suggestions, anybody?
>> 
>> it's not the sizeof(FILE) which is critical, it is the size of the buffer
>> associated with a FILE
>> 
>> http://pubs.opengroup.org/onlinepubs/009695399/basedefs/stdio.h.html
>> 
>> BUFSIZ may be  your friend, and if it is not defined, 4096 may be a
>> useful default.
>
> Good point. If this patch series is not dropped as being useless, I will
> make this change.

OK, it has been a week since anybody mentioned this series.  What's
the verdict?  Taking what you said in $gmane/265228 into account, I
am taking the lack of reroll or follow-up as a sign of lost interest,
and if that is the case I'd drop the series before it hits 'next'.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] Add revision range support on "-" and "@{-1}"

2015-03-17 Thread Junio C Hamano

Kenny Lee Sin Cheong  writes:

> On Tue, Mar 17 2015 at 02:49:48 AM, Junio C Hamano  wrote:
>> Junio C Hamano  writes:
>>
>>  if (try to see if it is a revision or regvision range) {
>>  /* if failed ... */
>>  if (starts with '-') {
>>  do the option thing;
>> continue;
>>  }
>>  /* args must be pathspecs from here on */
>> check the  '--' disambiguation;
>> add pathspec to prune-data;
>>  } else {
>>  got_rev_arg = 1;
>>  }
>>
>> but I didn't trace the logic myself to see if that would work.
>
> You're right. I was actually going to try and check all possible
> suffixes of "-" but your solution saves us from doing that, and it
> didn't break any tests.

"It didn't break any tests" does not tell us much, though.

I also notice that handle_revision_arg() would die() by calling it
directly or indirectly via verify_non_filename(), etc., but the
caller actually is expecting it to silently return non-zero when it
finds an argument that cannot be interpreted as a revision or as a
revision range.  

If we feed the function a string that has ".." in it, with
cant_be_filename unset, and if that string _can_ be parsed as a
valid range (e.g. "master..next"), we would check if a file whose
name is that string and die, e.g.

$ >master..next ; git log master..next
fatal: ambigous argument 'master..next': both revision and filename

If we swap the order to do the "revision" first before "option",
however, we would end up getting the same for a name that begins
with "-" and has ".." in it.  I see no guarantee that future
possible option name cannot be misinterpreted as a range to trigger
this check.

But "git cmd -$option" for any value of $option does not have to be
disambiguated when there is a file whose name is "-$option".  The
existing die()'s in the handle_revision_arg() function _will_ break
that promise.  Currently, because we check the options first,
handle_revision_arg() does not cause us any problem, but swapping
the order will have fallouts.

If we want to really do the swapping (and I think that is the only
sensible way if we wanted to allow "-" and any extended SHA-1 that
begins with "-" as "the previous branch"), I think the "OK, it looks
like a revision (or revision range); as we didn't see dashdash, it
must not be a filename" check has to be moved to the caller, perhaps
like this:

if (try to see if it is a revision or a revision range) {
/* failed */
...
} else {
/* it can be read as a revision or a revision range */
if (!seen_dashdash)
verify_non_filename(arg);
got_rev_arg = 1;
}

The "missing" cases should also silently return failure and have the
caller deal with that.

> On a similar note, would it be relevant to add similar changes to
> rev-parse?

If the goal is "to allow '-' everywhere '@{-1}' is allowed, and used
as such", then yes, of course, such an update is needed.

But I am not sure if that is a worthwhile goal to aim for in the
first place, though.  You would need to accept -@{two.days.ago} as a
"short-hand" for @{-1}@{two.days.ago}, etc., which does not look
very readable way in the first place.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Need help deciding between subtree and submodule

2015-03-17 Thread Robert Dailey

At my workplace, the team is using Atlassian Stash + git

We have a "Core" library that is our common code between various
projects. To avoid a single monolithic repository and to allow our
apps and tools to be modularized into their own repos, I have
considered moving Core to a subtree or submodule.

I tried subtree and this is definitely far more transparent and simple
to the team (simplicity is very important), however I notice it has
problems with unnecessary conflicts when you do not do `git subtree
push` for each `git subtree pull`. This is unnecessary overhead and
complicates the log graph which I don't like.

Submodule functionally works but it is complicated. We make heavy use
of pull requests for code reviews (they are required due to company
policy). Instead of a pull request being atomic and containing any app
changes + accompanying Core changes, we now need to create two pull
requests and manage them in proper order. Things also become more
difficult when branching. All around it just feels like submodule
would interfere and add more administration overhead on a day to day
basis, affecting productivity.

Is there a third option here I'm missing? If only that issue with
subtree could be addressed (the conflicts), it would be perfect enough
for us I think. I have done all the stackoverflow reading and research
I can manage at this point. I would really love some feedback from the
actual git community on what would be a practical solution and
structure here from a company perspective.

Thanks in advance!
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] Add revision range support on "-" and "@{-1}"

2015-03-17 Thread Kenny Lee Sin Cheong

On Tue, Mar 17 2015 at 02:49:48 AM, Junio C Hamano  wrote:
> Junio C Hamano  writes:
>
>   if (try to see if it is a revision or regvision range) {
>   /* if failed ... */
>   if (starts with '-') {
>   do the option thing;
> continue;
>   }
>   /* args must be pathspecs from here on */
> check the  '--' disambiguation;
> add pathspec to prune-data;
>   } else {
>   got_rev_arg = 1;
>   }
>
> but I didn't trace the logic myself to see if that would work.

You're right. I was actually going to try and check all possible
suffixes of "-" but your solution saves us from doing that, and it
didn't break any tests.

On a similar note, would it be relevant to add similar changes to
rev-parse? While trying to write some test, I noticed that rev-parse
doesn't support "-". If I'm not mistaking it assumes everything that starts 
with "-"
must be an option. But since it is a plumbing tool I don't know if it
would be worth it or even an improvement.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: git submodule: update=!command

2015-03-17 Thread Ryan Lortie

kara,

On Tue, Mar 17, 2015, at 17:05, Junio C Hamano wrote:
> If you check the output from
> 
> git diff 30a52c1d^ 30a52c1d
> 
> and find it appropriately address the problem you originally had,
> that would be wonderful, and if you can suggest further improvement,
> that is equally good.

Indeed, the new version of the docs looks much better.  I'm particularly
happy about the change to the format to make it easier to visually scan
for the possible update modes.

Cheers
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: git submodule: update=!command

2015-03-17 Thread Junio C Hamano

Ryan Lortie  writes:

> On Tue, Mar 17, 2015, at 16:49, Junio C Hamano wrote:
>> With more recent versions of Git, namely, the versions after
>> 30a52c1d (Merge branch 'ms/submodule-update-config-doc' into maint,
>> 2015-03-13), the documentation pages already have updated
>> descriptions around this area.
>
> sigh.
>
> That's what I get for forgetting to type 'git pull' before writing a
> patch.
>
> Sorry for the noise!

Nothing to apologise or sigh about.  You re-confirmed that the old
documentation was lacking, which led to an earlier discussion which
in turn led to Michal to update the documentation.  If you check the
output from

git diff 30a52c1d^ 30a52c1d

and find it appropriately address the problem you originally had,
that would be wonderful, and if you can suggest further improvement,
that is equally good.

Thanks for participating in our effort to collectively make Git
better.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: git submodule: update=!command

2015-03-17 Thread Ryan Lortie

On Tue, Mar 17, 2015, at 16:49, Junio C Hamano wrote:
> With more recent versions of Git, namely, the versions after
> 30a52c1d (Merge branch 'ms/submodule-update-config-doc' into maint,
> 2015-03-13), the documentation pages already have updated
> descriptions around this area.

sigh.

That's what I get for forgetting to type 'git pull' before writing a
patch.

Sorry for the noise!

Cheers
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: git submodule: update=!command

2015-03-17 Thread Junio C Hamano

Ryan Lortie  writes:

> 'man git-submodule' contains mention (in one place) that:
>
> Setting the key submodule.$name.update to !command
> will cause command to be run.
>
> This is not documented in 'man gitmodules' (which documents the other
> possible values for the 'update' key)

Yes, that is deliberate, because you cannot use !command in .gitmodules
that is tracked for security reasons.

With more recent versions of Git, namely, the versions after
30a52c1d (Merge branch 'ms/submodule-update-config-doc' into maint,
2015-03-13), the documentation pages already have updated
descriptions around this area.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: git submodule: update=!command

2015-03-17 Thread Ryan Lortie

On Tue, Mar 17, 2015, at 15:50, Jeff King wrote:
> Yeah, spelling out the security model more explicitly would be good.

Please see the attached patch.

Cheers


0001-docs-clarify-command-submodule-update-policy.patch
Description: Binary data

Re: [PATCH v4] rev-list: refuse --first-parent combined with --bisect

2015-03-17 Thread Junio C Hamano

Christian Couder  writes:

> On Tue, Mar 17, 2015 at 7:33 PM, Junio C Hamano  wrote:
>> Christian Couder  writes:
>>
>>> Yes, but the user is supposed to not change the "bad" pointer for no
>>> good reason.
>>
>> That is irrelevant, no?  Nobody is questioning that the user is
>> supposed to judge if a commit is "good" or "bad" correctly.
>
> So if there is already a bad commit and the user gives another
> bad commit, that means that the user knows that it will replace the
> existing bad commit with the new one and that it's done for this
> purpose.

ECANNOTQUITEPARSE.  The user may say "git bisect bad $that" and we
do not question $that is bad. Git does not know better than the
user.

But that does not mean Git does not know better than the user how
the current bad commit and $that commit are related.  The user is
not interested in "replacing" at all.  The user is telling just one
single fact, that is, "$that is bad".

>> I am not quite sure if I am correctly getting what you meant to say,
>> but if you meant "only when --alternate is given, we should do the
>> merge-base thing; we should keep losing the current 'bad' and
>> replace it with the new one without the --alternate option", I would
>> see that as an exercise of a bad taste.
>
> What I wanted to say is that if we change "git bisect bad ",
> so that now it means "add a new bad commit" instead of the previous
> "replace the current bad commit, if any, with this one", then experienced
> users might see that change as a regression in the user interface and
> it might even break scripts.

Huh?  

Step back a bit.  The place you need to start from is to admit the
fact that what "git bisect bad " currently does is
broken.

Try creating this history yourself

a---b---c---d---e---f

and start bisection this way:

$ git bisect start f c
$ git bisect bad a

Immediately after the second command, "git bisect" moans

Some good revs are not ancestor of the bad rev.
git bisect cannot work properly in this case.
Maybe you mistake good and bad revs?

when it notices that the good rev (i.e. 'c') is no longer an
ancestor of the 'bad', which now points at 'a'.

But that is because "git bisect bad" _blindly_ moved 'bad' that used
to point at 'f' to 'a', making a good rev (i.e. 'c') an ancestor of
the bad rev, without even bothering to check.

Now, if we fixed this bug and made the bisect_state function more
careful (namely, when accepting "bad", make sure it is not beyond
any existing "good", or barf like the above, _without_ moving the
bad pointer), the user interface and behaviour would be changed.  Is
that a regression?  No, it is a usability fix and a progress.

Simply put, bisect_state function can become more careful and
intelligent to help users.

I view this "user goes out of way to tell us a commit that is known
to be bad as bad, even though it is not what we offered to test and
is not an ancestor of the commit that currently marked as bad" case
the same way.  We by now hopefully understand that blindly replacing
the current 'bad' is suboptimal.  By teaching bisect_state to do the
"merge-base thing", we would be fixing that.

Why is it a regression?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Promoting Git developers

2015-03-17 Thread Junio C Hamano

Christian Couder  writes:

> On Mon, Mar 16, 2015 at 6:06 PM, Stefan Beller  wrote:
>> On Mon, Mar 16, 2015 at 2:20 AM, David Kastrup  wrote:
>>>
>>> "Git Annotate"?
>>
>> "Git Praise" as opposed to blame?
>> "Git Who" as a pun on the subcommand structure which doesn't always
>> follows grammar?
>
> Yeah these suggestions above are nice, thanks for them, but "Git Rev News"
> also look a bit like "git rev-list" and "git rev-parse" which are plumbing Git
> commands, so it gives a somewhat "hardcore" look to the news letter which
> I like.

Call that "Git Rev List" then to be more direct?

I myself liked the "Review" (spelled in full word) as its non-nerdy
sound, as a suitable name for a publication that bridges between
hard-core developers and slightly-more-serious-than-casual
observers, but its not my call (and as I often say, I am not good at
naming things) ;-).

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Promoting Git developers

2015-03-17 Thread Christian Couder

On Sun, Mar 15, 2015 at 11:18 PM, Junio C Hamano  wrote:
> Christian Couder  writes:
>
>> I wrote something about a potential Git Rev News news letter:
>
> I read it.  Sounds promising.

Thanks!

[...]

> I obviously do not know how the actual contents would look like at
> this point, but depending on the quality of the publication I might
> be able to steal some descriptions when keeping the notes on topics
> in flight that appear in my "What's cooking" report.  And it can go
> the other way around, too.  The publication may want to peek my
> "What's cooking" report for hints on how to characterize each topic
> and assess its impact to the evolution of Git.

Yeah, it would be a very nice thing if we could steal these kind of
things from each other's work!

Thanks,
Christian.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 2/2] sha1_file: refactor sha1_file.c to support 'cat-file --literally'

2015-03-17 Thread Junio C Hamano

Karthik Nayak  writes:

> Subject: Re: [PATCH v4 2/2] sha1_file: refactor sha1_file.c to support 
> 'cat-file --literally'

> Modify sha1_loose_object_info() to support 'cat-file --literally'
> by accepting flags and also make changes to copy the type to
> object_info::typename.

It would be more descriptive to mention the central point on the
title regarding what it takes to "support cat-file --literally".

For example:

  sha1_file.c: support reading from a loose object of a bogus type

  Update sha1_loose_object_info() to optionally allow it to read
  from a loose object file of unknown/bogus type; as the function
  usually returns the type of the object it read in the form of enum
  for known types, add an optional "typename" field to receive the
  name of the type in textual form.

By the way, I think your 1/2 would not even compile unless you have
this change; the patches in these two patch series must be swapped,
no?

> diff --git a/sha1_file.c b/sha1_file.c
> index 69a60ec..e31e9e2 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -1564,6 +1564,36 @@ int unpack_sha1_header(git_zstream *stream, unsigned 
> char *map, unsigned long ma
>   return git_inflate(stream, 0);
>  }
>  
> +static int unpack_sha1_header_literally(git_zstream *stream, unsigned char 
> *map,
> + unsigned long mapsize,
> + struct strbuf *header)
> +{
> + unsigned char buffer[32], *cp;
> + unsigned long bufsiz = sizeof(buffer);
> + int status;
> +
> + /* Get the data stream */
> + memset(stream, 0, sizeof(*stream));
> + stream->next_in = map;
> + stream->avail_in = mapsize;
> + stream->next_out = buffer;
> + stream->avail_out = bufsiz;
> +
> + git_inflate_init(stream);
> +
> + do {
> + status = git_inflate(stream, 0);
> + strbuf_add(header, buffer, stream->next_out - buffer);
> + for (cp = buffer; cp < stream->next_out; cp++)
> + if (!*cp)
> + /* Found the NUL at the end of the header */
> + return 0;
> + stream->next_out = buffer;
> + stream->avail_out = bufsiz;
> + } while (status == Z_OK);
> + return -1;
> +}
> +

There is nothing "literal" about this function.

The only difference between the original and this one is that this
uses a strbuf, instead of a buffer of a fixed size allocated by the
caller, to return the early part of the inflated data.

I wonder if it incurs too much overhead to the existing callers if
you reimplementated unpack_sha1_header() as a thin wrapper around
this function, something like

int unpack_sha1_header(git_zstream *stream,
unsigned char *map, unsigned long mapsize,
void *buffer, unsigned long bufsiz)
{
int status;
struct strbuf header = STRBUF_INIT;

status = unpack_sha1_header_to_strbuf(stream, map, mapsize, 
&header);
if (bufsiz < header.len)
die("object header too long");
memcpy(buffer, header.buf, header.len);
return status;
}

Note that I am *not* suggesting to do this blindly.  If there is no
downside from performance point of view, however, the above would be
a way to avoid code duplication.

Another way to avoid code duplication may be to implement
unpack_sha1_header_to_strbuf() in terms of unpack_sha1_header(),
perhaps like this:

static int unpack_sha1_header_to_strbuf(...)
{
unsigned char buffer[32], *cp;
unsigned long bufsiz = sizeof(buffer);
int status = unpack_sha1_header(stream, map, mapsize, buffer, 
bufsiz);

if (status)
return status;
do {
strbuf_add(header, buffer, stream->next_out - buffer);
for (cp = buffer; cp < stream->next_out; cp++)
if (!*cp)
/* Found the NUL at the end of the 
header */
return 0;
stream->next_out = buffer;
stream->avail_out = bufsiz;
} while (status == Z_OK);
return -1;
}

which may be more in line with how we read data from loose objects.

> +int parse_sha1_header_extended(const char *hdr, struct object_info *oi,
> +int flags)

Unless we are taking advantage of the fact that MSB is special in
signed integral types, I would prefer to see us use an unsigned type
to store these bits in a variable of an integral type.  That would
let the readers assume that they have fewer tricky things possibly
going on in the code (also see the footnote to $gmane/263751).

> @@ -1652,12 +1674,4

Re: Promoting Git developers

2015-03-17 Thread Christian Couder

On Mon, Mar 16, 2015 at 6:06 PM, Stefan Beller  wrote:
> On Mon, Mar 16, 2015 at 2:20 AM, David Kastrup  wrote:
>>
>> "Git Annotate"?
>
> "Git Praise" as opposed to blame?
> "Git Who" as a pun on the subcommand structure which doesn't always
> follows grammar?

Yeah these suggestions above are nice, thanks for them, but "Git Rev News"
also look a bit like "git rev-list" and "git rev-parse" which are plumbing Git
commands, so it gives a somewhat "hardcore" look to the news letter which
I like.

Thanks,
Christian.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Promoting Git developers

2015-03-17 Thread Christian Couder

On Tue, Mar 17, 2015 at 10:43 AM, Thomas Ferris Nicolaisen
 wrote:
> On Sun, Mar 15, 2015 at 9:46 AM, Christian Couder
>  wrote:
>>
>> I wrote something about a potential Git Rev News news letter:
>>
>> https://github.com/git/git.github.io/pull/15
>>
>
> I would love to have/use something like this in the GitMinutes
> podcast. Perhaps in addition to the very random interview format that
> I have now, I could do a more regular episodes about Git news, where I
> incorporate this.

Yeah, no problem.

> I also volunteer to help with the production, if you'll allow list
> lurkers like myself to contribute ;)

Yes of course, you are very welcome!

Peff, could you also give the rights on the repo to Thomas?

Thanks both,
Christian.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: git submodule: update=!command

2015-03-17 Thread Jeff King

On Tue, Mar 17, 2015 at 03:28:57PM -0400, Ryan Lortie wrote:

> The first is a question about git's basic policy with respect to things
> like this.  I hope that it's safe to assume that running 'git' commands
> on repositories downloaded from potentially-hostile places will never
> result in the authors of those repositories being able to run code on my
> machine.

Definitely, our policy is that downloading a git repository should not
result in arbitrary code being run. If there is a case of that, it would
be a serious security bug.

I am not an expert on submodules, but I think the security module there
is:

  1. You can do whatever you like in submodule.*.update entries in
 .git/config, including arbitrary code. Nobody but the user can
 write to it.

  2. The submodule code may migrate entries from .gitmodules into
 .git/config, but does so with an allow-known-good whitelist (see
 git-submodule.sh lines 622-637).

So AFAICT there's no bug here, and the system is working as designed.
It might be worth mentioning that restriction in the submodule
documentation, if only to prevent non-malicious people from wondering
why adding "!foo" does not work in .gitmodules.

> If that is true then, the second request would be to spell this out more
> explicitly in the relevant documentation.  I'm happy to write a patch to
> do that, if it is deemed appropriate.

Yeah, spelling out the security model more explicitly would be good.
There is also some subtlety around hooks. Doing:

  git clone user@host:/path/to/repo.git local

should never run code controlled by "repo.git" as "user@host". But
doing:

  ssh user@host 'cd /path/to/repo.git && git log'

will respect the .git/config in repo.git, which may include arbitrary
commands.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4] rev-list: refuse --first-parent combined with --bisect

2015-03-17 Thread Christian Couder

On Tue, Mar 17, 2015 at 7:33 PM, Junio C Hamano  wrote:
> Christian Couder  writes:
>
>> On Mon, Mar 16, 2015 at 10:05 PM, Junio C Hamano  wrote:
>>
>>> However, you can say "git bisect bad " (and "git bisect good
>>> " for that matter) on a rev that is unrelated to what the
>>> current bisection state is.  E.g. after you mark the child of 8 as
>>> "bad", the bisected graph would become
>>>
>>>G...1---2---3---4---6---8---B
>>>
>>> and you would be offered to test somewhere in the middle, say, 4.
>>> But it is perfectly OK for you to respond with "git bisect bad 7",
>>> if you know 7 is bad.
>>>
>>> I _think_ the current code blindly overwrites the "bad" pointer,
>>> making the bisection state into this graph if you do so.
>>>
>>>G...1---2---3---4
>>> \
>>>  5---B
>>
>> Yes, we keep only one "bad" pointer.
>>
>>> This is very suboptimal.  The side branch 4-to-7 could be much
>>> longer than the original trunk 4-to-the-tip, in which case we would
>>> have made the suspect space _larger_, not smaller.
>>
>> Yes, but the user is supposed to not change the "bad" pointer for no
>> good reason.
>
> That is irrelevant, no?  Nobody is questioning that the user is
> supposed to judge if a commit is "good" or "bad" correctly.

So if there is already a bad commit and the user gives another
bad commit, that means that the user knows that it will replace the
existing bad commit with the new one and that it's done for this
purpose.

> And nobody sane is dreaming that "Git could do better and detect
> user's mistakes when the user says 'bad' for a commit that is
> actually 'good'"; if Git can do that, then it should be able to do
> the bisect without any user input (including "bisect run") at all
> ;-).
>
>>> We certainly should be able to take advantage of the fact that the
>>> current "bad" commit (i.e. the child of 8) and the newly given "bad"
>>> commit (i.e. 7) are both known to be bad and mark 4 as "bad" instead
>>> when that happens, instead of doing the suboptimal thing the code
>>> currently does.
>>
>> Yeah, we could do that, but we would have to allow it only if a
>> special option is passed on the command line, for example:
>> git bisect bad --alternate 
>
> I am not quite sure if I am correctly getting what you meant to say,
> but if you meant "only when --alternate is given, we should do the
> merge-base thing; we should keep losing the current 'bad' and
> replace it with the new one without the --alternate option", I would
> see that as an exercise of a bad taste.

What I wanted to say is that if we change "git bisect bad ",
so that now it means "add a new bad commit" instead of the previous
"replace the current bad commit, if any, with this one", then experienced
users might see that change as a regression in the user interface and
it might even break scripts.

That's why I suggested to use a new option to mean
"add a new bad commit", though --alternate might not be the best
name for this option.

> Because the merge-base thing is using both the current and the new
> one, such a use is not "alternate" in the first place.
>
> If the proposal were "with a new option, the user can say 'oh, I
> made a mistake earlier and said that a commit that is not bad as
> 'bad'.  Let me replace the commit currently marked as 'bad' with
> this one.", I would find it very sensible, actually.

What I find sensible is to not break the semantics of the current
interface.

> I can see that such an operation can be called "alternate", but
> "--fix" might be shorter-and-sweeter-and-to-the-point.
>
> In the "normal" case, the commit we offer the user to check (and
> respond with "git bisect bad" without any commit parameter) is
> always an ancestor of the current 'bad', so the merge-base with
> 'bad' and the commit that was just checked would always be the
> current commit.  Using the merge-base thing will be transparent to
> the end users in the normal case, and when the user has off-line
> knowledge that some other commit that is not an ancestor of the
> current 'bad' commit is bad, the merge-base thing will give a better
> behaviour than the current implementation that blindly replaces.

Yes, I agree that it could be an improvement to make it possible for the
user to specify another bad commit. I just think it should be done with
a new option if there is already a bad commit...

>> and/or we could make "git bisect bad" accept any number of bad
>> commitishs.

... or by allowing any number of bad commits after "git bisect bad".

> Yes, that is exactly what I meant.
>
> The way I understand the Philip's point is that the user may have
> a-priori knowledge that a breakage from the same cause appears in
> both tips of these branches.  In such a case, we can start bisection
> after marking the merge-base of two 'bad' commits, e.g. 4 in the
> illustration in the message you are responding to, instead of
> including 5, 6, and 8 in the suspect set.

Yeah, I agree that we can do better in

Re: [PATCH 00/14] numparse module: systematically tighten up integer parsing

2015-03-17 Thread Michael Haggerty

On 03/17/2015 07:48 PM, Junio C Hamano wrote:
> Michael Haggerty  writes:
> 
>> When I looked around, I found scores of sites that call atoi(),
>> strtoul(), and strtol() carelessly. And it's understandable. Calling
>> any of these functions safely is so much work as to be completely
>> impractical in day-to-day code.
>>
>> git-compat-util.h has two functions, strtoul_ui() and strtol_i(), that
>> try to make parsing integers a little bit easier.
> 
> Exactly.  They were introduced to prevent sloppy callers from
> passing NULL to the &end parameter to underlying strtoul(3).  The
> first thing that came to my mind while reading your message up to
> this point was "why not use them more, tighten them, adding
> different variants if necessary, instead of introducing an entirely
> new set of functions in a new namespace?"

Here were my thoughts:

* I wanted to change the interface to look less like
  strtol()/strtoul(), so it seemed appropriate to make the names
  more dissimilar.

* The functions seemed long enough that they shouldn't be inline,
  and I wanted to put them in their own module rather than put
  them in git-compat-util.h.

* It wasn't obvious to me how to generalize the names, strtoul_ui()
  and strtol_i(), to the eight functions that I wanted.

That being said, I'm not married to the names. Suggestions are
welcome--but we would need names for 8 functions, not 4 [1].

Michael

> For example:
> 
>> * Am I making callers too strict? In many cases where a positive
>>   integer is expected (e.g., "--abbrev="), I have been replacing
>>   code like
>>
>>   result = strtoul(s, NULL, 10);
>>
>>   with
>>
>>   if (convert_i(s, 10, &result))
>>   die("...");
> 
> I think strictness would be good and those who did "--abbrev='  20'"
> deserve what they get from the additional strictness, but 
> 
>   if (strtol_i(s, 10, &result))
> 
> would have been more familiar.

[1] It could be that when we're done, it will turn out that some of the
eight variants are not needed. If so, we can delete them then.

-- 
Michael Haggerty
mhag...@alum.mit.edu

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

git submodule: update=!command

2015-03-17 Thread Ryan Lortie

karaj,

'man git-submodule' contains mention (in one place) that:

Setting the key submodule.$name.update to !command
will cause command to be run.

This is not documented in 'man gitmodules' (which documents the other
possible values for the 'update' key) nor in 'man git-config' which also
mentions the 'update' key (but refers readers to the two other pages).

This feature is scary.  The idea that arbitrary code could be executed
on my machine when I run innocent-looking git commands, based on the
content of the .gitmodules file is enough to  give pause to anybody.

Fortunately, it seems that (for now?) this is not really the case.  'git
submodule init' will copy the values of the 'update' key from
.gitmodules to your local git config, but only if they are one of
"none", "checkout", "merge" or "rebase".

So, I guess I'm asking two things.

The first is a question about git's basic policy with respect to things
like this.  I hope that it's safe to assume that running 'git' commands
on repositories downloaded from potentially-hostile places will never
result in the authors of those repositories being able to run code on my
machine.

If that is true then, the second request would be to spell this out more
explicitly in the relevant documentation.  I'm happy to write a patch to
do that, if it is deemed appropriate.

Thanks in advance.

Cheers
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] t5312: test object deletion code paths in a corrupted repository

2015-03-17 Thread Jeff King

On Tue, Mar 17, 2015 at 07:34:02PM +0100, Johannes Sixt wrote:

> Am 17.03.2015 um 08:28 schrieb Jeff King:
> >+test_expect_success 'create history reachable only from a bogus-named ref' '
> >+test_tick && git commit --allow-empty -m master &&
> >+base=$(git rev-parse HEAD) &&
> >+test_tick && git commit --allow-empty -m bogus &&
> >+bogus=$(git rev-parse HEAD) &&
> >+git cat-file commit $bogus >saved &&
> >+echo $bogus >.git/refs/heads/bogus:name &&
> 
> This causes headaches on Windows: It creates an empty file, named "bogus",
> with all the data diverted to the alternate data stream named "name".
> Needless to say that this...

Ah, yes. Windows. Our usual workaround would be to put it straight into
packed-refs, but in this case, the test really does need the badly named
ref in the file system. But...

> >+test_expect_success 'clean up bogus ref' '
> >+rm .git/refs/heads/bogus:name
> >+'
> 
> does not remove the file "bogus", but only the alternate data stream (if at
> all---I forgot to check). How about .git/refs/heads/bogus..nam.e?

Yes, that works. The colon is what originally brought my attention to
this case, but anything that fails git-check-ref-format is fine. I've
squashed this in:

diff --git a/t/t5312-prune-corruption.sh b/t/t5312-prune-corruption.sh
index 167031e..1001a69 100755
--- a/t/t5312-prune-corruption.sh
+++ b/t/t5312-prune-corruption.sh
@@ -21,7 +21,7 @@ test_expect_success 'create history reachable only from a 
bogus-named ref' '
test_tick && git commit --allow-empty -m bogus &&
bogus=$(git rev-parse HEAD) &&
git cat-file commit $bogus >saved &&
-   echo $bogus >.git/refs/heads/bogus:name &&
+   echo $bogus >.git/refs/heads/bogus..name &&
git reset --hard HEAD^
 '
 
@@ -47,7 +47,7 @@ test_expect_failure 'destructive repack keeps packed object' '
 
 # subsequent tests will have different corruptions
 test_expect_success 'clean up bogus ref' '
-   rm .git/refs/heads/bogus:name
+   rm .git/refs/heads/bogus..name
 '
 
 test_expect_success 'create history with missing tip commit' '


I assumed the final "." in your example wasn't significant (it is not to
git), but let me know if I've run afoul of another weird restriction. :)

Thanks.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 00/14] numparse module: systematically tighten up integer parsing

2015-03-17 Thread Junio C Hamano

Michael Haggerty  writes:

> When I looked around, I found scores of sites that call atoi(),
> strtoul(), and strtol() carelessly. And it's understandable. Calling
> any of these functions safely is so much work as to be completely
> impractical in day-to-day code.
>
> git-compat-util.h has two functions, strtoul_ui() and strtol_i(), that
> try to make parsing integers a little bit easier.

Exactly.  They were introduced to prevent sloppy callers from
passing NULL to the &end parameter to underlying strtoul(3).  The
first thing that came to my mind while reading your message up to
this point was "why not use them more, tighten them, adding
different variants if necessary, instead of introducing an entirely
new set of functions in a new namespace?"

For example:

> * Am I making callers too strict? In many cases where a positive
>   integer is expected (e.g., "--abbrev="), I have been replacing
>   code like
>
>   result = strtoul(s, NULL, 10);
>
>   with
>
>   if (convert_i(s, 10, &result))
>   die("...");

I think strictness would be good and those who did "--abbrev='  20'"
deserve what they get from the additional strictness, but 

if (strtol_i(s, 10, &result))

would have been more familiar.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [GSoC] Applying for conversion scripts to builtins

2015-03-17 Thread Junio C Hamano

Johannes Schindelin  writes:

> Therefore, I would wager a bet that just the mere conversion of a
> shell script into even a primitive `run_command()`-based builtin would
> help performance on Windows in a noticeable manner.

As you correctly allege, if a patch rewrote a shell-scripted
porcelain by using series of run_command() and doing nothing else, I
would have asked "is that an improvement?", without knowing that.

> Of course, it would be *even nicer* to avoid the spawning altogether.

Yeah, that, too ;-)

> The biggest benefit of avoiding needless parsing, however, is not
> performance. It is avoiding quoting issues. This is particularly so on
> Windows, where Git is sometimes called from outside a shell
> environment, where we have to deal with inconsistent quoting because
> it is every Windows program's own job to parse the command-line,
> including the quoting.
>
> Concrete example: on Windows, we have file locking issues because
> files that are in use cannot be deleted. For that reason, we have
> Windows-specific code that is "nice" by trying harder to delete files,
> giving programs a little time to let their locks go. This locking
> issue happens also when a virus scanner "uses"...

These are definitely good advices from the area expert.

Thanks for a bunch of good input.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4] rev-list: refuse --first-parent combined with --bisect

2015-03-17 Thread Junio C Hamano

Christian Couder  writes:

> On Mon, Mar 16, 2015 at 10:05 PM, Junio C Hamano  wrote:
>
>> However, you can say "git bisect bad " (and "git bisect good
>> " for that matter) on a rev that is unrelated to what the
>> current bisection state is.  E.g. after you mark the child of 8 as
>> "bad", the bisected graph would become
>>
>>G...1---2---3---4---6---8---B
>>
>> and you would be offered to test somewhere in the middle, say, 4.
>> But it is perfectly OK for you to respond with "git bisect bad 7",
>> if you know 7 is bad.
>>
>> I _think_ the current code blindly overwrites the "bad" pointer,
>> making the bisection state into this graph if you do so.
>>
>>G...1---2---3---4
>> \
>>  5---B
>
> Yes, we keep only one "bad" pointer.
>
>> This is very suboptimal.  The side branch 4-to-7 could be much
>> longer than the original trunk 4-to-the-tip, in which case we would
>> have made the suspect space _larger_, not smaller.
>
> Yes, but the user is supposed to not change the "bad" pointer for no
> good reason.

That is irrelevant, no?  Nobody is questioning that the user is
supposed to judge if a commit is "good" or "bad" correctly.

And nobody sane is dreaming that "Git could do better and detect
user's mistakes when the user says 'bad' for a commit that is
actually 'good'"; if Git can do that, then it should be able to do
the bisect without any user input (including "bisect run") at all
;-).

>> We certainly should be able to take advantage of the fact that the
>> current "bad" commit (i.e. the child of 8) and the newly given "bad"
>> commit (i.e. 7) are both known to be bad and mark 4 as "bad" instead
>> when that happens, instead of doing the suboptimal thing the code
>> currently does.
>
> Yeah, we could do that, but we would have to allow it only if a
> special option is passed on the command line, for example:
> git bisect bad --alternate 

I am not quite sure if I am correctly getting what you meant to say,
but if you meant "only when --alternate is given, we should do the
merge-base thing; we should keep losing the current 'bad' and
replace it with the new one without the --alternate option", I would
see that as an exercise of a bad taste.

Because the merge-base thing is using both the current and the new
one, such a use is not "alternate" in the first place.

If the proposal were "with a new option, the user can say 'oh, I
made a mistake earlier and said that a commit that is not bad as
'bad'.  Let me replace the commit currently marked as 'bad' with
this one.", I would find it very sensible, actually.

I can see that such an operation can be called "alternate", but
"--fix" might be shorter-and-sweeter-and-to-the-point.

In the "normal" case, the commit we offer the user to check (and
respond with "git bisect bad" without any commit parameter) is
always an ancestor of the current 'bad', so the merge-base with
'bad' and the commit that was just checked would always be the
current commit.  Using the merge-base thing will be transparent to
the end users in the normal case, and when the user has off-line
knowledge that some other commit that is not an ancestor of the
current 'bad' commit is bad, the merge-base thing will give a better
behaviour than the current implementation that blindly replaces.

> and/or we could make "git bisect bad" accept any number of bad
> commitishs.

Yes, that is exactly what I meant.

The way I understand the Philip's point is that the user may have
a-priori knowledge that a breakage from the same cause appears in
both tips of these branches.  In such a case, we can start bisection
after marking the merge-base of two 'bad' commits, e.g. 4 in the
illustration in the message you are responding to, instead of
including 5, 6, and 8 in the suspect set.

You need to be careful, though.  An obvious pitfall is what you
should do when there is a criss-cross merge.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] t5312: test object deletion code paths in a corrupted repository

2015-03-17 Thread Johannes Sixt


Am 17.03.2015 um 08:28 schrieb Jeff King:

+test_expect_success 'create history reachable only from a bogus-named ref' '
+   test_tick && git commit --allow-empty -m master &&
+   base=$(git rev-parse HEAD) &&
+   test_tick && git commit --allow-empty -m bogus &&
+   bogus=$(git rev-parse HEAD) &&
+   git cat-file commit $bogus >saved &&
+   echo $bogus >.git/refs/heads/bogus:name &&


This causes headaches on Windows: It creates an empty file, named 
"bogus", with all the data diverted to the alternate data stream named 
"name". Needless to say that this...



+test_expect_success 'clean up bogus ref' '
+   rm .git/refs/heads/bogus:name
+'


does not remove the file "bogus", but only the alternate data stream (if 
at all---I forgot to check). How about .git/refs/heads/bogus..nam.e?


-- Hannes

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: FW: Time Date on file

2015-03-17 Thread Dennis Kaarsemaker

On di, 2015-03-17 at 13:19 -0400, Patrice Monette wrote:
> 
> I did not find any config, but is there one configuration somewhere to
> preserve the real date creation by author somewhere?

No, there is no such configuration as git does not track timestamps of
files.
-- 
Dennis Kaarsemaker
www.kaarsemaker.net

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2] diff-lib.c: adjust position of i-t-a entries in diff

2015-03-17 Thread Junio C Hamano

Duy Nguyen  writes:

> On Mon, Mar 16, 2015 at 09:05:45AM -0700, Junio C Hamano wrote:
>> The offending one came from eec3e7e4 (cache-tree: invalidate i-t-a
>> paths after generating trees, 2012-12-16), which was a fix to an
>> earlier bug where a cache-tree written out of an index with i-t-a
>> entries had incorrect information and still claimed it is fully
>> valid after write-tree rebuilt it.  The test probably should add
>> another path without i-t-a bit, run the same "diff --cached" with
>> updated expectation before write-tre, and run the "diff --cached"
>> again to make sure it produces a result that match the updated
>> expectation.
>
> Would adding another non-i-t-a entry help? Before this patch
> "diff --cached" after write-tree shows the i-t-a entry only when
> eec3e7e4 is applied. But with this patch we don't show i-t-a entry any
> more, before or after write-tree, eec3e7e4 makes no visible difference.
>
> We could even revert eec3e7e4 and the outcome of "diff --cached" would
> be the same because we just sort of move the "invalidation" part from
> cache-tree to do_oneway_diff(). Not invalidating would speed up "diff
> --cached" when i-t-a entries are present. Still it may be a good idea
> to invalidate i-t-a paths to be on the safe side. Perhaps a patch like
> this to resurrect the test?

My unerstanding of what eec3e7e4 (cache-tree: invalidate i-t-a paths
after generating trees, 2012-12-16) fixed was that in this sequence:

- You prepare an index.

- You write-tree out of the index, which involves:

  - updating the cache-tree to match the shape of the resulting
from writing the index out.

  - create tree objects matching all levels of the cache-tree as
needed on disk.

  - report the top-level tree object name

   - run "diff-index --cached", which can and will take advantage of
 the fact that everything in a subtree below a known-to-be-valid
 cache-tree entry does not have to be checked one-by-one.  If a
 cache-tree says "everything under D/ in the index would hash to
 tree object T" and the HEAD has tree object T at D/, then the
 diff machinery will bypass the entire section in the index
 under D/, which is a valid optimization.

 However, when there is an i-t-a entry, we excluded that entry
 from the tree object computation, its presence did not
 contribute to the tree object name, but still marked the
 cache-tree entries that contain it as valid by mistake.  This
 old bug was what the commit fixed, so an invocation of "diff
 --cached" after a write-tree, even if the index contains an
 i-t-a entry, will not see cache-tree entries that are marked
 valid when they are not.  Instead, "diff --cached" will bypass
 the optimization and makes comparison one-by-one for the index
 entries.

So reverting the fix obviously is not the right thing to do.  If the
tests show different results from two invocations of "diff --cached"
with your patch applied, there is something that is broken by your
patch, because the index and the HEAD does not change across
write-tree in that test.

If on the other hand the tests show the same result from these two
"diff --cached" and the result is different from what the test
expects, that means your patch changed the world order, i.e. an
i-t-a entry used to be treated as if it were adding an empty blob to
the index but it is now treated as non-existent, then that is a good
thing and the only thing we need to update is what the test expects.
I am guessing that instead of expecting dir/bar to be shown, it now
should expect no output?

Does adding an non-i-t-a entry help?  It does not hurt, and it makes
the test uses a non-empty output, making its effect more visible,
which may or may not count as helping.


 
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

FW: Time Date on file

2015-03-17 Thread Patrice Monette

I have address the following question to github support and the refer me to git 
documentation but I`m not too familiar with Git right now.  Here`s my question:

I did not find any config, but is there one configuration somewhere to preserve 
the real date creation by author somewhere?

Thanks in advance,

Patrice Monette
IT / Programmer ;  TI / Programmeur

E-mail:   patr...@steltec.ca
Phone:   450-971-5995Ext: 2221
Fax:   450-971-1865

-Original Message-
From: James Dennes (GitHub Staff) [mailto:supp...@github.com] 
Sent: February-16-15 4:46 PM
To: Patrice Monette
Subject: Re: Time Date on file

Hi there,

> I did not find any config, but is there one configuration somewhere to 
> preserve the real date creation by author somewhere?

Not that I'm aware of, however this would be something specific to Git, not 
just GitHub. You'd need to check the Git documentation.

Thanks,
James

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4] rev-list: refuse --first-parent combined with --bisect

2015-03-17 Thread Christian Couder

On Mon, Mar 16, 2015 at 10:05 PM, Junio C Hamano  wrote:
> "Philip Oakley"  writes:
>
>> From: "Junio C Hamano" 
>>
>>> Hence, if you have a history that looks like this:
>>>
>>>
>>>   G...1---2---3---4---6---8---B
>>>\
>>> 5---7---B
>>>
>>> it follows that 4 must also be "bad".  It used to be good long time
>>> ago somewhere before 1, and somewhere along way on the history,
>>> there was a single breakage event that we are hunting for.  That
>>> single event cannot be 5, 6, 7 or 8 because breakage at say 5 would
>>> not explain why the tip of the upper branch is broken---its breakage
>>> has no way to propagate there.  The breakage must have happened at 4
>>> or before that commit.
>>
>> Is it not worth at least confirming the assertion that 4 is bad before
>> proceding, or at least an option to confirm that in complex scenarios
>> where the fault may be devious.
>
> That raises a somewhat interesting tangent.
>
> Christian seems to be forever interested in bisect, so I'll add him
> to the Cc list ;-)
>
> There is no way to give multiple "bad" from the command line.  You
> can say "git bisect start rev rev rev..." but that gives only one
> bad and everything else is good.  And once you specify one of the
> above two bad ones (say, the child of 8), then we will not even
> offer the other one (i.e. the child of 7) as a candidate to be
> tested.  So in that sense, "confirm that 4 is bad before proceeding"
> is a moot point.
>
> However, you can say "git bisect bad " (and "git bisect good
> " for that matter) on a rev that is unrelated to what the
> current bisection state is.  E.g. after you mark the child of 8 as
> "bad", the bisected graph would become
>
>G...1---2---3---4---6---8---B
>
> and you would be offered to test somewhere in the middle, say, 4.
> But it is perfectly OK for you to respond with "git bisect bad 7",
> if you know 7 is bad.
>
> I _think_ the current code blindly overwrites the "bad" pointer,
> making the bisection state into this graph if you do so.
>
>G...1---2---3---4
> \
>  5---B

Yes, we keep only one "bad" pointer.

> This is very suboptimal.  The side branch 4-to-7 could be much
> longer than the original trunk 4-to-the-tip, in which case we would
> have made the suspect space _larger_, not smaller.

Yes, but the user is supposed to not change the "bad" pointer for no
good reason. For example maybe a mistake was made and the first commit
marked as "bad" was not actually bad.

> We certainly should be able to take advantage of the fact that the
> current "bad" commit (i.e. the child of 8) and the newly given "bad"
> commit (i.e. 7) are both known to be bad and mark 4 as "bad" instead
> when that happens, instead of doing the suboptimal thing the code
> currently does.

Yeah, we could do that, but we would have to allow it only if a
special option is passed on the command line, for example:

git bisect bad --alternate 

and/or we could make "git bisect bad" accept any number of bad commitishs.

That could give additional bonus points to the GSoC student who would
implement it :-)

Thanks,
Christian.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 11/14] opt_arg(): report errors parsing option values

2015-03-17 Thread Michael Haggerty

If an argument is there, but it can't be parsed as a non-positive
number, then die() rather than returning 0.

Signed-off-by: Michael Haggerty 
---
 diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 77c7acb..03cdabf 100644
--- a/diff.c
+++ b/diff.c
@@ -3368,7 +3368,7 @@ static int opt_arg(const char *arg, int arg_short, const 
char *arg_long, int *va
if (!c)
return 1; /* optional argument was missing */
if (convert_i(arg, 10, val))
-   return 0;
+   die("The value for -%c must be a non-negative integer", 
arg_short);
return 1;
}
if (c != '-')
@@ -3381,7 +3381,7 @@ static int opt_arg(const char *arg, int arg_short, const 
char *arg_long, int *va
if (!*eq)
return 1; /* '=' and optional argument were missing */
if (convert_i(eq + 1, 10, val))
-   return 0;
+   die("The value for --%s must be a non-negative integer", 
arg_long);
return 1;
 }
 
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 07/14] handle_revision_opt(): use convert_ui() when handling "--abbrev="

2015-03-17 Thread Michael Haggerty

This adds error checking, where previously there was none. It also
disallows '+' and '-', leading whitespace, and trailing junk.

Signed-off-by: Michael Haggerty 
---
 revision.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/revision.c b/revision.c
index 4908e66..a6f7c2e 100644
--- a/revision.c
+++ b/revision.c
@@ -1963,7 +1963,8 @@ static int handle_revision_opt(struct rev_info *revs, int 
argc, const char **arg
} else if (!strcmp(arg, "--abbrev")) {
revs->abbrev = DEFAULT_ABBREV;
} else if (skip_prefix(arg, "--abbrev=", &arg)) {
-   revs->abbrev = strtoul(arg, NULL, 10);
+   if (convert_ui(arg, 10, &revs->abbrev))
+   die("--abbrev requires a non-negative integer 
argument");
if (revs->abbrev < MINIMUM_ABBREV)
revs->abbrev = MINIMUM_ABBREV;
else if (revs->abbrev > 40)
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 13/14] diff_opt_parse(): use convert_i() when handling "-l"

2015-03-17 Thread Michael Haggerty

die() with an error message if the argument is not a non-negative
integer. This change tightens up parsing: '+' and '-', leading
whitespace, and trailing junk are all disallowed now.

Signed-off-by: Michael Haggerty 
---
 diff.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index bc1e3c3..be389ae 100644
--- a/diff.c
+++ b/diff.c
@@ -3799,7 +3799,8 @@ int diff_opt_parse(struct diff_options *options, const 
char **av, int ac)
else if (!strcmp(arg, "-z"))
options->line_termination = 0;
else if ((argcount = short_opt('l', av, &optarg))) {
-   options->rename_limit = strtoul(optarg, NULL, 10);
+   if (convert_i(optarg, 10, &options->rename_limit))
+   die("-l requires a non-negative integer argument");
return argcount;
}
else if ((argcount = short_opt('S', av, &optarg))) {
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 09/14] opt_arg(): val is always non-NULL

2015-03-17 Thread Michael Haggerty

opt_arg() is never called with val set to NULL, so remove the code for
handling that eventuality (which anyway wasn't handled consistently in
the function).

Signed-off-by: Michael Haggerty 
---
 diff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index a350677..6e3f498 100644
--- a/diff.c
+++ b/diff.c
@@ -3367,7 +3367,7 @@ static int opt_arg(const char *arg, int arg_short, const 
char *arg_long, int *va
c = *++arg;
if (!c)
return 1;
-   if (val && isdigit(c)) {
+   if (isdigit(c)) {
char *end;
int n = strtoul(arg, &end, 10);
if (*end)
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 05/14] handle_revision_opt(): use convert_i() when handling "-"

2015-03-17 Thread Michael Haggerty

This tightens up the parsing a bit:

* Leading whitespace is no longer allowed
* '+' and '-' are no longer allowed

It also removes the need to check separately that max_count is
non-negative.

Signed-off-by: Michael Haggerty 
---
 revision.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/revision.c b/revision.c
index 25838fc..4908e66 100644
--- a/revision.c
+++ b/revision.c
@@ -1,4 +1,5 @@
 #include "cache.h"
+#include "numparse.h"
 #include "tag.h"
 #include "blob.h"
 #include "tree.h"
@@ -1709,8 +1710,7 @@ static int handle_revision_opt(struct rev_info *revs, int 
argc, const char **arg
return argcount;
} else if ((*arg == '-') && isdigit(arg[1])) {
/* accept -, like traditional "head" */
-   if (strtol_i(arg + 1, 10, &revs->max_count) < 0 ||
-   revs->max_count < 0)
+   if (convert_i(arg + 1, 10, &revs->max_count))
die("'%s': not a non-negative integer", arg + 1);
revs->no_walk = 0;
} else if (!strcmp(arg, "-n")) {
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 14/14] diff_opt_parse(): use convert_i() when handling --abbrev=

2015-03-17 Thread Michael Haggerty

die() with an error message if the argument is not a non-negative
integer. This change tightens up parsing: '+' and '-', leading
whitespace, and trailing junk are all disallowed now.

Signed-off-by: Michael Haggerty 
---
 diff.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index be389ae..1cc5428 100644
--- a/diff.c
+++ b/diff.c
@@ -3830,7 +3830,8 @@ int diff_opt_parse(struct diff_options *options, const 
char **av, int ac)
else if (!strcmp(arg, "--abbrev"))
options->abbrev = DEFAULT_ABBREV;
else if (skip_prefix(arg, "--abbrev=", &arg)) {
-   options->abbrev = strtoul(arg, NULL, 10);
+   if (convert_i(arg, 10, &options->abbrev))
+   die("--abbrev requires an integer argument");
if (options->abbrev < MINIMUM_ABBREV)
options->abbrev = MINIMUM_ABBREV;
else if (40 < options->abbrev)
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 10/14] opt_arg(): use convert_i() in implementation

2015-03-17 Thread Michael Haggerty

This shortens the code and avoids the old code's careless truncation
from unsigned long to int.

Signed-off-by: Michael Haggerty 
---
 diff.c | 28 
 1 file changed, 8 insertions(+), 20 deletions(-)

diff --git a/diff.c b/diff.c
index 6e3f498..77c7acb 100644
--- a/diff.c
+++ b/diff.c
@@ -3366,16 +3366,10 @@ static int opt_arg(const char *arg, int arg_short, 
const char *arg_long, int *va
if (c == arg_short) {
c = *++arg;
if (!c)
-   return 1;
-   if (isdigit(c)) {
-   char *end;
-   int n = strtoul(arg, &end, 10);
-   if (*end)
-   return 0;
-   *val = n;
-   return 1;
-   }
-   return 0;
+   return 1; /* optional argument was missing */
+   if (convert_i(arg, 10, val))
+   return 0;
+   return 1;
}
if (c != '-')
return 0;
@@ -3384,16 +3378,10 @@ static int opt_arg(const char *arg, int arg_short, 
const char *arg_long, int *va
len = eq - arg;
if (!len || strncmp(arg, arg_long, len))
return 0;
-   if (*eq) {
-   int n;
-   char *end;
-   if (!isdigit(*++eq))
-   return 0;
-   n = strtoul(eq, &end, 10);
-   if (*end)
-   return 0;
-   *val = n;
-   }
+   if (!*eq)
+   return 1; /* '=' and optional argument were missing */
+   if (convert_i(eq + 1, 10, val))
+   return 0;
return 1;
 }
 
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 01/14] numparse: new module for parsing integral numbers

2015-03-17 Thread Michael Haggerty

Implement wrappers for strtol() and strtoul() that are safer and more
convenient to use.

Signed-off-by: Michael Haggerty 
---
 Makefile   |   1 +
 numparse.c | 180 +
 numparse.h | 207 +
 3 files changed, 388 insertions(+)
 create mode 100644 numparse.c
 create mode 100644 numparse.h

diff --git a/Makefile b/Makefile
index 44f1dd1..6c0cfcc 100644
--- a/Makefile
+++ b/Makefile
@@ -732,6 +732,7 @@ LIB_OBJS += notes.o
 LIB_OBJS += notes-cache.o
 LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
+LIB_OBJS += numparse.o
 LIB_OBJS += object.o
 LIB_OBJS += pack-bitmap.o
 LIB_OBJS += pack-bitmap-write.o
diff --git a/numparse.c b/numparse.c
new file mode 100644
index 000..90b44ce
--- /dev/null
+++ b/numparse.c
@@ -0,0 +1,180 @@
+#include "git-compat-util.h"
+#include "numparse.h"
+
+#define NUM_NEGATIVE (1 << 16)
+
+
+static int parse_precheck(const char *s, unsigned int *flags)
+{
+   const char *number;
+
+   if (isspace(*s)) {
+   if (!(*flags & NUM_LEADING_WHITESPACE))
+   return -NUM_LEADING_WHITESPACE;
+   do {
+   s++;
+   } while (isspace(*s));
+   }
+
+   if (*s == '+') {
+   if (!(*flags & NUM_PLUS))
+   return -NUM_PLUS;
+   number = s + 1;
+   *flags &= ~NUM_NEGATIVE;
+   } else if (*s == '-') {
+   if (!(*flags & NUM_MINUS))
+   return -NUM_MINUS;
+   number = s + 1;
+   *flags |= NUM_NEGATIVE;
+   } else {
+   number = s;
+   *flags &= ~NUM_NEGATIVE;
+   }
+
+   if (!(*flags & NUM_BASE_SPECIFIER)) {
+   int base = *flags & NUM_BASE_MASK;
+   if (base == 0) {
+   /* This is a pointless combination of options. */
+   die("BUG: base=0 specified without NUM_BASE_SPECIFIER");
+   } else if (base == 16 && starts_with(number, "0x")) {
+   /*
+* We want to treat this as zero terminated by
+* an 'x', whereas strtol()/strtoul() would
+* silently eat the "0x". We accomplish this
+* by treating it as a base 10 number:
+*/
+   *flags = (*flags & ~NUM_BASE_MASK) | 10;
+   }
+   }
+   return 0;
+}
+
+int parse_l(const char *s, unsigned int flags, long *result, char **endptr)
+{
+   long l;
+   const char *end;
+   int err = 0;
+
+   err = parse_precheck(s, &flags);
+   if (err)
+   return err;
+
+   /*
+* Now let strtol() do the heavy lifting:
+*/
+   errno = 0;
+   l = strtol(s, (char **)&end, flags & NUM_BASE_MASK);
+   if (errno) {
+   if (errno == ERANGE) {
+   if (!(flags & NUM_SATURATE))
+   return -NUM_SATURATE;
+   } else {
+   return -NUM_OTHER_ERROR;
+   }
+   }
+   if (end == s)
+   return -NUM_NO_DIGITS;
+
+   if (*end && !(flags & NUM_TRAILING))
+   return -NUM_TRAILING;
+
+   /* Everything was OK */
+   *result = l;
+   if (endptr)
+   *endptr = (char *)end;
+   return 0;
+}
+
+int parse_ul(const char *s, unsigned int flags,
+unsigned long *result, char **endptr)
+{
+   unsigned long ul;
+   const char *end;
+   int err = 0;
+
+   err = parse_precheck(s, &flags);
+   if (err)
+   return err;
+
+   /*
+* Now let strtoul() do the heavy lifting:
+*/
+   errno = 0;
+   ul = strtoul(s, (char **)&end, flags & NUM_BASE_MASK);
+   if (errno) {
+   if (errno == ERANGE) {
+   if (!(flags & NUM_SATURATE))
+   return -NUM_SATURATE;
+   } else {
+   return -NUM_OTHER_ERROR;
+   }
+   }
+   if (end == s)
+   return -NUM_NO_DIGITS;
+
+   /*
+* strtoul(), perversely, accepts negative numbers, converting
+* them to the positive number with the same bit pattern. We
+* don't ever want that.
+*/
+   if ((flags & NUM_NEGATIVE) && ul) {
+   if (!(flags & NUM_SATURATE))
+   return -NUM_SATURATE;
+   ul = 0;
+   }
+
+   if (*end && !(flags & NUM_TRAILING))
+   return -NUM_TRAILING;
+
+   /* Everything was OK */
+   *result = ul;
+   if (endptr)
+   *endptr = (char *)end;
+   return 0;
+}
+
+int parse_i(const char *s, unsigned int flags, int *result, char **endptr)
+{
+   long l;
+   int err;
+   char *end;
+
+   err = parse_l(s, flags, &l, &end);
+

[PATCH 12/14] opt_arg(): simplify pointer handling

2015-03-17 Thread Michael Haggerty

Increment "arg" when a character is consumed, not just before consuming
the next character.

Signed-off-by: Michael Haggerty 
---
 diff.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/diff.c b/diff.c
index 03cdabf..bc1e3c3 100644
--- a/diff.c
+++ b/diff.c
@@ -3358,14 +3358,13 @@ static int opt_arg(const char *arg, int arg_short, 
const char *arg_long, int *va
char c, *eq;
int len;
 
-   if (*arg != '-')
+   if (*arg++ != '-')
return 0;
-   c = *++arg;
+   c = *arg++;
if (!c)
return 0;
if (c == arg_short) {
-   c = *++arg;
-   if (!c)
+   if (!*arg)
return 1; /* optional argument was missing */
if (convert_i(arg, 10, val))
die("The value for -%c must be a non-negative integer", 
arg_short);
@@ -3373,7 +3372,6 @@ static int opt_arg(const char *arg, int arg_short, const 
char *arg_long, int *va
}
if (c != '-')
return 0;
-   arg++;
eq = strchrnul(arg, '=');
len = eq - arg;
if (!len || strncmp(arg, arg_long, len))
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 08/14] builtin_diff(): detect errors when parsing --unified argument

2015-03-17 Thread Michael Haggerty

The previous code used strtoul() without any checks that it succeeded.
Instead use convert_l(), in strict mode, and die() if there is an
error. This tightens up the parsing:

* Leading whitespace is no longer allowed
* '+' and '-' are no longer allowed
* Trailing junk is not allowed

Signed-off-by: Michael Haggerty 
---
 diff.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/diff.c b/diff.c
index abc32c8..a350677 100644
--- a/diff.c
+++ b/diff.c
@@ -2,6 +2,7 @@
  * Copyright (C) 2005 Junio C Hamano
  */
 #include "cache.h"
+#include "numparse.h"
 #include "quote.h"
 #include "diff.h"
 #include "diffcore.h"
@@ -2393,12 +2394,12 @@ static void builtin_diff(const char *name_a,
xecfg.flags |= XDL_EMIT_FUNCCONTEXT;
if (pe)
xdiff_set_find_func(&xecfg, pe->pattern, pe->cflags);
-   if (!diffopts)
-   ;
-   else if (skip_prefix(diffopts, "--unified=", &v))
-   xecfg.ctxlen = strtoul(v, NULL, 10);
-   else if (skip_prefix(diffopts, "-u", &v))
-   xecfg.ctxlen = strtoul(v, NULL, 10);
+   if (diffopts
+   && (skip_prefix(diffopts, "--unified=", &v) ||
+   skip_prefix(diffopts, "-u", &v))) {
+   if (convert_l(v, 10, &xecfg.ctxlen))
+   die("--unified argument must be a non-negative 
integer");
+   }
if (o->word_diff)
init_diff_words_data(&ecbdata, o, one, two);
xdi_diff_outf(&mf1, &mf2, fn_out_consume, &ecbdata,
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 04/14] handle_revision_opt(): use skip_prefix() in many places

2015-03-17 Thread Michael Haggerty

This reduces the need for "magic numbers".

Signed-off-by: Michael Haggerty 
---
 revision.c | 59 ++-
 1 file changed, 30 insertions(+), 29 deletions(-)

diff --git a/revision.c b/revision.c
index 66520c6..25838fc 100644
--- a/revision.c
+++ b/revision.c
@@ -1719,8 +1719,8 @@ static int handle_revision_opt(struct rev_info *revs, int 
argc, const char **arg
revs->max_count = atoi(argv[1]);
revs->no_walk = 0;
return 2;
-   } else if (starts_with(arg, "-n")) {
-   revs->max_count = atoi(arg + 2);
+   } else if (skip_prefix(arg, "-n", &arg)) {
+   revs->max_count = atoi(arg);
revs->no_walk = 0;
} else if ((argcount = parse_long_opt("max-age", argv, &optarg))) {
revs->max_age = atoi(optarg);
@@ -1779,15 +1779,15 @@ static int handle_revision_opt(struct rev_info *revs, 
int argc, const char **arg
} else if (!strcmp(arg, "--author-date-order")) {
revs->sort_order = REV_SORT_BY_AUTHOR_DATE;
revs->topo_order = 1;
-   } else if (starts_with(arg, "--early-output")) {
+   } else if (skip_prefix(arg, "--early-output", &arg)) {
int count = 100;
-   switch (arg[14]) {
+   switch (*arg++) {
case '=':
-   count = atoi(arg+15);
+   count = atoi(arg);
/* Fallthrough */
case 0:
revs->topo_order = 1;
-  revs->early_output = count;
+   revs->early_output = count;
}
} else if (!strcmp(arg, "--parents")) {
revs->rewrite_parents = 1;
@@ -1804,12 +1804,12 @@ static int handle_revision_opt(struct rev_info *revs, 
int argc, const char **arg
revs->min_parents = 2;
} else if (!strcmp(arg, "--no-merges")) {
revs->max_parents = 1;
-   } else if (starts_with(arg, "--min-parents=")) {
-   revs->min_parents = atoi(arg+14);
+   } else if (skip_prefix(arg, "--min-parents=", &arg)) {
+   revs->min_parents = atoi(arg);
} else if (starts_with(arg, "--no-min-parents")) {
revs->min_parents = 0;
-   } else if (starts_with(arg, "--max-parents=")) {
-   revs->max_parents = atoi(arg+14);
+   } else if (skip_prefix(arg, "--max-parents=", &arg)) {
+   revs->max_parents = atoi(arg);
} else if (starts_with(arg, "--no-max-parents")) {
revs->max_parents = -1;
} else if (!strcmp(arg, "--boundary")) {
@@ -1891,26 +1891,27 @@ static int handle_revision_opt(struct rev_info *revs, 
int argc, const char **arg
revs->verbose_header = 1;
revs->pretty_given = 1;
get_commit_format(NULL, revs);
-   } else if (starts_with(arg, "--pretty=") || starts_with(arg, 
"--format=")) {
+   } else if (skip_prefix(arg, "--pretty=", &arg) ||
+  skip_prefix(arg, "--format=", &arg)) {
/*
 * Detached form ("--pretty X" as opposed to "--pretty=X")
 * not allowed, since the argument is optional.
 */
revs->verbose_header = 1;
revs->pretty_given = 1;
-   get_commit_format(arg+9, revs);
+   get_commit_format(arg, revs);
} else if (!strcmp(arg, "--show-notes") || !strcmp(arg, "--notes")) {
revs->show_notes = 1;
revs->show_notes_given = 1;
revs->notes_opt.use_default_notes = 1;
} else if (!strcmp(arg, "--show-signature")) {
revs->show_signature = 1;
-   } else if (!strcmp(arg, "--show-linear-break") ||
-  starts_with(arg, "--show-linear-break=")) {
-   if (starts_with(arg, "--show-linear-break="))
-   revs->break_bar = xstrdup(arg + 20);
-   else
-   revs->break_bar = "..";
+   } else if (!strcmp(arg, "--show-linear-break")) {
+   revs->break_bar = "..";
+   revs->track_linear = 1;
+   revs->track_first_time = 1;
+   } else if (skip_prefix(arg, "--show-linear-break=", &arg)) {
+   revs->break_bar = xstrdup(arg);
revs->track_linear = 1;
revs->track_first_time = 1;
} else if (starts_with(arg, "--show-notes=") ||
@@ -1961,8 +1962,8 @@ static int handle_revision_opt(struct rev_info *revs, int 
argc, const char **arg
revs->abbrev = 0;
} else if (!strcmp(arg, "--abbrev")) {
revs->abbrev = DEFAULT_ABBREV;
-   } else if (starts_with(arg, "--abbrev=")) {
-   revs->abbrev = strtoul(arg + 9, NULL, 10);
+   } else if (skip_prefix(arg,

[PATCH 03/14] write_subdirectory(): use convert_ui() for parsing mode

2015-03-17 Thread Michael Haggerty

Use convert_ui() instead of strtoul_ui() when parsing tree entries'
modes. This tightens up the parsing a bit:

* Leading whitespace is no longer allowed
* '+' and '-' are no longer allowed

Signed-off-by: Michael Haggerty 
---
 contrib/convert-objects/convert-objects.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/convert-objects/convert-objects.c 
b/contrib/convert-objects/convert-objects.c
index f3b57bf..4f484a4 100644
--- a/contrib/convert-objects/convert-objects.c
+++ b/contrib/convert-objects/convert-objects.c
@@ -1,4 +1,5 @@
 #include "cache.h"
+#include "numparse.h"
 #include "blob.h"
 #include "commit.h"
 #include "tree.h"
@@ -88,7 +89,7 @@ static int write_subdirectory(void *buffer, unsigned long 
size, const char *base
unsigned int mode;
char *slash, *origpath;
 
-   if (!path || strtoul_ui(buffer, 8, &mode))
+   if (!path || convert_ui(buffer, 8, &mode))
die("bad tree conversion");
mode = convert_mode(mode);
path++;
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 06/14] strtoul_ui(), strtol_i(): remove functions

2015-03-17 Thread Michael Haggerty

Their callers have been changed to use the numparse module.

Signed-off-by: Michael Haggerty 
---
 git-compat-util.h | 26 --
 1 file changed, 26 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index a3095be..cbe7f16 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -751,32 +751,6 @@ static inline int sane_iscase(int x, int is_lower)
return (x & 0x20) == 0;
 }
 
-static inline int strtoul_ui(char const *s, int base, unsigned int *result)
-{
-   unsigned long ul;
-   char *p;
-
-   errno = 0;
-   ul = strtoul(s, &p, base);
-   if (errno || *p || p == s || (unsigned int) ul != ul)
-   return -1;
-   *result = ul;
-   return 0;
-}
-
-static inline int strtol_i(char const *s, int base, int *result)
-{
-   long ul;
-   char *p;
-
-   errno = 0;
-   ul = strtol(s, &p, base);
-   if (errno || *p || p == s || (int) ul != ul)
-   return -1;
-   *result = ul;
-   return 0;
-}
-
 #ifdef INTERNAL_QSORT
 void git_qsort(void *base, size_t nmemb, size_t size,
   int(*compar)(const void *, const void *));
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 00/14] numparse module: systematically tighten up integer parsing

2015-03-17 Thread Michael Haggerty

Sorry for the long email. Feel free to skip over the background info
and continue reading at "My solution" below.

This odyssey started with a typo:

$ git shortlog -snl v2.2.0..v2.3.0
 14795  Junio C Hamano
  1658  Jeff King
  1400  Shawn O. Pearce
  1109  Linus Torvalds
   760  Jonathan Nieder
[...]

It took me a minute to realize why so many commits are listed. It
turns out that "-l", which I added by accident, requires an integer
argument, but it happily swallows "v2.2.0..v2.3.0" without error. This
leaves no range argument, so the command traverses the entire history
of HEAD.

The relevant code is

else if ((argcount = short_opt('l', av, &optarg))) {
options->rename_limit = strtoul(optarg, NULL, 10);
return argcount;
}

, which is broken in many ways. First of all, strtoul() is way too
permissive for simple tasks like this:

* It allows leading whitespace.

* It allows arbitrary trailing characters.

* It allows a leading sign character ('+' or '-'), even though the
  result is unsigned.

* If the string doesn't contain a number at all, it sets its "endptr"
  argument to point at the start of the string but doesn't set errno.

* If the value is larger than fits in an unsigned long, it returns the
  value clamped to the range 0..ULONG_MAX (setting errno to ERANGE).

* If the value is between -ULONG_MAX and 0, it returns the positive
  integer with the same bit pattern, without setting errno(!) (I can
  hardly think of a scenario where this would be useful.)

* For base=0 (autodetect base), it allows a base specifier prefix "0x"
  or "0" and parses the number accordingly. For base=16 it also allows
  a "0x" prefix.

strtol() has similar foibles.

The caller compounds the permissivity of strtoul() with further sins:

* It does absolutely no error detection.

* It assigns the return value, which is an unsigned long, to an int
  value. This could truncate the result, perhaps even resulting in
  rename_limit being set to a negative value.

When I looked around, I found scores of sites that call atoi(),
strtoul(), and strtol() carelessly. And it's understandable. Calling
any of these functions safely is so much work as to be completely
impractical in day-to-day code.

git-compat-util.h has two functions, strtoul_ui() and strtol_i(), that
try to make parsing integers a little bit easier. But they are only
used in a few places, they hardly restrict the pathological
flexibility of strtoul()/strtol(), strtoul_ui() doesn't implement
clamping consistently when converting from unsigned long to unsigned
int, and neither function can be used when the integer value *is*
followed by other characters.


My solution: the numparse module

So I hereby propose a new module, numparse, to make it easier to parse
integral values from strings in a convenient, safe, and flexible way.
The first commit introduces the new module, and subsequent commits
change a sample (a small fraction!) of call sites to use the new
functions. Consider it a proof-of-concept. If people are OK with this
approach, I will continue sending patches to fix other call sites. (I
already have another two dozen such patches in my repo).

Please see the docstrings in numparse.h in the first commit for
detailed API docs. Briefly, there are eight new functions:

parse_{l,ul,i,ui}(const char *s, unsigned int flags,
  T *result, char **endptr);
convert_{l,ul,i,ui}(const char *s, unsigned int flags, T *result);

The parse_*() functions are for parsing a number from the start of a
string; the convert_*() functions are for parsing a string that
consists of a single number. The flags argument selects not only the
base of the number, but also which of strtol()/strtoul()'s many
features should be allowed. The *_i() and *.ui() functions are for
parsing int and unsigned int values; they are careful about how they
truncate the corresponding long values. The functions all return 0 on
success and a negative number on error.

Here are a few examples of how these functions can be used (taken from
the header of numparse.h):

* Convert hexadecimal string s into an unsigned int. Die if there are
  any characters in s besides hexadecimal digits, or if the result
  exceeds the range of an unsigned int:

if (convert_ui(s, 16, &result))
die("...");

* Read a base-ten long number from the front of a string, allowing
  sign characters and setting endptr to point at any trailing
  characters:

if (parse_l(s, 10 | NUM_SIGN | NUM_TRAILING, &result, &endptr))
die("...");

* Convert decimal string s into a signed int, but not allowing the
  string to contain a '+' or '-' prefix (and thereby indirectly
  ensuring that the result will be non-negative):

if (convert_i(s, 10, &result))
die("...");

* Convert s into a signed int, interpreting prefix "0x" to mean
  hexadecimal and "0" to mean octal. If the value doesn't fit in

[PATCH 02/14] cacheinfo_callback(): use convert_ui() when handling "--cacheinfo"

2015-03-17 Thread Michael Haggerty

Use convert_ui() instead of strtoul_ui() to parse the  argument.
This tightens up the parsing a bit:

* Leading whitespace is no longer allowed
* '+' and '-' are no longer allowed

Signed-off-by: Michael Haggerty 
---
 builtin/update-index.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index 5878986..845e944 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -4,6 +4,7 @@
  * Copyright (C) Linus Torvalds, 2005
  */
 #include "cache.h"
+#include "numparse.h"
 #include "lockfile.h"
 #include "quote.h"
 #include "cache-tree.h"
@@ -672,7 +673,7 @@ static int cacheinfo_callback(struct parse_opt_ctx_t *ctx,
}
if (ctx->argc <= 3)
return error("option 'cacheinfo' expects ,,");
-   if (strtoul_ui(*++ctx->argv, 8, &mode) ||
+   if (convert_ui(*++ctx->argv, 8, &mode) ||
get_sha1_hex(*++ctx->argv, sha1) ||
add_cacheinfo(mode, sha1, *++ctx->argv, 0))
die("git update-index: --cacheinfo cannot add %s", *ctx->argv);
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2] diff-lib.c: adjust position of i-t-a entries in diff

2015-03-17 Thread Duy Nguyen

On Mon, Mar 16, 2015 at 09:05:45AM -0700, Junio C Hamano wrote:
> The offending one came from eec3e7e4 (cache-tree: invalidate i-t-a
> paths after generating trees, 2012-12-16), which was a fix to an
> earlier bug where a cache-tree written out of an index with i-t-a
> entries had incorrect information and still claimed it is fully
> valid after write-tree rebuilt it.  The test probably should add
> another path without i-t-a bit, run the same "diff --cached" with
> updated expectation before write-tre, and run the "diff --cached"
> again to make sure it produces a result that match the updated
> expectation.

Would adding another non-i-t-a entry help? Before this patch
"diff --cached" after write-tree shows the i-t-a entry only when
eec3e7e4 is applied. But with this patch we don't show i-t-a entry any
more, before or after write-tree, eec3e7e4 makes no visible difference.

We could even revert eec3e7e4 and the outcome of "diff --cached" would
be the same because we just sort of move the "invalidation" part from
cache-tree to do_oneway_diff(). Not invalidating would speed up "diff
--cached" when i-t-a entries are present. Still it may be a good idea
to invalidate i-t-a paths to be on the safe side. Perhaps a patch like
this to resurrect the test?

-- 8< --
Subject: t2203: enable 'cache-tree invalidates i-t-a paths' test

This test is disabled in the previous patch because the "diff
--cached" expectation is the same, with or without eec3e7e4 [1] where
this test is added.

But it still is a good idea to invalidate i-t-a paths because the
index does _not_ match the cached-tree exactly.  "diff --cached" may
not care, but other operations might. Update the test to catch this
invalidation instead.

[1] cache-tree: invalidate i-t-a paths after generating trees, 2012-12-16

Signed-off-by: Nguyễn Thái Ngọc Duy 

diff --git a/t/t2203-add-intent.sh b/t/t2203-add-intent.sh
index 42827b8..1a6c814 100755
--- a/t/t2203-add-intent.sh
+++ b/t/t2203-add-intent.sh
@@ -77,7 +77,7 @@ test_expect_success 'can "commit -a" with an i-t-a entry' '
git commit -a -m all
 '
 
-test_expect_failure 'cache-tree invalidates i-t-a paths' '
+test_expect_success 'cache-tree invalidates i-t-a paths' '
git reset --hard &&
mkdir dir &&
: >dir/foo &&
@@ -86,14 +86,14 @@ test_expect_failure 'cache-tree invalidates i-t-a paths' '
 
: >dir/bar &&
git add -N dir/bar &&
-   git diff --cached --name-only >actual &&
-   echo dir/bar >expect &&
-   test_cmp expect actual &&
-
git write-tree >/dev/null &&
-
-   git diff --cached --name-only >actual &&
-   echo dir/bar >expect &&
+   test-dump-cache-tree >actual &&
+   cat >expect <<-\EOF &&
+   invalid   (1 subtrees)
+   invalid  #(ref)  (1 subtrees)
+   invalid  dir/ (0 subtrees)
+   invalid  #(ref) dir/ (0 subtrees)
+   EOF
test_cmp expect actual
 ' 
-- 8< --
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH/RFC/GSOC] make git-pull a builtin

2015-03-17 Thread Paul Tan

The git-pull command is currently implemented by the shell script
git-pull.sh. However, git-pull.sh suffers from some problems:

* Inconsistent command line interface: since git-pull.sh does not use
  git's internal parse-options API, it's command line interface is
  inconsistent with other git builtins. For example, the option --st
  should throw an ambigious option error as it could mean --stat or
  --strategy, but git-pull.sh is hardcoded to assume --st resolves to
  --strategy.  It also does not list the command line options in its
  usage help.

* Inefficient: shell scripts can only access git functionality through
  spawning git processes, which has a relatively high overhead on
  certain operating systems such as Windows. Furthermore, the repeated
  starting up of git means that commonly-used files such config files,
  the index file and often accessed objects/packfiles are repeatedly
  read, which can cause poor performance on filesystems or operating
  systems with slow IO operations.

* Makes it difficult to port git: git-pull.sh requires a posix
  conformant shell and utilities such as sed and tr. This makes it
  harder to port git to non-posix environments.

By making git-pull a builtin, it has access to git's internal API and
thus solves the above problems:

* Usage of parse-options API means greater consistency of the
  command-line interface with other git commands.

* No spawning of external git processes: internal API and builtins are
  called directly. This also means that they use the same underlying
  object/config cache, which reduces needless IO/parsing operations.

* Low-level control over code execution and thus greater efficiency. For
  example commit SHA1s now only need to be parsed from a string once
  (when resolving branch refs), while as a shell script they would have
  to be stored as strings and parsed over and over again.

Furthermore, this reduces git's dependence on posix shells and
utilities, which would be very helpful in porting git to non-posix
environments such as Windows.

Signed-off-by: Paul Tan 
---

Hi, I would like to share this very rough prototype with everyone. All
the relevant tests (t5520-t5572) pass.

I did a quick benchmark on my Linux and Windows systems, which have
nearly the same hardware, by running 'git pull' 100 times on a repo and
its clone with 1 commit. On Linux, there is negligible performance
difference, but on Windows the runtime fell from 8m 25s to 1m 3s. The
timings should not be affected by any msys2 emulation as the git
processes were spawned from a native executable. I started this as a
just-for-fun exercise to learn about the git internal API, but seeing as
the results are very encouraging, I'm tempted to take on this project
for GSoC.

Of course, it would be easy to make any shell script a builtin by using
the run_command* functions, but I feel that that would not take
advantage of the potential benefits in using git's internal API
directly. So, I set myself a few requirements for the ideal end result:

* zero spawning of processes so that the internal object, index and
config cache can be taken advantage of. Some operating systems (like
Windows) also have a relatively large overhead when spawning
processes[1].

* avoid needless parsing/string conversions by directly accessing the C
data structures. Massaging of strings cause too much code churn, so
functions that take the richer C data structures as input should be
preferred over functions that work by parsing string arguments (e.g.
prefer using the run_diff_files() function instead of calling
cmd_diff_files or spawning a git diff-files process)

* use the internal API as much as possible: share code between the
builtins (e.g. builtin fmt-merge-msg, exposed in fmt-merge-msg.h)
in order to reduce code complexity. For example, the fork_point() code
in this patch should be shared with the merge-base builtin.

[1] http://stackoverflow.com/questions/10710912

However, there were some problems encountered when trying to fulfill the
requirements above:

* Many builtins only provide their main entry point to access their
functionality -- this means that argument options are parsed, put back
together, and then re-parsed again, which is quite inefficient and can
create a lot of code complexity. A better solution would be to expose
the functionality of these builtins as a separate function without
argument parsing, similar to how fmt_merge_msg is implemented.

* Some buitins do not expect to be called by their cmd_* functions
through code paths other than git.c. For example, fetch.c actually
frees the strings it pulls from argv. This patch works around these
functions by not attempting to free memory passed to these functions,
but a more comprehensive solution would be to investigate and fix
these functions. (Or expose their functionality in as a separate function)

* Finally, there is the possibility that in the process of conversion
bugs or incompatible behavioral changes may be introduced whic

Re: [PATCH v4 1/2] cat-file: teach cat-file a '--literally' option

2015-03-17 Thread Karthik Nayak



On March 17, 2015 12:21:33 PM GMT+05:30, Eric Sunshine 
 wrote:
>On Tue, Mar 17, 2015 at 1:16 AM, Karthik Nayak 
>wrote:
>> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
>> index df99df4..1625246 100644
>> --- a/builtin/cat-file.c
>> +++ b/builtin/cat-file.c
>> @@ -323,8 +338,8 @@ static int batch_objects(struct batch_options
>*opt)
>>  }
>
>I don't presently have time for a proper read-through, however, this
>popped out while quickly running my eyes over the changes.
>
>>  static const char * const cat_file_usage[] = {
>> -   N_("git cat-file (-t | -s | -e | -p |  | --textconv)
>"),
>> -   N_("git cat-file (--batch | --batch-check) <
>"),
>> +   N_("git cat-file (-t [--literally]|-s
>[--literally]|-e|-p||--textconv) "),
>> +   N_("git cat-file (--batch | --batch-check)
>"),
>
>The '<' in the second line before  is intentional. It
>means that  is provided via stdin. Therefore, it is
>incorrect to remove it.
I see. It seemed out of place. Makes sense, probably should have asked. 

Thanks.
>
>> NULL
>>  };
>>

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH/GSoC/RFC] fetch: git fetch --deepen

2015-03-17 Thread Dongcan Jiang

> C Git is not the only client that can talk to upload-pack. A buggy
> client may send both. The least we need is make sure it would not
> crash or break the server security in any way. But if we have to
> consider that, we may as well reject with a clear message, which would
> help the client writer. And speaking of clients..

Actually, I mean that the upload-pack in this patch will not crash,
whatever it receives.
e.g. "depth N", "depth_deepen", "depth N"+"depth_deepen"

Because "depth_deepen" is just a boolean signal to tell the upload-pack
that "depth N" means "deepen N", it takes effect only when "depth N"
is given. Otherwise, "depth_deepen" will be ignored.


> You missed Junio's keyword "existing". Your new client will work well
> with your new server. But it breaks "git clone --depth" (or "git fetch
> --depth") for all existing git installations.

Oh, thank you for pointing out this. And sorry Junio. I misunderstood
what you said. I haven't realized the problem of existing git server.
You've reminded me of the importance of compatibility. Thank you!

2015-03-17 18:44 GMT+08:00 Duy Nguyen :
> On Mon, Mar 16, 2015 at 11:10 PM, Dongcan Jiang  
> wrote:
>> Hi Duy,
>>
>> Sorry for my late response.
>>
>>> we need to make sure that upload-pack barf if some client sends both 
>>> "deepen" and
>>> "depth".
>>
>> Actually, in my current design, when client just wants "depth", it
>> sends "depth N";
>> when it want "deepen", it sends "depth N" as well as "depth_deepen".
>> For the latter
>> case, it tells upload-pack to collect objects for "deepen N".
>>
>> Thus, I'm not so sure whether upload-pack needs to check their exclusion.
>
> C Git is not the only client that can talk to upload-pack. A buggy
> client may send both. The least we need is make sure it would not
> crash or break the server security in any way. But if we have to
> consider that, we may as well reject with a clear message, which would
> help the client writer. And speaking of clients..
>
> On Mon, Mar 16, 2015 at 11:08 PM, Dongcan Jiang  
> wrote:
 - if (starts_with(line, "deepen ")) {
 + if (starts_with(line, "depth ")) {
   char *end;
 - depth = strtol(line + 7, &end, 0);
 - if (end == line + 7 || depth <= 0)
 - die("Invalid deepen: %s", line);
 + depth = strtol(line + 6, &end, 0);
 + if (end == line + 6 || depth <= 0)
 + die("Invalid depth: %s", line);
   continue;
   }
   if (!starts_with(line, "want ") ||
>>>
>>> I do not quite understand this hunk.  What happens when this program
>>> is talking to an existing fetch-pack that requested "deepen"?
>>
>> In this hunk, I actually just changed the one command name in upload-pack
>> service from "deepen" to "depth". I made this change because the name
>> "deepen" here is quite misleading, as it just means "depth". Of course,
>> I've changed the caller's sent pack from "deepen" to "depth" too (See below).
>
> You missed Junio's keyword "existing". Your new client will work well
> with your new server. But it breaks "git clone --depth" (or "git fetch
> --depth") for all existing git installations.
> --
> Duy



-- 
江东灿（Dongcan Jiang）
Team of Search Engine & Web Mining
School of Electronic Engineering & Computer Science
Peking University, Beijing, 100871, P.R.China
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [GSoC] Applying for conversion scripts to builtins

2015-03-17 Thread Johannes Schindelin

Hi Paul,

On 2015-03-17 01:22, Paul Tan wrote:

> On Tue, Mar 17, 2015 at 12:49 AM, Yurii Shevtsov  wrote:
>
> Generally, it would be easy to convert any shell script to C by just
> using the run_command* functions (and in less lines of code), but that
> would not be taking advantage of the potential benefits in porting
> shell scripts to C. To summarize the (ideal) requirements:
> 
> * zero spawning of processes so that the internal object/config/index
> cache can be taken advantage of. (and to avoid the process spawning
> overhead which is relative large in e.g. Windows)

Spawning definitely uses up many more resources on Windows.

However, spawning a full-fledged Bash requires MSys (or soon MSys2) to spin up 
an entire POSIX emulation layer. This costs us dearly. For example, when I run 
the t3404 test (which exercises scripting heavily, what with `git rebase -i` 
being implemented as a shell script) on MacOSX, it takes roughly a minute to 
complete. On a comparable Windows machine, it takes roughly 12 minutes to 
complete.

Therefore, I would wager a bet that just the mere conversion of a shell script 
into even a primitive `run_command()`-based builtin would help performance on 
Windows in a noticeable manner.

Of course, it would be *even nicer* to avoid the spawning altogether.

> * avoid needless parsing since we have direct access to the C data
> structures.

True that. Turning SHA-1s into strings, spawning, and reparsing the same SHA-1 
is quite a lot of unnecessary churn.

The biggest benefit of avoiding needless parsing, however, is not performance. 
It is avoiding quoting issues. This is particularly so on Windows, where Git is 
sometimes called from outside a shell environment, where we have to deal with 
inconsistent quoting because it is every Windows program's own job to parse the 
command-line, including the quoting.

> * use the internal API as much as possible: share code between the
> builtins (e.g. fmt-merge-msg.c, exposed in fmt-merge-msg.h) in order
> to reduce code complexity.

That is definitely something that even the Git maintainer should be interested 
in (he does not touch Windows, therefore the performance differences do not 
concern him): by sharing code paths between different subcommands, you ensure 
that you have to fix problems only once, not twice or more.

Concrete example: on Windows, we have file locking issues because files that 
are in use cannot be deleted. For that reason, we have Windows-specific code 
that is "nice" by trying harder to delete files, giving programs a little time 
to let their locks go. This locking issue happens also when a virus scanner 
"uses", say, the .git-rewrite/revs file that was written by `git 
filter-branch`, while said shell script already wants to delete the file 
because it is obsolete. If `git filter-branch` were a builtin, the bug would 
already be fixed due to our override of the `unlink()` function in C. Now we 
have to fix that bug separately because `filter-branch` is a shell script.

> The biggest wins would definitely be portability, but there may be
> performance improvements, though they are theoretical at this point.
> 
> I'm not exactly sure if the above requirements are sane, which is why
> I'm also CC-ing Dscho who knows the problems of git on Windows more
> than I do.

Thanks for bringing this to my attention. I hope I managed to add useful 
information to the discussion.

Ciao,
Johannes
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH/GSoC/RFC] fetch: git fetch --deepen

2015-03-17 Thread Duy Nguyen

On Mon, Mar 16, 2015 at 11:10 PM, Dongcan Jiang  wrote:
> Hi Duy,
>
> Sorry for my late response.
>
>> we need to make sure that upload-pack barf if some client sends both 
>> "deepen" and
>> "depth".
>
> Actually, in my current design, when client just wants "depth", it
> sends "depth N";
> when it want "deepen", it sends "depth N" as well as "depth_deepen".
> For the latter
> case, it tells upload-pack to collect objects for "deepen N".
>
> Thus, I'm not so sure whether upload-pack needs to check their exclusion.

C Git is not the only client that can talk to upload-pack. A buggy
client may send both. The least we need is make sure it would not
crash or break the server security in any way. But if we have to
consider that, we may as well reject with a clear message, which would
help the client writer. And speaking of clients..

On Mon, Mar 16, 2015 at 11:08 PM, Dongcan Jiang  wrote:
>>> - if (starts_with(line, "deepen ")) {
>>> + if (starts_with(line, "depth ")) {
>>>   char *end;
>>> - depth = strtol(line + 7, &end, 0);
>>> - if (end == line + 7 || depth <= 0)
>>> - die("Invalid deepen: %s", line);
>>> + depth = strtol(line + 6, &end, 0);
>>> + if (end == line + 6 || depth <= 0)
>>> + die("Invalid depth: %s", line);
>>>   continue;
>>>   }
>>>   if (!starts_with(line, "want ") ||
>>
>> I do not quite understand this hunk.  What happens when this program
>> is talking to an existing fetch-pack that requested "deepen"?
>
> In this hunk, I actually just changed the one command name in upload-pack
> service from "deepen" to "depth". I made this change because the name
> "deepen" here is quite misleading, as it just means "depth". Of course,
> I've changed the caller's sent pack from "deepen" to "depth" too (See below).

You missed Junio's keyword "existing". Your new client will work well
with your new server. But it breaks "git clone --depth" (or "git fetch
--depth") for all existing git installations.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Promoting Git developers

2015-03-17 Thread Thomas Ferris Nicolaisen

On Sun, Mar 15, 2015 at 9:46 AM, Christian Couder
 wrote:
>
> I wrote something about a potential Git Rev News news letter:
>
> https://github.com/git/git.github.io/pull/15
>

I would love to have/use something like this in the GitMinutes
podcast. Perhaps in addition to the very random interview format that
I have now, I could do a more regular episodes about Git news, where I
incorporate this.

I also volunteer to help with the production, if you'll allow list
lurkers like myself to contribute ;)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC] git submodule purge

2015-03-17 Thread Fredrik Gustafsson

On Tue, Mar 17, 2015 at 09:18:26AM +0100, Patrick Steinhardt wrote:
> Is it even possible to create a new submodule without any
> upstream repository? At least `git submodule init` does not work
> without a corresponding entry in .gitmodules which the user would
> have needed to create himself manually. In this case one _could_
> assume that the user knows what he is doing and expect him not to
> call `submodule purge` (or whatever the command will be called)
> on the authoritative copy. Other than that I've got no idea how
> to assure safety.

Look at git/t/t7400-submodule-basic.sh for example at the test starting
at line 84 on how to add a submodule without any upstream.

Git has already a disadvantage against other SCM (like mercurial)
because it's "too easy to loose data with git". Meaning that we purge
unrefered commits. (Yes this is up to debate if this is good or bad, but
here's not the place).

I think we should be very carefully with adding commands that
permanently removes data. They should be really well crafted so that
there's no way to do this by mistake.

-- 
Fredrik Gustafsson

phone: +46 733-608274
e-mail: iv...@iveqy.com
website: http://www.iveqy.com
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC] git submodule purge

2015-03-17 Thread Patrick Steinhardt

On Mon, Mar 16, 2015 at 08:55:03AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt  writes:
> 
> > I think there should be a command that is able to remove those
> > dangling repositories if the following conditions are met:
> >
> > - the submodule should not be initialized
> >
> > - the submodule should not have an entry in .gitmodules in the
> >   currently checked out revision
> >
> > - the submodule should not contain any commits that are not
> >   upstream
> >
> > - the submodule should not contain other submodules that do not
> >   meet those conditions
> 
> I do not have a strong opinion on whether it is a good idea to make
> it possible to remove the .git/modules/*, but should it be a
> separate subcommand, or should it be an option to the 'deinit'
> subcommand?

See my response to Jonathan.

> Also, how would you apply the safety to a repository without
> "upstream", i.e. the authoritative copy?

Is it even possible to create a new submodule without any
upstream repository? At least `git submodule init` does not work
without a corresponding entry in .gitmodules which the user would
have needed to create himself manually. In this case one _could_
assume that the user knows what he is doing and expect him not to
call `submodule purge` (or whatever the command will be called)
on the authoritative copy. Other than that I've got no idea how
to assure safety.

Regards
Patrick

signature.asc
Description: PGP signature

Re: [RFC] git submodule purge

2015-03-17 Thread Patrick Steinhardt

On Mon, Mar 16, 2015 at 01:03:53PM -0700, Jonathan Nieder wrote:
> (+cc: Jens and Heiko, submodule experts)
> Hi,
> 
> Patrick Steinhardt wrote:
> 
> > This proposal is just for discussion. If there is any interest I
> > will implement the feature and send some patches.
> >
> > Currently it is hard to properly remove submodules. That is when
> > a submodule is deinitialized and removed from a repository the
> > directory '.git/modules/' will still be present and
> > there is no way to remove it despite manually calling `rm` on it.
> > I think there should be a command that is able to remove those
> > dangling repositories if the following conditions are met:
> >
> > - the submodule should not be initialized
> >
> > - the submodule should not have an entry in .gitmodules in the
> >   currently checked out revision
> >
> > - the submodule should not contain any commits that are not
> >   upstream
> >
> > - the submodule should not contain other submodules that do not
> >   meet those conditions
> >
> > This would ensure that it is hard to loose any commits that may
> > be of interest. In the case that the user knows what he is doing
> > we may provide a '--force' switch to override those checks.
> 
> Those conditions look simultaneously too strong and too weak. ;-)
> 
> In principle, it should be safe to remove .git/modules/ as
> long as
> 
>  (1) it (and its submodules, sub-sub-modules, etc) doesn't have any
>  un-pushed local commits.
> 
>  (2) it is not being referred to by a .git file in the work tree of
>  the parent repository.
> 
> Condition (1) can be relaxed if the user knows what they are losing
> and is okay with that.  Condition (2) can be avoided by removing
> (de-initing) the copy of that submodule in the worktree at the same
> time.
> 
> The functionality sounds like a useful thing to have, whether as an
> option to 'git submodule deinit' or as a new subcommand.  In the long
> term I would like it to be possible to do everything 'git submodule'
> can do using normal git commands instead of that specialized
> interface.  What command do you think this would eventually belong in?
> (An option to "git gc", maybe?)
> 
> Thanks,
> Jonathan

Thanks for your feedback.

Considering that purging the submodule is tightly coupled with
de-initializing it, it might make sense to provide this
functionality as part of `git submodule deinit`. Maybe something
like `git submodule deinit --purge` would work for the user.
Problem is if the user first removes the submodule and does not
first deinitialize it he is not able to purge the repository
afterwards as deinit will complain about the submodule not being
matched anymore. We could just make `deinit --purge` work with
removed submodules, but that does not feel very natural to me.

`git gc` feels saner in that regard, but I don't think it would
be easy to spot for users as this command is in general not used
very frequently by them. One could argue though that it does not
need to be discoverable.

Regards
Patrick


signature.asc
Description: PGP signature

Re: [PATCH 0/5] not making corruption worse

2015-03-17 Thread Jeff King

On Tue, Mar 17, 2015 at 03:27:50AM -0400, Jeff King wrote:

> The general strategy for these is to use for_each_rawref traversals in
> these situations. That doesn't cover _every_ possible scenario. For
> example, you could do:
> 
>   git clone --no-local repo.git backup.git &&
>   rm -rf repo.git
> 
> and you might be disappointed if "backup.git" omitted some broken refs
> (upload-pack will simply skip the broken refs in its advertisement).  We
> could tighten this, but then it becomes hard to access slightly broken
> repositories (e.g., you might prefer to clone what you can, and not have
> git die() when it tries to serve the breakage). Patch 2 provides a
> tweakable safety valve for this.

One thing I thought about while working on this was whether we should
just make _all_ ref iterations for_each_rawref. The benefit to not doing
so in the hypothetical above is that you might be able to clone "foo"
even if "bar" is broken.

But it strikes me as weird that we consider the _tips_ of history to be
special for ignoring breakage. If the tip of "bar" is broken, we omit
it. But if the tip is fine, and there's breakage three commits down in
the history, then doing a clone is going to fail horribly, as
pack-objects realizes it can't generate the pack. So in practice, I'm
not sure how much you're buying with the "don't mention broken refs"
code.

OTOH, there are probably _some_ situations that can be recovered with
the current code that could not otherwise. For example, in the current
code, I can still fetch "foo" even if "bar" is broken 3 commits down.
Whereas if the tip is broken, there's a reasonable chance that
"upload-pack" would just barf and I could fetch nothing.

So I stuck to the status quo in most cases, and only turned on the more
aggressive behavior for destructive operations (and people who want to
go wild can set GIT_REF_PARANOIA=1 for their every day operations if
they want to).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 5/5] refs.c: drop curate_packed_refs

2015-03-17 Thread Jeff King

When we delete a ref, we have to rewrite the entire
packed-refs file. We take this opportunity to "curate" the
packed-refs file and drop any entries that are crufty or
broken.

Dropping broken entries (e.g., with bogus names, or ones
that point to missing objects) is actively a bad idea, as it
means that we lose any notion that the data was there in the
first place. Aside from the general hackiness that we might
lose any information about ref "foo" while deleting an
unrelated ref "bar", this may seriously hamper any attempts
by the user at recovering from the corruption in "foo".

They will lose the sha1 and name of "foo"; the exact pointer
may still be useful even if they recover missing objects
from a different copy of the repository. But worse, once the
ref is gone, there is no trace of the corruption. A
follow-up "git prune" may delete objects, even though it
would otherwise bail when seeing corruption.

We could just drop the "broken" bits from
curate_packed_refs, and continue to drop the "crufty" bits:
refs whose loose counterpart exists in the filesystem. This
is not wrong to do, and it does have the advantage that we
may write out a slightly smaller packed-refs file. But it
has two disadvantages:

  1. It is a potential source of races or mistakes with
 respect to these refs that are otherwise unrelated to
 the operation. To my knowledge, there aren't any active
 problems in this area, but it seems like an unnecessary
 risk.

  2. We have to spend time looking up the matching loose
 refs for every item in the packed-refs file. If you
 have a large number of packed refs that do not change,
 that outweights the benefit from writing out a smaller
 packed-refs file (it doesn't get smaller, and you do a
 bunch of directory traversal to find that out).

Signed-off-by: Jeff King 
---
I'll admit my argument against curate_packed_refs is a bit hand-wavy. I
won't be _too_ sad if somebody insists on cutting this back to just
keeping "broken" refs around, and still curating the "crufty" ones.

 refs.c  | 67 +
 t/t5312-prune-corruption.sh |  2 +-
 2 files changed, 2 insertions(+), 67 deletions(-)

diff --git a/refs.c b/refs.c
index 7f0e7be..47e4e53 100644
--- a/refs.c
+++ b/refs.c
@@ -2621,68 +2621,10 @@ int pack_refs(unsigned int flags)
return 0;
 }
 
-/*
- * If entry is no longer needed in packed-refs, add it to the string
- * list pointed to by cb_data.  Reasons for deleting entries:
- *
- * - Entry is broken.
- * - Entry is overridden by a loose ref.
- * - Entry does not point at a valid object.
- *
- * In the first and third cases, also emit an error message because these
- * are indications of repository corruption.
- */
-static int curate_packed_ref_fn(struct ref_entry *entry, void *cb_data)
-{
-   struct string_list *refs_to_delete = cb_data;
-
-   if (entry->flag & REF_ISBROKEN) {
-   /* This shouldn't happen to packed refs. */
-   error("%s is broken!", entry->name);
-   string_list_append(refs_to_delete, entry->name);
-   return 0;
-   }
-   if (!has_sha1_file(entry->u.value.sha1)) {
-   unsigned char sha1[20];
-   int flags;
-
-   if (read_ref_full(entry->name, 0, sha1, &flags))
-   /* We should at least have found the packed ref. */
-   die("Internal error");
-   if ((flags & REF_ISSYMREF) || !(flags & REF_ISPACKED)) {
-   /*
-* This packed reference is overridden by a
-* loose reference, so it is OK that its value
-* is no longer valid; for example, it might
-* refer to an object that has been garbage
-* collected.  For this purpose we don't even
-* care whether the loose reference itself is
-* invalid, broken, symbolic, etc.  Silently
-* remove the packed reference.
-*/
-   string_list_append(refs_to_delete, entry->name);
-   return 0;
-   }
-   /*
-* There is no overriding loose reference, so the fact
-* that this reference doesn't refer to a valid object
-* indicates some kind of repository corruption.
-* Report the problem, then omit the reference from
-* the output.
-*/
-   error("%s does not point to a valid object!", entry->name);
-   string_list_append(refs_to_delete, entry->name);
-   return 0;
-   }
-
-   return 0;
-}
-
 int repack_without_refs(struct string_list *refnames, struct strbuf *err)
 {
struct ref_dir *packed;
-   struct string_list refs_to_delete = STRING_LIST_INIT_DUP;
-

[PATCH 4/5] repack: turn on "ref paranoia" when doing a destructive repack

2015-03-17 Thread Jeff King

If we are repacking with "-ad", we will drop any unreachable
objects. Likewise, using "-Ad --unpack-unreachable="
will drop any old, unreachable objects. In these cases, we
want to make sure the reachability we compute with "--all"
is complete. We can do this by passing GIT_REF_PARANOIA=1 in
the environment to pack-objects.

Note that "-Ad" is safe already, because it only loosens
unreachable objects. It is up to "git prune" to avoid
deleting them.

Signed-off-by: Jeff King 
---
 builtin/repack.c| 8 ++--
 t/t5312-prune-corruption.sh | 2 +-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 28fbc70..f2edeb0 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -228,13 +228,17 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
get_non_kept_pack_filenames(&existing_packs);
 
if (existing_packs.nr && delete_redundant) {
-   if (unpack_unreachable)
+   if (unpack_unreachable) {
argv_array_pushf(&cmd.args,
"--unpack-unreachable=%s",
unpack_unreachable);
-   else if (pack_everything & LOOSEN_UNREACHABLE)
+   argv_array_push(&cmd.env_array, 
"GIT_REF_PARANOIA=1");
+   } else if (pack_everything & LOOSEN_UNREACHABLE) {
argv_array_push(&cmd.args,
"--unpack-unreachable");
+   } else {
+   argv_array_push(&cmd.env_array, 
"GIT_REF_PARANOIA=1");
+   }
}
} else {
argv_array_push(&cmd.args, "--unpacked");
diff --git a/t/t5312-prune-corruption.sh b/t/t5312-prune-corruption.sh
index cccab58..e3e9994 100755
--- a/t/t5312-prune-corruption.sh
+++ b/t/t5312-prune-corruption.sh
@@ -38,7 +38,7 @@ test_expect_success 'put bogus object into pack' '
verbose git cat-file -e $bogus
 '
 
-test_expect_failure 'destructive repack keeps packed object' '
+test_expect_success 'destructive repack keeps packed object' '
test_might_fail git repack -Ad --unpack-unreachable=now &&
verbose git cat-file -e $bogus &&
test_might_fail git repack -ad &&
-- 
2.3.3.520.g3cfbb5d

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/5] prune: turn on ref_paranoia flag

2015-03-17 Thread Jeff King

Prune should know about broken objects at the tips of refs,
so that we can feed them to our traversal rather than
ignoring them. It's better for us to abort the operation on
the broken object than it is to start deleting objects with
an incomplete view of the reachability namespace.

Note that for missing objects, aborting is the best we can
do. For a badly-named ref, we technically could use its sha1
as a reachability tip. However, the iteration code just
feeds us a null sha1, so there would be a reasonable amount
of code involved to pass down our wishes. It's not really
worth trying to do better, because this is a case that
should happen extremely rarely, and the message we provide:

  fatal: unable to parse object: refs/heads/bogus:name

is probably enough to point the user in the right direction.

Signed-off-by: Jeff King 
---
Note that we should already be aborting for non-tip objects. I guess we
could test that explicitly, too, but I didn't here.

 builtin/prune.c | 1 +
 t/t5312-prune-corruption.sh | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/builtin/prune.c b/builtin/prune.c
index 04d3b12..17094ad 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -115,6 +115,7 @@ int cmd_prune(int argc, const char **argv, const char 
*prefix)
expire = ULONG_MAX;
save_commit_buffer = 0;
check_replace_refs = 0;
+   ref_paranoia = 1;
init_revisions(&revs, prefix);
 
argc = parse_options(argc, argv, prefix, options, prune_usage, 0);
diff --git a/t/t5312-prune-corruption.sh b/t/t5312-prune-corruption.sh
index 167031e..cccab58 100755
--- a/t/t5312-prune-corruption.sh
+++ b/t/t5312-prune-corruption.sh
@@ -25,7 +25,7 @@ test_expect_success 'create history reachable only from a 
bogus-named ref' '
git reset --hard HEAD^
 '
 
-test_expect_failure 'pruning does not drop bogus object' '
+test_expect_success 'pruning does not drop bogus object' '
test_when_finished "git hash-object -w -t commit saved" &&
test_might_fail git prune --expire=now &&
verbose git cat-file -e $bogus
@@ -62,7 +62,7 @@ test_expect_success 'create history with missing tip commit' '
test_must_fail git cat-file -e $missing
 '
 
-test_expect_failure 'pruning with a corrupted tip does not drop history' '
+test_expect_success 'pruning with a corrupted tip does not drop history' '
test_when_finished "git hash-object -w -t commit saved" &&
test_might_fail git prune --expire=now &&
verbose git cat-file -e $recoverable
-- 
2.3.3.520.g3cfbb5d

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/5] refs: introduce a "ref paranoia" flag

2015-03-17 Thread Jeff King

Most operations that iterate over refs are happy to ignore
broken cruft. However, some operations should be performed
with knowledge of these broken refs, because it is better
for the operation to choke on a missing object than it is to
silently pretend that the ref did not exist (e.g., if we are
computing the set of reachable tips in order to prune
objects).

These processes could just call for_each_rawref, except that
ref iteration is often hidden behind other interfaces. For
instance, for a destructive "repack -ad", we would have to
inform "pack-objects" that we are destructive, and then it
would in turn have to tell the revision code that our
"--all" should include broken refs.

It's much simpler to just set a global for "dangerous"
operations that includes broken refs in all iterations.

Signed-off-by: Jeff King 
---
I waffled on documenting this for end users. I'm not sure if people
would ever want to actually use the environment variable themselves. The
best hypothetical I could come up with was the:

  git clone --no-local repo.git backup.git &&
  rm -rf repo.git

scenario.

 Documentation/git.txt | 11 +++
 cache.h   |  8 
 environment.c |  1 +
 refs.c|  5 +
 4 files changed, 25 insertions(+)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index af30620..8da85a6 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -1026,6 +1026,17 @@ GIT_ICASE_PATHSPECS::
variable when it is invoked as the top level command by the
end user, to be recorded in the body of the reflog.
 
+`GIT_REF_PARANOIA`::
+   If set to `1`, include broken or badly named refs when iterating
+   over lists of refs. In a normal, non-corrupted repository, this
+   does nothing. However, enabling it may help git to detect and
+   abort some operations in the presence of broken refs. Git sets
+   this variable automatically when performing destructive
+   operations like linkgit:git-prune[1]. You should not need to set
+   it yourself unless you want to be paranoid about making sure
+   an operation has touched every ref (e.g., because you are
+   cloning a repository to make a backup).
+
 
 Discussion[[Discussion]]
 
diff --git a/cache.h b/cache.h
index 761c570..162ea6f 100644
--- a/cache.h
+++ b/cache.h
@@ -614,6 +614,14 @@ extern int protect_hfs;
 extern int protect_ntfs;
 
 /*
+ * Include broken refs in all ref iterations, which will
+ * generally choke dangerous operations rather than letting
+ * them silently proceed without taking the broken ref into
+ * account.
+ */
+extern int ref_paranoia;
+
+/*
  * The character that begins a commented line in user-editable file
  * that is subject to stripspace.
  */
diff --git a/environment.c b/environment.c
index 1ade5c9..a40044c 100644
--- a/environment.c
+++ b/environment.c
@@ -24,6 +24,7 @@ int is_bare_repository_cfg = -1; /* unspecified */
 int log_all_ref_updates = -1; /* unspecified */
 int warn_ambiguous_refs = 1;
 int warn_on_object_refname_ambiguity = 1;
+int ref_paranoia = -1;
 int repository_format_version;
 const char *git_commit_encoding;
 const char *git_log_output_encoding;
diff --git a/refs.c b/refs.c
index e23542b..7f0e7be 100644
--- a/refs.c
+++ b/refs.c
@@ -1934,6 +1934,11 @@ static int do_for_each_ref(struct ref_cache *refs, const 
char *base,
data.fn = fn;
data.cb_data = cb_data;
 
+   if (ref_paranoia < 0)
+   ref_paranoia = git_env_bool("GIT_REF_PARANOIA", 0);
+   if (ref_paranoia)
+   data.flags |= DO_FOR_EACH_INCLUDE_BROKEN;
+
return do_for_each_entry(refs, base, do_one_ref, &data);
 }
 
-- 
2.3.3.520.g3cfbb5d

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/5] t5312: test object deletion code paths in a corrupted repository

2015-03-17 Thread Jeff King

When we are doing a destructive operation like "git prune",
we want to be extra careful that the set of reachable tips
we compute is valid. If there is any corruption or oddity,
we are better off aborting the operation and letting the
user figure things out rather than plowing ahead and
possibly deleting some data that cannot be recovered.

The tests here include:

  1. Pruning objects mentioned only be refs with invalid
 names. This used to abort prior to d0f810f (refs.c:
 allow listing and deleting badly named refs,
 2014-09-03), but since then we silently ignore the tip.

 Likewise, we test repacking that can drop objects
 (either "-ad", which drops anything unreachable,
 or "-Ad --unpack-unreachable=", which tries to
 optimize out a loose object write that would be
 directly pruned).

  2. Pruning objects when some refs point to missing
 objects. We don't know whether any dangling objects
 would have been reachable from the missing objects. We
 are better to keep them around, as they are better than
 nothing for helping the user recover history.

  3. Packed refs that point to missing objects can sometimes
 be dropped. By itself, this is more of an annoyance
 (you do not have the object anyway; even if you can
 recover it from elsewhere, all you are losing is a
 placeholder for your state at the time of corruption).
 But coupled with (2), if we drop the ref and then go
 on to prune, we may lose unrecoverable objects.

Note that we use test_might_fail for some of the operations.
In some cases, it would be appropriate to abort the
operation, and in others, it might be acceptable to continue
but taking the information into account. The tests don't
care either way, and check only for data loss.

Signed-off-by: Jeff King 
---
 t/t5312-prune-corruption.sh | 104 
 1 file changed, 104 insertions(+)
 create mode 100755 t/t5312-prune-corruption.sh

diff --git a/t/t5312-prune-corruption.sh b/t/t5312-prune-corruption.sh
new file mode 100755
index 000..167031e
--- /dev/null
+++ b/t/t5312-prune-corruption.sh
@@ -0,0 +1,104 @@
+#!/bin/sh
+
+test_description='
+Test pruning of repositories with minor corruptions. The goal
+here is that we should always be erring on the side of safety. So
+if we see, for example, a ref with a bogus name, it is OK either to
+bail out or to proceed using it as a reachable tip, but it is _not_
+OK to proceed as if it did not exist. Otherwise we might silently
+delete objects that cannot be recovered.
+'
+. ./test-lib.sh
+
+test_expect_success 'disable reflogs' '
+   git config core.logallrefupdates false &&
+   rm -rf .git/logs
+'
+
+test_expect_success 'create history reachable only from a bogus-named ref' '
+   test_tick && git commit --allow-empty -m master &&
+   base=$(git rev-parse HEAD) &&
+   test_tick && git commit --allow-empty -m bogus &&
+   bogus=$(git rev-parse HEAD) &&
+   git cat-file commit $bogus >saved &&
+   echo $bogus >.git/refs/heads/bogus:name &&
+   git reset --hard HEAD^
+'
+
+test_expect_failure 'pruning does not drop bogus object' '
+   test_when_finished "git hash-object -w -t commit saved" &&
+   test_might_fail git prune --expire=now &&
+   verbose git cat-file -e $bogus
+'
+
+test_expect_success 'put bogus object into pack' '
+   git tag reachable $bogus &&
+   git repack -ad &&
+   git tag -d reachable &&
+   verbose git cat-file -e $bogus
+'
+
+test_expect_failure 'destructive repack keeps packed object' '
+   test_might_fail git repack -Ad --unpack-unreachable=now &&
+   verbose git cat-file -e $bogus &&
+   test_might_fail git repack -ad &&
+   verbose git cat-file -e $bogus
+'
+
+# subsequent tests will have different corruptions
+test_expect_success 'clean up bogus ref' '
+   rm .git/refs/heads/bogus:name
+'
+
+test_expect_success 'create history with missing tip commit' '
+   test_tick && git commit --allow-empty -m one &&
+   recoverable=$(git rev-parse HEAD) &&
+   git cat-file commit $recoverable >saved &&
+   test_tick && git commit --allow-empty -m two &&
+   missing=$(git rev-parse HEAD) &&
+   # point HEAD elsewhere
+   git checkout $base &&
+   rm .git/objects/$(echo $missing | sed "s,..,&/,") &&
+   test_must_fail git cat-file -e $missing
+'
+
+test_expect_failure 'pruning with a corrupted tip does not drop history' '
+   test_when_finished "git hash-object -w -t commit saved" &&
+   test_might_fail git prune --expire=now &&
+   verbose git cat-file -e $recoverable
+'
+
+test_expect_success 'pack-refs does not silently delete broken loose ref' '
+   git pack-refs --all --prune &&
+   echo $missing >expect &&
+   git rev-parse refs/heads/master >actual &&
+   test_cmp expect actual
+'
+
+# we do not want to count on running pack-refs to
+# actually pack it, as it is perfectly rea

[PATCH 0/5] not making corruption worse

2015-03-17 Thread Jeff King

This is a grab bag of fixes related to performing destructive operations
in a repository with minor corruption. Of course we hope never to see
corruption in the first place, but I think if we do see it, we should
err on the side of not making things worse. IOW, it is better to abort
and say "fix this before pruning" than it is to just start deleting
objects.

The issue that spurred this is that I noticed recent versions of git
will omit funny-named refs from iteration. This comes from d0f810f
(refs.c: allow listing and deleting badly named refs, 2014-09-03), in
v2.2.0.  That's probably a good idea in general, but for things like
"prune" we need to be more careful (and we were, prior to that commit).

Similarly, if you have a ref whose tip object is missing, we tend to
just ignore it in our traversals, and it presents a similar problem for
pruning. I didn't trace it back, but I think this problem is much older.

The general strategy for these is to use for_each_rawref traversals in
these situations. That doesn't cover _every_ possible scenario. For
example, you could do:

  git clone --no-local repo.git backup.git &&
  rm -rf repo.git

and you might be disappointed if "backup.git" omitted some broken refs
(upload-pack will simply skip the broken refs in its advertisement).  We
could tighten this, but then it becomes hard to access slightly broken
repositories (e.g., you might prefer to clone what you can, and not have
git die() when it tries to serve the breakage). Patch 2 provides a
tweakable safety valve for this.

And not strictly related to the above, but in the same ballpark, is the
issue with packed-refs that I noted here:

  http://thread.gmane.org/gmane.comp.version-control.git/256051/focus=256896

where we can drop broken refs from the packed-refs file during the
deletion of an unrelated ref.

The patches are:

  [1/5]: t5312: test object deletion code paths in a corrupted repository
  [2/5]: refs: introduce a "ref paranoia" flag
  [3/5]: prune: turn on ref_paranoia flag
  [4/5]: repack: turn on "ref paranoia" when doing a destructive repack
  [5/5]: refs.c: drop curate_packed_refs

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

69 matches

Mail list logo