Re: [PATCH] README: create HTTP/HTTPS links from URLs in Markdown

2017-03-01 Thread Jeff King
On Thu, Mar 02, 2017 at 07:34:21AM +, Eric Wong wrote:

> Jeff King  wrote:
> > On Wed, Mar 01, 2017 at 10:22:04PM +, Eric Wong wrote:
> > 
> > > Markdown supports automatic links by surrounding URLs with
> > > angle brackets, as documented in
> > > 
> > 
> > One of the joys of markdown is that there are so many variants. A lot of
> > them (including GitHub-flavored markdown) will linkify URLs even when
> > they're not inside angle brackets.
> > 
> > So I don't mind this patch, but I'm curious what's rendering the
> > markdown you're seeing. I'd think online that one would either come
> > across the raw text, or the GFM from https://github.com/git/git.
> 
> I was using Gruber's reference implementation from Debian stable
> (1.0.1-7).

OK. I guess my question more was "why are you doing that?". I'd expect
people to find the GFM rendering on GitHub, or just look at the text via
"less".

But it's not really my business why you would want to do it. :) It's
reasonable for us to cater to the common subset of renderers.
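
For reference, the difference under discussion, sketched in shell (the
URL is just an example):

```shell
# Wrapping a URL in angle brackets is the autolink form that both the
# original Markdown.pl and GFM understand; bare URLs are linkified only
# by GFM-style renderers.
url='https://github.com/git/git'
portable="<$url>"     # works in the common subset of renderers
bare="$url"           # GFM and friends only
printf '%s\n%s\n' "$portable" "$bare"
```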

-Peff


Re: [PATCH] Put sha1dc on a diet

2017-03-01 Thread Linus Torvalds
On Wed, Mar 1, 2017 at 11:07 AM, Jeff King  wrote:
>
> So obviously the smaller object size is nice, and the diffstat is
> certainly satisfying. My only qualm would be whether this conflicts with
> the optimizations that Dan is working on (probably not conceptually, but
> textually).

Yeah. But I'll happily just re-apply the patch on any new version that
Dan posts.  The patch is obviously trivial, even if size-wise it's a
fair number of lines.

So I wouldn't suggest using the patched version as some kind of
starting point. It's much easier to just take a new version of
upstream and repeat the diet patch on it.

.. and obviously later versions of the upstream sha1dc code may not
even need this at all since Dan and Marc are now aware of the issue.

  Linus


Re: [PATCH v5 24/24] t1406: new tests for submodule ref store

2017-03-01 Thread Duy Nguyen
On Wed, Mar 1, 2017 at 10:11 PM, Michael Haggerty  wrote:
> By trial and error, I found that the test succeeds if I comment out the
> "for_each_reflog()" test. By having that test write its results to
> `/tmp` where they won't be deleted, I found that the problem is that the
> `actual` results are not sorted correctly:
>
> refs/heads/new-master 0x0
> refs/heads/master 0x0
> HEAD 0x1
>
> I don't know why it's so Heisenbergish.

It happens consistently on my other laptop. And yes, it looks like a
sort-order problem, probably caused by the underlying file system.
I did wonder about that at some point but never asked. We do not
guarantee any sorting order in the for-each API, do we?
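
If the API really makes no ordering promise, the test itself could sort
both sides before comparing. A minimal sketch of that fix, using the
reflog lines from Michael's output above:

```shell
# Sort both expected and actual output so the comparison does not
# depend on the file system's directory ordering.
printf '%s\n' 'refs/heads/new-master 0x0' 'refs/heads/master 0x0' 'HEAD 0x1' >actual
printf '%s\n' 'HEAD 0x1' 'refs/heads/master 0x0' 'refs/heads/new-master 0x0' >expect
sort actual >actual.sorted
sort expect >expect.sorted
cmp expect.sorted actual.sorted && echo "order-insensitive match"
```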
-- 
Duy


Re: [PATCH] Put sha1dc on a diet

2017-03-01 Thread Duy Nguyen
On Thu, Mar 2, 2017 at 6:19 AM, Jeff King  wrote:
> You have to remember that some of the Git for Windows users are doing
> horrific things like using repositories with 450MB .git/index files, and
> the speed to compute the sha1 during an update is noticeable there.

We probably should separate this use case from object hashing
anyway. What we need here is basically a better, more reliable crc32
to detect bit flips. Even if we move to SHA-something for objects,
we can stay with SHA-1 here (and with the fastest implementation)
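
As a toy illustration of the "reliable checksum to detect bit flips"
idea (cksum's POSIX CRC merely stands in for whatever function would
actually be chosen; this is not git's index format):

```shell
# Checksum a payload, corrupt one byte, and observe that the checksum
# changes -- the property wanted for cheap index integrity checks.
printf 'some index payload' >payload
before=$(cksum <payload)
printf 'Some index payload' >payload    # one corrupted byte
after=$(cksum <payload)
test "$before" != "$after" && echo "bit flip detected"
```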
-- 
Duy


Re: [PATCH] mingw: use OpenSSL's SHA-1 routines

2017-03-01 Thread Johannes Sixt

Am 24.02.2017 um 22:54 schrieb Junio C Hamano:

Johannes Sixt  writes:

I'll use the patch for daily work for a while to see whether it hurts.


Please ping this thread again when you have something to add.  For
now, I'll demote this patch from 'next' to 'pu' when we rewind and
rebuild 'next' post 2.12 release.


As I already said in [1], I have no objection. I've used the patch in 
production for some time now and did not notice any slowdowns.


[1] 
https://public-inbox.org/git/0080edd6-a515-2fe9-6266-b6f6bbedf...@kdbg.org/


-- Hannes



Re: [PATCH 01/10] submodule: decouple url and submodule existence

2017-03-01 Thread Junio C Hamano
Brandon Williams  writes:

> On 02/24, Junio C Hamano wrote:
>
>> Also as a grouping measure, submodule.active that lists submodule
>> paths feels hard to use.  When switching between two branches in the
>> superproject that have the same submodule bound at two different
>> paths, who is responsible for updating submodule.active in
>> superproject's config?  If it were a list of submodule names, this
>> objection does not apply, though.
>
> I agree that if you are listing every submodule path by hand then this
> may not be the best approach and would be difficult to use.  The idea is
> that this would allow a user to set a general pathspec to identify a
> group of modules they are interested in.  Perhaps once attributes can be
> used in pathspecs a user could group submodules by setting a particular
> attribute and then submodule.active would have a value like
> ":(attr:foo)" to indicate I'm interested in all submodules with the
> "foo" attribute.

OK.  As .gitattributes is tracked just like .gitmodules in tree, the
project can adjust the path pattern that matches a renamed submodule
and gives it an attribute in .gitattributes in the same commit in
the superproject that moves the directory to which the submodule is
bound, just like it updates .gitmodules to adjust the name<->path
mapping.  So that plan solves my specific worry about using "path"
for grouping and not "name".
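
A hypothetical sketch of that grouping, assuming attribute pathspecs
become usable here (the "foo" attribute and the paths are invented for
illustration):

```shell
# .gitattributes in the superproject tags the interesting submodules...
cat >.gitattributes <<\EOF
lib/* foo
plugins/* foo
EOF
# ...and submodule.active selects them by attribute rather than by
# literal path, so renaming a submodule only needs a .gitattributes
# update in the same superproject commit.
git config submodule.active ":(attr:foo)"
```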

Thanks.




Re: [PATCH] Travis: also test on 32-bit Linux

2017-03-01 Thread Junio C Hamano
On Tue, Feb 28, 2017 at 11:17 AM, Johannes Schindelin
 wrote:
>  This patch
> asks Travis CI to install a Docker image with 32-bit libraries and then
> goes on to build and test Git using this 32-bit setup.
>
> A big thank you to Lars Schneider without whose help this patch would
> not have happened.

This has been in 'pu' for a few days, and
https://travis-ci.org/git/git/builds shows that we have
a new build job running successfully.

Good job ;-)

Thanks.


Re: [PATCH] Put sha1dc on a diet

2017-03-01 Thread Junio C Hamano
Dan Shumow  writes:

> At this point, I would suggest that I take the C optimizations,
> clean them up and fold them in with the diet changes Linus has
> suggested.  The slowdown is still 2x over block-sha1 and more over
> OpenSSL.  But it is better than nothing.  And then if there is
> interest Marc and I can investigate other processor specific
> optimizations like ASM or SIMD and circle back with those
> performance optimizations at a later date.
>
> Also, to Johannes Schindelin's point:
>> My concern is about that unexpected turn "oh, let's just switch
> >> to C99 because, well, because my compiler can handle it, and
> >> everybody else should just switch to a modern compiler". That
>> really sounded careless.
>
> While it will probably be a pain, if it is a requirement, we can
> modify the code to move away from any c99 specific stuff we have
> in here, if it makes adopting the code more palatable for Git.

I was assuming that we would treat your code just like how we treat
any other "borrowed code from elsewhere".  The usual way for us to
do so is to take code that was released by the "upstream" (under a
license that allows us to use it---yours is MIT, which does) in the
style and language of upstream's choice, and then we in the Git
development community take responsibility for massaging the code to
match our style, for trimming what we won't use and for doing any
other customization to fit our needs.

As you and Marc seemed to still be working on speeding it up, such a
customization work to fully adjust your code to our codebase was
premature, so I tentatively queued what we saw on the list as-is on
our 'pu' branch so that people can have a reference point.  Which
unfortunately solicited a premature reaction by Johannes.  Please do
not worry too much about the comment.

But if you are willing to help us by getting involved in the
"customization" part, too, that would be very welcome news to us.
In that case, "welcome to the Git development community" ;-)

So,... from my point of view, we are OK either way.  It is OK if you
are a third-party upstream that is not particularly interested in
Git project's specific requirement.  We surely would be happier if
you and Marc, the upstream authors of the code in question, also act
as participants in the Git development community.

Either way, thanks for your great help.


[PATCH 01/18] lib-submodule-update.sh: reorder create_lib_submodule_repo

2017-03-01 Thread Stefan Beller
Redraw the ASCII art describing the setup using more space, such that
it is easier to understand.  The leaf commits are now ordered the same
way the actual code is ordered.

Add empty lines to the setup code separating each of the leaf commits,
each starting with a "checkout -b".

Signed-off-by: Stefan Beller 
---
 t/lib-submodule-update.sh | 49 ---
 1 file changed, 29 insertions(+), 20 deletions(-)

diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 915eb4a7c6..5df528ea81 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -15,22 +15,27 @@
 # - Tracked file replaced by submodule (replace_sub1_with_file =>
 #   replace_file_with_sub1)
 #
-#   --O-O
-#  /  ^ replace_directory_with_sub1
-# /   replace_sub1_with_directory
-#/O
-#   / ^
-#  /  modify_sub1
-#  O--O---O
-#  ^  ^\  ^
-#  |  | \ remove_sub1
-#  |  |  -O-O
-#  |  |   \   ^ replace_file_with_sub1
-#  |  |\  replace_sub1_with_file
-#  |   add_sub1 --O-O
-# no_submodule^ valid_sub1
-# invalid_sub1
+# O
+#/^
+#   / remove_sub1
+#  /
+#   add_sub1  /---O
+# |  /^
+# | / modify_sub1
+# v/
+#  O--O---O-O
+#  ^   \  ^ replace_directory_with_sub1
+#  |\ replace_sub1_with_directory
+# no_submodule   \
+# O-O
+#  \  ^ replace_file_with_sub1
+#   \ replace_sub1_with_file
+#\
+# O-O
+# ^ valid_sub1
+# invalid_sub1
 #
+
 create_lib_submodule_repo () {
git init submodule_update_repo &&
(
@@ -49,10 +54,11 @@ create_lib_submodule_repo () {
git config submodule.sub1.ignore all &&
git add .gitmodules &&
git commit -m "Add sub1" &&
-   git checkout -b remove_sub1 &&
+
+   git checkout -b remove_sub1 add_sub1 &&
git revert HEAD &&
 
-   git checkout -b "modify_sub1" "add_sub1" &&
+   git checkout -b modify_sub1 add_sub1 &&
git submodule update &&
(
cd sub1 &&
@@ -67,7 +73,7 @@ create_lib_submodule_repo () {
git add sub1 &&
git commit -m "Modify sub1" &&
 
-   git checkout -b "replace_sub1_with_directory" "add_sub1" &&
+   git checkout -b replace_sub1_with_directory add_sub1 &&
git submodule update &&
git -C sub1 checkout modifications &&
git rm --cached sub1 &&
@@ -75,22 +81,25 @@ create_lib_submodule_repo () {
git config -f .gitmodules --remove-section "submodule.sub1" &&
git add .gitmodules sub1/* &&
git commit -m "Replace sub1 with directory" &&
+
git checkout -b replace_directory_with_sub1 &&
git revert HEAD &&
 
-   git checkout -b "replace_sub1_with_file" "add_sub1" &&
+   git checkout -b replace_sub1_with_file add_sub1 &&
git rm sub1 &&
echo "content" >sub1 &&
git add sub1 &&
git commit -m "Replace sub1 with file" &&
+
git checkout -b replace_file_with_sub1 &&
git revert HEAD &&
 
-   git checkout -b "invalid_sub1" "add_sub1" &&
+   git checkout -b invalid_sub1 add_sub1 &&
git update-index --cacheinfo 160000 0123456789012345678901234567890123456789 sub1 &&
git commit -m "Invalid sub1 commit" &&
git checkout -b valid_sub1 &&
git revert HEAD &&
+
git checkout master
)
 }
-- 
2.12.0.rc1.52.ge239d7e709.dirty



[PATCH 14/18] unpack-trees: check if we can perform the operation for submodules

2017-03-01 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 unpack-trees.c | 131 +
 unpack-trees.h |   1 +
 2 files changed, 123 insertions(+), 9 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 616a0ae4b2..8333da2cc9 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -10,6 +10,8 @@
 #include "attr.h"
 #include "split-index.h"
 #include "dir.h"
+#include "submodule.h"
+#include "submodule-config.h"
 
 /*
  * Error messages expected by scripts out of plumbing commands such as
@@ -45,6 +47,9 @@ static const char 
*unpack_plumbing_errors[NB_UNPACK_TREES_ERROR_TYPES] = {
 
/* ERROR_WOULD_LOSE_ORPHANED_REMOVED */
"Working tree file '%s' would be removed by sparse checkout update.",
+
+   /* ERROR_WOULD_LOSE_SUBMODULE */
+   "Submodule '%s' cannot checkout new HEAD.",
 };
 
 #define ERRORMSG(o,type) \
@@ -161,6 +166,8 @@ void setup_unpack_trees_porcelain(struct 
unpack_trees_options *opts,
_("The following working tree files would be overwritten by 
sparse checkout update:\n%s");
msgs[ERROR_WOULD_LOSE_ORPHANED_REMOVED] =
_("The following working tree files would be removed by sparse 
checkout update:\n%s");
+   msgs[ERROR_WOULD_LOSE_SUBMODULE] =
+   _("Submodule '%s' cannot checkout new HEAD");
 
opts->show_all_errors = 1;
/* rejected paths may not have a static buffer */
@@ -240,12 +247,75 @@ static void display_error_msgs(struct 
unpack_trees_options *o)
fprintf(stderr, _("Aborting\n"));
 }
 
+static int check_submodule_move_head(const struct cache_entry *ce,
+const char *old_id,
+const char *new_id,
+struct unpack_trees_options *o)
+{
+   const struct submodule *sub = submodule_from_ce(ce);
+   if (!sub)
+   return 0;
+
+   switch (sub->update_strategy.type) {
+   case SM_UPDATE_UNSPECIFIED:
+   case SM_UPDATE_CHECKOUT:
+   if (submodule_move_head(ce->name, old_id, new_id, 
SUBMODULE_MOVE_HEAD_DRY_RUN))
+   return o->gently ? -1 :
+   add_rejected_path(o, 
ERROR_WOULD_LOSE_SUBMODULE, ce->name);
+   return 0;
+   case SM_UPDATE_NONE:
+   return 0;
+   case SM_UPDATE_REBASE:
+   case SM_UPDATE_MERGE:
+   case SM_UPDATE_COMMAND:
+   default:
+   warning(_("submodule update strategy not supported for 
submodule '%s'"), ce->name);
+   return -1;
+   }
+}
+
+static void reload_gitmodules_file(struct index_state *index,
+  struct checkout *state)
+{
+   int i;
+   for (i = 0; i < index->cache_nr; i++) {
+   struct cache_entry *ce = index->cache[i];
+   if (ce->ce_flags & CE_UPDATE) {
+   int r = strcmp(ce->name, ".gitmodules");
+   if (r < 0)
+   continue;
+   else if (r == 0) {
+   submodule_free();
+   checkout_entry(ce, state, NULL);
+   gitmodules_config();
+   git_config(submodule_config, NULL);
+   } else
+   break;
+   }
+   }
+}
+
 /*
  * Unlink the last component and schedule the leading directories for
  * removal, such that empty directories get removed.
  */
 static void unlink_entry(const struct cache_entry *ce)
 {
+   const struct submodule *sub = submodule_from_ce(ce);
+   if (sub) {
+   switch (sub->update_strategy.type) {
+   case SM_UPDATE_UNSPECIFIED:
+   case SM_UPDATE_CHECKOUT:
+   case SM_UPDATE_REBASE:
+   case SM_UPDATE_MERGE:
+   submodule_move_head(ce->name, "HEAD", NULL,
+   SUBMODULE_MOVE_HEAD_FORCE);
+   break;
+   case SM_UPDATE_NONE:
+   case SM_UPDATE_COMMAND:
+   return; /* Do not touch the submodule. */
+   }
+   }
if (!check_leading_path(ce->name, ce_namelen(ce)))
return;
if (remove_or_warn(ce->ce_mode, ce->name))
@@ -301,6 +371,9 @@ static int check_updates(struct unpack_trees_options *o)
remove_marked_cache_entries(index);
remove_scheduled_dirs();
 
+   if (should_update_submodules() && o->update && !o->dry_run)
+   reload_gitmodules_file(index, &state);
+
for (i = 0; i < index->cache_nr; i++) {
struct cache_entry *ce = index->cache[i];
 
@@ -1358,17 +1431,26 @@ static int verify_uptodate_1(const struct cache_entry 
*ce,
if (!lstat(ce->name, &st)) {
int flags = 

[PATCH 05/18] lib-submodule-update.sh: define tests for recursing into submodules

2017-03-01 Thread Stefan Beller
Currently lib-submodule-update.sh provides 2 functions
test_submodule_switch and test_submodule_forced_switch that are used by a
variety of tests to ensure that submodules behave as expected. The current
expected behavior is that submodules are not touched at all (see
42639d2317a for the exact setup).

In the future we want to teach all these commands to recurse
into submodules. To do that, we'll add two testing functions to
submodule-update-lib.sh: test_submodule_switch_recursing and
test_submodule_forced_switch_recursing.

These two functions behave in analogy to the already existing functions
just with a different expectation on submodule behavior. The submodule
in the working tree is expected to be updated to the recorded submodule
version. The behavior is analogous to e.g. the behavior of files in a
nested directory in the working tree, where a change to the working tree
handles any arising directory/file conflicts just fine.

Signed-off-by: Stefan Beller 
---
 t/lib-submodule-update.sh | 506 +-
 1 file changed, 504 insertions(+), 2 deletions(-)

diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 83202c54fc..aa196d4d42 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -4,6 +4,7 @@
 # - New submodule (no_submodule => add_sub1)
 # - Removed submodule (add_sub1 => remove_sub1)
 # - Updated submodule (add_sub1 => modify_sub1)
+# - Updated submodule recursively (add_nested_sub => modify_sub1_recursively)
 # - Submodule updated to invalid commit (add_sub1 => invalid_sub1)
 # - Submodule updated from invalid commit (invalid_sub1 => valid_sub1)
 # - Submodule replaced by tracked files in directory (add_sub1 =>
@@ -19,8 +20,8 @@
 #/^
 #   / remove_sub1
 #  /
-#   add_sub1  /---O
-# |  /^
+#   add_sub1  /---O-OO  modify_sub1_recursively
+# |  /^ add_nested_sub
 # | / modify_sub1
 # v/
 #  O--O---O-O
@@ -48,6 +49,17 @@ create_lib_submodule_repo () {
git commit -m "Base inside first submodule" &&
git branch "no_submodule"
) &&
+   git init submodule_update_sub2 &&
+   (
cd submodule_update_sub2 &&
+   echo "expect" >>.gitignore &&
+   echo "actual" >>.gitignore &&
+   echo "x" >file1 &&
+   echo "y" >file2 &&
+   git add .gitignore file1 file2 &&
+   git commit -m "nested submodule base" &&
+   git branch "no_submodule"
+   ) &&
git init submodule_update_repo &&
(
cd submodule_update_repo &&
@@ -84,6 +96,26 @@ create_lib_submodule_repo () {
git add sub1 &&
git commit -m "Modify sub1" &&
 
+   git checkout -b add_nested_sub modify_sub1 &&
+   git -C sub1 checkout -b "add_nested_sub" &&
+   git -C sub1 submodule add --branch no_submodule 
../submodule_update_sub2 sub2 &&
+   git -C sub1 commit -a -m "add a nested submodule" &&
+   git add sub1 &&
+   git commit -a -m "update submodule, that updates a nested 
submodule" &&
+   git checkout -b modify_sub1_recursively &&
+   git -C sub1 checkout -b modify_sub1_recursively &&
+   git -C sub1/sub2 checkout -b modify_sub1_recursively &&
+   echo change >sub1/sub2/file3 &&
+   git -C sub1/sub2 add file3 &&
+   git -C sub1/sub2 commit -m "make a change in nested sub" &&
+   git -C sub1 add sub2 &&
+   git -C sub1 commit -m "update nested sub" &&
+   git add sub1 &&
+   git commit -m "update sub1, that updates nested sub" &&
+   git -C sub1 push origin modify_sub1_recursively &&
+   git -C sub1/sub2 push origin modify_sub1_recursively &&
+   git -C sub1 submodule deinit -f --all &&
+
git checkout -b replace_sub1_with_directory add_sub1 &&
git submodule update &&
git -C sub1 checkout modifications &&
@@ -150,6 +182,15 @@ test_git_directory_is_unchanged () {
)
 }
 
+test_git_directory_exists() {
+   test -e ".git/modules/$1" &&
+   if test -f sub1/.git
+   then
+   # does core.worktree point at the right place?
+   test "$(git -C .git/modules/$1 config core.worktree)" = 
"../../../$1"
+   fi
+}
+
 # Helper function to be executed at the start of every test below, it sets up
 # the submodule repo if it doesn't exist and configures the most problematic
 # settings for diff.ignoreSubmodules.
@@ -180,6 +221,27 @@ reset_work_tree_to () {
)
 }
 
+reset_work_tree_to_interested () {
+   reset_work_tree_to $1 &&
+   # make the submodule git 

[RFCv6 PATCH 00/18] Checkout aware of Submodules!

2017-03-01 Thread Stefan Beller
previous work:
https://public-inbox.org/git/20170223225735.10994-1-sbel...@google.com/
https://public-inbox.org/git/20161203003022.29797-1-sbel...@google.com/

v6:
 * added support for read-tree (see last patch) to see how generic the
   code of the previous patches is. I am pretty pleased with that patch.
 * marked two functions static. Thanks Ramsay!
 * fixed the recursive test; it still fails but it is the code that fails,
   not the test. For this I had to change the setup code slightly.
 * 2 new patches adding tiny refactoring to the submodule test lib.  
 * interdiff (to origin/sb/checkout-recurse-submodules, which is v5) below.

v5:
 * as v4 was the first version queued by Junio, we do have an interdiff below!
 * renamed functions
 * changed the API, now the caller has to take care of the submodule strategy
   themselves. (Note this can be different for different situations, e.g.
   when a submodule is deleted, we can do that for any strategy except none and
   !command. But for moving to a new state of the submodule we currently
   only implement "checkout" [unspecified defaults to checkout], warning about
   the others and doing nothing there.)

v4:
 * addressed all comments of Brian, Junio and Brandon.
 Thanks!
 * one major point of change is the introduction of another patch
   "lib-submodule-update.sh: do not use ./. as submodule remote",
   as that took some time to track down the existing bug.
 
v3:
 * moved tests from t2013 to the generic submodule library.
 * factored out the refactoring patches to be up front
 * As I redid the complete implementation, I have the impression this time
   it is cleaner than previous versions.
 
 I think we still have to fix the corner cases of directory/file/submodule 
 conflicts before merging, but this serves as a status update on my current
 way of thinking how to implement the worktree commands being aware of
 submodules.
 
Thanks,
Stefan

v2:
* based on top of the series sent out an hour ago
  "[PATCHv4 0/5] submodule embedgitdirs"
* Try to embed a submodule if we need to remove it.
* Strictly do not change behavior if not giving the new flag.
* I think I missed some review comments from v1, but I'd like to get
  the current state out over the weekend, as a lot has changed so far.
  On Monday I'll go through the previous discussion with a comb to see
  if I missed something.
  
v1:
When working with submodules, nearly any time after checking out
a different state of the project that has changed submodules,
you'd run "git submodule update" with a current version of Git.

There are two problems with this approach:

* The "submodule update" command is dangerous as it
  doesn't check for work that may be lost in the submodule
  (e.g. a dangling commit).
* you may forget to run the command as checkout is supposed
  to do all the work for you.

Integrate updating the submodules into git checkout, with the same
safety promises that git-checkout has, i.e. not throw away data unless
asked to. This is done by first checking if the submodule is at the same
sha1 as recorded in the superproject. If there are changes, we stop
the checkout, just as we do when checking out a file that has local
changes.
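
That check, reduced to a sketch with placeholder object names (in real
use, $recorded would come from `git ls-tree` in the superproject and
$actual from `git rev-parse` inside the submodule):

```shell
# Proceed only when the submodule's HEAD matches the gitlink recorded
# in the superproject; otherwise stop, as checkout does for dirty files.
recorded=1234567890123456789012345678901234567890
actual=0123456789012345678901234567890123456789
if test "$recorded" = "$actual"
then
	echo "submodule clean: updating"
else
	echo "submodule has extra history: stopping checkout"
fi
```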

The integration happens in the code that is also used in other commands
such that it will be easier in the future to make other commands aware
of submodules.

This also solves d/f conflicts in case you replace a file/directory
with a submodule or vice versa.

The patches are still a bit rough, but the overall series seems
promising enough to me that I want to put it out here.

Any review, specifically on the design level welcome!

Thanks,
Stefan


Stefan Beller (14):
  lib-submodule-update.sh: reorder create_lib_submodule_repo
  lib-submodule-update.sh: define tests for recursing into submodules
  make is_submodule_populated gently
  connect_work_tree_and_git_dir: safely create leading directories
  update submodules: add submodule config parsing
  update submodules: add a config option to determine if submodules are
updated
  update submodules: introduce is_interesting_submodule
  update submodules: move up prepare_submodule_repo_env
  update submodules: add submodule_go_from_to
  unpack-trees: pass old oid to verify_clean_submodule
  unpack-trees: check if we can perform the operation for submodules
  read-cache: remove_marked_cache_entries to wipe selected submodules.
  entry.c: update submodules when interesting
  builtin/checkout: add --recurse-submodules switch
diff --git a/Documentation/git-read-tree.txt b/Documentation/git-read-tree.txt
index fa1d557e5b..ed9d63ef4a 100644
--- a/Documentation/git-read-tree.txt
+++ b/Documentation/git-read-tree.txt
@@ -115,6 +115,12 @@ OPTIONS
directories the index file and index output file are
located in.
 
+--[no-]recurse-submodules::
+   Using --recurse-submodules will update the content of all initialized
+   submodules according to the commit recorded in the superproject by
+   

RE: [PATCH] Put sha1dc on a diet

2017-03-01 Thread Dan Shumow
I played around tweaking the code a bit more and I got our performance down to 
a 2.077182x slowdown with check and a 1.055961x slowdown without checking.  
However, that slowdown is basically with the check turned off through our API.  
If I rip extraneous code for storing states and checking if we are doing 
collision detection out, I can reach performance parity with the block-sha1 
implementation in the Git codebase, which basically tells me that is about as 
good as I can do for optimizing the C code.

SHA-1 is more amenable to assembler implementation because of its use of 
rotations, which are notoriously difficult to access through C code.  And as 
this happens in the inner loop of the function, inline asm tends not to cut 
it.  This is one of the reasons that the OpenSSL SHA-1 runs like a scalded 
monkey compared to the C implementations.  Marc and I have also discussed 
using SIMD operations 
to speed up the UBC checks, which could definitely help achieve better 
performance, but is highly dependent on processor support.  It will take some 
time to do either a SIMD implementation of the UBC checks or an assembler 
implementation.

At this point, I would suggest that I take the C optimizations, clean them up 
and fold them in with the diet changes Linus has suggested.  The slowdown is 
still 2x over block-sha1 and more over OpenSSL.  But it is better than nothing. 
 And then if there is interest Marc and I can investigate other processor 
specific optimizations like ASM or SIMD and circle back with those performance 
optimizations at a later date.

Also, to Johannes Schindelin's point:
> My concern is about that unexpected turn "oh, let's just switch to C99 
> because, well, because my compiler can handle it, and everybody else should 
> just switch to a modern compiler". That really sounded careless.

While it will probably be a pain, if it is a requirement, we can modify the 
code to move away from any c99 specific stuff we have in here, if it makes 
adopting the code more palatable for Git.

Thanks,
Dan




Re: [PATCH 03/18] lib-submodule-update: teach test_submodule_content the -C flag

2017-03-01 Thread Eric Wong
Stefan Beller  wrote:
>  test_submodule_content () {
> + if test "$1" == "-C"

Use a single '=' for portability in sh.  It's also a good idea
to prefix variables with 'x' or some such, since "$1" could be
"-n" or some other supported switch for test(1).
So, something like:

if test x"$1" = "x-C"

...or use a case statement.
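
Both portable spellings side by side, as a quick sketch:

```shell
# 'set --' simulates the function's arguments; the 'x' prefix keeps
# test(1) from mistaking "$1" for one of its own switches (e.g. "-n").
set -- -C subdir
if test x"$1" = "x-C"
then
	echo "matched via test"
fi
case "$1" in
-C)
	echo "matched via case"
	;;
esac
```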

On Debian systems, I use the "checkbashisms" from the
"devscripts" package to find and avoid bash dependencies.

Thanks.


[PATCH 03/18] lib-submodule-update: teach test_submodule_content the -C flag

2017-03-01 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 t/lib-submodule-update.sh | 5 +
 1 file changed, 5 insertions(+)

diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index c0d6325133..a906c92cfb 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -193,6 +193,11 @@ test_superproject_content () {
 # Test that the given submodule at path "$1" contains the content according
 # to the submodule commit recorded in the superproject's commit "$2"
 test_submodule_content () {
+   if test "$1" == "-C"
+   then
+   cd "$2"
+   shift; shift;
+   fi
if test $# != 2
then
echo "test_submodule_content needs two arguments"
-- 
2.12.0.rc1.52.ge239d7e709.dirty



[PATCH 11/18] update submodules: move up prepare_submodule_repo_env

2017-03-01 Thread Stefan Beller
In a later patch we need to prepare the submodule environment with
another git directory, so split up the function.

Also move it up in the file such that we do not need to declare the
function later before using it.

Signed-off-by: Stefan Beller 
---
 submodule.c | 29 +
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/submodule.c b/submodule.c
index 8b2c0212be..0b2596e88a 100644
--- a/submodule.c
+++ b/submodule.c
@@ -356,6 +356,23 @@ static void print_submodule_summary(struct rev_info *rev, 
FILE *f,
strbuf_release();
 }
 
+static void prepare_submodule_repo_env_no_git_dir(struct argv_array *out)
+{
+   const char * const *var;
+
+   for (var = local_repo_env; *var; var++) {
+   if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
+   argv_array_push(out, *var);
+   }
+}
+
+void prepare_submodule_repo_env(struct argv_array *out)
+{
+   prepare_submodule_repo_env_no_git_dir(out);
+   argv_array_pushf(out, "%s=%s", GIT_DIR_ENVIRONMENT,
+DEFAULT_GIT_DIR_ENVIRONMENT);
+}
+
 /* Helper function to display the submodule header line prior to the full
  * summary output. If it can locate the submodule objects directory it will
  * attempt to lookup both the left and right commits and put them into the
@@ -1390,18 +1407,6 @@ int parallel_submodules(void)
return parallel_jobs;
 }
 
-void prepare_submodule_repo_env(struct argv_array *out)
-{
-   const char * const *var;
-
-   for (var = local_repo_env; *var; var++) {
-   if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
-   argv_array_push(out, *var);
-   }
-   argv_array_pushf(out, "%s=%s", GIT_DIR_ENVIRONMENT,
-DEFAULT_GIT_DIR_ENVIRONMENT);
-}
-
 /*
  * Embeds a single submodules git directory into the superprojects git dir,
  * non recursively.
-- 
2.12.0.rc1.52.ge239d7e709.dirty



Re: Delta compression not so effective

2017-03-01 Thread Linus Torvalds
On Wed, Mar 1, 2017 at 4:12 PM, Marius Storm-Olsen  wrote:
>
> No, the list of git verify-objects in the previous post was from the bottom
> of the sorted list, so those are the largest blobs, ~249MB..

.. so with a 6GB window, you should easily still have 20+ objects. Not
a huge window, but it should find some deltas.

But a smaller window - _together_ with a suboptimal sorting choice -
could then result in lack of successful delta matches.

> So, this repo must be knocking several parts of Git's insides. I was curious
> about why it was so slow on the writing objects part, since the whole repo
is on a 4x RAID 5, 7k spindles. Now, they are not SSDs, sure, but the thing
> has ~400MB/s continuous throughput available.
>
> iostat -m 5 showed trickle read/write to the process, and 80-100% CPU single
> thread (since the "write objects" stage is single threaded, obviously).

So the writing phase isn't multi-threaded because it's not expected to
matter. But if you can't even generate deltas, you aren't just
*writing* much more data, you're compressing all that data with zlib
too.

So even with a fast disk subsystem, you won't even be able to saturate
the disk, simply because the compression will be slower (and
single-threaded).

> Filenames are fairly static, and the bulk of the 6000 biggest non-delta'ed
> blobs are the same DLLs (multiple of them)

I think the first thing you should test is to repack with fewer
threads, and a bigger pack window. Do something like

  -c pack.threads=4 --window-memory=30g

instead. Just to see if that starts finding deltas.

> Right, now on this machine, I really didn't notice much difference between
> standard zlib level and doing -9. The 203GB version was actually with
> zlib=9.

Don't. zlib has *horrible* scaling with higher compression levels. It
doesn't actually improve the end result very much, and it makes things
*much* slower.

zlib was a reasonable choice when git started - well-known, stable, easy to use.

But realistically it's a relatively horrible choice today, just
because there are better alternatives now.
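That scaling behaviour is easy to check with any zlib binding; for example with Python's stdlib module (a quick illustration, not a git benchmark — absolute numbers depend heavily on the data):

```python
import time
import zlib

# Repetitive payload, loosely standing in for source-code-like blobs.
data = b"some fairly repetitive payload, like source code\n" * 20000

for level in (1, 6, 9):  # 6 is zlib's default; 9 is what zlib=9 uses
    t0 = time.perf_counter()
    out = zlib.compress(data, level)
    elapsed = time.perf_counter() - t0
    assert zlib.decompress(out) == data  # lossless at every level
    print("level %d: %d bytes, %.2f ms" % (level, len(out), elapsed * 1e3))
```

On typical inputs the size difference between the default level and 9 is small while the CPU cost grows sharply, which is the trade-off being described.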

>> How sensitive is your material? Could you make a smaller repo with
>> some of the blobs that still show the symptoms? I don't think I want
>> to download 206GB of data even if my internet access is good.
>
> Pretty sensitive, and not sure how I can reproduce this reasonably well.
> However, I can easily recompile git with any recommended
> instrumentation/printfs, if you have any suggestions of good places to
> start? If anyone have good file/line numbers, I'll give that a go, and
> report back?

So the first thing you might want to do is to just print out the
objects after sorting them, and before it starts trying to find
deltas.

See prepare_pack() in builtin/pack-objects.c, where it does something like this:

	if (nr_deltas && n > 1) {
		unsigned nr_done = 0;
		if (progress)
			progress_state = start_progress(_("Compressing objects"),
							nr_deltas);
		QSORT(delta_list, n, type_size_sort);
		ll_find_deltas(delta_list, n, window+1, depth, &nr_done);
		stop_progress(&progress_state);


and notice that QSORT() line: that's what sorts the objects. You can
do something like

for (i = 0; i < n; i++)
show_object_entry_details(delta_list[i]);

right after that QSORT(), and make that print out the object hash,
filename hash, and size (we don't have the filename that the object
was associated with any more at that stage - they take too much
space).

Save off that array for off-line processing: when you have the object
hash, you can see what the contents are, and match it up with the
file in the git history using something like

   git log --oneline --raw -R --abbrev=40

which shows you the log, but also the "diff" in the form of "this
filename changed from SHA1 to SHA1", so you can match up the object
hashes with where they are in the tree (and where they are in
history).
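For anyone scripting that matching step: each raw entry has the fixed shape `:<old-mode> <new-mode> <old-sha1> <new-sha1> <status>\t<path>`, so mapping the dumped object hashes back to paths is a few lines of code. A hedged sketch (the helper name is made up for illustration):

```python
def paths_for_objects(raw_log, wanted):
    """Map object hashes back to the paths they appeared at, given the
    output of `git log --oneline --raw -R --abbrev=40`."""
    hits = {}
    for line in raw_log.splitlines():
        if not line.startswith(":"):
            continue  # skip the --oneline subject lines
        meta, path = line.split("\t", 1)
        old_sha, new_sha = meta.split()[2:4]
        for sha in (old_sha, new_sha):
            if sha in wanted:
                hits.setdefault(sha, set()).add(path)
    return hits

sample = ("1234abc Add big DLL\n"
          ":000000 100644 " + "0" * 40 + " " + "a" * 40 + " A\tbin/Foo.dll\n")
print(paths_for_objects(sample, {"a" * 40}))
```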

So then you could try to figure out if that type_size_sort() heuristic
is just particularly horrible for you.
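For context, type_size_sort groups objects by type, then by a hash of the trailing part of the filename (so historical versions of the same path land next to each other), then by decreasing size (so a larger object can serve as a delta base for smaller ones). A rough model of that ordering — the real comparison lives in builtin/pack-objects.c and has more tie-breakers, so treat this strictly as a sketch:

```python
def type_size_sort_key(obj):
    # Type first, then name hash to cluster versions of the same path,
    # then bigger objects first within a cluster.
    return (obj["type"], obj["name_hash"], -obj["size"])

objs = [
    {"id": "foo.dll@v1", "type": 3, "name_hash": 7, "size": 100},
    {"id": "bar.txt",    "type": 3, "name_hash": 2, "size": 50},
    {"id": "foo.dll@v2", "type": 3, "name_hash": 7, "size": 120},
]
ordered = [o["id"] for o in sorted(objs, key=type_size_sort_key)]
print(ordered)  # the two foo.dll versions end up adjacent
```

If the heuristic is misbehaving for a repo, it shows up here: objects that should cluster (same path, similar content) end up separated by more than a window's worth of unrelated objects.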

In fact, if your data is not *so* sensitive, and you're ok with making
the one-line commit logs and the filenames public, you could make just
those things available, and maybe I'll have time to look at it.

I'm in the middle of the kernel merge window, but I'm in the last
stretch, and because of the SHA1 thing I've been looking at git
lately. No promises, though.

   Linus


[PATCH 15/18] read-cache, remove_marked_cache_entries: wipe selected submodules.

2017-03-01 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 read-cache.c | 27 +--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 9054369dd0..9a2abacf7a 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -18,6 +18,8 @@
 #include "varint.h"
 #include "split-index.h"
 #include "utf8.h"
+#include "submodule.h"
+#include "submodule-config.h"
 
 /* Mask for the name length in ce_flags in the on-disk index */
 
@@ -520,6 +522,22 @@ int remove_index_entry_at(struct index_state *istate, int 
pos)
return 1;
 }
 
+static void remove_submodule_according_to_strategy(const struct submodule *sub)
+{
+   switch (sub->update_strategy.type) {
+   case SM_UPDATE_UNSPECIFIED:
+   case SM_UPDATE_CHECKOUT:
+   case SM_UPDATE_REBASE:
+   case SM_UPDATE_MERGE:
+   submodule_move_head(sub->path, "HEAD", NULL, \
+   SUBMODULE_MOVE_HEAD_FORCE);
+   break;
+   case SM_UPDATE_NONE:
+   case SM_UPDATE_COMMAND:
+   ; /* Do not touch the submodule. */
+   }
+}
+
 /*
  * Remove all cache entries marked for removal, that is where
  * CE_REMOVE is set in ce_flags.  This is much more effective than
@@ -532,8 +550,13 @@ void remove_marked_cache_entries(struct index_state 
*istate)
 
for (i = j = 0; i < istate->cache_nr; i++) {
if (ce_array[i]->ce_flags & CE_REMOVE) {
-   remove_name_hash(istate, ce_array[i]);
-   save_or_free_index_entry(istate, ce_array[i]);
+   const struct submodule *sub = 
submodule_from_ce(ce_array[i]);
+   if (sub) {
+   remove_submodule_according_to_strategy(sub);
+   } else {
+   remove_name_hash(istate, ce_array[i]);
+   save_or_free_index_entry(istate, ce_array[i]);
+   }
}
else
ce_array[j++] = ce_array[i];
-- 
2.12.0.rc1.52.ge239d7e709.dirty



[PATCH 13/18] unpack-trees: pass old oid to verify_clean_submodule

2017-03-01 Thread Stefan Beller
The check (which uses the old oid) is yet to be implemented, but this part
is just a refactor, so it can go separately first.

Signed-off-by: Stefan Beller 
---
 unpack-trees.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 3a8ee19fe8..616a0ae4b2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1407,7 +1407,8 @@ static void invalidate_ce_path(const struct cache_entry 
*ce,
  * Currently, git does not checkout subprojects during a superproject
  * checkout, so it is not going to overwrite anything.
  */
-static int verify_clean_submodule(const struct cache_entry *ce,
+static int verify_clean_submodule(const char *old_sha1,
+ const struct cache_entry *ce,
  enum unpack_trees_error_types error_type,
  struct unpack_trees_options *o)
 {
@@ -1427,16 +1428,18 @@ static int verify_clean_subdirectory(const struct 
cache_entry *ce,
struct dir_struct d;
char *pathbuf;
int cnt = 0;
-   unsigned char sha1[20];
 
-   if (S_ISGITLINK(ce->ce_mode) &&
-   resolve_gitlink_ref(ce->name, "HEAD", sha1) == 0) {
-   /* If we are not going to update the submodule, then
+   if (S_ISGITLINK(ce->ce_mode)) {
+   unsigned char sha1[20];
+   int sub_head = resolve_gitlink_ref(ce->name, "HEAD", sha1);
+   /*
+* If we are not going to update the submodule, then
 * we don't care.
 */
-   if (!hashcmp(sha1, ce->oid.hash))
+   if (!sub_head && !hashcmp(sha1, ce->oid.hash))
return 0;
-   return verify_clean_submodule(ce, error_type, o);
+   return verify_clean_submodule(sub_head ? NULL : 
sha1_to_hex(sha1),
+ ce, error_type, o);
}
 
/*
-- 
2.12.0.rc1.52.ge239d7e709.dirty



Re: [PATCH 2/2] pretty: use fmt_output_email_subject()

2017-03-01 Thread Jeff King
On Wed, Mar 01, 2017 at 11:38:57AM -0800, Junio C Hamano wrote:

> On Wed, Mar 1, 2017 at 11:32 AM, Jeff King  wrote:
> > I would think that future callers would just need to provide a dummy
> > pp->rev. I guess that logic could be pushed down into
> > fmt_output_email_subject(), so that it skips looking at
> > opt->subject_prefix, etc, when "opt" is NULL, and just hits the
> > "Subject:" case arm.
> 
> The "flexibility" I was wondering about is that the current .subject can
> point at any caller-supplied string, not "Subject:".

Ah, I see. I don't think that is a huge loss, as nobody was using it.
And "Subject:" is already hard-coded in the nr/total counting bits,
which are what you'd want to reuse. I think it is fine to punt to the
future. If somebody really wants it later, the right fix is for them to
provide a string that fmt_output_email_subject() would use in place of
"Subject:" when it adds to the strbuf.

-Peff


[PATCH 06/18] make is_submodule_populated gently

2017-03-01 Thread Stefan Beller
We need the gentle version in a later patch. As we have just one caller,
migrate the caller.

Signed-off-by: Stefan Beller 
---
 builtin/grep.c | 2 +-
 submodule.c    | 7 ++-
 submodule.h    | 8 +++-
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 2c727ef499..b17835aed6 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -616,7 +616,7 @@ static int grep_submodule(struct grep_opt *opt, const 
unsigned char *sha1,
 {
if (!is_submodule_initialized(path))
return 0;
-   if (!is_submodule_populated(path)) {
+   if (!is_submodule_populated_gently(path, NULL)) {
/*
 * If searching history, check for the presense of the
 * submodule's gitdir before skipping the submodule.
diff --git a/submodule.c b/submodule.c
index 3b98766a6b..0e55372f37 100644
--- a/submodule.c
+++ b/submodule.c
@@ -234,15 +234,12 @@ int is_submodule_initialized(const char *path)
return ret;
 }
 
-/*
- * Determine if a submodule has been populated at a given 'path'
- */
-int is_submodule_populated(const char *path)
+int is_submodule_populated_gently(const char *path, int *return_error_code)
 {
int ret = 0;
char *gitdir = xstrfmt("%s/.git", path);
 
-   if (resolve_gitdir(gitdir))
+   if (resolve_gitdir_gently(gitdir, return_error_code))
ret = 1;
 
free(gitdir);
diff --git a/submodule.h b/submodule.h
index 05ab674f06..0b915bd3ac 100644
--- a/submodule.h
+++ b/submodule.h
@@ -41,7 +41,13 @@ extern int submodule_config(const char *var, const char 
*value, void *cb);
 extern void gitmodules_config(void);
 extern void gitmodules_config_sha1(const unsigned char *commit_sha1);
 extern int is_submodule_initialized(const char *path);
-extern int is_submodule_populated(const char *path);
+/*
+ * Determine if a submodule has been populated at a given 'path' by checking if
+ * the /.git resolves to a valid git repository.
+ * If return_error_code is NULL, die on error.
+ * Otherwise the return error code is the same as of resolve_gitdir_gently.
+ */
+extern int is_submodule_populated_gently(const char *path, int 
*return_error_code);
 extern int parse_submodule_update_strategy(const char *value,
struct submodule_update_strategy *dst);
 extern const char *submodule_strategy_to_string(const struct 
submodule_update_strategy *s);
-- 
2.12.0.rc1.52.ge239d7e709.dirty



[PATCH 17/18] builtin/checkout: add --recurse-submodules switch

2017-03-01 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 Documentation/git-checkout.txt |  7 +++
 builtin/checkout.c | 28 
 t/lib-submodule-update.sh  | 23 +--
 t/t2013-checkout-submodule.sh  |  5 +
 4 files changed, 57 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-checkout.txt b/Documentation/git-checkout.txt
index 8e2c0662dd..d6399c0af8 100644
--- a/Documentation/git-checkout.txt
+++ b/Documentation/git-checkout.txt
@@ -256,6 +256,13 @@ section of linkgit:git-add[1] to learn how to operate the 
`--patch` mode.
out anyway. In other words, the ref can be held by more than one
worktree.
 
+--[no-]recurse-submodules::
+   Using --recurse-submodules will update the content of all initialized
+   submodules according to the commit recorded in the superproject. If
+   local modifications in a submodule would be overwritten the checkout
+   will fail unless `-f` is used. If nothing (or --no-recurse-submodules)
+   is used, the work trees of submodules will not be updated.
+
<branch>::
Branch to checkout; if it refers to a branch (i.e., a name that,
when prepended with "refs/heads/", is a valid ref), then that
diff --git a/builtin/checkout.c b/builtin/checkout.c
index f174f50303..e9c5fcfaf8 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -21,12 +21,31 @@
 #include "submodule-config.h"
 #include "submodule.h"
 
+static int recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
+
 static const char * const checkout_usage[] = {
	N_("git checkout [<options>] <branch>"),
	N_("git checkout [<options>] [<branch>] -- <file>..."),
NULL,
 };
 
+static int option_parse_recurse_submodules(const struct option *opt,
+  const char *arg, int unset)
+{
+   if (unset) {
+   recurse_submodules = RECURSE_SUBMODULES_OFF;
+   return 0;
+   }
+   if (arg)
+   recurse_submodules =
+   parse_update_recurse_submodules_arg(opt->long_name,
+   arg);
+   else
+   recurse_submodules = RECURSE_SUBMODULES_ON;
+
+   return 0;
+}
+
 struct checkout_opts {
int patch_mode;
int quiet;
@@ -1163,6 +1182,9 @@ int cmd_checkout(int argc, const char **argv, const char *prefix)
 			 N_("second guess 'git checkout <no-such-branch>'")),
 		OPT_BOOL(0, "ignore-other-worktrees", &opts.ignore_other_worktrees,
 			 N_("do not check if another worktree is holding the given ref")),
+		{ OPTION_CALLBACK, 0, "recurse-submodules", &recurse_submodules,
+			"checkout", "control recursive updating of submodules",
+			PARSE_OPT_OPTARG, option_parse_recurse_submodules },
 		OPT_BOOL(0, "progress", &opts.show_progress, N_("force progress reporting")),
 		OPT_END(),
 	};
@@ -1193,6 +1215,12 @@ int cmd_checkout(int argc, const char **argv, const char 
*prefix)
git_xmerge_config("merge.conflictstyle", conflict_style, NULL);
}
 
+	if (recurse_submodules != RECURSE_SUBMODULES_OFF) {
+		git_config(submodule_config, NULL);
+		if (recurse_submodules != RECURSE_SUBMODULES_DEFAULT)
+			set_config_update_recurse_submodules(recurse_submodules);
+	}
+
+
if ((!!opts.new_branch + !!opts.new_branch_force + 
!!opts.new_orphan_branch) > 1)
die(_("-b, -B and --orphan are mutually exclusive"));
 
diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index aa196d4d42..d71972c3db 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -782,6 +782,11 @@ test_submodule_forced_switch () {
 
 test_submodule_switch_recursing () {
command="$1"
+   RESULT=success
+   if test "$KNOWN_FAILURE_DIRECTORY_SUBMODULE_CONFLICTS" = 1
+   then
+   RESULT=failure
+   fi
# Appearing submodule #
# Switching to a commit letting a submodule appear checks it out ...
test_expect_success "$command: added submodule is checked out" '
@@ -891,7 +896,7 @@ test_submodule_switch_recursing () {
'
# Replacing a submodule with files in a directory must succeeds
# when the submodule is clean
-   test_expect_success "$command: replace submodule with a directory" '
+   test_expect_$RESULT "$command: replace submodule with a directory" '
prolog &&
reset_work_tree_to_interested add_sub1 &&
(
@@ -903,7 +908,7 @@ test_submodule_switch_recursing () {
)
'
# ... absorbing a .git directory.
-   test_expect_success "$command: replace submodule containing a .git 
directory with a directory must absorb the git dir" '
+   test_expect_$RESULT "$command: replace 

[PATCH 18/18] builtin/read-tree: add --recurse-submodules switch

2017-03-01 Thread Stefan Beller
A new known failure mode is introduced[1], which is actually not
a failure but a feature in read-tree. Unlike checkout for which
the recursive submodule tests were originally written, read-tree does
warn about ignored untracked files that would be overwritten.
For the sake of keeping the test library for submodules generic, just
mark the test as a failure.

[1] KNOWN_FAILURE_SUBMODULE_OVERWRITE_IGNORED_UNTRACKED

Signed-off-by: Stefan Beller 
---
 Documentation/git-read-tree.txt |  6 ++
 builtin/read-tree.c | 29 +
 t/lib-submodule-update.sh   |  7 ++-
 t/t1013-read-tree-submodule.sh  |  7 +++
 4 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-read-tree.txt b/Documentation/git-read-tree.txt
index fa1d557e5b..ed9d63ef4a 100644
--- a/Documentation/git-read-tree.txt
+++ b/Documentation/git-read-tree.txt
@@ -115,6 +115,12 @@ OPTIONS
directories the index file and index output file are
located in.
 
+--[no-]recurse-submodules::
+   Using --recurse-submodules will update the content of all initialized
+   submodules according to the commit recorded in the superproject by
+   calling read-tree recursively, also setting the submodules HEAD to be
+   detached at that commit.
+
 --no-sparse-checkout::
Disable sparse checkout support even if `core.sparseCheckout`
is true.
diff --git a/builtin/read-tree.c b/builtin/read-tree.c
index 8ba64bc785..2dc5cc06da 100644
--- a/builtin/read-tree.c
+++ b/builtin/read-tree.c
@@ -15,10 +15,13 @@
 #include "builtin.h"
 #include "parse-options.h"
 #include "resolve-undo.h"
+#include "submodule.h"
+#include "submodule-config.h"
 
 static int nr_trees;
 static int read_empty;
 static struct tree *trees[MAX_UNPACK_TREES];
+int recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
 
 static int list_tree(unsigned char *sha1)
 {
@@ -96,6 +99,23 @@ static int debug_merge(const struct cache_entry * const 
*stages,
return 0;
 }
 
+static int option_parse_recurse_submodules(const struct option *opt,
+  const char *arg, int unset)
+{
+   if (unset) {
+   recurse_submodules = RECURSE_SUBMODULES_OFF;
+   return 0;
+   }
+   if (arg)
+   recurse_submodules =
+   parse_update_recurse_submodules_arg(opt->long_name,
+   arg);
+   else
+   recurse_submodules = RECURSE_SUBMODULES_ON;
+
+   return 0;
+}
+
 static struct lock_file lock_file;
 
 int cmd_read_tree(int argc, const char **argv, const char *unused_prefix)
@@ -137,6 +157,9 @@ int cmd_read_tree(int argc, const char **argv, const char *unused_prefix)
 			 N_("skip applying sparse checkout filter")),
 		OPT_BOOL(0, "debug-unpack", &opts.debug_unpack,
 			 N_("debug unpack-trees")),
+		{ OPTION_CALLBACK, 0, "recurse-submodules", &recurse_submodules,
+			"checkout", "control recursive updating of submodules",
+			PARSE_OPT_OPTARG, option_parse_recurse_submodules },
OPT_END()
};
 
@@ -152,6 +175,12 @@ int cmd_read_tree(int argc, const char **argv, const char *unused_prefix)
 
 	hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR);
 
+   if (recurse_submodules != RECURSE_SUBMODULES_DEFAULT) {
+   gitmodules_config();
+   git_config(submodule_config, NULL);
+   set_config_update_recurse_submodules(RECURSE_SUBMODULES_ON);
+   }
+
prefix_set = opts.prefix ? 1 : 0;
if (1 < opts.merge + opts.reset + prefix_set)
die("Which one? -m, --reset, or --prefix?");
diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index d71972c3db..5caae06bc5 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -787,6 +787,11 @@ test_submodule_switch_recursing () {
then
RESULT=failure
fi
+   RESULT1=success
+   if test "$KNOWN_FAILURE_SUBMODULE_OVERWRITE_IGNORED_UNTRACKED" = 1
+   then
+   RESULT1=failure
+   fi
# Appearing submodule #
# Switching to a commit letting a submodule appear checks it out ...
test_expect_success "$command: added submodule is checked out" '
@@ -827,7 +832,7 @@ test_submodule_switch_recursing () {
)
'
# ... but an ignored file is fine.
-   test_expect_success "$command: added submodule removes an untracked 
ignored file" '
+   test_expect_$RESULT1 "$command: added submodule removes an untracked 
ignored file" '
test_when_finished "rm submodule_update/.git/info/exclude" &&
prolog &&
reset_work_tree_to_interested no_submodule &&
diff --git a/t/t1013-read-tree-submodule.sh 

[PATCH 07/18] connect_work_tree_and_git_dir: safely create leading directories

2017-03-01 Thread Stefan Beller
In a later patch we'll use connect_work_tree_and_git_dir when the
directory for the gitlink file doesn't exist yet. This patch makes
connect_work_tree_and_git_dir safe to use for both cases of
either the git dir or the working dir missing.

To do so, we need to call safe_create_leading_directories[_const]
on both directories. However this has to happen before we construct
the absolute paths as real_pathdup assumes the directories to
be there already.

So for both the config file in the git dir as well as the .git link
file we need to
a) construct the name
b) call SCLD
c) get the absolute path
d) once a-c is done for both we can consume the absolute path
   to compute the relative path to each other and store those
   relative paths.

The implementation provided here puts a) and b) for both cases first,
and then performs c and d after.

One of the two users of 'connect_work_tree_and_git_dir' already checked
for the directory being there, so we can lose that check as
connect_work_tree_and_git_dir handles this functionality now.

Signed-off-by: Stefan Beller 
---
 dir.c   | 32 +---
 submodule.c | 11 ++-
 2 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/dir.c b/dir.c
index 4541f9e146..6f52af7abb 100644
--- a/dir.c
+++ b/dir.c
@@ -2728,23 +2728,33 @@ void untracked_cache_add_to_index(struct index_state 
*istate,
 /* Update gitfile and core.worktree setting to connect work tree and git dir */
 void connect_work_tree_and_git_dir(const char *work_tree_, const char 
*git_dir_)
 {
-   struct strbuf file_name = STRBUF_INIT;
+   struct strbuf gitfile_sb = STRBUF_INIT;
+   struct strbuf cfg_sb = STRBUF_INIT;
struct strbuf rel_path = STRBUF_INIT;
-   char *git_dir = real_pathdup(git_dir_);
-   char *work_tree = real_pathdup(work_tree_);
+   char *git_dir, *work_tree;
 
-	/* Update gitfile */
-	strbuf_addf(&file_name, "%s/.git", work_tree);
-	write_file(file_name.buf, "gitdir: %s",
-		   relative_path(git_dir, work_tree, &rel_path));
+	/* Prepare .git file */
+	strbuf_addf(&gitfile_sb, "%s/.git", work_tree_);
+	if (safe_create_leading_directories_const(gitfile_sb.buf))
+		die(_("could not create directories for %s"), gitfile_sb.buf);
+
+	/* Prepare config file */
+	strbuf_addf(&cfg_sb, "%s/config", git_dir_);
+	if (safe_create_leading_directories_const(cfg_sb.buf))
+		die(_("could not create directories for %s"), cfg_sb.buf);
 
+	git_dir = real_pathdup(git_dir_);
+	work_tree = real_pathdup(work_tree_);
+
+	/* Write .git file */
+	write_file(gitfile_sb.buf, "gitdir: %s",
+		   relative_path(git_dir, work_tree, &rel_path));
 	/* Update core.worktree setting */
-	strbuf_reset(&file_name);
-	strbuf_addf(&file_name, "%s/config", git_dir);
-	git_config_set_in_file(file_name.buf, "core.worktree",
+	git_config_set_in_file(cfg_sb.buf, "core.worktree",
 			       relative_path(work_tree, git_dir, &rel_path));
 
-	strbuf_release(&file_name);
+	strbuf_release(&gitfile_sb);
+	strbuf_release(&cfg_sb);
 	strbuf_release(&rel_path);
free(work_tree);
free(git_dir);
diff --git a/submodule.c b/submodule.c
index 0e55372f37..04d185738f 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1442,8 +1442,6 @@ void absorb_git_dir_into_superproject(const char *prefix,
 
/* Not populated? */
if (!sub_git_dir) {
-   char *real_new_git_dir;
-   const char *new_git_dir;
const struct submodule *sub;
 
if (err_code == READ_GITFILE_ERR_STAT_FAILED) {
@@ -1466,13 +1464,8 @@ void absorb_git_dir_into_superproject(const char *prefix,
sub = submodule_from_path(null_sha1, path);
if (!sub)
die(_("could not lookup name for submodule '%s'"), 
path);
-   new_git_dir = git_path("modules/%s", sub->name);
-   if (safe_create_leading_directories_const(new_git_dir) < 0)
-   die(_("could not create directory '%s'"), new_git_dir);
-   real_new_git_dir = real_pathdup(new_git_dir);
-   connect_work_tree_and_git_dir(path, real_new_git_dir);
-
-   free(real_new_git_dir);
+   connect_work_tree_and_git_dir(path,
+   git_path("modules/%s", sub->name));
} else {
/* Is it already absorbed into the superprojects git dir? */
char *real_sub_git_dir = real_pathdup(sub_git_dir);
-- 
2.12.0.rc1.52.ge239d7e709.dirty



[PATCH 04/18] lib-submodule-update.sh: replace sha1 by hash

2017-03-01 Thread Stefan Beller
Clean up the code by generalising it; the mailing list is currently
discussing, yet again, how to migrate away from sha1.

Signed-off-by: Stefan Beller 
---
 t/lib-submodule-update.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index a906c92cfb..83202c54fc 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -171,9 +171,9 @@ reset_work_tree_to () {
git checkout -f "$1" &&
git status -u -s >actual &&
test_must_be_empty actual &&
-   sha1=$(git rev-parse --revs-only HEAD:sub1) &&
-   if test -n "$sha1" &&
-  test $(cd "../submodule_update_sub1" && git rev-parse 
--verify "$sha1^{commit}")
+   hash=$(git rev-parse --revs-only HEAD:sub1) &&
+   if test -n "$hash" &&
+  test $(cd "../submodule_update_sub1" && git rev-parse 
--verify "$hash^{commit}")
then
git submodule update --init --recursive "sub1"
fi
-- 
2.12.0.rc1.52.ge239d7e709.dirty



[PATCH 08/18] update submodules: add submodule config parsing

2017-03-01 Thread Stefan Beller
Similar to b33a15b08 (push: add recurseSubmodules config option,
2015-11-17) and 027771fcb1 (submodule: allow erroneous values for the
fetchRecurseSubmodules option, 2015-08-17), we add submodule-config code
that is later used to parse whether we are interested in updating
submodules.

We need the `die_on_error` parameter to be able to call this parsing
function for the config file as well, where an incorrect value makes Git die.

As we're just touching the header file, also mark all functions extern.

Signed-off-by: Stefan Beller 
---
 submodule-config.c | 20 
 submodule-config.h | 17 +
 2 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/submodule-config.c b/submodule-config.c
index 93453909cf..3e8e380d98 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -234,6 +234,26 @@ int parse_fetch_recurse_submodules_arg(const char *opt, 
const char *arg)
return parse_fetch_recurse(opt, arg, 1);
 }
 
+static int parse_update_recurse(const char *opt, const char *arg,
+   int die_on_error)
+{
+   switch (git_config_maybe_bool(opt, arg)) {
+   case 1:
+   return RECURSE_SUBMODULES_ON;
+   case 0:
+   return RECURSE_SUBMODULES_OFF;
+   default:
+   if (die_on_error)
+   die("bad %s argument: %s", opt, arg);
+   return RECURSE_SUBMODULES_ERROR;
+   }
+}
+
+int parse_update_recurse_submodules_arg(const char *opt, const char *arg)
+{
+   return parse_update_recurse(opt, arg, 1);
+}
+
 static int parse_push_recurse(const char *opt, const char *arg,
   int die_on_error)
 {
diff --git a/submodule-config.h b/submodule-config.h
index 70f19363fd..d434ecdb45 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -22,16 +22,17 @@ struct submodule {
int recommend_shallow;
 };
 
-int parse_fetch_recurse_submodules_arg(const char *opt, const char *arg);
-int parse_push_recurse_submodules_arg(const char *opt, const char *arg);
-int parse_submodule_config_option(const char *var, const char *value);
-const struct submodule *submodule_from_name(const unsigned char 
*commit_or_tree,
-   const char *name);
-const struct submodule *submodule_from_path(const unsigned char 
*commit_or_tree,
-   const char *path);
+extern int parse_fetch_recurse_submodules_arg(const char *opt, const char 
*arg);
+extern int parse_update_recurse_submodules_arg(const char *opt, const char 
*arg);
+extern int parse_push_recurse_submodules_arg(const char *opt, const char *arg);
+extern int parse_submodule_config_option(const char *var, const char *value);
+extern const struct submodule *submodule_from_name(
+   const unsigned char *commit_or_tree, const char *name);
+extern const struct submodule *submodule_from_path(
+   const unsigned char *commit_or_tree, const char *path);
 extern int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
  unsigned char *gitmodules_sha1,
  struct strbuf *rev);
-void submodule_free(void);
+extern void submodule_free(void);
 
 #endif /* SUBMODULE_CONFIG_H */
-- 
2.12.0.rc1.52.ge239d7e709.dirty



[PATCH 16/18] entry.c: update submodules when interesting

2017-03-01 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 entry.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/entry.c b/entry.c
index c6eea240b6..d2b512da90 100644
--- a/entry.c
+++ b/entry.c
@@ -2,6 +2,7 @@
 #include "blob.h"
 #include "dir.h"
 #include "streaming.h"
+#include "submodule.h"
 
 static void create_directories(const char *path, int path_len,
   const struct checkout *state)
@@ -146,6 +147,7 @@ static int write_entry(struct cache_entry *ce,
unsigned long size;
size_t wrote, newsize = 0;
struct stat st;
+   const struct submodule *sub;
 
if (ce_mode_s_ifmt == S_IFREG) {
struct stream_filter *filter = get_stream_filter(ce->name,
@@ -203,6 +205,10 @@ static int write_entry(struct cache_entry *ce,
return error("cannot create temporary submodule %s", 
path);
if (mkdir(path, 0777) < 0)
return error("cannot create submodule directory %s", 
path);
+   sub = submodule_from_ce(ce);
+   if (sub)
+			return submodule_move_head(ce->name,
+				NULL, oid_to_hex(&ce->oid),
+				SUBMODULE_MOVE_HEAD_FORCE);
break;
default:
return error("unknown file mode for %s in index", path);
@@ -259,7 +265,31 @@ int checkout_entry(struct cache_entry *ce,
 	strbuf_add(&path, ce->name, ce_namelen(ce));
 
 	if (!check_path(path.buf, path.len, &st, state->base_dir_len)) {
+		const struct submodule *sub;
 		unsigned changed = ce_match_stat(ce, &st,
 			CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE);
+		/*
+		 * Needs to be checked before !changed returns early,
+		 * as the possibly empty directory was not changed
+		 */
+		sub = submodule_from_ce(ce);
+		if (sub) {
+			int err;
+			if (!is_submodule_populated_gently(ce->name, &err)) {
+				struct stat sb;
+				if (lstat(ce->name, &sb))
+					die(_("could not stat file '%s'"), ce->name);
+				if (!(st.st_mode & S_IFDIR))
+					unlink_or_warn(ce->name);
+
+				return submodule_move_head(ce->name,
+					NULL, oid_to_hex(&ce->oid),
+					SUBMODULE_MOVE_HEAD_FORCE);
+			} else
+				return submodule_move_head(ce->name,
+					"HEAD", oid_to_hex(&ce->oid),
+					SUBMODULE_MOVE_HEAD_FORCE);
+		}
+
if (!changed)
return 0;
if (!state->force) {
-- 
2.12.0.rc1.52.ge239d7e709.dirty



[PATCH 12/18] update submodules: add submodule_move_head

2017-03-01 Thread Stefan Beller
In later patches we introduce the options and flag for commands
that modify the working directory, e.g. git-checkout.

This piece of code will be used universally for
all these working tree modifications as it
* supports dry run to answer the question:
  "Is it safe to change the submodule to this new state?"
  e.g. is it overwriting untracked files or are there local
  changes that would be overwritten?
* supports a force flag that can be used for resetting
  the tree.

Signed-off-by: Stefan Beller 
---
 submodule.c | 135 
 submodule.h |   7 
 2 files changed, 142 insertions(+)

diff --git a/submodule.c b/submodule.c
index 0b2596e88a..bc5fecf8c5 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1239,6 +1239,141 @@ int bad_to_remove_submodule(const char *path, unsigned 
flags)
return ret;
 }
 
+static int submodule_has_dirty_index(const struct submodule *sub)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+
+	prepare_submodule_repo_env_no_git_dir(&cp.env_array);
+
+	cp.git_cmd = 1;
+	argv_array_pushl(&cp.args, "diff-index", "--quiet",
+			 "--cached", "HEAD", NULL);
+	cp.no_stdin = 1;
+	cp.no_stdout = 1;
+	cp.dir = sub->path;
+	if (start_command(&cp))
+		die("could not recurse into submodule '%s'", sub->path);
+
+	return finish_command(&cp);
+}
+
+static void submodule_reset_index(const char *path)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+	prepare_submodule_repo_env_no_git_dir(&cp.env_array);
+
+	cp.git_cmd = 1;
+	cp.no_stdin = 1;
+	cp.dir = path;
+
+	argv_array_pushf(&cp.args, "--super-prefix=%s/", path);
+	argv_array_pushl(&cp.args, "read-tree", "-u", "--reset", NULL);
+
+	argv_array_push(&cp.args, EMPTY_TREE_SHA1_HEX);
+
+	if (run_command(&cp))
+		die("could not reset submodule index");
+}
+
+/**
+ * Moves a submodule at a given path from a given head to another new head.
+ * For edge cases (a submodule coming into existence or removing a submodule)
+ * pass NULL for old or new respectively.
+ */
+int submodule_move_head(const char *path,
+const char *old,
+const char *new,
+unsigned flags)
+{
+   int ret = 0;
+   struct child_process cp = CHILD_PROCESS_INIT;
+   const struct submodule *sub;
+
+   sub = submodule_from_path(null_sha1, path);
+
+   if (!sub)
+   die("BUG: could not get submodule information for '%s'", path);
+
+   if (old && !(flags & SUBMODULE_MOVE_HEAD_FORCE)) {
+   /* Check if the submodule has a dirty index. */
+   if (submodule_has_dirty_index(sub))
+   return error(_("submodule '%s' has dirty index"), path);
+   }
+
+   if (!(flags & SUBMODULE_MOVE_HEAD_DRY_RUN)) {
+   if (old) {
+   if (!submodule_uses_gitfile(path))
+   absorb_git_dir_into_superproject("", path,
+   ABSORB_GITDIR_RECURSE_SUBMODULES);
+   } else {
+			struct strbuf sb = STRBUF_INIT;
+			strbuf_addf(&sb, "%s/modules/%s",
+				    get_git_common_dir(), sub->name);
+			connect_work_tree_and_git_dir(path, sb.buf);
+			strbuf_release(&sb);
+
+   /* make sure the index is clean as well */
+   submodule_reset_index(path);
+   }
+   }
+
+	prepare_submodule_repo_env_no_git_dir(&cp.env_array);
+
+	cp.git_cmd = 1;
+	cp.no_stdin = 1;
+	cp.dir = path;
+
+	argv_array_pushf(&cp.args, "--super-prefix=%s/", path);
+	argv_array_pushl(&cp.args, "read-tree", NULL);
+
+	if (flags & SUBMODULE_MOVE_HEAD_DRY_RUN)
+		argv_array_push(&cp.args, "-n");
+	else
+		argv_array_push(&cp.args, "-u");
+
+	if (flags & SUBMODULE_MOVE_HEAD_FORCE)
+		argv_array_push(&cp.args, "--reset");
+	else
+		argv_array_push(&cp.args, "-m");
+
+	argv_array_push(&cp.args, old ? old : EMPTY_TREE_SHA1_HEX);
+	argv_array_push(&cp.args, new ? new : EMPTY_TREE_SHA1_HEX);
+
+	if (run_command(&cp)) {
+   ret = -1;
+   goto out;
+   }
+
+   if (!(flags & SUBMODULE_MOVE_HEAD_DRY_RUN)) {
+   if (new) {
+   struct child_process cp1 = CHILD_PROCESS_INIT;
+   /* also set the HEAD accordingly */
+   cp1.git_cmd = 1;
+   cp1.no_stdin = 1;
+   cp1.dir = path;
+
+   argv_array_pushl(&cp1.args, "update-ref", "HEAD",
+new ? new : EMPTY_TREE_SHA1_HEX, NULL);
+
+   if (run_command(&cp1)) {
+   ret = -1;
+   goto out;
+   

[PATCH 10/18] submodules: introduce check to see whether to touch a submodule

2017-03-01 Thread Stefan Beller
In later patches we introduce the --recurse-submodules flag for commands
that modify the working directory, e.g. git-checkout.

It is potentially expensive to check if a submodule needs an update,
because the common way to interact with submodules is to spawn a child
process for each interaction.

So let's introduce a function that checks whether a submodule needs
to be checked for an update before attempting the update.

Signed-off-by: Stefan Beller 
---
 submodule.c | 16 
 submodule.h |  7 +++
 2 files changed, 23 insertions(+)

diff --git a/submodule.c b/submodule.c
index 591f4a694e..8b2c0212be 100644
--- a/submodule.c
+++ b/submodule.c
@@ -548,6 +548,22 @@ void set_config_update_recurse_submodules(int value)
config_update_recurse_submodules = value;
 }
 
+int should_update_submodules(void)
+{
+   return config_update_recurse_submodules == RECURSE_SUBMODULES_ON;
+}
+
+const struct submodule *submodule_from_ce(const struct cache_entry *ce)
+{
+   if (!S_ISGITLINK(ce->ce_mode))
+   return NULL;
+
+   if (!should_update_submodules())
+   return NULL;
+
+   return submodule_from_path(null_sha1, ce->name);
+}
+
 static int has_remote(const char *refname, const struct object_id *oid,
  int flags, void *cb_data)
 {
diff --git a/submodule.h b/submodule.h
index b4e60c08d2..6f3fe85c7c 100644
--- a/submodule.h
+++ b/submodule.h
@@ -65,6 +65,13 @@ extern void show_submodule_inline_diff(FILE *f, const char *path,
const struct diff_options *opt);
 extern void set_config_fetch_recurse_submodules(int value);
 extern void set_config_update_recurse_submodules(int value);
+/* Check if we want to update any submodule.*/
+extern int should_update_submodules(void);
+/*
+ * Returns the submodule struct if the given ce entry is a submodule
+ * and it should be updated. Returns NULL otherwise.
+ */
+extern const struct submodule *submodule_from_ce(const struct cache_entry *ce);
 extern void check_for_new_submodule_commits(unsigned char new_sha1[20]);
 extern int fetch_populated_submodules(const struct argv_array *options,
   const char *prefix, int command_line_option,
-- 
2.12.0.rc1.52.ge239d7e709.dirty



[PATCH 02/18] lib-submodule-update.sh: do not use ./. as submodule remote

2017-03-01 Thread Stefan Beller
Adding the repository itself as a submodule does not make sense in the
real world. In our test suite we used to do that out of convenience in
some tests as the current repository has easiest access for setting up
'just a submodule'.

However, this doesn't quite test the real world, so let's not follow
this pattern any further and actually create an independent repository
that we can use as a submodule.

When using './.' as the remote, the superproject and submodule share the
same objects, such that testing if a given sha1 is a valid commit works
in either repository.  As running commands in an unpopulated submodule
falls back to the superproject, this happened in `reset_work_tree_to`
when determining if we need to populate the submodule. Fix this bug by
checking in the actual remote now.

Signed-off-by: Stefan Beller 
---
 t/lib-submodule-update.sh | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 5df528ea81..c0d6325133 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -37,6 +37,17 @@
 #
 
 create_lib_submodule_repo () {
+   git init submodule_update_sub1 &&
+   (
+   cd submodule_update_sub1 &&
+   echo "expect" >>.gitignore &&
+   echo "actual" >>.gitignore &&
+   echo "x" >file1 &&
+   echo "y" >file2 &&
+   git add .gitignore file1 file2 &&
+   git commit -m "Base inside first submodule" &&
+   git branch "no_submodule"
+   ) &&
git init submodule_update_repo &&
(
cd submodule_update_repo &&
@@ -49,7 +60,7 @@ create_lib_submodule_repo () {
git branch "no_submodule" &&
 
git checkout -b "add_sub1" &&
-   git submodule add ./. sub1 &&
+   git submodule add ../submodule_update_sub1 sub1 &&
git config -f .gitmodules submodule.sub1.ignore all &&
git config submodule.sub1.ignore all &&
git add .gitmodules &&
@@ -162,7 +173,7 @@ reset_work_tree_to () {
test_must_be_empty actual &&
sha1=$(git rev-parse --revs-only HEAD:sub1) &&
if test -n "$sha1" &&
-  test $(cd "sub1" && git rev-parse --verify "$sha1^{commit}")
+  test $(cd "../submodule_update_sub1" && git rev-parse --verify "$sha1^{commit}")
then
git submodule update --init --recursive "sub1"
fi
-- 
2.12.0.rc1.52.ge239d7e709.dirty



[PATCH 09/18] update submodules: add a config option to determine if submodules are updated

2017-03-01 Thread Stefan Beller
In later patches we introduce the options and flag for commands
that modify the working directory, e.g. git-checkout.

Have a central place to store such settings whether we want to update
a submodule.

Signed-off-by: Stefan Beller 
---
 submodule.c | 6 ++
 submodule.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/submodule.c b/submodule.c
index 04d185738f..591f4a694e 100644
--- a/submodule.c
+++ b/submodule.c
@@ -17,6 +17,7 @@
 #include "worktree.h"
 
 static int config_fetch_recurse_submodules = RECURSE_SUBMODULES_ON_DEMAND;
+static int config_update_recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
 static int parallel_jobs = 1;
 static struct string_list changed_submodule_paths = STRING_LIST_INIT_NODUP;
 static int initialized_fetch_ref_tips;
@@ -542,6 +543,11 @@ void set_config_fetch_recurse_submodules(int value)
config_fetch_recurse_submodules = value;
 }
 
+void set_config_update_recurse_submodules(int value)
+{
+   config_update_recurse_submodules = value;
+}
+
 static int has_remote(const char *refname, const struct object_id *oid,
  int flags, void *cb_data)
 {
diff --git a/submodule.h b/submodule.h
index 0b915bd3ac..b4e60c08d2 100644
--- a/submodule.h
+++ b/submodule.h
@@ -64,6 +64,7 @@ extern void show_submodule_inline_diff(FILE *f, const char *path,
const char *del, const char *add, const char *reset,
const struct diff_options *opt);
 extern void set_config_fetch_recurse_submodules(int value);
+extern void set_config_update_recurse_submodules(int value);
 extern void check_for_new_submodule_commits(unsigned char new_sha1[20]);
 extern int fetch_populated_submodules(const struct argv_array *options,
   const char *prefix, int command_line_option,
-- 
2.12.0.rc1.52.ge239d7e709.dirty




Re: Delta compression not so effective

2017-03-01 Thread Marius Storm-Olsen

On 3/1/2017 12:30, Linus Torvalds wrote:

On Wed, Mar 1, 2017 at 9:57 AM, Marius Storm-Olsen  wrote:


Indeed, I did do a
-c pack.threads=20 --window-memory=6g
to 'git repack', since the machine is a 20-core (40 threads) machine with
126GB of RAM.

So I guess with these sized objects, even at 6GB per thread, it's not enough
to get a big enough Window for proper delta-packing?


Hmm. The 6GB window should be plenty good enough, unless your blobs
are in the gigabyte range too.


No, the list of git verify-objects in the previous post was from the 
bottom of the sorted list, so those are the largest blobs, ~249MB..




This repo took >14hr to repack on 20 threads though ("compression" step was
very fast, but stuck 95% of the time in "writing objects"), so I can only
imagine how long a pack.threads=1 will take :)


Actually, it's usually the compression phase that should be slow - but
if something is limiting finding deltas (so that we abort early), then
that would certainly tend to speed up compression.

The "writing objects" phase should be mainly about the actual IO.
Which should be much faster *if* you actually find deltas.


So, this repo must be knocking several parts of Git's insides. I was 
curious about why it was so slow on the writing objects part, since the 
whole repo is on a 4x RAID 5, 7k spindles. Now, they are not SSDs sure, 
but the thing has ~400MB/s continuous throughput available.


iostat -m 5 showed trickle read/write to the process, and 80-100% CPU 
single thread (since the "write objects" stage is single threaded, 
obviously).


The failing delta must be triggering other negative behavior.



For example, the sorting code thinks that objects with the same name
across the history are good sources of deltas. But it may be that for
your case, the binary blobs that you have don't tend to actually
change in the history, so that heuristic doesn't end up doing
anything.


These are generally just DLLs (debug & release), which content is 
updated due to upstream project updates. So, filenames/paths tend to 
stay identical, while content changes throughout history.




The sorting does use the size and the type too, but the "filename
hash" (which isn't really a hash, it's something nasty to give
reasonable results for the case where files get renamed) is the main
sort key.

So you might well want to look at the sorting code too. If filenames
(particularly the end of filenames) for the blobs aren't good hints
for the sorting code, that sort might end up spreading all the blobs
out rather than sort them by size.


Filenames are fairly static, and the bulk of the 6000 biggest 
non-delta'ed blobs are the same DLLs (multiple of them)




And again, if that happens, the "can I delta these two objects" code
will notice that the size of the objects are wildly different and
won't even bother trying. Which speeds up the "compressing" phase, of
course, but then because you don't get any good deltas, the "writing
out" phase sucks donkey balls because it does zlib compression on big
objects and writes them out to disk.


Right, now on this machine, I really didn't notice much difference 
between standard zlib level and doing -9. The 203GB version was actually 
with zlib=9.




So there are certainly multiple possible reasons for the deltification
to not work well for you.

How sensitive is your material? Could you make a smaller repo with
some of the blobs that still show the symptoms? I don't think I want
to download 206GB of data even if my internet access is good.


Pretty sensitive, and not sure how I can reproduce this reasonable well. 
However, I can easily recompile git with any recommended 
instrumentation/printfs, if you have any suggestions of good places to 
start? If anyone have good file/line numbers, I'll give that a go, and 
report back?


Thanks!

--
.marius


Re: [PATCH 2/2] pretty: use fmt_output_email_subject()

2017-03-01 Thread René Scharfe

Am 01.03.2017 um 12:37 schrieb René Scharfe:

Add the email-style subject prefix (e.g. "Subject: [PATCH] ") directly
when it's needed instead of letting log_write_email_headers() prepare
it in a static buffer in advance.  This simplifies storage ownership and
code flow.

Signed-off-by: Rene Scharfe 
---
This slows down the last three tests in p4000 by ca. 3% for some reason,
so we may want to only do the first part for now, which is performance
neutral on my machine.


Update below.


diff --git a/commit.h b/commit.h
index 9c12abb911..459daef94a 100644
--- a/commit.h
+++ b/commit.h
@@ -142,21 +142,24 @@ static inline int cmit_fmt_is_mail(enum cmit_fmt fmt)
return (fmt == CMIT_FMT_EMAIL || fmt == CMIT_FMT_MBOXRD);
 }

+struct rev_info; /* in revision.h, it circularly uses enum cmit_fmt */
+
 struct pretty_print_context {
/*
 * Callers should tweak these to change the behavior of pp_* functions.
 */
enum cmit_fmt fmt;
int abbrev;
-   const char *subject;
const char *after_subject;
int preserve_subject;
struct date_mode date_mode;
unsigned date_mode_explicit:1;
+   unsigned print_email_subject:1;


Turning this into an int restores performance according to p4000. 
Didn't know that bitfields can be *that* expensive.


René


Re: Delta compression not so effective

2017-03-01 Thread Marius Storm-Olsen

On 3/1/2017 14:19, Martin Langhoff wrote:

On Wed, Mar 1, 2017 at 8:51 AM, Marius Storm-Olsen  wrote:

BUT, even still, I would expect Git's delta compression to be quite effective, 
compared to the compression present in SVN.


jar files are zipfiles. They don't delta in any useful form, and in
fact they differ even if they contain identical binary files inside.


If you look through the initial post, you'll see that the jar in 
question is in fact a tool (BFG) by Roberto Tyley, which is basically 
git filter-branch on steroids. I used it to quickly filter out the 
extern/ folder, just to prove most of the original size stems from that 
particular folder. That's all.


The repo does not contain zip or jar files. A few images and other 
compressed formats (except a few 100MBs of proprietary files, which 
never change), but nothing unusual.




Commits: 32988
DB (server) size: 139GB


Are you certain of the on-disk storage at the SVN server? Ideally,
you've taken the size with a low-level tool like `du -sh
/path/to/SVNRoot`.


139GB is from 'du -sh' on the SVN server. I imported (via SubGit) 
directly from the (hotcopied) SVN folder on the server. So true SVN size.




Even with no delta compression (as per Junio and Linus' discussion),
based on past experience importing jar/wars/binaries from SVN into
git... I'd expect git's worst case to be on-par with SVN, perhaps ~5%
larger due to compression headers on uncompressible data.


Yes, I was expecting a Git repo <139GB, but like Linus mentioned, 
something must be knocking the delta search off its feet, so it bails 
out. Loose object -> 'hard' repack didn't show that much difference.



Thanks!

--
.marius


Re: [PATCH] Put sha1dc on a diet

2017-03-01 Thread Linus Torvalds
On Wed, Mar 1, 2017 at 12:34 PM, Jeff King  wrote:
>
> I don't think that helps. The sha1 over the pack-file takes about 1.3s
> with openssl, and 5s with sha1dc. So we already know the increase there
> is only a few seconds, not a few minutes.

Yeah, I did a few statistics by adding just logging of "SHA1_Init()"
calls. For that network clone situation, the call distribution is

      1 SHA1: Init at builtin/index-pack.c:326
 841228 SHA1: Init at builtin/index-pack.c:450
      2 SHA1: Init at csum-file.c:152
4415756 SHA1: Init at sha1_file.c:3218

(the line numbers are a bit off from 'pu', because I obviously have
the logging code).

The big number (one for every object) is from
write_sha1_file_prepare(), which we'd want to be the strong collision
checking version because those are things we're about to create git
objects out of. It's called from

 - hash_sha1_file() - doesn't actually write the object, but is used
to calculate the sha for incoming data after applying the delta, for
example.

 - write_sha1_file() - many uses, actually writes the object

 - hash_sha1_file_literally() - git hash-object

and that index-pack.c:450 is from unpack_entry_data() for the base
non-delta objects (which should also be the strong kind).

So all of them should check against collision attacks, so none of them
seem to be things you'd want to optimize away..

So I was wrong in thinking that there were a lot of unnecessary SHA1
calculations in that load. They all look like they should be done with
the slower checking code.

Oh well.

  Linus


Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions

2017-03-01 Thread Junio C Hamano
Mike Crowe  writes:

> With the above patch, both "git diff" and "git diff --quiet" report that
> there are no changes. Previously Git would report the extra newline
> correctly.

I sent an updated one that (I think) fixes the real issue, which the
extra would_convert_to_git() calls added in the older iteration to
diff_filespec_check_stat_unmatch() were merely papering over.

It would be nice to see if it fixes the issue for you.

Thanks.




Re: [PATCH 6/6 v5] revert.c: delegate handling of "-" shorthand to setup_revisions

2017-03-01 Thread Junio C Hamano
Siddharth Kannan  writes:

> revert.c:run_sequencer calls setup_revisions right after replacing "-" with
> "@{-1}" for this shorthand. A previous patch taught setup_revisions to handle
> this shorthand by doing the required replacement inside revision.c:get_sha1_1.
>
> Hence, the code here is redundant and has been removed.
>
> This patch also adds a test to check that revert recognizes the "-" shorthand.

Unlike "merge" [*1*], I think this one is OK because "git revert
$commit" does not try to say _how_ the commit was given, and most
importantly, it does not say what branch the reverted thing was.

Thanks.

[Footnote]

*1* Probably "checkout" would exhibit the same issue as we saw in
5/6 for "git merge" if you remove the "- to @{-1}" conversion
from it.




Re: [PATCH] Put sha1dc on a diet

2017-03-01 Thread Jeff King
On Wed, Mar 01, 2017 at 03:05:25PM -0800, Linus Torvalds wrote:

> On Wed, Mar 1, 2017 at 2:51 PM, Johannes Schindelin
>  wrote:
> >
> > But I think bigger than just developers on Windows OS. There are many
> > developers out there working on large repositories (yes, much larger than
> > Linux). Also using Macs and Linux. I am not at all sure that we want to
> > give them an updated Git they cannot fail to notice to be much slower than
> > before.
> 
> Johannes, have you *tried* the patches?
> 
> I really don't think you have. It is completely unnoticeable in any
> normal situation. The two cases that it's noticeable is:
> 
>  - a full fsck is noticeable slower
> 
>  - a full non-local clone is slower (but not really noticeably so
> since the network traffic dominates).
> 
> In other words, I think you're making shit up. I don't think you
> understand how little the SHA1 performance actually matters. It's
> noticeable in benchmarks. It's not noticeable in any normal operation.
> 
> .. and yes, I've actually been running the patches locally since I
> posted my first version (which apparently didn't go out to the list
> because of list size limits) and now running the version in 'pu'.

You have to remember that some of the Git for Windows users are doing
horrific things like using repositories with 450MB .git/index files, and
the speed to compute the sha1 during an update is noticeable there.

IMHO that is a good sign that the right approach is to switch to an
index format that doesn't require rewriting all 450MB to update one
entry. But obviously that is a much harder problem than just using an
optimized sha1 implementation.

I do think that could argue for turning on the collision detection only
during object-write operations, which is where it matters. It would be
really trivial to flip the "check collisions" bit on sha1dc. But I
suspect you could go faster still by compiling against two separate
implementations: the fast-as-possible one (which could be openssl or
blk-sha1), and the slower-but-careful sha1dc.

-Peff


[PATCH] README: create HTTP/HTTPS links from URLs in Markdown

2017-03-01 Thread Eric Wong
Markdown supports automatic links by surrounding URLs with
angle brackets, as documented in


While we're at it, update URLs to avoid redirecting clients for
git-scm.com (by using HTTPS) and public-inbox.org (by adding a
trailing slash).

Signed-off-by: Eric Wong 
---
 I was going to cite some style manuals (MLA, APA, etc),
 but it seems current versions do not favor angle brackets.
 However, this remains consistent with the recommendations in
 RFC 2369  for mail headers.

 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index c0cd5580e..f17af66a9 100644
--- a/README.md
+++ b/README.md
@@ -12,7 +12,7 @@ Torvalds with help of a group of hackers around the net.
 
 Please read the file [INSTALL][] for installation instructions.
 
-Many Git online resources are accessible from http://git-scm.com/
+Many Git online resources are accessible from <https://git-scm.com/>
 including full documentation and Git related tools.
 
 See [Documentation/gittutorial.txt][] to get started, then see
@@ -33,8 +33,8 @@ requests, comments and patches to git@vger.kernel.org (read
 [Documentation/SubmittingPatches][] for instructions on patch submission).
 To subscribe to the list, send an email with just "subscribe git" in
 the body to majord...@vger.kernel.org. The mailing list archives are
-available at https://public-inbox.org/git,
-http://marc.info/?l=git and other archival sites.
+available at <https://public-inbox.org/git/>,
+<http://marc.info/?l=git> and other archival sites.
 
 The maintainer frequently sends the "What's cooking" reports that
 list the current status of various development topics to the mailing
-- 
EW


Re: [PATCH] Put sha1dc on a diet

2017-03-01 Thread Jeff King
On Wed, Mar 01, 2017 at 12:58:39PM -0800, Linus Torvalds wrote:

>> I don't think that helps. The sha1 over the pack-file takes about 1.3s
>> with openssl, and 5s with sha1dc. So we already know the increase there
>> is only a few seconds, not a few minutes.
> 
> OK. I guess what we could easily do is to just add an argument to
> git_SHA1_Init() to say whether we want checking or not, and only use the
> checking functions when receiving objects (which would include creating new
> objects from files, and obviously fsck).
> 
> You'd still eat the cost on the receiving side of a clone, but that's when
> you really want the checking anyway. At least it wouldn't be so visible on
> the sending side, which is all the hosting etc, where there might be server
> utilization issues.
> 
> Would that make deployment happier? It should be an easy little flag to
> add, I think.

I don't think it makes all that big a difference. The sending side
wastes the extra 2-3 seconds of CPU to checksum the whole packfile, but
it's not inflating all the object contents in the first place (between
reachability bitmaps to get the list of objects in the first place, and
then verbatim reuse of pack contents).

Which isn't to say it's not reasonable to limit the checking to a few
spots (basically anything that's _writing_ objects). But I don't think
it makes a big difference to the server side of a fetch or clone.

-Peff


Re: [PATCH] Put sha1dc on a diet

2017-03-01 Thread Linus Torvalds
On Wed, Mar 1, 2017 at 2:51 PM, Johannes Schindelin
 wrote:
>
> But I think bigger than just developers on Windows OS. There are many
> developers out there working on large repositories (yes, much larger than
> Linux). Also using Macs and Linux. I am not at all sure that we want to
> give them an updated Git they cannot fail to notice to be much slower than
> before.

Johannes, have you *tried* the patches?

I really don't think you have. It is completely unnoticeable in any
normal situation. The two cases that it's noticeable is:

 - a full fsck is noticeable slower

 - a full non-local clone is slower (but not really noticeably so
since the network traffic dominates).

In other words, I think you're making shit up. I don't think you
understand how little the SHA1 performance actually matters. It's
noticeable in benchmarks. It's not noticeable in any normal operation.

.. and yes, I've actually been running the patches locally since I
posted my first version (which apparently didn't go out to the list
because of list size limits) and now running the version in 'pu'.

Linus


Re: What's cooking in git.git (Mar 2017, #01; Wed, 1)

2017-03-01 Thread Junio C Hamano
René Scharfe  writes:

> Am 01.03.2017 um 23:35 schrieb Junio C Hamano:
>> * rs/log-email-subject (2017-03-01) 2 commits
>>  - pretty: use fmt_output_email_subject()
>>  - log-tree: factor out fmt_output_email_subject()
>> 
>>  Code clean-up.
>> 
>>  Will merge to 'next'.
>
> Could you please squash this in?  We only use a single context (as
> opposed to an array), so it doesn't have to be especially compact,
> and using a bitfield slows down half of the tests in p4000 by 3%
> for me.

I thought I saw the keyword "bitfield" as the solution for that 3%
somewhere in the thread and forgot when I updated the "What's
cooking" report.

Will do.  Thanks for being careful.  

>
> ---
>  commit.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/commit.h b/commit.h
> index 459daef94a..528272ac9b 100644
> --- a/commit.h
> +++ b/commit.h
> @@ -154,7 +154,7 @@ struct pretty_print_context {
>   int preserve_subject;
>   struct date_mode date_mode;
>   unsigned date_mode_explicit:1;
> - unsigned print_email_subject:1;
> + int print_email_subject;
>   int expand_tabs_in_log;
>   int need_8bit_cte;
>   char *notes_message;


[PATCH v2 0/2] Minor changes to skip gitweb tests without Time::HiRes

2017-03-01 Thread Ævar Arnfjörð Bjarmason
This was originally just one small patch, but Jakub Narębski pointed
out that calling the module "unusable" when we failed to load it was
confusing, so now the start of this series is just a rephrasing of an
existing error message I copied.

Ævar Arnfjörð Bjarmason (2):
  gitweb tests: Change confusing "skip_all" phrasing
  gitweb tests: Skip tests when we don't have Time::HiRes

 t/gitweb-lib.sh | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

-- 
2.11.0



Re: [PATCH 5/6 v5] merge.c: delegate handling of "-" shorthand to revision.c:get_sha1

2017-03-01 Thread Junio C Hamano
Junio C Hamano  writes:

> Siddharth Kannan  writes:
>
>> The callchain for handling each argument contains the function
>> revision.c:get_sha1 where the shorthand for "-" ~ "@{-1}" has already been
>> implemented in a previous patch; the complete callchain leading to that
>> function is:
>>
>> 1. merge.c:collect_parents
>> 2. commit.c:get_merge_parent : this function calls revision.c:get_sha1
>>
>> This patch also adds a test for checking that the shorthand works properly
>
> This breaks "git merge".
>
>> +test_expect_success 'merge - should work' '
>> +git checkout testing-2 &&
>> +git merge - &&
>> +git rev-parse HEAD HEAD^^ | sort >actual &&
>> +git rev-parse master testing-1 | sort >expect &&
>> +test_cmp expect actual
>
> This test is not sufficient to catch a regression I seem to be
> seeing.
>
>   $ git checkout side
>   $ git checkout pu
>   $ git merge -
>
> used to say "Merge branch 'side' into pu".  With this series merged,
> I seem to be getting "Merge commit '-' into pu".

You stopped at get_sha1_1() in your 3817cebabc ("sha1_name.c: teach
get_sha1_1 "-" shorthand for "@{-1}"", 2017-02-25), instead of going
down to get_sha1_basic() and teaching it that "-" means the same
thing as "@{-1}", which would in turn require you to teach
dwim_ref() that "-" is the same thing as "@{-1}".  As dwim_ref()
does not know about "-" and does not expand it to the refname like
it expands "@{-1}", it would break and that is why 3817cebabc punts
at a bit higher in the callchain.

The breakage by this patch to "git merge" happens for the same
reason.  cmd_merge() calls collect_parents() which annotates the
commits that are merged with their textual name, which used to be
"@{-1}" without this patch but now "-" is passed as-is.  This
annotation will be given to merge_name(), and the first thing it
does is dwim_ref().  The function knows what to do with "@{-1}",
but it does not know what to do with "-", and that is why you end up
producing "Merge commit '-' into ...".

Dropping this patch from the series would make things consistent
with what was done in 3817cebabc and I think that is a sensible
place to stop.  After the dust settles, we _can_ later dig further
by teaching dwim_ref() and friends what "-" means, and when it is
done, this patch would become useful.

Thanks.







Re: [PATCH] Put sha1dc on a diet

2017-03-01 Thread Johannes Schindelin
Hi,

On Wed, 1 Mar 2017, Linus Torvalds wrote:

> On Wed, Mar 1, 2017 at 1:56 PM, Johannes Schindelin
>  wrote:
> > Footnote *1*: I know, it is easy to forget that some developers cannot
> > choose their tools, or even their hardware. In the past, we seemed to take
> > appropriate care, though.
> 
> I don't think you need to worry about the Windows side.

I am not. I build Git for Windows using GCC.

My concern is about that unexpected turn "oh, let's just switch to C99
because, well, because my compiler can handle it, and everybody else
should just switch to a modern compiler". That really sounded careless.

> That can continue to do something else.
> 
> When I advocated perhaps using  USE_SHA1DC by default, I definitely did
> not mean it in a "everywhere, regardless of issues" manner.
> 
> For example, the conmfig.mak.uname script already explicitly asks for
> "BLK_SHA1 = YesPlease" for Windows. Don't bother changing that, it's an
> explicit choice.

That setting is only in git.git's version, not in git-for-windows/git.git.
We switched to OpenSSL because of speed improvements, in particular with
recent Intel processors.

> But the Linux rules don't actually specify which SHA1 version to use,
> so the main Makefile currently defaults to just using openssl.
> 
> So that's the "default" choice I think we might want to change. Not
> the "we're windows, and explicitly want BLK_SHA1 because of
> environment and build infrastructure".

Since we switched away from BLK_SHA1, any such change would affect Git
for Windows.

But I think bigger than just developers on Windows OS. There are many
developers out there working on large repositories (yes, much larger than
Linux). Also using Macs and Linux. I am not at all sure that we want to
give them an updated Git they cannot fail to notice to be much slower than
before.

Don't get me wrong: I *hope* that you'll manage to get sha1dc
competitively fast. If you don't, well, then we simply cannot use it by
default for *all* of our calls (you already pointed out that the pack
index' checksum does not need collision detection, and in fact, *any*
operation that works on implicitly trusted data falls into the same court,
e.g. `git add`).

Ciao,
Johannes


What's cooking in git.git (Mar 2017, #01; Wed, 1)

2017-03-01 Thread Junio C Hamano
Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

Note: some may have sent updates to topics that are marked as "will
merge to 'next'" that cross with this message.  I'll try to pick them
up before merging the topics to 'next', but mistakes happen X-< and
it appears at least at my end that the latency of messages sent to
vger reaching recipients has got longer in the past few weeks?

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[New Topics]

* jk/interpret-branch-name (2017-02-28) 8 commits
 - checkout: restrict @-expansions when finding branch
 - strbuf_check_ref_format(): expand only local branches
 - branch: restrict @-expansions when deleting
 - t3204: test git-branch @-expansion corner cases
 - interpret_branch_name: allow callers to restrict expansions
 - strbuf_branchname: add docstring
 - strbuf_branchname: drop return value
 - interpret_branch_name: move docstring to header file

 "git branch @" created refs/heads/@ as a branch, and in general the
 code that handled @{-1} and @{upstream} was a bit too loose in
 disambiguating.

 Expecting a reroll.
 cf. <20170228123331.wubqplp5zjwzz...@sigill.intra.peff.net>


* jk/sha1dc (2017-03-01) 7 commits
 - Put sha1dc on a diet
 - sha1dc: avoid 'for' loop initial decl
 - sha1dc: resurrect LICENSE file
 - sha1dc: avoid c99 declaration-after-statement
 - Makefile: add USE_SHA1DC knob
 - sha1dc: adjust header includes for git
 - add collision-detecting sha1 implementation

 Borrow "detect attempt to create collisions" variant of SHA-1
 implementation by Marc Stevens (CWI) and Dan Shumow (Microsoft).

 Expecting a cleaned-up reroll after discussion settles.
 cf. 


* js/travis-32bit-linux (2017-02-28) 1 commit
 - Travis: also test on 32-bit Linux

 Add 32-bit Linux variant to the set of platforms to be tested with
 Travis CI.

 Will merge to 'next'.


* jt/http-base-url-update-upon-redirect (2017-02-28) 1 commit
 - http: attempt updating base URL only if no error

 When a redirected http transport gets an error during the
 redirected request, we ignored the error we got from the server,
 and ended up giving a not-so-useful error message.

 Will merge to 'next'.


* jt/mark-tree-uninteresting-for-uninteresting-commit (2017-02-28) 3 commits
 - upload-pack: compute blob reachability correctly
 - revision: exclude trees/blobs given commit
 - revision: unify {tree,blob}_objects in rev_info

 The revision/object traversal machinery did not mark all tree and
 blob objects that are contained in an uninteresting commit as
 uninteresting, because that is quite costly.  Instead, it only
 marked those that are contained in an uninteresting boundary commit
 as uninteresting.


* ps/docs-diffcore (2017-02-28) 2 commits
 - docs/diffcore: unquote "Complete Rewrites" in headers
 - docs/diffcore: fix grammar in diffcore-rename header

 Doc update.

 Will merge to 'next'.


* rj/remove-unused-mktemp (2017-02-28) 2 commits
 - wrapper.c: remove unused gitmkstemps() function
 - wrapper.c: remove unused git_mkstemp() function

 Code cleanup.

 Will merge to 'next'.


* sb/submodule-init-url-selection (2017-02-28) 1 commit
 - submodule init: warn about falling back to a local path

 Give a warning when "git submodule init" decides that the submodule
 in the working tree is its upstream, as it is not a very common
 setup.

 Will merge to 'next'.


* jc/diff-populate-filespec-size-only-fix (2017-03-01) 1 commit
 - diff: do not short-cut CHECK_SIZE_ONLY check in diff_populate_filespec()

 "git diff --quiet" relies on the size field in diff_filespec to be
 correctly populated, but diff_populate_filespec() helper function
 made an incorrect short-cut when asked only to populate the size
 field for paths that need to go through convert_to_git() (e.g. CRLF
 conversion).

 Will merge to 'next'.


* nd/conditional-config-include (2017-03-01) 4 commits
 - SQUASH???
 - config: add conditional include
 - config.txt: reflow the second include.path paragraph
 - config.txt: clarify multiple key values in include.path

 The configuration file learned a new "includeIf..path"
 that includes the contents of the given path only when the
 condition holds.  This allows you to say "include this work-related
 bit only in the repositories under my ~/work/ directory".
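A sketch of what the described usage would look like in practice (the paths
here are hypothetical, and the exact key syntax is whatever the series
settles on):

```ini
# ~/.gitconfig -- include work-specific settings only inside ~/work/
[includeIf "gitdir:~/work/"]
	path = ~/work/gitconfig-work
```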

 I think this is almost ready for 'next'.


* rs/log-email-subject (2017-03-01) 2 commits
 - pretty: use fmt_output_email_subject()
 - log-tree: factor out fmt_output_email_subject()

 Code clean-up.

 Will merge to 'next'.

--
[Stalled]

* 

Re: What's cooking in git.git (Mar 2017, #01; Wed, 1)

2017-03-01 Thread René Scharfe
Am 01.03.2017 um 23:35 schrieb Junio C Hamano:
> * rs/log-email-subject (2017-03-01) 2 commits
>  - pretty: use fmt_output_email_subject()
>  - log-tree: factor out fmt_output_email_subject()
> 
>  Code clean-up.
> 
>  Will merge to 'next'.

Could you please squash this in?  We only use a single context (as
opposed to an array), so it doesn't have to be especially compact,
and using a bitfield slows down half of the tests in p4000 by 3%
for me.

---
 commit.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/commit.h b/commit.h
index 459daef94a..528272ac9b 100644
--- a/commit.h
+++ b/commit.h
@@ -154,7 +154,7 @@ struct pretty_print_context {
int preserve_subject;
struct date_mode date_mode;
unsigned date_mode_explicit:1;
-   unsigned print_email_subject:1;
+   int print_email_subject;
int expand_tabs_in_log;
int need_8bit_cte;
char *notes_message;
-- 
2.12.0



Re: [PATCH v2 0/2] Minor changes to skip gitweb tests without Time::HiRes

2017-03-01 Thread Junio C Hamano
Thanks, will replace what has been on 'pu'.

Note that I'd dropped a double-SP from the latter one while queuing.


Re: [PATCH v8 4/6] stash: teach 'push' (and 'create_stash') to honor pathspec

2017-03-01 Thread Thomas Gummerer
On 02/28, Junio C Hamano wrote:
> Thomas Gummerer  writes:
> 
> > +   git reset ${GIT_QUIET:+-q} -- "$@"
> > +   git ls-files -z --modified -- "$@" |
> > +   git checkout-index -z --force --stdin
> > +   git checkout ${GIT_QUIET:+-q} HEAD -- $(git ls-files -z --modified "$@")
> 
> I think you forgot to remove this line, whose correction was added
> as two lines immediately before it.  I'll remove it while queuing.

Yes, sorry.  What you queued looks good to me, thanks!

> > +   git clean --force ${GIT_QUIET:+-q} -d -- "$@"
> 
> Thanks.


[PATCH v2 1/2] gitweb tests: Change confusing "skip_all" phrasing

2017-03-01 Thread Ævar Arnfjörð Bjarmason
Change the phrasing so that instead of saying that the CGI module is
unusable, we say that it's not available.

This came up on the git mailing list in
<4b34e3a0-3da7-d821-2a7f-9a420ac1d...@gmail.com> from Jakub Narębski.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/gitweb-lib.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh
index d5dab5a94f..59ef15efbd 100644
--- a/t/gitweb-lib.sh
+++ b/t/gitweb-lib.sh
@@ -110,7 +110,7 @@ perl -MEncode -e '$e="";decode_utf8($e, Encode::FB_CROAK)' >/dev/null 2>&1 || {
 }
 
 perl -MCGI -MCGI::Util -MCGI::Carp -e 0 >/dev/null 2>&1 || {
-   skip_all='skipping gitweb tests, CGI module unusable'
+   skip_all='skipping gitweb tests, CGI & CGI::Util & CGI::Carp modules not available'
test_done
 }
 
-- 
2.11.0



Re: [PATCH 01/10] submodule: decouple url and submodule existence

2017-03-01 Thread Stefan Beller
IIRC most of the series is actually refactoring, i.e.
providing a central function that answers
"is this submodule active/initialized?" and then making use of this
function.

Maybe it would be easier to reroll these refactorings first without adding
in the change of behavior.

Off the list we talked about different models how to indicate that
a submodule is active.

Current situation:
submodule..URL is used as a boolean flag

Possible future A:
1) If submodule.active exists use that
2) otherwise go by individual .URL configs

submodule.active is a config that contains a pathspec,
i.e. specifies the group of submodules instead of marking
each submodule individually.

How to get to future A:
* apply this patch series

Possible future B:
1) the boolean submodule..exists is used, if present
2) If that boolean is missing we'd respect a grouping
   feature such as submodule.active
3) fallback to the .URL as a boolean indicator.

How to get to future B:
1) possibly take the refactoring part from this series
2) implement the .exists flag (that can solve the worktree
  problem, as then we don't have to abuse the .URL
  as a boolean indicator any more.)
3) implement the grouping feature that takes precedence
  over .URL, but yields precedence to the .exists flag.
  (This would solve the grouping problem)

By having two different series (2) and (3) we'd solve
just one thing at a time with each series, such that
this seems easier to understand and review.

The question is if this future B is also easier to use for
users and I think it would be, despite being technically more
complex.

Most users only have a few submodules. The assumption is also that
users are interested either in very few specific submodules or in all
of them. For both of these cases the .exists flag is a good idea
as it is very explicit, but verbose.

The grouping feature would help in the case of lots of
submodules, e.g. to select all of them you would just give
an easy pathspec "." and that would help going forward
as by such a submodule spec all new submodules would
be "auto-on".
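Under that proposal, the "all of them, auto-on" case would be a single
config entry, roughly (hypothetical syntax until the series lands):

```ini
# .git/config -- pathspec selecting which submodules are active (proposed)
[submodule]
	active = .
```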

Possible future C:
Similar to B, but with branch specific configuration.
submodule..active would work as a
grouping feature. I wonder how it would work with
the .exists flag.

Stefan


Re: [PATCH 5/6 v5] merge.c: delegate handling of "-" shorthand to revision.c:get_sha1

2017-03-01 Thread Junio C Hamano
Siddharth Kannan  writes:

> The callchain for handling each argument contains the function
> revision.c:get_sha1 where the shorthand for "-" ~ "@{-1}" has already been
> implemented in a previous patch; the complete callchain leading to that
> function is:
>
> 1. merge.c:collect_parents
> 2. commit.c:get_merge_parent : this function calls revision.c:get_sha1
>
> This patch also adds a test for checking that the shorthand works properly

This breaks "git merge".

> +test_expect_success 'merge - should work' '
> +git checkout testing-2 &&
> +git merge - &&
> +git rev-parse HEAD HEAD^^ | sort >actual &&
> +git rev-parse master testing-1 | sort >expect &&
> +test_cmp expect actual

This test is not sufficient to catch a regression I seem to be
seeing.

$ git checkout side
$ git checkout pu
$ git merge -

used to say "Merge branch 'side' into pu".  With this series merged,
I seem to be getting "Merge commit '-' into pu".


Re: [PATCH v8 4/6] stash: teach 'push' (and 'create_stash') to honor pathspec

2017-03-01 Thread Junio C Hamano
Thomas Gummerer  writes:

> On 02/28, Junio C Hamano wrote:
>> Thomas Gummerer  writes:
>> 
>> > +  git reset ${GIT_QUIET:+-q} -- "$@"
>> > +  git ls-files -z --modified -- "$@" |
>> > +  git checkout-index -z --force --stdin
>> > +  git checkout ${GIT_QUIET:+-q} HEAD -- $(git ls-files -z --modified "$@")
>> 
>> I think you forgot to remove this line, whose correction was added
>> as two lines immediately before it.  I'll remove it while queuing.
>
> Yes, sorry.  What you queued looks good to me, thanks!

Thanks for double-checking, and thanks for working on the topic.  I
think this is ready for 'next'.


Re: [PATCH] Put sha1dc on a diet

2017-03-01 Thread Linus Torvalds
On Wed, Mar 1, 2017 at 1:56 PM, Johannes Schindelin
 wrote:
> Footnote *1*: I know, it is easy to forget that some developers cannot
> choose their tools, or even their hardware. In the past, we seemed to take
> appropriate care, though.

I don't think you need to worry about the Windows side. That can
continue to do something else.

When I advocated perhaps using USE_SHA1DC by default, I definitely
did not mean it in an "everywhere, regardless of issues" manner.

For example, the config.mak.uname script already explicitly asks for
"BLK_SHA1 = YesPlease" for Windows. Don't bother changing that, it's
an explicit choice.

But the Linux rules don't actually specify which SHA1 version to use,
so the main Makefile currently defaults to just using openssl.

So that's the "default" choice I think we might want to change. Not
the "we're windows, and explicitly want BLK_SHA1 because of
environment and build infrastructure".

 Linus


[PATCH v2 2/2] gitweb tests: Skip tests when we don't have Time::HiRes

2017-03-01 Thread Ævar Arnfjörð Bjarmason
Change the gitweb tests to skip when we can't load the Time::HiRes
module.

Gitweb needs this module to work. It has been in perl core since v5.8,
which is the oldest version we support. However CentOS (and perhaps
some other distributions) carve it into its own non-core-perl package
that's not installed along with /usr/bin/perl by default. Without this
we'll hard fail the gitweb tests when trying to load the module.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/gitweb-lib.sh | 5 +
 1 file changed, 5 insertions(+)

diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh
index 59ef15efbd..b7a73874e7 100644
--- a/t/gitweb-lib.sh
+++ b/t/gitweb-lib.sh
@@ -114,4 +114,9 @@ perl -MCGI -MCGI::Util -MCGI::Carp -e 0 >/dev/null 2>&1 || {
test_done
 }
 
+perl -mTime::HiRes -e 0  >/dev/null 2>&1 || {
+   skip_all='skipping gitweb tests, Time::HiRes module not available'
+   test_done
+}
+
 gitweb_init
-- 
2.11.0



Re: [PATCH] Put sha1dc on a diet

2017-03-01 Thread Johannes Schindelin
Hi,

On Wed, 1 Mar 2017, Junio C Hamano wrote:

> Linus Torvalds  writes:
> 
> > That said, I think that it would be lovely to just default to
> > USE_SHA1DC and just put the whole attack behind us. Yes, it's slower.
> > No, it doesn't really seem to matter that much in practice.
> 
> Yes.  It would be a very good goal.

So let me get this straight: not only do we now implicitly want to bump
the required C compiler to C99 without any grace period worth mentioning
[*1*], we are also all of a sudden no longer worried about a double digit
percentage drop of speed [*2*]?

Puzzled,
Johannes

Footnote *1*: I know, it is easy to forget that some developers cannot
choose their tools, or even their hardware. In the past, we seemed to take
appropriate care, though.

Footnote *2*: With real-world repositories of notable size, that
performance regression hurts. A lot. We just spent time to get the speed
of SHA-1 down by a couple percent and it was a noticeable improvement here.


Re: [PATCH] Put sha1dc on a diet

2017-03-01 Thread Junio C Hamano
On Wed, Mar 1, 2017 at 1:56 PM, Johannes Schindelin
 wrote:
> Hi,
>
> On Wed, 1 Mar 2017, Junio C Hamano wrote:
>
>> Linus Torvalds  writes:
>>
>> > That said, I think that it would be lovely to just default to
>> > USE_SHA1DC and just put the whole attack behind us. Yes, it's slower.
>> > No, it doesn't really seem to matter that much in practice.
>>
>> Yes.  It would be a very good goal.
>
> So let me get this straight: not only do we now implicitly want to bump
> the required C compiler to C99 without any grace period worth mentioning
> [*1*], we are also all of a sudden no longer worried about a double digit
> percentage drop of speed [*2*]?

Before we get the code into shape suitable for 'next', it is more important
to make sure it operates correctly, adding necessary features if any
(e.g. a "hash with or without check" knob) while it is in 'pu', and *1* is
to allow it to progress faster without having to worry about something we
could do mechanically before making it ready for 'next'.

The performance thing is really "let's see how well it goes". With the
effort to optimize having only just begun, I think it is too early to tell
whether Linus's "doesn't really seem to matter" is the case or not.

Queuing such a topic on 'pu' is one effective way to make sure people are
working off of the same codebase.


Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions

2017-03-01 Thread Junio C Hamano
Now that I have thought it through a bit more thoroughly, I think this
is the right approach, so here is my (tentative) final version.

I seem to be getting really rusty---after all, the codepaths involved
are practically all my code and I should have noticed the real
culprit during my first attempt X-<.

Thanks for helping.

-- >8 --
Subject: [PATCH] diff: do not short-cut CHECK_SIZE_ONLY check in diff_populate_filespec()

Callers of diff_populate_filespec() can choose to ask only for the
size of the blob without grabbing the blob data, and the function,
after running lstat() when the filespec points at a working tree
file, returns by copying the value in size field of the stat
structure into the size field of the filespec when this is the case.

However, this short-cut cannot be taken if the contents from the
path needs to go through convert_to_git(), whose resulting real blob
data may be different from what is in the working tree file.

As "git diff --quiet" compares the .size fields of filespec
structures to skip content comparison, this bug manifests as a
false "there are differences" for a file that needs eol conversion,
for example.

Reported-by: Mike Crowe 
Helped-by: Torsten Bögershausen 
Signed-off-by: Junio C Hamano 
---
 diff.c| 19 ++-
 t/t0028-diff-converted.sh | 27 +++
 2 files changed, 45 insertions(+), 1 deletion(-)
 create mode 100755 t/t0028-diff-converted.sh

diff --git a/diff.c b/diff.c
index 8c78fce49d..dc51dceb44 100644
--- a/diff.c
+++ b/diff.c
@@ -2792,8 +2792,25 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
s->should_free = 1;
return 0;
}
-   if (size_only)
+
+   /*
+* Even if the caller would be happy with getting
+* only the size, we cannot return early at this
+* point if the path requires us to run the content
+* conversion.
+*/
+   if (!would_convert_to_git(s->path) && size_only)
return 0;
+
+   /*
+* Note: this check uses xsize_t(st.st_size) that may
+* not be the true size of the blob after it goes
+* through convert_to_git().  This may not strictly be
+* correct, but the whole point of the big_file_threshold
+* and is_binary check is that we want to avoid
+* opening the file and inspecting the contents, so this
+* is probably fine.
+*/
if ((flags & CHECK_BINARY) &&
s->size > big_file_threshold && s->is_binary == -1) {
s->is_binary = 1;
diff --git a/t/t0028-diff-converted.sh b/t/t0028-diff-converted.sh
new file mode 100755
index 00..3d5ab9565b
--- /dev/null
+++ b/t/t0028-diff-converted.sh
@@ -0,0 +1,27 @@
+#!/bin/sh
+#
+# Copyright (c) 2017 Mike Crowe
+#
+# These tests ensure that files changing line endings in the presence
+# of .gitattributes to indicate that line endings should be ignored
+# don't cause 'git diff' or 'git diff --quiet' to think that they have
+# been changed.
+
+test_description='git diff with files that require CRLF conversion'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+   echo "* text=auto" >.gitattributes &&
+   printf "Hello\r\nWorld\r\n" >crlf.txt &&
+   git add .gitattributes crlf.txt &&
+   git commit -m "initial"
+'
+
+test_expect_success 'quiet diff works on file with line-ending change that has no effect on repository' '
+   printf "Hello\r\nWorld\n" >crlf.txt &&
+   git status &&
+   git diff --quiet
+'
+
+test_done
-- 
2.12.0-319-gc5f21175ee



Re: SHA1 collisions found

2017-03-01 Thread Jeff King
On Tue, Feb 28, 2017 at 03:11:32PM -0800, Linus Torvalds wrote:

> > Of course for dedicated code this can be simplified, and some parts
> > could be further optimized.
> 
> So I'd be worried about changing your tested code too much, since the
> only test-cases we have are the two pdf files. If we screw up too
> much, those will no longer show as collisions, but we could get tons
> of false positives that we wouldn't see, so..

I can probably help with collecting data for that part on GitHub.

I don't have an exact count of how many sha1 computations we do in a
day, but it's...a lot. Obviously every pushed object gets its sha1
computed, but read operations also cover every commit and tree via
parse_object() (though I think most of the blob reads do not).

So it would be trivial to start by swapping out the "die()" on collision
with something that writes to a log. This is the slow path that we don't
expect to trigger at all, so log volume shouldn't be a problem.

I've been waiting to see how speedups develop before deploying it in
production.

-Peff


Re: [PATCH v4 17/22] read-cache: unlink old sharedindex files

2017-03-01 Thread Junio C Hamano
Christian Couder  writes:

> +static int can_delete_shared_index(const char *shared_index_path)
> +{
> + struct stat st;
> + unsigned long expiration;
> +
> + /* Check timestamp */
> + expiration = get_shared_index_expire_date();
> + if (!expiration)
> + return 0;
> + if (stat(shared_index_path, &st))
> + return error_errno(_("could not stat '%s'"), shared_index_path);
> + if (st.st_mtime > expiration)
> + return 0;
> +
> + return 1;
> +}
> +
> +static int clean_shared_index_files(const char *current_hex)
> +{
> + struct dirent *de;
> + DIR *dir = opendir(get_git_dir());
> +
> + if (!dir)
> + return error_errno(_("unable to open git dir: %s"), get_git_dir());
> +
> + while ((de = readdir(dir)) != NULL) {
> + const char *sha1_hex;
> + const char *shared_index_path;
> + if (!skip_prefix(de->d_name, "sharedindex.", &sha1_hex))
> + continue;
> + if (!strcmp(sha1_hex, current_hex))
> + continue;
> + shared_index_path = git_path("%s", de->d_name);
> + if (can_delete_shared_index(shared_index_path) > 0 &&

Is this "can" or "should"?  This sounds like the latter.

> + unlink(shared_index_path))
> + error_errno(_("unable to unlink: %s"), shared_index_path);

This does not make the entire operation fail (and I think the
behaviour you have here is preferable--we just want to report
without failing the main operation).

But should it be reported as "error: unable to unlink"?  It may be
better to give this message as a warning.


Re: [PATCH v4 00/22] Add configuration options for split-index

2017-03-01 Thread Junio C Hamano
Christian Couder  writes:

> Highlevel view of the patches in the series
> ~~~
>
> Except for patches 1/22 and 2/22, there are 3 big steps, one for each
> new configuration variable introduced.
>
> There are only small differences between this patch series and the v3
> patch series sent a few months ago.
>
> - Patch 1/22 marks a message for translation. It is not new and
>   can be applied separately.
>
> - Patch 2/22 improves the existing indentation style of t1700 by
>   using different here document style. It is a new preparatory
>   patch suggested by Junio.

OK.  I read interdiff against the previous round carefully, and
skimmed all patches less carefully.  

I may have comments on individual patches later, but this round
looked mostly sensible.

Thanks for following it through.



Re: [PATCH] Put sha1dc on a diet

2017-03-01 Thread Jeff King
On Wed, Mar 01, 2017 at 12:14:34PM -0800, Linus Torvalds wrote:

> > My biggest concern is the index-pack operation. Try this:
> 
> > I'm mobile right now, so I can't test, but is this perhaps at least partly
> > due to the full checksum over the pack-file?
>
> We have two very different uses of SHA1: the actual object name hash, but
> also the sha1file checksums that we do on the index file and the pack files.
>
> And the checksum code really doesn't need the collision checking at all.

I don't think that helps. The sha1 over the pack-file takes about 1.3s
with openssl, and 5s with sha1dc. So we already know the increase there
is only a few seconds, not a few minutes.

And it makes sense if you think about the index-pack operation. It has
to inflate each object, resolving deltas, and checksum the result. And
the number of inflated bytes is _much_ larger than the on-disk bytes.
You can see the difference with:

  git cat-file --batch-all-objects \
--batch-check='%(objectsize:disk) %(objectsize)' |
  perl -alne '
$disk += $F[0]; $raw += $F[1];
END { print "$disk $raw" }
  '

On linux.git that yields:

  1210521959 63279680406

That's over a 50x increase in the bytes we have to sha1 for objects
versus pack-checksums.
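A quick sanity check of that "50x" figure from the two numbers printed above:

```python
# Ratio of inflated object bytes to on-disk pack bytes for linux.git,
# using the figures produced by the cat-file pipeline quoted above.
disk, raw = 1210521959, 63279680406
print(round(raw / disk, 1))  # 52.3 -- "over a 50x increase"
```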

-Peff


Re: [PATCH v4 15/22] read-cache: touch shared index files when used

2017-03-01 Thread Junio C Hamano
Christian Couder  writes:

> +/*
> + * Signal that the shared index is used by updating its mtime.
> + *
> + * This way, shared index files can be removed if they have not been used
> + * for some time.
> + */
> +static void freshen_shared_index(char *base_sha1_hex, int warn)
> +{
> + const char *shared_index = git_path("sharedindex.%s", base_sha1_hex);
> + if (!check_and_freshen_file(shared_index, 1) && warn)
> + warning("Could not freshen shared index '%s'", shared_index);

I'll downcase "Could" to match existing convention (no need to
resend only to update this one).


Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions

2017-03-01 Thread Mike Crowe
On Wednesday 01 March 2017 at 18:04:44 +0100, tbo...@web.de wrote:
> From: Junio C Hamano 
> 
> git diff --quiet may take a short-cut to see if a file is changed
> in the working tree:
> Whenever the file size differs from what is recorded in the index,
> the file is assumed to be changed and git diff --quiet returns
> exit with code 1
> 
> This shortcut must be suppressed whenever the line endings are converted
> or a filter is in use.
> The attributes say "* text=auto" and a file has
> "Hello\nWorld\n" in the index with a length of 12.
> The file in the working tree has "Hello\r\nWorld\r\n" with a length of 14.
> (Or even "Hello\r\nWorld\n").
> In this case "git add" will not make any changes to the index, and
> "git diff --quiet" should exit 0.
> 
> Add calls to would_convert_to_git() before blindly saying that a different
> size means different content.
> 
> Reported-By: Mike Crowe 
> Signed-off-by: Torsten Bögershausen 
> ---
> This is what I can come up with, collecting all the loose ends.
> I'm not sure if Mike wants to have the Reported-By with a
> Signed-off-by?
> The other question is whether the commit message summarizes the discussion
> well enough.
> 
> diff.c| 18 ++
>  t/t0028-diff-converted.sh | 27 +++
>  2 files changed, 41 insertions(+), 4 deletions(-)
>  create mode 100755 t/t0028-diff-converted.sh
> 
> diff --git a/diff.c b/diff.c
> index 051761b..c264758 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -4921,9 +4921,10 @@ static int diff_filespec_check_stat_unmatch(struct diff_filepair *p)
>*differences.
>*
>* 2. At this point, the file is known to be modified,
> -  *with the same mode and size, and the object
> -  *name of one side is unknown.  Need to inspect
> -  *the identical contents.
> +  *with the same mode and size, the object
> +  *name of one side is unknown, or size comparison
> +  *cannot be depended upon.  Need to inspect the
> +  *contents.
>*/
>   if (!DIFF_FILE_VALID(p->one) || /* (1) */
>   !DIFF_FILE_VALID(p->two) ||
> @@ -4931,7 +4932,16 @@ static int diff_filespec_check_stat_unmatch(struct diff_filepair *p)
>   (p->one->mode != p->two->mode) ||
>   diff_populate_filespec(p->one, CHECK_SIZE_ONLY) ||
>   diff_populate_filespec(p->two, CHECK_SIZE_ONLY) ||
> - (p->one->size != p->two->size) ||
> +
> + /*
> +  * only if eol and other conversions are not involved,
> +  * we can say that two contents of different sizes
> +  * cannot be the same without checking their contents.
> +  */
> + (!would_convert_to_git(p->one->path) &&
> +  !would_convert_to_git(p->two->path) &&
> +  (p->one->size != p->two->size)) ||
> +
>   !diff_filespec_is_identical(p->one, p->two)) /* (2) */
>   p->skip_stat_unmatch_result = 1;
>   return p->skip_stat_unmatch_result;
> diff --git a/t/t0028-diff-converted.sh b/t/t0028-diff-converted.sh
> new file mode 100755
> index 000..3d5ab95
> --- /dev/null
> +++ b/t/t0028-diff-converted.sh
> @@ -0,0 +1,27 @@
> +#!/bin/sh
> +#
> +# Copyright (c) 2017 Mike Crowe
> +#
> +# These tests ensure that files changing line endings in the presence
> +# of .gitattributes to indicate that line endings should be ignored
> +# don't cause 'git diff' or 'git diff --quiet' to think that they have
> +# been changed.
> +
> +test_description='git diff with files that require CRLF conversion'
> +
> +. ./test-lib.sh
> +
> +test_expect_success setup '
> + echo "* text=auto" >.gitattributes &&
> + printf "Hello\r\nWorld\r\n" >crlf.txt &&
> + git add .gitattributes crlf.txt &&
> + git commit -m "initial"
> +'
> +
> +test_expect_success 'quiet diff works on file with line-ending change that has no effect on repository' '
> + printf "Hello\r\nWorld\n" >crlf.txt &&
> + git status &&
> + git diff --quiet
> +'
> +
> +test_done

Hi Torsten,

Thanks for investigating this.

I think that you've simplified the test to the point where it doesn't
entirely prove the fix. Although you test the case where the file has
changed size, you don't test the case where it hasn't.

Unfortunately that was the part of my test that could only reproduce the
problem with the sleeps. Maybe someone who fully understands how the cache
works could explain an alternative way to force the cache not to be used.

Also, I think I've found a behaviour change with this fix. Consider:

 echo "* text=auto" >.gitattributes
 printf "Hello\r\nWorld\r\n" >crlf.txt
 git add .gitattributes crlf.txt
 git commit -m "initial"

 printf "\r\n" >>crlf.txt

With the above patch, both "git diff" and "git diff --quiet" report that
there are no changes. Previously Git would report the extra newline
correctly.
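The two cases can be sketched in a few lines (a simplification: the
replace() below stands in for git's convert_to_git(), and the byte strings
are the ones from this thread):

```python
# "* text=auto" stores LF-normalized content in the object database.
index_blob = b"Hello\nWorld\n"       # 12 bytes in the index
worktree = b"Hello\r\nWorld\r\n"     # 14 bytes on disk

# Case 1: sizes differ (14 vs 12) but the normalized content is identical
# -- the false "there are differences" the patch fixes.
print(worktree.replace(b"\r\n", b"\n") == index_blob)  # True

# Case 2: an appended "\r\n" is a real change even after normalization --
# diff must still report it.
appended = worktree + b"\r\n"
print(appended.replace(b"\r\n", b"\n") == index_blob)  # False
```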

Mike.


Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions

2017-03-01 Thread Junio C Hamano
tbo...@web.de writes:

> From: Junio C Hamano 
>
> git diff --quiet may take a short-cut to see if a file is changed
> in the working tree:
> Whenever the file size differs from what is recorded in the index,
> the file is assumed to be changed, and git diff --quiet exits
> with code 1.
>
> This shortcut must be suppressed whenever the line endings are converted
> or a filter is in use.
> The attributes say "* text=auto" and a file has
> "Hello\nWorld\n" in the index with a length of 12.
> The file in the working tree has "Hello\r\nWorld\r\n" with a length of 14.
> (Or even "Hello\r\nWorld\n").
> In this case "git add" will not make any changes to the index, and
> "git diff --quiet" should exit 0.

The thing I find the most disturbing is that at this point in the
flow, p->one->size and p->two->size are supposed to be the sizes of
the blob objects, not of the file contents in the working tree.
IOW, p->two->size being 14 in the above example sounds like pointing
at a different bug, if it is 14.  

The early return in diff_populate_filespec(), where it does

	s->size = xsize_t(st.st_size);
	...
	if (size_only)
		return 0;

way before it runs convert_to_git(), may be the real culprit.

I am wondering if the real fix would be to do this, instead of the
two extra would_convert_to_git() call there in the patch you sent.
The result seems to still pass the new test in your patch.

Thanks for helping.

 diff.c | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 8c78fce49d..dc51dceb44 100644
--- a/diff.c
+++ b/diff.c
@@ -2792,8 +2792,25 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
s->should_free = 1;
return 0;
}
-   if (size_only)
+
+   /*
+* Even if the caller would be happy with getting
+* only the size, we cannot return early at this
+* point if the path requires us to run the content
+* conversion.
+*/
+   if (!would_convert_to_git(s->path) && size_only)
return 0;
+
+   /*
+* Note: this check uses xsize_t(st.st_size) that may
+* not be the true size of the blob after it goes
+* through convert_to_git().  This may not strictly be
+* correct, but the whole point of big_file_threshold
+* and is_binary check is that we want to avoid
+* opening the file and inspecting the contents, so
+* this is probably fine.
+*/
if ((flags & CHECK_BINARY) &&
s->size > big_file_threshold && s->is_binary == -1) {
s->is_binary = 1;


Re: Delta compression not so effective

2017-03-01 Thread Martin Langhoff
On Wed, Mar 1, 2017 at 8:51 AM, Marius Storm-Olsen  wrote:
> BUT, even still, I would expect Git's delta compression to be quite 
> effective, compared to the compression present in SVN.

jar files are zipfiles. They don't delta in any useful form, and in
fact they differ even if they contain identical binary files inside.
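A small illustration of that point (the member name is made up, and real
jars add manifests and build metadata on top): two archives of
byte-identical content already differ once their stored timestamps differ.

```python
import io
import zipfile

def make_archive(mtime):
    # Build an in-memory zip with one member and a fixed timestamp.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr(zipfile.ZipInfo("A.class", date_time=mtime), b"same payload")
    return buf.getvalue()

one = make_archive((2017, 3, 1, 12, 0, 0))
two = make_archive((2017, 3, 1, 12, 0, 2))  # same content, 2 seconds later
print(one == two)  # False: identical members, different archive bytes
```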

> Commits: 32988
> DB (server) size: 139GB

Are you certain of the on-disk storage at the SVN server? Ideally,
you've taken the size with a low-level tool like `du -sh
/path/to/SVNRoot`.

Even with no delta compression (as per Junio and Linus' discussion),
based on past experience importing jar/wars/binaries from SVN into
git... I'd expect git's worst case to be on-par with SVN, perhaps ~5%
larger due to compression headers on incompressible data.
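The per-stream overhead of deflate itself is tiny; presumably the ~5%
figure also accounts for per-object and pack framing. A rough check on raw
zlib:

```python
import os
import zlib

raw = os.urandom(1_000_000)      # incompressible input
packed = zlib.compress(raw)
print(len(packed) > len(raw))          # True: random data does not shrink
print(len(packed) < len(raw) * 1.01)   # True: well under 1% framing overhead
```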

cheers,


m
-- 
 martin.langh...@gmail.com
 - ask interesting questions  ~  http://linkedin.com/in/martinlanghoff
 - don't be distracted~  http://github.com/martin-langhoff
   by shiny stuff


Re: Delta compression not so effective

2017-03-01 Thread Martin Langhoff
On Wed, Mar 1, 2017 at 1:30 PM, Linus Torvalds
 wrote:
> For example, the sorting code thinks that objects with the same name
> across the history are good sources of deltas.

Marius has indicated he is working with jar files. IME jar and war
files, which are zipfiles containing Java bytecode, range from not
delta-ing in a useful fashion, to pretty good deltas.

Depending on the build process (hi Maven!) there can be enough
variance in the build metadata to throw all the compression machinery
off.

On a simple Maven-driven project I have at hand, two .war files
compiled from the same codebase compressed really well in git. I've
also seen projects where storage space is ~101% of the "uncompressed"
size.

my 2c,



m
-- 
 martin.langh...@gmail.com
 - ask interesting questions  ~  http://linkedin.com/in/martinlanghoff
 - don't be distracted~  http://github.com/martin-langhoff
   by shiny stuff


Re: git status --> Out of memory, realloc failed

2017-03-01 Thread Carsten Fuchs

Hi René,

Am 01.03.2017 um 11:02 schrieb René Scharfe:

I use Git at a web hosting service, where my user account has a memory
limit of 768 MB:

(uiserver):p7715773:~$ uname -a
Linux infongp-de15 3.14.0-ui16322-uiabi1-infong-amd64 #1 SMP Debian
3.14.79-2~ui80+4 (2016-11-17) x86_64 GNU/Linux


What's the output of "ulimit -a"?


(uiserver):p7715773:~$ ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 1
file size   (blocks, -f) unlimited
pending signals (-i) 16382
max locked memory   (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files  (-n) 512
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 8192
cpu time   (seconds, -t) 1800
max user processes  (-u) 42
virtual memory  (kbytes, -v) 786432
file locks  (-x) unlimited


(uiserver):p7715773:~$ git --version
git version 2.1.4


That's quite old.  Can you try a more recent version easily (2.12.0 just came 
out)?


I don't think that they have any newer version available, having just upgraded from Git 
1.7 a couple of weeks ago... (1und1)


Git 1.7 used to work for me in the same environment, but iirc they also said they 
switched from 32-bit to 64-bit edition and blame the increased memory consumption on 
that change.


I'll ask for a newer version, but I'd be surprised if this happened.


The repository is tracking about 19000 files which together take 260 MB.
The git server version is 2.7.4.1.g5468f9e (Bitbucket)


Is your repository publicly accessible?


Unfortunately, no. There are no big secrets in there, but just a couple of database 
details so that I cannot make it universally available. I can gladly give you access 
though. (E.g. by adding your public SSH key?)


Best regards,
Carsten


Re: Delta compression not so effective

2017-03-01 Thread Linus Torvalds
On Wed, Mar 1, 2017 at 9:57 AM, Marius Storm-Olsen  wrote:
>
> Indeed, I did do a
> -c pack.threads=20 --window-memory=6g
> to 'git repack', since the machine is a 20-core (40 threads) machine with
> 126GB of RAM.
>
> So I guess with these sized objects, even at 6GB per thread, it's not enough
> to get a big enough Window for proper delta-packing?

Hmm. The 6GB window should be plenty good enough, unless your blobs
are in the gigabyte range too.

> This repo took >14hr to repack on 20 threads though ("compression" step was
> very fast, but stuck 95% of the time in "writing objects"), so I can only
> imagine how long a pack.threads=1 will take :)

Actually, it's usually the compression phase that should be slow - but
if something is limiting finding deltas (so that we abort early), then
that would certainly tend to speed up compression.

The "writing objects" phase should be mainly about the actual IO.
Which should be much faster *if* you actually find deltas.

> But aren't the blobs sorted by some metric for reasonable delta-pack
> locality, so even with a 6GB window it should have seen ~25 similar objects
> to deltify against?

Yes they are. The sorting for delta packing tries to make sure that
the window is effective. However, the sorting is also just a
heuristic, and it may well be that your repository layout ends up
screwing up the sorting, so that the windows just work very badly.

For example, the sorting code thinks that objects with the same name
across the history are good sources of deltas. But it may be that for
your case, the binary blobs that you have don't tend to actually
change in the history, so that heuristic doesn't end up doing
anything.

The sorting does use the size and the type too, but the "filename
hash" (which isn't really a hash, it's something nasty to give
reasonable results for the case where files get renamed) is the main
sort key.

So you might well want to look at the sorting code too. If filenames
(particularly the end of filenames) for the blobs aren't good hints
for the sorting code, that sort might end up spreading all the blobs
out rather than sort them by size.

And again, if that happens, the "can I delta these two objects" code
will notice that the size of the objects are wildly different and
won't even bother trying. Which speeds up the "compressing" phase, of
course, but then because you don't get any good deltas, the "writing
out" phase sucks donkey balls because it does zlib compression on big
objects and writes them out to disk.

So there are certainly multiple possible reasons for the deltification
to not work well for you.

How sensitive is your material? Could you make a smaller repo with
some of the blobs that still show the symptoms? I don't think I want
to download 206GB of data even if my internet access is good.

Linus


Re: [PATCH 01/10] submodule: decouple url and submodule existence

2017-03-01 Thread Brandon Williams
On 02/24, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> > Currently the submodule.<name>.url config option is used to determine
> > if a given submodule exists and is interesting to the user.  This
> > however doesn't work very well because the URL is a config option for
> > the scope of a repository, whereas the existence of a submodule is an
> > option scoped to the working tree.
> >
> > In a future with worktree support for submodules, there will be multiple
> > working trees, each of which may only need a subset of the submodules
> > checked out.  The URL (which is where the submodule repository can be
> > obtained) should not differ between different working trees.
> >
> > It may also be convenient for users to more easily specify groups of
> > submodules they are interested in as opposed to running "git submodule
> > init <path>" on each submodule they want checked out in their working
> > tree.
> >
> > To this end, the config option submodule.active is introduced which
> > holds a pathspec that specifies which submodules should exist in the
> > working tree.
> 
> Hmph.  submodule.active in .git/config would be shared the same way
> submodule.<name>.url in .git/config is shared across the worktrees
> that stems from the same primary repository, no?
> 
> Perhaps there are some other uses of this submodule.active idea, but
> I do not see how it is relevant to solving "multiple worktrees"
> issue.  Per-worktree config would solve it with the current
> submodule.<name>.url without submodule.active list, I would think [*1*].

Correct, I should update the language to indicate this allows the URL to
be shared between worktrees, but a per-worktree config must exist before
submodule.active can actually be used to select different groups of
submodules per-worktree.  The idea is that if submodule.active is set
then you no longer look at the URLs to see what is interesting but
rather at the paths.

> Also as a grouping measure, submodule.active that lists submodule
> paths feels hard to use.  When switching between two branches in the
> superproject that have the same submodule bound at two different
> paths, who is responsible for updating submodule.active in
> superproject's config?  If it were a list of submodule names, this
> objection does not apply, though.

I agree that if you are listing every submodule path by hand then this
may not be the best approach and would be difficult to use.  The idea is
that this would allow a user to set a general pathspec to identify a
group of modules they are interested in.  Perhaps once attributes can be
used in pathspecs a user could group submodules by setting a particular
attribute and then submodule.active would have a value like
":(attr:foo)" to indicate I'm interested in all submodules with the
"foo" attribute.

> 
> 
> 
> [Footnote]
> 
> *1* At the conceptual level, I agree that .url that also means "we
> are interested in this one" feels like somewhat an unclean
> design, but that is not what you are "fixing", is it?
> 

-- 
Brandon Williams


Re: [PATCH 3/6] Introduce a new "printf format" for timestamps

2017-03-01 Thread Junio C Hamano
Junio C Hamano  writes:

> Johannes Schindelin  writes:
>
>> So let's introduce the pseudo format "PRItime" (currently simply being
>> "lu") so that it is easy later on to change the data type to time_t.
>
> The problem being solved is a good thing to solve, and 
>
>> -printf("author-time %lu\n", ci.author_time);
>> +printf("author-time %"PRItime"\n", ci.author_time);
> ...
> It would be better to introduce the timestamp_t we discussed earlier
> before (or at) this step, and typedef it to ulong first, and then in
> this step, change the above to
>
>   printf("author-time %"PRItime"\n", (timestamp_t)ci.author_time);
>
> to keep them in sync.

Nah, ignore me.  This was just me being silly.

I was somehow expecting (incorrecty) that we would pick one single
PRItime for everybody and end up doing an equivalent of

printf("%llu", (unsigned long long)(something_that_is_time_t))

But as long as the plan is to configure PRItime for the platform's
time_t (or whatever the final type for timestamp_t is), we do not
have to have any extra cast here.  The endgame will use the type
that is consistent with %PRItime for variables and structure fields,
and we do not want an extra cast.


Re: [PATCH] Put sha1dc on a diet

2017-03-01 Thread Junio C Hamano
Linus Torvalds  writes:

> That said, I think that it would be lovely to just default to
> USE_SHA1DC and just put the whole attack behind us. Yes, it's slower.
> No, it doesn't really seem to matter that much in practice.

Yes.  It would be a very good goal.



Re: [PATCH] Put sha1dc on a diet

2017-03-01 Thread Jeff King
On Wed, Mar 01, 2017 at 10:49:55AM -0800, Linus Torvalds wrote:

> That said, I think that it would be lovely to just default to
> USE_SHA1DC and just put the whole attack behind us. Yes, it's slower.
> No, it doesn't really seem to matter that much in practice.

My biggest concern is the index-pack operation. Try this:

  time git clone --no-local --bare linux tmp.git

with and without USE_SHA1DC. I get:

  [w/ openssl]
  real  1m52.307s
  user  2m47.928s
  sys   0m14.992s

  [w/ sha1dc]
  real  3m4.043s
  user  6m16.412s
  sys   0m13.772s

That's real latency the user will see. It's hard to break it down,
though. The actual "receiving" phase is generally going to be network
bound. The delta-resolution that happens afterwards is totally local and
CPU-bound (but does run in parallel).

And of course this repository tends to the larger side (though certainly
there are bigger ones), and you only feel the pain on clone or when
doing an initial push, not day-to-day.

So maybe we just suck it up and accept that it's a bit slower.

-Peff


Re: [PATCH 2/2] pretty: use fmt_output_email_subject()

2017-03-01 Thread Junio C Hamano
On Wed, Mar 1, 2017 at 11:32 AM, Jeff King  wrote:
> I would think that future callers would just need to provide a dummy
> pp->rev. I guess that logic could be pushed down into
> fmt_output_email_subject(), so that it skips looking at
> opt->subject_prefix, etc, when "opt" is NULL, and just hits the
> "Subject:" case arm.

The "flexibility" I was wondering about is that the current .subject can
point at any caller-supplied string, not "Subject:".


Re: [PATCH 2/2] pretty: use fmt_output_email_subject()

2017-03-01 Thread Jeff King
On Wed, Mar 01, 2017 at 10:08:10AM -0800, Junio C Hamano wrote:

> > strbuf_grow(sb, title.len + 1024);
> > -   if (pp->subject) {
> > -   strbuf_addstr(sb, pp->subject);
> > +   if (pp->print_email_subject) {
> > +   if (pp->rev)
> > +   fmt_output_email_subject(sb, pp->rev);
> 
> A hidden assumption this code makes is that anybody who does not
> want .rev (aka "doing it as part of format-patch that may want
> nr/total etc") does not want _any_ "Subject: ".  It obviously holds
> true in today's code (the one in shortlog-add-commit is the only one
> and it sets an empty string to .subject).
> 
> Does the loss of flexibility to the future callers matter, though?
> I cannot tell offhand.
> 
> Thanks.  Let's see what others think.

I would think that future callers would just need to provide a dummy
pp->rev. I guess that logic could be pushed down into
fmt_output_email_subject(), so that it skips looking at
opt->subject_prefix, etc, when "opt" is NULL, and just hits the
"Subject:" case arm.

I don't think it's a big deal, but it would be easy to fix now, like:

diff --git a/log-tree.c b/log-tree.c
index 4618dd04c..c73df6857 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -334,13 +334,13 @@ void fmt_output_commit(struct strbuf *filename,
 
 void fmt_output_email_subject(struct strbuf *sb, struct rev_info *opt)
 {
-   if (opt->total > 0) {
+   if (opt && opt->total > 0) {
strbuf_addf(sb, "Subject: [%s%s%0*d/%d] ",
opt->subject_prefix,
*opt->subject_prefix ? " " : "",
digits_in_number(opt->total),
opt->nr, opt->total);
-   } else if (opt->total == 0 && opt->subject_prefix && *opt->subject_prefix) {
+   } else if (opt && opt->total == 0 && opt->subject_prefix && *opt->subject_prefix) {
strbuf_addf(sb, "Subject: [%s] ",
opt->subject_prefix);
} else {
diff --git a/pretty.c b/pretty.c
index d0f86f5d8..6b321c68c 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1608,8 +1608,7 @@ void pp_title_line(struct pretty_print_context *pp,
 
strbuf_grow(sb, title.len + 1024);
if (pp->print_email_subject) {
-   if (pp->rev)
-   fmt_output_email_subject(sb, pp->rev);
+   fmt_output_email_subject(sb, pp->rev);
if (needs_rfc2047_encoding(title.buf, title.len, RFC2047_SUBJECT))
add_rfc2047(sb, title.buf, title.len, encoding, RFC2047_SUBJECT);

-Peff


Re: [PATCH v7 3/3] config: add conditional include

2017-03-01 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

>  int git_config_include(const char *var, const char *value, void *data)
>  {
>   struct config_include_data *inc = data;
> + const char *cond, *key;
> + int cond_len;
>   int ret;
>  
>   /*
> @@ -185,6 +271,12 @@ int git_config_include(const char *var, const char 
> *value, void *data)
>  
>   if (!strcmp(var, "include.path"))
>   ret = handle_path_include(value, inc);
> +
> + if (!parse_config_key(var, "includeif", &cond, &cond_len, &key) &&
> + (cond && include_condition_is_true(cond, cond_len)) &&
> + !strcmp(key, "path"))
> + ret = handle_path_include(value, inc);
> +
>   return ret;
>  }

So "includeif.path" (misspelled one without any condition) falls
through to "return ret" and gives the value we got from inc->fn().
I am OK with that (i.e. "missing condition is false").

Or we can make it go to handle_path_include(), effectively making
the "include.path" a short-hand for "includeIf.path".  I am also OK
with that (i.e. "missing condition is true").

Or we could even have "include.[<condition>.]path" without
"includeIf"?  I am not sure if it is a bad idea that paints
ourselves in a corner, but somehow I find it tempting.
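For concreteness, the conditional include being debated would take roughly this shape in a config file (a sketch of the proposed syntax; the spelling of the section name is exactly what is still being settled here, and the paths are hypothetical):

```ini
# Sketch of the proposed conditional include: only read the extra
# file when the repository lives under ~/work/.
[includeIf "gitdir:~/work/"]
	path = ~/work/gitconfig-work
```

Under the "missing condition is false" reading above, a bare includeif.path with no condition would simply be ignored.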


Hello

2017-03-01 Thread university
Please kindly reply me if this your email is still in use.

Sincerely yours
Anessa...t


Re: [PATCH 2/2] pretty: use fmt_output_email_subject()

2017-03-01 Thread Jeff King
On Wed, Mar 01, 2017 at 12:37:07PM +0100, René Scharfe wrote:

> Add the email-style subject prefix (e.g. "Subject: [PATCH] ") directly
> when it's needed instead of letting log_write_email_headers() prepare
> it in a static buffer in advance.  This simplifies storage ownership and
> code flow.

This looks much cleaner to me.

I suspect we can do the same thing with the mime headers below there.
They end up in extra_headers, which is shown via after_subject. But we
could mostly replace after_subject with a call to fmt_output_email_mime()
or similar.

The only other use of extra_headers seems to be the static to/cc fields
one can add via format-patch. But those _should_ be treated differently,
as they can be allocated once in the rev_info, not per-commit. Which I
think shows off another bug. If you have a large to/cc list, that all
gets lumped into the same 1024-byte buffer, and may cause truncation.

I think the diffopt.stat_sep thing could get similar handling, too. It
appears to be set only in this one spot, and gets looked at in exactly
one. That could be replaced with an on-the-fly function call.

> This slows down the last three tests in p4000 by ca. 3% for some reason,
> so we may want to only do the first part for now, which is performance
> neutral on my machine.

It sounds like the bitfield was the cause, so that should be an easy
fix. The other question is whether it makes "--format=email" any slower.
It shouldn't, as your new approach doesn't do any extra per-commit
allocations (and in fact, it avoids some useless buffer-copying).

I couldn't measure any difference.

-Peff


Re: [PATCH] Put sha1dc on a diet

2017-03-01 Thread Jeff King
On Tue, Feb 28, 2017 at 04:30:26PM -0800, Linus Torvalds wrote:

> From: Linus Torvalds 
> Date: Tue, 28 Feb 2017 16:12:32 -0800
> Subject: [PATCH] Put sha1dc on a diet
> 
> This removes the unnecessary parts of the sha1dc code, shrinking things from
> [...]

So obviously the smaller object size is nice, and the diffstat is
certainly satisfying. My only qualm would be whether this conflicts with
the optimizations that Dan is working on (probably not conceptually, but
textually).

-Peff


Re: gpg verify git sub modules useful?

2017-03-01 Thread Stefan Beller
On Wed, Mar 1, 2017 at 10:28 AM, Patrick Schleizer
 wrote:
> Good questions, thank you for trying to figure out what I am asking. :)
>
> Junio C Hamano:
>> Patrick Schleizer  writes:
>>
>>> When using git submodules, is there value in iterating about the git
>>> submodules running "git verfiy-commit HEAD" or would that be already
>>> covered by the git submodule verification?
>>
>> That depends on what you are referring to with the "git submodule
>> verification"
>
> cd submodule
> if ! git verify-commit HEAD ; then
>error
> fi
>
>> and more importantly what threat you are guarding
>> against.
>
> All main (non-submodule) (merge) commits and submodule (merge) commits
> are signed by me.
>
> 1) git --recursive clone main (non-submodule) git repository
> 2) cd git main repository
> 3) git verify-commit HEAD or git verify-tag tag-name
> 4) git submodule update
>
> What if the main (non-submodule) git repository gpg signature was okay
> but then after git fetched the submodules these compromised (MITM'ed) ones?

The signing in Git is just signing the commit hash essentially.

> Does having gpg-verified the root (main git repository) ensure that
> submodule commits are also quasi verified?

That is my understanding. There is no difference between the security of
a file or a submodule, just the way of obtaining and its reporting is different.
Both a file and a submodule are referred to via a hash (currently sha1).
Obtaining a file is implicit whereas obtaining the submodule is explicit.
The reporting (in e.g. git-status) ... depends on a lot of options to be set.

When signing the superproject, you acknowledge the submodules
being in the state as recorded. (Same with s/submodules/files/)
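Patrick's manual loop quoted above can also be written with `git submodule foreach`; a minimal sketch, assuming the superproject HEAD and every submodule HEAD are GPG-signed and the signing keys are in the local keyring:

```shell
# Sketch: verify the superproject commit, then every submodule's
# checked-out commit, recursing into nested submodules.
git verify-commit HEAD &&
git submodule foreach --recursive 'git verify-commit HEAD'
```

Note this only checks the commits currently checked out; it does not extend trust to other commits reachable in the submodule histories.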

So I am not sure what kind of additional signing you're looking
for in the submodules.

Thanks,
Stefan


Re: [PATCH] Put sha1dc on a diet

2017-03-01 Thread Linus Torvalds
On Wed, Mar 1, 2017 at 10:42 AM, Junio C Hamano  wrote:
> I see //c99 comments

sha1dc is already full of // style comments. I just followed the
existing practice.

> and also T array[] = { [58] = val } both of
> which I think we stay away from (and the former is from the initial
> import), so some people on other platforms MAY have trouble with
> this topic.

Hmm. The "{ [58] = val; }" kind of initialization would be easy to
work around by just filling in everything else with NULL, but it would
make for a pretty nasty readability issue.

That said, if you mis-count the NULL's, the end result will pretty
immediately SIGSEGV, so I guess it wouldn't be much of a maintenance
problem.

But if you're just willing to take the "let's see" approach, I think
the explicitly numbered initializer is much better.

The main people who I assume would really want to use the sha1dc
library are hosting places. And they won't be using crazy compilers
from the last century.

That said, I think that it would be lovely to just default to
USE_SHA1DC and just put the whole attack behind us. Yes, it's slower.
No, it doesn't really seem to matter that much in practice.

  Linus


Re: [PATCH] Put sha1dc on a diet

2017-03-01 Thread Junio C Hamano
Linus Torvalds  writes:

> I notice that sha1dc is in the 'pu' branch now, so let's put my money 
> where my mouth is, and send in the sha1dc diet patch.

I see //c99 comments and also T array[] = { [58] = val } both of
which I think we stay away from (and the former is from the initial
import), so some people on other platforms MAY have trouble with
this topic.

Let's see what happens by queuing it on 'pu' ;-)

Thanks.


[PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions

2017-03-01 Thread tboegi
From: Junio C Hamano 

git diff --quiet may take a short-cut to see if a file is changed
in the working tree:
Whenever the file size differs from what is recorded in the index,
the file is assumed to be changed and "git diff --quiet" exits
with code 1.

This shortcut must be suppressed whenever the line endings are converted
or a filter is in use.
The attributes say "* text=auto" and a file has
"Hello\nWorld\n" in the index with a length of 12.
The file in the working tree has "Hello\r\nWorld\r\n" with a length of 14.
(Or even "Hello\r\nWorld\n").
In this case "git add" will not make any changes to the index, and
"git diff --quiet" should exit 0.

Add calls to would_convert_to_git() before blindly saying that a different
size means different content.

Reported-By: Mike Crowe 
Signed-off-by: Torsten Bögershausen 
---
This is what I can come up with, collecting all the loose ends.
I'm not sure if Mike wants the Reported-By together with a
Signed-off-by?
The other question is whether the commit message summarizes the
discussion well enough.

diff.c| 18 ++
 t/t0028-diff-converted.sh | 27 +++
 2 files changed, 41 insertions(+), 4 deletions(-)
 create mode 100755 t/t0028-diff-converted.sh

diff --git a/diff.c b/diff.c
index 051761b..c264758 100644
--- a/diff.c
+++ b/diff.c
@@ -4921,9 +4921,10 @@ static int diff_filespec_check_stat_unmatch(struct diff_filepair *p)
 *differences.
 *
 * 2. At this point, the file is known to be modified,
-*with the same mode and size, and the object
-*name of one side is unknown.  Need to inspect
-*the identical contents.
+*with the same mode and size, the object
+*name of one side is unknown, or size comparison
+*cannot be depended upon.  Need to inspect the
+*contents.
 */
if (!DIFF_FILE_VALID(p->one) || /* (1) */
!DIFF_FILE_VALID(p->two) ||
@@ -4931,7 +4932,16 @@ static int diff_filespec_check_stat_unmatch(struct diff_filepair *p)
(p->one->mode != p->two->mode) ||
diff_populate_filespec(p->one, CHECK_SIZE_ONLY) ||
diff_populate_filespec(p->two, CHECK_SIZE_ONLY) ||
-   (p->one->size != p->two->size) ||
+
+   /*
+* only if eol and other conversions are not involved,
+* we can say that two contents of different sizes
+* cannot be the same without checking their contents.
+*/
+   (!would_convert_to_git(p->one->path) &&
+!would_convert_to_git(p->two->path) &&
+(p->one->size != p->two->size)) ||
+
!diff_filespec_is_identical(p->one, p->two)) /* (2) */
p->skip_stat_unmatch_result = 1;
return p->skip_stat_unmatch_result;
diff --git a/t/t0028-diff-converted.sh b/t/t0028-diff-converted.sh
new file mode 100755
index 000..3d5ab95
--- /dev/null
+++ b/t/t0028-diff-converted.sh
@@ -0,0 +1,27 @@
+#!/bin/sh
+#
+# Copyright (c) 2017 Mike Crowe
+#
+# These tests ensure that files changing line endings in the presence
+# of .gitattributes to indicate that line endings should be ignored
+# don't cause 'git diff' or 'git diff --quiet' to think that they have
+# been changed.
+
+test_description='git diff with files that require CRLF conversion'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+   echo "* text=auto" >.gitattributes &&
+   printf "Hello\r\nWorld\r\n" >crlf.txt &&
+   git add .gitattributes crlf.txt &&
+   git commit -m "initial"
+'
+
test_expect_success 'quiet diff works on file with line-ending change that has no effect on repository' '
+   printf "Hello\r\nWorld\n" >crlf.txt &&
+   git status &&
+   git diff --quiet
+'
+
+test_done
-- 
2.10.0



Re: gpg verify git sub modules useful?

2017-03-01 Thread Patrick Schleizer
Good questions, thank you for trying to figure out what I am asking. :)

Junio C Hamano:
> Patrick Schleizer  writes:
> 
>> When using git submodules, is there value in iterating about the git
>> submodules running "git verify-commit HEAD" or would that be already
>> covered by the git submodule verification?
> 
> That depends on what you are referring to with the "git submodule
> verification"

cd submodule
if ! git verify-commit HEAD ; then
   error
fi

> and more importantly what threat you are guarding
> against. 

All main (non-submodule) (merge) commits and submodule (merge) commits
are signed by me.

1) git --recursive clone main (non-submodule) git repository
2) cd git main repository
3) git verify-commit HEAD or git verify-tag tag-name
4) git submodule update

What if the main (non-submodule) git repository gpg signature was okay
but then after git fetched the submodules these compromised (MITM'ed) ones?

Does having gpg-verified the root (main git repository) ensure that
submodule commits are also quasi verified?

> "git -C <submodule> verify-commit HEAD" may make sure
> that the contents of that commit object is GPG signed by whoever you
> trust--is that what you want to make sure?

> Or do you want all
> commits in the submodule history to be similarly signed because the
> tree of the superproject can switch to some other commit there?

I guess so.



Re: [PATCH 3/6] Introduce a new "printf format" for timestamps

2017-03-01 Thread Junio C Hamano
Johannes Schindelin  writes:

> So let's introduce the pseudo format "PRItime" (currently simply being
> "lu") so that it is easy later on to change the data type to time_t.

The problem being solved is a good thing to solve, and 

> - printf("author-time %lu\n", ci.author_time);
> + printf("author-time %"PRItime"\n", ci.author_time);

is one of the two ingredients to the solution for this line.  But
the final form would require casting ci.author_time to the type
expected by %PRItime format specifier.  With this change alone, you
could define PRItime to expect a wider type in the next step but
that would be a bad conversion.  IOW, changing only the format
without introducing an explicit cast appears to invite future
mistakes.

It would be better to introduce the timestamp_t we discussed earlier
before (or at) this step, and typedef it to ulong first, and then in
this step, change the above to

printf("author-time %"PRItime"\n", (timestamp_t)ci.author_time);

to keep them in sync.  And at a later step in the series, you can
update definition of PRItime and timestamp_t to make them wider at
the same time, and the changes in this patch like the above line
would not need to be touched again.




Re: [PATCH 2/2] pretty: use fmt_output_email_subject()

2017-03-01 Thread Junio C Hamano
René Scharfe  writes:

> Add the email-style subject prefix (e.g. "Subject: [PATCH] ") directly
> when it's needed instead of letting log_write_email_headers() prepare
> it in a static buffer in advance.  This simplifies storage ownership and
> code flow.

It certainly does.  At first I wondered if there are places other
than log-write-email-headers that sets its own string to pp.subject,
but this patch removes the field from the structure to guarantee
that such a place, if existed, is properly dealt with.  Otherwise,
the resulting code would not compile.

> @@ -1005,6 +1004,8 @@ static void make_cover_letter(struct rev_info *rev, int 
> use_stdout,
>   msg = body;
>   pp.fmt = CMIT_FMT_EMAIL;
>   pp.date_mode.type = DATE_RFC2822;
> + pp.rev = rev;
> + pp.print_email_subject = 1;

These two are always set together?  We'll shortly see that it is not
the case below.

>   pp_user_info(, NULL, , committer, encoding);
>   pp_title_line(, , , encoding, need_8bit_cte);
>   pp_remainder(, , , 0);
> diff --git a/builtin/shortlog.c b/builtin/shortlog.c
> index c9585d475d..f78bb4818d 100644
> --- a/builtin/shortlog.c
> +++ b/builtin/shortlog.c
> @@ -148,7 +148,7 @@ void shortlog_add_commit(struct shortlog *log, struct 
> commit *commit)
>  
>   ctx.fmt = CMIT_FMT_USERFORMAT;
>   ctx.abbrev = log->abbrev;
> - ctx.subject = "";
> + ctx.print_email_subject = 1;

Here we see .rev is left to NULL here, with print_email_subject set
to true.  And in the entire patch this is the only such place.

> diff --git a/pretty.c b/pretty.c
> index 5e683830d9..d0f86f5d85 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -1607,8 +1607,9 @@ void pp_title_line(struct pretty_print_context *pp,
>   pp->preserve_subject ? "\n" : " ");
>  
>   strbuf_grow(sb, title.len + 1024);
> - if (pp->subject) {
> - strbuf_addstr(sb, pp->subject);
> + if (pp->print_email_subject) {
> + if (pp->rev)
> + fmt_output_email_subject(sb, pp->rev);

A hidden assumption this code makes is that anybody who does not
want .rev (aka "doing it as part of format-patch that may want
nr/total etc") does not want _any_ "Subject: ".  It obviously holds
true in today's code (the one in shortlog-add-commit is the only one
and it sets an empty string to .subject).

Does the loss of flexibility to the future callers matter, though?
I cannot tell offhand.

Thanks.  Let's see what others think.


Re: Delta compression not so effective

2017-03-01 Thread Marius Storm-Olsen

On 3/1/2017 11:36, Linus Torvalds wrote:

On Wed, Mar 1, 2017 at 5:51 AM, Marius Storm-Olsen  wrote:


When first importing, I disabled gc to avoid any repacking until completed.
When done importing, there was 209GB of all loose objects (~670k files).
With the hopes of quick consolidation, I did a
git -c gc.autoDetach=0 -c gc.reflogExpire=0 \
  -c gc.reflogExpireUnreachable=0 -c gc.rerereresolved=0 \
  -c gc.rerereunresolved=0 -c gc.pruneExpire=now \
  gc --prune
which brought it down to 206GB in a single pack. I then ran
git repack -a -d -F --window=350 --depth=250
which took it down to 203GB, where I'm at right now.


Considering that it was 209GB in loose objects, I don't think it
delta-packed the big objects at all.

I wonder if the big objects end up hitting some size limit that causes
the delta creation to fail.


You're likely on to something here.
I just ran
git verify-pack --verbose \
  objects/pack/pack-9473815bc36d20fbcd38021d7454fbe09f791931.idx | \
  sort -k3n | tail -n15

and got no blobs with deltas in them.
  feb35d6dc7af8463e038c71cc3893d163d47c31c blob   36841958 36461935 3259424358
  007b65e603cdcec6644ddc25c2a729a394534927 blob   36845345 36462120 3341677889
  0727a97f68197c99c63fcdf7254e5867f8512f14 blob   37368646 36983862 3677338718
  576ce2e0e7045ee36d0370c2365dc730cb435f40 blob   37399203 37014740 3639613780
  7f6e8b22eed5d8348467d9b0180fc4ae01129052 blob   125296632 83609223 5045853543
  014b9318d2d969c56d46034a70223554589b3dc4 blob   170113524 6124878 1118227958
  22d83cb5240872006c01651eb1166c8db62c62d8 blob   170113524 65941491 1257435955
  292ac84f48a3d5c4de8d12bfb2905e055f9a33b1 blob   170113524 67770601 1323377446
  2b9329277e379dfbdcd0b452b39c6b0bf3549005 blob   170113524 7656690 1110571268
  37517efb4818a15ad7bba79b515170b3ee18063b blob   170113524 133083119 1124352836
  55a4a70500eb3b99735677d0025f33b1bb78624a blob   170113524 6592386 1398975989
  e669421ea5bf2e733d5bf10cf505904d168de749 blob   170113524 7827942 1391148047
  e9916da851962265a9d5b099e72f60659a74c144 blob   170113524 73514361 966299538
  f7bf1313752deb1bae592cc7fc54289aea87ff19 blob   170113524 70756581 1039814687
  8afc6f2a51f0fa1cc4b03b8d10c70599866804ad blob   248959314 237612609 606692699


In fact, I don't see a single "deltified" blob until the 6355th line from the end!
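The check above can be automated. Here is a small sketch that tallies deltified vs. whole objects in `git verify-pack --verbose` output; the field layout is "sha1 type size size-in-pack offset [depth base-sha1]", where the trailing depth and base SHA-1 appear only for deltas. The first two sample lines come from the listing above; the third is a fabricated delta entry purely for illustration.

```python
# Count deltified vs. whole-stored objects in `git verify-pack --verbose`
# output. Entries with 7+ fields carry a delta depth and base SHA-1;
# entries with 5 fields are stored whole. The type check skips the
# chain-length summary lines that verify-pack also prints.
sample = """\
feb35d6dc7af8463e038c71cc3893d163d47c31c blob 36841958 36461935 3259424358
8afc6f2a51f0fa1cc4b03b8d10c70599866804ad blob 248959314 237612609 606692699
0123456789abcdef0123456789abcdef01234567 blob 1024 200 999 1 feb35d6dc7af8463e038c71cc3893d163d47c31c
"""

def count_deltas(verify_pack_output):
    full, deltified = 0, 0
    for line in verify_pack_output.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[1] in ("blob", "tree", "commit", "tag"):
            if len(fields) >= 7:   # depth + base SHA-1 present => delta
                deltified += 1
            else:
                full += 1
    return full, deltified

print(count_deltas(sample))  # -> (2, 1)
```

Piping the real verify-pack output through this would have shown the zero-delta problem immediately, without eyeballing sorted lines.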



> For example, we have that HASH_LIMIT that limits how many hashes
> we'll create for the same hash bucket, because there's some quadratic
> behavior in the delta algorithm. It triggered with things like big
> files that have lots of repeated content.
>
> We also have various memory limits, in particular
> 'window_memory_limit'. That one should default to 0, but maybe you
> limited it at some point in a config file and forgot about it?


Indeed, I did do a
-c pack.threads=20 --window-memory=6g
to 'git repack', since the machine is a 20-core (40 threads) machine 
with 126GB of RAM.


So I guess with these sized objects, even at 6GB per thread, it's not 
enough to get a big enough Window for proper delta-packing?


This repo took >14hr to repack on 20 threads though ("compression" step 
was very fast, but stuck 95% of the time in "writing objects"), so I can 
only imagine how long a pack.threads=1 will take :)


But aren't the blobs sorted by some metric for reasonable delta-pack 
locality, so even with a 6GB window it should have seen ~25 similar 
objects to deltify against?
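A back-of-the-envelope check of that hunch, using the repeated ~170MB blob size from the verify-pack listing above (this is only a ballpark; git's real window accounting also charges per-object index overhead, so the true effective window is smaller still):

```python
# Rough estimate: how many ~170 MB blobs fit in a 6 GiB --window-memory
# budget, versus the requested --window=350 candidate count?
window_memory = 6 * 1024**3          # --window-memory=6g, in bytes
blob_size = 170_113_524              # repeated blob size from verify-pack
requested_window = 350               # --window=350

effective_window = min(requested_window, window_memory // blob_size)
print(effective_window)  # -> 37, far fewer candidates than the 350 requested
```

So at these object sizes the memory cap, not `--window=350`, decides how many similar objects each blob is ever compared against.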



--
.marius


Re: [PATCH v7 0/3] Conditional config include

2017-03-01 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> Nguyễn Thái Ngọc Duy (3):
>   config.txt: clarify multiple key values in include.path
>   config.txt: reflow the second include.path paragraph
>   config: add conditional include

Thanks.  The primary change looked good (it looked good already in
the previous round).

Here is a list of minor suggestions on top.

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 2a41e84bab..5faabc7934 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -82,7 +82,7 @@ Includes
 You can include a config file from another by setting the special
 `include.path` variable to the name of the file to be included. The
 variable takes a pathname as its value, and is subject to tilde
-expansion. `include.path` supports multiple key values.
+expansion. `include.path` can be given multiple times.
 
 The included file is expanded immediately, as if its contents had been
 found at the location of the include directive. If the value of the
@@ -96,8 +96,7 @@ Conditional includes
 You can include a config file from another conditionally by setting a
 `includeIf.<condition>.path` variable to the name of the file to be
 included. The variable's value is treated the same way as
-`include.path`. `includeIf.<condition>path` supports multiple key
-values.
+`include.path`. `includeIf.<condition>.path` can be given multiple times.
 
 The condition starts with a keyword followed by a colon and some data
 whose format and meaning depends on the keyword. Supported keywords
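For reference, with the `gitdir:` keyword this series introduces, a conditional include in ~/.gitconfig would look something like this (the paths and file names here are made-up examples, not from the patch):

```ini
# ~/.gitconfig -- pull in per-project settings only when the current
# repository's $GIT_DIR sits under the matching directory.
[includeIf "gitdir:~/work/project-a/"]
	path = ~/work/project-a/.gitconfig
[includeIf "gitdir:~/work/project-b/"]
	path = ~/work/project-b/.gitconfig
```

Older git versions simply ignore the `includeIf` sections, which is the backward-compatibility story the commit message describes.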


Re: [PATCH v7 1/3] config.txt: clarify multiple key values in include.path

2017-03-01 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> The phrasing in this paragraph may give an impression that you can only
> use it once. Rephrase it a bit.
>
> Helped-by: Philip Oakley 
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  Documentation/config.txt | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 015346c417..4748efbf36 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -79,10 +79,10 @@ escape sequences) are invalid.
>  Includes
>  
>  
> -You can include one config file from another by setting the special
> +You can include a config file from another by setting the special
>  `include.path` variable to the name of the file to be included. The
>  variable takes a pathname as its value, and is subject to tilde
> -expansion.
> +expansion. `include.path` supports multiple key values.

Multiple key values or just simply multiple values?  I think it is
the latter.

But to the intended target audience, I think

`include.path` can be given multiple times.

is easier to understand.  It's not like you can (or want to)
enumerate with "git config --get-all include.path" to learn all the
values (for the single key "include.path"), and it is better not to
lead readers to think of this in terms of  in the first
place (which is already clarified in the text that follows).

>  The
>  included file is expanded immediately, as if its contents had been


Re: Delta compression not so effective

2017-03-01 Thread Linus Torvalds
On Wed, Mar 1, 2017 at 5:51 AM, Marius Storm-Olsen  wrote:
>
> When first importing, I disabled gc to avoid any repacking until completed.
> When done importing, there was 209GB of all loose objects (~670k files).
> With the hopes of quick consolidation, I did a
> git -c gc.autoDetach=0 -c gc.reflogExpire=0 \
>   -c gc.reflogExpireUnreachable=0 -c gc.rerereresolved=0 \
>   -c gc.rerereunresolved=0 -c gc.pruneExpire=now \
>   gc --prune
> which brought it down to 206GB in a single pack. I then ran
> git repack -a -d -F --window=350 --depth=250
> which took it down to 203GB, where I'm at right now.

Considering that it was 209GB in loose objects, I don't think it
delta-packed the big objects at all.

I wonder if the big objects end up hitting some size limit that causes
the delta creation to fail.

For example, we have that HASH_LIMIT that limits how many hashes
we'll create for the same hash bucket, because there's some quadratic
behavior in the delta algorithm. It triggered with things like big
files that have lots of repeated content.
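The bucket-capping Linus describes can be sketched in miniature like this. It is a toy model of the HASH_LIMIT idea in git's diff-delta.c, not the real code: the chunk size and the use of Python's built-in `hash` are stand-ins for the real rolling hash, though the cap of 64 matches the HASH_LIMIT constant in diff-delta.c.

```python
# Toy model of HASH_LIMIT: index fixed-size chunks of a source buffer by
# a cheap hash, but keep at most `limit` positions per bucket, so a file
# full of repeated content cannot trigger quadratic matching work when
# every window position hashes into the same few buckets.
from collections import defaultdict

def build_index(data, chunk=16, limit=64):
    buckets = defaultdict(list)
    for pos in range(0, len(data) - chunk + 1):
        h = hash(data[pos:pos + chunk])   # stand-in for the real rolling hash
        if len(buckets[h]) < limit:       # HASH_LIMIT-style cap
            buckets[h].append(pos)
    return buckets

# Highly repetitive input: without the cap, each of the four buckets
# would hold ~16k positions and every match attempt would scan them all.
index = build_index(b"abcd" * 16384, limit=64)
print(max(len(v) for v in index.values()))  # -> 64
```

The cost is that some legitimate delta-base candidates get dropped for pathological inputs, which is exactly the "big files with lots of repeated content" failure mode being discussed.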

We also have various memory limits, in particular
'window_memory_limit'. That one should default to 0, but maybe you
limited it at some point in a config file and forgot about it?

 Linus


Re: [BUG] branch renamed to 'HEAD'

2017-03-01 Thread Junio C Hamano
Jacob Keller  writes:

> I didn't find any problems besides what you had already outlined
> before I started reading the series. It looks pretty much like I
> thought it would. I like the idea of saying "I want X" rather than the
> command returning "This was a Y"

Yeah, thanks for reading it through.  Except for one disambiguation
glitch Peff noticed himself, I found the entire series a pleasant
read.


Re: [PATCH 0/6] Use time_t

2017-03-01 Thread Junio C Hamano
Jeff King  writes:

> On Tue, Feb 28, 2017 at 02:27:22PM -0800, Junio C Hamano wrote:
>
>> Jeff King  writes:
>> 
>> > ... We can certainly stick with it for now (it's awkward if you
>> > really do have an entry on Jan 1 1970, but other than that it's an OK
>> > marker). I agree that the most negative value is probably a saner
>> > choice, but we can switch to it after the dust settles.
>> 
>> I was trying to suggest that we should strive to switch to the most
>> negative or whatever the most implausible value in the new range
>> (and leave it as a possible bug to be fixed if we missed a place
>> that still used "0 is impossible") while doing the ulong to time_t
>> (or timestamp_t that is i64).  
>> 
>> "safer in the short term" wasn't meant to be "let's not spend time
>> to do quality work".  As long as we are switching, we should follow
>> it through.
>
> Sure, I'd be much happier to see it done now. I just didn't want to pile
> on the requirements to the point that step 1 doesn't get done.

Ah, that was what you meant.

I was assuming that we are switching to a longer _signed_ type.  It
felt silly to tell users "you can use timestamps before the epoch
now with this change, but you cannot express time exactly at the
epoch".

I am perfectly OK with switching to a longer _unsigned_ type with
the "0 is impossible" [*1*] intact, aka "safer in the short term",
if we want to it make our first step.  That may be a smaller step,
but still a step in the right direction.


[Footnote]

*1* It could be "-1 is impossible", I didn't actually check.
Funnily enough, ISO C99 uses (time_t)(-1) to signal an error
when returning value from mktime() and time().
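The trade-off between the two sentinels is easy to illustrate with a quick sketch (a Python stand-in for the C discussion; the helper names are made up):

```python
# Why "0 is impossible" is an awkward sentinel for timestamps: 0 is a
# perfectly valid time (the epoch, Jan 1 1970, 00:00:00 UTC), while the
# most negative value of a signed 64-bit type is a date no real entry
# can plausibly carry.
EPOCH = 0
INT64_MIN = -(2**63)

def is_unset_zero(ts):
    return ts == 0            # the "0 is impossible" convention

def is_unset_min(ts):
    return ts == INT64_MIN    # the most-negative-value convention

# An entry legitimately dated at the epoch collides with the 0 sentinel...
print(is_unset_zero(EPOCH))   # -> True (false positive)
# ...but not with the most-negative sentinel.
print(is_unset_min(EPOCH))    # -> False
```

That is the "entry on Jan 1 1970" awkwardness from the quoted text: with the 0 sentinel such an entry is indistinguishable from an unset one.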



Re: Delta compression not so effective

2017-03-01 Thread Junio C Hamano
On Wed, Mar 1, 2017 at 5:51 AM, Marius Storm-Olsen  wrote:
> ... which brought it down to 206GB in a single pack. I then ran
> git repack -a -d -F --window=350 --depth=250
> which took it down to 203GB, where I'm at right now.

Just a hunch. s/F/f/ perhaps?  "-F" does not allow Git to recover from poor
delta-base choice the original importer may have made (and if the
original importer used fast-import, it is known that its choice of
the delta-base is suboptimal).


Re: [PATCH v7 3/3] config: add conditional include

2017-03-01 Thread Ramsay Jones


On 01/03/17 11:26, Nguyễn Thái Ngọc Duy wrote:
> Sometimes a set of repositories want to share configuration settings
> among themselves that are distinct from other such sets of repositories.
> A user may work on two projects, each of which have multiple
> repositories, and use one user.email for one project while using another
> for the other.
> 
> Setting $GIT_DIR/.config works, but if the penalty of forgetting to
> update $GIT_DIR/.config is high (especially when you end up cloning
> often), it may not be the best way to go. Having the settings in
> ~/.gitconfig, which would work for just one set of repositories, would
> not work well in such a situation. Having separate ${HOME}s may add more
> problems than it solves.
> 
> Extend the include.path mechanism that lets a config file include
> another config file, so that the inclusion can be done only when some
> conditions hold. Then ~/.gitconfig can say "include config-project-A
> only when working on project-A" for each project A the user works on.
> 
> In this patch, the only supported grouping is based on $GIT_DIR (in
> absolute path), so you would need to group repositories by directory, or
> something like that to take advantage of it.
> 
> We already have include.path for unconditional includes. This patch goes
> with includeIf.<condition>.path to make it clearer that a condition is
> required. The new config has the same backward compatibility approach as
> include.path: older git versions that don't understand includeIf will
> simply ignore them.
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  Documentation/config.txt  | 64 +
>  config.c  | 92 +++
>  t/t1305-config-include.sh | 56 +
>  3 files changed, 212 insertions(+)
> 
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 1fad746efd..2a41e84bab 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -90,6 +90,59 @@ found at the location of the include directive. If the 
> value of the
>  be relative to the configuration file in which the include directive
>  was found.  See below for examples.
>  
> +Conditional includes
> +
> +
> +You can include a config file from another conditionally by setting a
> +`includeIf.<condition>.path` variable to the name of the file to be
> +included. The variable's value is treated the same way as
> +`include.path`. `includeIf.<condition>path` supports multiple key
                              ^^^^^^^^^^^^^^^
s/<condition>path/<condition>.path/

ATB,
Ramsay Jones


Re: Delta compression not so effective

2017-03-01 Thread Junio C Hamano
On Wed, Mar 1, 2017 at 8:06 AM, Junio C Hamano  wrote:

> Just a hunch. s/F/f/ perhaps?  "-F" does not allow Git to recover from poor

Nah, sorry for the noise. Between -F and -f there shouldn't be any difference.


Re: [PATCH v5 24/24] t1406: new tests for submodule ref store

2017-03-01 Thread Michael Haggerty
On 03/01/2017 01:34 PM, Duy Nguyen wrote:
> On Wed, Mar 1, 2017 at 12:34 AM, Michael Haggerty  
> wrote:
>> On 02/22/2017 03:04 PM, Nguyễn Thái Ngọc Duy wrote:
>>> Signed-off-by: Nguyễn Thái Ngọc Duy 
>>> ---
>>>  t/t1406-submodule-ref-store.sh (new +x) | 95 +
>>>  1 file changed, 95 insertions(+)
>>>  create mode 100755 t/t1406-submodule-ref-store.sh
>>>
>>> diff --git a/t/t1406-submodule-ref-store.sh b/t/t1406-submodule-ref-store.sh
>>> new file mode 100755
>>> index 0..3b30ba62f
>>> --- /dev/null
>>> +++ b/t/t1406-submodule-ref-store.sh
>>> [...]
>>
>> I haven't actually read this far in the patch series, but I noticed that
>> a test in this file fails:
>>
>>
>> t1406-submodule-ref-store.sh (Wstat: 256 Tests: 15
>> Failed: 1)
>>   Failed test:  10
>>   Non-zero exit status: 1
>>
>> I didn't have time to look into it more; let me know if you can't
>> reproduce it.
> 
> Fantastic. No I couldn't reproduce it, even --valgrind did not
> complain. First step maybe just pushing your branch somewhere so I can
> try out if you're applying the patches via mail (maybe there's some
> changes in the base that affect this). Otherwise ./t1406-* -v -i might
> be enough clue for me to dig in, I hope.

I'm testing c5302654930070135eec9bc1b4ef99b14e0f28ee from Junio's GitHub
fork.

Unfortunately, the test succeeds (every time) when I run just `t1406-*`
or with `-d` or `-i` options, but fails (every time) when run as part of
the whole test suite, so it's a bit tricky to dig deeper.

By trial and error, I found that the test succeeds if I comment out the
"for_each_reflog()" test. By having that test write its results to
`/tmp` where they won't be deleted, I found that the problem is that the
`actual` results are not sorted correctly:

refs/heads/new-master 0x0
refs/heads/master 0x0
HEAD 0x1

I don't know why it's so Heisenbergish.

Michael



Delta compression not so effective

2017-03-01 Thread Marius Storm-Olsen
I have just converted an SVN repo to Git (using SubGit), where I feel 
delta compression has let me down :)


Suffice it to say, this is a "traditional" SVN repo, with an extern/ 
blown out of proportion with many binary check-ins. BUT, even still, I 
would expect Git's delta compression to be quite effective, compared to 
the compression present in SVN. In this case however, the Git repo ends 
up being 46% larger than the SVN DB.


Details - SVN:
Commits: 32988
DB (server) size: 139GB
Branches: 103
Tags: 1088

Details - Git:
$ git count-objects -v
  count: 0
  size: 0
  in-pack: 666515
  packs: 1
  size-pack: 211933109
  prune-packable: 0
  garbage: 0
  size-garbage: 0
$ du -sh .
  203G	.

$ java -jar ~/sources/bfg/bfg.jar --delete-folders extern \
    --no-blob-protection && \
  git reflog expire --expire=now --all && \
  git gc --prune=now --aggressive
$ git count-objects -v
  count: 0
  size: 0
  in-pack: 495070
  packs: 1
  size-pack: 5765365
  prune-packable: 0
  garbage: 0
  size-garbage: 0
$ du -sh .
  5.6G	.
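As a sanity check on those numbers: `git count-objects -v` reports size-pack in KiB, so the two figures convert to GiB like this (assuming the du figures are binary gigabytes):

```python
# Convert the two size-pack figures above (reported in KiB) to GiB to
# confirm they match the du output of ~203G and ~5.6G.
KIB_PER_GIB = 1024 * 1024

before = 211933109 / KIB_PER_GIB   # pack size before the BFG run
after = 5765365 / KIB_PER_GIB      # pack size after deleting extern/

print(f"{before:.1f} GiB, {after:.1f} GiB")  # -> 202.1 GiB, 5.5 GiB
```

So the extern/ folder alone accounts for roughly 197 GiB of the pack, consistent with the du readings.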

When first importing, I disabled gc to avoid any repacking until 
completed. When done importing, there was 209GB of all loose objects 
(~670k files). With the hopes of quick consolidation, I did a

git -c gc.autoDetach=0 -c gc.reflogExpire=0 \
  -c gc.reflogExpireUnreachable=0 -c gc.rerereresolved=0 \
  -c gc.rerereunresolved=0 -c gc.pruneExpire=now \
  gc --prune
which brought it down to 206GB in a single pack. I then ran
git repack -a -d -F --window=350 --depth=250
which took it down to 203GB, where I'm at right now.

However, this is still miles away from the 139GB in SVN's DB.

Any ideas what's going on, and why my results are so terrible, compared 
to SVN?


Thanks!

--
.marius


Re: [PATCH v5 24/24] t1406: new tests for submodule ref store

2017-03-01 Thread Duy Nguyen
On Wed, Mar 1, 2017 at 12:34 AM, Michael Haggerty  wrote:
> On 02/22/2017 03:04 PM, Nguyễn Thái Ngọc Duy wrote:
>> Signed-off-by: Nguyễn Thái Ngọc Duy 
>> ---
>>  t/t1406-submodule-ref-store.sh (new +x) | 95 +
>>  1 file changed, 95 insertions(+)
>>  create mode 100755 t/t1406-submodule-ref-store.sh
>>
>> diff --git a/t/t1406-submodule-ref-store.sh b/t/t1406-submodule-ref-store.sh
>> new file mode 100755
>> index 0..3b30ba62f
>> --- /dev/null
>> +++ b/t/t1406-submodule-ref-store.sh
>> [...]
>
> I haven't actually read this far in the patch series, but I noticed that
> a test in this file fails:
>
>
> t1406-submodule-ref-store.sh (Wstat: 256 Tests: 15
> Failed: 1)
>   Failed test:  10
>   Non-zero exit status: 1
>
> I didn't have time to look into it more; let me know if you can't
> reproduce it.

Fantastic. No I couldn't reproduce it, even --valgrind did not
complain. First step maybe just pushing your branch somewhere so I can
try out if you're applying the patches via mail (maybe there's some
changes in the base that affect this). Otherwise ./t1406-* -v -i might
be enough clue for me to dig in, I hope.
-- 
Duy

