Re: [PATCH] git-svn: shorten glob error message

2016-01-22 Thread Victor Leschuk
Hello all,

On 01/14/2016 09:15 PM, Junio C Hamano wrote:
> Eric Wong  writes:
>
>> Error messages should attempt to fit within the confines of
>> an 80-column terminal to avoid compatibility and accessibility
>> problems.  Furthermore the word "directories" can be misleading
>> when used in the context of git refnames.
>>
>> Signed-off-by: Eric Wong 
>> ---
>>Eric Wong  wrote:
>>> I also noticed the "Only one set of wildcard directories" error
>>> message is unnecessary long and "wildcard directories" should
>>> probably be shortened to "wildcards" to avoid wrapping in a terminal.
>>> That will probably be a separate patch for me.
>>
>>There's likely more instances of this in git-svn, but I figured
>>we'll get this one fixed, first.
>>
>>Also pushed to bogomips.org/git-svn.git
>>(commit dc6aa7e61e9d33856f54d63b7acb518383420373)
>>along with Victor's patch.
> Thanks.
>
> I am not sure if it is a good idea to show */*/* as an example in
> the message (that is an anti-example of 'one set of wildcard' by
> having three stars, isn't it?), but that is not a new issue this
> change introduces.

I agree, this should be changed, however I think this should be done in 
separate patch.

Do we have any questions left open before this could be merged into main 
git repo?

--
Victor
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4] git-svn: add support for prefixed globs in config

2015-12-30 Thread Victor Leschuk
Introduce prefixed globs for branches and tags in git-svn.
Globs like 'release_*' allow users to avoid long lines in config like:

branches = branches/{release_20,release_21,release_22,...}

Signed-off-by: Victor Leschuk <vlesc...@accesssoftek.com>
---
Changes from v3:
* Wrapped all test preparations in separate test-cases

 Documentation/git-svn.txt|   5 ++
 perl/Git/SVN/GlobSpec.pm |   9 ++-
 t/t9168-git-svn-prefixed-glob.sh | 142 +++
 3 files changed, 151 insertions(+), 5 deletions(-)
 create mode 100755 t/t9168-git-svn-prefixed-glob.sh

diff --git a/Documentation/git-svn.txt b/Documentation/git-svn.txt
index 0c0f60b..529cffe 100644
--- a/Documentation/git-svn.txt
+++ b/Documentation/git-svn.txt
@@ -1034,6 +1034,7 @@ listed below are allowed:
url = http://server.org/svn
fetch = trunk/project-a:refs/remotes/project-a/trunk
branches = branches/*/project-a:refs/remotes/project-a/branches/*
+   branches = branches/release_*:refs/remotes/project-a/branches/release_*
tags = tags/*/project-a:refs/remotes/project-a/tags/*
 
 
@@ -1044,6 +1045,10 @@ independent path component (surrounded by '/' or EOL).   
This
 type of configuration is not automatically created by 'init' and
 should be manually entered with a text-editor or using 'git config'.
 
+Also note that prefixed globs (e.g. 'release_*') match everything after prefix
+but do not match exact prefix. For example:
+'release_*' will match 'release_1' or 'release_v1' but will not match 
'release_'.
+
 It is also possible to fetch a subset of branches or tags by using a
 comma-separated list of names within braces. For example:
 
diff --git a/perl/Git/SVN/GlobSpec.pm b/perl/Git/SVN/GlobSpec.pm
index c95f5d7..a136090 100644
--- a/perl/Git/SVN/GlobSpec.pm
+++ b/perl/Git/SVN/GlobSpec.pm
@@ -11,16 +11,15 @@ sub new {
my $die_msg = "Only one set of wildcard directories " .
"(e.g. '*' or '*/*/*') is supported: '$glob'\n";
for my $part (split(m|/|, $glob)) {
-   if ($part =~ /\*/ && $part ne "*") {
-   die "Invalid pattern in '$glob': $part\n";
-   } elsif ($pattern_ok && $part =~ /[{}]/ &&
+   if ($pattern_ok && $part =~ /[{}]/ &&
 $part !~ /^\{[^{}]+\}/) {
die "Invalid pattern in '$glob': $part\n";
}
-   if ($part eq "*") {
+   if ($part =~ /(\w*)\*/) {
die $die_msg if $state eq "right";
$state = "pattern";
-   push(@patterns, "[^/]*");
+   my $pat = $1 ? "${1}[^/]+" : "[^/]*";
+   push(@patterns, $pat);
} elsif ($pattern_ok && $part =~ /^\{(.*)\}$/) {
die $die_msg if $state eq "right";
$state = "pattern";
diff --git a/t/t9168-git-svn-prefixed-glob.sh b/t/t9168-git-svn-prefixed-glob.sh
new file mode 100755
index 000..1b08e45
--- /dev/null
+++ b/t/t9168-git-svn-prefixed-glob.sh
@@ -0,0 +1,142 @@
+#!/bin/sh
+test_description='git svn globbing refspecs with prefixed globs'
+. ./lib-git-svn.sh
+
+test_expect_success 'prepare test refspec prefixed globbing' '
+   cat >expect.end <trunk/src/a/readme &&
+   echo "goodbye world" >trunk/src/b/readme &&
+   svn_cmd import -m "initial" trunk "$svnrepo"/trunk &&
+   svn_cmd co "$svnrepo" tmp &&
+   (
+   cd tmp &&
+   mkdir branches tags &&
+   svn_cmd add branches tags &&
+   svn_cmd cp trunk branches/b_start &&
+   svn_cmd commit -m "start a new branch" &&
+   svn_cmd up &&
+   echo "hi" >>branches/b_start/src/b/readme &&
+   poke branches/b_start/src/b/readme &&
+   echo "hey" >>branches/b_start/src/a/readme &&
+   poke branches/b_start/src/a/readme &&
+   svn_cmd commit -m "hi" &&
+   svn_cmd up &&
+   svn_cmd cp branches/b_start tags/t_end &&
+   echo "bye" >>tags/t_end/src/b/readme &&
+   poke tags/t_end/src/b/readme &&
+   echo "aye" >>tags/t_end/src/a/readme &&
+   poke tags/t_end/src/a/readme &&
+   svn_cmd commit -m "the end" &&
+ 

[PATCH v2] git-svn: add support for prefixed globs in config

2015-12-17 Thread Victor Leschuk
Introduce prefixed globs for branches and tags in git-svn.
Globs like 'release_*' allow users to avoid long lines in config like:

branches = branches/{release_20,release_21,release_22,...}

Signed-off-by: Victor Leschuk <vlesc...@accesssoftek.com>
---
Changes from v1:
* Joined implementation and test in one patch
* Fixed test code style according to current coding style guide

 Documentation/git-svn.txt|   5 ++
 perl/Git/SVN/GlobSpec.pm |   9 ++-
 t/t9168-git-svn-prefixed-glob.sh | 136 +++
 3 files changed, 145 insertions(+), 5 deletions(-)
 create mode 100755 t/t9168-git-svn-prefixed-glob.sh

diff --git a/Documentation/git-svn.txt b/Documentation/git-svn.txt
index 0c0f60b..529cffe 100644
--- a/Documentation/git-svn.txt
+++ b/Documentation/git-svn.txt
@@ -1034,6 +1034,7 @@ listed below are allowed:
url = http://server.org/svn
fetch = trunk/project-a:refs/remotes/project-a/trunk
branches = branches/*/project-a:refs/remotes/project-a/branches/*
+   branches = branches/release_*:refs/remotes/project-a/branches/release_*
tags = tags/*/project-a:refs/remotes/project-a/tags/*
 
 
@@ -1044,6 +1045,10 @@ independent path component (surrounded by '/' or EOL).   
This
 type of configuration is not automatically created by 'init' and
 should be manually entered with a text-editor or using 'git config'.
 
+Also note that prefixed globs (e.g. 'release_*') match everything after prefix
+but do not match exact prefix. For example:
+'release_*' will match 'release_1' or 'release_v1' but will not match 
'release_'.
+
 It is also possible to fetch a subset of branches or tags by using a
 comma-separated list of names within braces. For example:
 
diff --git a/perl/Git/SVN/GlobSpec.pm b/perl/Git/SVN/GlobSpec.pm
index c95f5d7..a136090 100644
--- a/perl/Git/SVN/GlobSpec.pm
+++ b/perl/Git/SVN/GlobSpec.pm
@@ -11,16 +11,15 @@ sub new {
my $die_msg = "Only one set of wildcard directories " .
"(e.g. '*' or '*/*/*') is supported: '$glob'\n";
for my $part (split(m|/|, $glob)) {
-   if ($part =~ /\*/ && $part ne "*") {
-   die "Invalid pattern in '$glob': $part\n";
-   } elsif ($pattern_ok && $part =~ /[{}]/ &&
+   if ($pattern_ok && $part =~ /[{}]/ &&
 $part !~ /^\{[^{}]+\}/) {
die "Invalid pattern in '$glob': $part\n";
}
-   if ($part eq "*") {
+   if ($part =~ /(\w*)\*/) {
die $die_msg if $state eq "right";
$state = "pattern";
-   push(@patterns, "[^/]*");
+   my $pat = $1 ? "${1}[^/]+" : "[^/]*";
+   push(@patterns, $pat);
} elsif ($pattern_ok && $part =~ /^\{(.*)\}$/) {
die $die_msg if $state eq "right";
$state = "pattern";
diff --git a/t/t9168-git-svn-prefixed-glob.sh b/t/t9168-git-svn-prefixed-glob.sh
new file mode 100755
index 000..b8a059b
--- /dev/null
+++ b/t/t9168-git-svn-prefixed-glob.sh
@@ -0,0 +1,136 @@
+#!/bin/sh
+test_description='git svn globbing refspecs with prefixed globs'
+. ./lib-git-svn.sh
+
+cat >expect.end <trunk/src/a/readme &&
+   echo "goodbye world" >trunk/src/b/readme &&
+   svn_cmd import -m "initial" trunk "$svnrepo"/trunk &&
+   svn_cmd co "$svnrepo" tmp &&
+   (
+   cd tmp &&
+   mkdir branches tags &&
+   svn_cmd add branches tags &&
+   svn_cmd cp trunk branches/b_start &&
+   svn_cmd commit -m "start a new branch" &&
+   svn_cmd up &&
+   echo "hi" >>branches/b_start/src/b/readme &&
+   poke branches/b_start/src/b/readme &&
+   echo "hey" >>branches/b_start/src/a/readme &&
+   poke branches/b_start/src/a/readme &&
+   svn_cmd commit -m "hi" &&
+   svn_cmd up &&
+   svn_cmd cp branches/b_start tags/t_end &&
+   echo "bye" >>tags/t_end/src/b/readme &&
+   poke tags/t_end/src/b/readme &&
+   echo "aye" >>tags/t_end/src/a/readme &&
+   poke tags/t_end/src/a/readme &&
+   svn_cmd commit -m "the end" &&
+   echo "byebye" >&

[PATCH v3] git-svn: add support for prefixed globs in config

2015-12-17 Thread Victor Leschuk
Introduce prefixed globs for branches and tags in git-svn.
Globs like 'release_*' allow users to avoid long lines in config like:

branches = branches/{release_20,release_21,release_22,...}

Signed-off-by: Victor Leschuk <vlesc...@accesssoftek.com>
---
Changes from v1 (in v2 I forgot to switch from `` to $() ):
* Joined implementation and test in one patch
* Fixed test code style according to current coding style guide

 Documentation/git-svn.txt|   5 ++
 perl/Git/SVN/GlobSpec.pm |   9 ++-
 t/t9168-git-svn-prefixed-glob.sh | 136 +++
 3 files changed, 145 insertions(+), 5 deletions(-)
 create mode 100755 t/t9168-git-svn-prefixed-glob.sh

diff --git a/Documentation/git-svn.txt b/Documentation/git-svn.txt
index 0c0f60b..529cffe 100644
--- a/Documentation/git-svn.txt
+++ b/Documentation/git-svn.txt
@@ -1034,6 +1034,7 @@ listed below are allowed:
url = http://server.org/svn
fetch = trunk/project-a:refs/remotes/project-a/trunk
branches = branches/*/project-a:refs/remotes/project-a/branches/*
+   branches = branches/release_*:refs/remotes/project-a/branches/release_*
tags = tags/*/project-a:refs/remotes/project-a/tags/*
 
 
@@ -1044,6 +1045,10 @@ independent path component (surrounded by '/' or EOL).   
This
 type of configuration is not automatically created by 'init' and
 should be manually entered with a text-editor or using 'git config'.
 
+Also note that prefixed globs (e.g. 'release_*') match everything after prefix
+but do not match exact prefix. For example:
+'release_*' will match 'release_1' or 'release_v1' but will not match 
'release_'.
+
 It is also possible to fetch a subset of branches or tags by using a
 comma-separated list of names within braces. For example:
 
diff --git a/perl/Git/SVN/GlobSpec.pm b/perl/Git/SVN/GlobSpec.pm
index c95f5d7..a136090 100644
--- a/perl/Git/SVN/GlobSpec.pm
+++ b/perl/Git/SVN/GlobSpec.pm
@@ -11,16 +11,15 @@ sub new {
my $die_msg = "Only one set of wildcard directories " .
"(e.g. '*' or '*/*/*') is supported: '$glob'\n";
for my $part (split(m|/|, $glob)) {
-   if ($part =~ /\*/ && $part ne "*") {
-   die "Invalid pattern in '$glob': $part\n";
-   } elsif ($pattern_ok && $part =~ /[{}]/ &&
+   if ($pattern_ok && $part =~ /[{}]/ &&
 $part !~ /^\{[^{}]+\}/) {
die "Invalid pattern in '$glob': $part\n";
}
-   if ($part eq "*") {
+   if ($part =~ /(\w*)\*/) {
die $die_msg if $state eq "right";
$state = "pattern";
-   push(@patterns, "[^/]*");
+   my $pat = $1 ? "${1}[^/]+" : "[^/]*";
+   push(@patterns, $pat);
} elsif ($pattern_ok && $part =~ /^\{(.*)\}$/) {
die $die_msg if $state eq "right";
$state = "pattern";
diff --git a/t/t9168-git-svn-prefixed-glob.sh b/t/t9168-git-svn-prefixed-glob.sh
new file mode 100755
index 000..a5a3227
--- /dev/null
+++ b/t/t9168-git-svn-prefixed-glob.sh
@@ -0,0 +1,136 @@
+#!/bin/sh
+test_description='git svn globbing refspecs with prefixed globs'
+. ./lib-git-svn.sh
+
+cat >expect.end <trunk/src/a/readme &&
+   echo "goodbye world" >trunk/src/b/readme &&
+   svn_cmd import -m "initial" trunk "$svnrepo"/trunk &&
+   svn_cmd co "$svnrepo" tmp &&
+   (
+   cd tmp &&
+   mkdir branches tags &&
+   svn_cmd add branches tags &&
+   svn_cmd cp trunk branches/b_start &&
+   svn_cmd commit -m "start a new branch" &&
+   svn_cmd up &&
+   echo "hi" >>branches/b_start/src/b/readme &&
+   poke branches/b_start/src/b/readme &&
+   echo "hey" >>branches/b_start/src/a/readme &&
+   poke branches/b_start/src/a/readme &&
+   svn_cmd commit -m "hi" &&
+   svn_cmd up &&
+   svn_cmd cp branches/b_start tags/t_end &&
+   echo "bye" >>tags/t_end/src/b/readme &&
+   poke tags/t_end/src/b/readme &&
+   echo "aye" >>tags/t_end/src/a/readme &&
+   poke tags/t_end/src/a/readme &&
+   svn_cmd commit -m "the end" &

Re: [PATCH 2/2] git-svn: test for git-svn prefixed globs

2015-12-16 Thread Victor Leschuk
Hello Eric, sorry, I just copy pasted your old t9108-git-svn-glob.sh, 
changed branch names to be prefixed and added test for "exact" prefix 
match. If it is necessary I can rewrite it according to current guidelines.


On 12/17/2015 12:28 AM, Eric Wong wrote:

Thanks for this work.  Most things look fine with 1/2, comments
on 2/2 below...

Victor Leschuk <vlesc...@gmail.com> wrote:

Add test for git-svn prefixed globs.

Why a separate patch?  Unless there's some documentation purpose
for a regression, usually tests and a feature should be added
atomically in the same commit.


--- /dev/null
+++ b/t/t9168-git-svn-prefixed-glob.sh
@@ -0,0 +1,136 @@
+#!/bin/sh
+test_description='git svn globbing refspecs with prefixed globs'
+. ./lib-git-svn.sh
+
+cat > expect.end <
We prefer redirects in new code to be in the form of ">foo" (no space)
(or ">>foo" for append).

It wasn't in the old tests, either, but Documentation/CodingGuidelines
favors this for new code.


+the end
+hi
+start a new branch
+initial
+EOF

All the setup code be checked for errors with '&&' as well.


+   test "`git rev-parse refs/remotes/tags/t_end~1`" = \
+   "`git rev-parse refs/remotes/branches/b_start`" &&
+   test "`git rev-parse refs/remotes/branches/b_start~2`" = \
+   "`git rev-parse refs/remotes/trunk`" &&

And we prefer $(command) instead of `command` for nestability
as Documentation/CodingGuidelines suggests.

(yeah, most of the old tests don't follow the guidelines, but the
  guidelines also warn against fixup patches for them).

Thanks again.


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] git-svn: test for git-svn prefixed globs

2015-12-16 Thread Victor Leschuk
Add test for git-svn prefixed globs.

Signed-off-by: Victor Leschuk <vlesc...@accesssoftek.com>
---
 t/t9168-git-svn-prefixed-glob.sh | 136 +++
 1 file changed, 136 insertions(+)
 create mode 100755 t/t9168-git-svn-prefixed-glob.sh

diff --git a/t/t9168-git-svn-prefixed-glob.sh b/t/t9168-git-svn-prefixed-glob.sh
new file mode 100755
index 000..979ecd9
--- /dev/null
+++ b/t/t9168-git-svn-prefixed-glob.sh
@@ -0,0 +1,136 @@
+#!/bin/sh
+test_description='git svn globbing refspecs with prefixed globs'
+. ./lib-git-svn.sh
+
+cat > expect.end < trunk/src/a/readme &&
+   echo "goodbye world" > trunk/src/b/readme &&
+   svn_cmd import -m "initial" trunk "$svnrepo"/trunk &&
+   svn_cmd co "$svnrepo" tmp &&
+   (
+   cd tmp &&
+   mkdir branches tags &&
+   svn_cmd add branches tags &&
+   svn_cmd cp trunk branches/b_start &&
+   svn_cmd commit -m "start a new branch" &&
+   svn_cmd up &&
+   echo "hi" >> branches/b_start/src/b/readme &&
+   poke branches/b_start/src/b/readme &&
+   echo "hey" >> branches/b_start/src/a/readme &&
+   poke branches/b_start/src/a/readme &&
+   svn_cmd commit -m "hi" &&
+   svn_cmd up &&
+   svn_cmd cp branches/b_start tags/t_end &&
+   echo "bye" >> tags/t_end/src/b/readme &&
+   poke tags/t_end/src/b/readme &&
+   echo "aye" >> tags/t_end/src/a/readme &&
+   poke tags/t_end/src/a/readme &&
+   svn_cmd commit -m "the end" &&
+   echo "byebye" >> tags/t_end/src/b/readme &&
+   poke tags/t_end/src/b/readme &&
+   svn_cmd commit -m "nothing to see here"
+   ) &&
+   git config --add svn-remote.svn.url "$svnrepo" &&
+   git config --add svn-remote.svn.fetch \
+"trunk/src/a:refs/remotes/trunk" &&
+   git config --add svn-remote.svn.branches \
+"branches/b_*/src/a:refs/remotes/branches/b_*" &&
+   git config --add svn-remote.svn.tags\
+"tags/t_*/src/a:refs/remotes/tags/t_*" &&
+   git svn multi-fetch &&
+   git log --pretty=oneline refs/remotes/tags/t_end | \
+   sed -e "s/^.\{41\}//" > output.end &&
+   test_cmp expect.end output.end &&
+   test "`git rev-parse refs/remotes/tags/t_end~1`" = \
+   "`git rev-parse refs/remotes/branches/b_start`" &&
+   test "`git rev-parse refs/remotes/branches/b_start~2`" = \
+   "`git rev-parse refs/remotes/trunk`" &&
+   test_must_fail git rev-parse refs/remotes/tags/t_end@3
+   '
+
+echo try to try > expect.two
+echo nothing to see here >> expect.two
+cat expect.end >> expect.two
+
+test_expect_success 'test left-hand-side only prefixed globbing' '
+   git config --add svn-remote.two.url "$svnrepo" &&
+   git config --add svn-remote.two.fetch trunk:refs/remotes/two/trunk &&
+   git config --add svn-remote.two.branches \
+"branches/b_*:refs/remotes/two/branches/*" &&
+   git config --add svn-remote.two.tags \
+"tags/t_*:refs/remotes/two/tags/*" &&
+   (
+   cd tmp &&
+   echo "try try" >> tags/t_end/src/b/readme &&
+   poke tags/t_end/src/b/readme &&
+   svn_cmd commit -m "try to try"
+   ) &&
+   git svn fetch two &&
+   test `git rev-list refs/remotes/two/tags/t_end | wc -l` -eq 6 &&
+   test `git rev-list refs/remotes/two/branches/b_start | wc -l` -eq 3 &&
+   test `git rev-parse refs/remotes/two/branches/b_start~2` = \
+`git rev-parse refs/remotes/two/trunk` &&
+   test `git rev-parse refs/remotes/two/tags/t_end~3` = \
+`git rev-parse refs/remotes/two/branches/b_start` &&
+   git log --pretty=oneline refs/remotes/two/tags/t_end | \
+   sed -e "s/^.\{41\}//" > output.two &&
+   test_cmp expect.two output.two
+   '
+
+test_expect_success 'test prefixed globs do not match just prefix' '
+   git config --add svn-remote.three.url "$svnrepo" &&
+   git config --add 

[PATCH 1/2] git-svn: support for prefixed globs in config

2015-12-16 Thread Victor Leschuk
Introduce prefixed globs for branches and tags in git-svn.

Signed-off-by: Victor Leschuk <vlesc...@accesssoftek.com>
---
 Documentation/git-svn.txt | 5 +
 perl/Git/SVN/GlobSpec.pm  | 9 -
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-svn.txt b/Documentation/git-svn.txt
index 0c0f60b..529cffe 100644
--- a/Documentation/git-svn.txt
+++ b/Documentation/git-svn.txt
@@ -1034,6 +1034,7 @@ listed below are allowed:
url = http://server.org/svn
fetch = trunk/project-a:refs/remotes/project-a/trunk
branches = branches/*/project-a:refs/remotes/project-a/branches/*
+   branches = branches/release_*:refs/remotes/project-a/branches/release_*
tags = tags/*/project-a:refs/remotes/project-a/tags/*
 
 
@@ -1044,6 +1045,10 @@ independent path component (surrounded by '/' or EOL).   
This
 type of configuration is not automatically created by 'init' and
 should be manually entered with a text-editor or using 'git config'.
 
+Also note that prefixed globs (e.g. 'release_*') match everything after prefix
+but do not match exact prefix. For example:
+'release_*' will match 'release_1' or 'release_v1' but will not match 
'release_'.
+
 It is also possible to fetch a subset of branches or tags by using a
 comma-separated list of names within braces. For example:
 
diff --git a/perl/Git/SVN/GlobSpec.pm b/perl/Git/SVN/GlobSpec.pm
index c95f5d7..a136090 100644
--- a/perl/Git/SVN/GlobSpec.pm
+++ b/perl/Git/SVN/GlobSpec.pm
@@ -11,16 +11,15 @@ sub new {
my $die_msg = "Only one set of wildcard directories " .
"(e.g. '*' or '*/*/*') is supported: '$glob'\n";
for my $part (split(m|/|, $glob)) {
-   if ($part =~ /\*/ && $part ne "*") {
-   die "Invalid pattern in '$glob': $part\n";
-   } elsif ($pattern_ok && $part =~ /[{}]/ &&
+   if ($pattern_ok && $part =~ /[{}]/ &&
 $part !~ /^\{[^{}]+\}/) {
die "Invalid pattern in '$glob': $part\n";
}
-   if ($part eq "*") {
+   if ($part =~ /(\w*)\*/) {
die $die_msg if $state eq "right";
$state = "pattern";
-   push(@patterns, "[^/]*");
+   my $pat = $1 ? "${1}[^/]+" : "[^/]*";
+   push(@patterns, $pat);
} elsif ($pattern_ok && $part =~ /^\{(.*)\}$/) {
die $die_msg if $state eq "right";
$state = "pattern";
-- 
2.7.0.rc0.21.gb793f61

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] git-svn: add support for prefixed globs in config

2015-12-16 Thread Victor Leschuk
There are existing old SVN repos which use patterns in branch (and tag) names
to indicate some information. For example: branches/release_01, 
branches/release_02, etc,
however non-patterned branches co-exist with them (like branches/dev).

If someone maintains git mirror of such a repo it is reasonable to mirror only
several branches (for example those who match some pattern) and in current 
situation
it leads to messy and error-prone git config like:

branches = branches/{release_20,release_21,release_22,...

It would be useful to have an opportunity to write

branches = branches/release_*

instead of this.

Thus I suggest to add support for such 'prefixed' globs into git-svn.

Victor Leschuk (2):
  Introduce prefixed globs for branches and tags in git-svn.
  Add test for git-svn prefixed globs.

 Documentation/git-svn.txt|   5 ++
 perl/Git/SVN/GlobSpec.pm |   9 ++-
 t/t9168-git-svn-prefixed-glob.sh | 136 +++
 3 files changed, 145 insertions(+), 5 deletions(-)
 create mode 100755 t/t9168-git-svn-prefixed-glob.sh

-- 
2.7.0.rc0.21.gb793f61

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] Introduce grep threads param

2015-12-15 Thread Victor Leschuk
 "git grep" can now be configured (or told from the command line)
 how many threads to use when searching in the working tree files.

Signed-off-by: Victor Leschuk <vlesc...@accesssoftek.com>
---
 Documentation/config.txt   |  4 +++
 Documentation/git-grep.txt | 12 +
 builtin/grep.c | 49 +++---
 contrib/completion/git-completion.bash |  1 +
 4 files changed, 51 insertions(+), 15 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 2d06b11..cbf4071 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1450,6 +1450,10 @@ grep.extendedRegexp::
option is ignored when the 'grep.patternType' option is set to a value
other than 'default'.
 
+grep.threads::
+   Number of grep worker threads.
+   See `grep.threads` in linkgit:git-grep[1] for more information.
+
 gpg.program::
Use this custom program instead of "gpg" found on $PATH when
making or verifying a PGP signature. The program must support the
diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 4a44d6d..25e6dc5 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -23,6 +23,7 @@ SYNOPSIS
   [--break] [--heading] [-p | --show-function]
   [-A ] [-B ] [-C ]
   [-W | --function-context]
+  [--threads ]
   [-f ] [-e] 
   [--and|--or|--not|(|)|-e ...]
   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | 
...]
@@ -53,6 +54,13 @@ grep.extendedRegexp::
option is ignored when the 'grep.patternType' option is set to a value
other than 'default'.
 
+grep.threads::
+   Number of grep worker threads, use it to tune up performance on
+   your machines. Leave it unset (or set to 0) for default behavior,
+   which is using 8 threads for all systems.
+   Default behavior may change in future versions
+   to better suit hardware and circumstances.
+
 grep.fullName::
If set to true, enable '--full-name' option by default.
 
@@ -227,6 +235,10 @@ OPTIONS
effectively showing the whole function in which the match was
found.
 
+--threads ::
+   Number of grep worker threads.
+   See `grep.threads` in 'CONFIGURATION' for more information.
+
 -f ::
Read patterns from , one per line.
 
diff --git a/builtin/grep.c b/builtin/grep.c
index 4229cae..e9aebab 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -24,11 +24,11 @@ static char const * const grep_usage[] = {
NULL
 };
 
-static int use_threads = 1;
+#define GREP_NUM_THREADS_DEFAULT 8
+static int num_threads = 0;
 
 #ifndef NO_PTHREADS
-#define THREADS 8
-static pthread_t threads[THREADS];
+static pthread_t *threads;
 
 /* We use one producer thread and THREADS consumer
  * threads. The producer adds struct work_items to 'todo' and the
@@ -63,13 +63,13 @@ static pthread_mutex_t grep_mutex;
 
 static inline void grep_lock(void)
 {
-   if (use_threads)
+   if (num_threads)
pthread_mutex_lock(_mutex);
 }
 
 static inline void grep_unlock(void)
 {
-   if (use_threads)
+   if (num_threads)
pthread_mutex_unlock(_mutex);
 }
 
@@ -206,7 +206,8 @@ static void start_threads(struct grep_opt *opt)
strbuf_init([i].out, 0);
}
 
-   for (i = 0; i < ARRAY_SIZE(threads); i++) {
+   threads = xcalloc(num_threads, sizeof(*threads));
+   for (i = 0; i < num_threads; i++) {
int err;
struct grep_opt *o = grep_opt_dup(opt);
o->output = strbuf_out;
@@ -238,12 +239,14 @@ static int wait_all(void)
pthread_cond_broadcast(_add);
grep_unlock();
 
-   for (i = 0; i < ARRAY_SIZE(threads); i++) {
+   for (i = 0; i < num_threads; i++) {
void *h;
pthread_join(threads[i], );
hit |= (int) (intptr_t) h;
}
 
+   free(threads);
+
pthread_mutex_destroy(_mutex);
pthread_mutex_destroy(_read_mutex);
pthread_mutex_destroy(_attr_mutex);
@@ -267,6 +270,12 @@ static int grep_cmd_config(const char *var, const char 
*value, void *cb)
int st = grep_config(var, value, cb);
if (git_color_default_config(var, value, cb) < 0)
st = -1;
+
+   if (!strcmp(var, "grep.threads")) {
+   /* Sanity check of value will be perfomed later */
+   num_threads = git_config_int(var, value);
+   }
+
return st;
 }
 
@@ -294,7 +303,7 @@ static int grep_sha1(struct grep_opt *opt, const unsigned 
char *sha1,
}
 
 #ifndef NO_PTHREADS
-   if (use_threads) {
+   if (num_threads) {
add_work(opt, GREP_SOURCE_SHA1, pathbuf.buf, path, sha1);
strbuf_release();
return 0;
@@ -323,7 +332,7 @@ static int

Re: [PATCH 1/2] Introduce grep threads param

2015-12-15 Thread Victor Leschuk

Hello Junio,

On 12/15/2015 11:06 PM, Junio C Hamano wrote:

Victor Leschuk <vlesc...@gmail.com> writes:


Subject: Re: [PATCH 1/2] Introduce grep threads param

I'll retitle this to something like

 grep: add --threads= option and grep.threads configuration

while queuing (which I did for v7 earlier).

Ok, thanks.



  "git grep" can now be configured (or told from the command line)
  how many threads to use when searching in the working tree files.

Signed-off-by: Victor Leschuk <vlesc...@accesssoftek.com>
---
...
+grep.threads::
+   Number of grep worker threads.

"Number of grep worker threads to use"?

Accepted, will change.



+   See `grep.threads` in linkgit:git-grep[1] for more information.
...
+grep.threads::
+   Number of grep worker threads, use it to tune up performance on
+   your machines. Leave it unset (or set to 0) for default behavior,
+   which is using 8 threads for all systems.
+   Default behavior may change in future versions
+   to better suit hardware and circumstances.

The last sentence is too noisy.  Perhaps drop it and phrase it like
this instead?

 grep.threads::
 Number of grep worker threads to use.  If unset (or set to 0),
 to 0), 8 threads are used by default (for now).

Agree.



diff --git a/builtin/grep.c b/builtin/grep.c
index 4229cae..e9aebab 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -24,11 +24,11 @@ static char const * const grep_usage[] = {
NULL
  };
  
-static int use_threads = 1;

+#define GREP_NUM_THREADS_DEFAULT 8
+static int num_threads = 0;

Please do not initialize static to 0 (or NULL).

Ok.



@@ -267,6 +270,12 @@ static int grep_cmd_config(const char *var, const char 
*value, void *cb)
int st = grep_config(var, value, cb);
if (git_color_default_config(var, value, cb) < 0)
st = -1;
+
+   if (!strcmp(var, "grep.threads")) {
+   /* Sanity check of value will be perfomed later */

Hmm, is that a good design?

A user may hear "invalid number of threads specified (-4)" later,
but if that came from "grep.threads", especially when the user did
not say "--threads=-4" from the command line, would she know to
check her configuration file?

If she had "grep.threads=Yes" in her configuration, we would
helpfully tell her that 'Yes' given to grep.threads is not a valid
integer.  Shouldn't we do the same for '-4' given to grep.threads,
too?

if (!strcmp(var, "grep.threads")) {
num_threads = git_config_int(var, value);
if (num_threads < 0)
die(_("invalid number of threads specified (%d) for 
%s"),
num_threads, var);
}

perhaps.
When I was doing this I looked at other code and saw that exact message 
"Invalid number of threads..." is used in other parts of git and is 
present in 'po' translations. Thus I decided to use exactly the same 
message in order not to create numerous almost similar localizations 
(which we should do if we use two different messages in different 
places). Do you think we need to create two more different messages (and 
translations, I can prepare Russian and French) and create two checks 
for cmd and config?



@@ -817,14 +827,23 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
}
  
  #ifndef NO_PTHREADS

-   if (list.nr || cached || online_cpus() == 1)
-   use_threads = 0;
+   if (list.nr || cached || online_cpus() == 1 || show_in_pager) {
+   /* Can not multi-thread object lookup */
+   num_threads = 0;

Removing 'use_threads = 0' from an earlier part and moving the check
to show_in_pager is a good idea, but it invalidates this comment.
The earlier three (actually two and a half) are "cannot" cases,
i.e. the object layer is not easily threaded without locking, and
when you have a single core, you do not truly run multiple
operations at the same time, but as [PATCH 2/2] does, threading in
"grep" is not about CPU alone, so that is why I am demoting it to
just a half ;-).  But show_in_pager is "we do not want to", I think.

In any case, this comment and "User didn't specify" below are not
telling the reader something very much useful.  You probably should
remove them.

Ok, will remove comments.



+   }
+   else if (num_threads == 0) {
+   /* User didn't specify value, or just wants default behavior */
+   num_threads = GREP_NUM_THREADS_DEFAULT;
+   }
+   else if (num_threads < 0) {
+   die(_("invalid number of threads specified (%d)"), num_threads);
+   }

Many unnecessary braces.

I put braces to make code look more unified. I had to put braces here:

+   else if (num_threads == 0) {
+   /* User didn't specify valu

[PATCH v8 0/2] Add git-grep threads param

2015-12-15 Thread Victor Leschuk
Introducing v8 of git-grep threads param patch.
Patch is now split in 2 parts - 1/2 is actually renewed v6 version (see list of 
changes below),
2/2 removes dependency on online_cpus() - as we discussed with Eric this is 
rather 
significant change in default behavior and should be placed into separate patch.

Here is list of changes since v6 ($gmane/281160):

  * Fixed broken t7811: moved all threads_num setup to 1 place (for -O option 
it was in wrong place)
  * Fixed 'invalid number of threads' message so that it could be translated
  * Got rid of grep_threads_config() - its too trivial to be separate function
  * Fixed xcalloc() args (sizeof(pthread_t) -> sizeof(*threads)) to correspond 
to general git style
  * Improved commit message (in 2/2) to explain why online_cpus() is now not 
used in threads_num setup
  * The full param documentation was moved into single place (grep.threads 
description in git-grep.txt) and is referenced from other places. Also made few 
language improvements in documentation.
  * Style improvements: splitted too long lines

Victor Leschuk (2):
  "git grep" can now be configured (or told from the command line)
how many threads to use when searching in the working tree files.
  Number of threads now doesn't depend on online_cpus(), e.g. if
specific number is not configured GREP_NUM_THREADS_DEFAULT (8)
threads will be used even on 1-core CPU.

 Documentation/config.txt   |  4 +++
 Documentation/git-grep.txt | 12 +
 builtin/grep.c | 49 +++---
 contrib/completion/git-completion.bash |  1 +
 4 files changed, 51 insertions(+), 15 deletions(-)

-- 
2.6.3.369.g3e7f205.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] Get rid of online_cpus() when determining grep threads num

2015-12-15 Thread Victor Leschuk
Number of threads now doesn't depend on online_cpus(),
e.g. if specific number is not configured GREP_NUM_THREADS_DEFAULT (8)
threads will be used even on 1-core CPU.

Reason: multithreading can improve performance even on single core machines
as IO is also a major factor here. Using multiple threads can significantly
boost grep performance when working on slow filesystems (or repo isn't cached)
or through network (for example repo is located on NFS).

Signed-off-by: Victor Leschuk <vlesc...@accesssoftek.com>
---
 builtin/grep.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index e9aebab..1315905 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -827,7 +827,7 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
}
 
 #ifndef NO_PTHREADS
-   if (list.nr || cached || online_cpus() == 1 || show_in_pager) {
+   if (list.nr || cached || show_in_pager) {
/* Can not multi-thread object lookup */
num_threads = 0;
}
-- 
2.6.3.369.g3e7f205.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7] Add git-grep threads param

2015-12-13 Thread Victor Leschuk
 "git grep" can now be configured (or told from the command line)
 how many threads to use when searching in the working tree files.

 Changes to default behavior: number of threads now doesn't depend
 on online_cpus(), e.g. if specific number is not configured
 GREP_NUM_THREADS_DEFAULT (8) threads will be used even on 1-core CPU.

 Reason: multithreading can improve performance even on single core machines
 as IO is also a major factor here. Using multiple threads can significantly
 boost grep performance when working on slow filesystems (or repo isn't cached)
 or through network (for example repo is located on NFS).

Signed-off-by: Victor Leschuk <vlesc...@accesssoftek.com>
---
History of changes from the first version ($gmane/280053/:
* Param renamed from threads-num to threads
* Short version of '--threads' cmd key was removed
* Made num_threads 'decision-tree' more obvious 
  and easy to edit for future use ($gmane/280089)
* Moved option description to more suitable place in documentation 
($gmane/280188)
* Hid threads param from 'external' grep.c, made it private for 
builtin/grep.c ($gmane/280188)
* Improved num_threads 'decision-tree', got rid of dependency on 
online_cpus ($gmane/280299)
* Improved param documentation ($gmane/280299)
* Fixed broken t7811: moved all threads_num setup to 1 place (for -O 
option it was in wrong place) ($gmane/281160)
* Fixed 'invalid number of threads' message so that it could be 
translated ($gmane/281160)
* Got rid of grep_threads_config() - its too trivial to be separate 
function ($gmane/281160)
* Fixed xcalloc() args (sizeof(pthread_t) -> sizeof(*threads)) to 
correspond to general git style ($gmane/281160)
* Improved commit message to explain why online_cpus() is now not used 
in threads_num setup ($gmane/281160)


 Documentation/config.txt   |  7 ++
 Documentation/git-grep.txt | 15 
 builtin/grep.c | 42 ++
 contrib/completion/git-completion.bash |  1 +
 4 files changed, 50 insertions(+), 15 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 2d06b11..687b9ad 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1450,6 +1450,13 @@ grep.extendedRegexp::
option is ignored when the 'grep.patternType' option is set to a value
other than 'default'.
 
+grep.threads::
+   Number of grep worker threads, use it to tune up performance on
+   your machines. Leave it unset (or set to 0) for default behavior,
+   which for now is using 8 threads for all systems.
+   Default behavior can be changed in future versions
+   to better suit hardware and circumstances.
+
 gpg.program::
Use this custom program instead of "gpg" found on $PATH when
making or verifying a PGP signature. The program must support the
diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 4a44d6d..8222a83 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -23,6 +23,7 @@ SYNOPSIS
   [--break] [--heading] [-p | --show-function]
   [-A ] [-B ] [-C ]
   [-W | --function-context]
+  [--threads ]
   [-f ] [-e] 
   [--and|--or|--not|(|)|-e ...]
   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | 
...]
@@ -53,6 +54,13 @@ grep.extendedRegexp::
option is ignored when the 'grep.patternType' option is set to a value
other than 'default'.
 
+grep.threads::
+   Number of grep worker threads, use it to tune up performance on
+   your machines. Leave it unset (or set to 0) for default behavior,
+   which for now is using 8 threads for all systems.
+   Default behavior can be changed in future versions
+   to better suit hardware and circumstances.
+
 grep.fullName::
If set to true, enable '--full-name' option by default.
 
@@ -227,6 +235,13 @@ OPTIONS
effectively showing the whole function in which the match was
found.
 
+--threads ::
+   Number of grep worker threads, use it to tune up performance on
+   your machines. Leave it unset (or set to 0) for default behavior,
+   which for now is using 8 threads for all systems.
+   Default behavior can be changed in future versions
+   to better suit hardware and circumstances.
+
 -f ::
Read patterns from , one per line.
 
diff --git a/builtin/grep.c b/builtin/grep.c
index 4229cae..756b6af 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -24,11 +24,11 @@ static char const * const grep_usage[] = {
NULL
 };
 
-static int use_threads = 1;
+#define GREP_NUM_THREADS_DEFAULT 8
+static int num_threads = 0;
 
 #ifndef NO_PTHREADS
-#define THREADS 8
-static pthread_t threads[THREADS];
+static pthread_t *threads;
 
 /* We use one produce

Re: [PATCH v7] Add git-grep threads param

2015-12-13 Thread Victor Leschuk

Hello Eric:

Hmm, v7 doesn't seem to address any of the v6 review comments here[1].
Was that review merely overlooked or did you disagree with the
reviewer?

[1]: http://article.gmane.org/gmane.comp.version-control.git/281817

Sorry, it looks like I missed that letter. My bad.



History of changes from the first version ($gmane/280053/:

It's generally more helpful to reviewers if you provide a link to the
previous version and describe the changes since that version. (If you
prefer, it's also okay to describe all changes, but they should be
organized by version so that it's obvious which changes were made in
which version.)
I will prepare v8 and list changes starting from v6. Please kindly 
disregard this one.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v6] Add git-grep threads param

2015-11-30 Thread Victor Leschuk
Hello all, 

does anybody have time to review this patch?

PS Sorry for bothering =)

--
Best Regards,
Victor

From: Victor Leschuk [vlesc...@gmail.com]
Sent: Wednesday, November 11, 2015 03:52
To: git@vger.kernel.org
Cc: Victor Leschuk
Subject: [PATCH v6] Add git-grep threads param

"git grep" can now be configured (or told from the command line)
 how many threads to use when searching in the working tree files.

 Changes to default behavior: number of threads now doesn't depend
 on online_cpus(), e.g. if specific number is not configured
 GREP_NUM_THREADS_DEFAULT (8) threads will be used even on 1-core CPU.

Signed-off-by: Victor Leschuk <vlesc...@accesssoftek.com>
---
History of changes from the first version ($gmane/280053/:
* Param renamed from threads-num to threads
* Short version of '--threads' cmd key was removed
* Made num_threads 'decision-tree' more obvious
  and easy to edit for future use ($gmane/280089)
* Moved option description to more suitable place in documentation 
($gmane/280188)
* Hid threads param from 'external' grep.c, made it private for 
builtin/grep.c ($gmane/280188)
* Improved num_threads 'decision-tree', got rid of dependency on 
online_cpus ($gmane/280299)
* Improved param documentation ($gmane/280299)


 Documentation/config.txt   |  7 +
 Documentation/git-grep.txt | 15 ++
 builtin/grep.c | 50 +++---
 contrib/completion/git-completion.bash |  1 +
 4 files changed, 57 insertions(+), 16 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 391a0c3..5084e36 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1447,6 +1447,13 @@ grep.extendedRegexp::
option is ignored when the 'grep.patternType' option is set to a value
other than 'default'.

+grep.threads::
+   Number of grep worker threads, use it to tune up performance on
+   your machines. Leave it unset (or set to 0) for default behavior,
+   which for now is using 8 threads for all systems.
+   Default behavior can be changed in future versions
+   to better suit hardware and circumstances.
+
 gpg.program::
Use this custom program instead of "gpg" found on $PATH when
making or verifying a PGP signature. The program must support the
diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 4a44d6d..8222a83 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -23,6 +23,7 @@ SYNOPSIS
   [--break] [--heading] [-p | --show-function]
   [-A ] [-B ] [-C ]
   [-W | --function-context]
+  [--threads ]
   [-f ] [-e] 
   [--and|--or|--not|(|)|-e ...]
   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | 
...]
@@ -53,6 +54,13 @@ grep.extendedRegexp::
option is ignored when the 'grep.patternType' option is set to a value
other than 'default'.

+grep.threads::
+   Number of grep worker threads, use it to tune up performance on
+   your machines. Leave it unset (or set to 0) for default behavior,
+   which for now is using 8 threads for all systems.
+   Default behavior can be changed in future versions
+   to better suit hardware and circumstances.
+
 grep.fullName::
If set to true, enable '--full-name' option by default.

@@ -227,6 +235,13 @@ OPTIONS
effectively showing the whole function in which the match was
found.

+--threads ::
+   Number of grep worker threads, use it to tune up performance on
+   your machines. Leave it unset (or set to 0) for default behavior,
+   which for now is using 8 threads for all systems.
+   Default behavior can be changed in future versions
+   to better suit hardware and circumstances.
+
 -f ::
Read patterns from , one per line.

diff --git a/builtin/grep.c b/builtin/grep.c
index d04f440..f0e3dfb 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -24,11 +24,11 @@ static char const * const grep_usage[] = {
NULL
 };

-static int use_threads = 1;
+#define GREP_NUM_THREADS_DEFAULT 8
+static int num_threads = 0;

 #ifndef NO_PTHREADS
-#define THREADS 8
-static pthread_t threads[THREADS];
+static pthread_t *threads;

 /* We use one producer thread and THREADS consumer
  * threads. The producer adds struct work_items to 'todo' and the
@@ -63,13 +63,13 @@ static pthread_mutex_t grep_mutex;

 static inline void grep_lock(void)
 {
-   if (use_threads)
+   if (num_threads)
pthread_mutex_lock(_mutex);
 }

 static inline void grep_unlock(void)
 {
-   if (use_threads)
+   if (num_threads)
pthread_mutex_unlock(_mutex);
 }

@@ -206,7 +206,8 @@ static void start_threads(struct grep_opt *opt)
strbuf_init([i].out, 0);
  

Re: What's cooking in git.git (Nov 2015, #03; Fri, 20)

2015-11-23 Thread Victor Leschuk


On 11/20/2015 05:09 PM, Jeff King wrote:

* vl/grep-configurable-threads (2015-11-01) 1 commit
  - grep: add --threads= option and grep.threads configuration

  "git grep" can now be configured (or told from the command line)
  how many threads to use when searching in the working tree files.

  I haven't reviewed v6 yet. More eyes are welcome.
Actually v6 includes only changes to special meaning of "0" (0 is now 
default behavior - 8 threads), little corrections to documentation and 
getting rid of online_cpus() in decision-tree. All as discussed in 
comments for v4/5.


--
Victor
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v6] Add git-grep threads param

2015-11-16 Thread Victor Leschuk
Hello all,

The earlier version of this patch is already included in /pu branch, however as 
we all agreed ($gmane/280299) we have changed the default behavior and the 
meaning of "0". The question is: what is the right way to include changes from 
patch v6 (this one) into already merged patch to pu?

Thanks.
--
Best Regards,
Victor
____
From: Victor Leschuk [vlesc...@gmail.com]
Sent: Wednesday, November 11, 2015 03:52
To: git@vger.kernel.org
Cc: Victor Leschuk
Subject: [PATCH v6] Add git-grep threads param

"git grep" can now be configured (or told from the command line)
 how many threads to use when searching in the working tree files.

 Changes to default behavior: number of threads now doesn't depend
 on online_cpus(), e.g. if specific number is not configured
 GREP_NUM_THREADS_DEFAULT (8) threads will be used even on 1-core CPU.

Signed-off-by: Victor Leschuk <vlesc...@accesssoftek.com>
---
History of changes from the first version ($gmane/280053/:
* Param renamed from threads-num to threads
* Short version of '--threads' cmd key was removed
* Made num_threads 'decision-tree' more obvious
  and easy to edit for future use ($gmane/280089)
* Moved option description to more suitable place in documentation 
($gmane/280188)
* Hid threads param from 'external' grep.c, made it private for 
builtin/grep.c ($gmane/280188)
* Improved num_threads 'decision-tree', got rid of dependency on 
online_cpus ($gmane/280299)
* Improved param documentation ($gmane/280299)


 Documentation/config.txt   |  7 +
 Documentation/git-grep.txt | 15 ++
 builtin/grep.c | 50 +++---
 contrib/completion/git-completion.bash |  1 +
 4 files changed, 57 insertions(+), 16 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 391a0c3..5084e36 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1447,6 +1447,13 @@ grep.extendedRegexp::
option is ignored when the 'grep.patternType' option is set to a value
other than 'default'.

+grep.threads::
+   Number of grep worker threads, use it to tune up performance on
+   your machines. Leave it unset (or set to 0) for default behavior,
+   which for now is using 8 threads for all systems.
+   Default behavior can be changed in future versions
+   to better suit hardware and circumstances.
+
 gpg.program::
Use this custom program instead of "gpg" found on $PATH when
making or verifying a PGP signature. The program must support the
diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 4a44d6d..8222a83 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -23,6 +23,7 @@ SYNOPSIS
   [--break] [--heading] [-p | --show-function]
   [-A ] [-B ] [-C ]
   [-W | --function-context]
+  [--threads ]
   [-f ] [-e] 
   [--and|--or|--not|(|)|-e ...]
   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | 
...]
@@ -53,6 +54,13 @@ grep.extendedRegexp::
option is ignored when the 'grep.patternType' option is set to a value
other than 'default'.

+grep.threads::
+   Number of grep worker threads, use it to tune up performance on
+   your machines. Leave it unset (or set to 0) for default behavior,
+   which for now is using 8 threads for all systems.
+   Default behavior can be changed in future versions
+   to better suit hardware and circumstances.
+
 grep.fullName::
If set to true, enable '--full-name' option by default.

@@ -227,6 +235,13 @@ OPTIONS
effectively showing the whole function in which the match was
found.

+--threads ::
+   Number of grep worker threads, use it to tune up performance on
+   your machines. Leave it unset (or set to 0) for default behavior,
+   which for now is using 8 threads for all systems.
+   Default behavior can be changed in future versions
+   to better suit hardware and circumstances.
+
 -f ::
Read patterns from , one per line.

diff --git a/builtin/grep.c b/builtin/grep.c
index d04f440..f0e3dfb 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -24,11 +24,11 @@ static char const * const grep_usage[] = {
NULL
 };

-static int use_threads = 1;
+#define GREP_NUM_THREADS_DEFAULT 8
+static int num_threads = 0;

 #ifndef NO_PTHREADS
-#define THREADS 8
-static pthread_t threads[THREADS];
+static pthread_t *threads;

 /* We use one producer thread and THREADS consumer
  * threads. The producer adds struct work_items to 'todo' and the
@@ -63,13 +63,13 @@ static pthread_mutex_t grep_mutex;

 static inline void grep_lock(void)
 {
-   if (use_threads)
+   if (num_threads)
pthread_mutex_lock(_mutex);
 }

 static inline void grep_unl

[PATCH v6] Add git-grep threads param

2015-11-11 Thread Victor Leschuk
"git grep" can now be configured (or told from the command line)
 how many threads to use when searching in the working tree files.

 Changes to default behavior: number of threads now doesn't depend
 on online_cpus(), e.g. if specific number is not configured
 GREP_NUM_THREADS_DEFAULT (8) threads will be used even on 1-core CPU. 

Signed-off-by: Victor Leschuk <vlesc...@accesssoftek.com>
---
History of changes from the first version ($gmane/280053/:
* Param renamed from threads-num to threads
* Short version of '--threads' cmd key was removed
* Made num_threads 'decision-tree' more obvious 
  and easy to edit for future use ($gmane/280089)
* Moved option description to more suitable place in documentation 
($gmane/280188)
* Hid threads param from 'external' grep.c, made it private for 
builtin/grep.c ($gmane/280188)
* Improved num_threads 'decision-tree', got rid of dependency on 
online_cpus ($gmane/280299)
* Improved param documentation ($gmane/280299)


 Documentation/config.txt   |  7 +
 Documentation/git-grep.txt | 15 ++
 builtin/grep.c | 50 +++---
 contrib/completion/git-completion.bash |  1 +
 4 files changed, 57 insertions(+), 16 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 391a0c3..5084e36 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1447,6 +1447,13 @@ grep.extendedRegexp::
option is ignored when the 'grep.patternType' option is set to a value
other than 'default'.
 
+grep.threads::
+   Number of grep worker threads, use it to tune up performance on
+   your machines. Leave it unset (or set to 0) for default behavior,
+   which for now is using 8 threads for all systems.
+   Default behavior can be changed in future versions
+   to better suit hardware and circumstances.
+
 gpg.program::
Use this custom program instead of "gpg" found on $PATH when
making or verifying a PGP signature. The program must support the
diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 4a44d6d..8222a83 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -23,6 +23,7 @@ SYNOPSIS
   [--break] [--heading] [-p | --show-function]
   [-A ] [-B ] [-C ]
   [-W | --function-context]
+  [--threads ]
   [-f ] [-e] 
   [--and|--or|--not|(|)|-e ...]
   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | 
...]
@@ -53,6 +54,13 @@ grep.extendedRegexp::
option is ignored when the 'grep.patternType' option is set to a value
other than 'default'.
 
+grep.threads::
+   Number of grep worker threads, use it to tune up performance on
+   your machines. Leave it unset (or set to 0) for default behavior,
+   which for now is using 8 threads for all systems.
+   Default behavior can be changed in future versions
+   to better suit hardware and circumstances.
+
 grep.fullName::
If set to true, enable '--full-name' option by default.
 
@@ -227,6 +235,13 @@ OPTIONS
effectively showing the whole function in which the match was
found.
 
+--threads ::
+   Number of grep worker threads, use it to tune up performance on
+   your machines. Leave it unset (or set to 0) for default behavior,
+   which for now is using 8 threads for all systems.
+   Default behavior can be changed in future versions
+   to better suit hardware and circumstances.
+
 -f ::
Read patterns from , one per line.
 
diff --git a/builtin/grep.c b/builtin/grep.c
index d04f440..f0e3dfb 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -24,11 +24,11 @@ static char const * const grep_usage[] = {
NULL
 };
 
-static int use_threads = 1;
+#define GREP_NUM_THREADS_DEFAULT 8
+static int num_threads = 0;
 
 #ifndef NO_PTHREADS
-#define THREADS 8
-static pthread_t threads[THREADS];
+static pthread_t *threads;
 
 /* We use one producer thread and THREADS consumer
  * threads. The producer adds struct work_items to 'todo' and the
@@ -63,13 +63,13 @@ static pthread_mutex_t grep_mutex;
 
 static inline void grep_lock(void)
 {
-   if (use_threads)
+   if (num_threads)
pthread_mutex_lock(_mutex);
 }
 
 static inline void grep_unlock(void)
 {
-   if (use_threads)
+   if (num_threads)
pthread_mutex_unlock(_mutex);
 }
 
@@ -206,7 +206,8 @@ static void start_threads(struct grep_opt *opt)
strbuf_init([i].out, 0);
}
 
-   for (i = 0; i < ARRAY_SIZE(threads); i++) {
+   threads = xcalloc(num_threads, sizeof(pthread_t));
+   for (i = 0; i < num_threads; i++) {
int err;
struct grep_opt *o = grep_opt_dup(opt);
o->output = strbuf_out;
@@ -238,12 

[PATCH v5] Add git-grep threads param

2015-11-10 Thread Victor Leschuk
"git grep" can now be configured (or told from the command line)
 how many threads to use when searching in the working tree files.

Signed-off-by: Victor Leschuk <vlesc...@accesssoftek.com>
---
 Documentation/config.txt   |  7 +
 Documentation/git-grep.txt | 15 ++
 builtin/grep.c | 50 +++---
 contrib/completion/git-completion.bash |  1 +
 4 files changed, 57 insertions(+), 16 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 391a0c3..467fa7b 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1447,6 +1447,13 @@ grep.extendedRegexp::
option is ignored when the 'grep.patternType' option is set to a value
other than 'default'.
 
+grep.threads::
+   Number of grep worker threads, use it to tune up performance on
+   your machines. Leave it unset (or set to 0) for default behavior,
+   which for now is using 8 threads for all systems.
+   Default behavior can be changed in future versions
+   to better suite hardware and circumstances.
+
 gpg.program::
Use this custom program instead of "gpg" found on $PATH when
making or verifying a PGP signature. The program must support the
diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 4a44d6d..91027b6 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -23,6 +23,7 @@ SYNOPSIS
   [--break] [--heading] [-p | --show-function]
   [-A ] [-B ] [-C ]
   [-W | --function-context]
+  [--threads ]
   [-f ] [-e] 
   [--and|--or|--not|(|)|-e ...]
   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | 
...]
@@ -53,6 +54,13 @@ grep.extendedRegexp::
option is ignored when the 'grep.patternType' option is set to a value
other than 'default'.
 
+grep.threads::
+   Number of grep worker threads, use it to tune up performance on
+   your machines. Leave it unset (or set to 0) for default behavior,
+   which for now is using 8 threads for all systems.
+   Default behavior can be changed in future versions
+   to better suite hardware and circumstances.
+
 grep.fullName::
If set to true, enable '--full-name' option by default.
 
@@ -227,6 +235,13 @@ OPTIONS
effectively showing the whole function in which the match was
found.
 
+--threads ::
+   Number of grep worker threads, use it to tune up performance on
+   your machines. Leave it unset (or set to 0) for default behavior,
+   which for now is using 8 threads for all systems.
+   Default behavior can be changed in future versions
+   to better suite hardware and circumstances.
+
 -f ::
Read patterns from , one per line.
 
diff --git a/builtin/grep.c b/builtin/grep.c
index d04f440..f0e3dfb 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -24,11 +24,11 @@ static char const * const grep_usage[] = {
NULL
 };
 
-static int use_threads = 1;
+#define GREP_NUM_THREADS_DEFAULT 8
+static int num_threads = 0;
 
 #ifndef NO_PTHREADS
-#define THREADS 8
-static pthread_t threads[THREADS];
+static pthread_t *threads;
 
 /* We use one producer thread and THREADS consumer
  * threads. The producer adds struct work_items to 'todo' and the
@@ -63,13 +63,13 @@ static pthread_mutex_t grep_mutex;
 
 static inline void grep_lock(void)
 {
-   if (use_threads)
+   if (num_threads)
pthread_mutex_lock(_mutex);
 }
 
 static inline void grep_unlock(void)
 {
-   if (use_threads)
+   if (num_threads)
pthread_mutex_unlock(_mutex);
 }
 
@@ -206,7 +206,8 @@ static void start_threads(struct grep_opt *opt)
strbuf_init([i].out, 0);
}
 
-   for (i = 0; i < ARRAY_SIZE(threads); i++) {
+   threads = xcalloc(num_threads, sizeof(pthread_t));
+   for (i = 0; i < num_threads; i++) {
int err;
struct grep_opt *o = grep_opt_dup(opt);
o->output = strbuf_out;
@@ -238,12 +239,14 @@ static int wait_all(void)
pthread_cond_broadcast(_add);
grep_unlock();
 
-   for (i = 0; i < ARRAY_SIZE(threads); i++) {
+   for (i = 0; i < num_threads; i++) {
void *h;
pthread_join(threads[i], );
hit |= (int) (intptr_t) h;
}
 
+   free(threads);
+
pthread_mutex_destroy(_mutex);
pthread_mutex_destroy(_read_mutex);
pthread_mutex_destroy(_attr_mutex);
@@ -262,10 +265,19 @@ static int wait_all(void)
 }
 #endif
 
+static int grep_threads_config(const char *var, const char *value, void *cb)
+{
+   if (!strcmp(var, "grep.threads"))
+   num_threads = git_config_int(var, value); /* Sanity check of 
value will be perfomed later */
+   return 0;
+}
+
 static int grep_cmd_config(const char *

RE: [PATCH v4] Add git-grep threads param

2015-11-09 Thread Victor Leschuk

Correct.

I think adding the option (both to command line and to config file) is good, as 
long as the IO issues are documented. And default to just the fixed number of 
threads for now - and with the option, maybe people can then more easily try 
out different scenarios and maybe improve on the particular choice of fixed 
number of threads.

> Or maybe the default should be something that isn't even described by any 
> particular fixed number.

> For example, maybe the default that value could be something quite dynamic: 
> start off with a single thread (or a very low thread number) and just set a 
> timer. After 0.1s, if CPU usage is low, start more threads. After another 
> 0.1s, if that improved things, maybe > we could add still more threads...

> Note that "CPU usage is low" can be hard to get portably, but we could 
> approximate it with "how much work did we actually get done". If we only 
> grepped a couple of files, that might be because of IO issues. And if speed 
> does not improve when we move from a 
> single thread to, say, four threads, then we should probably *not* increase 
> the thread number again at 0.2s.

> So I think there are many possible avenues to explore that might be 
> interesting. I do *not* think that "online_cpus()" is one of them, > except 
> perhaps as a very rough measure of "is this a beefy system or not" (but even 
> that is questionable - 32 CPUs is definitely 
> likely "very beefy, so use lots of threads", but even 8 CPUs might still be 
> just a phone, and I'm not sure that tells you a lot, really.
Linus

Here is my version of note for Documentaion:

Number of grep worker threads, use it to tune up performance on
your machines. Leave it unset or set to "0" if you want to use default 
number
(currently default number is 8 for all systems, however this behavior 
can
 be changed in future versions to better suite your hardware and 
circumstances).


--
Victor

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4] Add git-grep threads param

2015-11-09 Thread Victor Leschuk

 > if (list.nr || cached)
>num_threads = 1;
 >  if (!num_threads)
> num_threads = GREP_NUM_THREADS_DEFAULT;

>  and then later, instead of use_threads, do:

 >  if (num_threads <= 1) {
... do single-threaded version ...
  > } else {
... do multi-threaded version ...
  > }

 > That matches the logic in builtin/pack-objects.c.

Maybe use the simplest version (and keep num_numbers == 0 also as flag for all 
other checks in code like if(num_flags)  ):

if (list.nr || cached )
  num_threads = 0; // do not use threads
else if (num_threads == 0)
  num_threads = online_cpus() <= 1 ? 0 : GREP_NUM_THREADS_DEFAULT;
else if (num_threads < 0)
  die(...)

// here we are num_threads > zero, so do nothing


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4] Add git-grep threads param

2015-11-09 Thread Victor Leschuk
On Mon, Nov 9, 2015 at 9:28 AM, Victor Leschuk
<vlesc...@accesssoftek.com> wrote:
>
> Maybe use the simplest version (and keep num_numbers == 0 also as flag for 
> all other checks in code like if(num_flags)  ):
>
> if (list.nr || cached )
>   num_threads = 0; // do not use threads
> else if (num_threads == 0)
>   num_threads = online_cpus() <= 1 ? 0 : GREP_NUM_THREADS_DEFAULT;

> I will say this AGAIN.

> The number of threads is *not* about the number of CPU's. Stop this
craziness. It's wrong.

Actually I have never said the nCPUs played main role in it. The patch is 
intended to provide user ability to change this threads number according to 
their needs and to touch as small amount of other code as possible.

> The number of threads is about parallelism. Yes, CPU's is a small part
> of it. But as mentioned earlier, the *big* wins are for slow
> filesystems, NFS in particular. On NFS, even if you have things
> cached, the cache revalidation is going to cause network traffic
> almost every time. Being able to have several of those outstanding is
> a big deal.

> So stop with the "online_cpus()" stuff. And don't base your benchmarks
> purely on the CPU-bound case. Because the CPU-bound case is the case
> that is already generally so good that few people will care all *that*
> deeply.

I have performed a cold-cached FS test (in previous thread to minimize the CPU 
part in the results) and it showed high correlation between speed and 
thread_num. Isn't it what you said? Even on systems with small number of cores 
we can gain profit of multithreading. That's I why I suggest this param to be 
customizable and not HARDCODED.

We need to create a clear text for the documentation that this number should 
not based on number of CPU-s only. Currently do not mention anything on it.

--
Victor
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4] Add git-grep threads param

2015-11-09 Thread Victor Leschuk
> Why don't we leave it at 8, then? That's the conservative choice, and
> once we have --threads, people can easily experiment with different
> values and we can follow-up with a change to the default if need be.

I'd propose the following:

if (list.nr || cached) {

num_threads = 0; /* Can not multi-thread object lookup */   

}   

else if (num_threads < 0 && online_cpus() <= 1) {   

num_threads = 0; /* User didn't set threading option and we have <= 1 
of hardware cores */  
}   

else if (num_threads == 0) {

num_threads = GREP_NUM_THREADS_DEFAULT; /* User explicitly choose 
default behavior */   
}   

else if (num_threads < 0) {  /* Actually this one should be checked earlier 
so no need to double check here */  
 
die(_("Ivalid number of threads specified (%d)"), num_threads)  

} 

--
Victor
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4] Add git-grep threads param

2015-11-09 Thread Victor Leschuk
.

> In the meantime I'd argue for just getting rid of the online_cpu's
> check, because

>  (a) I think it's actively misleading

>  (b) the threaded grep probably doesn't hurt much even on a single
> CPU, and the _potential_ upside from IO could easily dwarf the cost.

>  (c) do developers actually have single-core machines any more?

  Linus

So as far as I understood your point at current moment would be better to leave 
online_cpus() check behind,
keep the default threads value (and do so for other threaded programs, in 
separate patches of course).

After that we may focus (in future) on developing smarter heuristics for 
parallelity-relationed params.

Also we should specify in documentation that number of your CPUs may not the 
optimal value and customer should find his own best values based on other 
circumstances.

Correct?

--
Victor--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4] Add git-grep threads param

2015-11-09 Thread Victor Leschuk


From: Jeff King [p...@peff.net]
Sent: Tuesday, November 03, 2015 22:40
To: Junio C Hamano
Cc: Victor Leschuk; git@vger.kernel.org; Victor Leschuk; 
torva...@linux-foundation.org; j...@keeping.me.uk
Subject: Re: [PATCH v4] Add git-grep threads param

On Tue, Nov 03, 2015 at 09:22:09AM -0800, Junio C Hamano wrote:

> > +grep.threads::
> > +   Number of grep worker threads, use it to tune up performance on
> > +   multicore machines. Default value is 8. Set to 0 to disable threading.
> > +
>
> I am not enthused by this "Set to 0 to disable".  As Zero is
> magical, it would be more useful if 1 meant that threading is not
> used (i.e. there is only 1 worker), and 0 meant that we would
> automatically pick some reasonable parallelism for you (and we
> promise that the our choice would not be outrageously wrong), or
> something like that.

> Not just useful, but consistent with other parts of git, like
> pack.threads, where "0" already means "autodetect".

Hello Peff and Junio,

Yeah do also think it would be more reasonable to use "0" for "autodetect" 
default value. However chat this autodetect value should be?

For index index-pack and pack-objects we use ncpus() for this, however 
according to my tests this wouldn't an Ideal for all cases. Maybe it should be 
something like ncpus()*2, 
anyway before it we even used hard-coded 8 for all systems...

In this case we use "1" as "Do not use threads" and die on all negative numbers 
during parsing.

Agreed?
--
Victor--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4] Add git-grep threads param

2015-11-03 Thread Victor Leschuk
>> do we have any objections on this patch?

> The question you should be asking is "do we have any support".

Hello all, CCing participated reviewers. As Junio  has correctly mentioned: "Do 
we have any support for including this functionality?"

I think this kind of customization can be useful in tuning up performance as 
confirmed by conducted tests. And in most cases having anything hard-coded is 
worse than giving users opportunity to change it, isn't it?
--
Best Regards,
Victor--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4] Add git-grep threads param

2015-11-01 Thread Victor Leschuk
Hello all,

do we have any objections on this patch?

--
Best Regards,
Victor

From: Victor Leschuk [vlesc...@gmail.com]
Sent: Tuesday, October 27, 2015 14:22
To: git@vger.kernel.org
Cc: Victor Leschuk
Subject: [PATCH v4] Add git-grep threads param

Make number of git-grep worker threads a configuration parameter.
According to several tests on systems with different number of CPU cores
the hard-coded number of 8 threads is not optimal for all systems:
tuning this parameter can significantly speed up grep performance.

Signed-off-by: Victor Leschuk <vlesc...@accesssoftek.com>
---
 Documentation/config.txt   |  4 +++
 Documentation/git-grep.txt |  9 ++
 builtin/grep.c | 56 --
 contrib/completion/git-completion.bash |  1 +
 4 files changed, 54 insertions(+), 16 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 391a0c3..1dd2a61 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1447,6 +1447,10 @@ grep.extendedRegexp::
option is ignored when the 'grep.patternType' option is set to a value
other than 'default'.

+grep.threads::
+   Number of grep worker threads, use it to tune up performance on
+   multicore machines. Default value is 8. Set to 0 to disable threading.
+
 gpg.program::
Use this custom program instead of "gpg" found on $PATH when
making or verifying a PGP signature. The program must support the
diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 4a44d6d..e766596 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -23,6 +23,7 @@ SYNOPSIS
   [--break] [--heading] [-p | --show-function]
   [-A ] [-B ] [-C ]
   [-W | --function-context]
+  [--threads ]
   [-f ] [-e] 
   [--and|--or|--not|(|)|-e ...]
   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | 
...]
@@ -53,6 +54,10 @@ grep.extendedRegexp::
option is ignored when the 'grep.patternType' option is set to a value
other than 'default'.

+grep.threads::
+   Number of grep worker threads, use it to tune up performance on
+   multicore machines. Default value is 8. Set to 0 to disable threading.
+
 grep.fullName::
If set to true, enable '--full-name' option by default.

@@ -227,6 +232,10 @@ OPTIONS
effectively showing the whole function in which the match was
found.

+--threads ::
+   Set number of worker threads to . Default is 8.
+   Set to 0 to disable threading.
+
 -f ::
Read patterns from , one per line.

diff --git a/builtin/grep.c b/builtin/grep.c
index d04f440..694553e 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -24,11 +24,11 @@ static char const * const grep_usage[] = {
NULL
 };

-static int use_threads = 1;
+#define GREP_NUM_THREADS_DEFAULT 8
+static int num_threads = -1;

 #ifndef NO_PTHREADS
-#define THREADS 8
-static pthread_t threads[THREADS];
+static pthread_t *threads;

 /* We use one producer thread and THREADS consumer
  * threads. The producer adds struct work_items to 'todo' and the
@@ -63,13 +63,13 @@ static pthread_mutex_t grep_mutex;

 static inline void grep_lock(void)
 {
-   if (use_threads)
+   if (num_threads)
pthread_mutex_lock(_mutex);
 }

 static inline void grep_unlock(void)
 {
-   if (use_threads)
+   if (num_threads)
pthread_mutex_unlock(_mutex);
 }

@@ -206,7 +206,8 @@ static void start_threads(struct grep_opt *opt)
strbuf_init([i].out, 0);
}

-   for (i = 0; i < ARRAY_SIZE(threads); i++) {
+   threads = xcalloc(num_threads, sizeof(pthread_t));
+   for (i = 0; i < num_threads; i++) {
int err;
struct grep_opt *o = grep_opt_dup(opt);
o->output = strbuf_out;
@@ -238,12 +239,14 @@ static int wait_all(void)
pthread_cond_broadcast(_add);
grep_unlock();

-   for (i = 0; i < ARRAY_SIZE(threads); i++) {
+   for (i = 0; i < num_threads; i++) {
void *h;
pthread_join(threads[i], );
hit |= (int) (intptr_t) h;
}

+   free(threads);
+
pthread_mutex_destroy(_mutex);
pthread_mutex_destroy(_read_mutex);
pthread_mutex_destroy(_attr_mutex);
@@ -262,10 +265,22 @@ static int wait_all(void)
 }
 #endif

+static int grep_threads_config(const char *var, const char *value, void *cb)
+{
+   if (!strcmp(var, "grep.threads")) {
+   num_threads = git_config_int(var, value);
+   if (num_threads < 0)
+   die("Invalid number of threads specified (%d)", 
num_threads);
+   }
+   return 0;
+}
+
 static int grep_cmd_config(const char *var, const char *value, vo

RE: [PATCH 3/6] Facilitate debugging Git executables in tests with gdb

2015-10-31 Thread Victor Leschuk


> >   +if test -n "$TEST_GDB_GIT"
> > +then
> > +   exec gdb -args "${GIT_EXEC_PATH}/@@PROG@@" "$@"
> Maybe we could make $TEST_GDB_GIT not just a boolean flag? It would be useful
> to contain "gdb" executable name. It would allow to set path to GDB when it
> is not in $PATH, set different debuggers (for example, I usually use cgdb),
> or even set it to /path/to/gdb_wrapper.sh which could contain different gdb
> options and tunings.

> Sure, as long as TEST_GDB_GIT=1 still works. Why don't you make an add-on
> patch and submit it?

Hello Johannes,

Sure, I will prepare the patch as soon as this one is included in master.
--
Victor
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug: Segfault when doing "git diff"

2015-10-28 Thread Victor Leschuk



On 10/28/2015 02:58 PM, Mathias L. Baumann wrote:

Hello dear git devs,

I just stumbled upon a segfault when doing just "git diff" in my repo.

I managed to create a minimal repo setup where the bug is reproducable.

The problem seems to be a mix of having an untracked submodule and 
having set an alternates file for one submodule.


Attached you'll find my setup that will reproduce the problem. Simply 
run  'git diff' in bugtest1.


In case the attachment is refused, I also uploaded it here:

http://supraverse.net/bugdemo.tar.gz

cheers,

--Marenz

Hello Marenz,

I have just tried to reproduce segfault with the provided archive:

[del@del-debian bugtest1 (master)]$ git diff
diff --git a/submodules/bugtest2 b/submodules/bugtest2
--- a/submodules/bugtest2
+++ b/submodules/bugtest2
@@ -1 +1 @@
-Subproject commit cd0b9ee2946d2df3626943347332a4d86f93b126
+Subproject commit cd0b9ee2946d2df3626943347332a4d86f93b126-dirty

No segfault occured. I am using

git version 2.6.2.308.g3b8f10c

Could you please specify which version of git you are using and also try 
to reproduce it with latest 2.6.2?


--
Victor
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug: Segfault when doing "git diff"

2015-10-28 Thread Victor Leschuk



On 10/28/2015 04:35 PM, Mathias L. Baumann wrote:

I was using the latest git version 2.6.2 already.
I suspect it is due to a .gitconfig. This is what is probably required:

➜  ~  cat .gitconfig
[diff]
submodule = log

Yep, that did the trick.

The segfault is from

sha1_file.c:

/* add the alternate entry */
 *alt_odb_tail = ent; /* <= alt_obd_tail is NULL here */
alt_odb_tail = &(ent->next);
ent->next = NULL;

Will try to take a closer look at it.

--
Victor
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/6] Facilitate debugging Git executables in tests with gdb

2015-10-28 Thread Victor Leschuk


  
+if test -n "$TEST_GDB_GIT"

+then
+   exec gdb -args "${GIT_EXEC_PATH}/@@PROG@@" "$@"
Maybe we could make $TEST_GDB_GIT not just a boolean flag? It would be 
useful to contain "gdb" executable name. It would allow to set path to 
GDB when it is not in $PATH, set different debuggers (for example, I 
usually use cgdb), or even set it to /path/to/gdb_wrapper.sh which could 
contain different gdb options and tunings.


--
Victor
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v3] Add git-grep threads param

2015-10-27 Thread Victor Leschuk
Hello Linus, 

>> According to several tests on systems with different number of CPU cores
>> the hard-coded number of 8 threads is not optimal for all systems:

> Did you also compare cold-cache filesystem performance?

>  One of the reasons for doing threaded grep is for CPU scaling. But another 
> is for IO scaling. If your git tree is over NFS, doing grep eight threads at 
> a time if likely going to make things much faster even if you are on a single 
> CPU.

Yes, I have performed tests on cold-cache FS and it looks like number of 
threads affects performance. Here are the results for grepping linux kernel 
repo on a 4-core machine (similar test was conducted on 8-core machine):

Threads: 4 Time: 39.13
Threads: 8 Time: 34.39
Threads: 16 Time: 31.46
Threads: 32 Time: 27.40

Here is test scenario:

#!/bin/bash
TIMEFORMAT=%R
GIT=/home/del/git-dev/bin/git
TESTS=10
for n in 4 8 16 32; do
echo -n "Threads: $n Time: "
for i in $(seq 1 $TESTS); do
echo 3 > /proc/sys/vm/drop_caches
time $GIT grep --threads $n -e '#define' --and \( -e MAX_PATH -e 
PATH_MAX \)  >/dev/null
done 2>&1 | awk -v ntests=${TESTS} '{sum+=$1} END{printf "%.2f\n", 
sum/ntests}'
done

Note: With hot-cache grepping with 4 threads gives fastest results on both 
4-core and 8-core machines.

Thus I think it can be useful for users to be able to tune the threads number 
according to their needs.

--
Victor--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4] Add git-grep threads param

2015-10-27 Thread Victor Leschuk
Make number of git-grep worker threads a configuration parameter.
According to several tests on systems with different number of CPU cores
the hard-coded number of 8 threads is not optimal for all systems:
tuning this parameter can significantly speed up grep performance.

Signed-off-by: Victor Leschuk <vlesc...@accesssoftek.com>
---
 Documentation/config.txt   |  4 +++
 Documentation/git-grep.txt |  9 ++
 builtin/grep.c | 56 --
 contrib/completion/git-completion.bash |  1 +
 4 files changed, 54 insertions(+), 16 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 391a0c3..1dd2a61 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1447,6 +1447,10 @@ grep.extendedRegexp::
option is ignored when the 'grep.patternType' option is set to a value
other than 'default'.
 
+grep.threads::
+   Number of grep worker threads, use it to tune up performance on
+   multicore machines. Default value is 8. Set to 0 to disable threading.
+
 gpg.program::
Use this custom program instead of "gpg" found on $PATH when
making or verifying a PGP signature. The program must support the
diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 4a44d6d..e766596 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -23,6 +23,7 @@ SYNOPSIS
   [--break] [--heading] [-p | --show-function]
   [-A ] [-B ] [-C ]
   [-W | --function-context]
+  [--threads ]
   [-f ] [-e] 
   [--and|--or|--not|(|)|-e ...]
   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | 
...]
@@ -53,6 +54,10 @@ grep.extendedRegexp::
option is ignored when the 'grep.patternType' option is set to a value
other than 'default'.
 
+grep.threads::
+   Number of grep worker threads, use it to tune up performance on
+   multicore machines. Default value is 8. Set to 0 to disable threading.
+
 grep.fullName::
If set to true, enable '--full-name' option by default.
 
@@ -227,6 +232,10 @@ OPTIONS
effectively showing the whole function in which the match was
found.
 
+--threads ::
+   Set number of worker threads to . Default is 8.
+   Set to 0 to disable threading.
+
 -f ::
Read patterns from , one per line.
 
diff --git a/builtin/grep.c b/builtin/grep.c
index d04f440..694553e 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -24,11 +24,11 @@ static char const * const grep_usage[] = {
NULL
 };
 
-static int use_threads = 1;
+#define GREP_NUM_THREADS_DEFAULT 8
+static int num_threads = -1;
 
 #ifndef NO_PTHREADS
-#define THREADS 8
-static pthread_t threads[THREADS];
+static pthread_t *threads;
 
 /* We use one producer thread and THREADS consumer
  * threads. The producer adds struct work_items to 'todo' and the
@@ -63,13 +63,13 @@ static pthread_mutex_t grep_mutex;
 
 static inline void grep_lock(void)
 {
-   if (use_threads)
+   if (num_threads)
pthread_mutex_lock(_mutex);
 }
 
 static inline void grep_unlock(void)
 {
-   if (use_threads)
+   if (num_threads)
pthread_mutex_unlock(_mutex);
 }
 
@@ -206,7 +206,8 @@ static void start_threads(struct grep_opt *opt)
strbuf_init([i].out, 0);
}
 
-   for (i = 0; i < ARRAY_SIZE(threads); i++) {
+   threads = xcalloc(num_threads, sizeof(pthread_t));
+   for (i = 0; i < num_threads; i++) {
int err;
struct grep_opt *o = grep_opt_dup(opt);
o->output = strbuf_out;
@@ -238,12 +239,14 @@ static int wait_all(void)
pthread_cond_broadcast(_add);
grep_unlock();
 
-   for (i = 0; i < ARRAY_SIZE(threads); i++) {
+   for (i = 0; i < num_threads; i++) {
void *h;
pthread_join(threads[i], );
hit |= (int) (intptr_t) h;
}
 
+   free(threads);
+
pthread_mutex_destroy(_mutex);
pthread_mutex_destroy(_read_mutex);
pthread_mutex_destroy(_attr_mutex);
@@ -262,10 +265,22 @@ static int wait_all(void)
 }
 #endif
 
+static int grep_threads_config(const char *var, const char *value, void *cb)
+{
+   if (!strcmp(var, "grep.threads")) {
+   num_threads = git_config_int(var, value);
+   if (num_threads < 0)
+   die("Invalid number of threads specified (%d)", 
num_threads);
+   }
+   return 0;
+}
+
 static int grep_cmd_config(const char *var, const char *value, void *cb)
 {
int st = grep_config(var, value, cb);
-   if (git_color_default_config(var, value, cb) < 0)
+   if (grep_threads_config(var, value, cb) < 0)
+   st = -1;
+   else if (git_color_default_config(var, value, cb) < 0)
st = -1;
return st;
 

RE: [PATCH v3] Add git-grep threads param

2015-10-27 Thread Victor Leschuk
Hello John,

>> This thought also crossed my mind, however we already pass grep_opt to
>> start_threads() function, so I think passing it to wait_all() is not
>> that ugly, and kind of symmetric. And I do not like the idea of
>> duplicating same information in different places. What do you think?

> The grep_opt in start_threads() is being passed through to run(), so it
> seems slightly different to me.  If the threads were being setup in
> grep.c (as opposed to builtin/grep.c) then I'd agree that it belongs in
> grep_opt, but since this is local to this particular user of the grep
> infrastructure adding num_threads to the grep_opt structure at all feels
> wrong to me.

> Note that I wasn't suggesting passing num_threads as a parameter to
> wait_all(), but rather having it as global state that is accessed by
> wait_all() in the same way as the `threads` array.

> If we rename use_threads to num_threads and just use that, then we only
> have the information in one place don't we?

Yeah, I understood your idea. So we parse config_value directly to 

static int num_threads; /* old use_threads */

And use it internally in builtin/grep.c. I think you are right.

Looks like grep_cmd_config() is the right place to parse it. Something like:

--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -267,6 +267,8 @@ static int wait_all(struct grep_opt *opt)
 static int grep_cmd_config(const char *var, const char *value, void *cb)
 {
int st = grep_config(var, value, cb);
+   if (thread_config(var, value, cb) < 0)
+   st = -1;
if (git_color_default_config(var, value, cb) < 0)
st = -1;
return st;

What do you think?

--
Best Regards,
Victor
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v3] Add git-grep threads param

2015-10-26 Thread Victor Leschuk
Hello John,

see comments inline.

>> @@ -22,6 +22,7 @@ SYNOPSIS
>>  [--color[=] | --no-color]
>>  [--break] [--heading] [-p | --show-function]
>>  [-A ] [-B ] [-C ]
>> +[--threads ]

> Is this the best place for this option?  I know the current list isn't
> sorted in any particular way, but here you're splitting up the set of
> context options (`-A`, `-B`, `-C` and `-W`).

Agree, I'll move the option both here and in documentation.

>> -static int wait_all(void)
>> +static int wait_all(struct grep_opt *opt)

> I'm not sure passing a grep_opt in here is the cleanest way to do this.
> Options are a UI concept and all we care about here is the number of
> threads.

> Since `threads` is a global, shouldn't the number of threads be a global
> as well?  Could we reuse `use_threads` here (possibly renaming it
> `num_threads`)?

This thought also crossed my mind, however we already pass grep_opt to 
start_threads() function,
so I think passing it to wait_all() is not that ugly, and kind of symmetric. 
And I do not like the idea
of duplicating same information in different places. What do you think?

--
Best Regards,
Victor
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3] Add git-grep threads param

2015-10-26 Thread Victor Leschuk
Make number of git-grep worker threads a configuration parameter.
According to several tests on systems with different number of CPU cores
the hard-coded number of 8 threads is not optimal for all systems:
tuning this parameter can significantly speed up grep performance.

Signed-off-by: Victor Leschuk <vlesc...@accesssoftek.com>
---
 Documentation/config.txt   |  4 
 Documentation/git-grep.txt |  4 
 builtin/grep.c | 34 ++
 contrib/completion/git-completion.bash |  1 +
 grep.c | 10 ++
 grep.h |  2 ++
 6 files changed, 47 insertions(+), 8 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 391a0c3..1c95587 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1447,6 +1447,10 @@ grep.extendedRegexp::
option is ignored when the 'grep.patternType' option is set to a value
other than 'default'.
 
+grep.threads::
+   Number of grep worker threads, use it to tune up performance on
+   multicore machines. Default value is 8.
+
 gpg.program::
Use this custom program instead of "gpg" found on $PATH when
making or verifying a PGP signature. The program must support the
diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 4a44d6d..fbd4f83 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -22,6 +22,7 @@ SYNOPSIS
   [--color[=] | --no-color]
   [--break] [--heading] [-p | --show-function]
   [-A ] [-B ] [-C ]
+  [--threads ]
   [-W | --function-context]
   [-f ] [-e] 
   [--and|--or|--not|(|)|-e ...]
@@ -220,6 +221,9 @@ OPTIONS
Show  leading lines, and place a line containing
`--` between contiguous groups of matches.
 
+--threads ::
+   Set number of worker threads to . Default is 8.
+
 -W::
 --function-context::
Show the surrounding text from the previous line containing a
diff --git a/builtin/grep.c b/builtin/grep.c
index d04f440..5ef1b07 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -27,8 +27,7 @@ static char const * const grep_usage[] = {
 static int use_threads = 1;
 
 #ifndef NO_PTHREADS
-#define THREADS 8
-static pthread_t threads[THREADS];
+static pthread_t *threads;
 
 /* We use one producer thread and THREADS consumer
  * threads. The producer adds struct work_items to 'todo' and the
@@ -206,7 +205,8 @@ static void start_threads(struct grep_opt *opt)
strbuf_init([i].out, 0);
}
 
-   for (i = 0; i < ARRAY_SIZE(threads); i++) {
+   threads = xcalloc(opt->num_threads, sizeof(pthread_t));
+   for (i = 0; i < opt->num_threads; i++) {
int err;
struct grep_opt *o = grep_opt_dup(opt);
o->output = strbuf_out;
@@ -220,7 +220,7 @@ static void start_threads(struct grep_opt *opt)
}
 }
 
-static int wait_all(void)
+static int wait_all(struct grep_opt *opt)
 {
int hit = 0;
int i;
@@ -238,12 +238,14 @@ static int wait_all(void)
pthread_cond_broadcast(_add);
grep_unlock();
 
-   for (i = 0; i < ARRAY_SIZE(threads); i++) {
+   for (i = 0; i < opt->num_threads; i++) {
void *h;
pthread_join(threads[i], );
hit |= (int) (intptr_t) h;
}
 
+   free(threads);
+
pthread_mutex_destroy(_mutex);
pthread_mutex_destroy(_read_mutex);
pthread_mutex_destroy(_attr_mutex);
@@ -256,7 +258,7 @@ static int wait_all(void)
 }
 #else /* !NO_PTHREADS */
 
-static int wait_all(void)
+static int wait_all(struct grep_opt *opt)
 {
return 0;
 }
@@ -702,6 +704,8 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
N_("show  context lines before matches")),
OPT_INTEGER('A', "after-context", _context,
N_("show  context lines after matches")),
+   OPT_INTEGER(0, "threads", _threads,
+   N_("use  worker threads")),
OPT_NUMBER_CALLBACK(, N_("shortcut for -C NUM"),
context_callback),
OPT_BOOL('p', "show-function", ,
@@ -832,8 +836,22 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
}
 
 #ifndef NO_PTHREADS
-   if (list.nr || cached || online_cpus() == 1)
+   if (!opt.num_threads) {
+   use_threads = 0; /* User explicitely told not to use threads */
+   }
+   else if (list.nr || cached) {
+   use_threads = 0; /* Can not multi-thread object lookup */
+   }
+   else if (opt.num_threads >= 0) {
+   use_threads = 1; /* User explicitely set the number of threads 
*/
+   }
+   else if

Re: git --literal-pathspecs add -u says "fatal: pathspec ':/' did not match any files"

2015-10-24 Thread Victor Leschuk

Hello Noam,

The problem is that in the absence of explicit argument we set the list 
of files to special path ":/" which means repo root:


if ((0 < addremove_explicit || take_worktree_changes) && !argc) {
static const char *whole[2] = { ":/", NULL };
argc = 1;
argv = whole;
}

And after that we treat it as regular file


if (!seen[i] && path[0] &&
((pathspec.items[i].magic &
  (PATHSPEC_GLOB | PATHSPEC_ICASE)) ||
 !file_exists(path))) {  /* < 
file_exists() here just checks lstat() result */
Maybe it'll make sense to modify file_exists() to treat ":/" specially. 
Something like that:


diff --git a/dir.c b/dir.c
index 109ceea..6cae3b9 100644
--- a/dir.c
+++ b/dir.c
@@ -2103,6 +2103,10 @@ int read_directory(struct dir_struct *dir, const 
char *path, int len, const stru

 int file_exists(const char *f)
 {
struct stat sb;
+   if (!strcmp(f, ":/")) {
+   /* ":/" - root dir, always exists */
+   return 1;
+   }
return lstat(f, ) == 0;
 }

--
Victor

On 10/24/2015 02:39 AM, Noam Postavsky wrote:

~/tmp/tmprepo$ git init
Initialized empty Git repository in /home/npostavs/tmp/tmprepo/.git/
~/tmp/tmprepo$ git --literal-pathspecs add -u
fatal: pathspec ':/' did not match any files
~/tmp/tmprepo$ git --version
git version 2.6.1

It was reported[1] that 2.0.2 and several following versions also fail
with the same error; I found that version 1.9.5 succeeds.

Adding a "." argument:

git --literal-pathspecs add -u .

succeeds in all versions.

[1]: https://github.com/magit/magit/issues/2354#issuecomment-150665961
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] Add git-grep threads-num param

2015-10-23 Thread Victor Leschuk
It's a follow up to "[PATCH] Add git-grep threads-num param":

Make number of git-grep worker threads a configuration parameter.
I have run several tests on systems with different number of CPU cores.
It appeared that the hard-coded number 8 lowers performance on both of my 
systems:
on my 4-core and 8-core systems the thread number of 4 worked about 20% faster 
than
default 8. So I think it is better to allow users tune this parameter.

Signed-off-by: Victor Leschuk <vlesc...@accesssoftek.com>
---
 Documentation/config.txt   |  4 
 Documentation/git-grep.txt |  4 
 builtin/grep.c | 20 
 contrib/completion/git-completion.bash |  1 +
 grep.c | 11 +++
 grep.h |  2 ++
 6 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 391a0c3..1c95587 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1447,6 +1447,10 @@ grep.extendedRegexp::
option is ignored when the 'grep.patternType' option is set to a value
other than 'default'.
 
+grep.threads::
+   Number of grep worker threads, use it to tune up performance on
+   multicore machines. Default value is 8.
+
 gpg.program::
Use this custom program instead of "gpg" found on $PATH when
making or verifying a PGP signature. The program must support the
diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 4a44d6d..fbd4f83 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -22,6 +22,7 @@ SYNOPSIS
   [--color[=] | --no-color]
   [--break] [--heading] [-p | --show-function]
   [-A ] [-B ] [-C ]
+  [--threads ]
   [-W | --function-context]
   [-f ] [-e] 
   [--and|--or|--not|(|)|-e ...]
@@ -220,6 +221,9 @@ OPTIONS
Show  leading lines, and place a line containing
`--` between contiguous groups of matches.
 
+--threads ::
+   Set number of worker threads to . Default is 8.
+
 -W::
 --function-context::
Show the surrounding text from the previous line containing a
diff --git a/builtin/grep.c b/builtin/grep.c
index d04f440..3950725 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -27,8 +27,7 @@ static char const * const grep_usage[] = {
 static int use_threads = 1;
 
 #ifndef NO_PTHREADS
-#define THREADS 8
-static pthread_t threads[THREADS];
+static pthread_t *threads;
 
 /* We use one producer thread and THREADS consumer
  * threads. The producer adds struct work_items to 'todo' and the
@@ -206,7 +205,8 @@ static void start_threads(struct grep_opt *opt)
strbuf_init([i].out, 0);
}
 
-   for (i = 0; i < ARRAY_SIZE(threads); i++) {
+   threads = xcalloc(opt->num_threads, sizeof(pthread_t));
+   for (i = 0; i < opt->num_threads; i++) {
int err;
struct grep_opt *o = grep_opt_dup(opt);
o->output = strbuf_out;
@@ -220,7 +220,7 @@ static void start_threads(struct grep_opt *opt)
}
 }
 
-static int wait_all(void)
+static int wait_all(struct grep_opt *opt)
 {
int hit = 0;
int i;
@@ -238,12 +238,14 @@ static int wait_all(void)
pthread_cond_broadcast(_add);
grep_unlock();
 
-   for (i = 0; i < ARRAY_SIZE(threads); i++) {
+   for (i = 0; i < opt->num_threads; i++) {
void *h;
pthread_join(threads[i], );
hit |= (int) (intptr_t) h;
}
 
+   free(threads);
+
pthread_mutex_destroy(_mutex);
pthread_mutex_destroy(_read_mutex);
pthread_mutex_destroy(_attr_mutex);
@@ -256,7 +258,7 @@ static int wait_all(void)
 }
 #else /* !NO_PTHREADS */
 
-static int wait_all(void)
+static int wait_all(struct grep_opt *opt)
 {
return 0;
 }
@@ -702,6 +704,8 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
N_("show  context lines before matches")),
OPT_INTEGER('A', "after-context", _context,
N_("show  context lines after matches")),
+   OPT_INTEGER(0, "threads", _threads,
+   N_("use  worker threads")),
OPT_NUMBER_CALLBACK(, N_("shortcut for -C NUM"),
context_callback),
OPT_BOOL('p', "show-function", ,
@@ -832,7 +836,7 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
}
 
 #ifndef NO_PTHREADS
-   if (list.nr || cached || online_cpus() == 1)
+   if (list.nr || cached || online_cpus() == 1 || opt.num_threads <= 1)
use_threads = 0;
 #else
use_threads = 0;
@@ -910,7 +914,7 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
}
 

RE: [PATCH] Add git-grep threads-num param

2015-10-22 Thread Victor Leschuk
Hello John,

I haven't noticed the --threads option in pack-objects, I will fix the patch to 
make naming more uniform. Do you have any comments regarding the functionality?

--
Best Regards,
Victor

From: John Keeping [j...@keeping.me.uk]
Sent: Thursday, October 22, 2015 7:23 AM
To: Victor Leschuk
Cc: git@vger.kernel.org; Victor Leschuk
Subject: Re: [PATCH] Add git-grep threads-num param

On Thu, Oct 22, 2015 at 04:23:56PM +0300, Victor Leschuk wrote:
> Hello all, I suggest we make number of git-grep worker threads a configuration
> parameter. I have run several tests on systems with different number of CPU 
> cores.
> It appeared that the hard-coded number 8 lowers performance on both of my 
> systems:
> on my 4-core and 8-core systems the thread number of 4 worked about 20% 
> faster than
> default 8. So I think it is better to allow users tune this parameter.

For git-pack-objects we call the command-line parameter "--threads" and
the config variable "pack.threads".  Is there a reason not to use the
same name here (i.e. "--threads" and "grep.threads")?

> Signed-off-by: Victor Leschuk <vlesc...@accesssoftek.com>
> ---
>  Documentation/config.txt   |  4 
>  Documentation/git-grep.txt |  5 +
>  builtin/grep.c | 20 +---
>  contrib/completion/git-completion.bash |  1 +
>  grep.c | 15 +++
>  grep.h |  4 
>  6 files changed, 42 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 391a0c3..c3df20c 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -1447,6 +1447,10 @@ grep.extendedRegexp::
>   option is ignored when the 'grep.patternType' option is set to a value
>   other than 'default'.
>
> +grep.threadsNum::
> + Number of grep worker threads, use it to tune up performance on
> + multicore machines. Default value is 8.
> +
>  gpg.program::
>   Use this custom program instead of "gpg" found on $PATH when
>   making or verifying a PGP signature. The program must support the
> diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
> index 4a44d6d..e9ca265 100644
> --- a/Documentation/git-grep.txt
> +++ b/Documentation/git-grep.txt
> @@ -22,6 +22,7 @@ SYNOPSIS
>  [--color[=] | --no-color]
>  [--break] [--heading] [-p | --show-function]
>  [-A ] [-B ] [-C ]
> +[-t ]
>  [-W | --function-context]
>  [-f ] [-e] 
>  [--and|--or|--not|(|)|-e ...]
> @@ -220,6 +221,10 @@ OPTIONS
>   Show  leading lines, and place a line containing
>   `--` between contiguous groups of matches.
>
> +-t ::
> +--threads-num ::
> + Set number of worker threads to . Default is 8.
> +
>  -W::
>  --function-context::
>   Show the surrounding text from the previous line containing a
> diff --git a/builtin/grep.c b/builtin/grep.c
> index d04f440..9b4fc47 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -27,8 +27,7 @@ static char const * const grep_usage[] = {
>  static int use_threads = 1;
>
>  #ifndef NO_PTHREADS
> -#define THREADS 8
> -static pthread_t threads[THREADS];
> +static pthread_t *threads;
>
>  /* We use one producer thread and THREADS consumer
>   * threads. The producer adds struct work_items to 'todo' and the
> @@ -206,7 +205,8 @@ static void start_threads(struct grep_opt *opt)
>   strbuf_init([i].out, 0);
>   }
>
> - for (i = 0; i < ARRAY_SIZE(threads); i++) {
> + threads = xmalloc(sizeof(pthread_t) * opt->num_threads);
> + for (i = 0; i < opt->num_threads; i++) {
>   int err;
>   struct grep_opt *o = grep_opt_dup(opt);
>   o->output = strbuf_out;
> @@ -220,7 +220,7 @@ static void start_threads(struct grep_opt *opt)
>   }
>  }
>
> -static int wait_all(void)
> +static int wait_all(struct grep_opt *opt)
>  {
>   int hit = 0;
>   int i;
> @@ -238,12 +238,14 @@ static int wait_all(void)
>   pthread_cond_broadcast(_add);
>   grep_unlock();
>
> - for (i = 0; i < ARRAY_SIZE(threads); i++) {
> + for (i = 0; i < opt->num_threads; i++) {
>   void *h;
>   pthread_join(threads[i], );
>   hit |= (int) (intptr_t) h;
>   }
>
> + free(threads);
> +
>   pthread_mutex_destroy(_mutex);
>   pthread_mutex_destroy(_read_mutex);
>   pthread_mutex_destroy(_attr_mutex);
> @@ -256,7 +258,7 @@ static int wait_all(void)
>  }
>  #

[PATCH] Add git-grep threads-num param

2015-10-22 Thread Victor Leschuk
Hello all, I suggest we make number of git-grep worker threads a configuration
parameter. I have run several tests on systems with different number of CPU 
cores.
It appeared that the hard-coded number 8 lowers performance on both of my 
systems:
on my 4-core and 8-core systems the thread number of 4 worked about 20% faster 
than
default 8. So I think it is better to allow users tune this parameter.

Signed-off-by: Victor Leschuk <vlesc...@accesssoftek.com>
---
 Documentation/config.txt   |  4 
 Documentation/git-grep.txt |  5 +
 builtin/grep.c | 20 +---
 contrib/completion/git-completion.bash |  1 +
 grep.c | 15 +++
 grep.h |  4 
 6 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 391a0c3..c3df20c 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1447,6 +1447,10 @@ grep.extendedRegexp::
option is ignored when the 'grep.patternType' option is set to a value
other than 'default'.
 
+grep.threadsNum::
+   Number of grep worker threads, use it to tune up performance on
+   multicore machines. Default value is 8.
+
 gpg.program::
Use this custom program instead of "gpg" found on $PATH when
making or verifying a PGP signature. The program must support the
diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 4a44d6d..e9ca265 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -22,6 +22,7 @@ SYNOPSIS
   [--color[=] | --no-color]
   [--break] [--heading] [-p | --show-function]
   [-A ] [-B ] [-C ]
+  [-t ]
   [-W | --function-context]
   [-f ] [-e] 
   [--and|--or|--not|(|)|-e ...]
@@ -220,6 +221,10 @@ OPTIONS
Show  leading lines, and place a line containing
`--` between contiguous groups of matches.
 
+-t ::
+--threads-num ::
+   Set number of worker threads to . Default is 8.
+
 -W::
 --function-context::
Show the surrounding text from the previous line containing a
diff --git a/builtin/grep.c b/builtin/grep.c
index d04f440..9b4fc47 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -27,8 +27,7 @@ static char const * const grep_usage[] = {
 static int use_threads = 1;
 
 #ifndef NO_PTHREADS
-#define THREADS 8
-static pthread_t threads[THREADS];
+static pthread_t *threads;
 
 /* We use one producer thread and THREADS consumer
  * threads. The producer adds struct work_items to 'todo' and the
@@ -206,7 +205,8 @@ static void start_threads(struct grep_opt *opt)
strbuf_init([i].out, 0);
}
 
-   for (i = 0; i < ARRAY_SIZE(threads); i++) {
+   threads = xmalloc(sizeof(pthread_t) * opt->num_threads);
+   for (i = 0; i < opt->num_threads; i++) {
int err;
struct grep_opt *o = grep_opt_dup(opt);
o->output = strbuf_out;
@@ -220,7 +220,7 @@ static void start_threads(struct grep_opt *opt)
}
 }
 
-static int wait_all(void)
+static int wait_all(struct grep_opt *opt)
 {
int hit = 0;
int i;
@@ -238,12 +238,14 @@ static int wait_all(void)
pthread_cond_broadcast(_add);
grep_unlock();
 
-   for (i = 0; i < ARRAY_SIZE(threads); i++) {
+   for (i = 0; i < opt->num_threads; i++) {
void *h;
pthread_join(threads[i], );
hit |= (int) (intptr_t) h;
}
 
+   free(threads);
+
pthread_mutex_destroy(_mutex);
pthread_mutex_destroy(_read_mutex);
pthread_mutex_destroy(_attr_mutex);
@@ -256,7 +258,7 @@ static int wait_all(void)
 }
 #else /* !NO_PTHREADS */
 
-static int wait_all(void)
+static int wait_all(struct grep_opt *opt)
 {
return 0;
 }
@@ -702,6 +704,10 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
N_("show  context lines before matches")),
OPT_INTEGER('A', "after-context", _context,
N_("show  context lines after matches")),
+#ifndef NO_PTHREADS
+   OPT_INTEGER('t', "threads-num", _threads,
+   N_("use  worker threads")),
+#endif /* !NO_PTHREADS */
OPT_NUMBER_CALLBACK(, N_("shortcut for -C NUM"),
context_callback),
OPT_BOOL('p', "show-function", ,
@@ -910,7 +916,7 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
}
 
if (use_threads)
-   hit |= wait_all();
+   hit |= wait_all();
if (hit && show_in_pager)
run_pager(, prefix);
free_grep_patterns();
diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
i

RE: thread-utils: build with NO_PTHREADS fails

2015-10-14 Thread Victor Leschuk
Hello Junio, 

sorry that was my fault, I was building it wrong way (defined NO_PTHREADS in 
CFLAGS variable, not as separate make variable). Sorry for the false alarm.

--
Best Regards,
Victor

From: Junio C Hamano [jch2...@gmail.com] On Behalf Of Junio C Hamano 
[gits...@pobox.com]
Sent: Monday, October 12, 2015 10:55 AM
To: Victor Leschuk
Cc: git@vger.kernel.org; vlesc...@gmail.com
Subject: Re: thread-utils: build with NO_PTHREADS fails

Junio C Hamano <gits...@pobox.com> writes:

> Victor Leschuk <vlesc...@accesssoftek.com> writes:
>
>> I think that no one tried it for a long time but I needed a
>> single-threaded git version for debug purpose. I tried to build
>> with -DNO_PTHREADS and thread-utils.c failed to compile.
>>
>> In brief the situation is the following:
>>
>> in header file we have something like that:
>>
>>
>> #ifndef NO_PTHREAD
>> extern int online_cpus(void);
>>
>> #else
>> #define online_cpus() 1
>> #endif // NO_PTHREAD
>>
>> and in *.c file:
>>
>>
>> int online_cpus(void)
>> {
>> // ...
>> }
>
> Yeah, that is obviously incorrect.
> ...

Well, no, I spoke too early.  I do not see there is much wrong here.

There is this bit in the Makefile:

ifdef NO_PTHREADS
BASIC_CFLAGS += -DNO_PTHREADS
else
BASIC_CFLAGS += $(PTHREAD_CFLAGS)
EXTLIBS += $(PTHREAD_LIBS)
LIB_OBJS += thread-utils.o
endif

The source file thread-utils.c is not compiled to thread-utils.o if
you say NO_PTHREADS, and the resulting libgit.a does not of course
have it.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


thread-utils: build with NO_PTHREADS fails

2015-10-11 Thread Victor Leschuk
Hello all,

I think that no one tried it for a long time but I needed a single-threaded git 
version for debug purpose. I tried to build with -DNO_PTHREADS and 
thread-utils.c failed to compile.

In brief the situation is the following:

in header file we have something like that:


#ifndef NO_PTHREAD
extern int online_cpus(void);

#else
#define online_cpus() 1
#endif // NO_PTHREAD

and in *.c file:


int online_cpus(void)
{
// ...
}

So the compilation fails with: 

test.c:3:21: error: macro "online_cpus" passed 1 arguments, but takes just 0
 int online_cpus(void)

That's a tiny issue, but maybe we could apply a straight-forward solution (see 
attached diff)? If you agree I'll prepare a properly-formatted [PATCH] submit.


--
Best Regards,
Victordiff --git a/thread-utils.c b/thread-utils.c
index a2135e0..f3e90fb 100644
--- a/thread-utils.c
+++ b/thread-utils.c
@@ -20,6 +20,7 @@
 
 int online_cpus(void)
 {
+#ifndef NO_PTHREADS
 #ifdef _SC_NPROCESSORS_ONLN
 	long ncpus;
 #endif
@@ -58,11 +59,13 @@ int online_cpus(void)
 		return (int)ncpus;
 #endif
 
+#endif
 	return 1;
 }
 
 int init_recursive_mutex(pthread_mutex_t *m)
 {
+#ifndef NO_PTHREADS
 	pthread_mutexattr_t a;
 	int ret;
 
@@ -74,4 +77,7 @@ int init_recursive_mutex(pthread_mutex_t *m)
 		pthread_mutexattr_destroy();
 	}
 	return ret;
+#else
+	return 0;
+#endif
 }
diff --git a/thread-utils.h b/thread-utils.h
index d9a769d..6fb98c3 100644
--- a/thread-utils.h
+++ b/thread-utils.h
@@ -7,9 +7,5 @@
 extern int online_cpus(void);
 extern int init_recursive_mutex(pthread_mutex_t*);
 
-#else
-
-#define online_cpus() 1
-
 #endif
 #endif /* THREAD_COMPAT_H */


RE: [PATCH] git-svn: make batch mode optional for git-cat-file

2015-10-11 Thread Victor Leschuk
Hello Eric,

Thanks for all the advices. I have played with several repositories (both on 
32bit and 64bit machines). You were correct most of the memory if used by 
mapped files and yes it doesn't cause any problems, even a 32bit machine with 
500Mb of memory works normally with a heavy loaded git-cat-file.

Thanks also for the advice to use git gc config options, I tested gc.auto=0 and 
it lead to the same behavior as my setting MALLOC_LIMIT, however it is more 
correct way to get this effect =)

I agree that we shouldn't worry about mapped files.

--
Best Regards,
Victor

From: Eric Wong [normalper...@yhbt.net]
Sent: Wednesday, September 23, 2015 12:22 PM
To: Victor Leschuk
Cc: Junio C Hamano; git@vger.kernel.org
Subject: Re: [PATCH] git-svn: make batch mode optional for git-cat-file

Victor Leschuk <vlesc...@accesssoftek.com> wrote:
> Hello Eric, thanks for looking into it.
>
> >> git-cat-file has outgrown the parent perl process several times
> >> (git-cat-file - ~3-4Gb, perl - 400Mb).
>
> > Ugh, that sucks.
> > Even the 400Mb size of Perl annoys me greatly and I'd work
> > on fixing it if I had more time.
>
> I was going to look at this problem also, but first I'd like to improve the 
> situation with cat-file as on large repos it is larger problem. By the way, 
> what direction would you suggest to begin with?

See below :)



> > > git-cat-file has outgrown the parent perl process several times
> > > (git-cat-file - ~3-4Gb, perl - 400Mb).
>
> > How much of that is anonymous memory, though?
>
> Haven't measured on this particular repo: didn't redo the 2 week
> experiment =) However I checked on a smaller test repo and anon memory
> is about 12M out of 40M total. Most of memory is really taken by
> mmaped *.pack and *idx files.

If it's mmap-ed files, that physical memory is only used on-demand
and can be dropped at any time because it's backed by disk.

In other words, I would not worry about any file-backed mmap at all
(unless you're on 32-bit, but I think git has workarounds for that)

Do you still have that giant repo around?

Are the combined size of the pack + idx files are at least 3-4 GB?

This should cat all the blobs in history without re-running git-svn:

git log --all --raw -r --no-abbrev | \
  awk '/^:/ {print $3; print $4}' | git cat-file --batch

git log actually keeps growing, but the cat-file process shouldn't
use anonymous memory much if you inspect it with pmap.

> Actually I accidentally found out that if I export GIT_MALLOC_LIMIT
> variable set to several megabytes it has the following effect:

>  * git-svn.perl launches git-gc
>  * git-gc can't allocate enough memory and thus doesn't create any pack files
>  * git-cat-file works only with pure blob object, not packs, and it's
> memory usage doesn't grow larger than 4-5M
>
> It gave me a thought that maybe we could get rid of "git gc" calls
> after each commit in perl code and just perform one large gc operation
> at the end. It will cost disk space during clone but save us memory.
> What do you think?

You can set gc.auto to zero in your $GIT_CONFIG to disable gc.
The "git gc" calls were added because unpacked repos were growing
too large and caused problems for other people.

Perhaps play with some other pack* options documented in
Documentation/config to limit maximum pack size/depth.

Is this a 32-bit or 64-bit system?

> As for your suggestion regarding periodic restart of batch process
> inside git-cat-file, I think we could give it a try, I can prepare a
> patch and run some tests.

I am not sure if we need it for git-svn.

In another project, the only reason I've found to restart
"cat-file --batch" is in case the repo got repacked and old packs
got unlinked, cat-file would hold a reference onto the old file
and suck up space.   It might be better if "cat-file --batch" learned
to detect unlinked files and then munmap + close them.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] git-svn: make batch mode optional for git-cat-file

2015-09-23 Thread Victor Leschuk
Hello Eric, thanks for looking into it.

>> git-cat-file has outgrown the parent perl process several times
>> (git-cat-file - ~3-4Gb, perl - 400Mb).

> Ugh, that sucks.
> Even the 400Mb size of Perl annoys me greatly and I'd work
> on fixing it if I had more time.

I was going to look at this problem also, but first I'd like to improve the 
situation with cat-file as on large repos it is larger problem. By the way, 
what direction would you suggest to begin with?

> A few more questions:

> * What is the largest file that existed in that repo?

About 2.5M

> * Did you try "MALLOC_MMAP_THRESHOLD_" with glibc?

Have just tried it on a smaller repo (which takes about 1 hour to clone and RSS 
grows from 4M to 40M during the process. Unfortunately there is no much of an 
effect: max RSS is 41M with default settings and 38M with 
MALLOC_MMAP_THRESHOLD_=131072.

> If alloc.c is the culprit, I would consider to transparently restart
"cat-file --batch" once it grows to a certain size or after a certain
number of requests are made to it.

alloc.c interface is not used in cat-file at all, only direct calls to xmalloc 
and xrealloc from wrapper.c, and also xmmap() from sha1_file.c.

> > git-cat-file has outgrown the parent perl process several times
> > (git-cat-file - ~3-4Gb, perl - 400Mb).

> How much of that is anonymous memory, though?

Haven't measured on this particular repo: didn't redo the 2 week experiment =) 
However I checked on a smaller test repo and anon memory is about 12M out of 
40M total. Most of memory is really taken by mmaped *.pack and *idx files.

Actually I accidentally found out that if I export GIT_MALLOC_LIMIT variable 
set to several megabytes it has the following effect:
 * git-svn.perl launches git-gc
 * git-gc can't allocate enough memory and thus doesn't create any pack files
 * git-cat-file works only with pure blob object, not packs, and it's memory 
usage doesn't grow larger than 4-5M

It gave me a thought that maybe we could get rid of "git gc" calls after each 
commit in perl code and just perform one large gc operation at the end. It will 
cost disk space during clone but save us memory. What do you think?

As for your suggestion regarding periodic restart of batch process inside 
git-cat-file, I think we could give it a try, I can prepare a patch and run 
some tests.

--
Best Regards,
Victor

From: Eric Wong [normalper...@yhbt.net]
Sent: Tuesday, September 22, 2015 5:35 PM
To: Victor Leschuk
Cc: Junio C Hamano; git@vger.kernel.org
Subject: Re: [PATCH] git-svn: make batch mode optional for git-cat-file

Eric Wong <normalper...@yhbt.net> wrote:
> Victor Leschuk <vlesc...@accesssoftek.com> wrote:
> > The thing is that git-cat-file keeps growing during work when running
> > in "batch" mode. See the figure attached: it is for cloning a rather
> > small repo (1 hour to clone about ~14000 revisions). However the clone
> > of a large repo (~28 revisions) took about 2 weeks and
> > git-cat-file has outgrown the parent perl process several times
> > (git-cat-file - ~3-4Gb, perl - 400Mb).

How much of that is anonymous memory, though?
(pmap $PID_OF_GIT_CAT_FILE)

Running the following on the Linux kernel tree I had lying around:

(for i in $(seq 100 200); do git ls-files | sed -e "s/^/HEAD~$i:/"; done)|\
  git cat-file --batch >/dev/null

Reveals about 510M RSS in top, but pmap says less than 20M of that
is anonymous.  So the rest are mmap-ed packfiles; that RSS gets
transparently released back to the kernel under memory pressure.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


git-svn: Why not use git-fast-import?

2015-09-22 Thread Victor Leschuk
Hello all, 

I've been playing with git-svn for some time already and as it seems to me 
there are two most important problems which make it hard to use in production 
for large repositories. Very low speed and large memory footprint of 
synchronization with SVN repos (I am talking about clone and fetch operations 
mostly). I was wondering why the git-fast-import is not used for these 
purposes? Are there any serious limitations which make it impossible?

I have found several alternative solutions which use git-fast-import but they 
all do only the initial import of a repo. I have looked through the 
documentation and didn't see why fast-import can't be used to sync an existing 
repo after the import. Am I missing something?

Thanks in advance for clarification.

--
Best Regards,
Victor--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] git-svn: make batch mode optional for git-cat-file

2015-09-22 Thread Victor Leschuk
As for your remark regarding the option naming: 

> An option whose name begins with no- looks somewhat strange.  You
can even say --no-no-cat-file-batch from the command line, I
suspect.

We already do have some of these: 'no-metadata', 'no-checkout', 
'no-auth-cache'. So I was just following the existing convention. Do you think 
we need to change it and stick with --catch-file-batch=1/--cat-file-batch=0 ?

--
Best Regards,
Victor

From: Victor Leschuk
Sent: Monday, September 21, 2015 3:03 PM
To: Junio C Hamano
Cc: git@vger.kernel.org
Subject: RE: [PATCH] git-svn: make batch mode optional for git-cat-file

Hello Junio,

thanks for your review. First of all I'd like to apologize for sending the 
patch without description. Actually I was in a hurry and sent it by accident: I 
planned to edit the mail before sending...

Here is the detailed description:

Last week we had a quick discussion in this mailing list: 
http://thread.gmane.org/gmane.comp.version-control.git/278021 .

The thing is that git-cat-file keeps growing during work when running in 
"batch" mode. See the figure attached: it is for cloning a rather small repo (1 
hour to clone about ~14000 revisions). However the clone of a large repo 
(~28 revisions) took about 2 weeks and git-cat-file has outgrown the parent 
perl process several times (git-cat-file - ~3-4Gb, perl - 400Mb).

What was done:
 * I have run it under valgrind and mtrace and haven't found any memory leaks
 * Found the source of most number of memory reallocations 
(batch_object_write() function (strbuf_expand -> realloc)) - tried to make the 
streambuf object static and avoid reallocs - didn't help
 * Tried preloading other allocators than standard glibc - no significant 
difference

After that I replaced the batch mode with separate cat-file calls for each blob 
and it didn't have any impact on clone performance on real code repositories. 
However I created a fake test repo with large number of small files (~10 bytes 
each): here is how I created it https://bitbucket.org/vleschuk/svngenrepo

And on this artificial test repo it really slowed down the process. So I 
decided to suggest to make the batch mode optional to let the user "tune" the 
process and created a patch for this.

As for your code-style notes, I agree with them, will adjust the code.

--
Best Regards,
Victor

From: Junio C Hamano [jch2...@gmail.com] On Behalf Of Junio C Hamano 
[gits...@pobox.com]
Sent: Monday, September 21, 2015 11:25 AM
To: Victor Leschuk
Cc: git@vger.kernel.org; Victor Leschuk
Subject: Re: [PATCH] git-svn: make batch mode optional for git-cat-file

Victor Leschuk <vlesc...@gmail.com> writes:

> Signed-off-by: Victor Leschuk <vlesc...@accesssoftek.com>
> ---

Before the S-o-b line is a good place to explain why this is a good
change to have.  Please use it.

>  git-svn.perl |  1 +
>  perl/Git.pm  | 41 -
>  2 files changed, 41 insertions(+), 1 deletion(-)
>
> diff --git a/git-svn.perl b/git-svn.perl
> index 36f7240..b793c26 100755
> --- a/git-svn.perl
> +++ b/git-svn.perl
> @@ -139,6 +139,7 @@ my %fc_opts = ( 'follow-parent|follow!' => 
> \$Git::SVN::_follow_parent,
>   'use-log-author' => \$Git::SVN::_use_log_author,
>   'add-author-from' => \$Git::SVN::_add_author_from,
>   'localtime' => \$Git::SVN::_localtime,
> + 'no-cat-file-batch' => sub { $Git::no_cat_file_batch = 1; },

An option whose name begins with no- looks somewhat strange.  You
can even say --no-no-cat-file-batch from the command line, I
suspect.

Why not give an option 'cat-file-batch' that sets the variable
$Git::cat_file_batch to false, and initialize the variable to true
to keep existing users who do not pass the option happy?

>   %remote_opts );
>
>  my ($_trunk, @_tags, @_branches, $_stdlayout);
> diff --git a/perl/Git.pm b/perl/Git.pm
> index 19ef081..69e5293 100644
> --- a/perl/Git.pm
> +++ b/perl/Git.pm
> @@ -107,6 +107,7 @@ use Fcntl qw(SEEK_SET SEEK_CUR);
>  use Time::Local qw(timegm);
>  }
>
> +our $no_cat_file_batch = 0;
>
>  =head1 CONSTRUCTORS
>
> @@ -1012,6 +1013,10 @@ returns the number of bytes printed.
>  =cut
>
>  sub cat_blob {
> + (1 == $no_cat_file_batch) ? _cat_blob_cmd(@_) : _cat_blob_batch(@_);

Discard "1 ==" here.  You are clearly using the variable as a
boolean, so writing this as

$cat_file_batch ? _cat_blob_batch(@_) : _cat_blob_cmd(@_);

or better yet

if ($cat_file_batch) {
_cat_blob_batch(@_);
} else {
_cat_blob_cmd(@_);
}

would be more natural.

> +}
> +
> +sub _cat_blob_batch {
>   my ($self, $sha1, $fh) = @_;
>
>   $self->_open_cat_blob_if_n

[PATCH] git-svn: make batch mode optional for git-cat-file

2015-09-21 Thread Victor Leschuk

Signed-off-by: Victor Leschuk <vlesc...@accesssoftek.com>
---
 git-svn.perl |  1 +
 perl/Git.pm  | 41 -
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/git-svn.perl b/git-svn.perl
index 36f7240..b793c26 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -139,6 +139,7 @@ my %fc_opts = ( 'follow-parent|follow!' => \$Git::SVN::_follow_parent,
 		'use-log-author' => \$Git::SVN::_use_log_author,
 		'add-author-from' => \$Git::SVN::_add_author_from,
 		'localtime' => \$Git::SVN::_localtime,
+		'no-cat-file-batch' => sub { $Git::no_cat_file_batch = 1; },
 		%remote_opts );
 
 my ($_trunk, @_tags, @_branches, $_stdlayout);
diff --git a/perl/Git.pm b/perl/Git.pm
index 19ef081..69e5293 100644
--- a/perl/Git.pm
+++ b/perl/Git.pm
@@ -107,6 +107,7 @@ use Fcntl qw(SEEK_SET SEEK_CUR);
 use Time::Local qw(timegm);
 }
 
+our $no_cat_file_batch = 0;
 
 =head1 CONSTRUCTORS
 
@@ -1012,6 +1013,10 @@ returns the number of bytes printed.
 =cut
 
 sub cat_blob {
+	(1 == $no_cat_file_batch) ? _cat_blob_cmd(@_) : _cat_blob_batch(@_);
+}
+
+sub _cat_blob_batch {
 	my ($self, $sha1, $fh) = @_;
 
 	$self->_open_cat_blob_if_needed();
@@ -1072,7 +1077,7 @@ sub cat_blob {
 sub _open_cat_blob_if_needed {
 	my ($self) = @_;
 
-	return if defined($self->{cat_blob_pid});
+	return if ( defined($self->{cat_blob_pid}) || 1 == $no_cat_file_batch );
 
 	($self->{cat_blob_pid}, $self->{cat_blob_in},
 	 $self->{cat_blob_out}, $self->{cat_blob_ctx}) =
@@ -1090,6 +1095,40 @@ sub _close_cat_blob {
 	delete @$self{@vars};
 }
 
+sub _cat_blob_cmd {
+	my ($self, $sha1, $fh) = @_;
+
+	my $size = $self->command_oneline('cat-file', '-s', $sha1);
+
+	if (!defined $size) {
+		carp "cat-file couldn't detect object size";
+		return -1;
+	}
+
+	my ($in, $c) = $self->command_output_pipe('cat-file', 'blob', $sha1);
+
+	my $blob;
+	my $bytesLeft = $size;
+
+	while (1) {
+		last unless $bytesLeft;
+
+		my $bytesToRead = $bytesLeft < 1024 ? $bytesLeft : 1024;
+		my $read = read($in, $blob, $bytesToRead);
+		unless (defined($read)) {
+			$self->command_close_pipe($in, $c);
+			throw Error::Simple("in pipe went bad");
+		}
+		unless (print $fh $blob) {
+			$self->command_close_pipe($in, $c);
+			throw Error::Simple("couldn't write to passed in filehandle");
+		}
+		$bytesLeft -= $read;
+	}
+
+	$self->command_close_pipe($in, $c);
+	return $size;
+}
 
 =item credential_read( FILEHANDLE )
 


RE: [PATCH] git-svn: make batch mode optional for git-cat-file

2015-09-21 Thread Victor Leschuk
Hello Junio,

thanks for your review. First of all I'd like to apologize for sending the 
patch without description. Actually I was in a hurry and sent it by accident: I 
planned to edit the mail before sending... 

Here is the detailed description: 

Last week we had a quick discussion in this mailing list: 
http://thread.gmane.org/gmane.comp.version-control.git/278021 .

The thing is that git-cat-file keeps growing during work when running in 
"batch" mode. See the figure attached: it is for cloning a rather small repo (1 
hour to clone about ~14000 revisions). However the clone of a large repo 
(~28 revisions) took about 2 weeks and git-cat-file has outgrown the parent 
perl process several times (git-cat-file - ~3-4Gb, perl - 400Mb).

What was done:
 * I have run it under valgrind and mtrace and haven't found any memory leaks
 * Found the source of most number of memory reallocations 
(batch_object_write() function (strbuf_expand -> realloc)) - tried to make the 
streambuf object static and avoid reallocs - didn't help
 * Tried preloading other allocators than standard glibc - no significant 
difference

After that I replaced the batch mode with separate cat-file calls for each blob 
and it didn't have any impact on clone performance on real code repositories. 
However I created a fake test repo with large number of small files (~10 bytes 
each): here is how I created it https://bitbucket.org/vleschuk/svngenrepo

And on this artificial test repo it really slowed down the process. So I 
decided to suggest to make the batch mode optional to let the user "tune" the 
process and created a patch for this. 

As for your code-style notes, I agree with them, will adjust the code.

--
Best Regards,
Victor

From: Junio C Hamano [jch2...@gmail.com] On Behalf Of Junio C Hamano 
[gits...@pobox.com]
Sent: Monday, September 21, 2015 11:25 AM
To: Victor Leschuk
Cc: git@vger.kernel.org; Victor Leschuk
Subject: Re: [PATCH] git-svn: make batch mode optional for git-cat-file

Victor Leschuk <vlesc...@gmail.com> writes:

> Signed-off-by: Victor Leschuk <vlesc...@accesssoftek.com>
> ---

Before the S-o-b line is a good place to explain why this is a good
change to have.  Please use it.

>  git-svn.perl |  1 +
>  perl/Git.pm  | 41 -
>  2 files changed, 41 insertions(+), 1 deletion(-)
>
> diff --git a/git-svn.perl b/git-svn.perl
> index 36f7240..b793c26 100755
> --- a/git-svn.perl
> +++ b/git-svn.perl
> @@ -139,6 +139,7 @@ my %fc_opts = ( 'follow-parent|follow!' => 
> \$Git::SVN::_follow_parent,
>   'use-log-author' => \$Git::SVN::_use_log_author,
>   'add-author-from' => \$Git::SVN::_add_author_from,
>   'localtime' => \$Git::SVN::_localtime,
> + 'no-cat-file-batch' => sub { $Git::no_cat_file_batch = 1; },

An option whose name begins with no- looks somewhat strange.  You
can even say --no-no-cat-file-batch from the command line, I
suspect.

Why not give an option 'cat-file-batch' that sets the variable
$Git::cat_file_batch to false, and initialize the variable to true
to keep existing users who do not pass the option happy?

>   %remote_opts );
>
>  my ($_trunk, @_tags, @_branches, $_stdlayout);
> diff --git a/perl/Git.pm b/perl/Git.pm
> index 19ef081..69e5293 100644
> --- a/perl/Git.pm
> +++ b/perl/Git.pm
> @@ -107,6 +107,7 @@ use Fcntl qw(SEEK_SET SEEK_CUR);
>  use Time::Local qw(timegm);
>  }
>
> +our $no_cat_file_batch = 0;
>
>  =head1 CONSTRUCTORS
>
> @@ -1012,6 +1013,10 @@ returns the number of bytes printed.
>  =cut
>
>  sub cat_blob {
> + (1 == $no_cat_file_batch) ? _cat_blob_cmd(@_) : _cat_blob_batch(@_);

Discard "1 ==" here.  You are clearly using the variable as a
boolean, so writing this as

$cat_file_batch ? _cat_blob_batch(@_) : _cat_blob_cmd(@_);

or better yet

if ($cat_file_batch) {
_cat_blob_batch(@_);
} else {
_cat_blob_cmd(@_);
}

would be more natural.

> +}
> +
> +sub _cat_blob_batch {
>   my ($self, $sha1, $fh) = @_;
>
>   $self->_open_cat_blob_if_needed();
> @@ -1072,7 +1077,7 @@ sub cat_blob {
>  sub _open_cat_blob_if_needed {
>   my ($self) = @_;
>
> - return if defined($self->{cat_blob_pid});
> + return if ( defined($self->{cat_blob_pid}) || 1 == $no_cat_file_batch );

Likewise.

return if (!$cat_file_batch);
return if defined($self->{cat_blob_pid});

> +sub _cat_blob_cmd {
> + my ($self, $sha1, $fh) = @_;
> +...

The biggest thing that is missing from this patch is the explanation
of why this is a good thing to do.  The batch interface was invented
because people found that it was wasteful to spawn a new cat-file
process e

git-svn: cat-file memory usage

2015-09-16 Thread Victor Leschuk
Hello all,

We are currently getting acquainted with git-svn tool and have experienced few 
problems with it. The main issue is memory usage during "git svn clone": on 
large repositories the perl and git processes are using significant amount of 
memory. 

I have conducted several tests with different repositories. I have created 
mirrors of Trac project (http://trac.edgewall.org/ - rather small repo, ~14000 
commits) and FreeBSD base repo (~28 commits). Here is the summary of my 
tests (to eliminate network issues all clones were performed for file:// repos):

 * git svn clone of trac  takes about 1 hour 
 * git svn clone of FreeBSD has already taken more than 3 days and still 
running (currently has cloned about 40% of revisions)
 * git cat-file process memory footprint keeps growing during the clone process 
(see figure attached)

The main issue here is git cat-file consuming memory. The attached figure is 
for small repository which takes about an hour to clone, however on my another 
machine where FreeBSD clone is currently running the git cat-file has already 
taken more than 1Gb of memory (RSS) and has overgrown the parent perl process 
(~300-400 Mb). 

I have valgrind'ed the git-cat-file (which is running is --batch mode during 
the whole clone) and found no serious leaks (about 100 bytes definitely 
leaked), so all memory is carefully freed, but the heap usage grows maybe due 
to fragmentation or smth else. When I looked through the code I found out that 
most of heap allocations are called from batch_object_write() function 
(strbuf_expand -> realloc).

So I have found two possible workarounds for the issue: 

 * Set GIT_ALLOC_LIMIT variable - it does reduce the memory footprint but slows 
down the process
 * In perl code do not run git cat-file in batch mode (in 
Git::SVN::apply_textdelta) but rather run it as separate commands each time

   my $size = $self->command_oneline('cat-file', '-s', $sha1);
   # .
   my ($in, $c) = $self->command_output_pipe('cat-file', 'blob', $sha1);

The second approach doesn't slow down the whole process at all (~72 minutes to 
clone repo both with --batch mode and without).

So the question is: what would be the correct approach to fight the problem 
with cat-file memory usage: maybe we should get rid of batch mode in perl code, 
or somehow tune allocation policy in C code?

Please let me know your thoughts. 

--
Best Regards,
Victor Leschuk

RE: git-svn: cat-file memory usage

2015-09-16 Thread Victor Leschuk
Hello Jeff, thanks for the advice.

Unfortunately using patch didn't change the situation. I will run some tests 
with alternate allocators (looking at jemalloc and tcmalloc). As for alternate 
tools: as far as I understood svn2git calls 'git svn' itself. So I assume it 
can't fix the memory usage or speed up clone process... Correct me if I'm wrong.

Reposurgeon looks interesting... Will give it a try. 

Btw, what do you think of getting rid of batch mode for clone/fetch in perl 
code. It really hardly has any impact on performance but reduces memory usage a 
lot.

--
Best Regards,
Victor

From: Jeff King [p...@peff.net]
Sent: Wednesday, September 16, 2015 4:56 AM
To: Victor Leschuk
Cc: git@vger.kernel.org
Subject: Re: git-svn: cat-file memory usage

On Wed, Sep 16, 2015 at 04:00:48AM -0700, Victor Leschuk wrote:

>  * git svn clone of trac  takes about 1 hour
>  * git svn clone of FreeBSD has already taken more than 3 days and
>  still running (currently has cloned about 40% of revisions)

I haven't worked with git-svn in a long time, but I doubt that it is the
fastest way to do a large repository import. You might want to look into
a tool like svn2git or reposurgeon to do the initial import.

> I have valgrind'ed the git-cat-file (which is running is --batch mode
> during the whole clone) and found no serious leaks (about 100 bytes
> definitely leaked), so all memory is carefully freed, but the heap
> usage grows maybe due to fragmentation or smth else. When I looked
> through the code I found out that most of heap allocations are called
> from batch_object_write() function (strbuf_expand -> realloc).

Certainly we will call strbuf_expand once per object. I would have
expected we would call read_sha1_file(), too. It looks like we always
try to stream blobs, but I think we have to fall back to reading the
whole object if there are deltas.

You can try this patch, which will reuse the same strbuf over and over:

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 07baad1..73f338c 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -256,7 +256,7 @@ static void print_object_or_die(struct batch_options *opt, 
struct expand_data *d
 static void batch_object_write(const char *obj_name, struct batch_options *opt,
   struct expand_data *data)
 {
-   struct strbuf buf = STRBUF_INIT;
+   static struct strbuf buf = STRBUF_INIT;

if (sha1_object_info_extended(data->sha1, >info, 
LOOKUP_REPLACE_OBJECT) < 0) {
printf("%s missing\n", obj_name ? obj_name : 
sha1_to_hex(data->sha1));
@@ -264,10 +264,10 @@ static void batch_object_write(const char *obj_name, 
struct batch_options *opt,
return;
}

+   strbuf_reset();
strbuf_expand(, opt->format, expand_format, data);
strbuf_addch(, '\n');
batch_write(opt, buf.buf, buf.len);
-   strbuf_release();

if (opt->print_contents) {
print_object_or_die(opt, data);

That will reduce your reallocs due to strbuf_expand, though I'm doubtful
that it will solve the problem (and if it does, I think the right
solution is probably to look into using a better allocator than what
your system malloc() is providing).

>  * In perl code do not run git cat-file in batch mode (in
>  Git::SVN::apply_textdelta) but rather run it as separate commands
>  each time
>
>my $size = $self->command_oneline('cat-file', '-s', $sha1);
># .
>my ($in, $c) = $self->command_output_pipe('cat-file', 'blob', $sha1);
>
> The second approach doesn't slow down the whole process at all (~72
> minutes to clone repo both with --batch mode and without).

I'm surprised the startup cost of the process doesn't make an impact,
but maybe it gets lost in the noise of the rest of the work (AFAICT, the
point of this cat-file is to retrieve a blob, apply a delta to it, and
then write out the resulting object; that write is probably a lot more
expensive).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html